TriTISA
General
TriTISA is a TIS post-processor that refines the translation initiation site (TIS) annotations or predictions produced by an existing system for microbial genomes. The current version provides options for post-processing genome annotations from public databases such as GenBank and RefSeq, as well as gene predictions from widely used gene finders such as GeneMark and Glimmer.
Hu, G.-Q., Zheng, X., Zhu, H.-Q. and She, Z.-S. (2009) Prediction of translation initiation site with TriTISA, Bioinformatics, 25(1):123-125.
TriTISA classifies all candidate TISs into three categories based on their evolutionary properties and characterizes each category with Markov models. It then uses a Bayesian methodology to select the true TIS in an unsupervised, iterative procedure. Model parameters are trained with an iterative self-learning strategy.
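The scoring step described above can be illustrated with a minimal sketch. It is not TriTISA's implementation: the first-order Markov model, the add-one pseudocount, the fixed-length sequence windows, and the category labels (`true`, `upstream_false`, `downstream_false`) are all assumptions chosen for illustration. The sketch trains one Markov model per category and combines the window likelihoods with category priors through Bayes' rule.

```python
import math

ALPHABET = "ACGT"

def train_markov(windows, pseudocount=1.0):
    """Estimate a first-order Markov model (initial + transition probabilities)."""
    init = {b: pseudocount for b in ALPHABET}
    trans = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for w in windows:
        init[w[0]] += 1
        for a, b in zip(w, w[1:]):
            trans[a][b] += 1
    total = sum(init.values())
    init = {b: c / total for b, c in init.items()}
    trans = {a: {b: c / sum(row.values()) for b, c in row.items()}
             for a, row in trans.items()}
    return init, trans

def log_likelihood(window, model):
    """Log-probability of a window under one Markov model."""
    init, trans = model
    ll = math.log(init[window[0]])
    for a, b in zip(window, window[1:]):
        ll += math.log(trans[a][b])
    return ll

def posteriors(window, models, priors):
    """Posterior probability of each category for a window (Bayes' rule)."""
    logs = {c: math.log(priors[c]) + log_likelihood(window, models[c])
            for c in models}
    top = max(logs.values())  # subtract the maximum for numerical stability
    norm = sum(math.exp(v - top) for v in logs.values())
    return {c: math.exp(v - top) / norm for c, v in logs.items()}

# Toy training windows; the category labels are hypothetical placeholders.
models = {
    "true": train_markov(["AGGAGG", "AGGAGA"]),
    "upstream_false": train_markov(["TTTTTT", "TTTATT"]),
    "downstream_false": train_markov(["GCGCGC", "CGCGCG"]),
}
priors = {c: 1.0 / len(models) for c in models}
post = posteriors("AGGAGG", models, priors)
best = max(post, key=post.get)
```

In an iterative self-learning setting, one would presumably reassign each candidate to its most probable category, retrain the models on the new assignments, and repeat until the assignments stop changing. That loop is omitted here.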
Performance
We applied the method to post-process the RefSeq annotation and evaluated the predictions against the experimentally verified TISs currently available for five genomes, including two GC-rich genomes, namely N. pharaonis (63.1% GC) and H. salinarum (68.1% GC). The accuracies of TriTISA and other state-of-the-art tools are listed in Table 1.
In addition, a large-scale assessment was carried out with the method proposed by Hu et al. (2008b). Briefly, given a species-specific reference set, the method estimates the positional weight matrix of true TISs, calculates its contribution to the matrix observed in the annotation via a generalized least squares estimator, and from that derives the accuracy. To obtain high-quality TISs without biasing the assessment toward any single tool, the reference sets were built from the predictions common to GeneMarkS, Glimmer 3, GS-finder, MED-StartPlus, TiCo, and TriTISA. The assessment covered a total of 532 genomes, on which TriTISA achieved an average accuracy of 91%.
Unsupervised post-processing of TIS annotations is often criticized for its potential dependency on the quality of the initial annotation (Makita et al., 2007). Using experimentally verified TISs from EcoGene (Rudd, 2000), we rigorously examined this dependency in our system. We created a series of artificial inputs with an accuracy a ranging from 10% to 100% in steps of 5% by replacing 100(1-a)% of the true TISs with randomly chosen false TISs. We found that TriTISA is extremely robust against the quality of the input, with a typical accuracy of 95% and a fluctuation of less than 1%, even when the initial annotations are mostly wrong (Figure 2). In contrast, the recently proposed post-processor TiCo is sensitive to its input, showing a linear correlation between prediction accuracy and input quality.
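The construction of the artificial inputs can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the script used for the published experiment: the function names, the dictionary-based data layout, and the toy coordinates are hypothetical, and every gene is assumed to have at least one false candidate TIS to draw from.

```python
import random

def accuracy_levels(start=10, stop=100, step=5):
    """Input accuracies a from 10% to 100% in steps of 5%, as fractions."""
    return [p / 100 for p in range(start, stop + 1, step)]

def degrade_annotation(true_tis, false_candidates, a, rng):
    """Replace a fraction (1 - a) of the true TISs with random false TISs.

    true_tis:         gene id -> verified TIS position
    false_candidates: gene id -> alternative (false) TIS positions
    """
    genes = sorted(true_tis)
    n_wrong = round((1.0 - a) * len(genes))
    corrupted = set(rng.sample(genes, n_wrong))
    return {g: rng.choice(false_candidates[g]) if g in corrupted else true_tis[g]
            for g in genes}

def accuracy(annotation, true_tis):
    """Fraction of genes whose annotated TIS matches the verified one."""
    hits = sum(annotation[g] == true_tis[g] for g in true_tis)
    return hits / len(true_tis)

# Toy data: 20 genes, each with two hypothetical false start positions.
true_tis = {f"gene{i}": 100 * i for i in range(20)}
false_candidates = {g: [p + 3, p + 30] for g, p in true_tis.items()}

rng = random.Random(0)  # fixed seed so the degraded inputs are reproducible
inputs = {a: degrade_annotation(true_tis, false_candidates, a, rng)
          for a in accuracy_levels()}
```

Each degraded annotation in `inputs` would then be passed to the post-processor, and the accuracy of its output compared with the input accuracy a.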
Download
Source code (GNU GPL licence), plus an executable file for Windows
User guidelines (MS Word 2003)
Sample inputs
Experimentally confirmed TISs (in MED format; see the user guidelines)
References
Contact
Last updated December 2008. Copyright (C) 2008. All rights reserved.