Tuesday, February 24, 2009

UNIPOP: A universal operon predictor for prokaryotic genomes

The Journal of Bioinformatics and Computational Biology has published an article on a tool called UNIPOP. The operon prediction tool uses graph theory to predict operons by mapping regions of chromosomes that experienced less shuffling between species. Because it is not a machine-learning-based method, it does not have to be retrained for different types of organisms. It does, however, require multiple related genomes to detect signals.

You can find the source code and results on their website. Input is the ptt files available from NCBI (for example, the all.ptt.gz file for bacterial genomes).
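To get a feel for the input, here is a minimal Python sketch that reads a ptt table and computes the intergenic distance between adjacent same-strand genes, the kind of signal the abstract below mentions. It assumes the usual NCBI ptt layout (a title line, a protein count line, a tab-separated header, then rows whose first column is a start..end location). The sample data, the Gene type, and the function names are made up for illustration; this is not UNIPOP's own parser.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    start: int
    end: int
    strand: str
    name: str


def parse_ptt(text: str) -> List[Gene]:
    """Parse ptt-style text into genes sorted by start coordinate."""
    genes = []
    for line in text.splitlines():
        fields = line.split("\t")
        # Data rows begin with a "start..end" location; title, count,
        # and header lines do not, so they are skipped here.
        if len(fields) < 5 or ".." not in fields[0]:
            continue
        start, end = fields[0].split("..", 1)
        if not (start.isdigit() and end.isdigit()):
            continue
        # Assumed column order: Location, Strand, Length, PID, Gene, ...
        genes.append(Gene(int(start), int(end), fields[1], fields[4]))
    return sorted(genes)


def same_strand_gaps(genes: List[Gene]) -> List[Tuple[str, str, int]]:
    """Intergenic distance (in bp) for each adjacent pair on the same strand."""
    gaps = []
    for a, b in zip(genes, genes[1:]):
        if a.strand == b.strand:
            gaps.append((a.name, b.name, b.start - a.end - 1))
    return gaps


# Hypothetical three-gene table, for illustration only.
SAMPLE = (
    "Hypothetical genome, complete genome - 1..5000\n"
    "3 proteins\n"
    "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n"
    "190..255\t+\t21\t1\tgeneA\tx0001\t-\t-\tproduct A\n"
    "337..2799\t+\t820\t2\tgeneB\tx0002\t-\t-\tproduct B\n"
    "2801..3733\t-\t310\t3\tgeneC\tx0003\t-\t-\tproduct C\n"
)

if __name__ == "__main__":
    for left, right, gap in same_strand_gaps(parse_ptt(SAMPLE)):
        print(left, right, gap)
```

For a real file you would decompress all.ptt.gz first and pass the text of one genome's table to parse_ptt.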

Identification of operons at the genome scale of prokaryotic organisms represents a key step in deciphering of their transcriptional regulation machinery, biological pathways and networks. While numerous computational methods have been shown to be effective in predicting operons for well-studied organisms such as Escherichia coli (E. coli) K12 and Bacillus subtilis (B. subtilis) 168, these methods generally do not generalize well to genomes other than the ones used to train the methods because they rely heavily on organism-specific information. Several methods have been explored to address this problem through utilizing only genomic structural information conserved across multiple organisms, but they all suffer from the issue of low prediction sensitivity. In this paper, we report a novel operon prediction method that is applicable to any prokaryotic genome with accurate prediction accuracy. The key idea of the method is to predict operons through identification of conserved gene clusters across multiple genomes and through deriving a key parameter relevant to the distribution of intergenic distances in genomes. We have implemented this method using a graph-theoretic approach, called a maximum cardinality bipartite matching algorithm. Computational results have shown that our method has higher prediction sensitivity as well as specificity than any published method. We have carried out a preliminary study on operons unique to archaea and bacteria, respectively, and derived a number of interesting new insights about operons between these two kingdoms. The software and predicted operons of 365 prokaryotic genomes are available at http://csbl.bmb.uga.edu/~dongsheng/UNIPOP.
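The abstract names maximum cardinality bipartite matching as the underlying graph algorithm. As a rough illustration of that general technique only, here is a short Python sketch using the classic augmenting-path approach. The graph, in which genes from one genome sit on the left, genes from another on the right, and edges stand for candidate correspondences, is a made-up example; how UNIPOP actually builds its graph and scores clusters is not described in this post.

```python
from typing import Dict, List, Set


def max_bipartite_matching(adj: Dict[str, List[str]]) -> Dict[str, str]:
    """Return a maximum-cardinality matching as {left_node: right_node}.

    adj maps each left node to the right nodes it may be paired with.
    """
    match_right: Dict[str, str] = {}  # right node -> left node matched to it

    def try_augment(u: str, visited: Set[str]) -> bool:
        for v in adj.get(u, []):
            if v in visited:
                continue
            visited.add(v)
            # Take v if it is free, or if its current partner can be
            # re-matched to some other right node (an augmenting path).
            if v not in match_right or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    for u in adj:
        try_augment(u, set())
    return {u: v for v, u in match_right.items()}


# Hypothetical genes a1..a3 in genome A and b1..b3 in genome B.
EXAMPLE = {
    "a1": ["b1", "b2"],
    "a2": ["b1"],
    "a3": ["b2", "b3"],
}

if __name__ == "__main__":
    print(max_bipartite_matching(EXAMPLE))
```

The recursive search is fine for small examples like this one; for genome-sized graphs an iterative algorithm such as Hopcroft-Karp would be the more usual choice.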

