PROCRUSTES gene recognition software
Introduction
PROCRUSTES 4.0 is geared towards computational support of experimental gene identification and annotation-quality gene predictions. Release 97.0 of GenBank (October 1996) contains 651,972,984 bases in 1,021,211 records. Until very recently, gene recognition algorithms ignored this vast source of information. PROCRUSTES responds to the challenge of using related proteins and cDNAs for gene prediction. It is based on the spliced alignment algorithm, which explores all possible exon assemblies and finds the multi-exon structure with the best fit to a related protein (M.S. Gelfand, A.A. Mironov and P.A. Pevzner (1996), Gene recognition via spliced sequence alignment, Proc. Natl. Acad. Sci. USA, 93: 9061-9066). Unlike other existing methods, PROCRUSTES successfully recognises genes with short exons as well as complicated genes with more than 20 exons. Test results demonstrate that the spliced alignment algorithm provides 99% accurate recognition of a mammalian gene if a related gene from another mammalian species is known. Moreover, PROCRUSTES significantly outperforms conventional gene recognition algorithms even if only a distantly related protein is available.
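The idea of spliced alignment can be illustrated with a toy sketch. This is not the PROCRUSTES implementation: it enumerates chains of candidate exons by brute force (exponential in the number of candidates), whereas the published algorithm is a polynomial-time dynamic programme, and it compares nucleotide strings directly rather than fitting to a protein. The function names, the scoring values and the example sequences are all assumptions made for illustration.

```python
from itertools import combinations


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of strings a and b.

    The scoring values are illustrative assumptions, not PROCRUSTES settings.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def spliced_alignment(genomic, blocks, target):
    """Return (score, chain) for the best chain of non-overlapping blocks.

    blocks are candidate exons given as half-open (start, end) intervals.
    Every ordered, non-overlapping chain is assembled and scored against
    the target; this naive enumeration is only practical for tiny inputs.
    """
    blocks = sorted(blocks)
    best_score, best_chain = float("-inf"), ()
    for k in range(1, len(blocks) + 1):
        for chain in combinations(blocks, k):
            # Skip chains in which consecutive blocks overlap.
            if any(chain[i][1] > chain[i + 1][0] for i in range(k - 1)):
                continue
            assembled = "".join(genomic[s:e] for s, e in chain)
            score = global_alignment_score(assembled, target)
            if score > best_score:
                best_score, best_chain = score, chain
    return best_score, best_chain


# Hypothetical toy data: two true exons, one intron decoy, one extra block.
genomic = "ATGGCC" + "GTAAGT" + "TTTAAA" + "CCAG" + "GGGTGA"
candidate_blocks = [(0, 6), (6, 12), (12, 18), (22, 28)]
target = "ATGGCCGGGTGA"

score, chain = spliced_alignment(genomic, candidate_blocks, target)
print(score, chain)
```

In this toy input the chain made of the first and last blocks reproduces the target exactly, so it receives the highest score; the decoy blocks can only lower the fit.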
The new distinctive features of PROCRUSTES 4.01 are:
- identification of genes and exons for which the predictions are guaranteed to be correct (Las Vegas gene predictions)
- error-tolerant gene recognition
- construction of primer cover for PCR-based gene identification in large-scale sequencing projects (GenePrimer software)
- highly specific recognition of exons for selection of probes and PCR primers to cDNA (CASSANDRA software)
- recognition of incomplete genes in unfinished cosmid-size genomic sequences via local spliced alignment
- new graphical outputs for multiple gene predictions and experimental gene identification
- assignment of confidence levels to gene predictions
- multiple gene predictions via suboptimal spliced alignment
- gene recognition based on similar domains rather than entire proteins
- gene recognition in different species.
Availability
PROCRUSTES is now available as a command-line version for incorporation into DNA sequence analysis pipelines and as a WWW server.
Information provided by: Pavel Pevzner
Resources and further information
University of Southern California http://www.usc.edu/
Center for Computational and Experimental Genomics http://www-hto.usc.edu/
Procrustes homepage http://www-hto.usc.edu/software/procrustes/
Gelfand, M.S., Mironov, A.A., Pevzner, P.A. "Gene recognition via spliced sequence alignment", Proc. Natl. Acad. Sci. USA (1996), 93, 9061-9066
U.S. patent application "Combinatorial Gene Recognition" 60/035,720
External sites are not endorsed by EMBL-EBI