
A publication of EMBL - Outstation Hinxton, The European Bioinformatics Institute


PROCRUSTES gene recognition software

Introduction

PROCRUSTES 4.0 is geared towards computational support of experimental gene identification and towards annotation-quality gene predictions. Release 97.0 of GenBank (October 1996) contains 651,972,984 bases in 1,021,211 records. Until very recently, gene recognition algorithms ignored this vast source of information. PROCRUSTES responds to the challenge of using related proteins and cDNAs for gene prediction.
PROCRUSTES is based on the spliced alignment algorithm, which explores all possible exon assemblies and finds the multi-exon structure with the best fit to a related protein (M.S. Gelfand, A.A. Mironov and P.A. Pevzner (1996), Gene recognition via spliced sequence alignment, Proc. Natl. Acad. Sci. USA, 93: 9061-9066). Unlike other existing methods, PROCRUSTES successfully recognises genes with short exons as well as complicated genes with more than 20 exons. Test results demonstrate that the spliced alignment algorithm provides 99% accurate recognition of a mammalian gene if a related gene from another mammalian species is known. Moreover, PROCRUSTES significantly outperforms conventional gene recognition algorithms even if only a distantly related protein is available.
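
The idea of spliced alignment can be illustrated with a toy sketch. It is not the PROCRUSTES implementation: it enumerates exon assemblies by brute force instead of using the efficient dynamic programming of the published algorithm, and it aligns DNA against a cDNA-like DNA target rather than against a protein. The function names, the scoring values (match +1, mismatch -1, gap -1) and the example sequences are all assumptions made for illustration.

```python
from itertools import combinations


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score (illustrative scoring values)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def best_exon_assembly(genomic, blocks, target):
    """Try every ordered chain of non-overlapping candidate exons.

    blocks holds half-open (start, end) intervals in genomic. Returns the
    best score and the chain whose concatenation fits target best.
    Exponential in the number of blocks, so only suitable for tiny examples.
    """
    blocks = sorted(blocks)
    best_score, best_chain = float("-inf"), ()
    for k in range(1, len(blocks) + 1):
        for chain in combinations(blocks, k):
            # Skip chains in which a block starts before the previous one ends.
            if any(p_end > n_start
                   for (_, p_end), (n_start, _) in zip(chain, chain[1:])):
                continue
            spliced = "".join(genomic[s:e] for s, e in chain)
            score = global_alignment_score(spliced, target)
            if score > best_score:
                best_score, best_chain = score, chain
    return best_score, best_chain


# Hypothetical data: three exons (upper case) separated by introns (lower case).
genomic = "ATGGCC" "gtaagtcag" "GATTACA" "gtcctag" "TTGTAA"
# Candidate blocks: the three true exons plus two decoys that overlap introns.
candidates = [(0, 6), (3, 12), (15, 22), (18, 27), (29, 35)]
target = "ATGGCCGATTACATTGTAA"

score, chain = best_exon_assembly(genomic, candidates, target)
print(score, chain)
```

In this toy case the best-scoring chain consists of the three true exons, because only their concatenation reproduces the target exactly. A practical tool would need to generate candidate blocks from splice-site signals and score translated exons against a protein, which this sketch leaves out.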

The new distinctive features of PROCRUSTES 4.01 are:

  • identification of genes and exons for which the predictions are guaranteed to be correct (Las Vegas gene predictions)
  • error-tolerant gene recognition
  • construction of primer cover for PCR-based gene identification in large-scale sequencing projects (GenePrimer software)
  • highly specific recognition of exons for selection of probes and PCR primers to cDNA (CASSANDRA software)
  • recognition of incomplete genes in unfinished cosmid-size genomic sequences via local spliced alignment
  • new graphical outputs for multiple gene predictions and experimental gene identification
  • assignment of confidence levels to gene predictions
  • multiple gene predictions via suboptimal spliced alignment
  • gene recognition based on similar domains rather than entire proteins
  • gene recognition in different species.

Availability

PROCRUSTES is now available both as a command-line version, for incorporation into DNA sequence analysis pipelines, and as a WWW server.

Information provided by: Pavel Pevzner


 


Direct questions or comments to the BioInformer Editor. This page was last modified on Friday, 16 July 1999.
ISSN 1462-1363.

(c) 1997-1999 EMBL-EBI. All Rights Reserved.