
A publication of EMBL - Outstation Hinxton, The European Bioinformatics Institute


INFOGENE

Databases of Known and Predicted Genes and Proteins in Genomes

INFOGENE was designed by Victor Solovyev and Asaf Salamov
The Sanger Centre, Wellcome Trust Genome Campus,
Hinxton, Cambridge, CB10 1SA, United Kingdom

 

Summary and Perspective

Recently a broad agreement has been reached amongst genome centres in the US, the Sanger Centre, the Wellcome Trust and the US funding agencies to go ahead with a plan that will deliver all of the human sequence, part finished and part in draft, into the public domain by the end of 2001.

Using gene prediction, the scientific community can start to work experimentally with any human gene during the next 3 years, because gene-finding programs usually predict at least the majority of the exons in a gene sequence accurately. In our experience, prediction accuracy on long genomic sequences is 10-20% lower than in the commonly reported tests. However, an exon predicted by two programs that are based on different approaches is much more likely to be real than an exon predicted by a single program. The predictions presented here were made with our program Fgenes (Solovyev, 1997), which is based on a pattern recognition approach, and with Genscan (Burge, Karlin, 1997), which is based on a probabilistic approach. The new Fgenes-H program (Salamov, Solovyev, 1998) will be used for future updates.
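The idea of trusting exons that two independent programs agree on can be illustrated with a minimal sketch. It is not the procedure used to build the databases: the `Exon` record, the function names, and the coordinates are hypothetical, and real program output would first have to be parsed into this form.

```python
from typing import List, NamedTuple


class Exon(NamedTuple):
    """Hypothetical exon record: 1-based inclusive coordinates plus strand."""
    start: int
    end: int
    strand: str


def identical_exons(a: List[Exon], b: List[Exon]) -> List[Exon]:
    """Exons reported with exactly the same coordinates and strand by both programs."""
    return sorted(set(a) & set(b))


def overlapping_exons(a: List[Exon], b: List[Exon]) -> List[Exon]:
    """Exons from `a` that overlap at least one exon from `b` on the same strand."""
    return [
        x for x in a
        if any(x.strand == y.strand and x.start <= y.end and y.start <= x.end
               for y in b)
    ]


# Made-up predictions from two programs for the same sequence.
program_a = [Exon(100, 250, "+"), Exon(400, 520, "+"), Exon(900, 1010, "+")]
program_b = [Exon(100, 250, "+"), Exon(410, 520, "+"), Exon(1500, 1600, "+")]

print(identical_exons(program_a, program_b))
print(overlapping_exons(program_a, program_b))
```

Exact agreement is the stricter criterion; the overlap test also keeps exons whose boundaries differ slightly between the two programs, which may still be worth checking experimentally.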

We present new databases:

  • Known GENES Structure and Functioning Database (INFOGENE Rel 1.)
    • Sections: HUMAN_G, MOUSE_G, DROSM_G, and ARABT_G
      These databases include the structure of known genes and their functional sites, such as the start of transcription, TATA-box, and poly-A signal (if known)
    • Nucleotide and Protein sequences of INFOGENE genes
      IG_NUC and IG_PRO
  • Predicted GENES Structure and protein Databases (INFOGENEP Rel 1.)

Currently this database includes genes predicted for finished and unfinished sequences from the Sanger Centre. It contains 1500 loci and 18000 protein sequences corresponding to genes predicted by the Fgenes and Genscan programs.

Similarity to known proteins and ESTs is included in the data, and it will be possible to run keyword searches in SRS to find proteins of interest.

If you find an interesting similarity with your sequence, you can use the INFOGENEP ID to check the gene structure of this protein in the INFOGENEP database and get the corresponding clone name and sequence.

Because it will not be possible in the near future to verify experimentally all genes in the sequences produced by genome sequencing projects, computational prediction may be of great significance for the study of new genes and proteins.

For example, you can find predicted genes of current interest using a Blast search or a keyword search. Because most predicted exons should be accurate on average, you can use them to obtain the corresponding cDNA and verify the exact gene structure.
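As an illustration of how predicted exons relate to a cDNA, the sketch below joins the exon segments of a genomic sequence into a predicted transcript, which could then be compared with an experimentally obtained cDNA. The function name, the coordinate convention (1-based, inclusive, on the forward strand), and the toy sequence are assumptions made for this example, not part of the INFOGENE format.

```python
from typing import List, Tuple

# Translation table for complementing DNA (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def splice_exons(genomic: str, exons: List[Tuple[int, int]], strand: str = "+") -> str:
    """Join exon segments into a predicted transcript sequence.

    `exons` holds (start, end) pairs, 1-based and inclusive, given on the
    forward strand. For a gene on the reverse strand the joined sequence
    is reverse-complemented.
    """
    ordered = sorted(exons)
    spliced = "".join(genomic[start - 1:end] for start, end in ordered)
    if strand == "-":
        spliced = spliced.translate(COMPLEMENT)[::-1]
    return spliced


# Toy genomic sequence with two made-up exons.
genomic = "AAACCCGGGTTT"
print(splice_exons(genomic, [(1, 3), (7, 9)]))
print(splice_exons(genomic, [(1, 3), (7, 9)], strand="-"))
```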

This DB includes all predicted genes and proteins for the Human genome draft as well as genes and proteins predicted for other model organisms such as Drosophila and Arabidopsis.

We plan to make links between similar genes and connect the genes with known regulatory information in collaboration with the TRRD database developers from IC&G, Novosibirsk (Russia).

 Information by: Victor Solovyev


 


Direct questions or comments to Bioinformer Editor. This page last modified Friday, 16 July, 1999.
ISSN 1462-1363.
More information about the BioInformer.

(c) 1997-1999 EMBL-EBI. All Rights Reserved.