
A publication of EMBL - Outstation Hinxton, The European Bioinformatics Institute

Where are the EuroGeneIndex Flat Files?

Patricia Rodriguez-Tomé
EMBL Outstation - Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom

This is a common question from users discovering the EBI EuroGeneIndexes (EGI). So where are those files? Nowhere: they do not exist! To explain why, let us explore the EGI and the philosophy behind their development.

The EuroGeneIndexes are based on the JESAM package (1). The JESAM tools are a set of routines that first find alignments between sequences and then build clusters from those alignments. Because the clusters are built independently of the sequence alignments, different clustering algorithms can be applied to the same set of alignments. The clustering routine is a CORBA client of a CORBA alignment server, so the algorithms can be used independently of operating system, programming language and location. It also means that various clustering clients can use the same set of alignments to build their own sequence cluster databases, each reflecting the precise needs of its users.
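To make the separation between alignments and clustering concrete, here is a minimal Python sketch. It is not JESAM code and does not use CORBA: `InMemoryAlignmentServer`, the `Alignment` record, the two clustering functions and the score threshold are all hypothetical stand-ins, chosen only to show two different clustering clients reading the same set of alignments.

```python
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """One pairwise alignment result (hypothetical record layout)."""
    seq_a: str
    seq_b: str
    score: float


class InMemoryAlignmentServer:
    """Local stand-in for a remote alignment server; not the real EGI interface."""

    def __init__(self, alignments: Iterable[Alignment]):
        self._alignments = list(alignments)

    def alignments(self) -> list[Alignment]:
        return list(self._alignments)


def single_linkage(server, min_score: float) -> list[set[str]]:
    """Client 1: chain sequences together through any alignment >= min_score."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for aln in server.alignments():
        root_a, root_b = find(aln.seq_a), find(aln.seq_b)
        if aln.score >= min_score:
            parent[root_a] = root_b

    clusters: dict[str, set[str]] = {}
    for seq in list(parent):
        clusters.setdefault(find(seq), set()).add(seq)
    return sorted(clusters.values(), key=sorted)


def greedy_seed(server, min_score: float) -> list[set[str]]:
    """Client 2: each seed collects only its direct, still unassigned neighbours."""
    neighbours: dict[str, set[str]] = {}
    for aln in server.alignments():
        neighbours.setdefault(aln.seq_a, set())
        neighbours.setdefault(aln.seq_b, set())
        if aln.score >= min_score:
            neighbours[aln.seq_a].add(aln.seq_b)
            neighbours[aln.seq_b].add(aln.seq_a)

    assigned: set[str] = set()
    clusters: list[set[str]] = []
    for seed in sorted(neighbours):
        if seed in assigned:
            continue
        members = {seed} | (neighbours[seed] - assigned)
        assigned |= members
        clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Made-up alignments; identifiers and scores are purely illustrative.
    server = InMemoryAlignmentServer([
        Alignment("A", "B", 90.0),
        Alignment("B", "C", 85.0),
        Alignment("C", "D", 40.0),
    ])
    print(single_linkage(server, 80.0))
    print(greedy_seed(server, 80.0))
```

Both functions only ever call `server.alignments()`, so a new clustering strategy can be added without touching the alignment data. In this toy example the two clients produce different clusters from identical alignments, which is the point of keeping the two steps apart.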

So what do the EGI provide?

  • Access to all the information that served to build the alignments (including the version of the sequences used at that time);
  • All the alignment information (hits, alignments themselves, etc.);
  • The clusters formed by analysing these alignments.

The results are stored in two separate databases, alignmentDB and clusterDB, each of which can be accessed through a CORBA server at the EBI. The user can therefore review each step of the procedure by querying the CORBA servers for the information in each database.

So why not provide flat files? Why provide huge files in a format that is difficult to parse, and probably annoying to most users, when any user can run a client program written by the EBI (or even write their own specialised flat file client), extract just the part of the information they need, and transform it into the format most appropriate for their use?
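As an illustration of what such a "specialised flat file client" might look like, the following Python sketch selects a subset of cluster records and writes them in a format chosen by the user. The record layout, the `SAMPLE` data and all function names are assumptions made for this example; a real client would obtain its records from the EGI servers rather than from a local list.

```python
import io
from typing import Callable, Iterable, TextIO

# Hypothetical cluster record: {"name": ..., "species": ..., "members": [...]}
Cluster = dict


def export_clusters(clusters: Iterable[Cluster],
                    wanted: Callable[[Cluster], bool],
                    render: Callable[[Cluster], str],
                    out: TextIO) -> int:
    """Write the clusters accepted by `wanted`, formatted by `render`.

    Returns the number of clusters written.
    """
    written = 0
    for cluster in clusters:
        if wanted(cluster):
            out.write(render(cluster) + "\n")
            written += 1
    return written


def as_tab_delimited(cluster: Cluster) -> str:
    """One line per cluster: name, size, comma-separated members."""
    members = cluster["members"]
    return "\t".join([cluster["name"], str(len(members)), ",".join(members)])


def as_member_lines(cluster: Cluster) -> str:
    """One line per member: cluster name followed by the member identifier."""
    return "\n".join(f"{cluster['name']} {m}" for m in cluster["members"])


# Made-up records standing in for what a server query might return.
SAMPLE = [
    {"name": "cluster1", "species": "human", "members": ["seqA", "seqB", "seqC"]},
    {"name": "cluster2", "species": "mouse", "members": ["seqD"]},
    {"name": "cluster3", "species": "human", "members": ["seqE", "seqF"]},
]

if __name__ == "__main__":
    buffer = io.StringIO()
    export_clusters(SAMPLE, lambda c: c["species"] == "human",
                    as_tab_delimited, buffer)
    print(buffer.getvalue(), end="")
```

Swapping `as_tab_delimited` for `as_member_lines`, or changing the `wanted` filter, yields a different "flat file" from the same data without anyone having to agree on a single distribution format.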

The EGI web pages show various examples of CORBA clients to the EuroGeneIndexes CORBA servers. The home page itself uses the 'GetClusterStat' client to provide up-to-date statistics about EGI. The EGI statistics web pages use the same client to provide the full statistics for each species. On each species page, the cluster sizes are linked to another client ('GetClusterDef') that provides full information on the sequences in each cluster, accessing the EMEST CORBA server at the same time to provide up-to-date library information (2). The cluster frequencies are linked to the 'GetClusterName' client, which returns only the names of the clusters corresponding to that link. These CORBA clients are used again in other parts of the EGI web site to provide dynamic information about the alignments and clusters.
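The relationship between cluster sizes, cluster frequencies and cluster names on such a statistics page can be sketched in a few lines of Python. This is only an illustration of the idea, not the behaviour of 'GetClusterStat' or 'GetClusterName': the function name `size_table` and the input layout (cluster name mapped to a list of member identifiers) are assumptions.

```python
from collections import defaultdict


def size_table(clusters: dict[str, list[str]]) -> dict[int, list[str]]:
    """Group cluster names by cluster size.

    The frequency of a given size is the length of its name list, so one
    table can back both a "how many clusters of size N" figure and a
    "which clusters are they" listing.
    """
    by_size: dict[int, list[str]] = defaultdict(list)
    for name, members in clusters.items():
        by_size[len(members)].append(name)
    return {size: sorted(names) for size, names in sorted(by_size.items())}


if __name__ == "__main__":
    # Made-up clusters; names and members are purely illustrative.
    example = {
        "c1": ["seqA", "seqB"],
        "c2": ["seqC"],
        "c3": ["seqD", "seqE"],
    }
    for size, names in size_table(example).items():
        print(f"size {size}: frequency {len(names)}, clusters {names}")
```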

Other CORBA clients allow users to browse both databases graphically and to analyse the data. All the CORBA clients written at the EBI are freely distributed. The source code is provided and can be used as a starting point for writing new CORBA clients better suited to a user's specialised needs.

Article by: Patricia Rodriguez-Tomé




Direct questions or comments to Bioinformer Editor. This page last modified Monday, 07 August, 2000.
ISSN 1462-1363.
More information about the BioInformer.

© 1997-2000 EMBL-EBI. All Rights Reserved.