A publication of EMBL - Outstation Hinxton, The European Bioinformatics Institute


GenomeBuilder

Juha Muilu, Patricia Rodriguez-Tomé and Alan Robinson
EMBL Outstation - Hinxton, the European Bioinformatics Institute,
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Introduction

The EST databases hold a potential wealth of information about expressed genes, for example their splice variants and expression profiles. However, collections of EST sequences are highly redundant and often of low quality. Handling the data is not trivial either: the context of an EST sequence is hard to visualise, users have differing needs, and information must be gathered from multiple sources and combined with other results in many ways.

Since 1996, the EBI has been using CORBA (Common Object Request Broker Architecture) to address some of the current IT problems in bioinformatics. CORBA provides a standard means of connecting computational resources on different machines over the Internet. Applications can access these resources if they implement the programming interface to a CORBA server, which is specified in the Interface Definition Language (IDL). CORBA is well suited to scientific visualisation and analysis applications in which integrating multiple databases and legacy applications is important.

The GenomeBuilder is a Java tool designed to visualise and further process EST and high-throughput genomic sequence assemblies. For example, a user can read in an EST cluster that is based on pair-wise comparison and then build a multiple alignment and a consensus contig from the cluster using one or more external programs. The user can then visualise similarities between the assembled sequences and fetch additional property information for the sequences from other data sources. The property information can be shown as plain text or as colour-coded text.
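
To make the idea of a consensus concrete, the sketch below derives a simple majority-vote consensus from a cluster that has already been aligned. It is an illustration only: in the GenomeBuilder the alignment and consensus are produced by an external program, and the class name `MajorityConsensus`, the `-` pad character and the requirement that all rows have equal length are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: majority-vote consensus over equal-length, padded rows. */
final class MajorityConsensus {

    /** Pad character assumed for this sketch. */
    static final char PAD = '-';

    static String consensus(List<String> rows) {
        if (rows.isEmpty()) {
            return "";
        }
        int width = rows.get(0).length();
        StringBuilder out = new StringBuilder(width);
        for (int col = 0; col < width; col++) {
            int[] counts = new int[128];   // one counter per ASCII character
            char best = PAD;               // stays a pad if no row covers this column
            int bestCount = 0;
            for (String row : rows) {
                char c = Character.toUpperCase(row.charAt(col));
                if (c == PAD || c >= 128) {
                    continue;              // skip pads and non-ASCII symbols
                }
                counts[c]++;
                if (counts[c] > bestCount) {   // on a tie, the base that reached the count first wins
                    bestCount = counts[c];
                    best = c;
                }
            }
            out.append(best);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("ACGT-", "ACCT-", "-CGTA");
        System.out.println(consensus(rows));
    }
}
```

A real assembler weighs base quality and handles ambiguous positions; the point here is only that each consensus column summarises the aligned column beneath it.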

The tool uses CORBA to integrate and access the different databases and computing resources. Here is a brief summary of basic features:

  • Properties of the sequences can be colour-coded or shown as text on the sequences.
  • Regions of similarity between the sequences can be highlighted.
  • Linked windows allow an overall view of the assembly to be maintained in one window while working at a higher resolution in another.
  • Sequences in an assembly can be edited and moved.

Currently the sequence and additional property information can be obtained from the EST cluster [1], EST [2], Radiation Hybrid [3] and Radiation Hybrid allocation [4] databases. New databases can be added dynamically to the GenomeBuilder if they provide an IDL and a CORBA server.

Within the GenomeBuilder, the AppLab [5] program is used to generate the client-side and server-side components for external command-line applications. At present, two applications have been added: CAP3 [6] for building multiple alignments and CLEANUP [7] for removing redundant sequences.

Application overview

The main window is shown in Figure 1. The window has menus and scroll bars for zooming (topmost) and scrolling. It is possible to open another window (Figure 2), which can be used to view a part of the main window. The canvas of the new window is shown as a rectangle on the main window. The user can move the rectangle with the mouse, and the new window is updated accordingly.
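
The relationship between the rectangle and the linked window can be sketched as a coordinate conversion. The example assumes that the main window is drawn at a fixed scale in pixels per base and that the rectangle is described by its left edge and width in pixels. The class and method names are hypothetical and are not taken from the GenomeBuilder source.

```java
/** Illustrative only: maps the rectangle on the main window to the bases shown in the linked window. */
final class LinkedViewSketch {

    /** Returns {firstBase, lastBase}, zero-based and inclusive, covered by the rectangle. */
    static long[] visibleBases(int rectX, int rectWidth, double pixelsPerBase) {
        if (rectWidth <= 0 || pixelsPerBase <= 0) {
            throw new IllegalArgumentException("width and scale must be positive");
        }
        long first = (long) Math.floor(rectX / pixelsPerBase);
        long last = (long) Math.ceil((rectX + rectWidth) / pixelsPerBase) - 1;
        return new long[] {first, last};
    }

    /** Scale at which the given base range exactly fills a linked window of the given pixel width. */
    static double linkedPixelsPerBase(int linkedWidth, long firstBase, long lastBase) {
        return linkedWidth / (double) (lastBase - firstBase + 1);
    }

    public static void main(String[] args) {
        long[] range = visibleBases(100, 50, 0.5);
        System.out.println(range[0] + ".." + range[1]
                + " at " + linkedPixelsPerBase(800, range[0], range[1]) + " px/base");
    }
}
```

Moving the rectangle only changes `rectX`, so the linked window can be refreshed by recomputing the base range and redrawing.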

Figure 1 - Main window. The rectangle indicates the area of the window shown in Figure 2.
 

Figure 2 - A linked window, which is used to view part of the main window.

EST and mRNA sequence clusters can be read from the EST cluster database [1] using the browser interface shown in Figure 3. Once a cluster has been read in, it is possible to retrieve additional annotations from databases and/or submit the data to external analysis tools. Figure 4 shows a multiple alignment calculated using the CAP3 program [6]. In the figure, sequences are colour coded according to clone library, and sequences derived from the same clone are positioned on the same line and connected by a thin line. The library and clone information was retrieved from the EST database.
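
Placing sequences from the same clone on one line amounts to grouping reads by clone identifier. The following is a minimal sketch of that grouping, assuming each read carries an accession and a clone identifier. The `Read` type and the first-come row numbering are hypothetical, and the sketch does not check whether reads sharing a row overlap.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: gives every read of the same clone the same display row. */
final class CloneRowSketch {

    /** Hypothetical minimal record of a read and the clone it came from. */
    static final class Read {
        final String accession;
        final String cloneId;

        Read(String accession, String cloneId) {
            this.accession = accession;
            this.cloneId = cloneId;
        }
    }

    /** Maps accession to row index; clones get rows in order of first appearance. */
    static Map<String, Integer> assignRows(List<Read> reads) {
        Map<String, Integer> rowOfClone = new LinkedHashMap<>();
        Map<String, Integer> rowOfAccession = new LinkedHashMap<>();
        for (Read read : reads) {
            Integer row = rowOfClone.get(read.cloneId);
            if (row == null) {
                row = rowOfClone.size();          // next free row
                rowOfClone.put(read.cloneId, row);
            }
            rowOfAccession.put(read.accession, row);
        }
        return rowOfAccession;
    }

    public static void main(String[] args) {
        List<Read> reads = Arrays.asList(
                new Read("read5prime", "cloneA"),
                new Read("other", "cloneB"),
                new Read("read3prime", "cloneA"));
        System.out.println(assignRows(reads));
    }
}
```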

Figure 3 - Cluster database browser.
 

Figure 4 - Sequence assembly made using CAP3. Sequences are colour coded according to clone library. Sequences derived from the same clone are connected.

Figure 5 shows the graphical user interface to the CAP3 program. The interface has commands for starting and cancelling execution of the application (Action menu). It is also possible to set the command-line parameters of the program (Options menu), as shown in Figure 6. These AppLab components are composed using information obtained from the AppLab server.

Figure 5 - Graphical user interface of the AppLab client-side component. The component provides an interface to the CAP3 program.
 

Figure 6 - The AppLab client component for changing the command line parameters.

Implementation

The program is written in Java and the CORBA interface is implemented using the ORBacus ORB. The program should work on all platforms that have JDK 1.1.7 or a higher version.

In this version (1.0.0), information about the existing databases is read from a configuration file and then used in the database browser and the user interface menus. New databases can be added to the configuration file if they have a CORBA server that implements the same IDLs. Currently, interfaces exist for the EST cluster, EST, RHdb and RHalloc databases.
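
The article does not describe the layout of the configuration file. As a hedged illustration of the idea, the sketch below assumes a Java properties file with a hypothetical `databases` key that lists the database names. Both the key and the class name are inventions for this example, not the GenomeBuilder's actual format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Illustrative only: reads database names from an assumed properties-style configuration. */
final class DatabaseListSketch {

    /** Returns the names listed under the hypothetical "databases" key, in file order. */
    static List<String> databaseNames(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> names = new ArrayList<>();
        for (String raw : props.getProperty("databases", "").split(",")) {
            String name = raw.trim();
            if (!name.isEmpty()) {
                names.add(name);   // each name would become a browser and menu entry
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String config = "databases = ESTcluster, EST, RHdb, RHalloc\n";
        System.out.println(databaseNames(config));
    }
}
```

With a layout like this, adding a database means adding a name to the list (plus whatever connection details its CORBA server needs), and the menus follow from the parsed names.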

The AppLab wrapper is used to access external command-line applications. It provides tools for generating the server code for an application from a GCG-compatible description file, together with JavaBean components for the client. The client components take care of program execution and communication with the server.

The AppLab client implementation is independent of the server-side implementation. However, to use the external applications, one needs to specify a data format that the applications can understand. In the GenomeBuilder this is a simple FASTA format in which the positional information of each padded sequence is stored in the header along with the accession number. The FASTA data stream is then wrapped in an AppLab data event. New data formats can be added using the MIME type specification provided by the data event.
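
A FASTA record that carries positional information in its header could look like the one produced below. The exact header layout used by the GenomeBuilder is not given in the article, so the `offset=` field, the `-` pad character and the class name are assumptions for illustration.

```java
/** Illustrative only: FASTA records whose header holds an accession and an assumed offset field. */
final class PaddedFastaSketch {

    /** Writes one record: ">ACCESSION offset=N" followed by the padded sequence. */
    static String format(String accession, int offset, String paddedSequence) {
        return ">" + accession + " offset=" + offset + "\n" + paddedSequence + "\n";
    }

    /** Returns the accession, taken as the first field of the header line. */
    static String accessionOf(String headerLine) {
        return fieldsOf(headerLine)[0];
    }

    /** Returns the value of the assumed "offset=" field of the header line. */
    static int offsetOf(String headerLine) {
        for (String field : fieldsOf(headerLine)) {
            if (field.startsWith("offset=")) {
                return Integer.parseInt(field.substring("offset=".length()));
            }
        }
        throw new IllegalArgumentException("no offset field in header: " + headerLine);
    }

    private static String[] fieldsOf(String headerLine) {
        if (!headerLine.startsWith(">")) {
            throw new IllegalArgumentException("not a FASTA header: " + headerLine);
        }
        return headerLine.substring(1).trim().split("\\s+");
    }

    public static void main(String[] args) {
        String record = format("AA000001", 12, "ACG-TTA");
        System.out.print(record);
        String header = record.substring(0, record.indexOf('\n'));
        System.out.println(accessionOf(header) + " starts at " + offsetOf(header));
    }
}
```

Keeping the position in the header lets an external program that reads plain FASTA ignore it, while the GenomeBuilder can restore the placement of each sequence when the result comes back.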

New applications can be added on the server if they use a known data format. At startup, the GenomeBuilder asks the AppLab server which applications are available and builds the user interface menus accordingly.
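
The startup step can be pictured as follows. This sketch does not use the real AppLab API: `ApplicationDirectory` is a hypothetical stand-in for the server query, and the menu is modelled as an ordered map from label to action rather than as actual GUI components.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical stand-in for asking the server which applications it offers. */
interface ApplicationDirectory {
    List<String> availableApplications();
}

/** Illustrative only: builds one menu entry per application reported by the server. */
final class AppMenuSketch {

    /** Returns menu label mapped to the action to run, in the order reported by the directory. */
    static Map<String, Runnable> buildMenu(ApplicationDirectory directory, Consumer<String> launcher) {
        Map<String, Runnable> menu = new LinkedHashMap<>();
        for (String name : directory.availableApplications()) {
            menu.put(name, () -> launcher.accept(name));   // selecting the entry launches that application
        }
        return menu;
    }

    public static void main(String[] args) {
        ApplicationDirectory directory = () -> Arrays.asList("CAP3", "CLEANUP");
        Map<String, Runnable> menu = buildMenu(directory, name -> System.out.println("launch " + name));
        menu.get("CAP3").run();
    }
}
```

Because the client never hard-codes the application list, an application added on the server appears in the menus the next time the client starts.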

Availability

The GenomeBuilder program is freely available. More information can be found at http://industry.ebi.ac.uk/~muilu/GBuilder.

Article by: Juha Muilu


 


Direct questions or comments to Bioinformer Editor. This page last modified Monday, 24 July, 2000.
ISSN 1462-1363.
More information about the BioInformer.

© 1997-2000 EMBL-EBI. All Rights Reserved.