Data Mining and Visualisation in Bioinformatics
Since March 1997, EMBL-EBI and Quadstone Ltd have been using the data mining software tool DecisionHouse® to solve practical problems in bioinformatics. This work is an associated activity within the BIOTITAN Technology Transfer Node, which is co-ordinated by EMBL-EBI (Esprit project 24718: BIOVIS). It benefits from the active participation of SmithKline Beecham, British Biotech Pharmaceuticals Ltd, Zeneca Pharmaceuticals, Pfizer Ltd, and the European Chemistry Technology Centre of Silicon Graphics Inc, which comprise an end-user forum to steer the project and ensure its industrial relevance.
Effort has concentrated mainly on applying DecisionHouse to several different data sets selected by the end-user forum. The specific tasks have included:
- Data mining in the P53 mutation database, visualising the database and finding correlations between different entities;
- Visualisation of all-against-all FASTA comparison results for complete genomes (H. influenzae and S. cerevisiae) and of the data derived from such comparisons;
- Decision-tree based analysis and prediction of gene splice sites for selected organisms;
- Prediction of the sub-cellular location of proteins from their amino-acid composition;
- Gene expression data visualisation and analysis on a genomic scale (S. cerevisiae).
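As an illustration of the kind of input the sub-cellular location task listed above relies on, the following minimal Python sketch computes an amino-acid composition vector for a single protein sequence. It is not the project's method and has no connection to DecisionHouse: the function name `amino_acid_composition`, the choice to ignore non-standard residue codes, and the toy sequence are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> Dict[str, float]:
    """Return the fraction of each standard amino acid in a protein sequence.

    Assumption for this sketch: characters outside the 20 standard
    one-letter codes (for example X or gap symbols) are simply ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Every amino acid gets an entry, so all proteins share the same 20 features.
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any data set used in the project.
    composition = amino_acid_composition("MKKLLAAGG")
    for aa, fraction in composition.items():
        if fraction > 0:
            print(f"{aa}: {fraction:.3f}")
```

Each protein is reduced to the same 20 numbers regardless of its length, which is what makes composition a convenient input for a classifier such as a decision tree.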
The advantages of DecisionHouse have been demonstrated in all the selected areas. It has proved useful for all stages of knowledge discovery: data selection, transformation, cleaning, mining and visualisation. In the case of P53, DecisionHouse enabled non-experts to quickly and easily discover and visualise knowledge that had earlier been known only to experts. In other cases, such as gene splice site analysis and gene expression data analysis, DecisionHouse helped to refine existing knowledge or to discover new knowledge in these fields.
Throughout the project, pre-existing and custom-designed software tools were used together with DecisionHouse. A clear strength of DecisionHouse was that it provided a single integrated tool supporting all stages of knowledge discovery.
The results of this project will be presented at the workshop "Data Mining and Bioinformatics", held at EMBL-EBI on March 17-18, 1998. Representatives from the pharmaceutical and biotechnology industries are especially encouraged to attend. Contact the authors for details.
Written by: Alvis Brazma, David Starks-Browning
Resources and further information
European Bioinformatics Institute http://www.ebi.ac.uk/
BIOTITAN Technology Transfer Node http://industry.ebi.ac.uk/BioTitan/
BIOVIS Project http://industry.ebi.ac.uk/BioTitan/activities/biovis.html
Quadstone Ltd http://www.quadstone.com/
DecisionHouse® http://www.quadstone.com/dh/
External sites are not endorsed by EMBL-EBI