News from the Macromolecular Structure Database (MSD) Group

Kim Henrick, Peter Keller, John Ionides, John Irwin and Geoff Barton
EMBL Outstation - Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom

The EBI Macromolecular Structure Database is the European project for the collection, management and distribution of data about the three-dimensional structure of macromolecules. The project is the European counterpart of the RCSB-PDB in the USA and is responsible for the management of data submitted to the Protein Data Bank (PDB) in Europe. The goals of the EBI-MSD are to provide high-quality services for the deposition, organisation, integration and search of data about three-dimensional structures. Work is in progress in all of these areas and some highlights are summarised in the following sections.

517 Submissions to PDB at EBI in 1999

EBI-MSD started accepting depositions to the Protein Data Bank at the beginning of January 1998. Since then, over 1,000 structures have been deposited at EBI using the AutoDep deposition system. Until 15th June 1999 all structures submitted at EBI were forwarded to the USA for further processing and follow-up with authors. Since 15th June, in order to provide a faster and more complete service at EBI, structures submitted at EBI have had all follow-up done by EBI staff. A PDB ID code is issued by EBI as soon as the deposition with AutoDep is finished. The final PDB entry is now only forwarded to the RCSB when the release date for the structure is reached.
If a structure is submitted during the normal working week, initial comments on the submission are usually returned within 24 hours. Once the author has approved the submission for release, it is added to the PDB format file archive the following Wednesday. At the moment, an average of 21% of all structures submitted to the PDB are submitted at EBI.

Relational database of macromolecular structure at EBI

A major project of the EBI-MSD group over the last three years has been the development of a new data model for the representation of the three-dimensional structures of proteins and other macromolecules. The new model also allows for the storage of extensive information about the experimental methods used to determine the structure. The model has been designed to minimise the possibility of inconsistent data being entered into the database. The relational database system that has been developed based on this model forms the core of new services that will be provided by EBI-MSD. It also provides the database for storage of the results of the extensive 'clean-up' of the PDB data that has been performed at EBI.
The first service to be made available on the new database system is a new deposition system that can fully exploit the concept of 'data harvesting', or 'direct submission', from the software used to determine structures. This will be simpler for the depositor to use and will allow more information to be captured more accurately than is possible with current systems.
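As an illustration of how a relational model can guard against inconsistent data, the following minimal sketch uses Python's built-in sqlite3 module. The table names, column names and constraints are hypothetical and are not taken from the EBI-MSD data model; the sketch only shows the general technique of letting the database reject records that break declared rules.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a toy, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE entry (
            entry_id TEXT PRIMARY KEY CHECK (length(entry_id) = 4),
            method   TEXT NOT NULL CHECK (method IN ('X-RAY', 'NMR'))
        );
        CREATE TABLE atom (
            atom_id  INTEGER PRIMARY KEY,
            entry_id TEXT NOT NULL REFERENCES entry(entry_id),
            name     TEXT NOT NULL,
            x REAL NOT NULL,
            y REAL NOT NULL,
            z REAL NOT NULL
        );
        """
    )
    return conn


def try_insert(conn, sql, params):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_demo_db()
    atom_sql = "INSERT INTO atom (entry_id, name, x, y, z) VALUES (?, ?, ?, ?, ?)"
    print(try_insert(db, "INSERT INTO entry VALUES (?, ?)", ("1abc", "X-RAY")))
    # An atom that refers to an entry that does not exist is refused.
    print(try_insert(db, atom_sql, ("9zzz", "CA", 0.0, 0.0, 0.0)))
```

In this toy schema an atom cannot be stored unless its parent entry exists, and an entry cannot be stored with an unrecognised method, so such inconsistencies are caught at load time rather than discovered later.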

Search database of macromolecular structure

The core relational database system has been designed with the integrity of the data in mind rather than speed of access. For search services a 'data warehouse' is under development and will be populated from the core database. The warehouse will allow simpler access to the data currently contained in the PDB flat-files as well as access to additional data that are currently difficult to represent in a PDB file. Work is in progress to allow distribution of remote copies and updates to this warehouse. If you are interested in being an early user and tester of this new technology, then please email Geoff Barton.

New Hardware for EBI-MSD

In August, new, dedicated computer hardware was installed to support the AutoDep deposition service and current search services based on the 3DB Browser. This dramatically improved the interactive response of AutoDep, which had previously been running on a shared machine. Further new hardware was also acquired to support the new database development work and deposition system.

Clean-up continues

As loading of the new core database began, further clean-up of the PDB files was necessary. This has included improving the uniformity of ligand and atom naming as well as many other smaller issues.

Development of data exchange protocols with RCSB

With two database centres (RCSB and EBI) now managing the deposition to and archive of the data in the PDB, an effective method of transferring data between the two sites is necessary. At present this is done by exchange of PDB format flat-files and ancillary data. However, the core database systems at EBI and at RCSB both capture more information than can readily be expressed in a PDB format flat-file. For this reason, a more sophisticated dictionary-driven data exchange mechanism based on the mmCIF data format is under development. Work at EBI has concentrated on mapping between the EBI data model and mmCIF so that loss-free transmission can occur. Once in place and tested, the data exchange mechanism will be used for weekly exchange of data as well as to allow the exchange of data about the clean-up of legacy PDB format files. In the future, this exchange mechanism could be extended to allow an alternative to the data warehouse for access by other groups to the rich data representation offered by the mmCIF files.
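The difference between a fixed-column flat-file record and a dictionary-driven representation can be sketched as follows. This is a minimal illustration only, not the EBI or RCSB exchange mechanism: the function names and the sample atom are invented for the example, while the column positions follow the PDB format ATOM record and the item names are taken from the mmCIF atom_site category.

```python
# A PDB ATOM record is positional: each value is identified only by its columns.
SAMPLE = (
    "ATOM  " + "    2" + " " + " CA " + " " + "ALA" + " " + "A" + "   1"
    + "    " + "  11.104" + "   6.134" + "  -6.504"
)

# (mmCIF item name, start, end) as 0-based Python slice bounds into the record.
FIELDS = [
    ("_atom_site.id", 6, 11),
    ("_atom_site.label_atom_id", 12, 16),
    ("_atom_site.label_comp_id", 17, 20),
    ("_atom_site.auth_asym_id", 21, 22),
    ("_atom_site.auth_seq_id", 22, 26),
    ("_atom_site.Cartn_x", 30, 38),
    ("_atom_site.Cartn_y", 38, 46),
    ("_atom_site.Cartn_z", 46, 54),
]


def parse_atom_record(line):
    """Map one fixed-column ATOM record onto named data items.

    Values are kept as strings so that no precision is lost in transit.
    """
    return {tag: line[start:end].strip() for tag, start, end in FIELDS}


def to_mmcif_loop(records):
    """Write the records as an mmCIF-style loop_ block of tagged columns."""
    tags = [tag for tag, _, _ in FIELDS]
    rows = [" ".join(rec[tag] for tag in tags) for rec in records]
    return "\n".join(["loop_"] + tags + rows)


if __name__ == "__main__":
    print(to_mmcif_loop([parse_atom_record(SAMPLE)]))
```

Because every value in the loop is labelled by a named item, further items can be added to the exchange without disturbing existing ones, which is not possible in a fixed 80-column record.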

CCP4 - Support for position at EBI-MSD

The UK research councils' Collaborative Computational Project 4 (CCP4) focuses on the development of software tools for X-ray crystallography. The CCP4 package is used worldwide to solve structures by X-ray methods. The EBI-MSD has had a long-standing collaboration with CCP4 on many issues of data processing and representation. This relationship has recently been reinforced by new CCP4 funding for a position in the EBI-MSD group to develop database interfaces and formats for the CCP4 package.

CCPN - EBI-MSD and the new CCP for NMR spectroscopy

The UK research council BBSRC recently awarded a grant to the Department of Biochemistry at the University of Cambridge to establish a Collaborative Computational Project for NMR analogous to CCP4 for X-ray crystallography. Co-applicants on this grant are Ernest Laue and Andy Raine from Cambridge and John Ionides of the EBI-MSD group. The goal of CCPN is to promote the development of software standards and data formats in NMR structure determination. An inaugural conference organised at EBI in February 2000 brought together database software specialists, NMR computing researchers and equipment manufacturers to discuss data exchange formats and data harvesting. A further meeting is scheduled for May 2000 to follow up on the discussions held at EBI.

