The EMBL Nucleotide Sequence Database: major new developments

Authors: Rasko Leinonen, Rodrigo Lopez, Tamara Kulikova, Renato Mancuso, Carola Kanz, Maria Garcia-Pastor, Vincent Lombard, Francesco Nardone, Robert Vaughan, Quan Lin, Katerina Tzouvara, Wendy Baker, Mary Ann Tuli, Peter Stoehr, Guenter Stoesser, Alexandra van den Broek
Contributors: Bioinformatique, phylogénie et génomique évolutive (BPGE); Département PEGASE [LBBE] (PEGASE); Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE); Université Claude Bernard Lyon 1 (UCBL); Université de Lyon; Institut National de Recherche en Informatique et en Automatique (Inria); VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS); Centre National de la Recherche Scientifique (CNRS)
Year of publication: 2003
Subject:
Source: Nucleic Acids Research
Nucleic Acids Research, Oxford University Press, 2003, 31, pp.17-22
ISSN: 1362-4962
0305-1048
Description: The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) incorporates, organizes and distributes nucleotide sequences from all available public sources. The database is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis to achieve optimal synchronization. Webin is the preferred web-based submission system for individual submitters, while automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases plus many other specialized molecular biology databases. For sequence similarity searching, a variety of tools (e.g. Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
INTRODUCTION
The European Bioinformatics Institute (EBI), an Outstation of the European Molecular Biology Laboratory (EMBL) in Heidelberg (Germany), is located on the Wellcome Trust Genome Campus near Cambridge (UK), together with the Sanger Institute and the Human Genome Mapping Resource Centre (HGMP-RC). Building, maintaining and providing biological databases and information services to support data deposition and data exploitation are the main missions of the Service Programme of the EBI (1). Databases operated at the EBI include the EMBL Nucleotide Sequence Database (also known as EMBL-Bank), the protein databases SWISS-PROT & TrEMBL (2), InterPro (3), the Macromolecular Structure Database (E-MSD) (4), ArrayExpress for gene expression data (5), ENSEMBL (6) for automatic genome annotation plus several other databases, many of which are produced in collaboration with external groups.
In Europe, the vast majority of all nucleotide sequence data generated and published are collected, organized and distributed by the EMBL Nucleotide Sequence Database, the European member of the tripartite International Nucleotide Sequence Database Collaboration DDBJ/EMBL/GenBank, managing sequence data worldwide since 1982. Main sources of data are large-scale genome sequencing projects, direct submissions by individual scientists plus sequence data extracted from BIOTECH patent applications to the European Patent Office. To achieve optimal synchronization, all new and updated database records are exchanged on a daily basis between EMBL, DDBJ (Japan) (7) and GenBank (USA) (8).
Within a 12-month period the database size has increased from about 12.9 million entries comprising 13.8 Gigabases (Release 68, September 2001) to 18.3 million entries and over 23 Gigabases (Release 72, September 2002). The database growth has been a direct consequence of ongoing collaborations with sequencing projects like the Mouse Genome Sequencing Consortium (MGSC), the International Anopheles Genome Project and a growing number of other genome sequencing groups producing large quantities of new sequence data. During the same period the number of organisms represented in the database has risen by 25% to over 100,000 species.
Major new developments during 2002 (described in detail below) include the creation of the CON(struct) or CON(tig)
Database: OpenAIRE