Technical comment to 'Database verification studies of SWISS-PROT and GenBank' by Karp et al

Author: Rolf Apweiler, Vivien Junker, Paul J. Kersey, Amos Marc Bairoch
Subject:
Source: Scopus-Elsevier
Bioinformatics, Vol. 17, No 6 (2001) pp. 533-534
ISSN: 1367-4803
Description: In their paper “Database verification studies of SWISS-PROT and GenBank” Karp et al. (2001) conclude: (1) “SWISS-PROT is more incomplete than we expected...”; (2) “Even if we combine SWISS-PROT and TrEMBL, some sequences from the full genomes are missing from the combined dataset”; (3) “In many cases, translated GenBank genes do not exactly match the corresponding SWISS-PROT sequences, ...”; and (4) “...that SWISS-PROT does not identify a significant number of experimentally characterized proteins”. These results, and the approach used to arrive at these results, are in our opinion somewhat misleading. Herein, we only focus on four major points. First, there has never been a claim that SWISS-PROT is comprehensive. Thus, it is surprising that Karp et al. found that “SWISS-PROT is more incomplete than we expected...”. To make sequences available as quickly as possible without diluting the quality of SWISS-PROT, the supplemental database TrEMBL was introduced in 1996 and contains the translation of all coding sequences (CDS) in the DDBJ/EMBL/GenBank nucleotide sequence database, except those already included in SWISS-PROT. Snapshots of the SWISS-PROT, TrEMBL and TrEMBLnew databases are released weekly, synchronised with the DDBJ/EMBL/GenBank nucleotide sequence database and provide comprehensive coverage (ftp://ftp.ebi.ac.uk/pub/databases/sp_tr_nrdb/). The weekly comprehensive SWISS-PROT/TrEMBL nonredundant database (SPTR) has been widely publicised on the EBI and ExPASy web-servers and in various publications (e.g. Apweiler, 2000). Second, the authors’ assertions that “Even if we combine SWISS-PROT and TrEMBL, some sequences from the full genomes are missing from the combined dataset.” and “SWISS-PROT curators apparently chose not to replace existing SWISS-PROT sequences with sequences from complete-genome projects” are rather inaccurate. Karp et al. tried to establish corresponding sets of SWISS-PROT/TrEMBL proteins and
Database: OpenAIRE