A General Data Format for Summarizing Taxonomic Information

Authors: Paul B. Hamel, James A. Peters, Larry E. Morse
Year of publication: 1971
Subject:
Source: BioScience. 21:174-180
ISSN: 1525-3244, 0006-3568
DOI: 10.2307/1295764
Description: The increasing use of computers and information systems in systematic biology now makes desirable the development of a general data format for summarizing and communicating taxonomic information. Such a format should be acceptable to the biologist and to the computer. It must be compatible with the major programming languages and with commonly used equipment including card-readers and teletypes. The early use of computers by biologists for numerical taxonomy or phenetic clustering is well known, but in recent years a number of other exciting computer methods have been developed, many of which are noted by Ledley (1965), Crovello and MacDonald (1970), and Furlow et al. (in press). General information storage and retrieval systems for biological data are also being studied, such as the one being developed for the Flora North America Program (Shetler, in press). Although not computerized, other imaginative methods for publishing information have also appeared in recent years; several are mentioned by Leenhouts (1966) and Morse (in press). Modern information-processing technology allows convenient data exchange between researchers as well as the usual processing of one's own data, as discussed by Peters and Collette (1968). In some commercially available systems, a central computer serves the entire nation through a network of data-communication links, and a user can permit others access to his data and programs. These systems allow development of a centralized taxonomic data bank of data files prepared by the various workers in a project and released by their authors for general use. For effective utilization of such an information network, standardized machine-independent data formats are necessary. Preliminary data-sharing experiments between Michigan State University and the Smithsonian Institution and between the Smithsonian and the American Museum of Natural History have already shown the potential convenience and practicality of such a system.
For those who would choose to use it, a standardized data format would offer several advantages over independent and individual preparation of taxonomic data for different computers and information systems. The standard format would make one investigator's data immediately available and comprehensible to all others using the same format. It would also make any information in the proper format available for one's own use without local modifications. Furthermore, a data standard facilitates development of packages or libraries of programs for processing any data so encoded. Once written and tested, these routines could be used easily by others who had no knowledge of computer programming itself. Without the standard, they would have to modify the programs to accept their own data, or reformat the data for the programs, no easy task for a computer programmer, much less the average taxonomist. Early versions of the data format described here were, in fact, developed for use with program packages being prepared by the authors, and have also been used by others for numerical phenetics and for phyletic studies. A standard data format can also provide a basis for theoretical models of taxon and character concepts for various studies, as noted by Heywood (1968). Possibly ideas such as "unit character" and "polythetic taxon" could be expressed in terms of the data format, or vice versa, providing a common base for discussion of various algorithms and procedures. New theoretical results, in turn, could be incorporated rapidly into existing programs if explicable in terms of the data base involved. Finally, standardization of the taxonomic data format for computer use would offer theoreticians an extensive base of biologically acceptable test data for checking new hypotheses.
More complex formats, of course, may be needed for general taxonomic information-retrieval, and we fully recognize the need for a "background matrix" of detailed information to support the data summary presented in the matrix format.
Database: OpenAIRE