Topics on the standardisation of chloroplast DNA sequence data

Authors: Turudić, Ante; Liber, Zlatko; Grdiša, Martina; Jakše, Jernej; Varga, Filip; Šatović, Zlatko
Contributors: Goreta Ban, Smiljana; Šatović, Zlatko
Language: English
Year of publication: 2022
Subject:
Description: The number of DNA sequences available in public genetic databases is constantly increasing. In plants, this is particularly evident in the number of complete chloroplast genomes, which are widely used in phylogenetic research. Chloroplast genomes are circular, and most have a four-part structure produced by two copies of a large inverted repeat (IR). We investigated inconsistencies in publicly available chloroplast genome sequence data, focusing on how the stored records account for this structure. Our results show that the storage of chloroplast genome sequences is not standardized with respect to the inverted repeat structure: sequences are stored in different orders. Furthermore, many sequences in the public data lack annotated inverted repeats, although these repeats are expected to be present. In reviewing specialized chloroplast annotation tools, we found no uniform method for identifying inverted repeats; each tool analyzed takes a different approach and handles different special cases. These results show a need for standardized formats for storing data of specific types, such as chloroplast sequences. They also suggest that existing public chloroplast data should be revised with respect to both storage format and missing annotations. Beyond the stored data, specialized chloroplast annotation tools need improvement in their detection of inverted repeats.
Database: OpenAIRE
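
Illustration: the description above turns on locating the two inverted repeat copies in a stored sequence and on the order in which the four parts appear. The following is a minimal Python sketch of one naive way to do that, not the method used by the authors or by any of the annotation tools they reviewed. It assumes an unambiguous uppercase A/C/G/T sequence, treats the stored record as linear (a repeat that wraps around the record's start is not handled), and uses hypothetical function names and a hypothetical seed length `k`. The labels LSC and SSC follow the common convention of calling the longer single-copy region "large" and the shorter one "small".

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_complement(x: str, y: str) -> bool:
    """True if base y is the Watson-Crick complement of base x."""
    return x.translate(COMPLEMENT) == y


def longest_inverted_repeat(seq: str, k: int = 8):
    """Seed-and-extend search for the longest inverted repeat.

    Returns (a_start, a_end, b_start, b_end), 0-based and end-exclusive,
    where seq[b_start:b_end] is the reverse complement of
    seq[a_start:a_end], or None if no seed of length k is found.
    """
    # Index every k-mer by its start positions.
    index = {}
    for pos in range(len(seq) - k + 1):
        index.setdefault(seq[pos:pos + k], []).append(pos)

    best = None
    for i in range(len(seq) - k + 1):
        for j in index.get(reverse_complement(seq[i:i + k]), []):
            if j < i + k:
                continue  # second copy must lie after the seed, no overlap
            # Position i pairs with j + k - 1; extend the outer ends.
            lo, hi = i, j + k - 1
            while lo > 0 and hi < len(seq) - 1 and is_complement(seq[lo - 1], seq[hi + 1]):
                lo -= 1
                hi += 1
            # Position i + k - 1 pairs with j; extend the inner ends.
            in_a, in_b = i + k - 1, j
            while in_a + 1 < in_b - 1 and is_complement(seq[in_a + 1], seq[in_b - 1]):
                in_a += 1
                in_b -= 1
            if best is None or (in_a + 1 - lo) > (best[1] - best[0]):
                best = (lo, in_a + 1, in_b, hi + 1)
    return best


def single_copy_layout(seq_len: int, ir):
    """Describe where the two single-copy regions sit in the stored record."""
    a_start, a_end, b_start, b_end = ir
    between = b_start - a_end               # region between the IR copies
    outside = (seq_len - b_end) + a_start   # region wrapping around the record ends
    return {
        "between_copies": "LSC" if between > outside else "SSC",
        "around_record_ends": "SSC" if between > outside else "LSC",
    }


# Toy record: 10 bp single-copy region, 12 bp IR, 6 bp single-copy region, IR copy.
toy_ir = "ACGGTCATGCAG"
toy = "A" * 10 + toy_ir + "C" * 6 + reverse_complement(toy_ir)
found = longest_inverted_repeat(toy)
layout = single_copy_layout(len(toy), found)
```

On real chloroplast genomes a much longer seed and a proper alignment step would be needed, and records that start inside a repeat would require treating the sequence as circular; the sketch only shows why two records of the same genome can report the four parts in different orders.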