Description: |
Examining the pollen collected by honey bees (Apis mellifera) is a powerful way to study environmental quality, because the pollen carries information on available plant sources, spatial and temporal floral diversity, and chemical contaminants. This requires botanical identification of the pollen, which has typically been done by classical palynology, a costly, time-consuming, and labour-intensive approach that requires plant taxonomy expertise and often provides low taxonomic resolution. With high-throughput sequencing becoming increasingly affordable, pollen metabarcoding is gaining momentum as a promising alternative to classical palynology. One of its main drawbacks, however, is the lack of good-quality reference databases for the barcode of choice. BCdatabaser (Keller et al., 2020) was developed to automatically generate a standardized database for the ITS2 barcode from the primary sequence database GenBank. While using BCdatabaser to construct an ITS2 reference database for the identification of bee-collected pollen, we noticed several misidentified sequences retrieved from GenBank, which would compromise identification accuracy. The problems were of two types: plant sequences assigned to the wrong plant species, and fungal sequences identified as plants. To overcome these issues, we developed scripts in bash and R to curate an ITS2 reference database for pollen identification. These scripts allowed us to identify the fungal sequences retrieved from GenBank for subsequent removal from the database, to perform a pairwise alignment of all sequences using vsearch v2.14.1 (Rognes et al., 2016), and then to remove all sequences with a low identity percentage through an iterative process in R v4.1.2. Because the curation is automated, the ITS2 database can easily be updated to take advantage of the new sequences that are regularly deposited in GenBank.
info:eu-repo/semantics/publishedVersion |