Phylogenetic-based methods for fine-scale classification of PRRSV-2 ORF5 sequences: a comparison of their robustness and reproducibility

Authors: Kimberly VanderWaal, Nakarin Pamornchainavakul, Mariana Kikuti, Daniel C. L. Linhares, Giovani Trevisan, Jianqiang Zhang, Tavis K. Anderson, Michael Zeller, Stephanie Rossow, Derald J. Holtkamp, Dennis N. Makau, Cesar A. Corzo, Igor A. D. Paploski
Language: English
Publication year: 2024
Subject:
Source: Frontiers in Virology, Vol 4 (2024)
Document type: article
ISSN: 2673-818X
DOI: 10.3389/fviro.2024.1433931
Description: Disease management and epidemiological investigations of porcine reproductive and respiratory syndrome virus-type 2 (PRRSV-2) often rely on grouping together highly related sequences. In the USA, the last five years have seen a major shift within the swine industry when classifying PRRSV-2, beginning to move away from RFLP (restriction fragment length polymorphisms)-typing and adopting the use of phylogenetic lineage-based classification. However, lineages and sub-lineages are large and genetically diverse, making them insufficient for identifying new and emerging variants. Thus, within the lineage system, a dynamic fine-scale classification scheme is needed to provide better resolution on the relatedness of PRRSV-2 viruses to inform disease management and monitoring efforts and facilitate research and communication surrounding circulating PRRSV viruses. Here, we compare fine-scale systems for classifying PRRSV-2 variants (i.e., genetic clusters of closely related ORF5 sequences at finer scales than sub-lineage) using a database of 28,730 sequences from 2010 to 2021, representing >55% of the U.S. pig population. In total, we compared 140 approaches that differed in their tree-building method, criteria, and thresholds for defining variants within phylogenetic trees. Three approaches resulted in variant classifications that were reproducible and robust even when the input data or input phylogenies were changed. For these approaches, the average genetic distance among sequences belonging to the same variant was 2.1–2.5%, and the genetic divergence between variants was 2.5–2.7%. Machine learning classification algorithms were trained to assign new sequences to an existing variant with >95% accuracy, which shows that newly generated sequences can be assigned to a variant without repeating the phylogenetic and clustering analyses. Finally, we identified 73 sequence-clusters (dated
Database: Directory of Open Access Journals