Abstrakt: |
Introduction: Genomic selection has provided the dairy industry with a powerful tool to increase genetic gain in economically important traits such as milk production (Taylor et al., 2016). Genome-wide association analysis (GWAA) is one way to identify new loci and confirm existing QTL. Identifying loci with major effects on economically important traits is also one of the most important goals of dairy cattle breeding. It was hypothesized that QTL-assisted selection based on genomic regions affecting production traits would increase the efficiency of selection and improve production output. Genome-wide association studies typically focus on the genetic markers with the strongest evidence of association. However, individual markers often explain only a small proportion of the genetic variance, so they offer only a limited understanding of the trait under study (Dadousis et al., 2017). One way to address these issues, and to deepen our understanding of the genetic background of complex traits, is to move the analysis from the SNP level to the gene and gene-set level. In a gene set analysis, genes harboring significant SNPs previously identified by GWAS are tested for over-representation in particular pathways. Gene set enrichment (GSE) analysis plays an essential role in extracting biological insight from genome-scale experiments. It reduces the complexity of molecular data and makes the results easier to interpret biologically (Peñagaricano et al., 2016).
Material and methods: The present study aimed to perform a genome-wide association study (GWAS) followed by gene set enrichment analysis to identify loci associated with milk protein composition traits. Eight traits were recorded for each cow: protein yield, protein percentage, αs1-casein, αs2-casein, β-casein, κ-casein, α-lactalbumin and β-lactoglobulin. The association analyses were performed with PLINK software, and no multiple-testing correction was applied to adjust the error rate.
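The following is a minimal sketch of a single-marker regression scan of the kind described above. It assumes allele-dosage coding (0/1/2) and a continuous phenotype. Each SNP is regressed separately and kept if its nominal P-value is below 0.05, with no multiple-testing correction. This is not PLINK's implementation. The simulated data, the function names and the normal approximation to the t distribution are illustrative assumptions.

```python
import math
import random

NOMINAL_ALPHA = 0.05  # nominal threshold; no multiple-testing correction


def single_marker_pvalue(genotypes, phenotypes):
    """Regress the phenotype on allele dosage (0/1/2) for one SNP.

    Returns a two-sided P-value for the slope, using a normal
    approximation to the t distribution (reasonable for large n).
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        return 1.0  # monomorphic SNP carries no information
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return 0.0  # perfect fit
    t = beta / se
    # 2 * (1 - Phi(|t|)) written via the complementary error function
    return math.erfc(abs(t) / math.sqrt(2))


def simulate(n_cows=500, n_snps=50, causal=0, effect=0.5, freq=0.3, seed=1):
    """Hypothetical data: one causal SNP, the rest unassociated."""
    rng = random.Random(seed)
    geno = [[sum(rng.random() < freq for _ in range(2)) for _ in range(n_cows)]
            for _ in range(n_snps)]
    pheno = [effect * geno[causal][i] + rng.gauss(0, 1) for i in range(n_cows)]
    return geno, pheno


geno, pheno = simulate()
pvalues = [single_marker_pvalue(g, pheno) for g in geno]
hits = [j for j, p in enumerate(pvalues) if p < NOMINAL_ALPHA]
print("SNPs passing nominal P < 0.05:", hits)
```

Here 49 null SNPs are tested at an uncorrected 0.05 threshold, so roughly 2 to 3 false positives are expected on average alongside the causal SNP. This shows what skipping correction costs: more true signals pass, and so do more false ones.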
Gene set analysis consists of three distinct steps: (1) the assignment of SNPs to genes, (2) the assignment of genes to functional categories, and (3) the association analysis between each functional category and the phenotype of interest. Briefly, SNPs with a nominal P-value < 0.05 in the GWAS were considered significant for each trait. Using the biomaRt2 R package, a SNP was assigned to a gene if it lay within the genomic sequence of the gene or within 15 kb upstream or downstream of it. The flanking regions were included so that SNPs in regulatory regions would also be captured. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway databases were used to assign genes to functional categories. The GO database annotates genes with biological descriptors based on attributes of their encoded products and is divided into three components: biological process, molecular function and cellular component. The KEGG pathway database contains metabolic and regulatory pathways that represent the current state of knowledge about molecular interactions and reaction networks. Finally, Fisher's exact test was used to test each gene set for over-representation of significant genes. The gene enrichment analysis was performed on the KOBAS platform to identify over-represented biological processes. Next, the BioMart, DAVID and GeneCards databases were used to identify the biological pathways involved.
Results and discussion: Gene-set enrichment analysis has proven to be an excellent complement to genome-wide association analysis (Gambra et al., 2013; Abdalla et al., 2016). Among the available gene-set databases, GO is probably the most popular, while KEGG is a relatively new tool gaining ground in livestock genomics (Morota et al., 2015, 2016). We hypothesized that using gene-set information could improve prediction.
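The SNP-to-gene assignment and the Fisher's exact test steps of the gene set analysis can be sketched as follows. The sketch assumes simple (chromosome, position) coordinates and a one-sided test for over-representation. The gene IDs, coordinates and counts are hypothetical. The code is a stand-in for the annotation and KOBAS enrichment tools used in the study, not a reproduction of them.

```python
from math import comb

FLANK_BP = 15_000  # 15 kb on each side of the gene body


def map_snps_to_genes(snps, genes, flank=FLANK_BP):
    """Assign each SNP to every gene whose body +/- flank contains it.

    snps:  list of (snp_id, chromosome, position)
    genes: list of (gene_id, chromosome, start, end)
    Returns {gene_id: set of snp_ids}.
    """
    mapped = {}
    for snp_id, chrom, pos in snps:
        for gene_id, g_chrom, start, end in genes:
            if chrom == g_chrom and start - flank <= pos <= end + flank:
                mapped.setdefault(gene_id, set()).add(snp_id)
    return mapped


def fisher_over_representation(k, set_size, n_sig, n_total):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    k:        significant genes in the gene set
    set_size: genes in the gene set
    n_sig:    significant genes among all annotated genes
    n_total:  all annotated genes
    Returns P(X >= k).
    """
    numerator = sum(
        comb(n_sig, i) * comb(n_total - n_sig, set_size - i)
        for i in range(k, min(set_size, n_sig) + 1)
    )
    return numerator / comb(n_total, set_size)


# Hypothetical coordinates, for illustration only.
genes = [("GENE_A", "6", 100_000, 110_000),
         ("GENE_B", "6", 200_000, 205_000),
         ("GENE_C", "14", 50_000, 60_000)]
snps = [("snp1", "6", 95_000),    # 5 kb before GENE_A start -> assigned
        ("snp2", "6", 180_000),   # 20 kb before GENE_B start -> not assigned
        ("snp3", "14", 72_000)]   # 12 kb after GENE_C end -> assigned
mapped = map_snps_to_genes(snps, genes)
print(mapped)

# Hypothetical counts: 8 of the 50 genes in a set are significant, versus
# 400 significant genes among 10,000 annotated genes (2 expected by chance).
p = fisher_over_representation(k=8, set_size=50, n_sig=400, n_total=10_000)
print(f"over-representation P = {p:.4g}")
```

In this sketch a gene counts as significant if at least one significant SNP maps to it. That is a common convention, but it is an assumption here, because the abstract does not state how significant genes were defined.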
However, none of the gene-set SNP classes outperformed the standard whole-genome approach. Gene sets have been developed primarily using data from model organisms such as mice and flies, so some of the genes included in these terms may be irrelevant to milk production. A better understanding of the biology underlying milk production, together with advances in bovine genome annotation, may provide new opportunities for predicting production using gene-set information. According to the gene set enrichment analysis, 20 GO categories and KEGG pathways were associated with the studied traits (P < 0.05). These categories included oxytocin signaling, glycerolipid metabolism, response to progesterone, calcium ion detection, complement and coagulation cascades, and amino acid binding. They contained the candidate genes CDH13, P4HTM, SPP1, CSN1S1, CSN2, SERPINA1, SLC35A3, ODC1 and PAEP, which were significantly associated with protein yield and content, glycoprotein phosphorylation, milk coagulation and curd solidification, and lactose synthesis. Some regulatory regions, such as enhancers, can lie far from the genes they regulate, outside the 15 kb flanking regions used to assign SNPs to genes. Therefore, a gene could be part of the analysis while the relevant variant is not included in the SNP class of its gene set. In addition, linkage disequilibrium hampers the use of biological information in prediction. Irrelevant regions (regions with no biological role) capture some of the information encoded in relevant regions, which gives both similar predictive abilities. Using very high-density SNP data or even whole-genome sequence data could alleviate some of these problems. Finally, our gene-set enrichment analysis was performed using a panel of SNPs obtained from a single-marker regression GWAS. This approach relies on a simplified model of the genetic background of traits; for example, it ignores the joint effects of SNPs.
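The linkage disequilibrium argument can be illustrated with a small simulation. A SNP with no biological role that is in strong LD with a causal variant shows almost the same correlation with the phenotype as the causal variant itself. The allele frequency, LD strength, effect size and function names below are arbitrary assumptions chosen for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_ld(n_cows=2000, freq=0.3, copy_prob=0.9, effect=0.5, seed=7):
    """Simulate a causal SNP, a linked (non-causal) SNP and an unlinked SNP.

    On each haplotype the linked allele copies the causal allele with
    probability copy_prob; otherwise it is drawn independently. Only the
    causal SNP affects the phenotype.
    """
    rng = random.Random(seed)
    causal, linked, unlinked, pheno = [], [], [], []
    for _ in range(n_cows):
        g_causal = g_linked = g_unlinked = 0
        for _ in range(2):  # two haplotypes per cow
            a = int(rng.random() < freq)
            b = a if rng.random() < copy_prob else int(rng.random() < freq)
            g_causal += a
            g_linked += b
            g_unlinked += int(rng.random() < freq)
        causal.append(g_causal)
        linked.append(g_linked)
        unlinked.append(g_unlinked)
        pheno.append(effect * g_causal + rng.gauss(0, 1))
    return causal, linked, unlinked, pheno


causal, linked, unlinked, pheno = simulate_ld()
r_causal = pearson(causal, pheno)
r_linked = pearson(linked, pheno)
r_unlinked = pearson(unlinked, pheno)
print(f"causal r={r_causal:.3f}, linked r={r_linked:.3f}, unlinked r={r_unlinked:.3f}")
```

Under these assumptions the linked SNP's correlation with the phenotype is expected to be about 0.9 times that of the causal SNP. A predictor built from a biologically irrelevant region can therefore perform almost as well as one built from the relevant region.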
Therefore, other approaches (e.g., GWAS that model SNP-by-SNP interactions) might provide a better basis for pathway analysis.
Conclusion: Our results showed potential for genetic selection to improve milk quality in terms of milk protein composition per animal. Because genetic improvements are heritable, cumulative and permanent, such improvement would be lasting and beneficial. [ABSTRACT FROM AUTHOR] |