A hierarchical birdsong feature extraction architecture combining static and dynamic modeling

Author:	Yanan Wang, Aibin Chen, Huaicheng Li, Guoxiong Zhou, Jizheng Yi, Zhiqiang Zhang
Language:	English
Publication year:	2023
Subject:	Birdsong classification; Context modeling; Hierarchical refinement model; Pearson correlation coefficient; Self-built dataset; Complex background noise; Ecology; QH540-549.5
Source:	Ecological Indicators, Vol 150, Iss , Pp 110258- (2023)
Document type:	article
ISSN:	1470-160X
DOI:	10.1016/j.ecolind.2023.110258
Description:	To conserve bird biodiversity and monitor the regional distribution of species, it is essential to identify birds by their songs and to explore the rich ecological information that birdsong contains. Audio recorded in a monitoring area generally has complex background noise, the characteristics of the song are not prominent, and the biological spectrum information is incomplete, all of which makes bird identification challenging. This study proposes a hierarchical birdsong feature extraction architecture that combines static and dynamic modeling of context to cope with complex environments. First, six common speech features are extracted to characterize birdsong. The Pearson correlation coefficient is then used to analyze the correlations between birdsong and human speech, examining the correlations between the features both with and without environmental noise interference. Combined with a scatter plot matrix analysis, we conclude that the Mel Frequency Cepstral Coefficient (MFCC) is more suitable than the other features for birdsong and copes better with complex background noise. Second, a feature extraction architecture is built that integrates static and dynamic modeling to fully explore contextual relationships, addressing the tendency of Transformer-type models to ignore the internal structure of each patch and to lose some spatial information. Finally, a hierarchical refinement module is designed to help extract more detailed features and to reduce the computational cost of Transformer-type models, which require large amounts of training data and have high complexity. The model achieves 93.67 % accuracy on the self-built birdsong dataset, 95.19 % accuracy on the public birdsong dataset Birdsdata, and 97.02 % accuracy on the public environmental sound dataset UrbanSound8k.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/702de82923c0428f9d2070408daaca08
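
The description mentions using the Pearson correlation coefficient to compare acoustic features. As a rough illustration only, and not the authors' implementation, the sketch below computes pairwise Pearson correlations between per-frame feature sequences using the Python standard library. The feature names (`feature_a`, `feature_b`, `feature_c`) and all numeric values are hypothetical placeholders; the record does not name the six features or give any data.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(
    features: dict[str, Sequence[float]],
) -> dict[tuple[str, str], float]:
    """Pairwise Pearson r for every ordered pair of named feature sequences."""
    names = list(features)
    return {(a, b): pearson(features[a], features[b]) for a in names for b in names}


# Hypothetical per-frame values for three made-up features (not from the paper).
example_features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_b": [2.0, 4.1, 5.9, 8.2, 9.8],
    "feature_c": [5.0, 3.0, 4.0, 1.0, 2.0],
}

matrix = correlation_matrix(example_features)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: r = {r:+.3f}")
```

Running the same comparison on features computed from clean and from noisy recordings would show how each pairwise correlation shifts under noise, which is the kind of comparison the description refers to. The scatter plot matrix it mentions would need a plotting library and is not shown here.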