Bird Audio Data Preprocessing Method

doi:10.11871/jfdc.issn.2096-742X.2021.05.010

Frontiers of Data and Computing, 2021, Vol. 3, Issue 5: 130-140.

ZHANG Meng^1,2,*, LI Jian^1

1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
2. University of Chinese Academy of Sciences, Beijing 100049, China

Received: 2021-03-08 Online: 2021-10-20 Published: 2021-11-24
Contact: ZHANG Meng
About the authors: ZHANG Meng is a master's student at the Computer Network Information Center, Chinese Academy of Sciences. His research interests include machine learning and big data analysis.
In this paper, he was mainly responsible for the experiments and testing of the spectrogram filtering algorithm and the classification model.
E-mail: zhangmeng@cnic.cn
LI Jian, Ph.D., is a senior engineer and master's supervisor at the Computer Network Information Center, Chinese Academy of Sciences, and a member of the Youth Innovation Promotion Association of the Chinese Academy of Sciences. He has published more than 20 papers in core journals and conferences at home and abroad. His main research directions are scientific research application integration, data mining and machine learning, and e-Science applications.
In this paper, he was mainly responsible for the overall design of the audio spectrogram filtering algorithm based on convolutional neural networks and clustering.
E-mail: lijian@cnic.cn
Funding:
National Key Research and Development Program of China (2019YFC0507405)

Abstract

[Objective] Automatically identifying and removing noise spectrograms from the original sample set of bird audio spectrograms can improve the accuracy of bird species classification. [Methods] A convolutional neural network is used to extract a feature vector from each spectrogram, the Faiss library is used to compute the distance matrix of the feature vectors, and the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm is then used to identify the noise spectrograms. Finally, the filtered spectrogram sample set is fed into a classification model for bird species classification. [Results] The method removes a large number of noise spectrograms from the spectrogram sample set, which improves the accuracy of the subsequent bird species classification. [Limitations] Because the clustering result of the DBSCAN algorithm depends strongly on the neighborhood threshold (ε) and the density threshold (MinPts), adaptive methods for choosing these parameter values should be explored in future work. [Conclusions] By combining a convolutional neural network with a density-based clustering algorithm from data mining, this paper proposes a bird audio data preprocessing method that automatically filters out noise spectrograms and provides a high-quality spectrogram sample set for subsequent bird species identification.

Key words: bird audio, spectrogram, data filtering, convolutional neural network, clustering
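
The filtering step described in the abstract can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: the toy 2-D vectors stand in for CNN feature vectors, the brute-force `pairwise_distances` helper stands in for the Faiss-based distance computation, and the `eps` (ε) and `min_pts` (MinPts) values are arbitrary. All function names are hypothetical.

```python
import math

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def pairwise_distances(vectors):
    """Brute-force Euclidean distance matrix (stand-in for a Faiss computation)."""
    return [[math.dist(a, b) for b in vectors] for a in vectors]


def dbscan(dist, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE.

    `dist` is a precomputed square distance matrix. A point is a core point
    if at least `min_pts` points (itself included) lie within `eps` of it.
    """
    n = len(dist)
    labels = [UNVISITED] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not UNVISITED:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


def filter_noise(vectors, eps, min_pts):
    """Split sample indices into (kept, removed) using the DBSCAN noise label."""
    labels = dbscan(pairwise_distances(vectors), eps, min_pts)
    kept = [i for i, lab in enumerate(labels) if lab != NOISE]
    removed = [i for i, lab in enumerate(labels) if lab == NOISE]
    return kept, removed


# Toy "feature vectors": five close together, two far away from everything.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
            (5.0, 5.0), (0.05, 0.05), (9.0, -3.0)]

kept, removed = filter_noise(features, eps=0.3, min_pts=3)
print("kept:", kept)
print("removed as noise:", removed)
```

Only the samples in `kept` would be passed on to the classification model. Because the split depends entirely on `eps` and `min_pts`, changing either value can move samples between the two lists, which is the sensitivity noted under Limitations.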

Cite this article: ZHANG Meng, LI Jian. Bird Audio Data Preprocessing Method[J]. Frontiers of Data and Computing, 2021, 3(5): 130-140.

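Species	Training-set spectrograms	Test-set spectrograms
Eurasian Skylark	1839	743
Common Swift	1601	270
European Goldfinch	1809	801
European Greenfinch (属绿金翅 in source)	1641	655
Hawfinch	1010	209
Common Wood Pigeon	878	275
Carrion Crow	979	734
Rook	1111	765
Common House Martin	1502	378
Great Spotted Woodpecker	1057	426
Yellowhammer	1366	534
European Robin	1540	777
Common Chaffinch	1531	747
Eurasian Jay	1233	302
Thrush Nightingale	1754	773
White Wagtail	1337	680
Great Tit	1608	716
House Sparrow	1704	860
Eurasian Tree Sparrow	1109	680
Black Redstart	1401	726
Common Redstart	1568	785
Common Chiffchaff	1464	753
Willow Warbler	1660	680
Eurasian Magpie	1029	643
普通? (name truncated in source)	1542	755
Eurasian Collared Dove	1588	516
Common Starling	1636	644
Eurasian Wren (缔鹪鹩 in source)	1738	820
Common Blackbird	1519	595
Song Thrush	1410	802
Fieldfare	1610	641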

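Species	Top-1 MAP before filtering	Top-1 MAP after filtering
Eurasian Skylark	0.7692	1.0
Common Swift	0.6	0.75
European Goldfinch	0.7692	0.8181
European Greenfinch (属绿金翅 in source)	0.5333	0.6154
Hawfinch	0.625	0.625
Common Wood Pigeon	0.3913	0.4667
Carrion Crow	1.0	1.0
Rook	1.0	1.0
Common House Martin	1.0	1.0
Great Spotted Woodpecker	0.4783	0.4348
Yellowhammer	0.5455	0.4444
European Robin	0.5714	0.5714
Common Chaffinch	0.6818	0.75
Eurasian Jay	0.3333	0.2857
Thrush Nightingale	0.5	0.5556
White Wagtail	0.3181	0.35
Great Tit	0.5556	0.5556
House Sparrow	0.5	0.5556
Eurasian Tree Sparrow	0.6667	0.6667
Black Redstart	0.7692	0.75
Common Redstart	0.4286	0.6
Common Chiffchaff	0.8	0.85
Willow Warbler	0.4286	0.5789
Eurasian Magpie	0.7647	0.7778
普通? (name truncated in source)	0.4545	0.3158
Eurasian Collared Dove	0.2143	0.3333
Common Starling	0.1	0.1111
Eurasian Wren (缔鹪鹩 in source)	0.5652	0.5909
Common Blackbird	0.6	0.6316
Song Thrush	0.4722	0.4848
Fieldfare	0.6667	1.0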
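
As a reading aid for the table above, the per-species Top-1 MAP values can be tallied as follows. This is a minimal sketch: the two lists copy the table values in row order, the `summarize` helper is hypothetical, and the unweighted mean it computes is an illustrative aggregate, not a figure reported by the authors.

```python
# Top-1 MAP per species, in the row order of the table (31 species).
before = [0.7692, 0.6, 0.7692, 0.5333, 0.625, 0.3913,
          1.0, 1.0, 1.0, 0.4783, 0.5455, 0.5714, 0.6818, 0.3333, 0.5,
          0.3181, 0.5556, 0.5, 0.6667, 0.7692, 0.4286, 0.8, 0.4286,
          0.7647, 0.4545, 0.2143, 0.1, 0.5652, 0.6, 0.4722, 0.6667]
after = [1.0, 0.75, 0.8181, 0.6154, 0.625, 0.4667,
         1.0, 1.0, 1.0, 0.4348, 0.4444, 0.5714, 0.75, 0.2857, 0.5556,
         0.35, 0.5556, 0.5556, 0.6667, 0.75, 0.6, 0.85, 0.5789,
         0.7778, 0.3158, 0.3333, 0.1111, 0.5909, 0.6316, 0.4848, 1.0]


def summarize(before, after):
    """Count species whose score rose, stayed equal, or fell, plus unweighted means."""
    if len(before) != len(after):
        raise ValueError("both lists must cover the same species")
    pairs = list(zip(before, after))
    return {
        "improved": sum(a > b for b, a in pairs),
        "unchanged": sum(a == b for b, a in pairs),
        "worse": sum(a < b for b, a in pairs),
        "mean_before": sum(before) / len(before),
        "mean_after": sum(after) / len(after),
    }


summary = summarize(before, after)
for key, value in summary.items():
    print(key, round(value, 4) if isinstance(value, float) else value)
```

The unweighted mean treats every species equally regardless of its test-set size, so it may differ from whatever aggregate the paper itself reports.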
