Small Object Detection and Recognition Based on Deep Learning

doi:10.11871/jfdc.issn.2096-742X.2020.02.010

Frontiers of Data and Computing, 2020, Vol. 2, Issue 2: 120-135.


Special topic: "Data Analysis Technology and Applications" special issue



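Leng Jiaxu^1,2, Liu Ying^1,2,*

^1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100089, China
^2. Data Mining and High Performance Computing Lab, University of Chinese Academy of Sciences, Beijing 101400, China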

Online: 2020-04-20 Published: 2020-06-03
Corresponding author: Liu Ying
About the authors: Leng Jiaxu is currently pursuing his Ph.D. degree in the School of Computer Science and Technology, University of Chinese Academy of Sciences. His current research interests include computer vision, deep learning, object detection, object tracking, and stereo vision.
In this paper, he is responsible for the algorithm design and the experimental validation.
E-mail: lengjiaxu17@mails.ucas.ac.cn
Liu Ying is currently a professor in the School of Computer Science and Technology, University of Chinese Academy of Sciences, and head of the Data Mining and High Performance Computing Lab. Her research interests include data mining, artificial intelligence, and parallel computing.
In this paper, she is responsible for the literature review, the method principles, and the conclusions and outlook.
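Funding:
National Natural Science Foundation of China (71671178); National Natural Science Foundation of China (91546201); Key Project for Improving the Research Capability of Outstanding Young Teachers, University of Chinese Academy of Sciences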

Small Object Detection and Recognition Based on Deep Learning

Leng Jiaxu^1,2, Liu Ying^1,2,*

^1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100089, China
^2. Data Mining and High Performance Computing Lab, University of Chinese Academy of Sciences, Beijing 101400, China

Online: 2020-04-20 Published: 2020-06-03
Contact: Liu Ying

Abstract

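Abstract (Chinese version):

[Objective] Existing deep learning-based detection algorithms perform poorly on small objects. This paper aims to improve the detection and recognition of small objects by fully accounting for their characteristics. [Methods] We improve small object detection and recognition from several angles: feature fusion, context learning, and attention mechanisms. To address the difficulty of extracting features from small objects, we propose a bidirectional feature fusion method. Because the features of small objects are not distinctive, we also propose a method that uses context information to improve detection performance. Furthermore, to better recognize the categories of small objects, we propose an attention transfer method. [Results] Experimental results show that the proposed methods all significantly improve small object detection and recognition performance on public datasets. [Conclusions] Research on feature fusion, context utilization, and attention mechanisms is highly valuable for improving small object detection and recognition.

Keywords: small object detection, feature fusion, context learning, attention mechanism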

Abstract:

[Objective] In this paper, we aim to improve the detection performance for small objects by considering the characteristics of small objects under deep learning-based detection frameworks. [Methods] This paper improves small object detection and recognition performance from different aspects, including feature fusion, context learning and attention mechanism. Since the features of the small object are not evident, a bidirectional feature fusion method is proposed to improve the feature expression capability for small objects. In addition, a novel method is proposed to improve the detection performance by using the context information of small objects. Furthermore, to better identify the categories of small objects, an attention transfer method is proposed to improve the recognition rate. [Results] Experimental results show that the three proposed methods can significantly improve the detection and recognition performance for small objects on public datasets. [Conclusions] The research on feature fusion, context utilization and attention mechanism is very valuable for improving small object detection in complex scenes.

Key words: small object detection, feature fusion, context learning, attention mechanism
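The abstract names bidirectional feature fusion without giving its details on this page. As a rough illustration only, the sketch below combines a top-down pass with a bottom-up pass over a small feature pyramid. Everything in it is an assumption made for illustration: the function names, the use of element-wise addition as the fusion operator, nearest-neighbour upsampling, and 2x2 max pooling. It should not be read as the authors' architecture.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D feature map (list of rows)."""
    return [[v for v in row for _ in range(2)] for row in fmap for _ in range(2)]


def downsample2x(fmap):
    """2x2 max pooling of a 2D feature map with even height and width."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]


def add(a, b):
    """Element-wise sum of two equally sized feature maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_fusion(features):
    """Fuse a fine-to-coarse list of feature maps in both directions.

    Hypothetical stand-in: plain addition replaces any learned fusion.
    """
    # Top-down pass: coarse, semantically strong maps flow into finer ones.
    top_down = [features[-1]]
    for fmap in reversed(features[:-1]):
        top_down.insert(0, add(fmap, upsample2x(top_down[0])))
    # Bottom-up pass: fine, detail-rich maps flow back into coarser ones.
    fused = [top_down[0]]
    for fmap in top_down[1:]:
        fused.append(add(fmap, downsample2x(fused[-1])))
    return fused


# Toy pyramid: 8x8 (fine), 4x4 and 2x2 (coarse) maps filled with ones.
pyramid = [[[1.0] * s for _ in range(s)] for s in (8, 4, 2)]
fused = bidirectional_fusion(pyramid)
print([(len(f), len(f[0])) for f in fused])
```

Each output level keeps its input resolution, so a detector head attached to the finest level would see both its own detail and information propagated from the coarser levels, which is the general motivation for fusing in both directions.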

Cite this article: Leng Jiaxu, Liu Ying. Small Object Detection and Recognition Based on Deep Learning[J]. Frontiers of Data and Computing, 2020, 2(2): 120-135.

Figures/Tables: 13 (Figures 1-10, Tables 1-3)


Method	Input size (px)	Training data	Test data	mAP (%)	FPS
YOLO	448	VOC2007 + 2012	VOC2007	63.4	45
YOLOv2	416	VOC2007 + 2012	VOC2007	76.8	67
Faster R-CNN		VOC2007 + 2012	VOC2007	73.2	5
R-FCN		VOC2007 + 2012	VOC2007	80.5	5.9
SSD	300	VOC2007 + 2012	VOC2007	77.7	61
DSSD	321	VOC2007 + 2012	VOC2007	78.6	9
ESSD	300	VOC2007 + 2012	VOC2007	79.2	52
SSD	512	VOC2007 + 2012	VOC2007	79.8	25
DSSD	513	VOC2007 + 2012	VOC2007	81.5	6
ESSD	512	VOC2007 + 2012	VOC2007	82.4	18
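One way to read the accuracy and speed columns together is to keep only the entries that no other entry beats on both mAP and FPS. The sketch below does this for the rows of the table above; the `TABLE` list and the `pareto_front` helper are names introduced here for illustration, and the numbers are simply transcribed from the table.

```python
# Rows transcribed from the table above: (method, input size, mAP %, FPS).
# None marks the input sizes that the table leaves blank.
TABLE = [
    ("YOLO", 448, 63.4, 45.0),
    ("YOLOv2", 416, 76.8, 67.0),
    ("Faster R-CNN", None, 73.2, 5.0),
    ("R-FCN", None, 80.5, 5.9),
    ("SSD", 300, 77.7, 61.0),
    ("DSSD", 321, 78.6, 9.0),
    ("ESSD", 300, 79.2, 52.0),
    ("SSD", 512, 79.8, 25.0),
    ("DSSD", 513, 81.5, 6.0),
    ("ESSD", 512, 82.4, 18.0),
]


def pareto_front(rows):
    """Return the rows that no other row matches or beats on both mAP and FPS."""
    front = []
    for name, size, m_ap, fps in rows:
        dominated = any(
            o_map >= m_ap and o_fps >= fps and (o_map > m_ap or o_fps > fps)
            for _, _, o_map, o_fps in rows
        )
        if not dominated:
            front.append((name, size, m_ap, fps))
    return front


for name, size, m_ap, fps in pareto_front(TABLE):
    print(f"{name}{size or ''}: mAP {m_ap}, {fps} FPS")
```

On these numbers the non-dominated entries are YOLOv2, SSD300, ESSD300, SSD512 and ESSD512; the remaining rows are matched or exceeded on both axes by at least one of them.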

Method	Backbone network	mAP (%)
Faster R-CNN	VGG16	73.2
Faster R-CNN	Residual-101	76.4
YOLOv2	Darknet-19	78.6
DSSD	Residual-101	81.5
Context-Aware Faster R-CNN	VGG16	82.1
Context-Aware Faster R-CNN	Residual-101	84.8
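The effect of adding context is easiest to see by comparing the two Faster R-CNN variants on the same backbone. The short sketch below computes that difference from the table above; the `MAP` dictionary and `context_gain` helper are illustrative names, not part of the paper.

```python
# mAP (%) values copied from the table above, keyed by (method, backbone).
MAP = {
    ("Faster R-CNN", "VGG16"): 73.2,
    ("Faster R-CNN", "Residual-101"): 76.4,
    ("Context-Aware Faster R-CNN", "VGG16"): 82.1,
    ("Context-Aware Faster R-CNN", "Residual-101"): 84.8,
}


def context_gain(backbone):
    """mAP gain (percentage points) of the context-aware variant on one backbone."""
    gain = MAP[("Context-Aware Faster R-CNN", backbone)] - MAP[("Faster R-CNN", backbone)]
    return round(gain, 1)  # one decimal, matching the table's precision


for backbone in ("VGG16", "Residual-101"):
    print(backbone, context_gain(backbone))
```

The gain is 8.9 points with VGG16 and 8.4 points with Residual-101, so on these figures the improvement is of similar size on both backbones.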

Method	CIFAR-100	Caltech-256	CUB-200
TLAN	72.88	68.82	77.90
FCAN	95.80	76.40	82.04
RA-CNN	97.21	79.24	85.31
ATM	97.68	80.32	86.12
