Frontiers of Data and Computing, 2020, Vol. 2, Issue (2): 120-135.
doi: 10.11871/jfdc.issn.2096-742X.2020.02.010
Special Issue: Data Analysis Technology & Application
Online: 2020-04-20
Published: 2020-06-03
Contact: Ying Liu
E-mail: yingliu@ucas.ac.cn
Leng Jiaxu, Liu Ying. Small Object Detection and Recognition Based on Deep Learning[J]. Frontiers of Data and Computing, 2020, 2(2): 120-135.
Table 1. Detection results of our ESSD and state-of-the-art detectors on PASCAL VOC 2007
Method | Input size | Training data | Test data | mAP (%) | FPS |
---|---|---|---|---|---|
YOLO | 448 | VOC2007 + 2012 | VOC2007 | 63.4 | 45 |
YOLOV2 | 416 | VOC2007 + 2012 | VOC2007 | 76.8 | 67 |
Faster R-CNN | - | VOC2007 + 2012 | VOC2007 | 73.2 | 5 |
R-FCN | - | VOC2007 + 2012 | VOC2007 | 80.5 | 5.9 |
SSD | 300 | VOC2007 + 2012 | VOC2007 | 77.7 | 61 |
DSSD | 321 | VOC2007 + 2012 | VOC2007 | 78.6 | 9 |
ESSD | 300 | VOC2007 + 2012 | VOC2007 | 79.2 | 52 |
SSD | 512 | VOC2007 + 2012 | VOC2007 | 79.8 | 25 |
DSSD | 513 | VOC2007 + 2012 | VOC2007 | 81.5 | 6 |
ESSD | 512 | VOC2007 + 2012 | VOC2007 | 82.4 | 18 |