| [1] | 
																						 
											  Z. Cai and N. Vasconcelos . Cascade r-cnn: delving into high quality object detection [C]. in IEEE CVPR, 2018.
											 											 | 
										
																													
																						| [2] | 
																						 
											  K. He, G. Gkioxari, P. Dolla $\acute{r}$, and R. Girshick . Mask r-cnn [C]. in Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE, 2017, pp. 2980-2988.
											 											 | 
										
																													
																						| [3] | 
																						 
											  S. Ren, K. He, R. Girshick, J. Sun .  Faster r-cnn: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 6, pp. 1137-1149, 2017.
											 											 | 
										
																													
																						| [4] | 
																						 
											  W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. -Y. Fu, and A. C. Berg . Ssd: Single shot multibox detector[J]. in European conference on computer vision. Springer, 2016, pp. 21-37.
											 											 | 
										
																													
																						| [5] | 
																						 
											  J. Redmon and A. Farhadi . Yolo9000: better, faster, stronger [C]. in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7263-7271.
											 											 | 
										
																													
																						| [6] | 
																						 
											  T. Kong, A. Yao, Y. Chen, F. Sun . “Hypernet: Towards accurate region proposal generation and joint object detection [C]. in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 845-853.
											 											 | 
										
																													
																						| [7] | 
																						 
											  W. Liu, A. Rabinovich, A. C. Berg . Parsenet: Looking wider to see better[J]. arXiv preprint arXiv:1506.04579, 2015.
											 											 | 
										
																													
																						| [8] | 
																						 
											  J. Long, E. Shelhamer, T. Darrell . Fully convolutional networks for semantic segmentation [C]. in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431-3440.
											 											 | 
										
																													
																						| [9] | 
																						 
											  T. -Y. Lin, P. Dolla $\acute{r}$, R. Girshick, K. He, B. Hariharan, and S. Belongie . Feature pyramid networks for object detection [C]. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
											 											 | 
										
																													
																						| [10] | 
																						 
											  J. Jeong, H. Park, N. Kwak . Enhancement of ssd by concatenating feature maps for object detection. 2017.
											 											 | 
										
																													
																						| [11] | 
																						 
											  K. He, X. Zhang, S. Ren, J. Sun . Deep residual learning for image recognition[C]. in: CVPR, 2016.
											 											 | 
										
																													
																						| [12] | 
																						 
											  W. Ouyang, X. Wang, X. Zeng, S. Qiu, P. Luo, Y. Tian, H. Li, S. Yang, Z. Wang C. -C. Loy , et al. Deepid-net: Deformable deep convolutional neural networks for object detection[C] in: CVPR, 2015.
											 											 | 
										
																													
																						| [13] | 
																						 
											  W. Chu, D. Cai.  Deep feature based contextual model for object detection[J]. in: Neurocomputing, 2018.
											 											 | 
										
																													
																						| [14] | 
																						 
											  Y. Zhu, R. Urtasun, R. Salakhutdinov, S. Fidler . segdeepm: Exploiting segmentation and context in deep neural networks for object detection[C]. in: CVPR, 2015.
											 											 | 
										
																													
																						| [15] | 
																						 
											  X. Chen, A. Gupta.  Spatial memory for context reasoning in object detection[C]. in: ICCV, 2017.
											 											 | 
										
																													
																						| [16] | 
																						 
											  K. Hara, M.-Y. Liu, O. Tuzel, and A.-m Farahmand . Attentionalnetwork for visual object detection[J]. arXiv preprint arXiv:1702.01478, 2016.
											 											 | 
										
																													
																						| [17] | 
																						 
											  J. Li, Y. Wei, X. Liang, J. Dong, T. Xu, J. Feng, S. Yan . Attentive contexts for object detection[J]. IEEE Transactions on Multimedia, 19(5):944-954, 2017.
											 											 | 
										
																													
																						| [18] | 
																						 
											  K. He, X. Zhang, S. Ren, and J. Sun . Identity mappings in deep residual networks[J]. In European conference on computer vision, pages 630-645. Springer, 2016.
											 											 | 
										
																													
																						| [19] | 
																						 
											  X. Liu, T. Xia, J. Wang, Y. Lin . Fully convolutional attention localization networks: Efficient attention localization for fine-grained recognition. CoRR, abs/1603.06765, 2016.
											 											 | 
										
																													
																						| [20] | 
																						 
											  Fu J, Zheng H, Mei T . Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition [C]//CVPR. 2017,2:3.
											 											 | 
										
																													
																						| [21] | 
																						 
											  T. -Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dolla ́r, and C. L. Zitnick . Microsoft coco: Common objects in context[J]. In European conference on computer vision, pages 740-755. Springer, 2014.
											 											 | 
										
																													
																						| [22] | 
																						 
											  S. Bell, C. Lawrence Zitnick, K. Bala, R. Girshick . Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks [C]. in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2874-2883.
											 											 | 
										
																													
																						| [23] | 
																						 
											  T. Kong, A. Yao, Y. Chen, F. Sun . Hypernet: Towards accurate region proposal generation and joint object detection [C]. in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 845-853.
											 											 | 
										
																													
																						| [24] | 
																						 
											  Wang H, Wang Q, Gao M , et al. Multi-scale location-aware kernel representation for object detection [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 1248-1257.
											 											 | 
										
																													
																						| [25] | 
																						 
											  J. Long, E. Shelhamer, T. Darrell . Fully convolutional networks for semantic segmentation [C]. in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431-3440.
											 											 | 
										
																													
																						| [26] | 
																						 
											  T. -Y. Lin, P. Dolla $\acute{r}$, R. Girshick, K. He, B. Hariharan, and S. Belongie . Feature pyramid networks for object detection [C]. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
											 											 | 
										
																													
																						| [27] | 
																						 
											  J. Jeong, H. Park, N. Kwak . Enhancement of ssd by concatenating feature maps for object detection. 2017.
											 											 | 
										
																													
																						| [28] | 
																						 
											  S. K. Divvala, D. Hoiem, J. H. Hays, A. A. Efros, M. Hebert . An empirical study of context in object detection [C]. In CVPR 2009. IEEE Conference on, pages 1271-1278. IEEE, 2009.
											 											 | 
										
																													
																						| [29] | 
																						 
											  R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R. Urtasun, and A. Yuille . The role of context for object detection and semantic segmentation in the wild[J]. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 891-898, 2014.
											 											 | 
										
																													
																						| [30] | 
																						 
											  R. Yu, X. Chen, V. I. Morariu, L. S. Davis . The role of context selection in object detection[J]. arXiv preprint arXiv:1609.02948, 2016.
											 											 | 
										
																													
																						| [31] | 
																						 
											  S. Gidaris and N. Komodakis . Object detection via a multi-region and semantic segmentation-aware cnn model[C]. In Proceedings of the IEEE International Conference on Computer Vision, pages 1134-1142, 2015.
											 											 | 
										
																													
																						| [32] | 
																						 
											  W. Ouyang, K. Wang, X. Zhu, X. Wang . Learning chained deep features and classifiers for cascade in object detection[J]. arXiv preprint arXiv:1702.07054, 2017.
											 											 | 
										
																													
																						| [33] | 
																						 
											  X. Zeng, W. Ouyang, J. Yan, H. Li, T. Xiao, K. Wang, Y. Liu, Y. Zhou, B. Yang, Z. Wang , et al. Crafting gbd-net for object detection[J]. IEEE transactions on pattern analysis and machine intelligence, 40(9):2109-2123,2018.
											 											 | 
										
																													
																						| [34] | 
																						 
											  Hu R., Xu H., Rohrbach M., Feng J., Saenko K., Darrell T.  Natural language object retrieval[C]. In: CVPR. (2016).
											 											 | 
										
																													
																						| [35] | 
																						 
											  Mao J., Huang J., Toshev A., Camburu O., Yuille A.L., Murphy K.  Generation and comprehension of unambiguous object descriptions[C]. In: CVPR. (2016).
											 											 | 
										
																													
																						| [36] | 
																						 
											  X. Chen and A. Gupta . Spatial memory for context reasoning in object detection[J]. arXiv preprint arXiv:1704.04224, 2017.
											 											 | 
										
																													
																						| [37] | 
																						 
											  X. Chen, L.-J. Li, L. Fei-Fei, A. Gupta . Iterative visual reasoning beyond convolutions[J]. arXiv preprint arXiv:1803.11189, 2018.
											 											 | 
										
																													
																						| [38] | 
																						 
											  Ji Y, Zhang H, Wu QMJ  . Salient object detection via multi-scale attention CNN[J]. Neurocomputing 322:130-140, 2018.
											 											 | 
										
																													
																						| [39] | 
																						 
											  Zhang H, Ji Y, Huang W  et al. Sitcom-star-based clothing retrieval for video advertising: a deep learning framework[J]. Neural Comput Appl. https://doi.org/10.1007/s00521-018-3579-x. 2018. 
											 											 | 
										
																													
																						| [40] | 
																						 
											  Xu K, Ba J, Kiros R  et al. Show, attend and tell: Neural image caption generation with visual attention[C]. In: International conference on machine learning, pp 2048-2057. 2015.
											 											 | 
										
																													
																						| [41] | 
																						 
											  Chen L, Zhang H, Xiao J  et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5659-5667,2017.
											 											 | 
										
																													
																						| [42] | 
																						 
											  Seo PH, Lin Z, Cohen S  et al. Progressive attention net- works for visual attribute prediction[J]. arXiv preprint arXiv:1606.02393. 2016.
											 											 | 
										
																													
																						| [43] | 
																						 
											  Das D, George Lee CS . Sample-to-sample correspondence for unsupervised domain adaptation[J]. Eng Appl Artif Intell 73:80-91. 2018.
											 											 | 
										
																													
																						| [44] | 
																						 
											  Das D, George Lee CS.  Unsupervised domain adaptation using regularized hyper-graph matching[C]. In: 2018 25th IEEE international conference on image processing (ICIP).
											 											 | 
										
																													
																						| [45] | 
																						 
											  Larochelle H, Hinton GE .  Learning to combine foveal glimpses with a third-order Boltzmann machine[J]. In: Advances in neural information processing systems, pp 1243-1251, 2010.
											 											 | 
										
																													
																						| [46] | 
																						 
											  Hochreiter S, Schmidhuber J . Long short-term memory[J]. Neural Comput 9(8):1735-1780,1997.
											 											 | 
										
																													
																						| [47] | 
																						 
											  Kim JH, Lee SW, Kwak D  et al. Multimodal residual learning for visual QA[J]. In: Advances in neural information pro-cessing systems, pp 361-369, 2016.
											 											 | 
										
																													
																						| [48] | 
																						 
											  Noh H, Hong S, Han B.  Learning deconvolution network for semantic segmentation[C]. In: Proceedings of the IEEE interna- tional conference on computer vision, pp 1520-1528,2015.
											 											 | 
										
																													
																						| [49] | 
																						 
											  Srivastava RK, Greff K, Schmidhuber J .  Training very deep networks[J]. In: Advances in neural information processing systems, pp 2377-2385,2015.
											 											 | 
										
																													
																						| [50] | 
																						 
											  Mnih V, Heess N, Graves A  et al. Recurrent models of visual attention[C]. In: NIPS. 2014.
											 											 | 
										
																													
																						| [51] | 
																						 
											  Jaderberg M, Simonyan K, Zisserman A .  Spatial transformer networks[J]. In: Advances in neural information processing systems, pp 2017-2025,2015.
											 											 | 
										
																													
																						| [52] | 
																						 
											  Xiao T, Xu Y, Yang K  et al. The application of two-level attention models in deep convolutional neural network for fine- grained image classification[C]. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 842-850,2015.
											 											 | 
										
																													
																						| [53] | 
																						 
											  Zhang Y, Qiu Z, Yao T , et al. Fully convolutional adaptation networks for semantic segmentation [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6810-6818.
											 											 | 
										
																													
																						| [54] | 
																						 
											  R. Yu, X. Chen, V. I. Morariu, L. S. Davis . The role of context selection in object detection[J]. arXiv preprint arXiv:1609.02948, 2016.
											 											 | 
										
																													
																						| [55] | 
																						 
											  S. Zagoruyko, A. Lerer, T.-Y. Lin, P. O. Pinheiro, S. Gross, S. Chintala, P. Dolla r . A multipath network for object detection[J]. arXiv preprint arXiv:1604.02135, 2016.
											 											 | 
										
																													
																						| [56] | 
																						 
											  X. Zeng, W. Ouyang, J. Yan, H. Li, T. Xiao, K. Wang, Y. Liu, Y. Zhou, B. Yang, Z. Wang , et al. Crafting gbd-net for object detection[J]. IEEE transactions on pattern analysis and machine intelligence, 40(9):2109-2123,2018.
											 											 | 
										
																													
																						| [57] | 
																						 
											  Radford A, Metz L, Chintala S . Unsupervised representation learning with deep convolutional generative adversarial networks[J]. arXiv preprint arXiv:1511.06434, 2015.
											 											 | 
										
																													
																						| [58] | 
																						 
											  Brock A, Donahue J, Simonyan K . Large scale gan training for high fidelity natural image synjournal[J]. arXiv preprint arXiv:1809.11096, 2018.
											 											 | 
										
																													
																						| [59] | 
																						 
											  Li J, Liang X, Wei Y , et al. Perceptual generative adversarial networks for small object detection [C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1222-1230.
											 											 | 
										
																													
																						| [60] | 
																						 
											  Wang X, Shrivastava A, Gupta A . A-fast-rcnn: Hard positive generation via adversary for object detection [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 2606-2615.
											 											 | 
										
																													
																						| [61] | 
																						 
											  Law H, Deng J . Cornernet: Detecting objects as paired keypoints [C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 734-750.
											 											 | 
										
																													
																						| [62] | 
																						 
											  Duan K, Bai S, Xie L , et al. Centernet: Keypoint triplets for object detection [C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 6569-6578.
											 											 |