Application of Improved Lightweight YOLOv5 Algorithm in Pedestrian Detection

doi:10.11871/jfdc.issn.2096-742X.2023.06.015

Frontiers of Data and Computing, 2023, Vol. 5, Issue 6: 161-172.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.06.015

WANG Ziyuan, WANG Guozhong*

School of Electrical and Electronic Engineering, Shanghai University of Engineering Science, Shanghai 201620, China

Received: 2022-07-11 Online: 2023-12-20 Published: 2023-12-25
Corresponding author: *WANG Guozhong (E-mail: wanggz@sues.edu.cn)
About the authors:
WANG Ziyuan is a master's student in control engineering at the School of Electrical and Electronic Engineering, Shanghai University of Engineering Science. His main research interests are computer vision and deep learning. In this paper, he was responsible for writing the first draft and for the experimental validation. E-mail: wangziyuansues@qq.com
WANG Guozhong, Ph.D., is a professor at the School of Electrical and Electronic Engineering, Shanghai University of Engineering Science. His main research interests are video encoding and decoding, image processing, and machine learning. In this paper, he was responsible for formulating the framework of the paper and suggesting revisions. E-mail: wanggz@sues.edu.cn
Funding:
National Key Research and Development Program of China, "Broadband Communication and New Networks" (2019YFB1802702)

Abstract

[Objective] Current pedestrian detection algorithms suffer from complex models, low detection accuracy, and slow detection speed. To address these problems, this paper improves the YOLOv5 algorithm so that it is better suited to pedestrian detection. [Methods] First, the standard convolutions in the YOLOv5 backbone network are replaced with depthwise separable convolutions, which reduces the computation and the number of parameters and improves detection efficiency. Then, channel attention and spatial attention mechanisms are added to the feature fusion part of the backbone network, so that the network focuses on the channel and location information of pedestrians in the image. Finally, the EIOU loss function is used to optimize model training, and the K-means++ clustering algorithm is used to generate the prior (anchor) boxes. [Results] The improved model was compared with other algorithms on the INRIA pedestrian detection dataset. The results show that it reaches a detection accuracy of 89%, 7.6 percentage points higher than the original model, at a detection speed of 106 frames per second. [Conclusions] The improved algorithm raises both the speed and the accuracy of pedestrian detection, and the model is small, which makes it easy to deploy for real-time detection.

Key words: pedestrian detection, deep learning, YOLOv5, depthwise separable convolution, attention mechanism
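
The abstract names the EIOU loss without spelling it out. As a rough illustration, the sketch below follows the commonly published EIoU form (1 - IoU plus normalized center-distance, width, and height penalties) for a single pair of boxes. It is not the authors' training code; the function name `eiou_loss` and the corner-coordinate box format are assumptions made for this example.

```python
def eiou_loss(box, gt):
    """EIoU loss for two boxes given as (x1, y1, x2, y2), with x1 < x2 and y1 < y2.

    Illustrative only: the box format and function name are assumptions.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    iou = inter / (w * h + gw * gh - inter)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)

    # Squared distance between box centers, normalized by the squared
    # diagonal of the enclosing box.
    dx = (x1 + x2) / 2 - (gx1 + gx2) / 2
    dy = (y1 + y2) / 2 - (gy1 + gy2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch)

    # Width and height differences, normalized by the enclosing box sides.
    width_term = (w - gw) ** 2 / cw ** 2
    height_term = (h - gh) ** 2 / ch ** 2

    return 1 - iou + center_term + width_term + height_term


# A predicted box shifted 5 px to the right of an equally sized ground truth.
loss = eiou_loss((0, 0, 10, 20), (5, 0, 15, 20))
print(round(loss, 4))
```

Unlike a plain IoU loss, the width and height terms penalize shape mismatch directly, so the loss still gives a useful signal when two boxes overlap well but differ in aspect ratio.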

WANG Ziyuan, WANG Guozhong. Application of Improved Lightweight YOLOv5 Algorithm in Pedestrian Detection[J]. Frontiers of Data and Computing, 2023, 5(6): 161-172, https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2023.06.015.

Figures and tables (21): Table 1, Figures 1-12, Tables 2-3, Figure 13, Tables 4-5, Figures 14-15, Table 6.

Algorithms	Input size	Test dataset	Speed/(frame·s⁻¹)	mAP(0.5)/%
Fast RCNN	600×1000	VOC2007	0.5	70.2
Faster RCNN	600×1000	VOC2007	5	73.5
SSD	512×512	VOC2007	18	76.6
YOLOv1	448×448	VOC2007	46	62.8
YOLOv2	544×544	MS COCO	40	43.6
YOLOv3	608×608	MS COCO	24	58.7
YOLOv4	608×608	MS COCO	58	66.9
YOLOv5s	608×608	MS COCO	69	57.8

Environment	Configuration
Operating system	Ubuntu 18.04
CPU	AMD Ryzen 7 5800H, 3.20 GHz
RAM/GB	16
GPU	NVIDIA GeForce RTX 3060
Programming environment	Python 3.7
Deep learning framework	PyTorch
CUDA	11.3
cuDNN	8.2.1

Actual class	Predicted positive	Predicted negative
True	TP	FN
False	FP	TN
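
The four cells of this confusion matrix are what precision and recall are computed from: precision = TP / (TP + FP) and recall = TP / (TP + FN). The short sketch below shows the arithmetic. The counts used are hypothetical and do not come from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical counts: 80 pedestrians detected correctly, 10 false alarms,
# 20 pedestrians missed.
p, r = precision_recall(tp=80, fp=10, fn=20)
print(f"precision={p:.3f}, recall={r:.3f}")
```

TN does not appear in either formula, which is convenient for detection, where the number of true negatives (background regions correctly left undetected) is not well defined.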

Model	Parameters (M)	FLOPs (B)	Model size (MB)
YOLOv5s	7.2	16.8	14.8
YOLOv5s-D	3.8	7.3	7.8
YOLOv5s-DA	3.8	7.5	8.0
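
The drop in parameters between YOLOv5s and the depthwise variants is in line with how depthwise separable convolution is counted: a k×k depthwise convolution per input channel followed by a 1×1 pointwise convolution. The sketch below compares the weight counts for a single layer. The layer sizes (128 input channels, 256 output channels, 3×3 kernel) are assumed for illustration, biases and normalization layers are ignored, and this is not the authors' network definition.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dw_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


standard = conv_params(128, 256, 3)
separable = dw_separable_params(128, 256, 3)
print(standard, separable, round(separable / standard, 3))
```

The ratio equals 1/c_out + 1/k², so for 3×3 kernels and wide layers the separable form needs roughly one ninth of the weights. The whole-model reduction in the table is smaller than that because only part of the network is replaced.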

Model	Precision/%	Recall/%	AP(0.5)/%	FPS
Faster RCNN	76.4	70.8	73.4	16.2
SAF-RCNN	78.2	72.8	76.2	18.3
RepLoss	82.0	73.9	79.1	10.7
SSD	86.3	76.0	77.8	60.1
ALFNet	87.5	70.6	78.3	46.0
CSP	89.0	75.8	79.5	58.4
YOLOv5s	88.8	80.2	81.6	83.2
YOLOv5s-D	87.6	76.7	80.7	90.7
YOLOv5s-DA	89.2	75.2	87.3	100.3
YOLOv5s-DAE	91.5	72.1	89.2	106.7
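
The headline figures in the abstract can be cross-checked against the YOLOv5s and YOLOv5s-DAE rows above. The sketch below does the arithmetic. Reading the abstract's "detection accuracy" as the AP(0.5) column is an inference from the matching numbers, not something the page states.

```python
# Values copied from the YOLOv5s and YOLOv5s-DAE rows of the table above.
baseline = {"ap50": 81.6, "fps": 83.2}   # YOLOv5s
improved = {"ap50": 89.2, "fps": 106.7}  # YOLOv5s-DAE

ap_gain = round(improved["ap50"] - baseline["ap50"], 1)  # percentage points
speedup = improved["fps"] / baseline["fps"]

print(f"AP(0.5) gain: {ap_gain} points, speed-up: {speedup:.2f}x")
```

Note that recall falls from 80.2% to 72.1% between the same two rows, so the AP gain comes with a trade-off that the abstract does not mention.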

Pixels	Small (<40)		Medium (<120)				Large (>120)
Height	29	34	52	62	89	103		112	137	176
Width	8	12	17	23	40	48		100	96	120
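
Prior-box sizes like those listed above are obtained by clustering the widths and heights of the labeled boxes. The sketch below shows one way this can be done with K-means++ seeding. It rests on several assumptions: the distance is taken as 1 - IoU between (width, height) pairs, which is the usual choice for YOLO anchors but is not confirmed by this page; the boxes are synthetic rather than taken from the INRIA dataset; and the names `wh_iou` and `kmeanspp_anchors` are hypothetical.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given as (width, height), aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeanspp_anchors(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchors, sorted by area."""
    rng = random.Random(seed)

    # K-means++ seeding: the first center is drawn uniformly, each later one
    # with probability proportional to the squared distance (1 - IoU) to the
    # nearest center chosen so far.
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        d2 = [min(1.0 - wh_iou(b, c) for c in centers) ** 2 for b in boxes]
        centers.append(rng.choices(boxes, weights=d2, k=1)[0])

    # Standard k-means refinement using the same IoU-based distance.
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for b in boxes:
            nearest = max(range(k), key=lambda i: wh_iou(b, centers[i]))
            groups[nearest].append(b)
        updated = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return sorted(centers, key=lambda c: c[0] * c[1])


# Synthetic pedestrian-like boxes: heights of 20-200 px, width 30-50% of height.
data_rng = random.Random(1)
boxes = []
for _ in range(300):
    height = data_rng.uniform(20, 200)
    boxes.append((height * data_rng.uniform(0.3, 0.5), height))

anchors = kmeanspp_anchors(boxes, k=9)
for w, h in anchors:
    print(f"width={w:.0f}, height={h:.0f}")
```

Compared with choosing all initial centers uniformly at random, the seeding step spreads the starting centers across small, medium, and large boxes, which tends to make the final anchors less sensitive to initialization.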