Frontiers of Data and Computing ›› 2024, Vol. 6 ›› Issue (2): 177-193.
CSTR: 32002.14.jfdc.CN10-1649/TP.2024.02.016
doi: 10.11871/jfdc.issn.2096-742X.2024.02.016
• Technology and Application •
LI Lin, WANG Jiahua, ZHOU Chenyang, KONG Siman, SUN Jianzhi*
Received: 2023-07-10
Online: 2024-04-20
Published: 2024-04-26
LI Lin, WANG Jiahua, ZHOU Chenyang, KONG Siman, SUN Jianzhi. An Overview of Object Detection Datasets[J]. Frontiers of Data and Computing, 2024, 6(2): 177-193, https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2024.02.016.
Table 1  Common object detection datasets

Dataset | Year | Images | Classes | Characteristics |
---|---|---|---|---|
VOC2007/2012 | 2007/2012 | 9,963/11,540 | 20/20 | Large variation in object size, orientation, pose, illumination, position, and occlusion |
MS COCO | 2014 | 328,000 | 91 | Many small objects and many objects per image; most categories have numerous instances |
Caltech 101/256 | 2006/2007 | 9,146/30,607 | 101/256 | Most images contain a single object; little image variation |
LabelMe | 2008 | 187,240 | 1,000+ | Provides an online annotation tool; more open |
ImageNet | 2009 | 14M | 21,841 | Very large scale with many image categories; highly challenging |
SUN | 2010 | 131,072 | 908 | Rich scene categories; complete annotations of both scenes and objects |
YFCC100M | 2014 | 99.2M images + 800K videos | — | Multimodal dataset of images and videos at enormous scale |
VQAv1/v2 | 2015/2017 | 254,721/286,046 | — | Image datasets for visual question answering built on MS COCO |
Visual Genome | 2016 | 108,249 | 76,340 | Multimodal dataset combining region descriptions, relationships, question-answer pairs, and other modalities |
Open Images V4 | 2018 | 1.9M | 600 | Manually annotated; images mostly depict complex multi-object scenes |
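Several datasets in this table (MS COCO and the VQA datasets built on it) use the COCO annotation format: one JSON file with `images`, `annotations`, and `categories` lists, where each annotation's `bbox` is `[x, y, width, height]` in pixels. A minimal stdlib-only sketch of reading such a file and counting instances per category; the inline annotation snippet is invented for illustration:

```python
import json
from collections import Counter

# Hypothetical COCO-style annotation snippet; real files such as
# instances_train2017.json follow the same top-level layout.
coco = json.loads("""
{
  "images": [{"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 18, "bbox": [73.0, 41.0, 120.5, 96.0]},
    {"id": 11, "image_id": 1, "category_id": 18, "bbox": [210.0, 55.0, 80.0, 60.0]},
    {"id": 12, "image_id": 1, "category_id": 1,  "bbox": [5.0, 5.0, 30.0, 90.0]}
  ],
  "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}]
}
""")

# Map category ids to names, then count annotation instances per category.
id2name = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(id2name[a["category_id"]] for a in coco["annotations"])
print(counts)  # Counter({'dog': 2, 'person': 1})

# bbox is [x, y, width, height]; convert to corner form [x1, y1, x2, y2].
x, y, w, h = coco["annotations"][0]["bbox"]
print([x, y, x + w, y + h])
```

The same loop works unchanged on a full COCO annotation file loaded with `json.load`.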
Table 2  Pedestrian detection datasets

Dataset | Year | Images | Instances | Resolution | Annotations | Cities | Pedestrian density | Source | Challenges |
---|---|---|---|---|---|---|---|---|---|
Daimler-DB | 2008 | 52,910 | 60,407 | 640×480 | Bounding boxes | 1 | Low | Vehicle-mounted camera | Occlusion, large scale variation |
Caltech Pedestrian | 2009 | 250,000 | 350,000 | 640×480 | Bounding boxes, temporal correspondences | 1 | Low | Vehicle-mounted camera | Occlusion, multiple scales |
TDC | 2016 | 14,674 | 32,361 | 2048×1024 | Bounding boxes, pedestrian attributes | 1 | Low | Vehicle-mounted camera | Focuses on cyclists; very many occluded instances, including extreme occlusion |
CityPersons | 2017 | 5,000 | 35,016 | 2048×1024 | Bounding boxes, pedestrian attributes | 27 | Medium | Surveillance video, internet collection | Seasonal and scene diversity; occlusion, motion blur, dense crowds |
EuroCity | 2018 | 47,337 | 238,285 | 1920×1024 | Bounding boxes, pedestrian attributes | 31 | Medium | Vehicle-mounted camera | Diverse seasons and urban scenes; severe illumination changes, motion blur, dense crowds |
KAIST | 2018 | 95,328 pairs | 103,128 | Varies | Bounding boxes | 1 | Low | Surveillance video, vehicle-mounted camera | Multimodal, rich scenes, large illumination changes, diverse appearances |
CrowdHuman | 2018 | 24,000 | 470,000 | Varies | Body and head bounding boxes, pedestrian attributes | 40+ | High | Internet collection | Rich scenes, occlusion, extreme crowd density, diverse appearances |
Tiny Person | 2019 | 1,610 | 72,651 | Varies | Bounding boxes | — | High | Internet collection | Long distances, heavy background clutter, focuses on tiny objects, extreme crowd density |
Table 3  Face detection datasets

Dataset | Year | Images | Instances | Annotations | Cleaning/verification | Challenges |
---|---|---|---|---|---|---|
AFLW | 2011 | 21,997 | 25,993 | Bounding boxes, landmarks | Manual/manual | Large facial pose variation; strongly affected by illumination and occlusion |
MALF | 2014 | 5,250 | 11,931 | Bounding boxes, gender, deformation level, occlusion, eyeglasses, exaggerated expression, landmarks | Manual/manual | Pose variation, complex backgrounds |
CelebA | 2015 | 202,599 | 202,599 | Bounding boxes, 40 face attributes, landmarks | Manual/manual | Pose variation, cluttered backgrounds |
Wider Face | 2015 | 32,203 | 393,703 | Bounding boxes, occlusion, pose, event category | Manual/manual | Deliberately selected faces with large variation in pose, occlusion, and expression |
IIIT-CFW | 2016 | 8,927 | — | Bounding boxes, type, ethnicity, gender, pose, expression, appearance, age, beard, glasses | Manual/manual | Cartoon face recognition, little data, heavy caricature, plus the usual face detection challenges |
MegaFace | 2016 | 1.02M | — | None | Automatic/none | Heavy occlusion, unannotated |
MS1M | 2016 | 10M | — | None | — | Unprecedented scale, uncleaned |
UFDD | 2017 | 6,425 | 10,897 | Bounding boxes, weather, lens obstruction, blur, illumination change, occlusion, landmarks | Manual/manual | Focuses on rain, snow, haze, and other degraded conditions; contains many distractor images |
Wildest Faces | 2017 | 67,889 | 109,771 | Bounding boxes, scale, age, occlusion, pose | Manual/manual | Violent scenes bring wilder expressions along with large pose variation, occlusion, and blur |
IJB-C | 2018 | 31,334 images + 11,779 videos | 89,642 | Bounding boxes, gender, skin tone, occlusion, coarse pose, landmarks | Manual/manual | Multimodal; severe illumination changes, occlusion, low resolution |
VGG-Face2 | 2018 | 3.31M | — | Bounding boxes, pose, age, landmarks | Semi-automatic/manual | Face diversity, low annotation quality |
MS1MV2 | 2019 | 5.8M | — | None | Automatic/none | Low annotation quality; covers the broad range of face detection challenges |
WebFace42M | 2021 | 42M | — | Bounding boxes, pose, age, ethnicity, gender, hat, glasses, mask | Automatic/none | Largest public face detection dataset; diversity across age, region, background, and more |
Table 4  Traffic sign datasets

Dataset | Year | Images | Instances | Classes | Resolution | Annotations | Scenes | Source | Challenges |
---|---|---|---|---|---|---|---|---|---|
GTSDB | 2011 | 900 | 1,206 | 4 | 1360×800 | Type, bounding boxes | Day, dusk, urban, rural, highway | German highways | Illumination changes, occlusion, blur, shadow, haze, overcast, rain |
STS | 2011 | 4,000 | 3,488 | 7 | 1280×960 | Type, visibility status, road status, bounding boxes | Day, urban, highway | Swedish roads | Small objects, illumination, occlusion, shadow, blur, overcast, rain |
LISA | 2012 | 6,610 images, 17 videos | 7,855 | 49 | 640×480 to 1024×522 | Type, occlusion, road status, bounding boxes | Day, night, urban, rural, highway | California (US) roads | Multimodal, small objects, complex backgrounds, illumination, occlusion, shadow, blur, lens dirt, overcast |
BelgiumTSC | 2013 | 7,356 | 11,219 | 62 | 640×480 | Type, bounding boxes, 3D location | Day, highway | Belgian roads | Multiple viewpoints, illumination, occlusion, deformation, small objects |
TT100K | 2016 | 100,000 | 30,000 | 128 | 2048×2048 | Type, bounding boxes, boundary vertices | Day, night, city center, suburbs | Street views of five Chinese cities from Tencent Maps | Large variation in illumination, weather, and scale; occlusion, small objects |
CCTSDB | 2017 | 10,000 | 13,361 | 3 | 1000×350 | Type, bounding boxes | Day, urban, highway | Chinese roads | Small objects, complex backgrounds, illumination, occlusion, shadow, rain, overcast, lens dirt, haze, blur |
CURE-TSD | 2019 | 1,719,900 images, 5,733 videos | 2,206,106 | 14 | 1628×1236 | Type, bounding boxes, challenge type and level | Varied scenes | Real Belgian data and synthetic virtual data | Multimodal, overcast, rain, snow, shadow, haze, illumination, blur, lens dirt, occlusion |
Table 5  Traffic light datasets

Dataset | Year | Images | Instances | Classes | Resolution | Annotations | Source | Challenges |
---|---|---|---|---|---|---|---|---|
LaRA | 2013 | 11,179 | 9,168 | 4 | 640×480 | Bounding boxes, category | Paris, France; vehicle-mounted camera | Diverse scenes and weather; occlusion and blur |
LISA | 2016 | 43,007 images + 18 videos | 113,888 | 7 | 1280×960 | Bounding boxes, category, state | California, US; vehicle-mounted camera | Varied illumination, diverse weather, rich scenes |
DTLD | 2016 | 3,366 images + 188 videos | 230,000 | 344 | 2048×1024 | Bounding boxes, category, orientation | 11 German cities; vehicle-mounted camera | Large scale, many traffic light classes, very many small objects, diverse weather |
BSTLD | 2017 | 5,093/8,334 | 10,756/13,493 | 15/4 | 1280×720 | Bounding boxes, category, state | California, US; vehicle-mounted camera | Shadow, illumination, small objects |
BDD100K | 2020 | 100K videos, 100K images | 265,906 | 4 | 1280×720 | Bounding boxes, category, weather, scene, time of day | Crowdsourced; uploaded by drivers worldwide | Large scale, illumination changes, adverse weather such as rain and snow, diverse scenes |
Table 6  Other traffic and road scene datasets

Dataset | Year | Images | Resolution | Annotations | Task | Source | Characteristics |
---|---|---|---|---|---|---|---|
Lost and Found | 2016 | 21,000 images + 112 videos | 1920×1080 | Road obstacle category, ID | Obstacle detection | Streets, vehicle-mounted camera | Many street scenes and obstacle types; objects vary in size, distance, color, and material; challenges include irregular road contours, distant objects, varied road surface appearance, and strong illumination changes |
CULane | 2017 | 133,235 | 1640×590 | Lane annotations, contextual annotations | Lane detection | Beijing, vehicle-mounted camera | Challenges include heavy traffic, low light, roads without lane markings, shadow, glare, curves, forks, severe lane occlusion, and complex road conditions |
TuSimple lane | 2018 | 6,408 | 1280×720 | Lane annotations | Lane detection | Beijing, vehicle-mounted camera | Lanes annotated as points; glare and occlusion challenges |
LLAMAS | 2019 | 100,042 | 1280×717 | Pixel-level annotations of 2D and 3D dashed lane markings | Lane detection | Highways, vehicle-mounted camera | Unsupervised lane-marking dataset built from high-precision maps; automatically annotated and manually checked |
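The point-based lane annotations mentioned for TuSimple lane are commonly described as JSON records in which each lane is a list of x coordinates sampled at fixed image rows (`h_samples`), with -2 marking rows where the lane is absent. A small sketch under that assumption, with invented values:

```python
import json

# Hypothetical label record in the TuSimple-style layout (values invented):
# each lane is a list of x coordinates aligned with the rows in "h_samples";
# -2 means the lane does not appear at that row.
label = json.loads("""
{
  "raw_file": "clips/example/20.jpg",
  "h_samples": [240, 250, 260, 270],
  "lanes": [[-2, 632, 625, 617], [710, 715, 721, -2]]
}
""")

def lane_points(lane_xs, h_samples):
    """Pair each valid x with its sampling row, dropping -2 placeholders."""
    return [(x, y) for x, y in zip(lane_xs, h_samples) if x >= 0]

for lane in label["lanes"]:
    print(lane_points(lane, label["h_samples"]))
```

This turns each lane into the (x, y) point list that lane-detection metrics and visualizers typically consume.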
Table 7  Aerial and remote sensing datasets

Dataset | Year | Images | Instances | Avg. instances | Classes | Max resolution | Annotations | Image type | Task | Challenges |
---|---|---|---|---|---|---|---|---|---|---|
DLR 3k Munich | 2015 | 20 | 14,235 | 711 | 2 | 5616×3744 | Oriented bounding boxes, category | Satellite | Vehicle detection | Many small objects, complex backgrounds, many negative samples |
VEDAI | 2016 | 1,210 | 3,640 | 3 | 9 | 1024×1024 | Oriented bounding boxes, category | Aerial | Multi-class vehicle detection | Two modalities: color and infrared |
COWC | 2016 | 53 | 32,716 | 617 | 9 | 2048×2048 | Object center points, category | Satellite | Multi-class vehicle detection | Many negative samples, coarse annotations, inaccurate classification |
DOTAv1.0 | 2018 | 2,806 | 188,282 | 67 | 15 | 4000×4000 | Oriented bounding boxes, category | Satellite | Multi-class detection | Images from multiple sensors and platforms; many objects and small objects are the main challenges |
xView | 2018 | 1,128 | 1M | 886 | 60 | 3000×3000 | Horizontal bounding boxes, category | Satellite | Multi-class detection | Diverse complex scenes, many small instances |
VisDrone | 2018 | 288 videos (261,908 frames) + 10,209 images | 2.6M | 9 | 10 | 2000×1500 | Horizontal bounding boxes, category, occlusion, truncation ratio | Aerial | Multi-class detection | Drone images from multiple cities covering varied weather and illumination; many everyday scenes |
DIOR | 2019 | 23,463 | 192,472 | 8 | 20 | 800×800 | Horizontal bounding boxes, category | Satellite | Multi-class detection | Large variation in weather, season, imaging conditions, scale, scene, and occlusion |
DOTAv2.0 | 2021 | 11,268 | 1.8M | 159 | 18 | 29200×27620 | Oriented bounding boxes, category | Satellite | Multi-class detection | Diverse aerial scenes from varied image sources; many dense small objects with arbitrary orientation |
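The oriented bounding boxes used by DOTA-style datasets are commonly distributed as plain-text label lines: eight corner coordinates (x1 y1 ... x4 y4) followed by the category name and a difficulty flag. A sketch under that assumption (the label lines below are invented), showing how to parse a line and wrap the oriented box in the horizontal box that a standard detector expects:

```python
# Hypothetical DOTA-style label lines: eight corner coordinates of an
# oriented box, then the category and a difficulty flag (values invented).
lines = [
    "377.0 510.0 463.0 480.0 482.0 534.0 396.0 564.0 small-vehicle 0",
    "100.0 100.0 160.0 100.0 160.0 140.0 100.0 140.0 plane 1",
]

def to_axis_aligned(line):
    """Parse one label line; return category, difficulty, and the
    axis-aligned (xmin, ymin, xmax, ymax) box enclosing the oriented box."""
    parts = line.split()
    xs = [float(v) for v in parts[0:8:2]]   # x1, x2, x3, x4
    ys = [float(v) for v in parts[1:8:2]]   # y1, y2, y3, y4
    category, difficult = parts[8], int(parts[9])
    return category, difficult, (min(xs), min(ys), max(xs), max(ys))

for line in lines:
    print(to_axis_aligned(line))
```

The enclosing horizontal box loses the orientation information, so detectors that regress oriented boxes would keep the eight raw coordinates instead.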
Table 8  Text detection datasets

Dataset | Year | Images | Instances | Annotations | Languages | Text orientation | Source | Challenges |
---|---|---|---|---|---|---|---|---|
SVT | 2009 | 350 | 725 | Word bounding boxes | English | Horizontal | Google Street View images | Low resolution, blur, distortion, complex backgrounds, poor illumination |
MSRA-TD500 | 2012 | 500 | — | Text-line bounding boxes | Chinese, English | Multi-oriented | Camera photos | Long text lines, high text density, complex backgrounds |
IIIT 5k-word | 2012 | 1,120 | 5,000 | Word bounding boxes, difficulty labels | English | Horizontal | Google image search | Contains scene text and born-digital images; illumination changes, projective distortion |
ICDAR 2015 | 2015 | 1,500 | 11,886 | Word bounding boxes | English | Horizontal | Wearable device capture | Most text is out of focus; severe distortion or blur |
ICDAR 2017 | 2017 | 18,000 | — | Word bounding boxes | Multilingual | Horizontal | Phone photos and screenshots | Natural scenes; more challenging in scale, rotation, and perspective transformation |
COCO Text | 2017 | 63,686 | 173,589 | Word bounding boxes, location, legibility, category, script | Multilingual | Multi-oriented | MS COCO dataset | Large scale; images of complex everyday scenes |
Total-Text | 2017 | 1,555 | 9,330 | Word polygon bounding boxes | English | Multi-oriented | Camera photos | Focuses on curved text (nearly half the instances); occlusion and complex backgrounds |
MTWI | 2018 | 20,000 | — | Text-line bounding boxes, transcriptions | Multilingual | Multi-oriented | Taobao web pages | Focuses on web text; complex layouts, watermarks, small text, tightly stacked text, complex-shaped text |
CTW | 2019 | 32,285 | 1,018,402 | Text-line bounding boxes, word bounding boxes, category, occlusion, complex background, distortion, raised characters, art fonts, handwriting | Chinese | Multi-oriented | TT100K, Tencent Street View | Scene text from dozens of Chinese cities with rich scene diversity; occlusion, complex backgrounds, distorted text, raised text, art fonts, handwriting |
ICDAR2019-MLT | 2019 | 20,000 | 80,000 | Word bounding boxes | Multilingual | Multi-oriented | Phone photos and internet collection | Natural scenes, multilingual |
[1] ZOU Z X, SHI Z W, GUO Y H, et al. Object Detection in 20 Years: A Survey[J]. Proceedings of the IEEE, 2023, 111(3): 257-276. doi: 10.1109/JPROC.2023.3238524
[2] PANDE B, PADAMWAR K, BHATTACHARYA S, et al. A Review of Image Annotation Tools for Object Detection[C]. 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC), 2022: 976-982.
[3] AGRAWAL A, LU J, ANTOL S, et al. VQA: Visual Question Answering[J]. International Journal of Computer Vision, 2017, 123(1): 4-31. doi: 10.1007/s11263-016-0966-6
[4] GOYAL Y, KHOT T, SUMMERS-STAY D, et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering[J]. International Journal of Computer Vision, 2019, 127(4): 398-414. doi: 10.1007/s11263-018-1116-0
[5] EVERINGHAM M, GOOL L, WINN J, et al. The Pascal Visual Object Classes Challenge: A Retrospective[J]. International Journal of Computer Vision, 2014, 111(1): 98-136. doi: 10.1007/s11263-014-0733-5
[6] LIN T Y, MAIRE M, HAYS J, et al. Microsoft COCO: Common Objects in Context[C]. European Conference on Computer Vision (ECCV), 2014: 740-755.
[7] LI F F, FERGUS R, PERONA P. One-shot learning of object categories[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(4): 594-611. doi: 10.1109/TPAMI.2006.79
[8] GRIFFIN G, HOLUB A, PERONA P. Caltech-256 Object Category Dataset[R]. California Institute of Technology, 2007.
[9] RUSSELL B C, TORRALBA A, MURPHY K P, et al. LabelMe: A Database and Web-Based Tool for Image Annotation[J]. International Journal of Computer Vision, 2008, 77(1-3): 157-173. doi: 10.1007/s11263-007-0090-8
[10] DENG J, DONG W, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009: 248-255.
[11] XIAO J X, EHINGER K, HAYS J, et al. SUN Database: Exploring a Large Collection of Scene Categories[J]. International Journal of Computer Vision, 2014, 119(1): 3-22. doi: 10.1007/s11263-014-0748-y
[12] KRISHNA R, ZHU Y, GROTH O, et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations[J]. International Journal of Computer Vision, 2016, 123(1): 32-73. doi: 10.1007/s11263-016-0981-7
[13] THOMEE B, SHAMMA D, FRIEDLAND G, et al. YFCC100M: The New Data in Multimedia Research[J]. Communications of the ACM, 2016, 59(2): 64-73.
[14] KUZNETSOVA A, ROM H, ALLDRIN N, et al. The Open Images Dataset V4[J]. International Journal of Computer Vision, 2020, 128: 1956-1981. doi: 10.1007/s11263-020-01316-z
[15] LUO Y, ZHANG C Y, TIAN Y H, et al. Survey of pedestrian detection methods based on deep learning[J]. Journal of Image and Graphics, 2022, 27(7): 2094-2111. (in Chinese)
[16] LUO Y, ZHANG C, ZHAO M, et al. Where, What, Whether: Multi-modal Learning Meets Pedestrian Detection[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[17] LI X Y, FU H T, NIU W T, et al. Multimodal pedestrian detection algorithm based on deep learning[J]. Journal of Xi'an Jiaotong University, 2022, 56(10): 61-70. (in Chinese)
[18] ENZWEILER M, GAVRILA D. Monocular Pedestrian Detection: Survey and Experiments[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(12): 2179-2195. doi: 10.1109/TPAMI.2008.260
[19] DOLLÁR P, WOJEK C, SCHIELE B, et al. Pedestrian Detection: An Evaluation of the State of the Art[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(4): 743-761. doi: 10.1109/TPAMI.2011.155
[20] LI X F, FLOHR F, YUE Y, et al. A new benchmark for vision-based cyclist detection[C]. 2016 IEEE Intelligent Vehicles Symposium, 2016: 1028-1033.
[21] CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes Dataset for Semantic Urban Scene Understanding[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 3213-3223.
[22] BRAUN M, KREBS S, FLOHR F, et al. The EuroCity Persons Dataset: A Novel Benchmark for Object Detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1844-1861.
[23] CHOI Y, KIM N, HWANG S, et al. KAIST Multi-Spectral Day/Night Data Set for Autonomous and Assisted Driving[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(3): 934-948. doi: 10.1109/TITS.2018.2791533
[24] SHAO S, ZHAO Z, LI B, et al. CrowdHuman: A Benchmark for Detecting Human in a Crowd[J]. arXiv preprint arXiv:1805.00123, 2018.
[25] YU X H, GONG Y Q, JIANG N, et al. Scale Match for Tiny Person Detection[C]. 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2020: 1246-1254.
[26] LI Z C, LI H C, HU W S, et al. Faster R-CNN masked face detection model with multi-scale attention learning[J]. Journal of Southwest Jiaotong University, 2021, 56(5): 1002-1010. (in Chinese)
[27] KÖSTINGER M, WOHLHART P, ROTH P, et al. Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization[C]. 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011: 2144-2151.
[28] YANG B, YAN J J, LEI Z, et al. Fine-grained evaluation on face detection in the wild[C]. 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, 2015: 1-7.
[29] LIU Z, LUO P, WANG X, et al. Deep Learning Face Attributes in the Wild[C]. 2015 IEEE International Conference on Computer Vision (ICCV), 2015: 3730-3738.
[30] YANG S, LUO P, LOY C C, et al. WIDER FACE: A Face Detection Benchmark[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 5525-5533.
[31] MISHRA A, RAI S N, MISHRA A, et al. IIIT-CFW: A Benchmark Database of Cartoon Faces in the Wild[C]. European Conference on Computer Vision (ECCV), 2016: 35-47.
[32] KEMELMACHER-SHLIZERMAN I, SEITZ S M, MILLER D, et al. The MegaFace Benchmark: 1 Million Faces for Recognition at Scale[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 4873-4882.
[33] GUO Y D, ZHANG L, HU Y X, et al. MS-Celeb-1M: Challenge of Recognizing One Million Celebrities in the Real World[J]. Electronic Imaging, 2016, 2016(11): 1-6.
[34] WANG L J, GAN Y K, LUO R X, et al. Unconstrained Face Detection via Anchor-based Region Proposal Network[C]. Proceedings of the 25th ACM International Conference on Multimedia, 2017: 1181-1189.
[35] YUCEL M K, BILGE Y C, OGUZ O, et al. Wildest Faces: Face Detection and Recognition in Violent Settings[J]. arXiv preprint arXiv:1805.07566, 2018.
[36] MAZE B, ADAMS J C, DUNCAN J A, et al. IARPA Janus Benchmark-C: Face Dataset and Protocol[C]. 2018 International Conference on Biometrics (ICB), 2018: 158-165.
[37] CAO Q, SHEN L, XIE W D, et al. VGGFace2: A Dataset for Recognising Faces across Pose and Age[C]. 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition, 2018: 67-74.
[38] DENG J, GUO J, ZAFEIRIOU S. ArcFace: Additive Angular Margin Loss for Deep Face Recognition[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 4685-4694.
[39] ZHU Z, HUANG G, DENG J, et al. WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021: 10487-10497.
[40] SERMANET P, LECUN Y. Traffic sign recognition with multi-scale Convolutional Networks[C]. The 2011 International Joint Conference on Neural Networks, 2011: 2809-2813.
[41] HU J P, WANG H S, DAI X B, et al. Real-time detection algorithm of small-target traffic signs based on improved YOLOv5[J]. Computer Engineering and Applications, 2023, 59(2): 185-193. (in Chinese) doi: 10.3778/j.issn.1002-8331.2206-0503
[42] HOUBEN S, STALLKAMP J, SALMEN J, et al. Detection of traffic signs in real-world images: The German traffic sign detection benchmark[C]. The 2013 International Joint Conference on Neural Networks (IJCNN), 2013: 1-8.
[43] LARSSON F, FELSBERG M. Using Fourier Descriptors and Spatial Models for Traffic Sign Recognition[C]. Scandinavian Conference on Image Analysis, 2011: 238-249.
[44] MØGELMOSE A, TRIVEDI M M, MOESLUND T B. Vision-Based Traffic Sign Detection and Analysis for Intelligent Driver Assistance Systems: Perspectives and Survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2012, 13(4): 1484-1497. doi: 10.1109/TITS.2012.2209421
[45] TIMOFTE R, ZIMMERMANN K, GOOL L. Multi-view traffic sign detection, recognition, and 3D localization[J]. Machine Vision and Applications, 2014, 25(3): 633-647. doi: 10.1007/s00138-011-0391-3
[46] ZHU Z, LIANG D, ZHANG S H, et al. Traffic-Sign Detection and Classification in the Wild[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 2110-2118.
[47] ZHANG J M, HUANG M T, JIN X K, et al. A Real-Time Chinese Traffic Sign Detection Algorithm Based on Modified YOLOv2[J]. Algorithms, 2017, 10(4): 127. doi: 10.3390/a10040127
[48] TEMEL D, ALSHAWI T A, CHEN M, et al. Challenging Environments for Traffic Sign Detection: Reliability Assessment under Inclement Conditions[J]. arXiv preprint arXiv:1902.06857, 2019.
[49] QIAN W, WANG G Z, LI G P. Robust real-time traffic light detection algorithm based on improved YOLOv5[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(1): 231-241. (in Chinese) doi: 10.3778/j.issn.1673-9418.2105033
[50] DE CHARETTE R, NASHASHIBI F. Real time visual traffic lights recognition based on Spot Light Detection and adaptive traffic lights templates[C]. 2009 IEEE Intelligent Vehicles Symposium, 2009: 358-363.
[51] JENSEN M B, PHILIPSEN M P, MØGELMOSE A, et al. Vision for Looking at Traffic Lights: Issues, Survey, and Perspectives[J]. IEEE Transactions on Intelligent Transportation Systems, 2016, 17(7): 1800-1815. doi: 10.1109/TITS.2015.2509509
[52] FREGIN A, MÜLLER J, KREBEL U, et al. The DriveU Traffic Light Dataset: Introduction and Comparison with Existing Datasets[C]. 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018: 3376-3383.
[53] BEHRENDT K, NOVAK L, BOTROS R. A deep learning approach to traffic lights: Detection, tracking, and classification[C]. 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017: 1370-1377.
[54] YU F, CHEN H, WANG X, et al. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020: 2633-2642.
[55] PINGGERA P, RAMOS S, GEHRIG S K, et al. Lost and Found: detecting small road hazards for self-driving vehicles[C]. 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016: 1099-1106.
[56] PAN X G, SHI J P, LUO P, et al. Spatial As Deep: Spatial CNN for Traffic Scene Understanding[C]. AAAI Conference on Artificial Intelligence, 2018: 7276-7283.
[57] YOO S, LEE H, MYEONG H, et al. End-to-End Lane Marker Detection via Row-wise Classification[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020: 4335-4343.
[58] BEHRENDT K, SOUSSAN R. Unsupervised Labeled Lane Markers Using Maps[C]. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019: 832-839.
[59] ZHAO W Q, KONG Z X, ZHOU Z D, et al. Aerial remote sensing object detection with enhanced small-target features[J]. Journal of Image and Graphics, 2021, 26(3): 644-653. (in Chinese)
[60] LIU K, MÁTTYUS G. Fast Multiclass Vehicle Detection on Aerial Images[J]. IEEE Geoscience and Remote Sensing Letters, 2015, 12(9): 1938-1942. doi: 10.1109/LGRS.2015.2439517
[61] RAZAKARIVONY S, JURIE F. Vehicle detection in aerial imagery: A small target detection benchmark[J]. Journal of Visual Communication and Image Representation, 2016, 34: 187-203. doi: 10.1016/j.jvcir.2015.11.002
[62] MUNDHENK T N, KONJEVOD G, SAKLA W A, et al. A Large Contextual Dataset for Classification, Detection and Counting of Cars with Deep Learning[C]. European Conference on Computer Vision (ECCV), 2016: 785-800.
[63] XIA G S, BAI X, DING J, et al. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 3974-3983.
[64] LAM D, KUZMA R, MCGEE K, et al. xView: Objects in Context in Overhead Imagery[J]. arXiv preprint arXiv:1802.07856, 2018.
[65] ZHU P F, WEN L Y, BIAN X, et al. Vision Meets Drones: A Challenge[J]. arXiv preprint arXiv:1804.07437, 2018.
[66] LI K, WAN G, CHENG G, et al. Object Detection in Optical Remote Sensing Images: A Survey and A New Benchmark[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2019, 159: 296-307. doi: 10.1016/j.isprsjprs.2019.11.023
[67] DING J, XUE N, XIA G, et al. Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(11): 7778-7796. doi: 10.1109/TPAMI.2021.3117983
[68] WANG K, BELONGIE S J. Word Spotting in the Wild[C]. European Conference on Computer Vision (ECCV), 2010: 591-604.
[69] YAO C, BAI X, LIU W Y, et al. Detecting texts of arbitrary orientations in natural images[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012: 1083-1090.
[70] MISHRA A, KARTEEK A, JAWAHAR C. Scene Text Recognition using Higher Order Language Priors[C]. British Machine Vision Conference (BMVC), 2012.
[71] KARATZAS D, BIGORDA L G, NICOLAOU A, et al. ICDAR 2015 competition on Robust Reading[C]. 2015 13th International Conference on Document Analysis and Recognition (ICDAR), 2015: 1156-1160.
[72] NAYEF N, YIN F, BIZID I, et al. ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification - RRC-MLT[C]. 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017: 1454-1459.
[73] VEIT A, MATERA T, NEUMANN L, et al. COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images[J]. arXiv preprint arXiv:1601.07140, 2016.
[74] CHNG C, CHAN C. Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition[C]. 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017: 935-942.
[75] HE M, LIU Y, YANG Z, et al. ICPR2018 Contest on Robust Reading for Multi-Type Web Images[C]. 2018 24th International Conference on Pattern Recognition (ICPR), 2018: 7-12.
[76] YUAN T, ZHU Z, XU K, et al. A Large Chinese Text Dataset in the Wild[J]. Journal of Computer Science and Technology, 2019, 34(3): 509-521. doi: 10.1007/s11390-019-1923-y
[77] NAYEF N, PATEL Y, BUSTA M, et al. ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition — RRC-MLT-2019[C]. 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019: 1582-1587.
[78] CUBUK E D, ZOPH B, MANÉ D, et al. AutoAugment: Learning Augmentation Strategies From Data[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 113-123.
[79] SHORTEN C, KHOSHGOFTAAR T M. A survey on Image Data Augmentation for Deep Learning[J]. Journal of Big Data, 2019, 6(1): 1-48. doi: 10.1186/s40537-018-0162-3
[80] ZHANG H Y, CISSÉ M, DAUPHIN Y, et al. mixup: Beyond Empirical Risk Minimization[J]. arXiv preprint arXiv:1710.09412, 2017.
[81] BERTHELOT D, CARLINI N, GOODFELLOW I J, et al. MixMatch: A Holistic Approach to Semi-Supervised Learning[J]. arXiv preprint arXiv:1905.02249, 2019.
[82] LAINE S, AILA T. Temporal Ensembling for Semi-Supervised Learning[J]. arXiv preprint arXiv:1610.02242, 2016.
[83] DOERSCH C, GUPTA A K, EFROS A A. Unsupervised Visual Representation Learning by Context Prediction[C]. 2015 IEEE International Conference on Computer Vision (ICCV), 2015: 1422-1430.
[84] QI C, LIU W, WU C X, et al. Frustum PointNets for 3D Object Detection from RGB-D Data[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 918-927.
[85] SHI S S, WANG X G, LI H S. PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 770-779.
[86] ZHOU Y, TUZEL O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 4490-4499.
[87] FIEDLER N, BESTMANN M, HENDRICH N. ImageTagger: An Open Source Online Platform for Collaborative Image Labeling[M]. RoboCup 2018: Robot World Cup XXII, Lecture Notes in Computer Science, 2019, 11374: 162-169.
[88] LAHTINEN T, TURTIAINEN H, COSTIN A I. BRIMA: Low-Overhead Browser-Only Image Annotation Tool[C]. 2021 IEEE International Conference on Image Processing (ICIP), 2021: 2633-2637.