View thesis information

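Chinese title:	智能汽车的多相机行人检测与跟踪方法研究 (Research on Multi-Camera Pedestrian Detection and Tracking Methods for Intelligent Vehicles)
Name:	石志奇 (Shi Zhiqi)
Student ID:	1049732004322
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	085206
Discipline name:	Engineering - Engineering - Power Engineering
Student type:	Master's
University:	武汉理工大学 (Wuhan University of Technology)
School:	School of Automotive Engineering
Major:	Energy and Power
Research direction:	Intelligent connected vehicles
First advisor name:	邹斌 (Zou Bin)
First advisor school:	School of Automotive Engineering
Completion date:	2023-03-27
Defense date:	2023-05-21
Chinese keywords:	multi-camera ; multi-task ; intelligent vehicle ; association matching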
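Chinese abstract:	︿Intelligent vehicles are a major development trend. They operate in complex traffic environments with diverse participants and unpredictable emergencies, so handling such environments reliably is a necessary research topic. For an intelligent vehicle, perceiving and understanding the world is the most basic capability, and all of its decisions and actions depend on it. Visual perception is the cornerstone of that capability, and object detection and tracking are its key components. Thanks to increasingly abundant hardware, intelligent vehicles now carry multiple cameras, but coordinating those cameras to handle scenes with complex viewpoints has become a new problem. This thesis therefore takes detection and tracking with cooperative multi-camera matching as its research goal and the complex, highly variable pedestrian targets in traffic environments as its research object, and proceeds step by step from the underlying detection and tracking tasks. The main research contents are:
(1) To address the inconsistency between the object detection and object tracking tasks, this thesis builds a backbone feature extraction network based on deep feature aggregation that improves model performance while maintaining overall efficiency. First, the characteristics of the two tasks are analyzed theoretically: detection needs fused multi-scale features, whereas tracking relies more on low-dimensional appearance features. Second, starting from the balance between backbone depth and width, residual structures, a deep aggregation architecture, and a feature-grouping attention structure are combined to build the backbone. Finally, a simple detection module is tested on a complex crowd dataset to verify the effectiveness of the backbone; this training and inference also provide the model basis for the subsequent detection and tracking stages.
(2) To address the difficulty of balancing tasks in multi-task learning, a multi-objective optimization method is used to accommodate the learning of the multiple features this thesis requires, so that the network model learns effectively. First, existing tracking methods are analyzed; trackers based on joint multi-task learning perform well in both speed and accuracy. Second, considering the difference between the re-identification feature representation needed for subsequent tracking and the anchor-box feature representation of mainstream detectors, targets are represented as keypoints, and a multi-objective optimization loss is applied after the detection branch and the feature branch for joint training. Finally, to learn more generalizable features, classic datasets from the re-identification, detection, and tracking fields are combined for model training and inference; the target ID classification metrics show that the network learns an effective feature representation.
(3) To address viewpoint differences between cameras and complex environmental factors, a dynamic graph is built to link the local tracking trajectories of multiple cameras so that the cameras cooperate better. First, existing association matching methods are analyzed; methods based on graph-node representations are relatively efficient. Second, considering the differences between nodes, a dynamic graph and an attention mechanism are introduced to model node features, yielding a prediction model. Finally, a re-identification dataset is used specifically to verify the ability to recognize target IDs across viewpoints, laying the groundwork for handover between cameras in the subsequent stage.
(4) To verify the complete workflow in practice, validation is carried out on a dataset of real traffic scenes. First, to meet this thesis's need for multi-view 2D image data, the qualifying portion of the nuScenes 3D object tracking dataset is used. Finally, the test results show that the proposed method handles target occlusion and target handover well, successfully maintains information such as target trajectory IDs, and improves performance in both single-camera local tracking and multi-camera global tracking.﹀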
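The cross-camera linking step in item (3) of the abstract can be illustrated with a much simpler stand-in. The sketch below is not the thesis's dynamic-graph and attention model: it assumes each local track is summarized by a single appearance embedding and matches tracks between two cameras greedily by cosine similarity. The function names, the `threshold` value, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def associate(tracks_a, tracks_b, threshold=0.8):
    """Greedy one-to-one matching of local track ids across two cameras.

    tracks_a / tracks_b map a local track id to its appearance embedding.
    Returns {id_in_a: id_in_b} for pairs whose similarity >= threshold.
    """
    # Score every cross-camera pair, best pairs first.
    pairs = sorted(
        ((cosine(fa, fb), ia, ib)
         for ia, fa in tracks_a.items()
         for ib, fb in tracks_b.items()),
        reverse=True,
    )
    matches, used_a, used_b = {}, set(), set()
    for score, ia, ib in pairs:
        if score < threshold:
            break  # remaining pairs are even less similar
        if ia in used_a or ib in used_b:
            continue  # each track may be linked at most once
        matches[ia] = ib
        used_a.add(ia)
        used_b.add(ib)
    return matches


# Toy embeddings (assumed values, for illustration only).
front = {1: [1.0, 0.0, 0.1], 2: [0.0, 1.0, 0.0]}
side = {7: [0.9, 0.1, 0.1], 8: [0.1, 0.1, 1.0]}
print(associate(front, side))
```

Here only front-camera track 1 and side-camera track 7 are similar enough to be linked; the other tracks keep their local ids. A learned graph model would replace the fixed cosine score with predicted edge weights.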
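Chinese Library Classification number:	TK05
Barcode:	002000070906
Collection number:	TD10058241
Collection location:	403
Notes:	403 - West Campus Branch Library doctoral and master's thesis collection; 203 - Yujiatou Branch Library doctoral and master's thesis collection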
