Algorithm Research of Siamese Neural Network in Target Tracking
Minghan Li
Computer Department, North China University of Technology, Beijing
Received: Jun. 23rd, 2022; accepted: Aug. 5th, 2022; published: Aug. 15th, 2022
ABSTRACT
Target tracking estimates and localizes the position of a given target in order to follow it continuously across frames. With advances in hardware and neural networks, tracking algorithms now far exceed traditional methods in both accuracy and speed, and trackers based on Siamese neural networks are one of the main directions of current research. This paper introduces the Siamese network structure and related algorithms. It first presents the principle of the Siamese structure, then describes existing algorithms organized by their direction of improvement, then introduces the classical datasets, and finally summarizes the development of existing algorithms and offers an outlook.
The SINT [18] algorithm was the first to apply the Siamese structure to object tracking: it learns a matching function that returns the patch in subsequent frames most similar to the target, enabling fairly accurate localization. In the same year, Bertinetto et al. [19] applied the idea of similarity measurement to the tracking stage and proposed the SiamFC algorithm, whose Siamese architecture is shown in Figure 2. The target in the first frame serves as the template image and each subsequent frame as the search image; after scaling and padding, both are fed into the same backbone network to produce their respective feature maps. The template feature map is then used as a convolution kernel that slides over the search feature map in a cross-correlation, producing a confidence map of similarity scores; the sub-window with the highest score gives the predicted target location.
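The matching step described above can be illustrated with a minimal NumPy sketch. This is an illustration of the cross-correlation idea only, not the authors' implementation: the channel count, spatial sizes, and the planted-target setup are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def cross_correlate(template_feat, search_feat):
    """SiamFC-style matching: slide the template feature map over the
    search feature map and record the inner product at each offset."""
    c, th, tw = template_feat.shape
    _, sh, sw = search_feat.shape
    response = np.zeros((sh - th + 1, sw - tw + 1))
    for y in range(response.shape[0]):
        for x in range(response.shape[1]):
            window = search_feat[:, y:y + th, x:x + tw]
            response[y, x] = np.sum(window * template_feat)  # similarity score
    return response

# Toy features: plant the template inside the search region at offset (3, 2),
# so the response map should peak there.
rng = np.random.default_rng(0)
template = rng.standard_normal((4, 3, 3))      # C x h x w template features
search = rng.standard_normal((4, 8, 8)) * 0.1  # C x H x W search features
search[:, 3:6, 2:5] = template
response = cross_correlate(template, search)
peak = np.unravel_index(response.argmax(), response.shape)
print(tuple(int(v) for v in peak))  # highest-scoring sub-window = predicted location
```

In the real tracker this correlation is computed in one batched convolution on GPU rather than with explicit Python loops, but the score map and argmax-based localization are the same.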
In object tracking, consecutive frames carry rich temporal information, a cue most algorithms ignore. Wang et al. [41] introduced the Transformer into the tracking framework as an auxiliary component: without modifying the template-matching method, they separate the encoder and decoder into two parallel branches that bridge isolated frames in the video stream and propagate temporal context between frames. TransT [42] draws on the Transformer to introduce an ego-context augment module (ECA) and a cross-feature augment module (CFA) that perform feature fusion. Zhao et al. [43] improved the Ocean algorithm by using a Transformer to carry out the cross-correlation operation of the Siamese network, thereby capturing global and rich contextual dependencies.
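The attention-based fusion these trackers rely on can be sketched with generic single-head cross-attention between flattened template and search features. This is a hedged illustration of the mechanism, not TransT's actual ECA/CFA modules: the token counts, feature dimension, and single-head simplification are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feat, kv_feat):
    """Single-head cross-attention: tokens from one branch (e.g. the search
    region) attend to tokens of the other (e.g. the template), replacing the
    fixed sliding-window correlation with learned, global interactions."""
    d = query_feat.shape[-1]
    scores = query_feat @ kv_feat.T / np.sqrt(d)  # (Nq, Nkv) similarities
    weights = softmax(scores, axis=-1)            # each query row sums to 1
    return weights @ kv_feat                      # fused features, (Nq, d)

rng = np.random.default_rng(1)
search_tokens = rng.standard_normal((16, 8))    # flattened search feature map
template_tokens = rng.standard_normal((9, 8))   # flattened template feature map
fused = cross_attention(search_tokens, template_tokens)
print(fused.shape)
```

Unlike cross-correlation, every search token here aggregates information from the whole template, which is the "global context" advantage the Transformer-based trackers cite; real modules add learned projections, multiple heads, and positional encodings on top of this core.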
Many Transformer-based algorithms trailed CNNs in performance until the advent of Swin-Transformer [44]. Lin et al. [45] replaced the feature-extraction network with Swin-Transformer outright and proposed SwinTrack, a tracker built entirely on attention: on top of the Siamese structure, it uses the Transformer directly for both feature extraction and feature fusion, and it leads on many challenging benchmarks.
Li, M. (2022) Algorithm Research of Siamese Neural Network in Target Tracking. Artificial Intelligence and Robotics Research, 11(3), 278-287. https://doi.org/10.12677/AIRR.2022.113029
References
[1] Li, X., Zha, Y., Zhang, T., Cui, Z., Zuo, W., Hou, Z., Lu, H. and Wang, H. (2019) Survey of Object Tracking Algorithms Based on Deep Learning. Journal of Image and Graphics, 24(12), 2057-2080. (in Chinese)
[2] Ge, B., Zuo, X. and Hu, Y. (2018) Survey of Visual Object Tracking Methods. Journal of Image and Graphics, 23(8), 1091-1107. (in Chinese)
[3] Lu, H., Li, P. and Wang, D. (2018) Survey of Object Tracking Algorithms. Pattern Recognition and Artificial Intelligence, 31(1), 61-76. (in Chinese)
[4] Liu, Y., Li, M., Zheng, Q., Qin, W. and Ren, X. (2022) Survey of Video Object Tracking Algorithms. Journal of Frontiers of Computer Science and Technology, 16(7), 1504-1515. (in Chinese)
[5] Jha, S., Seo, C., Yang, E. and Joshi, G.P. (2020) Real Time Object Detection and Tracking System for Video Surveillance System. Multimedia Tools and Applications, 80, 3981-3996. https://doi.org/10.1007/s11042-020-09749-x
[6] Mao, Z., Wang, Y., Wang, X. and Shen, J. (2021) Video Surveillance and Analysis System for Expressway Vehicles. Journal of Xidian University, 48(5), 178-189. (in Chinese)
[7] Jin, L., Hua, Q., Guo, B., Xie, X., Yan, F. and Wu, B. (2021) Multi-Object Tracking of Front Vehicles Based on Optimized DeepSort. Journal of Zhejiang University (Engineering Science), 55(6), 1056-1064. (in Chinese)
[8] Lin, M. (2022) Research on Vessel Target Tracking in Carotid Ultrasound Images Based on Siamese Networks. Master's Thesis, East China Normal University, Shanghai. (in Chinese)
[9] He, J., Xu, L., Zhang, J. and Yu, Z. (2021) A Filtering Algorithm for Highly Maneuvering Targets Based on an Adaptive Grid Mechanism. Aero Weaponry, 28(6), 40-45. (in Chinese)
[10] Lu, R., Luo, Y., Lin, M., Xu, Y. and Rao, H. (2021) Landing Tracking Control Method and System Based on a Lightweight Siamese Network, and UAV. Chinese Patent CN202110426555.2, 2021-07-16. (in Chinese)
[11] Liu, X., Zhang, C. and Tang, P. (2020) Tracking and Localization of a Moving Light Spot Based on Monocular Vision. Information Technology, 44(1), 48-53. (in Chinese)
[12] Zhou, J. and Zhao, Y. (2017) Survey of Convolutional Neural Networks in Image Classification and Object Detection. Computer Engineering and Applications, 53(13), 34-41. (in Chinese)
[13] Li, X., Ye, M. and Li, T. (2017) Survey of Object Detection Research Based on Convolutional Neural Networks. Application Research of Computers, 34(10), 2881-2886, 2891. (in Chinese)
[14] Bromley, J., Guyon, I., LeCun, Y., et al. (1993) Signature Verification Using a "Siamese" Time Delay Neural Network. In: Cowan, J., Tesauro, G. and Alspector, J., Eds., Advances in Neural Information Processing Systems 6.
[15] Nair, V. and Hinton, G.E. (2010) Rectified Linear Units Improve Restricted Boltzmann Machines. Proceedings of the 27th International Conference on Machine Learning, Haifa, 21-24 June 2010, 807-814.
[16] Zagoruyko, S. and Komodakis, N. (2015) Learning to Compare Image Patches via Convolutional Neural Networks. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, 7-12 June 2015, 4353-4361. https://doi.org/10.1109/CVPR.2015.7299064
[17] Zhang, K., Su, D., Wang, P., Chen, H., Zhang, S., Ye, L. and Zhao, N. (2020) Progress and Applications of Deep Learning in Dense Image Matching. Science Technology and Engineering, 20(30), 12268-12278. (in Chinese)
[18] Tao, R., Gavves, E. and Smeulders, A.W.M. (2016) Siamese Instance Search for Tracking. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 27-30 June 2016, 1420-1429. https://doi.org/10.1109/CVPR.2016.158
[19] Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A. and Torr, P.H.S. (2016) Fully-Convolutional Siamese Networks for Object Tracking. 2016 European Conference on Computer Vision, Amsterdam, 8-10 and 15-16 October 2016, 850-865. https://doi.org/10.1007/978-3-319-48881-3_56
[20] Pu, L., Li, H., Hou, Z., Feng, X. and He, Y. (2022) Siamese Network Tracking Algorithm Based on High-Level Semantic Embedding. Journal of Beijing University of Aeronautics and Astronautics, 1-10. https://doi.org/10.13700/j.bh.1001-5965.2021.0319 (in Chinese)
[21] Li, Y. and Zhang, X. (2019) SiamVGG: Visual Tracking Using Deeper Siamese Networks.
[22] Chen, F. and Xie, W. (2020) SiamVGG Object Tracking Algorithm with an Anti-Occlusion Mechanism. Journal of Signal Processing, 36(4), 562-571. https://doi.org/10.16798/j.issn.1003-0530.2020.04.010 (in Chinese)
[23] Zhang, Z. and Peng, H. (2019) Deeper and Wider Siamese Networks for Real-Time Visual Tracking. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, 15-20 June 2019, 4586-4595. https://doi.org/10.1109/CVPR.2019.00472
[24] Shao, J. and Ge, H. (2021) Siamese Object Tracking Algorithm Fusing Residual Connections and Channel Attention. Journal of Computer-Aided Design & Computer Graphics, 33(2), 260-269. (in Chinese)
[25] Li, B., Yan, J., Wu, W., Zhu, Z. and Hu, X. (2018) High Performance Visual Tracking with Siamese Region Proposal Network. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18-23 June 2018, 8971-8980. https://doi.org/10.1109/CVPR.2018.00935
[26] Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J. and Yan, J. (2019) SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, 15-20 June 2019, 4277-4286. https://doi.org/10.1109/CVPR.2019.00441
[27] Shang, X., Wen, Y., Xi, X. and Hu, F. (2021) Real-Time Object Tracking with a Siamese Guided-Anchor RPN Network. Journal of Image and Graphics, 26(2), 415-424. (in Chinese)
[28] Jiang, W. and Cui, J. (2022) Siamese Neural Network Tracking Algorithm with a Rotated Region Proposal Network. Computer Engineering and Applications, 1-11. http://kns.cnki.net/kcms/detail/11.2127.TP.20220618.1146.014.html (in Chinese)
[29] Tian, Z., Shen, C., Chen, H. and He, T. (2019) FCOS: Fully Convolutional One-Stage Object Detection. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 9626-9635. https://doi.org/10.1109/ICCV.2019.00972
[30] Guo, D., Wang, J., Cui, Y., Wang, Z. and Chen, S. (2020) SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 6268-6276. https://doi.org/10.1109/CVPR42600.2020.00630
[31] Chen, Z., Zhong, B., Li, G., Zhang, S. and Ji, R. (2020) Siamese Box Adaptive Network for Visual Tracking. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 6667-6676. https://doi.org/10.1109/CVPR42600.2020.00670
[32] Xu, Y., Wang, Z., Li, Z., Yuan, Y. and Yu, G. (2020) SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines. Proceedings of the AAAI Conference on Artificial Intelligence, 34, 12549-12556. https://doi.org/10.1609/aaai.v34i07.6944
[33] Zhang, Z., Peng, H., Fu, J., Li, B. and Hu, W. (2020) Ocean: Object-Aware Anchor-Free Tracking. European Conference on Computer Vision 2020, Vol. 12366, Glasgow, 23-28 August 2020, 771-787. https://doi.org/10.1007/978-3-030-58589-1_46
[34] Bahdanau, D., Cho, K. and Bengio, Y. (2014) Neural Machine Translation by Jointly Learning to Align and Translate.
[35] Mnih, V., Heess, N., Graves, A. and Kavukcuoglu, K. (2014) Recurrent Models of Visual Attention. arXiv:1406.6247.
[36] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., et al. (2017) Attention Is All You Need. arXiv:1706.03762.
[37] Ni, Z.L., Bian, G.B., Xie, X.L., Hou, Z.-G., Zhou, X.-H. and Zhou, Y.-J. (2019) RASNet: Segmentation for Tracking Surgical Instruments in Surgical Videos Using Refined Attention Segmentation Network. 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, 23-27 July 2019, 5735-5738. https://doi.org/10.1109/EMBC.2019.8856495
[38] Yu, Y., Xiong, Y., Huang, W. and Scott, M.R. (2020) Deformable Siamese Attention Networks for Visual Object Tracking. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 6727-6736. https://doi.org/10.1109/CVPR42600.2020.00676
[39] Du, F., Liu, P., Zhao, W. and Tang, X. (2020) Correlation-Guided Attention for Corner Detection Based Visual Tracking. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 6835-6844. https://doi.org/10.1109/CVPR42600.2020.00687
[40] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2020) An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv:2010.11929.
[41] Wang, N., Zhou, W., Wang, J. and Li, H. (2021) Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 1571-1580. https://doi.org/10.1109/CVPR46437.2021.00162
[42] Chen, X., Yan, B., Zhu, J., Wang, D., Yang, X. and Lu, H. (2021) Transformer Tracking. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 8122-8131. https://doi.org/10.1109/CVPR46437.2021.00803
[43] Zhao, M., Okada, K. and Inaba, M. (2021) TrTr: Visual Tracking with Transformer. arXiv:2105.03817.
[44] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., et al. (2021) Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 9992-10002. https://doi.org/10.1109/ICCV48922.2021.00986
[45] Lin, L., Fan, H., Xu, Y. and Ling, H. (2021) SwinTrack: A Simple and Strong Baseline for Transformer Tracking. arXiv:2112.00995.
[46] Yan, B., Peng, H., Wu, K., Wang, D., Fu, J. and Lu, H. (2021) LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, 20-25 June 2021, 15175-15184. https://doi.org/10.1109/CVPR46437.2021.01493
[47] Blatter, P., Kanakis, M., Danelljan, M. and Van Gool, L. (2021) Efficient Visual Tracking with Exemplar Transformers. arXiv:2112.09686.
[48] Borsuk, V., Vei, R., Kupyn, O., Martyniuk, T., Krashenyi, I. and Matas, J. (2021) FEAR: Fast, Efficient, Accurate and Robust Visual Tracker. arXiv:2112.07957.