Journal of Image and Signal Processing
Vol. 08  No. 02 ( 2019 ), Article ID: 29496 , 8 pages
10.12677/JISP.2019.82007

Target Detection and Recognition Based on Improved Faster R-CNN

Jingjing Fang, Jinyong Cheng

School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Ji’nan Shandong

Received: Mar. 6th, 2019; accepted: Mar. 17th, 2019; published: Mar. 28th, 2019

ABSTRACT

In recent years, with the continuous development of deep learning, image research and applications based on deep learning have achieved excellent results in many fields. The R-CNN family and fully convolutional networks have greatly accelerated the development of object detection technology. The Faster R-CNN algorithm has been proposed and widely applied in the field of object detection and recognition. In this paper, we study object detection based on the Faster R-CNN algorithm on images from a self-made office-supplies dataset. Compared with earlier R-CNN algorithms, Faster R-CNN introduces a Region Proposal Network and integrates feature extraction, proposal extraction, bounding-box regression, and classification into a single network, which greatly improves overall performance. This paper proposes an improved Faster R-CNN algorithm based on the activation function. The dataset usually has a large number of high-density continuous features during feature extraction, while the activation function is sparse; the proposed improvement addresses the detection of office supplies with small targets and complex backgrounds, and improves both detection speed and accuracy.

Keywords: Deep Learning, Object Detection, Region Proposal Network, Feature Extraction

1. Introduction

Fast R-CNN largely achieves end-to-end detection [8], but the Selective Search (SS) algorithm [9] [10] it relies on to extract candidate boxes is very time-consuming. To address this problem, the Faster R-CNN algorithm introduced the concept of the Region Proposal Network (RPN) [11], in which the neural network itself learns to generate candidate regions [12]. Because the complex image backgrounds in our office-supplies dataset make feature extraction inaccurate, this paper applies the ReLU and Leaky ReLU activation functions on top of the basic Faster R-CNN algorithm. This method greatly improves the reliability of the generated candidate regions and the accuracy of object detection, and effectively shortens prediction time.

2. Object Detection and Recognition Based on the Faster R-CNN Algorithm

2.1. The Faster R-CNN Algorithm

As a CNN-based object detection algorithm, Faster R-CNN first uses convolutional layers to extract a feature map from the input image [13]; this feature map is shared by the RPN and the fully connected layers [14]. The RPN then generates region proposals: a softmax classifier decides whether each candidate region is foreground or background, and bounding-box regression [15] refines the positions of the candidate regions to obtain accurate detection boxes. In Faster R-CNN, a Region of Interest (RoI) pooling layer collects the input feature map and the region proposals, extracts a feature map for each proposal from this combined information, and feeds it to the fully connected layers to determine the object class. Finally, the proposal feature maps are used to compute the class of each proposal, and bounding-box regression together with non-maximum suppression [16] is applied again to obtain the precise positions of the detection boxes, as shown in Figure 1.

Figure 1. Faster R-CNN algorithm flow chart

2.2. Region Proposal Network

2.3. Non-Maximum Suppression

IoU measures the overlap between two candidate boxes. Given rectangles $T1$ and $T2$, their overlap is computed as $\text{IOU}=\left(T1\cap T2\right)/\left(T1\cup T2\right)$, i.e., IoU is the ratio of the overlapping area of $T1$ and $T2$ to the area of their union.
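As an illustration (not the paper's own code), the IoU formula above can be computed for two axis-aligned boxes in `(x1, y1, x2, y2)` form as:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle T1 ∩ T2.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter   # |T1 ∪ T2| = |T1| + |T2| - |T1 ∩ T2|
    return inter / union if union > 0 else 0.0
```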

NMS is widely used in computer vision, for example in object tracking, object recognition, data mining, and texture analysis.
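For reference, a minimal greedy NMS (a generic sketch, not the paper's implementation) keeps the highest-scoring box and suppresses all remaining boxes whose IoU with it exceeds a threshold:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.
    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidence scores.
    Returns indices of the boxes that survive suppression."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the current top box against every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop boxes overlapping the kept box by more than the threshold.
        order = order[1:][iou <= iou_threshold]
    return keep
```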

3. Related Work

3.1. The AlexNet Model and Activation Functions

The AlexNet model was proposed by Alex Krizhevsky in 2012. It is an eight-layer neural network consisting of five convolutional layers and three fully connected layers. AlexNet uses the ReLU function as the CNN activation function, which solves the vanishing-gradient problem caused by the Sigmoid function in deep networks.

The ReLU activation function requires relatively little computation when back-propagating error gradients, which saves considerable time. ReLU is a function half-rectified from the bottom: $\text{relu}\left(x\right)=\mathrm{max}\left(0,x\right)$. When the input $x<0$, the output is 0; when $x>0$, the output is $x$. ReLU makes the network converge faster. It does not saturate, so it resists the vanishing-gradient problem, at least in the positive region $x>0$; as a result, neurons do not back-propagate all zeros over at least half of their input range. The Leaky ReLU function is a variant of the classic ReLU whose output has a small slope for negative inputs: $\text{leakyrelu}\left(x\right)=\mathrm{max}\left(0.01x,x\right)$. Its derivative is never zero, which reduces the occurrence of silent ("dead") neurons and allows gradient-based learning.
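The two activation functions defined above can be written directly from their formulas:

```python
import numpy as np

def relu(x):
    # relu(x) = max(0, x): zero for negative inputs, identity for positive.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # leaky_relu(x) = max(alpha * x, x): a small nonzero slope for x < 0,
    # so the gradient never vanishes entirely and "dead" neurons are avoided.
    return np.maximum(alpha * x, x)
```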

3.2. Improved Faster R-CNN

Figure 2. CNN model diagram

4. Experiments and Analysis

4.1. Experimental Environment and Dataset

4.2. Experimental Procedure

CNN architecture settings: in our experiments, we use 3 × 3 convolution kernels for convolution, downsample the data with 3 × 3 spatial pooling regions at a stride of 2, and add a nonlinear ReLU layer in the fully connected layers. A 25-layer CNN is used for feature extraction in this experiment.
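To illustrate the effect of these settings (a sketch with an assumed 224-pixel input, not the paper's configuration), the standard output-size formula shows that a padded 3 × 3 convolution preserves the spatial size while the stride-2 3 × 3 pooling roughly halves it:

```python
def conv_out_size(n, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer (square input)."""
    return (n + 2 * pad - kernel) // stride + 1

h = 224                                    # assumed square input size
h = conv_out_size(h, kernel=3, stride=1, pad=1)   # 3x3 conv, pad 1: size preserved
h = conv_out_size(h, kernel=3, stride=2, pad=0)   # 3x3 pool, stride 2: downsampled
```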

The network is trained by minimizing the multi-task loss

$L\left(\left\{{p}_{m}\right\},\left\{{t}_{m}\right\}\right)=\frac{1}{{N}_{cls}}\sum _{m}{L}_{c}\left({p}_{m},{p}_{m}^{*}\right)+\lambda \frac{1}{{N}_{reg}}\sum _{m}{p}_{m}^{*}{L}_{r}\left({t}_{m},{t}_{m}^{*}\right)$ (1)

where ${p}_{m}$ is the predicted probability that anchor $m$ is an object, ${p}_{m}^{*}$ is its ground-truth label, ${N}_{cls}$ and ${N}_{reg}$ are normalization terms, $\lambda$ is a balancing weight, and the classification loss ${L}_{c}$ is

${L}_{c}\left({p}_{m},{p}_{m}^{*}\right)=-\mathrm{log}\left[{p}_{m}{p}_{m}^{*}+\left(1-{p}_{m}\right)\left(1-{p}_{m}^{*}\right)\right]$ (2)
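Equation (2) can be evaluated directly; for a positive anchor ($p_m^* = 1$) it reduces to $-\log p_m$, and for a negative anchor to $-\log(1 - p_m)$:

```python
import numpy as np

def cls_loss(p, p_star):
    """Binary log loss of Eq. (2): p is the predicted foreground
    probability, p_star is 1 for a positive anchor and 0 otherwise."""
    return -np.log(p * p_star + (1.0 - p) * (1.0 - p_star))
```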

${L}_{r}\left({t}_{m},{t}_{m}^{*}\right)$ is the regression loss, computed as ${L}_{r}\left({t}_{m},{t}_{m}^{*}\right)=S\left({t}_{m},{t}_{m}^{*}\right)$, where $S$ is the smooth L1 function.
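The smooth L1 function used here is quadratic for small residuals and linear for large ones, which makes the regression loss less sensitive to outliers than plain L2. A minimal version:

```python
import numpy as np

def smooth_l1(t, t_star):
    """Smooth L1 regression loss, summed over the box coordinates:
    0.5 * d^2 for |d| < 1, |d| - 0.5 otherwise, where d = t - t_star."""
    d = np.abs(np.asarray(t, dtype=float) - np.asarray(t_star, dtype=float))
    return float(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum())
```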

The Fast R-CNN network has two sibling output layers, a classification score layer and a box prediction layer, both fully connected. The classification score layer is used for classification and outputs a $\left(c+1\right)$-dimensional array $p$ giving the probabilities of the $c$ object classes and the background. For each RoI it outputs a discrete probability distribution $p=\left({p}_{0},{p}_{1},\cdots ,{p}_{c}\right)$, where $p$ is computed over the $c+1$ classes by a softmax classifier on the fully connected layer.

${t}^{c}=\left({t}_{x}^{c},{t}_{y}^{c},{t}_{w}^{c},{t}_{h}^{c}\right)$ (3)

$c$ denotes the class index; ${t}_{x}^{c},{t}_{y}^{c}$ are scale-invariant translations relative to the region proposal, and ${t}_{w}^{c},{t}_{h}^{c}$ are the width and height in log space relative to the region proposal.
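This parameterization can be sketched as follows, where boxes are given as center coordinates plus width and height (a generic illustration of Eq. (3), not the paper's code):

```python
import numpy as np

def bbox_targets(proposal, gt):
    """Regression targets (t_x, t_y, t_w, t_h) of Eq. (3).
    Both boxes are (center_x, center_y, width, height)."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    tx = (gx - px) / pw        # center-x shift, normalized by proposal width
    ty = (gy - py) / ph        # center-y shift, normalized by proposal height
    tw = np.log(gw / pw)       # log-space width ratio
    th = np.log(gh / ph)       # log-space height ratio
    return tx, ty, tw, th
```

Normalizing the translation by the proposal size makes the targets scale-invariant, and the log-space ratios keep width and height positive when the transform is inverted at inference time.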

The Faster R-CNN model is trained in four steps. The first two steps train the region proposal and detection networks used by the Fast R-CNN network; the last two steps combine the networks from the first two steps into a single network for object detection. Each step has a different convergence rate, so it is useful to specify independent training options for each step. Table 1 shows part of the data from the RPN training process with 10 epochs.

Table 1. RPN training process

4.3. Experimental Results and Analysis

Table 2. Experimental results of Faster R-CNN in different data sets

Table 3. Experimental results of Faster R-CNN based on AlexNet in different datasets

5. Conclusion


1. Chavali, N., Agrawal, H., Mahendru, A., et al. (2016) Object-Proposal Evaluation Protocol Is "Gameable". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 26 June-1 July 2016, 835-844.

2. Xie, S., Girshick, R., Dollár, P., et al. (2017) Aggregated Residual Transformations for Deep Neural Networks. Conference on Computer Vision and Pattern Recognition, 21-26 July 2017, 5987-5995.

3. Dai, J., He, K. and Sun, J. (2015) Convolutional Feature Masking for Joint Object and Stuff Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, 7-12 June 2015, 3992-4000.

4. Ren, S., He, K., Girshick, R., et al. (2017) Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 1137-1149. https://doi.org/10.1109/TPAMI.2016.2577031

5. Hosang, J., Benenson, R., Dollár, P., et al. (2016) What Makes for Effective Detection Proposals? IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 814-830. https://doi.org/10.1109/TPAMI.2015.2465908

6. Pinheiro, P.O., Collobert, R. and Dollár, P. (2015) Learning to Segment Object Candidates. Advances in Neural Information Processing Systems, Montreal, 7-12 December 2015, 1990-1998.

7. Liu, W., Anguelov, D., Erhan, D., et al. (2016) SSD: Single Shot MultiBox Detector. In: European Conference on Computer Vision, Springer, Cham, 21-37.

8. Girshick, R. (2015) Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Araucano Park, 11-18 December 2015, 1440-1448.

9. Long, J., Shelhamer, E. and Darrell, T. (2015) Fully Convolutional Networks for Semantic Segmentation. IEEE Conference on Computer Vision and Pattern Recognition, Boston, 7-12 June 2015.

10. Uijlings, J.R.R., Van De Sande, K.E.A., Gevers, T., et al. (2013) Selective Search for Object Recognition. International Journal of Computer Vision, 104, 154-171. https://doi.org/10.1007/s11263-013-0620-5

11. Hariharan, B., Arbeláez, P., Girshick, R., et al. (2014) Simultaneous Detection and Segmentation. In: European Conference on Computer Vision, Springer, Cham, 297-312.

12. Erhan, D., Szegedy, C., Toshev, A., et al. (2014) Scalable Object Detection Using Deep Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 23-28 June 2014, 2147-2154.

13. Szegedy, C., Ioffe, S., Vanhoucke, V., et al. (2017) Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.

14. Wang, X., Girshick, R., Gupta, A., et al. (2018) Non-Local Neural Networks. The IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18-22 June 2018, Vol. 1, 4.

15. Wei, S.E., Ramakrishna, V., Kanade, T., et al. (2016) Convolutional Pose Machines. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 27-30 June 2016, 4724-4732.

16. Neubeck, A. and Van Gool, L. (2006) Efficient Non-Maximum Suppression. 18th International Conference on Pattern Recognition, Hong Kong, 20-24 August 2006, Vol. 3, 850-855.

17. Girshick, R., Donahue, J., Darrell, T., et al. (2014) Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 23-28 June 2014, 580-587.

18. Redmon, J., Divvala, S., Girshick, R., et al. (2016) You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 27-30 June 2016, 779-788.

19. Zeiler, M.D. and Fergus, R. (2014) Visualizing and Understanding Convolutional Networks. In: European Conference on Computer Vision, Springer, Cham, 818-833.

20. Gulcehre, C., Moczulski, M., Denil, M., et al. (2016) Noisy Activation Functions. International Conference on Machine Learning, 48, 3059-3068.