The problem of depth estimation from a single image is addressed. The mapping between a single image and a depth map is inherently ambiguous and requires both global and local information. This paper presents a fully convolutional U-net whose encoder is a pretrained ResNet-50 with the fully connected and final pooling layers removed; residual up-sampling modules then enlarge the feature maps back to the resolution of the depth map. Skip connections between encoder and decoder give the network its U shape and fuse global and local information. The whole network can be trained end to end.
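The encoder/decoder structure described above amounts to resolution bookkeeping: a ResNet-50 encoder shrinks the input down to 1/32 of its spatial size, each residual up-sampling block doubles it back, and skip connections pair decoder stages with encoder stages of matching resolution. The sketch below traces only those shapes; the standard ResNet-50 strides and the choice of four up-sampling blocks are assumptions, not the authors' exact implementation.

```python
# Sketch of the resolution bookkeeping in a ResNet-50-based U-net.
# Assumptions (not from the paper): standard ResNet-50 strides and
# one x2 residual up-sampling block per encoder stage.

def encoder_resolutions(h, w):
    """Spatial sizes after the ResNet-50 stem and its four residual stages."""
    # stem conv (/2) + max-pool (/4), then layer1 (/4), layer2 (/8),
    # layer3 (/16), layer4 (/32)
    return [(h // s, w // s) for s in (4, 4, 8, 16, 32)]

def decoder_resolutions(h, w, num_up_blocks=4):
    """Each residual up-sampling block doubles the 1/32 bottleneck."""
    sizes = []
    cur_h, cur_w = h // 32, w // 32
    for _ in range(num_up_blocks):
        cur_h, cur_w = cur_h * 2, cur_w * 2
        sizes.append((cur_h, cur_w))
    return sizes

def skip_pairs(h, w):
    """Flag which decoder stages find an encoder stage of equal size to fuse with."""
    enc = encoder_resolutions(h, w)
    return [(d, d in enc) for d in decoder_resolutions(h, w)]

if __name__ == "__main__":
    for size, has_skip in skip_pairs(320, 256):
        print(size, "skip" if has_skip else "no skip")
```

With this stage count the decoder stops at half the input resolution, so a final up-sampling (learned or bilinear) is still needed to reach the full depth-map size the abstract describes.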
Classic methods rely on strong assumptions about scene geometry: they depend on hand-crafted features and probabilistic graphical models that exploit the horizontally aligned geometry of the image or other geometric cues. For example, Saxena et al. [3] predicted depth from image features using linear regression and a Markov Random Field (MRF), and later extended this work into the Make3D system [4]. However, that system depends on the horizontal alignment of the images.
2.2. Feature-Based Mapping Methods
A second line of related work is feature-based: given an RGB image, the nearest-neighbor image pairs are retrieved from an RGB-D dataset, and the retrieved depth maps are used to produce the final depth map. Karsch et al. [5] aligned retrieved depth maps with SIFT Flow followed by a global optimization scheme, while Konrad et al. [6] computed the pixel-wise median of the retrieved depth maps and smoothed it with cross-bilateral filtering. Liu et al. [7] formulated the optimization as a Conditional Random Field (CRF) with continuous and discrete variable potentials. All of these methods rest on one assumption: similarity between regions of RGB images implies similar depth cues.
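The retrieve-and-fuse recipe of Konrad et al. [6] reduces to a few lines: rank the RGB-D exemplars by feature distance to the query, take the pixel-wise median of the k nearest depth maps, then smooth. Below is a minimal numpy sketch of that data flow; the Euclidean distance over generic global descriptors is a placeholder (the cited systems use GIST- or SIFT-flow-based matching), and the cross-bilateral smoothing step is omitted.

```python
import numpy as np

def retrieve_and_fuse(query_feat, db_feats, db_depths, k=3):
    """Depth-transfer baseline: pixel-wise median over the k nearest
    RGB-D exemplars, ranked by Euclidean feature distance.
    (Descriptor and smoothing choices are stand-ins, not the cited pipelines.)
    """
    dists = np.linalg.norm(db_feats - query_feat, axis=1)  # (N,) distances
    nearest = np.argsort(dists)[:k]                        # k best matches
    return np.median(db_depths[nearest], axis=0)           # (H, W) fused map

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db_feats = rng.normal(size=(10, 32))               # toy global descriptors
    db_depths = rng.uniform(1, 5, size=(10, 48, 64))   # toy depth maps (metres)
    query = db_feats[0] + 0.01 * rng.normal(size=32)
    print(retrieve_and_fuse(query, db_feats, db_depths, k=3).shape)
```

The median makes the fusion robust to a single badly matched exemplar, which is exactly why these methods lean on the region-similarity assumption stated above.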
2.3. Convolutional-Neural-Network-Based Methods
Recently, CNN-based depth estimation methods have become dominant. Since the task is closely related to semantic segmentation, most work builds on the most successful architectures from the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [8]. Eigen et al. [9] were the first to apply a CNN to monocular depth prediction. They used two deep network stages: the first produces a coarse global depth estimate, and the second refines the prediction locally. This idea was later extended [2] with a stack of three CNNs that additionally predict surface normals and semantic labels alongside depth. Another direction for improving the quality of predicted depth maps is to combine CNNs with graphical models. Liu et al. [10] proposed learning the unary and pairwise potentials during CNN training via a CRF loss, achieving state-of-the-art results without any geometric priors; this works because depth values are continuous [11]. Li et al. [12] and Wang et al. [13] used hierarchical CRFs to refine patch-wise CNN predictions from the superpixel down to the pixel level.
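The coarse-plus-fine scheme of Eigen et al. [9] boils down to: predict a low-resolution global depth map, upsample it, and let a second stage add a local correction computed from the image. The sketch below shows only that data flow; both networks are replaced by stand-in functions (block-mean pooling and a toy residual), and none of the internals come from the cited paper.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling (stand-in for learned upsampling)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def coarse_stage(image, factor=4):
    """Stand-in for the global coarse network: a blocky low-res estimate."""
    h, w = image.shape
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def fine_stage(image, coarse_up):
    """Stand-in for the refinement network: add a small local residual."""
    residual = image - image.mean()   # toy stand-in for learned local detail
    return coarse_up + 0.1 * residual

def predict_depth(image, factor=4):
    coarse = coarse_stage(image, factor)          # (H/f, W/f) global layout
    coarse_up = upsample_nearest(coarse, factor)  # back to (H, W)
    return fine_stage(image, coarse_up)           # locally refined depth map
```

The U-net of this paper replaces the explicit two-stage split with skip connections, so global context and local detail are fused inside a single network instead of across two.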
王小康, 付小宁, 董悫. 一种基于U型全卷积神经网络的深度估计模型 (Image Depth Estimation Model Based on Fully Convolutional U-Net) [J]. 计算机科学与应用, 2019, 09(02): 250-255. https://doi.org/10.12677/CSA.2019.92029
References
[1] Ren, X., Bo, L. and Fox, D. (2012) RGB-(D) Scene Labeling: Features and Algorithms. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, 16-21 June 2012, 2759-2766.
[2] Eigen, D. and Fergus, R. (2015) Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture. Proceedings of the IEEE International Conference on Computer Vision, Santiago, 7-13 December 2015, 2650-2658. https://doi.org/10.1109/ICCV.2015.304
[3] Saxena, A., Chung, S.H. and Ng, A.Y. (2006) Learning Depth from Single Monocular Images. Advances in Neural Information Processing Systems, 18, 1161-1168.
[4] Saxena, A., Sun, M. and Ng, A.Y. (2009) Make3D: Learning 3D Scene Structure from a Single Still Image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 824-840. https://doi.org/10.1109/TPAMI.2008.132
[5] Karsch, K., Liu, C. and Kang, S. (2012) Depth Extraction from Video Using Non-Parametric Sampling. Proceedings of the 12th European Conference on Computer Vision, Volume Part V, Florence, 7-13 October 2012, 775-788. https://doi.org/10.1007/978-3-642-33715-4_56
[6] Konrad, J., Wang, M. and Ishwar, P. (2012) 2D-to-3D Image Conversion by Learning Depth from Examples. 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Providence, 16-21 June 2012, 16-22. https://doi.org/10.1109/CVPRW.2012.6238903
[7] Liu, M., Salzmann, M. and He, X. (2014) Discrete-Continuous Depth Estimation from a Single Image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 23-28 June 2014, 716-723. https://doi.org/10.1109/CVPR.2014.97
[8] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al. (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115, 211-252. https://doi.org/10.1007/s11263-015-0816-y
[9] Eigen, D., Puhrsch, C. and Fergus, R. (2014) Depth Map Prediction from a Single Image Using a Multi-Scale Deep Network. Advances in Neural Information Processing Systems, 2366-2374.
[10] Liu, F., Shen, C. and Lin, G. (2015) Deep Convolutional Neural Fields for Depth Estimation from a Single Image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, 7-12 June 2015, 5162-5170. https://doi.org/10.1109/CVPR.2015.7299152
[11] Liu, F., Shen, C., Lin, G. and Reid, I. (2016) Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 2024-2039. https://doi.org/10.1109/TPAMI.2015.2505283
[12] Li, B., Shen, C., Dai, Y., van den Hengel, A. and He, M. (2015) Depth and Surface Normal Estimation from Monocular Images Using Regression on Deep Features and Hierarchical CRFs. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, 7-12 June 2015, 1119-1127.
[13] Wang, P., Shen, X., Lin, Z., Cohen, S., Price, B. and Yuille, A.L. (2015) Towards Unified Depth and Semantic Prediction from a Single Image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, 7-12 June 2015, 2800-2809.
[14] Cao, Y., Wu, Z. and Shen, C. (2016) Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks. arXiv:1605.02305 [cs.CV]
[15] Li, B., Dai, Y., Chen, H. and He, M. (2017) Single Image Depth Estimation by Dilated Deep Residual Convolutional Neural Network and Soft-Weight-Sum Inference. arXiv:1705.00534 [cs.CV]
[16] Ronneberger, O., Fischer, P. and Brox, T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, 5-9 October 2015, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28
[17] Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F. and Navab, N. (2016) Deeper Depth Prediction with Fully Convolutional Residual Networks. 2016 Fourth International Conference on 3D Vision (3DV), Stanford, 25-28 October 2016, 239-248.
[18] Silberman, N., Hoiem, D., Kohli, P. and Fergus, R. (2012) Indoor Segmentation and Support Inference from RGBD Images. Computer Vision, ECCV 2012, Florence, 7-13 October 2012, 746-760. https://doi.org/10.1007/978-3-642-33715-4_54