Paper title
A Lightweight Neural Network for Monocular View Generation with Occlusion Handling
Paper authors
Paper abstract
In this article, we present a very lightweight neural network architecture, trained on stereo data pairs, which performs view synthesis from a single image. With the growing success of multi-view formats, this problem is increasingly relevant. The network returns a prediction built from disparity estimation and fills in wrongly predicted regions using an occlusion handling technique. To do so, during training, the network learns to estimate the left-right consistency structural constraint on the pair of stereo input images, so that it can replicate it at test time from a single image. The method is built upon the idea of blending two predictions: a prediction based on disparity estimation, and a prediction based on direct minimization in occluded regions. The network is also able to identify these occluded areas at training and at test time by checking the pixelwise left-right consistency of the produced disparity maps. At test time, the approach can thus generate a left-side and a right-side view from one input image, as well as a depth map and a pixelwise confidence measure in the prediction. The work outperforms state-of-the-art approaches both visually and metric-wise on the challenging KITTI dataset, while reducing the required number of parameters by a factor of 5 or 10, to 6.5 M.
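
The two ideas named in the abstract, the pixelwise left-right consistency check and the blending of two predictions, can be illustrated with a minimal NumPy sketch. This is not the authors' network or loss: the function names `left_right_consistency` and `blend_predictions`, the tolerance `tol`, the nearest-pixel sampling, and the sign convention (a left pixel at column x matches column x - d in the right view) are all assumptions made for illustration.

```python
import numpy as np


def left_right_consistency(disp_left, disp_right, tol=1.0):
    """Boolean mask, True where the left disparity agrees with the right one.

    Assumed convention: a left pixel at column x with disparity d matches
    column x - d in the right view. `tol` is a hypothetical threshold in pixels.
    """
    h, w = disp_left.shape
    xs = np.tile(np.arange(w), (h, 1))            # column index of every pixel
    ys = np.tile(np.arange(h)[:, None], (1, w))   # row index of every pixel

    # Matching column in the right view, rounded to the nearest pixel.
    x_right = np.rint(xs - disp_left).astype(int)
    in_bounds = (x_right >= 0) & (x_right < w)

    # Clamp only so that indexing is valid; out-of-bounds pixels are rejected below.
    x_clamped = np.clip(x_right, 0, w - 1)
    disp_right_sampled = disp_right[ys, x_clamped]

    consistent = np.abs(disp_left - disp_right_sampled) <= tol
    return consistent & in_bounds


def blend_predictions(pred_disparity, pred_direct, confidence):
    """Per-pixel blend of two H x W x C images using an H x W confidence map.

    Confidence 1 keeps the disparity-based prediction, confidence 0 falls back
    to the direct prediction (used here for inconsistent, likely occluded pixels).
    """
    c = confidence.astype(np.float64)[..., None]
    return c * pred_disparity + (1.0 - c) * pred_direct
```

In this sketch the consistency mask doubles as a hard confidence map, so pixels that fail the check take their colour from the direct prediction. A learned, continuous confidence could be passed to `blend_predictions` in the same way.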
