SURGE: Surface Regularized Geometry Estimation from a Single Image (Abstract and Introduction, Translated)

SURGE: Surface Regularized Geometry Estimation from a Single Image

Abstract

This paper introduces an approach to regularize 2.5D surface normal and depth predictions at each pixel given a single input image. The approach infers and reasons about the underlying 3D planar surfaces depicted in the image to snap predicted normals and depths to inferred planar surfaces, all while maintaining fine detail within objects. Our approach comprises two components: (i) a four-stream convolutional neural network (CNN) where depths, surface normals, and likelihoods of planar region and planar boundary are predicted at each pixel, followed by (ii) a dense conditional random field (DCRF) that integrates the four predictions such that the normals and depths are compatible with each other and regularized by the planar region and planar boundary information. The DCRF is formulated such that gradients can be passed to the surface normal and depth CNNs via backpropagation. In addition, we propose new planar-wise metrics to evaluate geometry consistency within planar surfaces, which are more tightly related to dependent 3D editing applications. We show that our regularization yields a 30% relative improvement in planar consistency on the NYU v2 dataset [24].

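This paper introduces a method for regularizing the 2.5D surface normal and depth predictions made at every pixel of a single input image. The method infers and reasons about the underlying 3D planar surfaces in the image and snaps the predicted normals and depths onto those inferred planes, while preserving the fine detail of objects. The method has two components: (i) a four-stream convolutional neural network (CNN) that predicts, pixel by pixel, the depth, the surface normal, and the likelihoods of planar region and planar boundary; and (ii) a dense conditional random field (DCRF) that integrates the four predictions so that normals and depths are compatible with each other and are regularized by the planar region and planar boundary information. The DCRF is constructed so that gradients can be passed back to the surface normal and depth CNNs via backpropagation. In addition, we propose new plane-wise metrics for evaluating geometric consistency within planar surfaces, which are closely tied to applications that depend on 3D editing. We show that our regularization yields a 30% relative improvement in planar consistency on the NYU v2 dataset.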

1       Introduction

Recent efforts to estimate the 2.5D layout of a depicted scene from a single image, such as per-pixel depths and surface normals, have yielded high-quality outputs respecting both the global scene layout and fine object detail [2, 6, 7, 29]. Upon closer inspection, however, the predicted depths and normals may fail to be consistent with the underlying surface geometry. For example, consider the depth and normal predictions from the contemporary approach of Eigen and Fergus [6] shown in Figure 1 (b) (Before DCRF). Notice the significant distortion in the predicted depth corresponding to the depicted planar surfaces, such as the back wall and cabinet. We argue that such distortion arises from the fact that the 2.5D predictions (i) are made independently per pixel from appearance information alone, and (ii) do not explicitly take into account the underlying surface geometry. When 3D geometry has been used, e.g., [29], it often consists of a boxy room layout constraint, which may be too coarse and fail to account for local planar regions that do not adhere to the box constraint. Moreover, when multiple 2.5D predictions are made (e.g., depth and normals), they are not explicitly enforced to agree with each other.

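Much recent work on estimating the 2.5D layout of a scene from a single photograph, such as per-pixel depth and surface normals, has produced high-quality results that respect both the global scene layout and fine object detail [2, 6, 7, 29]. On closer inspection, however, the predicted depths and normals can be inconsistent with the underlying surface geometry. Consider, for example, the depth and normal predictions of the contemporary method of Eigen and Fergus [6], shown in Figure 1 (b) (before the DCRF). Note the marked distortion in the predicted depth over the depicted planar surfaces, such as the back wall and the cabinet. We argue that this distortion arises because the 2.5D predictions (i) are made independently at each pixel from appearance information alone, and (ii) do not explicitly take the underlying surface geometry into account. Where 3D geometry has been used, e.g. [29], it usually takes the form of a box-shaped room layout constraint, which may be too coarse and cannot account for local planar regions that do not follow the box constraint. Moreover, when several 2.5D predictions are made (e.g. depth and normals), they are not explicitly forced to agree with one another.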
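One way to see what "agreeing with each other" means for depth and normals is to derive normals from the depth map and compare them with the independently predicted normals. The sketch below is only an illustration, not the paper's method: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, uses simple forward differences, and all function names are invented for this example.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (list of rows, positive depths) to 3D camera-space points.
    Assumes a pinhole camera; the intrinsics are illustrative parameters."""
    return [[((u - cx) * z / fx, (v - cy) * z / fy, z)
             for u, z in enumerate(row)]
            for v, row in enumerate(depth)]

def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate a unit normal per pixel from forward differences of 3D points.
    The result is one row and one column smaller than the depth map."""
    pts = backproject(depth, fx, fy, cx, cy)
    h, w = len(depth), len(depth[0])
    normals = []
    for v in range(h - 1):
        row = []
        for u in range(w - 1):
            du = sub(pts[v][u + 1], pts[v][u])   # step along the image x axis
            dv = sub(pts[v + 1][u], pts[v][u])   # step along the image y axis
            n = normalize(cross(du, dv))
            # Orient the normal toward the camera, which sits at the origin.
            if dot(n, pts[v][u]) > 0:
                n = (-n[0], -n[1], -n[2])
            row.append(n)
        normals.append(row)
    return normals

def angular_error_deg(n1, n2):
    """Angle in degrees between two unit vectors."""
    c = max(-1.0, min(1.0, dot(n1, n2)))
    return math.degrees(math.acos(c))

def mean_angular_error_deg(depth, predicted_normals, fx, fy, cx, cy):
    """Mean angle between depth-derived normals and separately predicted normals."""
    derived = normals_from_depth(depth, fx, fy, cx, cy)
    errors = [angular_error_deg(d, p)
              for d_row, p_row in zip(derived, predicted_normals)
              for d, p in zip(d_row, p_row)]
    return sum(errors) / len(errors)
```

A large mean angle on a region that should be a flat wall is the kind of depth and normal disagreement the paragraph above describes; a consistent pair of predictions would keep this value small.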

To overcome the above issues, we introduce an approach to identify depicted 3D planar regions in the image along with their spatial extent, and to leverage such planar regions to regularize the depth and surface normal outputs. We formulate our approach as a four-stream convolutional neural network (CNN), followed by a dense conditional random field (DCRF). The four-stream CNN independently predicts at each pixel the surface normal, depth, and likelihoods for planar region and planar boundary. The four cues are integrated into a DCRF, which encourages the output depths and normals to align with the inferred 3D planar surfaces while maintaining fine detail within objects. Furthermore, the output depths and normals are explicitly encouraged to agree with each other.

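To overcome these problems, we introduce a method that identifies the 3D planar regions depicted in the image together with their spatial extent, and uses those planar regions to regularize the depth and surface normal outputs. The method consists of a four-stream convolutional neural network (CNN) followed by a dense conditional random field (DCRF). The four-stream CNN independently predicts, at each pixel, the surface normal, the depth, and the likelihoods of planar region and planar boundary. These four cues are integrated in the DCRF, which encourages the output depths and normals to align with the inferred 3D planar surfaces while preserving the fine detail inside objects. In addition, the output depths and normals are explicitly encouraged to agree with each other.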
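The idea of snapping depths onto an inferred plane while leaving non-planar pixels alone can be illustrated with a much simpler stand-in than the DCRF. The sketch below is an assumption-laden toy, not the paper's formulation: it fits a single plane by weighted least squares, using the fact that under a pinhole camera the inverse depth of a 3D plane is an affine function of pixel coordinates, and then blends each pixel toward the fit according to a per-pixel planar likelihood. The function names and the blending rule are invented for this example.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / M[r][r]
    return x

def fit_inverse_depth_plane(depth, weight):
    """Weighted least-squares fit of 1/z = a*u + b*v + c over all pixels.
    `weight` plays the role of a planar-region likelihood in [0, 1]."""
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            w = weight[v][u]
            f = (float(u), float(v), 1.0)
            for i in range(3):
                rhs[i] += w * f[i] / z
                for j in range(3):
                    A[i][j] += w * f[i] * f[j]
    return solve3(A, rhs)

def snap_to_plane(depth, weight):
    """Blend each pixel's inverse depth toward the fitted plane.
    Pixels with weight 0 keep their depth; pixels with weight 1 land on the plane.
    This blending rule is a hypothetical simplification, not the DCRF."""
    a, b, c = fit_inverse_depth_plane(depth, weight)
    snapped = []
    for v, row in enumerate(depth):
        new_row = []
        for u, z in enumerate(row):
            p = weight[v][u]
            inv = p * (a * u + b * v + c) + (1.0 - p) / z
            new_row.append(1.0 / inv)
        snapped.append(new_row)
    return snapped
```

In this toy version the planar likelihood acts as both the fitting weight and the blending factor, so a pixel that is unlikely to be planar neither influences the plane nor gets moved by it. The DCRF described above handles many planes, boundaries, and normals jointly, which this sketch does not attempt.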

Figure 1: Framework of SURGE system. (a) We include surface regularization in geometry estimation through DCRF, and enable joint learning with CNN, which largely improves the visual quality (b).

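Figure 1: Framework of the surface regularized geometry estimation (SURGE) system. (a) We build surface regularization into geometry estimation through the DCRF and let it be learned jointly with the CNN, which largely improves the visual quality shown in (b).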

We show that our DCRF is differentiable with respect to depth and surface normal, and allows back-propagation to the depth and normal CNNs during training. We demonstrate that the proposed approach shows relative improvement over the base CNNs for both depth and surface normal prediction on the NYU v2 dataset using the standard evaluation criteria, and is significantly better when evaluated using our proposed plane-wise criteria.

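We show that our DCRF is differentiable with respect to depth and surface normal, which allows gradients to be backpropagated to the depth and normal CNNs during training. On the NYU v2 dataset, the proposed method improves on the base CNNs for both depth prediction and surface normal prediction under the standard evaluation criteria, and the improvement is markedly larger under our proposed plane-wise criteria.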

posted @ 2021-06-18 11:48  ProfSnail