A Point Set Generation Network for 3D Object Reconstruction from a Single Image: Translation of the Abstract and Introduction

A Point Set Generation Network for 3D Object Reconstruction from a Single Image




       Generation of 3D data by deep neural networks has been attracting increasing attention in the research community. The majority of extant works resort to regular representations such as volumetric grids or collections of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations, and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. Along with this problem arises a unique and interesting issue, that the ground truth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in the ground truth, we design an architecture, loss function, and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments, not only does our system outperform state-of-the-art methods on single-image-based 3D reconstruction benchmarks, but it also shows strong performance for 3D shape completion and a promising ability to make multiple plausible predictions.


1.     Introduction


        As we try to duplicate the success of current deep convolutional architectures in the 3D domain, we face a fundamental representational issue. Extant deep net architectures for both discriminative and generative learning in the signal domain are well-suited to data that is regularly sampled, such as images, audio, or video. However, most common 3D geometry representations, such as 2D meshes or point clouds, are not regular structures and do not easily fit into architectures that exploit such regularity for weight sharing, etc. That is why the majority of extant works on using deep nets for 3D data resort to either volumetric grids or collections of images (2D views of the geometry). Such representations, however, lead to difficult trade-offs between sampling resolution and net efficiency. Furthermore, they enshrine quantization artifacts that obscure natural invariances of the data under rigid motions, etc.
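The resolution trade-off can be made concrete with a little arithmetic. The sketch below only counts storage cells; the grid resolutions and the 1024-point cloud size are illustrative assumptions, not figures taken from the text above.

```python
def voxel_cells(resolution: int) -> int:
    """Number of cells in a cubic occupancy grid with `resolution` cells per axis."""
    return resolution ** 3


def point_cloud_values(num_points: int) -> int:
    """Number of scalars needed to store `num_points` (x, y, z) coordinates."""
    return 3 * num_points


if __name__ == "__main__":
    # Illustrative resolutions (assumed): doubling the resolution
    # multiplies the number of cells by eight.
    for r in (32, 64, 128):
        print(f"{r}^3 grid: {voxel_cells(r):,} cells")
    # Illustrative point count (assumed).
    print(f"1024-point cloud: {point_cloud_values(1024):,} values")
```

The cubic growth of the grid is what forces the choice between fine sampling and an efficient network, whereas a point set grows only linearly with the number of points.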


        In this paper we address the problem of generating the 3D geometry of an object based on a single image of that object. We explore generative networks for 3D geometry based on a point cloud representation. A point cloud representation may not be as efficient in representing the underlying continuous 3D geometry as a CAD model using geometric primitives or even a simple mesh, but for our purposes it has many advantages. A point cloud is a simple, uniform structure that is easier to learn, as it does not have to encode multiple primitives or combinatorial connectivity patterns. In addition, a point cloud allows simple manipulation when it comes to geometric transformations and deformations, as connectivity does not have to be updated. Our pipeline infers the point positions in a 3D frame determined by the input image and the inferred viewpoint position.
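To illustrate why transformations are simple on a point cloud, here is a minimal sketch in plain Python. The helper names `rotate_z` and `translate` are hypothetical and chosen only for this example. Each point is transformed independently, and there is no connectivity structure to keep in sync.

```python
import math


def rotate_z(points, angle):
    """Rotate every (x, y, z) point about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def translate(points, offset):
    """Shift every point by the same (dx, dy, dz) offset."""
    dx, dy, dz = offset
    return [(x + dx, y + dy, z + dz) for x, y, z in points]


if __name__ == "__main__":
    cloud = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    # A rigid motion is a rotation followed by a translation.
    moved = translate(rotate_z(cloud, math.pi / 2), (0.0, 0.0, 2.0))
    for p in moved:
        print(tuple(round(v, 6) for v in p))
```

A mesh would need the same per-vertex update, but a deformation that changes topology would also require rebuilding faces; the point set has nothing further to update.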


        Given this unorthodox network output, one of our challenges is how to measure loss during training, as the same geometry may admit different point cloud representations at the same degree of approximation. Unlike the usual L2 type losses, we use the solution of a transportation problem based on the Earth Mover’s distance (EMD), effectively solving an assignment problem. We exploit an approximation to the EMD to provide speed as well as ensure differentiability for end-to-end training.

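The difference between an index-wise L2 loss and an assignment-based distance can be seen on a toy example. The sketch below assumes two point sets of equal size and computes the EMD exactly by trying every one-to-one matching. This brute-force search is only a stand-in for illustration: it is factorial in the number of points and is not the fast, differentiable approximation mentioned above. The function names are hypothetical.

```python
import itertools
import math


def squared_l2_ordered(a, b):
    """Sum of squared distances between points paired by list index."""
    return sum(math.dist(p, q) ** 2 for p, q in zip(a, b))


def emd_bruteforce(a, b):
    """Exact EMD for equal-size sets: the cheapest one-to-one matching.

    Tries every permutation, so it is only usable for a handful of points.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes point sets of equal size")
    return min(
        sum(math.dist(p, b[j]) for p, j in zip(a, perm))
        for perm in itertools.permutations(range(len(b)))
    )


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    # Same geometry, different point order.
    reordered = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print("index-wise L2:", squared_l2_ordered(cloud, reordered))
    print("EMD:", emd_bruteforce(cloud, reordered))
```

The index-wise loss penalizes the reordered set even though it describes the same shape, while the matching-based distance is zero, which is the property the loss needs when the output is an unordered point set.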

        Our approach effectively attempts to solve the ill-posed problem of 3D structure recovery from a single projection using certain learned priors. The network has to estimate depth for the visible parts of the image and hallucinate the rest of the object geometry, assessing the plausibility of several different completions. From a statistical perspective, it would be ideal if we can fully characterize the landscape of the ground truth space, or be able to sample plausible candidates accordingly. If we view this as a regression problem, then it has a rather unique and interesting feature arising from inherent object ambiguities in certain views. These are situations where there are multiple, equally good 3D reconstructions of a 2D image, making our problem very different from classical regression/classification settings, where each training sample has a unique ground truth annotation. In such settings the proper loss definition can be crucial in getting the most meaningful result.


       Our final algorithm is a conditional sampler, which samples plausible 3D point clouds from the estimated ground truth space given an input image. Experiments on both synthetic and real world data verify the effectiveness of our method. Our contributions can be summarized as follows:


•  We use deep learning techniques to study the point set generation problem;

•  On the task of 3D reconstruction from a single image, we apply our point set generation network and significantly outperform the state of the art;

•  We systematically explore issues in the architecture and loss function design for the point generation network;

•  We discuss and address the ground-truth ambiguity issue for the task of 3D reconstruction from a single image.

Source and code demonstrating our system can be obtained from https://github.com/fanhqme/PointSetGeneration.


posted @ 2021-06-18 11:37  ProfSnail