Learning Shape Abstractions by Assembling Volumetric Primitives: Translation of the Abstract and Introduction

Learning Shape Abstractions by Assembling Volumetric Primitives

Learning Abstractions of Shapes Using Assembled Volumetric Primitives

Abstract

       We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives. In addition to generating simple and geometrically interpretable explanations of 3D objects, our framework also allows us to automatically discover and exploit consistent structure in the data. We demonstrate that using our method allows predicting shape representations which can be leveraged for obtaining a consistent parsing across the instances of a shape collection and constructing an interpretable shape similarity measure. We also examine applications for image-based prediction as well as shape manipulation.

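       We propose a learning framework for abstracting complex shapes that works by learning to assemble objects from 3D volumetric primitives. Besides producing simple, geometrically interpretable explanations of 3D objects, the framework also lets us automatically discover and exploit consistent structure in the data. We show that the method predicts shape representations that serve two purposes: obtaining a consistent parsing across the instances of a shape collection, and constructing an interpretable shape similarity measure. We also examine applications to image-based prediction and shape manipulation.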

  1. Introduction

“Treat nature by means of the cylinder, the sphere, the cone, everything brought into proper perspective” -- Paul Cezanne

“Treat nature using the cylinder, the sphere, the cone, and everything that brings a proper sense of perspective.” -- Paul Cezanne

       Cezanne’s insight that an object can be conceived as assembled from a set of volumetric primitives has resurfaced multiple times in the vision and graphics literature. In computer vision, generalized cylinders were introduced by Binford back in 1971, where a cross-sectional area is swept along a straight or curved axis while possibly being shrunk or expanded during the process [3]. One of the key motivations was parsimony of description – an object could be described by relatively few generalized cylinders, each of which in turn requires only a few parameters. Volumetric primitives remained popular through the 1990s as they provided a coherent framework for explaining shape inference from a single image, perceptual organization, as well as recognition of a 3D object from 2D views. However, fitting generalized cylinders to image data required considerable hand crafting, and as machine learning techniques for object recognition came to the fore in the 1990s, this paradigm faded from the main stage.

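       Cezanne's idea that an object can be viewed as an assembly of volumetric primitives has surfaced repeatedly in the vision and graphics literature. In computer vision, Binford introduced generalized cylinders in 1971: a cross-sectional area is swept along a straight or curved axis and may shrink or expand along the way [3]. A key motivation was economy of description: an object can be described by relatively few generalized cylinders, each of which needs only a few parameters. Volumetric primitives stayed popular into the 1990s because they offered a coherent framework for explaining shape inference from a single image, perceptual organization, and recognition of 3D objects from 2D views. However, fitting generalized cylinders to image data took a great deal of hand crafting, and as machine learning techniques for object recognition rose to prominence in the 1990s, the paradigm left the mainstream of research.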

       Of course, finding parsimonious explanations for complex phenomena lies at the core of learning-based visual understanding. Indeed, machine learning is only possible because our visual world, despite its enormous complexity, is also highly structured – visual patterns don’t just happen once, but keep on repeating in various configurations. In contemporary computer vision, this structure is most often modeled via human supervision: the repeating patterns are labeled as objects or object parts, and supervised learning methods are employed to find and name them in novel imagery. However, it would seem more satisfying if complex structures could be explained in terms of simpler underlying structures.

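       Of course, finding concise explanations behind complex phenomena is at the heart of learning-based visual understanding. In fact, machine learning is possible only because our visual world, for all its enormous complexity, is also highly structured: visual patterns do not occur just once but recur in many different configurations. In contemporary computer vision this structure is most often modeled through human supervision: recurring patterns are labeled as objects or object parts, and supervised learning methods are used to find and name them in new imagery. Still, it would be more satisfying if complex structures could be explained in terms of simpler underlying ones.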

In this paper we return to the classic problem of explaining objects with volumetric primitives, but using the modern tools of unsupervised learning and convolutional neural networks (CNNs). We choose the simplest possible primitives, rigidly transformed cuboids, and show how deep convolutional networks can be trained to assemble arbitrary 3D objects out of them (at some level of approximation). The main reason we succeed where the classic approaches failed is because we aim to explain the entire dataset of 3D objects jointly, allowing us to learn the common 3D patterns directly from the data.

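       In this paper we revisit the classic problem of explaining objects with volumetric primitives, but with the modern tools of unsupervised learning and convolutional neural networks (CNNs). We pick the simplest possible primitives, rigidly transformed cuboids, and show how deep convolutional networks can be trained to assemble arbitrary 3D objects from them (up to some level of approximation). The main reason we succeed where traditional methods failed is that we set out to explain the whole dataset of 3D objects jointly, which lets us learn common 3D patterns directly from the data.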
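
To make the phrase "rigidly transformed cuboids" concrete, here is a minimal Python sketch, not the authors' implementation. It assumes each cuboid is described by three half-extents, a unit quaternion for rotation, and a translation vector (10 numbers in total). This parameterization and all function names (quat_to_rotmat, cuboid_contains, assembly_contains) are illustrative assumptions rather than anything stated in the text above. An assembled shape is treated here as the union of its cuboids.

```python
import numpy as np


def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def cuboid_contains(points, half_extents, quat, translation):
    """Return a boolean mask: which world-space points lie inside the cuboid.

    Assumed convention: world = R @ local + t, so local = R^T @ (world - t).
    """
    rot = quat_to_rotmat(quat)
    shifted = np.asarray(points, dtype=float) - np.asarray(translation, dtype=float)
    local = shifted @ rot  # row-vector form of R^T @ (p - t)
    return np.all(np.abs(local) <= np.asarray(half_extents, dtype=float), axis=1)


def assembly_contains(points, cuboids):
    """A point belongs to the assembled shape if any cuboid contains it."""
    masks = [cuboid_contains(points, *c) for c in cuboids]
    return np.any(masks, axis=0)


# Hypothetical example: one cuboid rotated 90 degrees about z and shifted along x.
half = np.sqrt(0.5)
cuboid = ((1.0, 0.5, 0.25), (half, 0.0, 0.0, half), (2.0, 0.0, 0.0))
query = np.array([[2.0, 0.9, 0.0], [2.9, 0.0, 0.0]])
inside = assembly_contains(query, [cuboid])
```

After the rotation, the cuboid's long axis points along the world y direction, so the first query point falls inside and the second falls outside.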

       While the representation of the 3D object shapes, e.g. as meshes or voxel occupancies, is typically complex and high-dimensional, the resulting explanation in terms of basic primitives is parsimonious, with a small number of parameters. As examples of their applicability, we leverage the primitive-based representation for various tasks, e.g. part discovery, image-based abstraction, shape manipulation, etc. Here we do not wish to reprise the classic debates on the value of volumetric primitives – while they were oversold in the 70s and 80s, they suffer from complete neglect now, and we hope that this demonstration of feasibility of learning how to assemble an object from volumetric primitives will reignite interest. Code is available at https://shubhtuls.github.io/volumetricPrimitives.

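       Although the usual representation of a 3D object shape (for example a mesh or voxel occupancies) is typically complex and high-dimensional, the resulting explanation in terms of basic primitives is compact and needs only a small number of parameters. To illustrate its usefulness, we apply the primitive-based representation to several tasks, such as part discovery, image-based abstraction, and shape manipulation. We do not want to replay the classic debates over the value of volumetric primitives here: they were oversold in the 1970s and 1980s and are completely neglected today, and we hope that this demonstration that assembling an object from volumetric primitives can be learned will rekindle interest in the area. The code is open source; see https://shubhtuls.github.io/volumetricPrimitives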
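
The contrast between a high-dimensional raw representation and a parsimonious primitive-based one can be illustrated with a quick count. The numbers below are hypothetical and chosen only for illustration: a 32x32x32 occupancy grid, 20 cuboids, and 10 parameters per cuboid (assuming 3 for size, 4 for a rotation quaternion, 3 for translation). None of these figures come from the text above.

```python
def voxel_value_count(resolution):
    """Number of occupancy values in a cubic voxel grid."""
    return resolution ** 3


def primitive_param_count(num_primitives, params_per_primitive=10):
    """Total parameters for a set of primitives (10 per cuboid is an assumption)."""
    return num_primitives * params_per_primitive


voxels = voxel_value_count(32)        # hypothetical grid resolution
params = primitive_param_count(20)    # hypothetical number of cuboids
ratio = voxels / params
print(f"{voxels} voxel values vs {params} primitive parameters ({ratio:.1f}x fewer)")
```

Under these assumed sizes the primitive description is more than two orders of magnitude smaller than the voxel grid, which is the sense in which the explanation is parsimonious.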

posted @ 2021-06-18 11:46  ProfSnail