RoomNet: End-to-End Room Layout Estimation (Abstract and Introduction, Translated)

RoomNet: End-to-End Room Layout Estimation

Abstract

This paper focuses on the task of room layout estimation from a monocular RGB image. Prior works break the problem into two sub-tasks: semantic segmentation of the floor, walls, and ceiling to produce layout hypotheses, followed by an iterative optimization step to rank these hypotheses.

In contrast, we adopt a more direct formulation of this problem as one of estimating an ordered set of room layout keypoints. The room layout and the corresponding segmentation are completely specified given the locations of these ordered keypoints. We predict the locations of the room layout keypoints using RoomNet, an end-to-end trainable encoder-decoder network. On the challenging benchmark datasets Hedau and LSUN, we achieve state-of-the-art performance along with a 200× to 600× speedup compared to the most recent work. Additionally, we present optional extensions to the RoomNet architecture, such as including recurrent computations and memory units, to refine the keypoint locations under the same parametric capacity.

1. Introduction

Room layout estimation from a monocular image, which aims to delineate a 2D boxy representation of an indoor scene, is an essential step for a wide variety of computer vision tasks, and has recently received great attention from several applications. These include indoor navigation [29], scene reconstruction/rendering [19], and augmented reality [46, 25, 10].

The field of room layout estimation has been primarily focused on using bottom-up image features such as local color, texture, and edge cues, followed by vanishing point detection. A separate post-processing stage is used to clean up feature outliers and to generate and rank a large set of room layout hypotheses with structured SVMs or conditional random fields (CRFs) [15, 11, 16, 36, 49]. In principle, the 3D reconstruction of the room can be obtained (up to scale) with knowledge of the 2D layout and the vanishing points. However, in practice, the accuracy of the final layout prediction often depends largely on the quality of the extracted low-level image features, which is itself susceptible to local noise, scene clutter, and occlusion.

Recently, with the rapid advances in deep convolutional neural networks (CNNs) for semantic segmentation [5, 27, 32, 2], researchers have been exploring the possibility of using such CNNs for room layout estimation. More specifically, Mallya et al. [28] first train a fully convolutional network (FCN) [27] model to produce “informative edge maps” that replace hand-engineered low-level image feature extraction. The predicted edge maps are then used to sample vanishing lines for layout hypothesis generation and ranking. Dasgupta et al. [7] use the FCN to learn semantic surface labels such as left wall, front wall, right wall, ceiling, and ground. Connected components and hole-filling techniques are used to refine the raw per-pixel prediction of the FCN, followed by the classic vanishing point/line sampling methods to produce room layouts. However, despite the improved results, these methods use CNNs to generate a new set of “low-level” features and fall short of exploiting the end-to-end learning ability of CNNs. In other words, the raw CNN predictions need to be post-processed by an expensive hypothesis testing stage to produce the final layout. For example, the pipeline of Dasgupta et al. [7] takes 30 seconds to process each frame.

In this work, we address the problem top-down by directly training CNNs to infer both the room layout corners (keypoints) and the room type. Once the room type is inferred and the corresponding set of ordered keypoints is localized, we can connect them in a specific order to obtain the 2D spatial room layout. The proposed method, RoomNet, is direct and simple, as illustrated in Figure 1: the network takes an input image of size 320 × 320, processes it through a convolutional encoder-decoder architecture, extracts a set of room layout keypoints, and then simply connects the obtained keypoints in a specific order to draw a room layout. The semantic segmentation of the layout surfaces follows directly from this connectivity.
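The "connect the keypoints in a specific order" step can be pictured with a small sketch. It is only an illustration, not the authors' code: the room type name `two_walls_and_floor`, its connectivity table, and the function `connect_keypoints` are all hypothetical, and the actual room-type definitions used by RoomNet are not reproduced here.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

# Hypothetical connectivity table: for each room type, the pairs of
# keypoint indices that are joined by a layout edge. These entries are
# made up for illustration and are NOT the paper's definitions.
CONNECTIVITY: Dict[str, List[Tuple[int, int]]] = {
    # 0: top of the wall-wall edge, 1: wall-wall-floor junction,
    # 2: floor line at the left border, 3: floor line at the right border
    "two_walls_and_floor": [(0, 1), (1, 2), (1, 3)],
}


def connect_keypoints(room_type: str, keypoints: List[Point]) -> List[Segment]:
    """Turn an ordered keypoint list into layout line segments."""
    edges = CONNECTIVITY[room_type]
    # The ordered keypoint list must cover every index the table uses.
    needed = 1 + max(i for edge in edges for i in edge)
    if len(keypoints) != needed:
        raise ValueError(f"{room_type} expects {needed} keypoints")
    return [(keypoints[i], keypoints[j]) for i, j in edges]


# Example on a 320 x 320 image, in (x, y) pixel coordinates.
segments = connect_keypoints(
    "two_walls_and_floor",
    [(160.0, 0.0), (160.0, 200.0), (0.0, 260.0), (319.0, 260.0)],
)
for start, end in segments:
    print(start, "->", end)
```

Because the order of the keypoints fixes which points are joined, the room type together with the keypoint locations is enough to draw the layout, and the regions enclosed by the segments give the surface segmentation.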

Overall, we make several contributions in this paper: (1) a reformulation of the task of room layout estimation as a keypoint localization problem that can be directly addressed using CNNs, (2) a custom-designed convolutional encoder-decoder network, RoomNet, for parametrically efficient and effective joint keypoint regression and room layout type classification, and (3) state-of-the-art performance on the challenging benchmarks Hedau [15] and LSUN [50], along with a 200× to 600× speedup compared to the most recent work.
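As a rough picture of what joint keypoint regression and room type classification might look like at inference time, the sketch below decodes two outputs: a vector of room-type scores and a stack of keypoint heatmaps. It rests on assumptions not stated in this excerpt: that each room type owns a contiguous slice of heatmap channels and that a keypoint is read off as the peak of its heatmap. The names `decode_layout` and `channel_slices` are hypothetical.

```python
from typing import List, Sequence, Tuple

Heatmap = Sequence[Sequence[float]]  # rows of pixel scores, indexed [y][x]


def decode_layout(
    class_scores: Sequence[float],
    heatmaps: Sequence[Heatmap],
    channel_slices: Sequence[Tuple[int, int]],
) -> Tuple[int, List[Tuple[int, int]]]:
    """Pick the best-scoring room type, then read its keypoints.

    channel_slices[t] = (start, stop) is the assumed range of heatmap
    channels that belongs to room type t.
    """
    # Classification head: room type with the highest score.
    room_type = max(range(len(class_scores)), key=lambda i: class_scores[i])
    start, stop = channel_slices[room_type]

    keypoints = []
    for heatmap in heatmaps[start:stop]:
        # Regression head: the peak of each heatmap is taken as the keypoint.
        _, x, y = max(
            ((value, x, y)
             for y, row in enumerate(heatmap)
             for x, value in enumerate(row)),
            key=lambda item: item[0],
        )
        keypoints.append((x, y))
    return room_type, keypoints


# Toy example: 2 room types, 3 heatmap channels of size 3 x 3.
toy_heatmaps = [
    [[0.1, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # owned by type 0
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.9], [0.0, 0.0, 0.0]],  # owned by type 1
    [[0.7, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # owned by type 1
]
result = decode_layout([0.2, 0.8], toy_heatmaps, [(0, 1), (1, 3)])
print(result)
```

Only the heatmaps belonging to the predicted room type are read; the other channels are ignored in this sketch.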

 

Figure 1. (a) Typical multi-step pipeline for room layout estimation. (b) Room layout estimation with RoomNet is direct and simple: run RoomNet, extract a set of room layout keypoints, and connect the keypoints in a specific order to obtain the layout.

posted @ 2021-06-18 11:44  ProfSnail