[Paper Quick Read] XiangBai_CVPR2018_Rotation-Sensitive Regression for Oriented Scene Text Detection


Authors and code

Keywords

Text detection, multi-oriented, SSD, $$xywh\theta$$, one-stage, open source

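Method highlights

The core idea is that classification is insensitive to rotation, whereas regression is sensitive to it, so the two tasks should not use the same features. The authors therefore take a rotation-based CNN approach: the features are first computed at several rotation angles, and these rotation-sensitive features are used for box regression. For classification, oriented response pooling is applied across the orientation responses, which makes the classification features insensitive to rotation.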

Text coordinates are sensitive to text orientation. Therefore, the regression of coordinate offsets should be performed on rotation-sensitive features.

In contrast to regression, the classification of text presence should be rotation-invariant, i.e., text regions of arbitrary orientations should be classified as positive.
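A small numeric illustration of the two statements above, offered as a sketch rather than anything taken from the paper: rotating a text quadrilateral about its centre changes every corner coordinate (the regression target), while the text/non-text label stays the same. The helper name `rotate_quad` and the example box are assumptions chosen only for illustration.

```python
import numpy as np

def rotate_quad(quad, angle_deg):
    """Rotate a (4, 2) array of corner points about its centroid."""
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    center = quad.mean(axis=0)
    return (quad - center) @ rot.T + center

# Hypothetical horizontal text box: 100 wide, 20 high.
quad = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 20.0], [0.0, 20.0]])
label = 1  # "text"; the label does not depend on orientation

rotated = rotate_quad(quad, 30.0)
shift = np.abs(rotated - quad).max()  # largest change in any coordinate

print(rotated.round(2))
print("max coordinate change:", round(float(shift), 2), "label:", label)
```

The coordinates move by tens of pixels for a 30 degree rotation, so a regressor has to see orientation-dependent features, whereas the classifier's target is unchanged.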

Figure 1: Visualization of feature maps and results of baseline and RRD. Red numbers are the classification scores. (b): the shared feature map for both regression and classification; (c): the result of shared feature; (d) and (e): the regression feature map and classification feature map of RRD; (f): the result of RRD.

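First use of Oriented Response Convolution for text detection.

Method overview

This method is a modification of SSD. In addition to changing the output to predict the coordinate offsets of four corner points so that inclined text can be detected, it uses ORN (Oriented Response Networks) to extract rotation-sensitive text features, and then adds max pooling to the classification branch to extract rotation-insensitive features for classification.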
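The four-point output can be illustrated with a minimal decoding sketch. These notes do not spell out the exact parameterization, so the encoding below is an assumption: each corner offset is taken relative to the matching corner of a horizontal default box and normalized by that box's width and height. The function name `decode_quad` and the numbers are hypothetical.

```python
import numpy as np

def decode_quad(default_box, offsets):
    """Decode four corner points from a default box and predicted offsets.

    default_box: (cx, cy, w, h) of a horizontal default box.
    offsets: (4, 2) array of (dx, dy) per corner, ASSUMED to be
             normalized by the default box width and height.
    """
    cx, cy, w, h = default_box
    # Default box corners, clockwise from top-left in image
    # coordinates (y pointing down).
    corners = np.array([[cx - w / 2, cy - h / 2],
                        [cx + w / 2, cy - h / 2],
                        [cx + w / 2, cy + h / 2],
                        [cx - w / 2, cy + h / 2]])
    return corners + np.asarray(offsets) * np.array([w, h])

# Hypothetical default box and network output.
box = (50.0, 50.0, 40.0, 20.0)
offsets = np.array([[0.1, -0.2], [0.0, 0.3], [-0.1, 0.2], [0.0, -0.3]])

print(decode_quad(box, offsets))
```

Because each corner moves independently, the decoded shape is a general quadrilateral, which is what lets a horizontal default box cover inclined text.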

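Method details

Network architecture

The network is adapted from SSD. The difference is that the side connections from multiple layers, which are ordinary convolutions in SSD, are replaced here with RSR modules. Each RSR module has two parts. The first part replaces the convolution with oriented convolutions in several different directions. The second part performs prediction and consists of two branches, regression and classification. The classification branch differs in that it has an additional oriented response pooling layer.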

Figure 2: Architecture of RRD. (a) The rotation-sensitive backbone follows the main architecture of SSD while changing its convolution into oriented response convolution. (b) The outputs of rotation-sensitive backbone are rotation-sensitive feature maps, followed by two branches: one for regression and another for classification based on oriented response pooling. Note that the inception block is optional.
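The two parts described above can be sketched in NumPy under strong simplifying assumptions: a single-channel feature map, one filter, and only four orientations obtained with exact 90 degree rotations (`np.rot90`). This is not ORN's implementation; the helper names `correlate_valid` and `oriented_responses` are made up for this illustration.

```python
import numpy as np

def correlate_valid(x, k):
    """Plain 'valid' 2-D cross-correlation, written out with loops."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def oriented_responses(x, k):
    """Responses of k rotated by 0, 90, 180, 270 degrees: shape (4, H', W')."""
    return np.stack([correlate_valid(x, np.rot90(k, r)) for r in range(4)])

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))   # toy single-channel feature map
k = rng.standard_normal((3, 3))   # toy filter

feat = oriented_responses(x, k)                 # rotation-sensitive features
feat_rot = oriented_responses(np.rot90(x), k)   # same filter bank, rotated input

# Regression branch: keeps every orientation channel.
reg_feat = feat
# Classification branch: oriented response pooling, i.e. max over orientations.
cls_feat, cls_feat_rot = feat.max(axis=0), feat_rot.max(axis=0)

# Rotating the input shifts the orientation channels by one and rotates each map,
# so the regression features change with orientation ...
print(np.allclose(feat_rot, np.rot90(np.roll(feat, 1, axis=0), axes=(1, 2))))
# ... while the pooled map is only rotated spatially, with identical values.
print(np.allclose(cls_feat_rot, np.rot90(cls_feat)))
```

Both checks should print `True`. In this toy setting the orientation axis carries the information about how the input is rotated, which the regression branch can use, and the max over that axis discards it for the classification branch.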