Object Detection -- Faster RCNN
Faster RCNN
-
Stage 1:
images ---> backbone ---> rpn
rpn_locs: [b, 22500, 4], rpn_scores: [b, 22500, 2] (22500 = 50 * 50 * 9)
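A minimal sketch of an RPN head that produces these two tensors, assuming a 512-channel 50x50 feature map and 9 anchors per location (the class name `RegionProposalHead` and the layer sizes are illustrative, not this repo's actual code):

```python
import torch
import torch.nn as nn

class RegionProposalHead(nn.Module):
    """Sketch of an RPN head: 3x3 conv + two 1x1 convs for offsets and scores."""
    def __init__(self, in_channels=512, mid_channels=512, n_anchor=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, mid_channels, 3, padding=1)
        self.loc = nn.Conv2d(mid_channels, n_anchor * 4, 1)    # per-anchor box offsets
        self.score = nn.Conv2d(mid_channels, n_anchor * 2, 1)  # per-anchor bg/fg scores

    def forward(self, x):
        b = x.shape[0]
        h = torch.relu(self.conv(x))
        # [b, 9*4, 50, 50] -> [b, 50*50*9, 4]
        rpn_locs = self.loc(h).permute(0, 2, 3, 1).contiguous().view(b, -1, 4)
        # [b, 9*2, 50, 50] -> [b, 50*50*9, 2]
        rpn_scores = self.score(h).permute(0, 2, 3, 1).contiguous().view(b, -1, 2)
        return rpn_locs, rpn_scores

feature = torch.randn(1, 512, 50, 50)       # assumed backbone output for a 600x600 input
rpn_locs, rpn_scores = RegionProposalHead()(feature)
print(rpn_locs.shape, rpn_scores.shape)     # torch.Size([1, 22500, 4]) torch.Size([1, 22500, 2])
```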
-
Generate training targets from the ground-truth data
anchors (priors): [22500, 4]; ground-truth boxes: [m, 4]
gt_rpn_loc: use IoU to find the best-matching ground-truth box for each anchor, then compute the offsets (dx, dy, dw, dh) between each anchor and its matched box; shape [22500, 4]
gt_rpn_label: use IoU thresholds to mark each anchor as positive, negative, or ignored (1, 0, -1); shape [22500]
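A hedged sketch of how gt_rpn_loc / gt_rpn_label can be built from the anchors and ground-truth boxes; the 0.7 / 0.3 thresholds, the (x1, y1, x2, y2) box order, and the helper names are assumptions based on common implementations:

```python
import numpy as np

def bbox_iou(a, b):
    """IoU between boxes a [N, 4] and b [M, 4], boxes as (x1, y1, x2, y2)."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.prod(np.clip(br - tl, 0, None), axis=2)
    area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
    area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def bbox2loc(src, dst):
    """Encode the offsets (dx, dy, dw, dh) that map src boxes onto dst boxes."""
    sw, sh = src[:, 2] - src[:, 0], src[:, 3] - src[:, 1]
    sx, sy = src[:, 0] + 0.5 * sw, src[:, 1] + 0.5 * sh
    dw, dh = dst[:, 2] - dst[:, 0], dst[:, 3] - dst[:, 1]
    dx, dy = dst[:, 0] + 0.5 * dw, dst[:, 1] + 0.5 * dh
    return np.stack([(dx - sx) / sw, (dy - sy) / sh,
                     np.log(dw / sw), np.log(dh / sh)], axis=1)

def anchor_targets(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Build gt_rpn_loc [22500, 4] and gt_rpn_label [22500] from anchors and gt boxes."""
    iou = bbox_iou(anchors, gt_boxes)                      # [22500, m]
    argmax = iou.argmax(axis=1)                            # best gt box for each anchor
    max_iou = iou[np.arange(len(anchors)), argmax]
    gt_rpn_loc = bbox2loc(anchors, gt_boxes[argmax])       # offsets to the matched gt box
    gt_rpn_label = -np.ones(len(anchors), dtype=np.int64)  # -1 = ignored
    gt_rpn_label[max_iou < neg_thresh] = 0                 # negative
    gt_rpn_label[max_iou >= pos_thresh] = 1                # positive
    gt_rpn_label[iou.argmax(axis=0)] = 1                   # best anchor of each gt is positive
    # (the real AnchorTargetCreator also subsamples to ~256 labelled anchors)
    return gt_rpn_loc, gt_rpn_label
```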
-
Compute the RPN losses
rpn_loc_loss (box regression loss): take gt_rpn_loc and rpn_locs at the positions where gt_rpn_label > 0 and apply _smooth_l1_loss
rpn_cls_loss (classification loss): cross-entropy between rpn_scores and gt_rpn_label
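A minimal sketch of these two RPN losses for a single image, assuming -1 marks ignored anchors; sigma=3 and the normalization follow the common simple-faster-rcnn style and should be treated as assumptions:

```python
import torch
import torch.nn.functional as F

def _smooth_l1_loss(x, t, in_weight, sigma):
    """Smooth L1, summed only where in_weight is 1."""
    sigma2 = sigma ** 2
    diff = in_weight * (x - t)
    abs_diff = diff.abs()
    flag = (abs_diff < 1.0 / sigma2).float()
    return (flag * 0.5 * sigma2 * diff ** 2 + (1 - flag) * (abs_diff - 0.5 / sigma2)).sum()

def rpn_losses(rpn_loc, rpn_score, gt_rpn_loc, gt_rpn_label, rpn_sigma=3.0):
    """rpn_loc [22500, 4], rpn_score [22500, 2] for one image; gt_rpn_label in {-1, 0, 1}."""
    # regression loss only on positive anchors (gt_rpn_label > 0)
    in_weight = torch.zeros_like(gt_rpn_loc)
    in_weight[(gt_rpn_label > 0).view(-1, 1).expand_as(in_weight)] = 1
    rpn_loc_loss = _smooth_l1_loss(rpn_loc, gt_rpn_loc, in_weight, rpn_sigma)
    rpn_loc_loss /= (gt_rpn_label >= 0).sum().float()   # normalize by non-ignored anchors
    # classification loss; ignore_index drops the -1 anchors
    rpn_cls_loss = F.cross_entropy(rpn_score, gt_rpn_label, ignore_index=-1)
    return rpn_loc_loss, rpn_cls_loss
```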
rois: filter out boxes that are too small (by width/height), sort by rpn_scores, and apply NMS (nms_thresh is configurable); keep n_post_nms proposals, giving [600, 4] per image in the batch
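A hedged sketch of this proposal-selection step using torchvision's nms; min_size, n_pre_nms, and the (x1, y1, x2, y2) order are assumptions based on typical implementations:

```python
import torch
from torchvision.ops import nms

def select_proposals(roi, fg_score, nms_thresh=0.7, n_pre_nms=12000,
                     n_post_nms=600, min_size=16):
    """roi: [22500, 4] decoded anchors as (x1, y1, x2, y2); fg_score: [22500]."""
    # 1) drop boxes whose width or height is too small
    keep = ((roi[:, 2] - roi[:, 0]) >= min_size) & ((roi[:, 3] - roi[:, 1]) >= min_size)
    roi, fg_score = roi[keep], fg_score[keep]
    # 2) keep the top n_pre_nms boxes ordered by foreground score
    order = fg_score.argsort(descending=True)[:n_pre_nms]
    roi, fg_score = roi[order], fg_score[order]
    # 3) NMS, then keep the first n_post_nms survivors
    keep = nms(roi, fg_score, nms_thresh)[:n_post_nms]
    return roi[keep]                                    # [<=600, 4]
```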
-
-
Stage 2
rpn --> head
-
Generate training targets from the ground-truth data and the proposals
sample_roi, gt_roi_loc, gt_roi_label = self.proposal_target_creator(roi, bbox, label, self.loc_normalize_mean, self.loc_normalize_std)
class ProposalTargetCreator(object):
def __init__(self, n_sample=128, pos_ratio=0.5, pos_iou_thresh=0.5, neg_iou_thresh_high=0.5, neg_iou_thresh_low=0):
sample_roi [128, 4], in box (corner) form
gt_roi_loc [128, 4], offsets
gt_roi_label [128], class labels
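A hedged sketch of the sampling done inside ProposalTargetCreator, reusing the bbox_iou / bbox2loc helpers sketched above; the exact sampling details are assumptions based on common implementations:

```python
import numpy as np

def sample_proposal_targets(roi, gt_bbox, gt_label, n_sample=128, pos_ratio=0.5,
                            pos_iou_thresh=0.5, neg_iou_thresh_high=0.5,
                            neg_iou_thresh_low=0.0):
    """Sample ~128 rois and build their class / offset targets (uses bbox_iou, bbox2loc above)."""
    roi = np.concatenate((roi, gt_bbox), axis=0)        # gt boxes also act as candidates
    iou = bbox_iou(roi, gt_bbox)                        # [R, m]
    gt_assignment = iou.argmax(axis=1)
    max_iou = iou.max(axis=1)
    label = gt_label[gt_assignment] + 1                 # shift classes by +1; 0 = background

    pos_index = np.where(max_iou >= pos_iou_thresh)[0]
    np.random.shuffle(pos_index)
    pos_index = pos_index[:int(round(n_sample * pos_ratio))]

    neg_index = np.where((max_iou < neg_iou_thresh_high) & (max_iou >= neg_iou_thresh_low))[0]
    np.random.shuffle(neg_index)
    neg_index = neg_index[:n_sample - len(pos_index)]

    keep = np.append(pos_index, neg_index)
    sample_roi = roi[keep]                              # [128, 4], box form
    gt_roi_label = label[keep]
    gt_roi_label[len(pos_index):] = 0                   # negatives become background
    gt_roi_loc = bbox2loc(sample_roi, gt_bbox[gt_assignment[keep]])   # [128, 4], offsets
    # (the real implementation also normalizes gt_roi_loc with loc_normalize_mean / std)
    return sample_roi, gt_roi_loc, gt_roi_label
```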
-
The sampled rois go through the classification head
sample_roi (output of the RPN stage) and feature (output of the backbone) are fed into the head [RoI pooling, classification layers]
roi_cls_loc, roi_score = self.faster_rcnn.head(torch.unsqueeze(feature, 0), sample_roi, sample_roi_index, img_size)
roi_cls_loc: [1, 128, 84] --> [128, 21, 4]
roi_score: [1, 128, 21] --> [128, 21]
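A minimal sketch of such a head built on torchvision's RoIPool; the 4096-dim fc layers and the rois-with-batch-index format are assumptions (VGG16-based repos usually reuse the pretrained classifier here):

```python
import torch
import torch.nn as nn
from torchvision.ops import RoIPool

class RoIHead(nn.Module):
    """Sketch: RoI pooling + shared fc layers + per-class loc / score branches."""
    def __init__(self, n_class=21, roi_size=7, spatial_scale=1.0 / 16, in_channels=512):
        super().__init__()
        self.roi_pool = RoIPool((roi_size, roi_size), spatial_scale)
        self.classifier = nn.Sequential(
            nn.Linear(in_channels * roi_size * roi_size, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True))
        self.cls_loc = nn.Linear(4096, n_class * 4)     # 21 * 4 = 84
        self.score = nn.Linear(4096, n_class)           # 21

    def forward(self, feature, rois):
        # rois: [R, 5] = (batch_index, x1, y1, x2, y2) in input-image coordinates
        pool = self.roi_pool(feature, rois)             # [R, 512, 7, 7]
        h = self.classifier(pool.flatten(start_dim=1))
        return self.cls_loc(h), self.score(h)           # [R, 84], [R, 21]

feature = torch.randn(1, 512, 50, 50)
rois = torch.tensor([[0., 0., 0., 160., 160.],
                     [0., 100., 100., 400., 300.]])
roi_cls_loc, roi_score = RoIHead()(feature, rois)
print(roi_cls_loc.shape, roi_score.shape)               # torch.Size([2, 84]) torch.Size([2, 21])
```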
-
Compute the head losses
roi_loc: [128, 4], gathered from roi_cls_loc (pick the regression prediction that corresponds to each proposal's target class)
roi_loc = roi_cls_loc[torch.arange(0, n_sample), gt_roi_label]
roi_loc_loss = _fast_rcnn_loc_loss(roi_loc, gt_roi_loc, gt_roi_label.data, self.roi_sigma)
roi_cls_loss = nn.CrossEntropyLoss()(roi_score[0], gt_roi_label)
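_fast_rcnn_loc_loss is not defined in this note; a hedged sketch, reusing the _smooth_l1_loss above and assuming the usual mask-by-positive-label pattern:

```python
import torch

def _fast_rcnn_loc_loss(pred_loc, gt_loc, gt_label, sigma):
    """Smooth-L1 regression loss over non-background rois (reuses _smooth_l1_loss above)."""
    in_weight = torch.zeros_like(gt_loc)
    # only rois whose target class is a real object (gt_label > 0) contribute
    in_weight[(gt_label > 0).view(-1, 1).expand_as(in_weight)] = 1
    loss = _smooth_l1_loss(pred_loc, gt_loc, in_weight, sigma)
    return loss / (gt_label >= 0).sum().float()         # normalize by the 128 sampled rois
```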
Inference:
inputs: [1,3,600,600]
outputs:
- roi_cls_locs: [1, 300, 84]
- roi_scores: [1, 300, 21]
- rois: [300, 4]
- roi_indices: [300]
300 is a configured value, n_test_post_nms=300
roi_cls_locs, roi_scores, rois, _ = self.model(images)
Decoding:
roi_cls_loc (from roi_cls_locs): [1, 300, 84] --> [300, 84] --> [300, 21, 4]
roi (from rois): [300, 4] --> [300, 1, 4] --> [300, 21, 4]
Decode to bboxes: convert each roi (x3, y3, x4, y4) from corner form to center form, apply the predicted offsets (dx, dy, dw, dh), then convert back to corner form (x1, y1, x2, y2)
cls_bbox = loc2bbox(roi.reshape((-1, 4)), roi_cls_loc.reshape((-1, 4)))
cls_bbox = cls_bbox.view([-1, (self.num_classes), 4])
cls_bbox [300, 21, 4]
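A hedged sketch of loc2bbox, the inverse of bbox2loc above: convert to center form, apply the offsets, convert back to corners (the coordinate order is an assumption):

```python
import torch

def loc2bbox(src_bbox, loc):
    """Decode offsets loc = (dx, dy, dw, dh) relative to src_bbox given as (x1, y1, x2, y2)."""
    # corner form -> center / size form
    src_w = src_bbox[:, 2] - src_bbox[:, 0]
    src_h = src_bbox[:, 3] - src_bbox[:, 1]
    ctr_x = src_bbox[:, 0] + 0.5 * src_w
    ctr_y = src_bbox[:, 1] + 0.5 * src_h
    dx, dy, dw, dh = loc[:, 0], loc[:, 1], loc[:, 2], loc[:, 3]
    # apply the predicted offsets
    new_ctr_x = ctr_x + dx * src_w
    new_ctr_y = ctr_y + dy * src_h
    new_w = src_w * torch.exp(dw)
    new_h = src_h * torch.exp(dh)
    # center / size form -> corner form
    return torch.stack([new_ctr_x - 0.5 * new_w, new_ctr_y - 0.5 * new_h,
                        new_ctr_x + 0.5 * new_w, new_ctr_y + 0.5 * new_h], dim=1)

roi = torch.tensor([[100., 100., 200., 200.]])
loc = torch.tensor([[0.1, 0.0, 0.2, -0.1]])
print(loc2bbox(roi, loc))    # decoded box, still (x1, y1, x2, y2)
```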
Clip the predicted boxes so they stay inside the image
Compute each box's confidence and most likely class
class_conf, class_pred = torch.max(F.softmax(roi_scores, dim=-1), dim=-1)
Filter out low-confidence boxes with a score threshold
conf_mask = (class_conf >= score_thresh)
Filter the remaining boxes per class with NMS
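A hedged sketch tying these post-processing steps together (clip, softmax confidence, score threshold, per-class NMS); the score_thresh / nms_thresh values and the per-class loop are assumptions:

```python
import torch
import torch.nn.functional as F
from torchvision.ops import nms

def postprocess(cls_bbox, roi_scores, img_w, img_h, score_thresh=0.5, nms_thresh=0.3):
    """cls_bbox: [300, 21, 4] as (x1, y1, x2, y2); roi_scores: [300, 21]."""
    # clip predicted boxes to the image
    cls_bbox[..., 0::2] = cls_bbox[..., 0::2].clamp(min=0, max=img_w)
    cls_bbox[..., 1::2] = cls_bbox[..., 1::2].clamp(min=0, max=img_h)
    # confidence and most likely class of each roi
    class_conf, class_pred = torch.max(F.softmax(roi_scores, dim=-1), dim=-1)
    conf_mask = class_conf >= score_thresh              # score threshold
    boxes, labels, scores = [], [], []
    for c in range(1, roi_scores.shape[-1]):            # class 0 is background
        mask = conf_mask & (class_pred == c)
        if mask.sum() == 0:
            continue
        b, s = cls_bbox[mask, c], class_conf[mask]
        keep = nms(b, s, nms_thresh)                    # per-class NMS
        boxes.append(b[keep])
        labels.append(torch.full((len(keep),), c))
        scores.append(s[keep])
    if not boxes:
        return torch.empty(0, 4), torch.empty(0, dtype=torch.long), torch.empty(0)
    return torch.cat(boxes), torch.cat(labels), torch.cat(scores)
```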