Object Detection -- Faster RCNN

Faster RCNN

Model training:

  • Stage 1:

    images ---> backbone ---> rpn

    rpn_locs: [b, 22500, 4], rpn_scores: [b, 22500, 2], where 22500 = 50 * 50 * 9 (50 × 50 feature-map cells × 9 anchors per cell; see the anchor layout sketch below)
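    Where the 22500 comes from: 9 base anchors (3 scales × 3 aspect ratios) are tiled over every cell of the 50 × 50 feature map. A minimal sketch, assuming a stride-16 backbone and common scale/ratio values (illustrative, not necessarily this repo's exact settings):

      import numpy as np

      def generate_anchor_base(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
          # 9 anchors centred at the origin, stored as (x1, y1, x2, y2)
          anchor_base = np.zeros((len(ratios) * len(scales), 4), dtype=np.float32)
          for i, r in enumerate(ratios):
              for j, s in enumerate(scales):
                  h = base_size * s * np.sqrt(r)
                  w = base_size * s * np.sqrt(1.0 / r)
                  anchor_base[i * len(scales) + j] = [-w / 2, -h / 2, w / 2, h / 2]
          return anchor_base

      def enumerate_shifted_anchors(anchor_base, stride, feat_h, feat_w):
          # shift the 9 base anchors to every feature-map cell -> [feat_h * feat_w * 9, 4]
          shift_x, shift_y = np.meshgrid(np.arange(feat_w) * stride, np.arange(feat_h) * stride)
          shifts = np.stack([shift_x.ravel(), shift_y.ravel(), shift_x.ravel(), shift_y.ravel()], axis=1)
          anchors = (anchor_base[None, :, :] + shifts[:, None, :]).reshape(-1, 4)
          return anchors.astype(np.float32)

      anchors = enumerate_shifted_anchors(generate_anchor_base(), stride=16, feat_h=50, feat_w=50)
      print(anchors.shape)  # (22500, 4)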

    1. Generate training targets from the ground truth

      Prior boxes (anchors): [22500, 4], ground-truth boxes: [m, 4]

      gt_rpn_loc: use IoU to find the best-matching ground-truth box for each anchor, then compute the offsets (dx, dy, dw, dh) between each anchor and its matched ground-truth box (see the bbox2loc sketch below); shape [22500, 4]

      gt_rpn_label: apply IoU thresholds to label each anchor as ignored / negative / positive (-1, 0, 1); shape [22500]
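    A minimal sketch of the offset encoding (assumed to mirror the usual bbox2loc helper): convert the anchor and its matched ground-truth box to center form, then take the normalized center shift and the log size ratios.

      import numpy as np

      def bbox2loc(src_bbox, dst_bbox):
          # src_bbox: anchors [N, 4], dst_bbox: matched GT boxes [N, 4], both (x1, y1, x2, y2)
          src_w = np.maximum(src_bbox[:, 2] - src_bbox[:, 0], 1e-6)
          src_h = np.maximum(src_bbox[:, 3] - src_bbox[:, 1], 1e-6)
          src_cx = src_bbox[:, 0] + 0.5 * src_w
          src_cy = src_bbox[:, 1] + 0.5 * src_h

          dst_w = dst_bbox[:, 2] - dst_bbox[:, 0]
          dst_h = dst_bbox[:, 3] - dst_bbox[:, 1]
          dst_cx = dst_bbox[:, 0] + 0.5 * dst_w
          dst_cy = dst_bbox[:, 1] + 0.5 * dst_h

          dx = (dst_cx - src_cx) / src_w
          dy = (dst_cy - src_cy) / src_h
          dw = np.log(dst_w / src_w)
          dh = np.log(dst_h / src_h)
          return np.stack([dx, dy, dw, dh], axis=1)  # [N, 4]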

    2. Compute the losses

      rpn_loc_loss (bounding-box regression loss): use gt_rpn_label > 0 to select the corresponding entries of gt_rpn_loc and rpn_locs, then apply _smooth_l1_loss

      rpn_cls_loss (classification loss): cross-entropy between rpn_scores and gt_rpn_label (a sketch of both losses follows)
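    A rough sketch of the two RPN losses, assuming the label convention above (-1 ignored, 0 negative, 1 positive); the regression loss uses only positive anchors, and the cross-entropy skips ignored ones:

      import torch
      import torch.nn.functional as F

      def smooth_l1_loss(pred, target, sigma=3.0):
          # piecewise loss: quadratic for small errors, linear for large ones
          diff = (pred - target).abs()
          flag = diff < (1.0 / sigma ** 2)
          return torch.where(flag, 0.5 * (sigma * diff) ** 2, diff - 0.5 / sigma ** 2).sum()

      def rpn_losses(rpn_loc, rpn_score, gt_rpn_loc, gt_rpn_label, sigma=3.0):
          # rpn_loc [22500, 4], rpn_score [22500, 2], targets built as above (one image)
          pos = gt_rpn_label > 0
          loc_loss = smooth_l1_loss(rpn_loc[pos], gt_rpn_loc[pos], sigma)
          loc_loss = loc_loss / (gt_rpn_label >= 0).sum().clamp(min=1)   # average over labelled anchors
          cls_loss = F.cross_entropy(rpn_score, gt_rpn_label, ignore_index=-1)
          return loc_loss, cls_loss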

    rois: filter out degenerate boxes (by width/height), sort by rpn_scores, and apply NMS (nms_thresh is configurable); keep n_post_nms proposals, giving [600, 4] per image (× b). See the proposal sketch below.
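    An illustrative sketch of how the rois could be produced from the decoded RPN boxes and foreground scores (the function name and default values here are examples, not the repo's exact ProposalCreator):

      import torch
      from torchvision.ops import nms

      def create_proposals(decoded_boxes, fg_scores, img_size,
                           min_size=16, n_pre_nms=12000, n_post_nms=600, nms_thresh=0.7):
          # decoded_boxes [22500, 4] (x1, y1, x2, y2), fg_scores [22500], img_size (H, W)
          boxes = decoded_boxes.clone()
          boxes[:, [0, 2]] = boxes[:, [0, 2]].clamp(0, img_size[1])   # clip to the image
          boxes[:, [1, 3]] = boxes[:, [1, 3]].clamp(0, img_size[0])
          keep = ((boxes[:, 2] - boxes[:, 0]) >= min_size) & ((boxes[:, 3] - boxes[:, 1]) >= min_size)
          boxes, scores = boxes[keep], fg_scores[keep]                # drop tiny boxes
          order = scores.argsort(descending=True)[:n_pre_nms]         # sort by objectness
          boxes, scores = boxes[order], scores[order]
          keep = nms(boxes, scores, nms_thresh)[:n_post_nms]          # NMS, keep n_post_nms
          return boxes[keep]                                          # rois: up to [600, 4]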

  • Stage 2:

    rpn --> head

    1. Generate training targets from the ground truth and the proposals (a sampling sketch follows this item)

    sample_roi, gt_roi_loc, gt_roi_label = self.proposal_target_creator(roi, bbox, label, self.loc_normalize_mean, self.loc_normalize_std)

    class ProposalTargetCreator(object):
        def __init__(self, n_sample=128, pos_ratio=0.5, pos_iou_thresh=0.5, neg_iou_thresh_high=0.5, neg_iou_thresh_low=0):

    sample_roi [128, 4], in box form
    gt_roi_loc [128, 4], offsets (dx, dy, dw, dh)
    gt_roi_label [128], class labels (0 = background)
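    A simplified sketch of what ProposalTargetCreator does (assumed; it omits the loc normalization with loc_normalize_mean/std and reuses the bbox2loc sketch from stage 1): assign each roi to its highest-IoU ground-truth box, sample positives and negatives, and build the regression targets.

      import numpy as np

      def bbox_iou(a, b):
          # pairwise IoU between boxes a [N, 4] and b [M, 4] in (x1, y1, x2, y2) -> [N, M]
          tl = np.maximum(a[:, None, :2], b[None, :, :2])
          br = np.minimum(a[:, None, 2:], b[None, :, 2:])
          inter = np.prod(np.clip(br - tl, 0, None), axis=2)
          area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
          area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
          return inter / (area_a[:, None] + area_b[None, :] - inter)

      def sample_rois(roi, bbox, label, n_sample=128, pos_ratio=0.5,
                      pos_iou_thresh=0.5, neg_iou_thresh_high=0.5, neg_iou_thresh_low=0.0):
          # roi [R, 4] proposals, bbox [m, 4] GT boxes, label [m] GT class indices
          iou = bbox_iou(roi, bbox)
          gt_assignment = iou.argmax(axis=1)             # best GT box for each roi
          max_iou = iou.max(axis=1)
          roi_label = label[gt_assignment] + 1           # shift labels so 0 means background

          pos_index = np.where(max_iou >= pos_iou_thresh)[0]
          n_pos = int(min(round(n_sample * pos_ratio), pos_index.size))
          pos_index = np.random.choice(pos_index, size=n_pos, replace=False)

          neg_index = np.where((max_iou < neg_iou_thresh_high) & (max_iou >= neg_iou_thresh_low))[0]
          n_neg = min(n_sample - n_pos, neg_index.size)
          neg_index = np.random.choice(neg_index, size=n_neg, replace=False)

          keep = np.append(pos_index, neg_index)
          sample_roi = roi[keep]                                        # [128, 4]
          gt_roi_label = roi_label[keep]
          gt_roi_label[n_pos:] = 0                                      # negatives -> background
          gt_roi_loc = bbox2loc(sample_roi, bbox[gt_assignment[keep]])  # [128, 4]
          return sample_roi, gt_roi_loc, gt_roi_label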
  2. Feed the sampled rois into the classification head

    sample_roi (output of the RPN stage) and feature (output of the backbone) are fed into the head [roi_pooling + classification/regression layers]; see the head sketch below

roi_cls_loc, roi_score = self.faster_rcnn.head(torch.unsqueeze(feature, 0), sample_roi, sample_roi_index, img_size)
roi_cls_loc:[1, 128, 84] -->[128,21,4]
roi_score: [1, 128, 21]-->[128, 21]
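A simplified sketch of the head (assumed VGG-style fc layers; the real head also takes the roi batch indices and the image size): RoIPool the shared feature map with the sampled rois, then predict 84 = 21 × 4 regression values and 21 class scores per roi.

  import torch.nn as nn
  from torchvision.ops import roi_pool

  class Head(nn.Module):
      def __init__(self, n_class=21, roi_size=7, spatial_scale=1.0 / 16, in_channels=512):
          super().__init__()
          self.roi_size = roi_size
          self.spatial_scale = spatial_scale
          self.fc = nn.Sequential(
              nn.Linear(in_channels * roi_size * roi_size, 4096), nn.ReLU(inplace=True),
              nn.Linear(4096, 4096), nn.ReLU(inplace=True),
          )
          self.cls_loc = nn.Linear(4096, n_class * 4)   # 84 regression values per roi
          self.score = nn.Linear(4096, n_class)         # 21 class scores per roi

      def forward(self, feature, rois):
          # feature [1, C, H, W]; rois [128, 4] in input-image coordinates
          # roi_pool accepts a list of per-image box tensors
          pool = roi_pool(feature, [rois], output_size=self.roi_size,
                          spatial_scale=self.spatial_scale)        # [128, C, 7, 7]
          fc7 = self.fc(pool.flatten(start_dim=1))
          return self.cls_loc(fc7), self.score(fc7)                # [128, 84], [128, 21]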
  3. Compute the losses (a sketch of _fast_rcnn_loc_loss follows this block)

roi_loc: [128, 4], taken from roi_cls_loc (for each sampled roi, pick the regression prediction that corresponds to its ground-truth class)

  roi_loc = roi_cls_loc[torch.arange(0, n_sample), gt_roi_label]
  roi_loc_loss = _fast_rcnn_loc_loss(roi_loc, gt_roi_loc, gt_roi_label.data, self.roi_sigma)
  roi_cls_loss = nn.CrossEntropyLoss()(roi_score[0], gt_roi_label)
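_fast_rcnn_loc_loss presumably follows the same pattern as the RPN regression loss above: smooth L1 on the non-background samples only, averaged over all sampled rois. A compact sketch (assumed, reusing smooth_l1_loss from the RPN sketch):

  def fast_rcnn_loc_loss(roi_loc, gt_roi_loc, gt_roi_label, sigma=1.0):
      # roi_loc, gt_roi_loc [128, 4]; gt_roi_label [128] with 0 = background
      pos = gt_roi_label > 0
      loss = smooth_l1_loss(roi_loc[pos], gt_roi_loc[pos], sigma)   # helper from the RPN sketch
      return loss / (gt_roi_label >= 0).sum().clamp(min=1)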

Model inference:

inputs: [1,3,600,600]

outputs:

  1. roi_cls_locs, [1, 300, 84]

  2. roi_scores, [1, 300, 21]

  3. rois, [300, 4]

  4. roi_indices [300]

    300 is the configured value n_test_post_nms=300 (number of proposals kept at test time)

roi_cls_locs, roi_scores, rois, _ = self.model(images)

Decoding:

roi_cls_loc from(roi_cls_locs) [1, 300, 84] --> [300,84] --> [300,21,4]
roi from(rois) [300, 4] --> [300,1,4] --> [300,21,4]
Convert to bboxes: (x1, y1, x2, y2) = [convert the roi (x3, y3, x4, y4) to center form, apply the offsets (dx, dy, dw, dh)], then convert back to corner form (see the loc2bbox sketch below)
cls_bbox = loc2bbox(roi.reshape((-1, 4)), roi_cls_loc.reshape((-1, 4)))
cls_bbox = cls_bbox.view([-1, (self.num_classes), 4])
cls_bbox [300, 21, 4]
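A sketch of what loc2bbox does here (assumed to invert the bbox2loc encoding used in training): shift each roi's center by (dx, dy), rescale its size by exp(dw) and exp(dh), then convert back to corner form.

  import torch

  def loc2bbox(src_bbox, loc):
      # src_bbox [N, 4] rois (x1, y1, x2, y2); loc [N, 4] predicted (dx, dy, dw, dh)
      src_w = src_bbox[:, 2] - src_bbox[:, 0]
      src_h = src_bbox[:, 3] - src_bbox[:, 1]
      src_cx = src_bbox[:, 0] + 0.5 * src_w
      src_cy = src_bbox[:, 1] + 0.5 * src_h

      cx = loc[:, 0] * src_w + src_cx       # shift the centre
      cy = loc[:, 1] * src_h + src_cy
      w = torch.exp(loc[:, 2]) * src_w      # rescale width / height
      h = torch.exp(loc[:, 3]) * src_h

      # back to corner form
      return torch.stack([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h], dim=1)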
Clip the predicted boxes so they stay inside the image
Compute each proposal's confidence and most likely class
class_conf, class_pred = torch.max(F.softmax(roi_scores, dim=-1), dim=-1)
Filter out low-confidence predictions with a score threshold
conf_mask = (class_conf >= score_thresh)
NMS filtering (a full post-processing sketch follows)
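Putting the remaining steps together, an illustrative sketch (score_thresh and nms_thresh are example values; batched_nms runs NMS per class, so boxes of different classes never suppress each other):

  import torch
  import torch.nn.functional as F
  from torchvision.ops import batched_nms

  def postprocess(cls_bbox, roi_scores, img_size, score_thresh=0.5, nms_thresh=0.3):
      # cls_bbox [300, 21, 4], roi_scores [300, 21], img_size (H, W), class 0 = background
      cls_bbox[..., [0, 2]] = cls_bbox[..., [0, 2]].clamp(0, img_size[1])   # clip to the image
      cls_bbox[..., [1, 3]] = cls_bbox[..., [1, 3]].clamp(0, img_size[0])
      class_conf, class_pred = torch.max(F.softmax(roi_scores, dim=-1), dim=-1)
      conf_mask = (class_conf >= score_thresh) & (class_pred > 0)           # drop low scores and background
      idx = conf_mask.nonzero(as_tuple=True)[0]
      boxes = cls_bbox[idx, class_pred[idx]]            # box of the predicted class for each kept roi
      scores, labels = class_conf[idx], class_pred[idx]
      keep = batched_nms(boxes, scores, labels, nms_thresh)                 # class-wise NMS
      return boxes[keep], scores[keep], labels[keep]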
