
Retinaface

Created by Hanyz@2020/12/4

1. Backbone Network

The backbone network extracts features from the input image.

There are two choices of backbone:

ResNet50 (better accuracy) and MobileNetV1 (faster). MobileNetV1 uses depthwise separable (DW) convolutions to cut the number of parameters and the amount of computation.

This post uses MobileNetV1-0.25 (alpha = 0.25); compared with the full MobileNetV1, the channel counts are compressed by a factor of 4.
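As a quick check of why depthwise separable convolutions save parameters, the following sketch (illustrative numbers, not taken from the repo) compares a standard 3x3 convolution with its depthwise + pointwise replacement:

import torch.nn as nn

in_ch, out_ch = 32, 64
standard  = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)               # 3*3*32*64 = 18432 params
depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)  # 3*3*32    = 288 params
pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)                          # 32*64     = 2048 params

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard))                      # 18432
print(count(depthwise) + count(pointwise))  # 2336, roughly 8x fewer parameters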

Backbone network diagram:

(figure: MobileNetV1-0.25 backbone architecture)

The feature maps output by the last three stages are passed on to the next part for further feature enhancement.

Code

Helper modules:

import torch.nn as nn

def conv_bn(inp, oup, stride=1, leaky=0):
    # standard 3x3 convolution + BN + LeakyReLU
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.LeakyReLU(negative_slope=leaky, inplace=True)
    )

def conv_dw(inp, oup, stride, leaky=0.1):
    return nn.Sequential(
        # depthwise 3x3 convolution
        nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
        nn.BatchNorm2d(inp),
        nn.LeakyReLU(negative_slope=leaky, inplace=True),

        # pointwise 1x1 convolution to change the channel count
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.LeakyReLU(negative_slope=leaky, inplace=True)
    )

Define the full backbone:

class MobileNetV1(nn.Module):
    def __init__(self):
        super(MobileNetV1, self).__init__()

        # 640 x 640 x 3 -> 80 x 80 x 64
        self.stage1 = nn.Sequential(
            conv_bn(3, 8, 2, leaky=0.1),  # 3
            conv_dw(8, 16, 1),   # 7

            conv_dw(16, 32, 2),  # 11
            conv_dw(32, 32, 1),  # 19

            conv_dw(32, 64, 2),  # 27
            conv_dw(64, 64, 1),  # 43   --->C3
        )

        # 80 x 80 x 64 -> 40 x 40 x 128
        self.stage2 = nn.Sequential(
            conv_dw(64, 128, 2),   # 43 + 16 = 59
            conv_dw(128, 128, 1),  # 59 + 32 = 91
            conv_dw(128, 128, 1),  # 91 + 32 = 123
            conv_dw(128, 128, 1),  # 123 + 32 = 155
            conv_dw(128, 128, 1),  # 155 + 32 = 187
            conv_dw(128, 128, 1),  # 187 + 32 = 219   --->C4
        )

        # 40 x 40 x 128 -> 20 x 20 x 256
        self.stage3 = nn.Sequential(
            conv_dw(128, 256, 2),  # 219 + 32 = 251
            conv_dw(256, 256, 1),  # 251 + 64 = 315   --->C5
        )

        self.avg = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(256, 1000)

    def forward(self, x):
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.avg(x)
        x = x.view(-1, 256)
        x = self.fc(x)
        return x

When the third argument (stride) of conv_bn / conv_dw is set to 2, that layer downsamples the feature map by a factor of 2.
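As a sanity check of the shapes noted in the comments above, the three stages can be run on a dummy 640x640 input (a minimal sketch, shapes only):

import torch

net = MobileNetV1()
x = torch.randn(1, 3, 640, 640)
c3 = net.stage1(x)    # torch.Size([1, 64, 80, 80])
c4 = net.stage2(c3)   # torch.Size([1, 128, 40, 40])
c5 = net.stage3(c4)   # torch.Size([1, 256, 20, 20])
print(c3.shape, c4.shape, c5.shape)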

2. FPN Feature Pyramid

(figures: FPN structure)

The FPN mainly performs feature fusion.

Its outputs P3, P4 and P5 all have 64 channels. Code:

class FPN(nn.Module):
    def __init__(self,in_channels_list,out_channels):
        super(FPN,self).__init__()
        leaky = 0
        if (out_channels <= 64):
            leaky = 0.1
        self.output1 = conv_bn1X1(in_channels_list[0], out_channels, stride = 1, leaky = leaky)
        self.output2 = conv_bn1X1(in_channels_list[1], out_channels, stride = 1, leaky = leaky)
        self.output3 = conv_bn1X1(in_channels_list[2], out_channels, stride = 1, leaky = leaky)

        self.merge1 = conv_bn(out_channels, out_channels, leaky = leaky)
        self.merge2 = conv_bn(out_channels, out_channels, leaky = leaky)

    def forward(self, inputs):
        
        inputs = list(inputs.values())

        output1 = self.output1(inputs[0])
        output2 = self.output2(inputs[1])
        output3 = self.output3(inputs[2])

        up3 = F.interpolate(output3, size=[output2.size(2), output2.size(3)], mode="nearest")  # upsample
        output2 = output2 + up3  # element-wise add
        output2 = self.merge2(output2)  # feature fusion: a conv that keeps the channel count, plus BN and LeakyReLU

        up2 = F.interpolate(output2, size=[output1.size(2), output1.size(3)], mode="nearest")  # upsample
        output1 = output1 + up2  # element-wise add
        output1 = self.merge1(output1)  # feature fusion: a conv that keeps the channel count, plus BN and LeakyReLU

        out = [output1, output2, output3]
        return out

The three backbone feature maps fed into the FPN (labeled C3, C4 and C5 in the backbone code above) are obtained with:

self.body = _utils.IntermediateLayerGetter(backbone, cfg['return_layers'])
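A minimal sketch of how this works: return_layers maps the backbone's module names to output keys, and IntermediateLayerGetter then returns an ordered dict of those intermediate feature maps. The return_layers values below are an assumption (the post does not show cfg), but the keys match the stage names defined above:

import torch
import torchvision.models._utils as _utils

backbone = MobileNetV1()
return_layers = {'stage1': 1, 'stage2': 2, 'stage3': 3}   # assumed cfg['return_layers']
body = _utils.IntermediateLayerGetter(backbone, return_layers)

feats = body(torch.randn(1, 3, 640, 640))   # OrderedDict of the three feature maps
print([v.shape for v in feats.values()])
# [1, 64, 80, 80], [1, 128, 40, 40], [1, 256, 20, 20]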

3. SSH for Further Feature Enhancement

(figure: SSH module structure)

The idea of SSH is to use three parallel branches, replacing 5x5 and 7x7 convolutions with stacks of 3x3 convolutions (similar to an Inception module).

class SSH(nn.Module):
    def __init__(self, in_channel, out_channel):
        super(SSH, self).__init__()
        assert out_channel % 4 == 0
        leaky = 0
        if (out_channel <= 64):
            leaky = 0.1
        
        # in_channel = 64 , out_channel = 32 + 16 + 16 = 64
        self.conv3X3 = conv_bn_no_relu(in_channel, out_channel//2, stride=1)

        self.conv5X5_1 = conv_bn(in_channel, out_channel//4, stride=1, leaky = leaky)
        self.conv5X5_2 = conv_bn_no_relu(out_channel//4, out_channel//4, stride=1)

        self.conv7X7_2 = conv_bn(out_channel//4, out_channel//4, stride=1, leaky = leaky)
        self.conv7x7_3 = conv_bn_no_relu(out_channel//4, out_channel//4, stride=1)

    def forward(self, inputs):
        conv3X3 = self.conv3X3(inputs)

        conv5X5_1 = self.conv5X5_1(inputs)  # also serves as the first conv of the 7x7 branch (conv7X7_1)
        conv5X5 = self.conv5X5_2(conv5X5_1)

        conv7X7_2 = self.conv7X7_2(conv5X5_1)
        conv7X7 = self.conv7x7_3(conv7X7_2)

        # concatenate; the channel counts of the three branches sum to out_channel
        out = torch.cat([conv3X3, conv5X5, conv7X7], dim=1)
        out = F.relu(out)
        return out

Then SSH is applied once to each of the three FPN outputs:

in_channels_stage2 = cfg['in_channel']
in_channels_list = [
    in_channels_stage2 * 2,
    in_channels_stage2 * 4,
    in_channels_stage2 * 8,
]
    
out_channels = cfg['out_channel']
self.fpn = FPN(in_channels_list,out_channels)

# defined in __init__
self.ssh1 = SSH(out_channels, out_channels)
self.ssh2 = SSH(out_channels, out_channels)
self.ssh3 = SSH(out_channels, out_channels)

# applied in forward
feature1 = self.ssh1(fpn[0])
feature2 = self.ssh2(fpn[1])
feature3 = self.ssh3(fpn[2])
features = [feature1, feature2, feature3]  # the three effective feature layers

4. Predicting Results from the Feature Layers

(figure: the three prediction heads)

There are three prediction heads: classification, bounding-box regression, and facial-landmark regression.

1. The classification output determines whether an anchor (prior box) contains an object. A 1x1 convolution maps the SSH output channels to num_anchors * 2, representing the probability that each anchor contains a face.

2. The box regression output adjusts each anchor to obtain the predicted box. Four parameters are needed per anchor, so a 1x1 convolution maps the SSH output channels to num_anchors * 4, representing the adjustment parameters of each anchor.

3. The landmark regression output adjusts each anchor to obtain the facial landmarks (5 keypoints per face). Each keypoint needs two parameters, so a 1x1 convolution maps the SSH output channels to num_anchors * 5 * 2, representing the landmark adjustments of each anchor (a quick shape check follows after this list).
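With num_anchors = 2 per grid cell on the 80x80, 40x40 and 20x20 feature layers, a 640x640 input yields (80x80 + 40x40 + 20x20) * 2 = 16800 anchors in total, so after the reshapes in the heads the three outputs have shapes (batch, 16800, 2), (batch, 16800, 4) and (batch, 16800, 10). A quick arithmetic check:

feature_sizes = [80, 40, 20]   # strides 8, 16, 32 on a 640x640 input
num_anchors = 2
total_anchors = sum(s * s for s in feature_sizes) * num_anchors
print(total_anchors)   # 16800
# head outputs: class (batch, 16800, 2), box (batch, 16800, 4), landmarks (batch, 16800, 10)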

5. Details

5.1 Overall Process

The three final feature layers effectively divide the input image into grids of different sizes. Each grid cell carries several anchors (by default, two square anchors). The network judges whether each anchor contains an object; if it does, the anchor is adjusted to obtain the final predicted box.

(figure: anchors laid out on the feature-map grid)

1. Class part

num_anchors defaults to 2, and the SSH output channels are mapped to num_anchors * 2. The "* 2" means: if the value at index 0 is larger, the anchor is judged to contain no face; if the value at index 1 is larger, a face is likely present.

Code:

class ClassHead(nn.Module):
    def __init__(self, inchannels=512, num_anchors=2):
        super(ClassHead, self).__init__()
        self.num_anchors = num_anchors
        self.conv1x1 = nn.Conv2d(inchannels, self.num_anchors*2, kernel_size=(1,1), stride=1, padding=0)

    def forward(self, x):
        out = self.conv1x1(x)
        out = out.permute(0, 2, 3, 1).contiguous()  # move channels to the last dimension for easier reshaping

        return out.view(out.shape[0], -1, 2)  # merge the spatial and anchor dimensions

In the final out, the first dimension is batch_size, the second indexes the anchors, and the third holds the two scores indicating whether each anchor contains a face.

def _make_class_head(self, fpn_num=3, inchannels=64, anchor_num=2):
    classhead = nn.ModuleList()
    for i in range(fpn_num):
        classhead.append(ClassHead(inchannels, anchor_num))
    return classhead

2. Box part

The SSH output channels are mapped to num_anchors * 4. The "* 4" values are the anchor adjustment parameters: indices 0 and 1 adjust the center of the predicted box, and indices 2 and 3 adjust its width and height.

class BboxHead(nn.Module):
    def __init__(self,inchannels=512,num_anchors=2):
        super(BboxHead,self).__init__()
        self.conv1x1 = nn.Conv2d(inchannels,num_anchors*4,kernel_size=(1,1),stride=1,padding=0)

    def forward(self,x):
        out = self.conv1x1(x)
        out = out.permute(0,2,3,1).contiguous()

        return out.view(out.shape[0], -1, 4)

Used as follows:

def _make_bbox_head(self,fpn_num=3,inchannels=64,anchor_num=2):
    bboxhead = nn.ModuleList()
    for i in range(fpn_num):
        bboxhead.append(BboxHead(inchannels,anchor_num))
    return bboxhead

3. Face part

The SSH output channels are mapped to num_anchors * 2 * 5. The "* 5" corresponds to the 5 facial keypoints, and the "* 2" to the two offset parameters that move the anchor center onto each keypoint.

Code:

class LandmarkHead(nn.Module):
    def __init__(self,inchannels=512,num_anchors=2):
        super(LandmarkHead,self).__init__()
        self.conv1x1 = nn.Conv2d(inchannels,num_anchors*10,kernel_size=(1,1),stride=1,padding=0)

    def forward(self,x):
        out = self.conv1x1(x)
        out = out.permute(0,2,3,1).contiguous()

        return out.view(out.shape[0], -1, 10)  # the last dimension holds two offset parameters for each of the 5 keypoints

Used as follows:

def _make_landmark_head(self,fpn_num=3,inchannels=64,anchor_num=2):
    landmarkhead = nn.ModuleList()
    for i in range(fpn_num):
        landmarkhead.append(LandmarkHead(inchannels,anchor_num))
    return landmarkhead

Finally, the corresponding head is applied to each of the effective feature layers in a loop, and the per-layer predictions are concatenated:

bbox_regressions = torch.cat([self.BboxHead[i](feature) 
                             for i, feature in enumerate(features)], dim=1)
classifications = torch.cat([self.ClassHead[i](feature) 
                             for i, feature in enumerate(features)], dim=1)
ldm_regressions = torch.cat([self.LandmarkHead[i](feature) 
                             for i, feature in enumerate(features)], dim=1)
if self.phase == 'train':
    output = (bbox_regressions, classifications, ldm_regressions)
else:
    output = (bbox_regressions, F.softmax(classifications, dim=-1), ldm_regressions)
        
return output

5.2 Anchors in Detail, with Visualization

class Anchors(object):

The following methods are defined inside the Anchors class:

# (this class uses: from itertools import product, from math import ceil, and numpy as np)
def __init__(self, cfg, image_size=None, phase='train'):
    super(Anchors, self).__init__()
    self.min_sizes = cfg['min_sizes']  # base side lengths of the anchors
    self.steps = cfg['steps']          # downsampling factor of each feature layer
    self.clip = cfg['clip']            # whether to clip anchors into [0, 1]
    self.image_size = image_size       # input image size
    # loop three times to compute the height and width of each feature layer
    self.feature_maps = [[ceil(self.image_size[0]/step),
                          ceil(self.image_size[1]/step)] for step in self.steps]

In config.py:

'min_sizes': the base anchor side lengths, set to [16, 32], [64, 128] and [256, 512] for the three layers.

'steps': [8, 16, 32], i.e. the factor by which height and width are downsampled on each layer; 32 corresponds to 5 rounds of halving (2^5).

'clip': whether to clamp the anchor coordinates into the range [0, 1]. (A minimal config sketch follows.)
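Putting these together, a config sketch consistent with the description above might look like this (the real cfg_mnet in config.py contains additional entries such as return_layers and training hyperparameters; the values here only restate what is described in this post, with clip assumed False):

cfg_sketch = {
    'min_sizes': [[16, 32], [64, 128], [256, 512]],   # base anchor side lengths per feature layer
    'steps': [8, 16, 32],                             # downsampling factor of each feature layer
    'clip': False,                                    # whether to clamp anchors into [0, 1]
    'variance': [0.1, 0.2],                           # constants used later by decode / decode_landm
}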

def get_anchors(self):
    anchors = []
    for k, f in enumerate(self.feature_maps):  # loop over the three feature layers
        min_sizes = self.min_sizes[k]
        # two square anchors per grid point
        for i, j in product(range(f[0]), range(f[1])):
            for min_size in min_sizes:  # map each anchor onto the grid point
                s_kx = min_size / self.image_size[1]
                s_ky = min_size / self.image_size[0]
                dense_cx = [x * self.steps[k] / self.image_size[1] for x in [j + 0.5]]
                dense_cy = [y * self.steps[k] / self.image_size[0] for y in [i + 0.5]]
                for cy, cx in product(dense_cy, dense_cx):
                    anchors += [cx, cy, s_kx, s_ky]  # append to the anchors list

    # the rest is only for visualization; in actual use the anchors keep the format above
    anchors = np.reshape(anchors, [-1, 4])  # rows of [cx, cy, w, h]

    # convert from [center, width/height] to [top-left, bottom-right]
    output = np.zeros_like(anchors[:, :4])
    output[:, 0] = anchors[:, 0] - anchors[:, 2] / 2
    output[:, 1] = anchors[:, 1] - anchors[:, 3] / 2
    output[:, 2] = anchors[:, 0] + anchors[:, 2] / 2
    output[:, 3] = anchors[:, 1] + anchors[:, 3] / 2

    if self.clip:
        output = np.clip(output, 0, 1)
    return output

Output of Vision_for_anchors.py:

(figure: the two anchors of the top-left cell drawn on the 20x20 feature layer)

The plot shows the 20x20 feature layer, i.e. the deepest one. The two red boxes are the two anchors of the first (top-left) grid cell; their initial side lengths are exactly the configured 'min_sizes'. The algorithm then judges whether each box contains a face, adjusts the anchor's position, and obtains the facial keypoints.
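A minimal sketch of such a visualization (an illustrative script, not the repo's Vision_for_anchors.py): it draws the two square anchors of the top-left cell of the 20x20 layer, whose stride is 32 and whose base sizes are 256 and 512, on a blank 640x640 canvas.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

image_size = 640
step, min_sizes = 32, [256, 512]        # deepest layer: 20x20 grid, two anchor sizes

fig, ax = plt.subplots()
ax.imshow(np.ones((image_size, image_size, 3)))

cx = cy = 0.5 * step                    # center of the top-left grid cell, in pixels
for s in min_sizes:
    ax.add_patch(patches.Rectangle((cx - s / 2, cy - s / 2), s, s,
                                   fill=False, edgecolor='red'))
plt.show()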

Summary: anchors are boxes pre-defined on the image; the network's predictions only classify these anchors and adjust them.

5.3 Adjusting the Anchors (the tricky part)

Question 1: how are the anchors adjusted into the final predicted boxes?

Function: decode

Arguments:

'loc': the raw box-regression output of the network, of shape n x 4 (for a quick test it can be fed something like mbox_loc = np.random.randn(800, 4)).

'priors': the anchors, one row per anchor in the format [cx, cy, w, h], also of shape n x 4.

priors[:, :2] is the original center; loc[:, :2] * variances[0] * priors[:, 2:] is the offset; adding the two gives the adjusted center.

Concretely, the values at indices 0 and 1 of loc are first multiplied by the constant variances[0] (0.1), which acts as a scaling factor, and then by the anchor's width and height.

In config.py, 'variance': [0.1, 0.2] supplies these constant coefficients.

priors[:, 2:] is the original width and height. For the size adjustment, loc[:, 2:] (indices 2 and 3) is scaled by variances[1], exponentiated, and multiplied by the original width and height to give the adjusted width and height.

Finally the center and the width/height are concatenated, so boxes has shape n x 4.

def decode(loc, priors, variances):
    # decode the center and the width/height
    # to obtain the adjusted center and size
    boxes = torch.cat((priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
                    priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])), 1)

    # convert to [top-left, bottom-right] format
    boxes[:, :2] -= boxes[:, 2:] / 2
    boxes[:, 2:] += boxes[:, :2]
    return boxes
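A quick usage sketch of decode with toy tensors (shapes only; the numbers are random and purely illustrative):

import torch

loc = torch.randn(16800, 4)      # raw box regression output for one image
priors = torch.rand(16800, 4)    # anchors as [cx, cy, w, h], normalized to [0, 1]
variances = [0.1, 0.2]

boxes = decode(loc, priors, variances)
print(boxes.shape)               # torch.Size([16800, 4]), rows are [x1, y1, x2, y2]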

Illustration of the adjustment:

(figure: decoding an anchor into a predicted box)

Question 2: how is the anchor center adjusted to obtain the 5 facial keypoints?

Arguments:

'pre': the raw landmark predictions, of shape n x 10 (two parameters per keypoint).

'priors': the anchors, of shape n x 4; priors[:, :2] is the anchor center and priors[:, 2:] its width and height.

Each keypoint's two parameters are multiplied by the constant 0.1 (variances[0]) and by the anchor's width and height, then added to the anchor center, giving that keypoint's position.

The five keypoints are then concatenated along the column dimension.

def decode_landm(pre, priors, variances):
    # decode the facial landmarks
    landms = torch.cat((priors[:, :2] + pre[:, :2] * variances[0] * priors[:, 2:],
                        priors[:, :2] + pre[:, 2:4] * variances[0] * priors[:, 2:],
                        priors[:, :2] + pre[:, 4:6] * variances[0] * priors[:, 2:],
                        priors[:, :2] + pre[:, 6:8] * variances[0] * priors[:, 2:],
                        priors[:, :2] + pre[:, 8:10] * variances[0] * priors[:, 2:],
                        ), dim=1)
    return landms

5.4 Prediction Process

predict.py:

import cv2
from retinaface import Retinaface

retinaface = Retinaface()

while True:
    img = input('Input image filename:')

    image = cv2.imread(img)
    if image is None:
        print('Open Error! Try again!')
        continue
    else:
        image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
        
        r_image = retinaface.detect_image(image)

        r_image = cv2.cvtColor(r_image,cv2.COLOR_RGB2BGR)
        cv2.imshow("after",r_image)
        cv2.waitKey(0)

The detect_image method of the Retinaface class (in retinaface.py):

    def detect_image(self, image):
        # keep a copy for drawing the face boxes
        old_image = image.copy()

        image = np.array(image, np.float32)
        im_height, im_width, _ = np.shape(image)

        # scale factors used to map normalized box coordinates back to the original image size
        scale = torch.Tensor([np.shape(image)[1], np.shape(image)[0],
                              np.shape(image)[1], np.shape(image)[0]])
        scale_for_landmarks = torch.Tensor([np.shape(image)[1], np.shape(image)[0],
                                            np.shape(image)[1], np.shape(image)[0],
                                            np.shape(image)[1], np.shape(image)[0],
                                            np.shape(image)[1], np.shape(image)[0],
                                            np.shape(image)[1], np.shape(image)[0]])

        # pytorch preprocessing; move the channel dimension to the front
        image = preprocess_input(image).transpose(2, 0, 1)
        # add the batch_size dimension
        image = torch.from_numpy(image).unsqueeze(0)
        # compute the anchors
        anchors = Anchors(self.cfg, image_size=(im_height, im_width)).get_anchors()

        with torch.no_grad():
            if self.cuda:
                scale = scale.cuda()
                scale_for_landmarks = scale_for_landmarks.cuda()
                image = image.cuda()
                anchors = anchors.cuda()

            # the three heads' predictions:
            # loc is the box regression, conf the classification, landms the landmark regression
            loc, conf, landms = self.net(image)  # forward pass

            boxes = decode(loc.data.squeeze(0), anchors, self.cfg['variance'])
            boxes = boxes * scale  # map back to the original image coordinates
            boxes = boxes.cpu().numpy()

            # take index 1 (i.e. [:, 1:2]), the probability that each anchor contains a face
            conf = conf.data.squeeze(0)[:, 1:2].cpu().numpy()

            landms = decode_landm(landms.data.squeeze(0), anchors, self.cfg['variance'])
            landms = landms * scale_for_landmarks  # map back to the original image coordinates
            landms = landms.cpu().numpy()

            boxes_conf_landms = np.concatenate([boxes, conf, landms], -1)  # stack into one array

            boxes_conf_landms = non_max_suppression(boxes_conf_landms, self.confidence)  # NMS; confidence defaults to 0.5

        for b in boxes_conf_landms:
            text = "{:.4f}".format(b[4])
            b = list(map(int, b))
            # b[0..3] are the box corners
            cv2.rectangle(old_image, (b[0], b[1]), (b[2], b[3]), (0, 0, 255), 2)
            cx = b[0]
            cy = b[1] + 12
            cv2.putText(old_image, text, (cx, cy),
                        cv2.FONT_HERSHEY_DUPLEX, 0.5, (255, 255, 255))

            # draw the landmarks; b[5..14] are the 5 keypoint positions
            cv2.circle(old_image, (b[5], b[6]), 1, (0, 0, 255), 4)    # left eye
            cv2.circle(old_image, (b[7], b[8]), 1, (0, 255, 255), 4)  # right eye
            cv2.circle(old_image, (b[9], b[10]), 1, (255, 0, 255), 4) # nose
            cv2.circle(old_image, (b[11], b[12]), 1, (0, 255, 0), 4)  # left mouth corner
            cv2.circle(old_image, (b[13], b[14]), 1, (255, 0, 0), 4)  # right mouth corner
        return old_image

NMS definition: within a local region, keep only the highest-scoring box of a given class and suppress the rest.

Illustration of non-maximum suppression (NMS):

(figure: NMS illustration)

IoU code:

import numpy as np

def iou(b1, b2):
    # IoU between one box b1 and an array of boxes b2, both in [x1, y1, x2, y2] format
    b1_x1, b1_y1, b1_x2, b1_y2 = b1[0], b1[1], b1[2], b1[3]
    b2_x1, b2_y1, b2_x2, b2_y2 = b2[:, 0], b2[:, 1], b2[:, 2], b2[:, 3]

    inter_rect_x1 = np.maximum(b1_x1, b2_x1)
    inter_rect_y1 = np.maximum(b1_y1, b2_y1)
    inter_rect_x2 = np.minimum(b1_x2, b2_x2)
    inter_rect_y2 = np.minimum(b1_y2, b2_y2)
    
    inter_area = np.maximum(inter_rect_x2 - inter_rect_x1, 0) * \
                 np.maximum(inter_rect_y2 - inter_rect_y1, 0)
    
    area_b1 = (b1_x2-b1_x1)*(b1_y2-b1_y1)
    area_b2 = (b2_x2-b2_x1)*(b2_y2-b2_y1)
    
    iou = inter_area/np.maximum((area_b1+area_b2-inter_area),1e-6)
    return iou

Non-maximum suppression code:

def non_max_suppression(boxes, conf_thres=0.5, nms_thres=0.3):
    detection = boxes
    # 1. keep only the boxes whose score exceeds the threshold; filtering by score
    #    before the overlap test greatly reduces the number of boxes
    mask = detection[:, 4] >= conf_thres
    detection = detection[mask]
    if not np.shape(detection)[0]:
        return []

    best_box = []
    scores = detection[:, 4]
    # 2. sort the boxes by score in descending order
    arg_sort = np.argsort(scores)[::-1]
    detection = detection[arg_sort]

    while np.shape(detection)[0] > 0:
        # 3. take the highest-scoring box, compute its overlap with all remaining boxes,
        #    and discard those that overlap too much
        best_box.append(detection[0])
        if len(detection) == 1:
            break
        ious = iou(best_box[-1], detection[1:])  # IoU between this box and the others
        detection = detection[1:][ious < nms_thres]

    return np.array(best_box)
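A small usage sketch with hand-made boxes (illustrative values only; the real input also carries the 10 landmark columns, which NMS simply passes through): the two heavily overlapping boxes collapse to the higher-scoring one.

import numpy as np

boxes = np.array([
    # x1,  y1,  x2,  y2, score
    [100, 100, 200, 200, 0.95],
    [105, 105, 205, 205, 0.90],   # overlaps the first box heavily -> suppressed
    [300, 300, 380, 380, 0.80],
])
print(non_max_suppression(boxes, conf_thres=0.5, nms_thres=0.3))
# keeps the 0.95 box and the 0.80 box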

5.5 Training Your Own Face Detector

The dataset used here is the WIDER FACE dataset.

Dataset directory layout:

data/
  widerface/
    train/, val/
      images/                (folder)
        0--Parade/           (image files, each containing faces)
        1--Handshaking/
        ...
      label.txt

Parsing label.txt:

(figure: sample lines from label.txt)

Values 0 to 3: the x and y coordinates of the face box's top-left corner, followed by the box's width and height;

Values 4 to 18: the facial keypoint coordinates, two values (x, y) per keypoint, with each keypoint followed by a flag of 1.0 or 0.0;

Second-to-last value: if it is 1.0 or 0.0, the keypoints are annotated; if it is -1, the keypoints could not be annotated;

Last value: a confidence score (exact meaning to be confirmed);
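For concreteness, an annotation block in label.txt looks roughly like the following (the path and all numbers are made up for illustration): a comment line with the image path, then one line per face containing the 4 box values, the 5 keypoints each as x, y and a flag, and a final confidence value.

# 0--Parade/0_Parade_example.jpg
120 80 60 75 135.0 105.0 0.0 165.0 104.0 0.0 150.0 120.0 0.0 138.0 138.0 0.0 162.0 137.0 0.0 0.9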

Step-by-step:

1. Set up the environment: pytorch == 1.2.0.

2. Copy the dataset into the data folder in the project root, with the structure shown above.

3. Make sure the backbone and the weight file match, and set backbone and model_path accordingly. An example using mobilenet as the backbone feature extractor:

#-------------------------------#
#   choose the backbone feature extractor:
#   mobilenet or resnet50
#-------------------------------#
backbone = "mobilenet"
training_dataset_path = './data/widerface/train/label.txt'

if backbone == "mobilenet":
    cfg = cfg_mnet
elif backbone == "resnet50":  
    cfg = cfg_re50
else:
    raise ValueError('Unsupported backbone - `{}`, Use mobilenet, resnet50.'.format(backbone))

...
model_path = "model_data/Retinaface_mobilenet0.25.pth"

4. Decide whether to train from scratch or to fine-tune from already-trained weights; modify the code in train.py accordingly:

------ To train from scratch (using only the backbone's pretrained weights), set pretrained to True and comment out the weight-loading section of train.py:

backbone = "mobilenet"
#-------------------------------#
#   whether to use pretrained weights
#   for the backbone feature extractor
#-------------------------------#
pretrained = True

model = RetinaFace(cfg=cfg, pretrained = pretrained).train()

------ To fine-tune from already-trained weights, set pretrained to False:

pretrained = False

backbone = "mobilenet"
#-------------------------------------------#
#   see the README for the weight file downloads;
#   the weights must match the chosen backbone
#-------------------------------------------#
model = RetinaFace(cfg=cfg, pretrained = pretrained).train()

### keep the code below when pretrained is False; comment it out when pretrained is True
model_path = "model_data/Retinaface_mobilenet0.25.pth"
# loading matching pretrained weights speeds up training
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
pretrained_dict = {k: v for k, v in pretrained_dict.items() if np.shape(model_dict[k]) ==  np.shape(v)}
model_dict.update(pretrained_dict)
model.load_state_dict(model_dict)
print('Finished!')

Finally, the trained weight files are saved to the logs folder.

5. In retinaface.py, modify model_path and backbone in the following block so that they match your newly trained weights.

_defaults = {
    "model_path": 'model_data/Retinaface_mobilenet0.25.pth',
    "confidence": 0.5,
    "backbone": "mobilenet",
    "cuda": True #注意是否用GPU
}

6. Run predict.py and enter img/timg.jpg at the prompt; video.py can also be used for webcam detection.

Final prediction result:

![](https://img2020.cnblogs.com/blog/2290241/202101/2290241-20210126131803565-108591682.png)

posted @ 2021-01-26 13:17  把明天没收  阅读(130)  评论(0编辑  收藏  举报