[PyTorch Basics] Converting a trained PyTorch model to ONNX and testing it

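Preface

During deployment, different hardware may support different model frameworks. This post walks through converting a PyTorch model file to an ONNX model file, using Pytorch_Unet as the example: a trained model is converted to ONNX, and the ONNX model is then tested.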

Steps

1. Convert the trained pth file to an ONNX model

2. Check and verify the ONNX model

3. Test the ONNX model on input data

Implementation

1. Convert the trained pth file to an ONNX model

The model is a U-Net trained on the Carvana dataset.

import io
import torch
import torch.onnx
from unet import UNet
import onnx
import onnxruntime
import numpy as np
from PIL import Image
import torchvision.transforms as transforms
from utils.dataset import BasicDataset


Conversion

def test():
  model = UNet(n_channels=3, n_classes=1)
  batch_size = 1
  input_shape = (3, 640, 959)  # C, H, W of the test input
  # Initialize model with the pretrained weights
  map_location = lambda storage, loc: storage  # keep tensors on the CPU
  if torch.cuda.is_available():
      map_location = None
  loaded_model = torch.load(pthfile, map_location=map_location)
  model.load_state_dict(loaded_model)
  # set the model to inference mode
  model.eval()
  # dummy input in NCHW layout
  x = torch.rand(batch_size, *input_shape)
  input_names = ['input']
  output_names = ['output']
  # Export the model
  torch.onnx.export(model,                     # model being run
                    x,                         # model input (or a tuple for multiple inputs)
                    onnxpath,                  # where to save the model (can be a file or file-like object)
                    export_params=True,        # store the trained parameter weights inside the model file
                    opset_version=12,          # the ONNX opset version to export the model to
                    do_constant_folding=True,  # whether to execute constant folding for optimization
                    input_names=input_names,   # the model's input names
                    output_names=output_names, # the model's output names
                    dynamic_axes={'input' : {0 : 'batch_size'},    # variable length axes
                                  'output' : {0 : 'batch_size'}})

Input paths and device

pthfile = 'xxx/Pytorch-UNet/checkpoints/CP_epoch5.pth'
onnxpath = './unet.onnx'
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')


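This produces the ONNX model file.

Note: the input size used for export must match the size of the data used for testing.

Note: before exporting, call torch_model.eval() or torch_model.train(False) to put the model in inference mode. This is required because operators such as dropout and batchnorm behave differently in inference and training modes.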
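The difference between the two modes can be seen on a single layer. The following is a minimal sketch, not part of the original post; it uses a standalone torch.nn.Dropout layer with p=0.5 as an assumed stand-in for the dropout inside a real network.

```python
import torch

torch.manual_seed(0)  # make the random dropout mask reproducible

layer = torch.nn.Dropout(p=0.5)
x = torch.ones(1, 8)

layer.train()         # training mode: elements are zeroed at random,
train_out = layer(x)  # survivors are scaled by 1 / (1 - p)

layer.eval()          # inference mode: dropout passes the input through
eval_out = layer(x)

print("train:", train_out)
print("eval: ", eval_out)
```

A model exported while still in training mode may bake the training-time behavior into the graph, which is one way the ONNX output can drift from the PyTorch output.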

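Note: unless an axis is declared dynamic, every input dimension is fixed in the exported ONNX graph to the size of the example input.

In this example the model is exported with batch_size=1, but the first dimension is declared dynamic in the dynamic_axes argument of torch.onnx.export(). The exported model therefore accepts inputs of shape [batch_size, 3, 640, 959], where batch_size may vary.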
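The rule can be made concrete with a small shape check. This is only an illustrative sketch: shape_matches is a hypothetical helper written for this post, not an ONNX or ONNX Runtime function. The runtime performs its own check and rejects inputs whose fixed axes do not match.

```python
def shape_matches(declared, actual):
    """Return True if a concrete shape fits a declared ONNX-style shape.

    `declared` mixes ints (fixed axes) and strings (dynamic axes such as
    'batch_size'); `actual` is a tuple of ints.
    """
    if len(declared) != len(actual):
        return False
    for want, got in zip(declared, actual):
        if isinstance(want, int) and want != got:
            return False  # a fixed axis must match exactly
    return True  # dynamic (string) axes accept any size


declared = ["batch_size", 3, 640, 959]
print(shape_matches(declared, (1, 3, 640, 959)))  # same size as at export
print(shape_matches(declared, (4, 3, 640, 959)))  # larger batch, dynamic axis
print(shape_matches(declared, (1, 3, 320, 480)))  # H and W were fixed at export
```

Only the batch axis was declared dynamic here, so images of a different height or width would need either a re-export or additional dynamic axes.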

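2. Check and verify the ONNX model

Check the model:

onnx.checker.check_model(onnx_model) verifies the model's structure and confirms that it has a valid schema.

The validity of the ONNX graph is verified by checking the model's version, the graph's structure, and the nodes with their inputs and outputs. If the model is valid, the call returns None; otherwise it raises an exception.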

  # check model
  onnx_model = onnx.load(onnxpath)
  check = onnx.checker.check_model(onnx_model)
  print('check: ', check)

Verify that the outputs match:

Verify that ONNX Runtime and PyTorch compute the same values for the network.

  # check model whether match
  ort_session = onnxruntime.InferenceSession(onnxpath)
  # compute ONNX Runtime output prediction
  ort_inputs = {ort_session.get_inputs()[0].name:to_numpy(x)}
  ort_outs = ort_session.run(None, ort_inputs)
  # compare ONNX Runtime and PyTorch results
  torch_out = model(x)
  print('tor_out: ', torch_out.shape)
  np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05)
  print("Exported model has been tested with ONNXRuntime, and the result looks good!")

The PyTorch and ONNX Runtime outputs match numerically within the given tolerances (rtol/atol).
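For reference, np.testing.assert_allclose accepts a pair of arrays when every element satisfies |actual - desired| <= atol + rtol * |desired|. The sketch below spells that criterion out with made-up numbers; within_tolerance is a hypothetical helper, not a NumPy function.

```python
import numpy as np


def within_tolerance(actual, desired, rtol=1e-3, atol=1e-5):
    """Elementwise test of |actual - desired| <= atol + rtol * |desired|."""
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    return bool(np.all(np.abs(actual - desired) <= atol + rtol * np.abs(desired)))


desired = np.array([1.0, 100.0, 0.0])
close = desired + np.array([5e-4, 5e-2, 5e-6])  # inside the tolerance band
far = desired + np.array([5e-3, 0.0, 0.0])      # first element is too far off

print(within_tolerance(close, desired))
print(within_tolerance(far, desired))

# The NumPy check used in the post agrees for the passing case.
np.testing.assert_allclose(close, desired, rtol=1e-3, atol=1e-5)
```

Because the relative term scales with the magnitude of the reference value, large outputs are allowed larger absolute differences, while values near zero are governed by atol alone.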

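Note: the test data must have the same size as the model input.

Question: why is the model output taken as ort_outs[0], which looks like one more dimension than expected? ort_session.run() returns a Python list with one array per model output, so ort_outs[0] is the first (here the only) output rather than an extra tensor dimension.

When verifying that the outputs match before and after conversion, make sure the right model is used, that is, one with the trained weights loaded and set to eval mode: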

  # Initialize model with the pretrained weights
  map_location = lambda storage, loc: storage
  if torch.cuda.is_available():
      map_location = None
  loaded_model = torch.load(pthfile, map_location=map_location)
  model.load_state_dict(loaded_model)
  # set the model to inference mode
  model.eval()

3. Test the ONNX model on input data

import io
import torch
import torch.onnx
from unet import UNet
import onnx
import onnxruntime
import numpy as np
from PIL import Image
import torchvision.transforms as transforms
from utils.dataset import BasicDataset

pthfile = 'xxx/Pytorch-UNet/checkpoints_carvana/CP_epoch5.pth'
onnxpath = './unet.onnx'
imgpath = 'xxx/Pytorch-UNet/output/0cdf5b5d0ce1_01.jpg'
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

def test_onnx():
  full_img = Image.open(imgpath)
  ort_session = onnxruntime.InferenceSession(onnxpath)
  scale_factor = 0.5
  img = torch.from_numpy(BasicDataset.preprocess(full_img, scale_factor))
  img = img.to(device=device, dtype=torch.float32)
  img.unsqueeze_(0)
  # ONNX RUNTIME
  ort_inputs = {ort_session.get_inputs()[0].name:to_numpy(img)}
  ort_outs = ort_session.run(None, ort_inputs)  # list.
  # post process.
  img_out = ort_outs[0]
  img_out = torch.from_numpy(img_out)
  # probs = torch.nn.functional.softmax(img_out, dim=1)
  probs = torch.sigmoid(img_out)
  probs = probs.squeeze(0)
  tf = transforms.Compose(
      [
          transforms.ToPILImage(),
          transforms.Resize(full_img.size[1]),
          transforms.ToTensor()
      ]
  )
  probs = tf(probs.cpu())
  full_mask = probs.squeeze().cpu().numpy()
  mask_thres = 0.5
  mask_out = (full_mask > mask_thres)
  # save image
  img_out = Image.fromarray((mask_out*255).astype(np.uint8))
  img_out.save('./img/onnx_img.jpg')

if __name__ == '__main__':
  test_onnx()
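The post-processing in test_onnx() can also be written in plain NumPy, without converting the ONNX Runtime output back to a torch tensor. This is a simplified sketch under stated assumptions: a single-class output of shape (1, 1, H, W), and no resize back to the original image size. logits_to_mask is a hypothetical helper name.

```python
import numpy as np


def logits_to_mask(logits, threshold=0.5):
    """Turn raw single-class logits of shape (1, 1, H, W) into a uint8 mask (H, W)."""
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, as in the torch version
    probs = probs[0, 0]                    # drop the batch and channel axes
    return ((probs > threshold) * 255).astype(np.uint8)


# Made-up logits standing in for ort_outs[0]
logits = np.array([[[[-2.0, 0.0], [0.5, 3.0]]]], dtype=np.float32)
mask = logits_to_mask(logits)
print(mask)
```

The resulting array can be passed to Image.fromarray in the same way as mask_out in the script above.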


Problems

1. ONNX version problem (the original post links a reference here):

File "xxx/miniconda3/envs/open_mmlab/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 80, in _parse_arg
raise RuntimeError("Failed to export an ONNX attribute '" + v.node().kind() +
RuntimeError: Failed to export an ONNX attribute 'onnx::Cast', since it's not constant, please try to make things (e.g., kernel size) static if possible

Check the ONNX version:

import onnx  # or: import onnxruntime
print(onnx.__version__)  # or: onnxruntime.__version__

The main cause is the version of ONNX or of PyTorch.
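When comparing environments, it can help to list all three versions in one go. The sketch below reads them from package metadata using the standard library, so it works even if one of the packages fails to import; installed_versions is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["torch", "onnx", "onnxruntime"]).items():
    print(f"{name}: {version or 'not installed'}")
```

The names are distribution names as installed by pip, which can differ from the import name; a GPU build of ONNX Runtime, for example, is typically installed under a different distribution name.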

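Notes

1) When converting to ONNX, the model's input size is the size of your input data.

Does the input have to keep the same size when the ONNX model is called? Yes, on every axis that was not declared dynamic at export time.

2) Make sure the ONNX opset version used for export is correct.

3) After conversion, remember to check the model and to verify that its results match the PyTorch model.

4) Testing on real data requires the same preprocessing and postprocessing as used in training.

5) Pay attention to data types during conversion, especially tensor versus numpy.

6) For both export and inference, first make sure the model is in eval mode.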
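Notes 4) and 5) often show up together at the input: an image loaded as H x W x C uint8 has to become a float32 array in N x C x H x W layout before it is passed to ONNX Runtime. The sketch below shows one such conversion; the scaling to [0, 1] and the layout are assumptions for illustration, and the real BasicDataset.preprocess (which also rescales the image) remains the reference for this model.

```python
import numpy as np


def prepare_input(image_hwc):
    """Convert an H x W x C uint8 image to a 1 x C x H x W float32 array in [0, 1]."""
    arr = np.asarray(image_hwc)
    arr = arr.transpose(2, 0, 1)          # HWC -> CHW
    arr = arr.astype(np.float32) / 255.0  # scale to [0, 1], stay in float32
    return arr[np.newaxis, ...]           # add the batch axis


# Random pixels standing in for a real image
dummy = np.random.randint(0, 256, size=(4, 6, 3), dtype=np.uint8)
batch = prepare_input(dummy)
print(batch.shape, batch.dtype)
```

NumPy creates float64 arrays by default, so an explicit cast to float32 is usually needed when the exported model expects float inputs.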

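Questions

1. Why do the ONNX outputs differ from the PyTorch model's outputs?

This question matters a great deal!

One blogger's answer:

After checking the official PyTorch documentation, I found that upsample here supports only the nearest mode, while I was using bilinear; after changing this, the results lined up.
Advice: first check the official documentation for which operators are and are not supported, and use the layers in torch.nn rather than Function calls.

My guess was that some operations in the network are implemented differently in PyTorch and ONNX Runtime. That was not resolved at the time; how would one trace it?

In my case the cause was that the wrong model was used for verification. Be careful here!

2. Why is the ONNX model's output ort_outs[0], seemingly one more dimension than expected? Because ort_session.run() returns a list of outputs, and ort_outs[0] is the first entry of that list.

3. This post uses a single input and a single output. Multiple inputs and outputs are also possible; the data then has to be passed as a tuple or list.

Others have also hit the following error during testing: there are clearly three inputs, but only two input names end up in the exported model. The cause is that the third variable is treated as a constant, because the model converts a tensor to a Python value with item(), so the export goes wrong.

4. To specify dynamic axes, give for each input or output name the index of the dynamic dimension and a name for it.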
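The dynamic_axes argument is a plain nested dict: tensor name, then axis index, then a label for that axis. For a model with several inputs and outputs it can be built programmatically. The sketch below does so with a hypothetical helper, batch_dynamic_axes; the names image, mask_hint and edges are made up for illustration.

```python
def batch_dynamic_axes(input_names, output_names, axis=0, label="batch_size"):
    """Build a dynamic_axes mapping that marks one axis as dynamic for every name."""
    return {name: {axis: label} for name in list(input_names) + list(output_names)}


# Single input and output, as in this post
axes = batch_dynamic_axes(["input"], ["output"])
print(axes)

# Hypothetical multi-input, multi-output model
multi = batch_dynamic_axes(["image", "mask_hint"], ["output", "edges"])
print(multi)
```

The resulting dict has the same form as the one passed to torch.onnx.export() in the conversion step, and the names must match those given in input_names and output_names.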

References

1. Pytorch_Unet_github;

2. Pytorch_ONNX_doc;

3. Carvana_dataset;

4. Unet;

5. github_onnx_q;

6. PyTorch to ONNX: model outputs do not line up;

End

posted on 2021-06-01 18:06 by 鹅要长大
