Summary of Problems Encountered When Using PyTorch

1. DecompressionBombWarning: Image size (92680344 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack.

Date: 2021-01-27

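Cause: the image's pixel count exceeds PIL's default limit (Image.MAX_IMAGE_PIXELS, 89478485 pixels), so PIL warns when reading it.

One way to fix this is to raise the limit:

from PIL import Image
Image.MAX_IMAGE_PIXELS = 2300000000  # raise the pixel limit (setting it to None disables the check)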

The second way is to read the image with another library, downscale it, and then hand it to PIL.
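Before handing the image over, you need a target size. Below is a minimal sketch of that arithmetic. It assumes the aspect ratio should be kept and the goal is to stay under Pillow's default limit. `fit_within_pixel_limit` is a hypothetical helper, and its constant reproduces the 89478485 from the warning above:

```python
import math

# Pillow's default limit, defined in PIL.Image as int(1024 * 1024 * 1024 // 4 // 3).
# Pillow warns above this many pixels and raises DecompressionBombError above twice as many.
PIL_DEFAULT_MAX_PIXELS = 1024 * 1024 * 1024 // 4 // 3  # 89478485


def fit_within_pixel_limit(width, height, max_pixels=PIL_DEFAULT_MAX_PIXELS):
    """Return a (width, height) whose area is at most max_pixels, keeping the aspect ratio.

    Sizes already under the limit are returned unchanged.
    """
    if width * height <= max_pixels:
        return width, height
    # Scaling both sides by s scales the area by s**2.
    scale = math.sqrt(max_pixels / (width * height))
    # Rounding down keeps the area under the limit (max(1, ...) guards against
    # zero-sized sides for extremely thin images).
    new_width = max(1, int(width * scale))
    new_height = max(1, int(height * scale))
    return new_width, new_height
```

The result could then go to another library's resize function. For example, OpenCV's `cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_AREA)` takes its size argument as (width, height). The downscaled array can then be converted with `Image.fromarray(...)`. Keep in mind that OpenCV loads colour images in BGR order.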

2. GPU out of memory when storing generated features in a list

Problem code:

    # enumerate yields (iteration index, data returned by __getitem__); the loader's batch_size = 1
    for i, (testImage, classNO, filePath) in enumerate(testLoader):
        print(i)
        testEncode = model.encode(testImage.cuda())
        featureList.append((testEncode, classNO, filePath))

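After about 25 iterations, the GPU runs out of memory. The model is a ResNet152 used for transfer learning, on a GPU with 8 GB of memory. During training, a batch size of 25 had already overflowed memory. Comparing the two cases, the memory used by each single-image forward pass is clearly not being released. The model is in eval mode, but eval() does not turn off autograd. Each testEncode kept in the list still references its computation graph, so the network's intermediate results for that image also stay on the GPU.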

The fix:

    testEncode = model.encode(testImage.cuda()).detach().cpu()  # detach() cuts the result loose from the graph; .cpu() stores the copy in host memory
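Here is a minimal sketch of the whole loop with the fix applied, assuming the features are only needed for inference. Besides detach().cpu(), it wraps the loop in torch.no_grad(). This stops autograd from building a graph in the first place, which also lowers peak memory during each forward pass. `encoder` and `fake_loader` are hypothetical stand-ins for model.encode and testLoader. The code falls back to the CPU when no GPU is present:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for model.encode(): any module works here.
encoder = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
encoder.eval()  # eval() changes layer behaviour but does NOT disable autograd

# Hypothetical stand-in for testLoader with batch_size=1: (image, class, path) tuples.
fake_loader = [(torch.randn(1, 16), i % 3, f"img_{i}.png") for i in range(5)]

device = "cuda" if torch.cuda.is_available() else "cpu"
encoder.to(device)

feature_list = []
with torch.no_grad():  # no graph is built, so nothing is kept alive for backward
    for image, class_no, file_path in fake_loader:
        encode = encoder(image.to(device))
        # detach() is redundant under no_grad but harmless; .cpu() moves the copy off the GPU
        feature_list.append((encode.detach().cpu(), class_no, file_path))
```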

3. Loading a pretrained model and deleting specific weights

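When loading a pretrained model, the pretrained model's number of classes often differs from the actual number of classes. Searching online turns up the following approaches.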

1) Create the model with the same number of classes as the pretrained model, load the weights, and then replace the output head:

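    import torch
    import torch.nn as nn

    checkpoint = torch.load(config.PREMODELPATH, map_location='cpu')
    # name of the output layer
    classifiers = "head"
    msg = model.load_state_dict(checkpoint['model'], strict=False)
    # replace the head with a new one for the actual number of classes
    # (assumes model.head is an nn.Linear)
    model.head = nn.Linear(model.head.in_features, config.MODEL.NUM_CLASSES)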

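2) Alternatively, the code below follows timm: first delete the head weights from the pretrained checkpoint, then load the remaining weights directly.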

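    import torch

    checkpoint = torch.load(config.PREMODELPATH, map_location='cpu')
    # name(s) of the output layer(s)
    classifiers = "head"
    if classifiers is not None:
        if isinstance(classifiers, str):
            classifiers = (classifiers,)
        # class counts differ (assumes an ImageNet-1k checkpoint with 1000 classes): drop the output layer
        if config.MODEL.NUM_CLASSES != 1000:
            for classifier_name in classifiers:
                # completely discard fully connected if model num_classes doesn't match pretrained weights
                del checkpoint["model"][classifier_name + '.weight']
                del checkpoint["model"][classifier_name + '.bias']
    # load everything else; the new head keeps its random initialization
    msg = model.load_state_dict(checkpoint["model"], strict=False)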
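The same idea can be written as a small filter that does not mutate the checkpoint and tolerates missing keys. By contrast, the `del` statements above raise KeyError if, say, a checkpoint has no head.bias. This sketch assumes all head parameters share the head's module prefix (head.weight, head.bias, ...). `drop_prefixed_keys` is a hypothetical helper, not a timm API:

```python
from collections import OrderedDict


def drop_prefixed_keys(state_dict, prefixes=("head",)):
    """Return a copy of state_dict without the parameters of the given submodules.

    A key is dropped if it equals a prefix or starts with "<prefix>.",
    so "head" removes "head.weight" and "head.bias" but keeps "header.weight".
    Missing keys are simply ignored instead of raising KeyError.
    """
    if isinstance(prefixes, str):
        prefixes = (prefixes,)
    kept = OrderedDict()
    for key, value in state_dict.items():
        if any(key == p or key.startswith(p + ".") for p in prefixes):
            continue  # belongs to a classifier we want to re-initialize
        kept[key] = value
    return kept
```

This could be used with the checkpoint above as `msg = model.load_state_dict(drop_prefixed_keys(checkpoint["model"], "head"), strict=False)`. `msg.missing_keys` should then list only the head parameters, which keep their fresh initialization.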

4. could not export Python function call 'SwishImplementation'. Remove calls to Python functions before ……

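Cause: by default, EfficientNet's Swish activation uses the memory-efficient implementation. That implementation goes through a Python-level function (SwishImplementation) with no TorchScript (torch.jit.ScriptModule) counterpart, so it cannot be exported.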

Exporting to ONNX is reported to run into a similar problem.

The efficientnet_pytorch codebase provides a method for switching the Swish implementation:

class EfficientNet(nn.Module):

    def set_swish(self, memory_efficient=True):
        """Sets swish function as memory efficient (for training) or standard (for export).

        Args:
            memory_efficient (bool): Whether to use memory-efficient version of swish.
        """
        self._swish = MemoryEfficientSwish() if memory_efficient else Swish()
        for block in self._blocks:
            block.set_swish(memory_efficient)

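Before any torch.jit call, first switch Swish to the standard implementation:

local_model.set_swish(False)  # switch Swish from the default memory-efficient version to the standard one so the model can be exported

Then trace:

sm = torch.jit.trace(local_model, example)

The export now succeeds, but it reports Tried to access nonexistent attribute or method 'expand_ratio' of type 'Tuple[int, int, List[int], int, int, int, float, bool]'. (see item 6). In addition, when the exported output is compared with the ordinary Python output, 8 values fall outside the tolerance, with the largest difference reaching 1.6……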
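The standard Swish that set_swish(False) switches to is just x * sigmoid(x), computed with built-in tensor ops. There is therefore no Python-level function call for TorchScript to trip over. This minimal sketch assumes PyTorch 1.7 or later (for F.silu and nn.SiLU). `PlainSwish` is a hypothetical name that illustrates the idea rather than copying efficientnet_pytorch's source:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PlainSwish(nn.Module):
    """Standard Swish, x * sigmoid(x), built only from built-in tensor ops."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(x)


swish = PlainSwish()
x = torch.linspace(-4.0, 4.0, steps=9)

# Both succeed because forward() contains no custom Python function calls.
scripted = torch.jit.script(swish)
traced = torch.jit.trace(swish, (x,))

# Swish in this form is the same function as SiLU, which PyTorch provides built in.
reference = F.silu(x)
```

For a model you control, torch.nn.SiLU could therefore stand in for a hand-written Swish, and no switch would be needed before export.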

5. TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error:

With rtol=1e-05 and atol=1e-05, found 8 element(s) (out of 8) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.8556110858917236 (4.123325824737549 vs. 3.267714738845825), which occurred at index (0, 0).
_check_trace(

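Cause: the model was still in training mode, which is the default for a newly created module. In training mode, Dropout-style layers draw a new random mask on every call, so the traced run and the Python run produce different outputs. BatchNorm also normalizes with batch statistics and keeps updating its running statistics, which are buffers rather than parameters.

Before tracing, put the model in eval() mode, load the model weights, and remove CUDA-specific operations from the model.

Solution: call model.eval() before torch.jit.trace.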
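Below is a minimal sketch of an export helper, assuming a plain nn.Module with a single tensor input. `trace_for_export` is a hypothetical name. It puts the model in eval mode before tracing. It then compares traced and eager outputs using the same 1e-5 tolerances as the warning above. The toy network contains BatchNorm and Dropout, the layers whose train-mode behaviour causes the mismatch:

```python
import torch
import torch.nn as nn


def trace_for_export(model: nn.Module, example: torch.Tensor, rtol=1e-5, atol=1e-5):
    """Trace `model` in eval mode and check the traced output against eager mode."""
    model.eval()  # BatchNorm uses running stats, Dropout becomes the identity
    with torch.no_grad():
        traced = torch.jit.trace(model, example)
        expected = model(example)
        actual = traced(example)
    if not torch.allclose(actual, expected, rtol=rtol, atol=atol):
        raise RuntimeError("traced output differs from eager output")
    return traced


# Hypothetical toy model containing the layers that behave differently in train mode.
net = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.Dropout(p=0.5), nn.Linear(8, 2))
example = torch.randn(3, 4)
traced_net = trace_for_export(net, example)
```

The returned module can then be saved with `traced_net.save(path)`.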

6. Tried to access nonexistent attribute or method 'expand_ratio' of type 'Tuple[int, int, List[int], int, int, int, float, bool]'.:

File "C:\ProgramData\Anaconda3\envs\pytorch1.8.1\lib\site-packages\efficientnet_pytorch\model.py", line 104
        # Expansion and Depthwise Convolution
        x = inputs
        if self._block_args.expand_ratio != 1:
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE

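The likely cause is the type. As the message shows, TorchScript sees self._block_args as a plain Tuple[int, int, List[int], ...] rather than a named tuple, so the field cannot be accessed by the name expand_ratio.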
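One possible workaround, not taken from the original post or from efficientnet_pytorch, is to copy the needed namedtuple fields into plain int attributes in `__init__`. forward() then never touches the tuple. The sketch uses a hypothetical `ToyBlock` and `BlockArgs`, not the library's classes. It only shows that a block written this way scripts cleanly. Applying it to efficientnet_pytorch would require patching the library's block class, which is left untested here:

```python
import collections

import torch
import torch.nn as nn

# Hypothetical namedtuple loosely mirroring the idea of per-block arguments.
BlockArgs = collections.namedtuple("BlockArgs", ["expand_ratio", "kernel_size"])


class ToyBlock(nn.Module):
    def __init__(self, block_args: BlockArgs, channels: int):
        super().__init__()
        # Store the field as a plain int so forward() never reads the namedtuple.
        self.expand_ratio = int(block_args.expand_ratio)
        hidden = channels * self.expand_ratio
        self.expand = nn.Conv2d(channels, hidden, kernel_size=1)
        self.project = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = x
        if self.expand_ratio != 1:  # same check as the failing line, on an int attribute
            out = self.expand(out)
        return self.project(out)


block = ToyBlock(BlockArgs(expand_ratio=6, kernel_size=3), channels=4)
scripted_block = torch.jit.script(block)
```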

Posted @ 2021-01-27 15:18 by 飘零_未知的坚持
