MindSpore error: ReduceMean on Ascend does not support inputs with more than 8 dimensions

1 Error description

1.1 System environment

Hardware Environment(Ascend/GPU/CPU): Ascend
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

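1.2 Basic information

1.2.1 Script

The script builds a network containing a single ReduceMean operator and averages the input over axis 1. The script is as follows: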

    import numpy as np
    import mindspore
    from mindspore import Tensor, nn, ops

    class Net(nn.Cell):
        def __init__(self, axis, keep_dims):
            super().__init__()
            self.reducemean = ops.ReduceMean(keep_dims=keep_dims)
            self.axis = axis

        def construct(self, input_x):
            # Average the input over the configured axis.
            return self.reducemean(input_x, self.axis)

    net = Net(axis=(1,), keep_dims=True)
    # 9-dimensional input: this is what triggers the error on Ascend.
    x = Tensor(np.random.randn(1, 2, 3, 4, 5, 6, 7, 8, 9), mindspore.float32)
    out = net(x)
    print("out shape: ", out.shape)

1.2.2 Error message

The error message is as follows:

Traceback (most recent call last):
  File "test.py", line 18, in <module>
    out = net(x)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/mindspore/nn/cell.py", line 574, in __call__
    out = self.compile_and_run(*args)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/mindspore/nn/cell.py", line 975, in compile_and_run
    self.compile(*inputs)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/mindspore/nn/cell.py", line 948, in compile
    jit_config_dict=self._jit_config_dict)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/mindspore/common/api.py", line 1092, in compile
    result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
RuntimeError: Single op compile failed, op: reduce_mean_d_1629966128061146056_6
 except_msg: 2022-07-15 01:36:29.720449: Query except_msg:Traceback (most recent call last):
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/te_fusion/parallel_compilation.py", line 1469, in run
    relation_param=self._relation_param)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/te_fusion/fusion_manager.py", line 1283, in build_single_op
    compile_info = call_op()
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/te_fusion/fusion_manager.py", line 1270, in call_op
    opfunc(*inputs, *outputs, *new_attrs, **kwargs)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/tbe/common/utils/para_check.py", line 537, in _in_wrapper
    formal_parameter_list[i][1], op_name)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/tbe/common/utils/para_check.py", line 516, in _check_one_op_param
    _check_input(op_param, param_name, param_type, op_name)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/tbe/common/utils/para_check.py", line 299, in _check_input
    _check_input_output_dict(op_param, param_name, op_name)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/tbe/common/utils/para_check.py", line 223, in _check_input_output_dict
    param_name=param_name)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/tbe/common/utils/para_check.py", line 689, in check_shape
    _check_shape_range(max_rank, min_rank, param_name, shape)
  File "/root/archiconda3/envs/lh37_ascend/lib/python3.7/site-packages/tbe/common/utils/para_check.py", line 727, in _check_shape_range
    % (error_info['param_name'], min_rank, max_rank, len(shape)))
RuntimeError: ({'errCode': 'E80012', 'op_name': 'reduce_mean_d', 'param_name': 'input_x', 'min_value': 0, 'max_value': 8, 'real_value': 9}, 'In op, the num of dimensions of input/output[input_x] should be inthe range of [0, 8], but actually is [9].')

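Cause analysis

The last RuntimeError in the traceback reads: 'In op, the num of dimensions of input/output[input_x] should be inthe range of [0, 8], but actually is [9].' In other words, the input of ReduceMean must have between 0 and 8 dimensions, but the input actually passed in has 9, which is more than the ReduceMean operator supports on Ascend. The official documentation for ReduceMean also describes a limit on the number of input dimensions:
[Image: screenshot of the input description from the official documentation]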
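
One way to surface this problem earlier is to check the rank of the input before the network is compiled. The following is a minimal sketch in plain Python, not part of MindSpore: the helper name check_reduce_rank and the constant ASCEND_REDUCE_MAX_RANK are hypothetical, and the limit of 8 is simply copied from the 'max_value' field of the error above.

```python
# Assumption: the limit of 8 is taken from the error message above
# ('max_value': 8); it is not queried from MindSpore or TBE.
ASCEND_REDUCE_MAX_RANK = 8


def check_reduce_rank(shape, max_rank=ASCEND_REDUCE_MAX_RANK):
    """Return the rank of `shape`, or raise ValueError if it is too large."""
    rank = len(shape)
    if rank > max_rank:
        raise ValueError(
            f"input has {rank} dimensions, but at most {max_rank} are supported"
        )
    return rank


# The 9-D shape from the failing script is rejected before compilation.
try:
    check_reduce_rank((1, 2, 3, 4, 5, 6, 7, 8, 9))
except ValueError as err:
    print("rejected:", err)

# An 8-D shape passes the check.
print("rank:", check_reduce_rank((2, 3, 4, 5, 6, 7, 8, 9)))
```

In a real script the shape would come from the tensor itself (for example x.shape) right before net(x) is called.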

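2 Solution

Given the cause identified above, the fix is straightforward: reduce the input to at most 8 dimensions. Here the leading dimension of size 1 is dropped: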

    import numpy as np
    import mindspore
    from mindspore import Tensor, nn, ops

    class Net(nn.Cell):
        def __init__(self, axis, keep_dims):
            super().__init__()
            self.reducemean = ops.ReduceMean(keep_dims=keep_dims)
            self.axis = axis

        def construct(self, input_x):
            # Average the input over the configured axis.
            return self.reducemean(input_x, self.axis)

    net = Net(axis=(1,), keep_dims=True)
    # 8-dimensional input: within the supported range on Ascend.
    x = Tensor(np.random.randn(2, 3, 4, 5, 6, 7, 8, 9), mindspore.float32)
    out = net(x)
    print("out shape: ", out.shape)

The script now runs successfully and prints:

out shape: (2, 1, 4, 5, 6, 7, 8, 9)
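
If the data really does have more than 8 dimensions, another possible workaround (not from the original post and not verified on Ascend) is to merge neighbouring dimensions before the reduction and restore the shape afterwards. Averaging over a merged group of axes gives the same values as averaging over the individual axes, so only the shapes need to be planned. The sketch below does that planning in plain Python; the function name plan_collapsed_reduce is hypothetical.

```python
from functools import reduce
from itertools import groupby
from operator import mul


def plan_collapsed_reduce(shape, axis, keep_dims=True):
    """Merge runs of adjacent reduced / kept dimensions.

    Returns (collapsed_shape, collapsed_axis, final_shape):
    reshape the input to collapsed_shape, reduce over collapsed_axis,
    then reshape the result to final_shape.
    """
    rank = len(shape)
    reduced = {a % rank for a in axis}  # normalise negative axes
    collapsed_shape, collapsed_axis = [], []
    # Group neighbouring dimensions by whether they are reduced or kept.
    for is_reduced, group in groupby(range(rank), key=lambda d: d in reduced):
        sizes = [shape[d] for d in group]
        if is_reduced:
            collapsed_axis.append(len(collapsed_shape))
        collapsed_shape.append(reduce(mul, sizes, 1))
    if keep_dims:
        final_shape = tuple(1 if d in reduced else s for d, s in enumerate(shape))
    else:
        final_shape = tuple(s for d, s in enumerate(shape) if d not in reduced)
    return tuple(collapsed_shape), tuple(collapsed_axis), final_shape


# The 9-D shape from the failing script collapses to 3 dimensions.
print(plan_collapsed_reduce((1, 2, 3, 4, 5, 6, 7, 8, 9), (1,)))
# Intended use with MindSpore (untested assumption):
#   y = reducemean(x.reshape(collapsed_shape), collapsed_axis).reshape(final_shape)
```

Note that the restored result is again 9-dimensional, so whether later operators in the network accept it on Ascend would still need to be checked. If reduced and kept axes alternate many times, the collapsed shape can also still exceed 8 dimensions.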

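3 Summary

Steps for locating this kind of error:
1. Find the line of user code that raised the error: out = net(x);
2. Use the keywords in the error log to narrow down the problem: should be in the range of [0, 8], but actually is [9];
3. Pay close attention to whether variables are defined and initialized correctly; here, the shape of the input tensor.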
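
Step 2 can also be done programmatically when many logs need to be scanned. The following is a small sketch, assuming the message keeps the wording shown in the traceback above (including the missing space in "inthe"); the regular expression is written for this one message and is not a general parser for TBE errors.

```python
import re

# The message part of the final RuntimeError in the traceback above.
message = (
    "In op, the num of dimensions of input/output[input_x] should be "
    "inthe range of [0, 8], but actually is [9]."
)

# `in\s*the` tolerates the missing space that appears in the log.
pattern = re.compile(
    r"input/output\[(?P<param>\w+)\] should be in\s*the range of "
    r"\[(?P<low>\d+), (?P<high>\d+)\], but actually is \[(?P<actual>\d+)\]"
)

match = pattern.search(message)
if match:
    low, high, actual = (int(match.group(k)) for k in ("low", "high", "actual"))
    print(f"{match.group('param')}: rank {actual}, allowed {low}..{high}")
```

The extracted parameter name and range point directly at the variable to inspect in step 3.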

4 References

4.1 ReduceMean operator API reference

posted @ 2022-07-15 16:27  Skytier