RuntimeError: default_program(24): error: extra text after expected end of number

Full error output

Traceback (most recent call last):
  File "eval_roberta_qa.py", line 24, in <module>
    output = model(input_ids, attention_mask, token_type_ids)
  File "/home/rzhang/miniconda3/envs/vamc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
RuntimeError: default_program(24): error: extra text after expected end of number

default_program(29): error: extra text after expected end of number

2 errors detected in the compilation of "default_program".

nvrtc compilation failed: 

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)


template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_mul_div_add(float* t0, float* t1, float* aten_add, float* aten_mul) {
{
  float t1_1 = __ldg(t1 + (512 * blockIdx.x + threadIdx.x) % 384);
  if (blockIdx.x<1 ? 1 : 0) {
    if (512 * blockIdx.x + threadIdx.x<384 ? 1 : 0) {
      if (blockIdx.x<1 ? 1 : 0) {
        aten_mul[512 * blockIdx.x + threadIdx.x] = t1_1 * -3.402823466385289e+38.f;
      }
    }
  }
  float v = __ldg(t0 + ((512 * blockIdx.x + threadIdx.x) % 384 + 384 * (((512 * blockIdx.x + threadIdx.x) / 384) % 384)) + 147456 * (((512 * blockIdx.x + threadIdx.x) / 147456) % 12));
  aten_add[(((512 * blockIdx.x + threadIdx.x) % 384 + ((512 * blockIdx.x + threadIdx.x) / 1769472) * 1769472) + 384 * (((512 * blockIdx.x + threadIdx.x) / 384) % 384)) + 147456 * (((512 * blockIdx.x + threadIdx.x) / 147456) % 12)] = v / 8.f + t1_1 * -3.402823466385289e+38.f;
}
}

Problem description

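I was running TorchScript inference with the code below. The first call to the model returns its output normally, but the second call fails with the error shown above.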

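import numpy as np
import torch
from tqdm import tqdm

# torchscript_path and datalist_txt are defined elsewhere in the script
model = torch.jit.load(torchscript_path)

model.to("cuda:0")
model.eval()
with open(datalist_txt, "r") as fr:
    lines = fr.readlines()
    for index in tqdm(range(len(lines))):
        # each line of the list file is the path to one saved sample
        data = np.load(lines[index].strip())

        input_ids = torch.tensor(data["input_0"], dtype=torch.int32).to("cuda:0")
        attention_mask = torch.tensor(data["input_1"], dtype=torch.int32).to("cuda:0")
        token_type_ids = torch.tensor(data["input_2"], dtype=torch.int32).to("cuda:0")
        # the first iteration succeeds; the second raises the RuntimeError
        output = model(input_ids, attention_mask, token_type_ids)
        output = output.cpu().detach().numpy()
        out_npz = {"output_0": output}
        # store the output array under the key "output_0"
        np.savez(f"output_path/out_{str(index).zfill(6)}.npz", **out_npz)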

Root cause analysis:

torch version: torch==1.8.1+cu11

Reference: https://discuss.pytorch.org/t/second-forward-call-of-torchscripted-module-breaks-on-cuda/124291

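In short, the installed torch version is too old. The fused kernel it generates contains the float literal -3.402823466385289e+38.f, which is not valid CUDA C++ (a ".f" suffix after the exponent), so NVRTC rejects it with "extra text after expected end of number". torch needs to be 1.12.0 or later, 1.12.0 being the first release in which nvFuser is the default fuser.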
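To see which part of the dumped kernel NVRTC is objecting to, a small script can scan the source for float literals that have a stray ".f" after the exponent. This is only a rough sketch: the regular expression and the helper name `find_bad_float_literals` are my own for illustration and are not part of PyTorch or NVRTC.

```python
import re

# Matches a float literal whose exponent is followed by a stray ".f",
# for example "3.402823466385289e+38.f". A plain "8.f" does not match.
BAD_FLOAT = re.compile(r"\d+(?:\.\d+)?[eE][+-]?\d+\.f\b")


def find_bad_float_literals(kernel_source: str) -> list:
    """Return (line_number, literal) pairs for each malformed literal."""
    hits = []
    for line_number, line in enumerate(kernel_source.splitlines(), start=1):
        for match in BAD_FLOAT.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


# First line is copied from the kernel in the error output;
# the second line is a valid literal added for contrast.
sample = (
    "aten_mul[512 * blockIdx.x + threadIdx.x] = t1_1 * -3.402823466385289e+38.f;\n"
    "float half_scale = 8.f;\n"
)

for line_number, literal in find_bad_float_literals(sample):
    print(f"line {line_number}: malformed literal {literal}")
```

Running it on the full kernel text from the traceback should flag the two statements that multiply by -3.402823466385289e+38.f, matching the "2 errors detected" message.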

Installing 1.13.1 resolved the problem.
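To catch this before inference starts, the version string can be checked up front. The following is a minimal sketch, assuming the 1.12.0 threshold from this post; `parse_torch_version` and `needs_upgrade` are hypothetical helper names, not PyTorch functions.

```python
import re


def parse_torch_version(version: str) -> tuple:
    """Return (major, minor, patch) from a string such as '1.8.1+cu11'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def needs_upgrade(version: str, minimum: tuple = (1, 12, 0)) -> bool:
    """True if the version is older than the assumed minimum (1.12.0)."""
    return parse_torch_version(version) < minimum


print(needs_upgrade("1.8.1+cu11"))  # the version that failed in this post
print(needs_upgrade("1.13.1"))      # the version that fixed it
```

In the inference script this could be called as `needs_upgrade(str(torch.__version__))` right after `import torch`, raising an error or printing a warning before the model is loaded.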

posted @ 2023-10-25 19:19 xle97
