PyTorch: ctc_loss errors out when computed on the GPU

Reposted from: https://www.gitmemory.com/issue/pytorch/pytorch/22234/511927457
https://github.com/pytorch/pytorch/issues/22234
🐛 Bug
When running CTC loss on the GPU, torch.nn.functional.ctc_loss expects different types (and devices) for the "targets" input, depending on whether the selected backend is cuDNN or the native implementation.

However, the user does not know in advance whether PyTorch will use the cuDNN implementation. This is very inconvenient, since the program may crash with no clear mechanism to prevent it.

To Reproduce
Python script:

import torch

# Generate data.
logits = torch.normal(mean=torch.zeros((20, 10, 5), dtype=torch.float32))
logprobs = torch.nn.functional.log_softmax(logits, dim=-1)
targets = torch.randint(1, 10, size=(10, 4), dtype=torch.int32)
target_lengths = torch.randint(1, 5, size=(10,), dtype=torch.int32)

# All examples have the same input length, so that cuDNN can be used.
input_lengths = 20 * torch.ones((10,), dtype=torch.int32)
# Concatenate targets into 1-D, so that cuDNN can be used.
targets = torch.cat(tuple(targets[i, :target_lengths[i]] for i in range(10)))

# CPU: OK
print(torch.nn.functional.ctc_loss(logprobs, targets, input_lengths, target_lengths))

# CUDA, PyTorch native implementation: OK
torch.backends.cudnn.enabled = False
print(torch.nn.functional.ctc_loss(logprobs.to('cuda'), targets.to('cuda'), input_lengths, target_lengths))

# CUDA, cuDNN implementation: CRASHES (cuDNN expects targets on the CPU).
torch.backends.cudnn.enabled = True
print(torch.nn.functional.ctc_loss(logprobs.to('cuda'), targets.to('cuda'), input_lengths, target_lengths))
Output:

$ python minimal_ctc.py 
tensor(17.2880)
tensor(17.2880, device='cuda:0')
Traceback (most recent call last):
  File "minimal_ctc.py", line 24, in <module>
    print(torch.nn.functional.ctc_loss(logprobs.to('cuda'), targets.to('cuda'), input_lengths, target_lengths))
  File "/home/joapuipe/.virtualenvs/pytorch1.1-py3.6/lib/python3.6/site-packages/torch/nn/functional.py", line 1813, in ctc_loss
    zero_infinity)
RuntimeError: Expected tensor to have CPU Backend, but got tensor with CUDA Backend (while checking arguments for cudnn_ctc_loss)

Expected behavior
First, the ctc_loss documentation should specify which device each input is expected to be on. Second, the behavior should be consistent regardless of the backend implementation used (e.g., copy to the expected device if necessary).
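
In the meantime, a workaround sketch (my own, not an official fix), continuing the script above: the cuDNN path is only selected for int32 targets, so the dtype and device of targets steer which backend runs.

# Workaround sketch, not an official fix; continues the script above.

# Option 1: satisfy the cuDNN path -- keep int32 targets on the CPU.
print(torch.nn.functional.ctc_loss(
    logprobs.to('cuda'), targets.cpu().int(), input_lengths, target_lengths))

# Option 2: force the native CUDA kernel -- the cuDNN path is only taken
# for int32 targets, so int64 targets always fall back to the native
# implementation, which accepts CUDA targets.
print(torch.nn.functional.ctc_loss(
    logprobs.to('cuda'), targets.to('cuda').long(), input_lengths, target_lengths))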

If the dev team does not like implicit copies, I suggest a function like torch.backends.cudnn.will_ctc_loss_run_on_cudnn(logprobs, targets, input_lengths, target_lengths, blank), so that the user can check whether or not cuDNN will be used for the given input (i.e., the same check as https://github.com/pytorch/pytorch/blob/1d705b4b075e32540293d1717b582442a66ffcce/aten/src/ATen/native/LossCTC.cpp#L344).
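
A minimal Python sketch of such a helper, mirroring the eligibility conditions in the linked LossCTC.cpp (the function name is the one proposed above and is hypothetical; the cuDNN availability/version test that ATen also performs is omitted):

import torch

def will_ctc_loss_run_on_cudnn(log_probs, targets, input_lengths,
                               target_lengths, blank=0):
    # Hypothetical helper; mirrors the dispatch conditions in
    # aten/src/ATen/native/LossCTC.cpp (PyTorch 1.1). The cuDNN
    # availability/version check done by ATen is omitted here.
    use_cudnn = (torch.backends.cudnn.enabled
                 and log_probs.is_cuda
                 and log_probs.dtype == torch.float32
                 and targets.dtype == torch.int32
                 and targets.dim() == 1
                 and blank == 0)
    if use_cudnn:
        # cuDNN also requires every sequence to span the full input length
        # and each target to contain at most 256 labels.
        use_cudnn = (all(int(l) == log_probs.size(0) for l in input_lengths)
                     and all(int(l) <= 256 for l in target_lengths))
    return use_cudnn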


Environment
Collecting environment information...
PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.6.1

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: GeForce GTX 780
GPU 1: GeForce GTX 1080

Nvidia driver version: 415.27
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] nnutils-pytorch==0.4.0
[pip3] numpy==1.16.4
[pip3] torch==1.1.0
[pip3] torch-baidu-ctc==0.3.0
[pip3] torchvision==0.3.0
