CUDA busy (RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable)
Action
On Linux, running python app.py fails with an error.
Error message
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR: Application startup failed. Exiting.
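In plain terms
Runtime error: the CUDA-capable device(s) are currently busy or unavailable, so the program could not use the GPU.
Diagnosis
Run the command:
nvidia-smi
Output: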
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
......
......
......
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   3920229      C   python                           6146MiB |
+-----------------------------------------------------------------------------+
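What this shows:
A Python process (PID 3920229) is occupying GPU 0 and holding 6146MiB of GPU memory, which accounts for almost all of the 6148MiB in use on that GPU. As a result, other programs cannot use GPU 0.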
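The same check can be scripted rather than read off the table by eye. The sketch below is a minimal example, not part of the original workflow: it assumes nvidia-smi is on the PATH and uses its CSV query mode (--query-compute-apps) to list compute processes. The helper names read_compute_apps and parse_compute_apps are made up for illustration, and the sample line simply restates the process row from the output above.

```python
import csv
import io
import subprocess

# CSV query mode of nvidia-smi: one line per compute process, memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def read_compute_apps():
    """Run nvidia-smi and return its raw CSV text (needs an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return result.stdout


def parse_compute_apps(csv_text):
    """Turn 'pid, name, used_mib' lines into a list of dicts."""
    apps = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        pid, name, used = (field.strip() for field in row)
        if not pid.isdigit():
            continue
        # used_memory can be reported as not available; keep None in that case.
        used_mib = int(used) if used.isdigit() else None
        apps.append({"pid": int(pid), "name": name, "used_mib": used_mib})
    return apps


# Sample text mirroring the process row shown above, so the parser can be
# tried on a machine without a GPU.
sample = "3920229, python, 6146\n"
apps = parse_compute_apps(sample)
for app in apps:
    print(f"PID {app['pid']} ({app['name']}) uses {app['used_mib']} MiB")
```

On a machine with a GPU, parse_compute_apps(read_compute_apps()) would return the live list. Note that this query does not include the GPU index, so it only tells you which processes hold GPU memory, not which card each one is on.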
Note:
Running python app.py uses GPU 0 by default.
Solution
Start app.py on a different GPU, for example GPU 1, with the following command:
CUDA_VISIBLE_DEVICES=1 python app.py
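If hard-coding the GPU number is inconvenient, the choice can be made in Python before any CUDA library is loaded. The following is only a sketch under stated assumptions: it parses text in the format produced by nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits, picks the GPU with the least memory in use, and sets CUDA_VISIBLE_DEVICES. The function name pick_least_used_gpu and the sample values (in particular the 3 MiB for GPU 1) are hypothetical. Low memory use is also only a heuristic; it does not guarantee that the GPU is actually free to use.

```python
import os


def pick_least_used_gpu(csv_text):
    """Return the index of the GPU with the least memory in use, or None.

    Expects lines of the form 'index, memory_used_mib'.
    """
    best_index, best_used = None, None
    for line in csv_text.splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 2 or not all(part.isdigit() for part in parts):
            continue  # skip blank or malformed lines
        index, used = int(parts[0]), int(parts[1])
        if best_used is None or used < best_used:
            best_index, best_used = index, used
    return best_index


# Hypothetical sample: GPU 0 is nearly full, GPU 1 is almost idle.
sample = "0, 6148\n1, 3\n"
chosen = pick_least_used_gpu(sample)

if chosen is not None:
    # Must be set before CUDA is initialized (for example before importing
    # and using torch), otherwise it has no effect on this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(chosen)
    print(f"Using GPU {chosen}")
```

Once CUDA_VISIBLE_DEVICES is set this way, the selected card is the only one the process sees, so inside the program it appears as device 0.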
