CUDA busy (RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable)

What I did

On Linux, running python app.py fails with an error.

 

Error message

RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


ERROR:    Application startup failed. Exiting.

 

Meaning

Runtime error: CUDA reports that the CUDA-capable device(s) are busy or unavailable.

 

Diagnosis

Run the command:

nvidia-smi

Output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
......
......
......
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   3920229      C   python                           6146MiB |
+-----------------------------------------------------------------------------+

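This shows:

  A Python process (PID 3920229) is occupying GPU 0 and using 6146MiB of GPU memory (nearly all of the 6148MiB in use on that GPU), so other programs cannot use GPU 0.

Note:

  Running python app.py uses GPU 0 by default.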
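The same check can be scripted instead of reading the nvidia-smi table by eye. The following is a minimal sketch, assuming nvidia-smi is on the PATH and that its `--query-compute-apps` CSV output has the usual `pid, process_name, used_memory` shape; the helper names `parse_compute_apps` and `list_compute_apps` are made up for this illustration, and the sketch does not map processes back to a GPU index.

```python
import subprocess

# Ask nvidia-smi for compute processes as plain CSV (memory in MiB, no units).
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows into a list of dicts."""
    apps = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split off the first and last fields so a comma inside the
        # process name does not break the parsing.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        used = used.strip()
        apps.append({
            "pid": int(pid.strip()),
            "name": name.strip(),
            # Some environments report "[N/A]" instead of a number.
            "used_mib": int(used) if used.isdigit() else None,
        })
    return apps


def list_compute_apps():
    """Run nvidia-smi and return the processes currently using a GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

On a machine in the state shown above, `list_compute_apps()` should return one entry for the python process with PID 3920229.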

 

Solution

Start app.py on a different GPU, for example GPU 1, with the following command:

CUDA_VISIBLE_DEVICES=1 python app.py
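If the free GPU is not known in advance, the choice can be made from inside Python before any CUDA library is loaded. This is only a sketch under assumptions: it picks the GPU with the least memory in use, which is a heuristic of mine and not something the original fix relies on, and the helper names `pick_least_used_gpu` and `select_gpu` are hypothetical. It also sets `CUDA_DEVICE_ORDER=PCI_BUS_ID` so that CUDA numbers the devices the same way nvidia-smi does.

```python
import os
import subprocess

# Per-GPU index and used memory in MiB, as plain CSV.
GPU_QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used",
    "--format=csv,noheader,nounits",
]


def pick_least_used_gpu(csv_text):
    """Return the index of the GPU with the lowest used memory, or None."""
    best = None
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used = [field.strip() for field in line.split(",")]
        candidate = (int(used), int(index))
        if best is None or candidate < best:
            best = candidate
    return None if best is None else best[1]


def select_gpu():
    """Restrict this process to the least-used GPU; call before importing torch."""
    result = subprocess.run(GPU_QUERY, capture_output=True, text=True, check=True)
    index = pick_least_used_gpu(result.stdout)
    if index is not None:
        # Make CUDA enumerate devices in the same order as nvidia-smi.
        os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
        # Only takes effect if set before CUDA is initialized in this process.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return index
```

Calling `select_gpu()` at the very top of app.py, before `import torch`, would have the same effect as the command-line prefix. Either way, the selected card is the only one the process can see, so inside the program it appears as device 0 (`cuda:0`), not device 1.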

 

posted @ 2025-09-10 15:40  Alan_LJP