YOLOv8 training: fixing "OSError: [WinError 1455] The paging file is too small for this operation to complete" when running the training command

Problem description:

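While training a YOLOv8 model, I ran the following in the command prompt: yolo detect train data=E:\yolo_train_new\save\my.yaml model=E:\yolo_train_new\save\yolov8n.yaml batch=-1 epochs=3000 imgsz=800 workers=16 device=0 patience=50
After running for a while, the command failed with the following error: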

AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\ultralytics\nn\tasks.py:634: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(file, map_location="cpu"), file  # load
E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\ultralytics\utils\checks.py:638: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(True):
AMP: checks passed ✅
E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\ultralytics\engine\trainer.py:271: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  self.scaler = torch.cuda.amp.GradScaler(enabled=self.amp)
E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\ultralytics\utils\autobatch.py:26: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(amp):
AutoBatch: Computing optimal batch size for imgsz=800
AutoBatch: CUDA:0 (NVIDIA GeForce RTX 4090 Laptop GPU) 15.99G total, 0.19G reserved, 0.06G allocated, 15.74G free
      Params      GFLOPs  GPU_mem (GB)  forward (ms) backward (ms)                   input                  output
     2983336       19.77         0.369          32.9         110.9        (1, 3, 800, 800)                    list
     2983336       39.54         0.659          23.3            26        (2, 3, 800, 800)                    list
     2983336       79.07         1.242         19.33         27.01        (4, 3, 800, 800)                    list
     2983336       158.1         2.403         28.61         33.79        (8, 3, 800, 800)                    list
     2983336       316.3         4.903         51.42         53.69       (16, 3, 800, 800)                    list
AutoBatch: Using batch-size 31 for CUDA:0 9.66G/15.99G (60%) ✅
train: Scanning E:\yolo_train_new\save\labels... 792 images, 0 backgrounds, 0 corrupt: 100%|██████████| 792/792 [00:00<
train: New cache created: E:\yolo_train_new\save\labels.cache
val: Scanning E:\yolo_train_new\save\labels.cache... 792 images, 0 backgrounds, 0 corrupt: 100%|██████████| 792/792 [00
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\multiprocessing\spawn.py", line 120, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\multiprocessing\spawn.py", line 130, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\torch\__init__.py", line 262, in <module>
    _load_dll_libraries()
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\torch\__init__.py", line 245, in _load_dll_libraries
    raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\torch\lib\cublas64_12.dll" or one of its dependencies.

Approaches tried:

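(1) The error points directly at an insufficient paging file, so the first fix is to increase the computer's paging file size manually. The traceback also shows where it fails: a child process started by multiprocessing (spawn_main) dies while importing torch and loading cublas64_12.dll. On Windows each DataLoader worker is a separate process that loads the CUDA libraries again, so a large workers value multiplies the memory Windows has to commit, and that commit limit is set by RAM plus the paging file.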

Procedure:

Step 1: Check how much RAM you have

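  1. Check the RAM size
    • Click the Windows "Start" button in the lower-left corner and search for and open "System Information", or open "System" in "Settings".
    • On the "System" page, find "About" or "Device specifications" and note the "Installed RAM" value (usually given in GB, e.g. 8 GB or 16 GB).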
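If you would rather read the RAM size from the same Python environment used for training, here is a small standard-library sketch. On Windows it calls the Win32 function GetPhysicallyInstalledSystemMemory through ctypes (it reports installed RAM in KB, the figure Windows shows as "Installed RAM"); on other platforms it falls back to os.sysconf, which gives usable physical memory instead. The function name installed_ram_mb is just a name I made up for this example.

```python
import ctypes
import os
import sys


def installed_ram_mb() -> int:
    """Return RAM in MB: installed RAM on Windows, usable physical memory elsewhere."""
    if sys.platform == "win32":
        kb = ctypes.c_ulonglong(0)
        # GetPhysicallyInstalledSystemMemory writes the installed amount in kilobytes
        if not ctypes.windll.kernel32.GetPhysicallyInstalledSystemMemory(ctypes.byref(kb)):
            raise ctypes.WinError()
        return kb.value // 1024
    # POSIX fallback: page size times the number of physical pages
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") // (1024 * 1024)


if __name__ == "__main__":
    ram_mb = installed_ram_mb()
    print(f"RAM: {ram_mb} MB (~{ram_mb / 1024:.1f} GB)")
```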

Step 2: Increase the paging file size

Open the paging file settings

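  1. Right-click "This PC" (or "My Computer") and choose "Properties".
  2. Click "Advanced system settings" on the left.
  3. In the System Properties window, switch to the "Advanced" tab.
  4. In the "Performance" section, click the "Settings" button.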

Set the paging file size

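  1. In the "Performance Options" window, switch to the "Advanced" tab.

  2. In the "Virtual memory" section, click the "Change" button.

  3. Uncheck "Automatically manage paging file size for all drives".

  4. Select your system drive (usually C:), choose "Custom size", and enter the following values:

    • Initial size: about 1× your RAM. For example, with 8 GB of RAM you can set 8192 MB (1 GB = 1024 MB).

    • Maximum size: about 2× your RAM. For example, with 8 GB of RAM you can set 16384 MB.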

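    For example:

    • With 8 GB of RAM:
      • Initial size: 8192 MB
      • Maximum size: 16384 MB
    • With 16 GB of RAM:
      • Initial size: 16384 MB
      • Maximum size: 32768 MB
  5. Click the "Set" button, then "OK".

  6. Close all windows; you may be prompted to restart the computer for the settings to take effect.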
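The 1×/2× rule of thumb above is plain arithmetic. This sketch turns an installed-RAM figure in GB into the two values to type into the dialog; recommended_pagefile_mb is a hypothetical helper, and the multipliers are only the suggestion from this post, not an official Microsoft rule.

```python
def recommended_pagefile_mb(ram_gb: int) -> tuple[int, int]:
    """Initial size = 1x RAM, maximum size = 2x RAM, both in MB (1 GB = 1024 MB)."""
    initial_mb = ram_gb * 1024
    maximum_mb = 2 * initial_mb
    return initial_mb, maximum_mb


for ram in (8, 16):
    initial, maximum = recommended_pagefile_mb(ram)
    print(f"{ram} GB RAM -> initial {initial} MB, maximum {maximum} MB")
# 8 GB RAM -> initial 8192 MB, maximum 16384 MB
# 16 GB RAM -> initial 16384 MB, maximum 32768 MB
```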

Step 3: Restart the computer

Be sure to restart the computer so that the new paging file settings take effect.
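After the restart, one way to confirm the change took effect is to look at the system commit limit, which Windows derives from RAM plus the paging file (Task Manager > Performance > Memory shows it as the second number under "Committed"). A minimal sketch, assuming Windows: it calls GlobalMemoryStatusEx through ctypes, where ullTotalPageFile is documented as the current commit limit (for the system or the process, whichever is smaller). The helper name is my own, and on other platforms it simply returns None.

```python
import ctypes
import sys

MB = 1024 * 1024


class MEMORYSTATUSEX(ctypes.Structure):
    # Field layout of the Win32 MEMORYSTATUSEX structure
    _fields_ = [
        ("dwLength", ctypes.c_ulong),
        ("dwMemoryLoad", ctypes.c_ulong),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),  # current commit limit
        ("ullAvailPageFile", ctypes.c_ulonglong),  # commit still available
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]


def ram_and_commit_limit_mb():
    """Return (physical RAM MB, commit limit MB) on Windows, or None on other platforms."""
    if sys.platform != "win32":
        return None
    stat = MEMORYSTATUSEX()
    stat.dwLength = ctypes.sizeof(MEMORYSTATUSEX)  # must be set before the call
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat)):
        raise ctypes.WinError()
    return stat.ullTotalPhys // MB, stat.ullTotalPageFile // MB


if __name__ == "__main__":
    result = ram_and_commit_limit_mb()
    if result is None:
        print("Not on Windows; nothing to check.")
    else:
        ram_mb, limit_mb = result
        # Commit limit minus RAM roughly approximates the current paging file size
        print(f"RAM: {ram_mb} MB, commit limit: {limit_mb} MB, "
              f"paging file ~ {limit_mb - ram_mb} MB")
```

If the paging file change worked, the commit limit should be noticeably larger than before the restart.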

(2) Change the parameters of the original training command

In the command yolo detect train data=E:\yolo_train_new\save\my.yaml model=E:\yolo_train_new\save\yolov8n.yaml batch=-1 epochs=3000 imgsz=800 workers=16 device=0 patience=50, change batch=-1 (automatic batch size):

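AutoBatch picked a batch size of 31, which may use a lot of GPU memory, so you can try a smaller value such as 16 or less. (In hindsight this step is unnecessary: letting the system choose the batch size from the available GPU memory works fine, and it usually targets about 60% of it.)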
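With batch=-1, Ultralytics' AutoBatch profiles a few small batch sizes and extrapolates to the one that should use about 60% of GPU memory. The sketch below reproduces that arithmetic with a straight-line least-squares fit, fed with the numbers from the AutoBatch log above. To my understanding this matches what ultralytics/utils/autobatch.py does, but treat it as a reconstruction; fit_line and estimate_batch are names made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_batch(batch_sizes, mem_gb, total_gb, reserved_gb, allocated_gb, fraction=0.60):
    """Pick the batch size whose predicted memory use fills `fraction` of the free GPU memory."""
    free_gb = total_gb - (reserved_gb + allocated_gb)
    slope, intercept = fit_line(batch_sizes, mem_gb)
    batch = int((free_gb * fraction - intercept) / slope)
    predicted_gb = intercept + slope * batch + reserved_gb + allocated_gb
    return batch, predicted_gb


# Numbers copied from the AutoBatch log (RTX 4090 Laptop GPU, imgsz=800)
batch, used = estimate_batch([1, 2, 4, 8, 16], [0.369, 0.659, 1.242, 2.403, 4.903],
                             total_gb=15.99, reserved_gb=0.19, allocated_gb=0.06)
print(f"batch={batch}, predicted {used:.2f}G/15.99G ({used / 15.99:.0%})")
# batch=31, predicted 9.66G/15.99G (60%)
```

The result lines up with the log's "Using batch-size 31 for CUDA:0 9.66G/15.99G (60%)", which is why leaving batch=-1 is usually safe as far as GPU memory is concerned.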

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

After making the changes above, I ran the command again and hit a new problem:
AMP: checks passed ✅
E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\ultralytics\engine\trainer.py:271: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  self.scaler = torch.cuda.amp.GradScaler(enabled=self.amp)
train: Scanning E:\yolo_train_new\save\labels... 792 images, 0 backgrounds, 0 corrupt: 100%|██████
train: New cache created: E:\yolo_train_new\save\labels.cache
val: Scanning E:\yolo_train_new\save\labels.cache... 792 images, 0 backgrounds, 0 corrupt: 100%|██
Plotting labels to runs\detect\train\labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: SGD(lr=0.01, momentum=0.9) with parameter groups 70 weight(decay=0.0), 79 weight(decay=0.0005), 78 bias(decay=0.0)
3000 epochs...
      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     1/3000      5.49G       3.83      8.042      4.294         32        800:  64%|██████▍   | 32
Traceback (most recent call last):
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\torch\utils\data\dataloader.py", line 1243, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\queue.py", line 179, in get
    raise Empty
_queue.Empty

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "E:\yolov8study\anaconda\envs\yolov8\Scripts\yolo.exe\__main__.py", line 7, in <module>
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\ultralytics\cfg\__init__.py", line 567, in entrypoint
    getattr(model, mode)(**overrides)  # default args from model
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\ultralytics\engine\model.py", line 390, in train
    self.trainer.train()
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\ultralytics\engine\trainer.py", line 208, in train
    self._do_train(world_size)
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\ultralytics\engine\trainer.py", line 361, in _do_train
    for i, batch in pbar:
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\tqdm\std.py", line 1181, in __iter__
    for obj in iterable:
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\ultralytics\data\build.py", line 50, in __iter__
    yield next(self.iterator)
          ^^^^^^^^^^^^^^^^^^^
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\torch\utils\data\dataloader.py", line 701, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\torch\utils\data\dataloader.py", line 1448, in _next_data
    idx, data = self._get_data()
                ^^^^^^^^^^^^^^^^
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\torch\utils\data\dataloader.py", line 1402, in _get_data
    success, data = self._try_get_data()
                    ^^^^^^^^^^^^^^^^^^^^
  File "E:\yolov8study\anaconda\envs\yolov8\Lib\site-packages\torch\utils\data\dataloader.py", line 1256, in _try_get_data
    raise RuntimeError(
RuntimeError: DataLoader worker (pid(s) 8232) exited unexpectedly
This time the error is RuntimeError: DataLoader worker (pid(s) 8232) exited unexpectedly, i.e. a data-loading problem.

Possible causes and solutions

  1. Insufficient memory

    • Even after enlarging the paging file, data loading can still run out of memory. Try reducing the workers value, e.g. to 8 or lower.

That is, change the original command: yolo detect train data=E:\yolo_train_new\save\my.yaml model=E:\yolo_train_new\save\yolov8n.yaml batch=-1 epochs=3000 imgsz=800 workers=16 device=0 patience=50

to: yolo detect train data=E:\yolo_train_new\save\my.yaml model=E:\yolo_train_new\save\yolov8n.yaml batch=-1 epochs=3000 imgsz=800 workers=8 device=0 patience=50
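The reason fewer workers helps is that every worker is a separate process with its own memory footprint. One rough way to reason about the trade-off is to cap workers both by CPU threads and by how much memory you can spare. pick_workers below is a hypothetical heuristic, not something Ultralytics or PyTorch does; per_worker_mb and budget_mb are figures you would have to estimate yourself (for example by watching Task Manager during training), and the numbers in the usage line are purely illustrative.

```python
import os
from typing import Optional


def pick_workers(requested: int, per_worker_mb: int, budget_mb: int,
                 cpu_count: Optional[int] = None) -> int:
    """Hypothetical heuristic: cap DataLoader workers by CPU threads and a memory budget."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    by_memory = budget_mb // per_worker_mb  # how many workers fit into the spare memory
    return max(0, min(requested, cpus, by_memory))


# Illustrative numbers only: 32 threads, ~2000 MB per worker, ~10000 MB to spare
print(pick_workers(requested=16, per_worker_mb=2000, budget_mb=10000, cpu_count=32))  # 5
```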

In summary, if the training command fails with these errors, there are two main fixes:
(1) Increase the paging file size

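This means manually changing the local computer's paging file size through the series of dialogs described above, replacing the operating system's automatic allocation with a custom size.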

(2) Adjust the training command

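(1) If workers is set too high, memory runs out and training aborts with an error; if it is set too low, training takes longer. My machine has 32 threads: workers=16 still ran out of memory, while workers=8 ran without errors.

(2) For batch, I still recommend -1, so the system sets the batch size from the GPU's memory (it generally targets about 60% of it). A smaller value also works, but training will take longer.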

Final command: yolo detect train data=E:\yolo_train_new\save\my.yaml model=E:\yolo_train_new\save\yolov8n.yaml batch=-1 epochs=3000 imgsz=800 workers=8 device=0 patience=50
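If you drive training from a Python script instead of the yolo CLI, the final command maps onto YOLO(...).train(...) with the same arguments. This is a sketch, assuming the ultralytics package is installed and the paths from this post exist. The `if __name__ == "__main__":` guard matters on Windows: DataLoader workers are started with the spawn method and re-import the main script, so unguarded training code would start again inside every worker. The names TRAIN_ARGS and MODEL_CFG and the file-existence check are my own additions.

```python
import os

MODEL_CFG = r"E:\yolo_train_new\save\yolov8n.yaml"

# Same arguments as the final CLI command
TRAIN_ARGS = dict(
    data=r"E:\yolo_train_new\save\my.yaml",
    batch=-1,      # let AutoBatch choose the batch size
    epochs=3000,
    imgsz=800,
    workers=8,     # 16 ran out of memory on the author's 32-thread machine
    device=0,
    patience=50,
)


def main() -> None:
    # Fail early with a readable message if the dataset or model config is missing
    if not (os.path.isfile(MODEL_CFG) and os.path.isfile(TRAIN_ARGS["data"])):
        print("Dataset or model config not found; adjust the paths first.")
        return
    from ultralytics import YOLO  # imported lazily so the check above runs first

    model = YOLO(MODEL_CFG)
    model.train(**TRAIN_ARGS)


if __name__ == "__main__":
    # Required on Windows: spawned DataLoader workers re-import this file
    main()
```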

Note: for imgsz, a square size such as 800, 640, or 320 is generally a good choice.
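A likely reason these sizes work well is that each is a multiple of 32, the largest stride in YOLOv8's detection head (strides 8, 16, 32). If you pass another value, Ultralytics adjusts it to a multiple of the stride (I believe by rounding up and printing a warning). This small sketch shows that rounding; round_imgsz is a hypothetical helper name.

```python
import math


def round_imgsz(imgsz: int, stride: int = 32) -> int:
    """Round an image size up to the nearest multiple of the model's maximum stride."""
    return math.ceil(imgsz / stride) * stride


for size in (800, 640, 320, 700):
    print(f"{size} -> {round_imgsz(size)}")
```

800, 640 and 320 come back unchanged, while 700 becomes 704.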

posted @ 2025-02-07 11:19  上清风