DreamerV3 (Hafner) Practice Notes

https://github.com/danijar/dreamerv3

Docker Environment Setup

1. Install Docker Desktop

https://www.docker.com/products/docker-desktop/

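2. Image and container storage location

Docker Desktop -> Settings -> Resources
Set "Disk image location" to a different drive so that images and containers do not pile up on the C: drive.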

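3. Build the image from the Dockerfile

docker build --progress=plain -f Dockerfile -t dreamerv3-hafner .
-f: path to the Dockerfile (defaults to Dockerfile, so it can be omitted here)
-t: name (tag) of the resulting image
.: the build context, i.e. the directory whose files are available to the build (here the current directory); it is not an installation path
--progress=plain: print the full build log, which makes errors easier to locate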

ERROR: /drivers/lab/python/pip_package/BUILD:6:1: name 'sh_binary' is not defined

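Cause
The DMLab install script (install-dmlab.sh) downloaded the latest Bazel release (9.1.1), but DMLab's BUILD files use the legacy sh_binary rule without loading it. Bzlmod has been enabled by default since Bazel 7, and recent Bazel releases no longer make built-in rules such as sh_binary implicitly available (they have to be loaded explicitly from rules_shell), so the build fails.
Solution
DMLab is only an optional environment for DreamerV3. Most tasks (Crafter, Atari, Minecraft, and so on) do not depend on it, so it can be skipped for now.
In the Dockerfile, comment out the line
RUN wget -O - https://gist.githubusercontent.com/danijar/ca6ab917188d2e081a8253b3ca5c36d3/raw/installdmlab.sh | sh
Alternative: if --configs dmlab is needed later, restore that line and switch to a community fork, or to the official repository github.com/google-deepmind/dmlab, which has been updated to support newer Bazel versions.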

ERROR: failed to build: failed to solve: ghcr.io/nvidia/driver:7c5f8932-550.144.03-ubuntu24.04: failed to resolve source metadata for ghcr.io/nvidia/driver:7c5f8932-550.144.03-ubuntu24.04: failed to do request: Head "https://ghcr.io/v2/nvidia/driver/manifests/7c5f8932-550.144.03-ubuntu24.04": EOF

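Cause
EOF means the connection was closed by the remote end. This is a network problem when accessing the GitHub Container Registry (ghcr.io) from mainland China.
Solution
Connect through a proxy and pull the base image manually (docker pull ghcr.io/nvidia/driver:7c5f8932-550.144.03-ubuntu24.04), then run the build again.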
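
Because this failure is a transient network error, the manual pull can be wrapped in a simple retry loop. The sketch below is only an illustration: pull_with_retry and its parameters are hypothetical names, the proxy is assumed to be configured separately, and the only external command invoked is the standard docker pull.

```python
import subprocess
import time

# Base image named in the error message above.
IMAGE = "ghcr.io/nvidia/driver:7c5f8932-550.144.03-ubuntu24.04"


def pull_with_retry(image, attempts=5, delay=10.0,
                    run=subprocess.run, sleep=time.sleep):
    """Run `docker pull image` until it succeeds or the attempts run out.

    `run` and `sleep` are injectable so the loop can be exercised without
    Docker. Returns True on success and False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        result = run(["docker", "pull", image])
        if result.returncode == 0:
            return True
        print(f"attempt {attempt}/{attempts} failed "
              f"(exit code {result.returncode})")
        if attempt < attempts:
            sleep(delay * attempt)  # wait a little longer after each failure
    return False


# Usage (requires Docker on the PATH):
# pull_with_retry(IMAGE)
```

Once the pull succeeds, the image is cached locally and the docker build from step 3 no longer needs to contact ghcr.io for it.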

4. Verify the Docker installation

Check that the image exists:
docker images dreamerv3-hafner
Check that the GPU is visible to JAX:
docker run -it --rm --gpus all dreamerv3-hafner python -c "import jax; print('devices:', jax.devices()); print('backend:', jax.default_backend())"
Smoke test:
docker run -it --rm --gpus all -v D:\logdir:/logdir dreamerv3-hafner python dreamerv3/main.py --logdir /logdir/smoketest --configs crafter debug --steps 500
Sample smoke-test output:

PIP freeze (subset):
nvidia-cublas-cu12==12.9.2.10
nvidia-cuda-cupti-cu12==12.9.79
nvidia-cuda-nvcc-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.9.86
nvidia-cuda-runtime-cu12==12.9.79
nvidia-cudnn-cu12==9.23.2.1
nvidia-cufft-cu12==11.4.1.4
nvidia-cusolver-cu12==11.7.5.82
nvidia-cusparse-cu12==12.5.10.65
nvidia-nccl-cu12==2.30.7
nvidia-nvjitlink-cu12==12.9.86
jax==0.4.33
jax-cuda12-pjrt==0.4.33
jax-cuda12-plugin==0.4.33
jaxlib==0.4.33
jaxtyping==0.3.11
ninjax==3.6.3
GCP instance:
Name:     NA
Hostname: NA
ID:       NA
Zone:     NA

GPUs:
name, memory.total [MiB], driver_version
NVIDIA GeForce RTX 4070, 12282 MiB, 591.86

---  ___                           __   ______ ---
--- |   \ _ _ ___ __ _ _ __  ___ _ \ \ / /__ / ---
--- | |) | '_/ -_) _` | '  \/ -_) '/\ V / |_ \ ---
--- |___/|_| \___\__,_|_|_|_\___|_|  \_/ |___/ ---
Replica: 0 / 1
Logdir: /logdir/smoketest
Run script: train
Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.
Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.
See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.
Observations
  image            Space(uint8, shape=(64, 64, 3), low=0, high=255)
  reward           Space(float32, shape=(), low=-inf, high=inf)
  is_first         Space(bool, shape=(), low=False, high=True)
  is_last          Space(bool, shape=(), low=False, high=True)
  is_terminal      Space(bool, shape=(), low=False, high=True)
Actions
  action           Space(int32, shape=(), low=0, high=17)
Extras
  consec           Space(int32, shape=(), low=-2147483648, high=2147483647)
  stepid           Space(uint8, shape=(20,), low=0, high=255)
  dyn/deter        Space(float32, shape=(8,), low=-inf, high=inf)
  dyn/stoch        Space(float32, shape=(2, 4), low=-inf, high=inf)
JAX devices (1): [TFRT_CPU_0]
Policy devices: TFRT_CPU_0
Train devices:  TFRT_CPU_0
Initializing parameters...
Optimizer opt has 11,735 params:
         6,339 dec
         3,752 enc
           816 dyn
           297 pol
           189 rew
           189 val
           153 con
Done initializing!
Compiling 1 checkpoint groups...
Largest checkpoint group: 0 GB
Compiling train and report...
Train cost analysis:
  FLOPS:            2.2e+09
  Memory (temp):    4.0e+07
  Memory (inputs):  1.2e+06
  Memory (outputs): 1.5e+05
  Memory (code):    1.1e+06

Report cost analysis:
  FLOPS:            3.3e+08
  Memory (temp):    5.2e+06
  Memory (inputs):  6.4e+05
  Memory (outputs): 2.4e+06
  Memory (code):    1.4e+06

Done compiling!
Did not find any checkpoint.
Saving checkpoint: /logdir/smoketest/ckpt/20260622T081504F872704
Saved checkpoint.
Start training loop

--------------------[Agent Step 360]--------------------
Metrics filtered by: 'score|length|fps|ratio|train/loss/|train/rand/'
train/loss/con 0.9 / train/loss/dyn 1 / train/loss/image 1678.8 / train/loss/policy -4e-5 / train/loss/rep 1 / train/loss/repval 3.22 / train/loss/rew 1.61 / train/loss/value 0.15 / train/rand/action 1 / replay/replay_ratio 7.15 / fps/policy 41.48 / fps/train 230.45

Writing metrics: /logdir/smoketest/metrics.jsonl
Writing metrics: /logdir/smoketest/scores.jsonl

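5. Full training run

docker run -it --gpus all -v D:\logdir:/logdir --name crafter1 dreamerv3-hafner python dreamerv3/main.py --logdir /logdir/crafter_run1 --configs crafter
If the run is interrupted, remove the old container before starting again (otherwise the name crafter1 is still in use):
docker rm -f crafter1
Crafter is a 2D survival and building game, a simplified take on Minecraft, observed as 64×64 pixel images. It is used here as the example task.
Expected output, indicating that training is running: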

Done compiling!
Did not find any checkpoint.
Saving checkpoint: /logdir/crafter_run1/ckpt/20260622T082242F452072
Saved checkpoint.
Start training loop
Start JAX profiler: /logdir/crafter_run1
2026-06-22 08:24:22.069994: E external/xla/xla/python/profiler/internal/python_hooks.cc:400] Can't import tensorflow.python.profiler.trace
Stop JAX profiler
2026-06-22 08:24:37.416122: E external/xla/xla/python/profiler/internal/python_hooks.cc:400] Can't import tensorflow.python.profiler.trace

--------------------[Agent Step 1_370]--------------------
Metrics filtered by: 'score|length|fps|ratio|train/loss/|train/rand/'
episode/score 2.1 / episode/length 168 / train/loss/con 0.16 / train/loss/dyn 13.75 / train/loss/image 694.56 / train/loss/policy 0.03 / train/loss/rep 13.75 / train/loss/repval 10.36 / train/loss/rew 5.18 / train/loss/value 6.85 / train/rand/action 1 / replay/replay_ratio 113.87 / fps/policy 10.1 / fps/train 1071.76

Writing metrics: /logdir/crafter_run1/metrics.jsonl
Writing metrics: /logdir/crafter_run1/scores.jsonl


--------------------[Agent Step 1_700]--------------------
Metrics filtered by: 'score|length|fps|ratio|train/loss/|train/rand/'
episode/score 0.1 / episode/length 203 / train/loss/con 0.05 / train/loss/dyn 8.08 / train/loss/image 175.28 / train/loss/policy 0.08 / train/loss/rep 8.08 / train/loss/repval 6.24 / train/loss/rew 2.7 / train/loss/value 5.08 / train/rand/action 1 / replay/replay_ratio 520 / fps/policy 2.73 / fps/train 1399.93
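
The log above shows that metrics are written to metrics.jsonl in the log directory. A minimal sketch for reading that file back is given below. It assumes that each line is a JSON object with a "step" field plus metric keys such as "episode/score"; read_metric is a hypothetical helper, not part of DreamerV3.

```python
import json
from pathlib import Path


def read_metric(path, key="episode/score"):
    """Return (step, value) pairs for every JSONL record that contains `key`.

    Assumes one JSON object per line with a "step" field; records without
    the requested key (or without a step) are skipped.
    """
    points = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if key in record and "step" in record:
                points.append((record["step"], record[key]))
    return points


# Usage (path as mounted on the Windows host in the commands above):
# for step, score in read_metric(r"D:\logdir\crafter_run1\metrics.jsonl"):
#     print(step, score)
```

Reading the file from the host works because the -v D:\logdir:/logdir mount makes the container's /logdir visible under D:\logdir.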

Running other environments

Atari Pong: --configs atari --task atari_pong
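DMC Cartpole: --configs dmc_vision --task dmc_cartpole_swingup
DMC Humanoid: --configs dmc_vision --task dmc_humanoid_walk
(DMC tasks are named dmc_<domain>_<task>; use dmc_proprio instead of dmc_vision for state inputs.)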
Procgen: --configs procgen --task procgen_coinrun
Minecraft Diamond: --configs minecraft --task minecraft_diamond
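
Since only the --configs and --task arguments change between environments, the docker run command from step 5 can be assembled programmatically. This is a minimal sketch: build_run_command is a hypothetical helper, and the image name, mount point, and flags are simply the ones used earlier in these notes.

```python
def build_run_command(name, configs, task=None, host_logdir=r"D:\logdir",
                      image="dreamerv3-hafner"):
    """Build the `docker run` argument list for one training run.

    `name` is used both as the container name and as the log subdirectory.
    """
    cmd = [
        "docker", "run", "-it", "--gpus", "all",
        "-v", f"{host_logdir}:/logdir",
        "--name", name,
        image,
        "python", "dreamerv3/main.py",
        "--logdir", f"/logdir/{name}",
        "--configs", configs,
    ]
    if task is not None:
        cmd += ["--task", task]
    return cmd


# Print the command for an Atari Pong run; the list can also be passed
# directly to subprocess.run().
print(" ".join(build_run_command("pong1", configs="atari", task="atari_pong")))
```

Keeping the arguments as a list avoids quoting problems if the log directory ever contains spaces.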

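Viewing training videos

During training, DreamerV3 calls agent.report() at an interval set by report_every (default 300) and produces a video summary named openloop/image, which contains:

┌──────────────────────────────────────────────┐
│ Ground truth (what the agent observes)       │ ← top
├──────────────────────────────────────────────┤
│ Decoder reconstruction (the model's view)    │ ← middle
├──────────────────────────────────────────────┤
│ Error map (where the prediction is wrong)    │ ← bottom
└──────────────────────────────────────────────┘
This video summary is written out by the logger and can be played directly in Scope.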

Installing Scope

https://github.com/danijar/scope
pip install -U scope

Error

Could not fetch URL https://pypi.org/simple/scope/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/scope/ (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:852)'),)) - skipping
ERROR: Could not find a version that satisfies the requirement scope
ERROR: No matching distribution found for scope
Cause: the SSL certificates in the conda environment are out of date.
Solution: update the certificates with conda install -c anaconda ca-certificates certifi openssl -y

Start Scope on port 8000

python -m scope.viewer --basedir D:/logdir --port 8000
http://localhost:8000/

Error: INFO: 127.0.0.1:3020 - "GET /api/exp/D%3A/logdir/crafter_run1 HTTP/1.1" 404 Not Found (and similar path-related errors)

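Cause: Scope was written for Linux, so Windows paths are not handled correctly, and the colon in the drive letter causes an encoding problem (it shows up as %3A in the request URL).
Solution: patch the path handling in the source code.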

  1. server: fix path splitting when backslashes and forward slashes are mixed. Change
     expids = [x.rsplit('/', 1)[-1] for x in folders]
     to
     expids = [x.replace('\\', '/').rsplit('/', 1)[-1] for x in folders]
  2. server: same issue. Change
     if any(x.endswith('/scope') for x in children):
     to
     if any(x.replace('\\', '/').endswith('/scope') for x in children):
  3. server: same issue. Change
     folders = [x.removeprefix(str(basedir))[1:] for x in folders]
     to
     folders = [x.replace('\\', '/').removeprefix(str(basedir).replace('\\', '/')) for x in folders]
     folders = [x[1:] if x.startswith('/') else x for x in folders]
  4. filesystems: on Windows the colon is reserved for drive letters and is illegal in file names, so use an underscore instead. Change
     name = str(path).replace(':', '').replace('//', '/').replace('/', ':')
     to
     name = str(path).replace(':', '').replace('//', '/').replace('/', '_')
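
The effect of these patches can be checked in isolation. The sketch below does not import Scope. It reimplements the patched expressions as small hypothetical helpers (to_posix, experiment_ids, relative_folders, cache_name) and applies them to Windows-style paths like the one in the failing request. str.removeprefix requires Python 3.9 or later.

```python
def to_posix(path):
    """Replace Windows backslashes with forward slashes."""
    return str(path).replace('\\', '/')


def experiment_ids(folders):
    # Patch 1: take the last path component regardless of separator style.
    return [to_posix(x).rsplit('/', 1)[-1] for x in folders]


def relative_folders(folders, basedir):
    # Patch 3: strip the base directory, then drop one leading slash if present.
    base = to_posix(basedir)
    stripped = [to_posix(x).removeprefix(base) for x in folders]
    return [x[1:] if x.startswith('/') else x for x in stripped]


def cache_name(path):
    # Patch 4: build a name that contains no colon, which Windows rejects.
    return str(path).replace(':', '').replace('//', '/').replace('/', '_')


folders = ['D:\\logdir\\crafter_run1', 'D:/logdir/smoketest']
print(experiment_ids(folders))                  # ['crafter_run1', 'smoketest']
print(relative_folders(folders, 'D:\\logdir'))  # ['crafter_run1', 'smoketest']
print(cache_name('D:/logdir/crafter_run1'))     # D_logdir_crafter_run1
```

Note that the backslash has to be written as '\\' inside a Python string literal; a bare '\' is a syntax error.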

Alternative: install Ubuntu as a second operating system (dual boot) and run everything directly on Linux.

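The Crafter game

Crafter is a 2D survival game designed as a benchmark for a broad range of reinforcement learning capabilities. The goal is to unlock as many achievements as possible in a procedurally generated open pixel world (22 achievements, arranged in a tree of resource and tool dependencies). It mainly tests visual perception, long-term planning, map exploration, and multi-task goals.

Design goals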

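  1. Procedural generation: every episode uses a randomly generated map, which prevents the model from memorizing the map
  2. Exploration: rewards are sparse, which forces the model to explore actively
  3. Long-term credit assignment: resources and tools form a dependency tree, so rewards can be hundreds of steps apart
  4. Multi-task: 22 achievements (task goals); the more achievements unlocked in an episode, the higher the score
  5. Sparse vs dense reward: two modes are provided, reward (with achievement rewards) and no-reward (without achievement rewards)
  6. Fast iteration: 64×64 pixel observations, so it runs on a single machine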
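
For reporting results over many episodes, the Crafter paper aggregates the per-achievement success rates into a single score using a geometric mean, S = exp((1/N) Σ ln(1 + s_i)) − 1, where s_i is the success rate of achievement i in percent. The sketch below computes that formula; crafter_score is a hypothetical helper, not part of the crafter package, and this aggregate is a different quantity from the episode/score value in the training log.

```python
import math


def crafter_score(success_rates):
    """Geometric-mean style score over achievement success rates.

    `success_rates` holds one value per achievement, each in percent (0-100).
    """
    if not success_rates:
        raise ValueError("need at least one success rate")
    mean_log = sum(math.log(1.0 + s) for s in success_rates) / len(success_rates)
    return math.exp(mean_log) - 1.0


# Illustrative numbers only: 22 achievements, a few unlocked often, most never.
rates = [90.0, 80.0, 50.0, 10.0] + [0.0] * 18
print(round(crafter_score(rates), 2))
```

Because the mean is taken in log space, unlocking a rare achievement occasionally raises the score more than pushing an already common achievement a little higher, which matches the multi-task goal in item 4.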

Game definition

posted @ 2026-06-22 23:23  donemeb