DreamerV3 (Hafner) practice notes
https://github.com/danijar/dreamerv3
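Docker environment setup
1. Install Docker Desktop
https://www.docker.com/products/docker-desktop/
2. Image and container storage location
Docker Desktop -> Settings -> Resources
Set "Disk image location" to a drive other than C: so that images do not pile up on the system drive.
3. Build the environment from the Dockerfile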
docker build --progress=plain -f Dockerfile -t dreamerv3-hafner .
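-f  path to the Dockerfile (defaults to Dockerfile, so it can be omitted)
-t  name (tag) of the resulting image
.   build context, i.e. the directory whose files are available to the build (here, the repository root)
--progress=plain  print the full build log, which makes errors easier to locate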
ERROR: /drivers/lab/python/pip_package/BUILD:6:1: name 'sh_binary' is not defined
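Cause
The DMLab install script (install-dmlab.sh) downloaded the latest Bazel, 9.1.1, but DMLab's BUILD files use the legacy sh_binary rule. From Bazel 8 onward Bzlmod is on by default and built-in rules such as sh_binary are being moved out of Bazel itself (they have to be loaded explicitly, e.g. from rules_shell), so the build fails under the new Bazel.
Solution
DMLab is only one optional environment for DreamerV3. Most tasks (Crafter, Atari, Minecraft, and so on) do not depend on it, so it can be skipped for now.
In the Dockerfile, comment out: RUN wget -O - https://gist.githubusercontent.com/danijar/ca6ab917188d2e081a8253b3ca5c36d3/raw/installdmlab.sh | sh
Alternative: if --configs dmlab is needed later, restore that line and build DMLab from a source that supports the new Bazel: either a community fork, or github.com/google-deepmind/dmlab, where the official repository has reportedly been updated.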
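The manual edit above can also be scripted. The following is a minimal sketch, not part of the DreamerV3 repository: the helper name comment_out_matching is hypothetical, and it assumes the DMLab install command sits on a single line of the Dockerfile (a RUN instruction continued with backslashes would need extra handling).

```python
def comment_out_matching(dockerfile_text: str, needle: str) -> str:
    """Prefix every not-yet-commented line that contains `needle` with '# '."""
    out = []
    for line in dockerfile_text.splitlines():
        # Skip lines that are already comments so the edit is idempotent.
        if needle in line and not line.lstrip().startswith("#"):
            line = "# " + line
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical usage: read the Dockerfile, patch it, write it back.
# with open("Dockerfile", encoding="utf-8") as f:
#     patched = comment_out_matching(f.read(), "installdmlab.sh")
# with open("Dockerfile", "w", encoding="utf-8") as f:
#     f.write(patched)
```

Running it a second time leaves the file unchanged, which makes it safe to keep in a setup script.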
ERROR: failed to build: failed to solve: ghcr.io/nvidia/driver:7c5f8932-550.144.03-ubuntu24.04: failed to resolve source metadata for ghcr.io/nvidia/driver:7c5f8932-550.144.03-ubuntu24.04: failed to do request: Head "https://ghcr.io/v2/nvidia/driver/manifests/7c5f8932-550.144.03-ubuntu24.04": EOF
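Cause
EOF means the connection was closed by the remote end before a response arrived. This is a network problem when reaching GitHub Container Registry (ghcr.io) from mainland China.
Solution
Use a proxy and pull the image manually.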
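Because the failure is intermittent, the manual pull may need several attempts. A minimal sketch of a retry wrapper around docker pull follows; the function name pull_with_retries and its parameters are hypothetical, and it assumes the proxy is already configured for Docker.

```python
import subprocess
import time


def pull_with_retries(image, attempts=3, delay=5.0,
                      run=subprocess.run, sleep=time.sleep):
    """Run `docker pull <image>` up to `attempts` times; return True on success."""
    for attempt in range(1, attempts + 1):
        result = run(["docker", "pull", image])
        if result.returncode == 0:
            return True
        # Wait before the next try, but not after the final failure.
        if attempt < attempts:
            sleep(delay)
    return False


# Hypothetical usage with the image from the error message:
# pull_with_retries("ghcr.io/nvidia/driver:7c5f8932-550.144.03-ubuntu24.04")
```

The run and sleep parameters exist only so the function can be exercised without Docker installed.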
4. Verify the Docker installation
Confirm that the image exists:
docker images dreamerv3-hafner
Confirm that the GPU is visible:
docker run -it --rm --gpus all dreamerv3-hafner python -c "import jax; print('devices:', jax.devices()); print('backend:', jax.default_backend())"
Smoke test:
docker run -it --rm --gpus all -v D:\logdir:/logdir dreamerv3-hafner python dreamerv3/main.py --logdir /logdir/smoketest --configs crafter debug --steps 500
Sample smoke test output
PIP freeze (subset):
nvidia-cublas-cu12==12.9.2.10
nvidia-cuda-cupti-cu12==12.9.79
nvidia-cuda-nvcc-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.9.86
nvidia-cuda-runtime-cu12==12.9.79
nvidia-cudnn-cu12==9.23.2.1
nvidia-cufft-cu12==11.4.1.4
nvidia-cusolver-cu12==11.7.5.82
nvidia-cusparse-cu12==12.5.10.65
nvidia-nccl-cu12==2.30.7
nvidia-nvjitlink-cu12==12.9.86
jax==0.4.33
jax-cuda12-pjrt==0.4.33
jax-cuda12-plugin==0.4.33
jaxlib==0.4.33
jaxtyping==0.3.11
ninjax==3.6.3
GCP instance:
Name: NA
Hostname: NA
ID: NA
Zone: NA
GPUs:
name, memory.total [MiB], driver_version
NVIDIA GeForce RTX 4070, 12282 MiB, 591.86
--- ___ __ ______ ---
--- | \ _ _ ___ __ _ _ __ ___ _ \ \ / /__ / ---
--- | |) | '_/ -_) _` | ' \/ -_) '/\ V / |_ \ ---
--- |___/|_| \___\__,_|_|_|_\___|_| \_/ |___/ ---
Replica: 0 / 1
Logdir: /logdir/smoketest
Run script: train
Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.
Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.
See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.
Observations
image Space(uint8, shape=(64, 64, 3), low=0, high=255)
reward Space(float32, shape=(), low=-inf, high=inf)
is_first Space(bool, shape=(), low=False, high=True)
is_last Space(bool, shape=(), low=False, high=True)
is_terminal Space(bool, shape=(), low=False, high=True)
Actions
action Space(int32, shape=(), low=0, high=17)
Extras
consec Space(int32, shape=(), low=-2147483648, high=2147483647)
stepid Space(uint8, shape=(20,), low=0, high=255)
dyn/deter Space(float32, shape=(8,), low=-inf, high=inf)
dyn/stoch Space(float32, shape=(2, 4), low=-inf, high=inf)
JAX devices (1): [TFRT_CPU_0]
Policy devices: TFRT_CPU_0
Train devices: TFRT_CPU_0
Initializing parameters...
Optimizer opt has 11,735 params:
6,339 dec
3,752 enc
816 dyn
297 pol
189 rew
189 val
153 con
Done initializing!
Compiling 1 checkpoint groups...
Largest checkpoint group: 0 GB
Compiling train and report...
Train cost analysis:
FLOPS: 2.2e+09
Memory (temp): 4.0e+07
Memory (inputs): 1.2e+06
Memory (outputs): 1.5e+05
Memory (code): 1.1e+06
Report cost analysis:
FLOPS: 3.3e+08
Memory (temp): 5.2e+06
Memory (inputs): 6.4e+05
Memory (outputs): 2.4e+06
Memory (code): 1.4e+06
Done compiling!
Did not find any checkpoint.
Saving checkpoint: /logdir/smoketest/ckpt/20260622T081504F872704
Saved checkpoint.
Start training loop
--------------------[Agent Step 360]--------------------
Metrics filtered by: 'score|length|fps|ratio|train/loss/|train/rand/'
train/loss/con 0.9 / train/loss/dyn 1 / train/loss/image 1678.8 / train/loss/policy -4e-5 / train/loss/rep 1 / train/loss/repval 3.22 / train/loss/rew 1.61 / train/loss/value 0.15 / train/rand/action 1 / replay/replay_ratio 7.15 / fps/policy 41.48 / fps/train 230.45
Writing metrics: /logdir/smoketest/metrics.jsonl
Writing metrics: /logdir/smoketest/scores.jsonl
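The metrics line in the output above packs many values into one string of the form "name value / name value". To compare runs it helps to turn that line into a dictionary. This is a small sketch based only on the format visible in the log; parse_metrics_line is a hypothetical helper, not something shipped with DreamerV3.

```python
def parse_metrics_line(line: str) -> dict:
    """Parse 'name value / name value / ...' into {name: float(value)}."""
    metrics = {}
    # Items are separated by ' / ' (with spaces); metric names contain '/'
    # without surrounding spaces, so this split is unambiguous.
    for item in line.split(" / "):
        name, value = item.strip().rsplit(" ", 1)
        metrics[name] = float(value)
    return metrics


sample = "train/loss/con 0.9 / train/loss/policy -4e-5 / fps/train 230.45"
print(parse_metrics_line(sample))
```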
5. Full run
docker run -it --gpus all -v D:\logdir:/logdir --name crafter1 dreamerv3-hafner python dreamerv3/main.py --logdir /logdir/crafter_run1 --configs crafter
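docker rm -f crafter1 // if the run was interrupted, remove the container before starting again
Crafter: a 2D survival and building game, a simplified Minecraft played in a 64×64 pixel world. It is the working example for these notes.
Expected output, which indicates that training is running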
Done compiling!
Did not find any checkpoint.
Saving checkpoint: /logdir/crafter_run1/ckpt/20260622T082242F452072
Saved checkpoint.
Start training loop
Start JAX profiler: /logdir/crafter_run1
2026-06-22 08:24:22.069994: E external/xla/xla/python/profiler/internal/python_hooks.cc:400] Can't import tensorflow.python.profiler.trace
Stop JAX profiler
2026-06-22 08:24:37.416122: E external/xla/xla/python/profiler/internal/python_hooks.cc:400] Can't import tensorflow.python.profiler.trace
--------------------[Agent Step 1_370]--------------------
Metrics filtered by: 'score|length|fps|ratio|train/loss/|train/rand/'
episode/score 2.1 / episode/length 168 / train/loss/con 0.16 / train/loss/dyn 13.75 / train/loss/image 694.56 / train/loss/policy 0.03 / train/loss/rep 13.75 / train/loss/repval 10.36 / train/loss/rew 5.18 / train/loss/value 6.85 / train/rand/action 1 / replay/replay_ratio 113.87 / fps/policy 10.1 / fps/train 1071.76
Writing metrics: /logdir/crafter_run1/metrics.jsonl
Writing metrics: /logdir/crafter_run1/scores.jsonl
--------------------[Agent Step 1_700]--------------------
Metrics filtered by: 'score|length|fps|ratio|train/loss/|train/rand/'
episode/score 0.1 / episode/length 203 / train/loss/con 0.05 / train/loss/dyn 8.08 / train/loss/image 175.28 / train/loss/policy 0.08 / train/loss/rep 8.08 / train/loss/repval 6.24 / train/loss/rew 2.7 / train/loss/value 5.08 / train/rand/action 1 / replay/replay_ratio 520 / fps/policy 2.73 / fps/train 1399.93
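The same numbers are also written to metrics.jsonl in the log directory, which is easier to process than the console. The sketch below assumes that each line of that file is one JSON object with a "step" field plus metric keys such as "episode/score"; that layout is an assumption to verify against your own file, and read_scalar is a hypothetical helper.

```python
import json


def read_scalar(lines, key):
    """Return (step, value) pairs for every JSON record that contains `key`."""
    points = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        record = json.loads(raw)
        if key in record:
            # "step" is an assumed field name; None is returned if it is absent.
            points.append((record.get("step"), record[key]))
    return points


# Hypothetical usage on the host side of the mounted log directory:
# with open("D:/logdir/crafter_run1/metrics.jsonl", encoding="utf-8") as f:
#     print(read_scalar(f, "episode/score"))
```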
How to run other games
Atari Pong: --configs atari --task atari_pong
DMC Cartpole: --configs dmc --task dmc_cartpole
DMC Humanoid: --configs dmc --task dmc_humanoid
Procgen: --configs procgen --task procgen_coinrun
Minecraft Diamond: --configs minecraft --task minecraft_diamond
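Only the --configs and --task flags change between games; the rest of the docker run command from step 5 stays the same. A small sketch that assembles the full command from those pieces follows. build_run_command is a hypothetical helper, and for simplicity it uses one name for both the container and the log subdirectory, unlike the crafter1 / crafter_run1 pair above.

```python
def build_run_command(name, config, task=None,
                      host_logdir=r"D:\logdir", image="dreamerv3-hafner"):
    """Assemble the docker run argument list for one DreamerV3 training run."""
    cmd = [
        "docker", "run", "-it", "--gpus", "all",
        "-v", f"{host_logdir}:/logdir",
        "--name", name, image,
        "python", "dreamerv3/main.py",
        "--logdir", f"/logdir/{name}",
        "--configs", config,
    ]
    # Crafter needs no --task; the other examples in the list above do.
    if task is not None:
        cmd += ["--task", task]
    return cmd


print(" ".join(build_run_command("pong1", "atari", "atari_pong")))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with the Windows path.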
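Watching training videos
During training, DreamerV3 calls agent.report() every report_every steps (default 300) and produces a video summary named openloop/image, which contains:
┌──────────────────────────────────────────────────┐
│ Real frames (what the agent sees)                │ ← top
├──────────────────────────────────────────────────┤
│ Decoder reconstruction (what the model pictures) │ ← middle
├──────────────────────────────────────────────────┤
│ Error map (where the prediction is wrong)        │ ← bottom
└──────────────────────────────────────────────────┘
This video summary is written out through the logger, and Scope can play it directly.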
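To make the three-row layout concrete, here is a toy sketch that builds one such frame from a real image and a reconstruction. It is an illustration only: frames are plain nested lists of grayscale values in [0, 1], the error is taken as the absolute per-pixel difference (an assumption; the actual agent.report() works on JAX arrays and may compute the error map differently), and stack_report_frame is a hypothetical name.

```python
def stack_report_frame(truth, recon):
    """Stack truth, reconstruction and error map vertically (top to bottom)."""
    same_rows = len(truth) == len(recon)
    if not same_rows or any(len(a) != len(b) for a, b in zip(truth, recon)):
        raise ValueError("truth and recon must have the same shape")
    # Assumed error definition: absolute difference per pixel.
    error = [[abs(t - r) for t, r in zip(trow, rrow)]
             for trow, rrow in zip(truth, recon)]
    # Concatenating the row lists places the three images one above the other.
    return truth + recon + error


truth = [[0.0, 1.0], [0.5, 0.5]]
recon = [[0.0, 0.5], [0.5, 1.0]]
for row in stack_report_frame(truth, recon):
    print(row)
```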
Installing Scope
https://github.com/danijar/scope
pip install -U scope
Error
Could not fetch URL https://pypi.org/simple/scope/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/scope/ (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:852)'),)) - skipping
ERROR: Could not find a version that satisfies the requirement scope
ERROR: No matching distribution found for scope
Cause: the SSL certificates in the conda environment had expired
Solution: update the certificates with conda install -c anaconda ca-certificates certifi openssl -y
Start Scope on port 8000
python -m scope_viewer --basedir D:/logdir --port 8000
http://localhost:8000/
Error: INFO: 127.0.0.1:3020 - "GET /api/exp/D%3A/logdir/crafter_run1 HTTP/1.1" 404 Not Found, plus similar failures on other paths
Cause: Scope was written for Linux, so Windows paths do not work; in addition, the colon of the drive letter is URL-encoded (D%3A) and handled incorrectly
Solution: patch the path handling in the source
- server: change expids = [x.rsplit('/', 1)[-1] for x in folders] to expids = [x.replace('\\', '/').rsplit('/', 1)[-1] for x in folders] (normalize backslashes to forward slashes before splitting the path)
- server: change if any(x.endswith('/scope') for x in children): to if any(x.replace('\\', '/').endswith('/scope') for x in children): (same reason)
- server: change folders = [x.removeprefix(str(basedir))[1:] for x in folders] to folders = [x.replace('\\', '/').removeprefix(str(basedir).replace('\\', '/')) for x in folders] followed by folders = [x[1:] if x.startswith('/') else x for x in folders] (same reason)
- filesystems: change name = str(path).replace(':', '').replace('//', '/').replace('/', ':') to name = str(path).replace(':', '').replace('//', '/').replace('/', '_') (on Windows a colon is reserved for drive letters and is illegal in file names, so use an underscore instead)
Alternative: install Ubuntu as a second operating system and run everything on Linux
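The patches above all apply the same idea: convert backslashes to forward slashes before doing any string-based path handling, and keep colons out of generated file names. The standalone sketch below reproduces that logic outside Scope so it can be checked in isolation. The function names are hypothetical, not Scope's own, and str.removeprefix requires Python 3.9 or later.

```python
def last_component(path: str) -> str:
    """Final path segment, whether the path uses '/' or '\\' separators."""
    return path.replace("\\", "/").rsplit("/", 1)[-1]


def strip_basedir(path: str, basedir: str) -> str:
    """Path relative to basedir, after normalizing both to forward slashes."""
    rel = path.replace("\\", "/").removeprefix(basedir.replace("\\", "/"))
    return rel[1:] if rel.startswith("/") else rel


def safe_name(path: str) -> str:
    """Flatten a forward-slash path into a file name without ':' or '/'."""
    return str(path).replace(":", "").replace("//", "/").replace("/", "_")


print(last_component("D:\\logdir\\crafter_run1"))
print(strip_basedir("D:\\logdir\\crafter_run1", "D:/logdir"))
print(safe_name("D:/logdir/crafter_run1"))
```

Like the patch it mirrors, safe_name only handles forward slashes, so callers should normalize the path first.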
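The Crafter game
A 2D survival game designed as a benchmark for general capabilities in reinforcement learning. The goal is to unlock as many achievements as possible in a randomly generated open pixel world (22 achievements, arranged as a dependency tree of tools). It mainly tests visual perception, long-term planning, map exploration, and multi-task objectives.
Design goals
- Procedural generation: the map is random in every episode, so the model cannot memorize it
- Exploration: sparse rewards force the model to explore actively
- Long-term credit assignment: a resource-tool dependency tree, with rewards that can be hundreds of steps apart
- Multi-task: 22 achievements (task goals); the more of them unlocked in a single episode, the higher the score
- Sparse vs dense reward: two modes are provided, reward (with achievement rewards) and no-reward (without achievement rewards)
- Fast iteration: 64×64 pixels, runs on a single machine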
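Besides the per-episode score shown in the training logs, the Crafter benchmark defines an aggregate "Crafter score" over the 22 achievements: the geometric mean of the per-achievement success rates (in percent, each offset by 1), minus 1. The sketch below implements that formula as I understand it from the Crafter paper; crafter_score is a hypothetical helper and the input rates in the example are placeholders, not measured results.

```python
import math


def crafter_score(success_rates_percent):
    """exp(mean(ln(1 + s_i))) - 1, with each s_i a success rate in [0, 100]."""
    if not success_rates_percent:
        raise ValueError("need at least one success rate")
    logs = [math.log(1.0 + s) for s in success_rates_percent]
    return math.exp(sum(logs) / len(logs)) - 1.0


# Placeholder rates: one achievement always unlocked, one never unlocked.
print(crafter_score([100.0, 0.0]))
```

Because of the logarithm, unlocking a rare achievement occasionally raises the score more than pushing an already common achievement closer to 100%.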
