[Tutorial] Complete Deployment Workflow for unitree_rl_gym, Unitree's Reinforcement Learning Framework (Switching to IsaacLab)

1. Environment Setup

My environment:

CPU: 14900K
GPU: 4090D
GPU driver: 580
OS: Ubuntu 22.04

References:

https://blog.csdn.net/qq_28912651/article/details/153054568
- Full workflow
https://www.cnblogs.com/quantoublog/articles/19185985
- Environment setup
https://cyclonedds.io/docs/cyclonedds/latest/installation/installation.html
- CycloneDDS

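1.1 Virtual environment

The original unitree_rl_gym depends on IsaacGym, which has been deprecated; IsaacLab is recommended instead.

conda create -n unitree_rl python=3.11 -y
conda activate unitree_rl

# Run all subsequent python and pip commands inside this environment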

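1.2 Installing dependencies

1.2.1 PyTorch

Upgrade pip:

pip install --upgrade pip

Install the matching PyTorch build:

# I chose the CUDA 12.8 (cu128) build here; pick the one that matches your own setup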
pip install torch==2.7.0 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu128
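
Before moving on, it can help to confirm that the environment really contains the versions pinned above. The following is a minimal sketch that uses only the standard library; the helper names `installed_version` and `check_versions` are my own, and the expected versions are simply copied from the pip command in this section.

```python
from __future__ import annotations

from importlib import metadata

# Versions pinned by the pip command above; change them if you picked another build.
EXPECTED = {"torch": "2.7.0", "torchvision": "0.22.0"}


def installed_version(package: str) -> str | None:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected: dict[str, str]) -> dict[str, str]:
    """Map each package name to 'ok', 'missing', or 'found <version>'."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        if found is None:
            report[name] = "missing"
        # CUDA wheels carry a local tag such as '+cu128'; compare only the public part.
        elif found.split("+")[0] == wanted:
            report[name] = "ok"
        else:
            report[name] = f"found {found}"
    return report


if __name__ == "__main__":
    for name, status in check_versions(EXPECTED).items():
        print(f"{name}: {status}")
```

Run it with the python of the unitree_rl environment; anything other than "ok" suggests the install went into a different environment or resolved to another version.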

1.2.2 Isaac Sim

pip install "isaacsim[all,extscache]==5.1.0" --extra-index-url https://pypi.nvidia.com

1.2.3 Isaac Lab

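Isaac Lab is the successor to Isaac Gym: a GPU-accelerated simulation framework for multimodal robot learning. It is designed for large-scale reinforcement learning (RL) and imitation learning, with an emphasis on:

- High-fidelity physics simulation
- Photorealistic rendering
- A modular, extensible architecture
- Support for multiple sensor modalities (vision, tactile, IMU, etc.)
- Sim-to-real transfer

Install from source: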

git clone https://github.com/isaac-sim/IsaacLab.git
cd IsaacLab
./isaaclab.sh --install

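Error: downloading https://github.com/isaac-sim/rl_games.git failed

# Download and install it manually; if the download still fails (the repository is large), use a domestic mirror or accelerator
git clone --branch python3.11 https://github.com/isaac-sim/rl_games.git

cd rl_games

# Install rl_games
pip install -e .
pip install gym

# Then go back to the IsaacLab directory and rerun ./isaaclab.sh --install

Verify the installation: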

./isaaclab.sh -p scripts/demos/bipeds.py
./isaaclab.sh -p scripts/demos/h1_locomotion.py

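1.2.4 Installing the RL algorithm library rsl_rl

Library	Framework	Representative tasks	Strengths	Best suited for
RL-Games	PyTorch	Isaac Gym/Isaac Lab baseline	GPU-parallel optimization, stable PPO	Large-scale training; officially recommended
RSL-RL	PyTorch	ANYmal, Unitree G1/H1	Strongest at locomotion; developed at ETH	Walking/motion control
SKRL	PyTorch/JAX	Shadow Hand, multi-agent	Supports JAX and multi-agent MAPPO	Multi-hand, multi-robot
SB3	PyTorch	Teaching, classic control	Thorough documentation, large community	Getting started, small-scale experiments

Install it from the IsaacLab directory: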

cd IsaacLab
./isaaclab.sh -i rsl_rl

1.2.5 sdk2py

unitree_sdk2py is the library used to communicate with the physical robot. Install it if you want to deploy the trained model on real hardware.

git clone https://github.com/unitreerobotics/unitree_sdk2_python.git
cd unitree_sdk2_python
pip install -e .

Error:

Collecting cyclonedds==0.10.2 (from unitree_sdk2py==1.0.1)
  Downloading cyclonedds-0.10.2.tar.gz (156 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [1 lines of output]
      Could not locate cyclonedds. Try to set CYCLONEDDS_HOME or CMAKE_PREFIX_PATH
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed to build 'cyclonedds' when getting requirements to build wheel

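# The CycloneDDS environment is missing. pip first looks for a prebuilt wheel that fits the running environment; if there is none, it falls back to a local installation located through environment variables
# So CycloneDDS has to be built by hand here

Download and build:

# This follows the official documentation and Unitree's README: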
# https://cyclonedds.io/docs/cyclonedds/latest/installation/installation.html
# https://github.com/unitreerobotics/unitree_sdk2_python
sudo apt-get install git cmake gcc

git clone https://github.com/eclipse-cyclonedds/cyclonedds -b releases/0.10.x 
cd cyclonedds && mkdir build install && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=../install
cmake --build . --target install
# The compiled binaries are now installed under cyclonedds/install

Set the environment variable:

# The path differs from machine to machine; use your own
export CYCLONEDDS_HOME="$HOME/Source/cyclonedds/install"
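
A wrong CYCLONEDDS_HOME produces the same "Could not locate cyclonedds" error as an unset one, so it is worth checking the value before reinstalling. The sketch below assumes the usual CMake install layout (headers under include/dds/, the libddsc library under lib/ or lib64/); the function name `check_cyclonedds_home` is hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path


def check_cyclonedds_home(home: str | None) -> list[str]:
    """Return the problems found with a CYCLONEDDS_HOME value (empty list if none)."""
    if not home:
        return ["CYCLONEDDS_HOME is not set"]
    root = Path(home).expanduser()
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # Assumed layout of a CMake install: include/dds/dds.h plus lib/ or lib64/.
    if not (root / "include" / "dds" / "dds.h").is_file():
        problems.append("include/dds/dds.h not found")
    lib_dirs = [d for d in (root / "lib", root / "lib64") if d.is_dir()]
    if not any(list(d.glob("libddsc*")) for d in lib_dirs):
        problems.append("libddsc not found under lib/ or lib64/")
    return problems


if __name__ == "__main__":
    issues = check_cyclonedds_home(os.environ.get("CYCLONEDDS_HOME"))
    print("OK" if not issues else "\n".join(issues))
```

Note that `export` only affects the current shell; run the check in the same terminal in which you will call pip.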

Reinstall unitree_sdk2_python:

cd unitree_sdk2_python
pip install -e .

Error: numpy

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
isaaclab 0.47.6 requires numpy<2, but you have numpy 2.2.6 which is incompatible.
isaaclab-rl 0.4.4 requires numpy<2, but you have numpy 2.2.6 which is incompatible.
isaacsim-kernel 5.1.0.0 requires numpy==1.26.0, but you have numpy 2.2.6 which is incompatible.
numba 0.59.1 requires numpy<1.27,>=1.22, but you have numpy 2.2.6 which is incompatible.
dex-retargeting 0.4.6 requires numpy<2.0.0,>=1.21.0, but you have numpy 2.2.6 which is incompatible.
isaaclab-tasks 0.11.6 requires numpy<2, but you have numpy 2.2.6 which is incompatible.
cmeel-boost 1.83.0 requires numpy~=1.26.0; python_version >= "3.9", but you have numpy 2.2.6 which is incompatible.
Successfully installed numpy-2.2.6 unitree_sdk2py-1.0.1

# isaaclab requires numpy<2, but numpy 2.2.6 is what got installed

# Edit setup.py: find the numpy entry and change it to "numpy==1.26.0", then rerun pip install -e .

setup(name='unitree_sdk2py',
      version='1.0.1',
      author='UnitreeRobotics',
      author_email='unitree@unitree.com',
      long_description=open('README.md').read(),
      long_description_content_type="text/markdown",
      license="BSD-3-Clause",
      packages=find_packages(include=['unitree_sdk2py','unitree_sdk2py.*']),
      description='Unitree robot sdk version 2 for python',
      project_urls={
            "Source Code": "https://github.com/unitreerobotics/unitree_sdk2_python",
      },
      python_requires='>=3.8',
      install_requires=[
            "cyclonedds==0.10.2",
            "numpy==1.26.0",
            "opencv-python",
      ],
      )
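
To see why 1.26.0 is the right pin, the numpy constraints from the pip conflict report can be checked mechanically. This is a small sketch with a deliberately simplified version comparison (plain numeric versions only); in a real project the `packaging` library would be the more robust choice. The constraint table is copied from the error output above, and the helper names are my own.

```python
from __future__ import annotations

import re


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.26.0' into (1, 26, 0); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a comma-separated spec such as '<1.27,>=1.22'."""
    v = parse_version(version)
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(==|~=|<=|>=|<|>)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, target = match.groups()
        t = parse_version(target)
        if op == "~=":
            # Compatible release: at least the target, same components except the last.
            ok = v >= t and v[: len(t) - 1] == t[:-1]
        else:
            ok = {"==": v == t, "<": v < t, "<=": v <= t, ">": v > t, ">=": v >= t}[op]
        if not ok:
            return False
    return True


# numpy constraints copied from the pip conflict report
CONSTRAINTS = {
    "isaaclab": "<2",
    "isaaclab-rl": "<2",
    "isaacsim-kernel": "==1.26.0",
    "numba": "<1.27,>=1.22",
    "dex-retargeting": "<2.0.0,>=1.21.0",
    "isaaclab-tasks": "<2",
    "cmeel-boost": "~=1.26.0",
}


def violated(version: str) -> list[str]:
    """List the packages whose numpy constraint rejects `version`."""
    return [pkg for pkg, spec in CONSTRAINTS.items() if not satisfies(version, spec)]


if __name__ == "__main__":
    print("2.2.6 violates:", violated("2.2.6"))
    print("1.26.0 violates:", violated("1.26.0"))
```

numpy 2.2.6 breaks every listed constraint, while 1.26.0 satisfies all of them, including the exact pin required by isaacsim-kernel.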

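2. Training

Main references:

https://blog.csdn.net/qq_28912651/article/details/153054568
- Setup and introduction
https://isaac-sim.github.io/IsaacLab/main/source/overview/reinforcement-learning/rl_existing_scripts.html
- Official documentation: scripts and commands

The overall workflow is:

Train → Play → Sim2Sim → Sim2Real

Train: train the policy on an Isaac Lab task with parallel simulation (running headless is faster)

Play: load a trained checkpoint and replay/visualize it in simulation

Sim2Sim: run the exported policy in another simulator (for example MuJoCo) to verify that it transfers

Sim2Real: deploy the policy to the physical robot (requires debug mode and safety precautions)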

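2.1 Launching

Here we use the official G1 examples that ship with Isaac Lab.

# Enter the Isaac Lab directory and list the G1 examples it supports
cd IsaacLab
./isaaclab.sh -p scripts/environments/list_envs.py | grep "G1"

# Or search the source tree directly with grep
grep -R "G1" -n source/isaaclab_tasks

# Result:
Isaac-Velocity-Rough-G1-v0
Isaac-Velocity-Rough-G1-Play-v0
Isaac-Velocity-Flat-G1-v0
Isaac-Velocity-Flat-G1-Play-v0

# Each part of a task name carries a meaning
# Velocity: the task objective is velocity tracking; the robot is trained to follow commanded linear and angular velocities
# Rough: rough terrain
# Flat: flat terrain
# Play: variant intended for playback/visualization (more rendering-friendly)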
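
The naming scheme described above can be expressed as a tiny parser. This is only an illustration for the four G1 task IDs listed here; other Isaac Lab tasks may not follow the same five-part layout, and the function names are hypothetical.

```python
from __future__ import annotations


def parse_task_id(task_id: str) -> dict:
    """Split an ID such as 'Isaac-Velocity-Rough-G1-Play-v0' into named parts."""
    parts = task_id.split("-")
    play = "Play" in parts
    core = [p for p in parts if p != "Play"]
    # Expected layout after removing 'Play': Isaac-<objective>-<terrain>-<robot>-<version>
    if len(core) != 5 or core[0] != "Isaac":
        raise ValueError(f"unexpected task ID layout: {task_id!r}")
    _, objective, terrain, robot, version = core
    return {
        "objective": objective,
        "terrain": terrain,
        "robot": robot,
        "play": play,
        "version": version,
    }


def training_variant(task_id: str) -> str:
    """Return the non-Play ID that should be passed to train.py."""
    return "-".join(p for p in task_id.split("-") if p != "Play")


if __name__ == "__main__":
    print(parse_task_id("Isaac-Velocity-Rough-G1-Play-v0"))
    print(training_variant("Isaac-Velocity-Rough-G1-Play-v0"))
```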

Since we are training, we choose the non-Play variant:

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
  --task Isaac-Velocity-Rough-G1-v0 --headless --num_envs 4096 --seed 0
  
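# While training, you can watch GPU usage with:
watch -n 0.5 nvidia-smi

Parameters
- --task:
  - Task ID
- --headless:
  - Disables the GUI; noticeably faster and convenient for remote machines
- --num_envs:
  - Number of parallel environments (raise it if the GPU has headroom). A card with 24 GB of VRAM can usually run 2k to 8k; lower it if you run out of memory
- --seed:
  - Random seed for training. For reproducibility, the same code, the same seed, and the same environment settings should give similar results.
- --experiment_name g1_rough / --run_name run1:
  - Control the log directory name
- --max_iterations 10000:
  - Maximum number of iterations
- --device cuda:0|cpu:
  - Compute device for simulation and RL (GPU by default); the older Isaac Gym based scripts used --sim_device / --rl_device instead
- --resume:
  - Resume from the most recent checkpoint

For demonstration purposes, the GUI is enabled here: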

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
  --task Isaac-Velocity-Rough-G1-v0 --num_envs 4096 --seed 0
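
The two training commands above differ only in the --headless flag. If you launch runs from a script, the argument list can be assembled programmatically, as in this minimal sketch. It only uses flags that already appear in this post; `build_train_command` is a hypothetical helper, not part of Isaac Lab.

```python
from __future__ import annotations

import shlex

TRAIN_SCRIPT = "scripts/reinforcement_learning/rsl_rl/train.py"


def build_train_command(
    task: str,
    num_envs: int = 4096,
    seed: int = 0,
    headless: bool = True,
    max_iterations: int | None = None,
) -> list[str]:
    """Assemble the argument list for a training run started from the IsaacLab directory."""
    cmd = [
        "./isaaclab.sh", "-p", TRAIN_SCRIPT,
        "--task", task,
        "--num_envs", str(num_envs),
        "--seed", str(seed),
    ]
    if headless:
        cmd.append("--headless")
    if max_iterations is not None:
        cmd += ["--max_iterations", str(max_iterations)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_train_command("Isaac-Velocity-Rough-G1-v0")))
    print(shlex.join(build_train_command("Isaac-Velocity-Rough-G1-v0", headless=False)))
```

The returned list can be handed to `subprocess.run(cmd, cwd=<path to IsaacLab>)` once you are happy with it.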

Result:

In my test with 3000 iterations on the 4090D, training took about two hours with the GUI and about one hour without it.
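
These timings translate into a rough throughput figure. The arithmetic below assumes 24 simulation steps per environment per iteration; that number is an assumption on my part, so check the rollout length in your own agent configuration before trusting the result.

```python
def seconds_per_iteration(total_hours: float, iterations: int) -> float:
    """Average wall-clock time of one training iteration."""
    return total_hours * 3600 / iterations


def env_steps_per_second(num_envs: int, steps_per_env: int, sec_per_iter: float) -> float:
    """Simulated environment steps collected per wall-clock second."""
    return num_envs * steps_per_env / sec_per_iter


# Assumption: rollout length per iteration; not stated in this post.
STEPS_PER_ENV = 24

if __name__ == "__main__":
    for label, hours in (("headless", 1.0), ("GUI", 2.0)):
        spi = seconds_per_iteration(hours, 3000)
        rate = env_steps_per_second(4096, STEPS_PER_ENV, spi)
        print(f"{label}: {spi:.1f} s/iteration, about {rate:,.0f} env steps/s")
```

Under that assumption, the headless run works out to 1.2 s per iteration and the GUI run to 2.4 s, so enabling rendering halves the throughput.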

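2.2 Monitoring during training

2.2.1 Checkpoints

A checkpoint is saved every 50 iterations, which makes later playback and resuming after an interruption easier.

# Path
cd ~/Source/IsaacLab/logs/rsl_rl/g1_rough/2025-xx-xx

# Layout
logs/rsl_rl/<experiment_name>/<date_time>_<run_name>/model_<iter>.pt

# For example: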
-rw-rw-r-- 1 zhaoshuai zhaoshuai 7.5M 11月  4 11:48 model_0.pt
-rw-rw-r-- 1 zhaoshuai zhaoshuai 7.5M 11月  4 11:51 model_100.pt
-rw-rw-r-- 1 zhaoshuai zhaoshuai 7.5M 11月  4 11:53 model_150.pt
-rw-rw-r-- 1 zhaoshuai zhaoshuai 7.5M 11月  4 11:55 model_200.pt
-rw-rw-r-- 1 zhaoshuai zhaoshuai 7.5M 11月  4 11:49 model_50.pt
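
Notice that the listing is sorted alphabetically, so model_50.pt appears after model_200.pt. When picking the latest checkpoint in a script, sort by the iteration number instead. A minimal sketch, assuming the model_<iter>.pt naming shown above (the helper names are my own):

```python
from __future__ import annotations

import re
from pathlib import Path

_CKPT_PATTERN = re.compile(r"model_(\d+)\.pt$")


def checkpoint_iteration(name: str) -> int | None:
    """Extract the iteration number from 'model_<iter>.pt', or None if it does not match."""
    match = _CKPT_PATTERN.search(name)
    return int(match.group(1)) if match else None


def sort_checkpoints(names: list[str]) -> list[str]:
    """Return checkpoint file names ordered by iteration number, ignoring other files."""
    tagged = [(checkpoint_iteration(n), n) for n in names]
    return [name for _, name in sorted(t for t in tagged if t[0] is not None)]


def latest_checkpoint(run_dir: str) -> Path | None:
    """Return the path of the highest-iteration checkpoint in a run directory."""
    ordered = sort_checkpoints([p.name for p in Path(run_dir).glob("model_*.pt")])
    return Path(run_dir) / ordered[-1] if ordered else None


if __name__ == "__main__":
    listing = ["model_0.pt", "model_100.pt", "model_150.pt", "model_200.pt", "model_50.pt"]
    print(sort_checkpoints(listing))
```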

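2.2.2 Visualization

TensorBoard is the visualization toolkit developed by the TensorFlow team. It displays a model's computation graph and tracks metrics during training.

./isaaclab.sh -p -m tensorboard.main --logdir=logs

# The output includes a link to the dashboard
TensorBoard 2.20.0 at http://localhost:6006/ (Press CTRL+C to quit)

# Open it on the training machine (or from outside through port forwarding)

What each subplot shows:

Horizontal axis
- Training steps
Vertical axis
- The mean reward value per episode: the average score the model obtained from that particular reward term over recently completed episodes.
Each subplot is one reward term
- The terms reward accurate motion and penalize undesired behavior and energy consumption
- Curves for positive reward terms should keep rising, indicating that the robot tracks velocity better and better
- Penalty terms (such as action_rate_l2) are usually negative. Their curves should converge toward 0 or a small negative value, indicating that the penalty is shrinking and the robot's behavior is becoming smoother or more efficient.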

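2.3 Playback / exporting the policy

This part is based entirely on: https://blog.csdn.net/qq_28912651/article/details/153054568

2.3.1 Play back the most recent run (the latest model is loaded automatically):

# Play back the most recent run
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py \
  --task Isaac-Velocity-Rough-G1-v0 --num_envs 32
 
# Specify the experiment/model
#  <exp> : g1_rough
#  <run> : the run's timestamp, e.g. 2025-11-04_08-22-31
#  <N>   : the N in model_N.pt
# If a bare number is not accepted, pass the file name (model_<N>.pt) or its full path; this may vary across Isaac Lab versions
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py \
  --task Isaac-Velocity-Rough-G1-v0 \
  --num_envs 32 \
  --experiment_name <exp> --load_run <run> --checkpoint <N>
  
# Playback opens the GUI by default (unless --headless is added). --num_envs=32 runs several instances in parallel for viewing; drop to 8 or 16 on a weaker GPU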

2.3.2 Specifying a particular run and checkpoint:

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py \
  --task Isaac-Velocity-Rough-G1-v0 \
  --num_envs 16 \
  --checkpoint ~/Source/IsaacLab/logs/rsl_rl/g1_rough/2025-11-04_08-22-31/model_900.pt

Add --headless --video --video_length 200 to render without the GUI and record a video (ffmpeg must be installed first):

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py \
  --task Isaac-Velocity-Rough-G1-v0 \
  --num_envs 8 --headless \
  --checkpoint ~/Source/IsaacLab/logs/rsl_rl/g1_rough/2025-11-04_08-22-31/model_900.pt \
  --video --video_length 300

2.3.3 Continuing training / resuming from a specific model

Resume the most recent run (the last model in its directory):

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
  --task Isaac-Velocity-Rough-G1-v0 \
  --headless --num_envs 1024 \
  --resume

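Start from a specific checkpoint (if the bare iteration number is rejected, try the file name model_900.pt; this may depend on the Isaac Lab version):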

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
  --task Isaac-Velocity-Rough-G1-v0 \
  --headless --num_envs 1024 \
  --experiment_name g1_rough \
  --load_run 2025-11-04_08-22-31 \
  --checkpoint 900 \
  --resume

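2.3.4 Exporting the policy (for Sim2Sim/deployment)

After running play once (any of the playback commands above), the Actor network is automatically exported to the exported directory. Later Sim2Sim (MuJoCo, etc.) or Sim2Real steps can use the exported policy.pt or policy.onnx directly.

ll ./logs/rsl_rl/g1_rough/2025-11-04_11-47-11/exported/

-rw-rw-r-- 1 zhaoshuai zhaoshuai 1.3M 11月  4 13:46 policy.onnx
-rw-rw-r-- 1 zhaoshuai zhaoshuai 1.3M 11月  4 13:46 policy.pt

Actor:
- The policy network;
- In reinforcement learning, the Actor is the core neural network that decides the robot's behavior. It takes observations from the environment (joint angles, velocities, IMU data, etc.) as input and outputs the action the robot should take (such as joint target positions or torques);
- It is the core of the training result. When a checkpoint is loaded, what matters most is extracting the Actor network's weights.
exported:
- The export directory
- Created and used automatically by the Isaac Lab framework to store the slimmed-down, formatted policy files
- For example: IsaacLab/logs/rsl_rl/g1_rough/2025-11-04_08-22-31/exported/
policy.pt:
- The exported policy file;
- A PyTorch file that contains only the Actor network's weights and structure.
- After the play script runs, it extracts the Actor network from the loaded checkpoint and saves it as this file (policy.onnx is the same network in ONNX format).

When choosing which checkpoint to export, use the reward curves in TensorBoard: pick the iteration whose mean return is highest and stable, or inspect one of the later iterations.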
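
"Highest and stable" can be made concrete with a trailing average. The sketch below assumes you already have the mean-reward curve as a plain list indexed by iteration (for example exported from TensorBoard) and that checkpoints are saved every 50 iterations; the function name and the window size are my own choices, not something Isaac Lab provides.

```python
from __future__ import annotations

from statistics import mean


def best_stable_iteration(
    rewards: list[float], window: int = 20, save_interval: int = 50
) -> int | None:
    """Return the saved iteration whose trailing-window mean reward is highest.

    rewards[i] is the mean episode reward logged at iteration i. Only iterations
    that are multiples of `save_interval` are considered, because only those
    have a model_<iter>.pt file on disk.
    """
    best_iter, best_score = None, float("-inf")
    for it in range(0, len(rewards), save_interval):
        if it + 1 < window:
            continue  # not enough history before this checkpoint to judge stability
        score = mean(rewards[it - window + 1 : it + 1])
        if score > best_score:
            best_iter, best_score = it, score
    return best_iter


if __name__ == "__main__":
    # Synthetic curve: a plateau at 10, followed by a collapse to 5.
    curve = [0.0] * 50 + [10.0] * 50 + [5.0] * 51
    print(best_stable_iteration(curve, window=10))
```

Averaging over a window keeps a single lucky spike from being selected; a larger window favors checkpoints from longer plateaus.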

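3. Sim2Sim

Sim2Sim (simulation to simulation) means taking a reinforcement learning policy trained in simulation environment A (such as Isaac Lab) and deploying it in another simulation environment B (such as MuJoCo).

It serves several purposes:

Validating the policy's portability and generalization
- Robustness testing: different simulators use different physics engines, numerical solvers, and rendering pipelines. If a policy trained in Isaac Lab also performs well under another physics engine (such as MuJoCo), it does not depend on the numerical quirks of one simulator and is more robust.
- Differences in physical models: simulators may implement friction, collision, gravity, and similar parameters slightly differently. Sim2Sim checks how well the policy tolerates these differences.
Leveraging the strengths of different simulators
- Performance/speed: Isaac Lab excels at large-scale parallel training. Once training is done, the policy can be deployed and tested quickly and at high frequency in a lightweight or headless simulator to save resources.
- Specific features: some simulators have advantages in particular areas (such as contact modeling or specific sensor simulation). Basic skills can be trained in Sim A and special features verified in Sim B.
Preparing for Sim2Real
- Benchmarking: Sim2Sim can serve as the last test before Sim2Real deployment. If transfer fails between two high-quality simulators, the policy is very unlikely to work in the real world.
- Format and interface: exporting policy.pt turns the policy network into a generic, clean interface, which is the first step toward deployment on a real robot system (Sim2Real).

For IsaacLab, there are two common Sim2Sim routes:

IsaacLab → another Isaac scene/configuration (most reliable)
IsaacLab → MuJoCo (or another simulator)

MuJoCo and IsaacLab traverse joints in different orders, so a robot driven directly by the policy.pt exported from IsaacLab falls over immediately. The joint order generated by IsaacLab therefore has to be adjusted; see:

https://blog.csdn.net/weixin_52162723/article/details/145665659

Script that exports all joints: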

# scripts/export/export_g1_dof.py
# -*- coding: utf-8 -*-
"""
Export the joint (DOF) name order of the Unitree G1 in Isaac Lab as JSON.
The simulator and the Omniverse app are closed automatically when done.

./isaaclab.sh -p scripts/export/export_g1_dof.py \
  --variant g1 \
  --prim "/World/Origin.*/Robot" \
  --out export/g1_isaaclab_dof_order.json

"""

import argparse
import json
import os
from isaaclab.app import AppLauncher

# ========== Command-line arguments ==========
parser = argparse.ArgumentParser(description="Export Unitree G1 joint/DOF order from Isaac Lab (Interactive articulation).")
# add_app_launcher_args already registers --headless and --device; adding them again makes argparse raise a conflict error
AppLauncher.add_app_launcher_args(parser)
parser.add_argument("--out", type=str, default="export/g1_isaaclab_dof_order.json", help="Output JSON file path")
parser.add_argument("--prim", type=str, default="/World/Origin.*/Robot", help="prim_path of the G1 asset (patterns supported)")
parser.add_argument("--variant", type=str, default="g1",
                    choices=["g1", "g1-min", "g1-29dof", "g1-inspire"],
                    help="G1 configuration variant to load")
args_cli = parser.parse_args()

# ========== Launch the Omniverse app ==========
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app

# ========== Import Isaac Lab modules (only after the app has been launched) ==========
import torch
import isaacsim.core.utils.prims as prim_utils
import isaaclab.sim as sim_utils
from isaaclab.assets import Articulation
from isaaclab.sim import SimulationContext

from isaaclab_assets import (
    G1_CFG,
    G1_MINIMAL_CFG,
    G1_29DOF_CFG,
    G1_INSPIRE_FTP_CFG,
)

# ========== Functions ==========

def _select_g1_cfg(variant: str):
    if variant == "g1":
        return G1_CFG
    if variant == "g1-min":
        return G1_MINIMAL_CFG
    if variant == "g1-29dof":
        return G1_29DOF_CFG
    if variant == "g1-inspire":
        return G1_INSPIRE_FTP_CFG
    raise ValueError(f"Unknown variant: {variant}")

def design_scene(prim_path: str, variant: str) -> tuple[dict, list[list[float]]]:
    """Build a minimal scene and place the G1 articulation in it."""
    # Ground plane and lighting
    cfg = sim_utils.GroundPlaneCfg()
    cfg.func("/World/defaultGroundPlane", cfg)
    cfg = sim_utils.DomeLightCfg(intensity=3000.0, color=(0.75, 0.75, 0.75))
    cfg.func("/World/Light", cfg)

    origins = [[0.0, 0.0, 0.0]]
    prim_utils.create_prim("/World/Origin1", "Xform", translation=origins[0])

    g1_cfg = _select_g1_cfg(variant).copy()
    g1_cfg.prim_path = prim_path
    robot = Articulation(cfg=g1_cfg)
    return {"robot": robot}, origins

def export_joint_names(robot: Articulation, out_path: str):
    """Export the DOF names in Isaac Lab's order."""
    names = None
    if hasattr(robot, "data") and hasattr(robot.data, "joint_names") and robot.data.joint_names is not None:
        try:
            names = list(robot.data.joint_names)
        except Exception:
            names = None

    if not names:
        try:
            names = list(robot.articulation_view.get_dof_names())
        except Exception:
            names = None

    if not names:
        raise RuntimeError("Could not read the joint/DOF names; check that the G1 asset loaded correctly.")

    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(names, f, ensure_ascii=False, indent=2)

    print("\n[INFO] G1 DOF order:")
    for i, n in enumerate(names):
        print(f"{i:02d}: {n}")
    print(f"\n[OK] Exported to: {out_path}")

def main():
    """Main flow: build the scene, then export the DOF names."""
    sim_cfg = sim_utils.SimulationCfg(device=args_cli.device)
    sim = SimulationContext(sim_cfg)
    sim.set_camera_view([2.5, 0.0, 4.0], [0.0, 0.0, 2.0])

    # Build the scene and load the G1
    scene_entities, scene_origins = design_scene(args_cli.prim, args_cli.variant)
    scene_origins = torch.tensor(scene_origins, device=sim.device)

    sim.reset()
    print("[INFO]: Setup complete...")

    robot = scene_entities["robot"]

    # Write the default state and step once so the data buffers are populated
    root_state = robot.data.default_root_state.clone()
    root_state[:, :3] += scene_origins
    robot.write_root_pose_to_sim(root_state[:, :7])
    robot.write_root_velocity_to_sim(root_state[:, 7:])
    robot.reset()
    robot.write_data_to_sim()
    sim.step()
    robot.update(sim.get_physics_dt())

    export_joint_names(robot, args_cli.out)

# ========== Entry point ==========
if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"[ERROR] Execution failed: {e}")
    finally:
        try:
            # Close the simulator and the app
            print("[INFO] Closing the simulator and the Omniverse app...")
            simulation_app.close()
        except Exception:
            pass
        print("[DONE] Finished; the app has exited.")
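
Once the Isaac Lab joint order is available as JSON, the mismatch with MuJoCo can be handled by a permutation instead of retraining. The following is a minimal sketch of that idea using plain Python lists. The four joint names and both orders are made up for illustration; in practice one list would come from the exported JSON and the other from your MuJoCo model. `build_index_map`, `reorder`, and `load_order` are hypothetical helpers.

```python
from __future__ import annotations

import json


def load_order(path: str) -> list[str]:
    """Load a joint-name list such as the JSON written by the export script."""
    with open(path, encoding="utf-8") as f:
        return list(json.load(f))


def build_index_map(src_order: list[str], dst_order: list[str]) -> list[int]:
    """Return idx so that dst_values[i] == src_values[idx[i]] for every joint."""
    mismatch = set(src_order) ^ set(dst_order)
    if mismatch:
        raise ValueError(f"joint sets differ: {sorted(mismatch)}")
    position = {name: i for i, name in enumerate(src_order)}
    return [position[name] for name in dst_order]


def reorder(values: list[float], index_map: list[int]) -> list[float]:
    """Apply an index map produced by build_index_map."""
    return [values[i] for i in index_map]


# Illustrative orders only (assumption): a breadth-first style order on the
# Isaac Lab side and a limb-by-limb order on the MuJoCo side.
isaac_order = ["left_hip", "right_hip", "left_knee", "right_knee"]
mujoco_order = ["left_hip", "left_knee", "right_hip", "right_knee"]

mujoco_to_isaac = build_index_map(mujoco_order, isaac_order)
isaac_to_mujoco = build_index_map(isaac_order, mujoco_order)

if __name__ == "__main__":
    q_mujoco = [0.1, 0.2, 0.3, 0.4]                 # joint positions read from MuJoCo
    q_for_policy = reorder(q_mujoco, mujoco_to_isaac)  # order the policy was trained with
    action_for_mujoco = reorder(q_for_policy, isaac_to_mujoco)
    print(q_for_policy, action_for_mujoco)
```

Every per-joint quantity needs the same treatment: observations (positions, velocities, previous actions) go through mujoco_to_isaac before the policy sees them, and policy outputs, default poses, and gains go through isaac_to_mujoco before they reach the MuJoCo actuators.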

4. Sim2Real

To be added after deployment.

posted @ 2025-11-04 16:10 小拳头呀
