StarAI: Lerobot Arm Algorithm Reproduction (ACT)
Theory (Optional)
To keep this article short, the relevant theory can be found here:
Practice
1.0 System Requirements
- Operating system: Linux (Ubuntu 20.04+ recommended) or macOS
- Python: 3.8+
- GPU: NVIDIA GPU (RTX 3070 or better recommended), at least 6 GB of VRAM
- Memory: at least 16 GB of RAM
- Storage: at least 30 GB of free space
1.1 Recording a Dataset
This part was covered in the previous article in this series, the StarAI Lerobot Arm Tutorial, so it is not repeated here.
Data quality requirements:
- At least 50 episodes for basic training
- 200+ episodes recommended for best results
- Each episode should contain a complete task execution
- Multi-view images (at least 2 cameras)
- High-quality action annotations
1.2 Modifying the Dataset Recording Code (Optional)
This section adapts an improvement to the ACT algorithm, shared by an engineer on Bilibili, to the StarAI branch.
First, open src/lerobot/datasets/lerobot_dataset.py:
1.2.1 Add a new method to the LeRobotDataset class
```python
# aug: add mean filtering
def actions_mean_filtering(self, raw_actions: list[list[float]], mean_num: int = 5) -> list[list[float]]:
    """
    Apply a 1-D mean filter to the action sequence.
    raw_actions: [action_dim][T]
    Returns a smoothed result of the same shape.
    """
    action_dim = len(raw_actions)
    T = len(raw_actions[0])
    filter_actions = [[0.0] * T for _ in range(action_dim)]
    for i in range(T):
        for d in range(action_dim):
            if i < mean_num or i > T - mean_num - 1:
                # Do not filter the head and tail; keep the original values
                filter_actions[d][i] = raw_actions[d][i]
                continue
            total = 0.0
            # The mean_num points after i
            for j in range(i + 1, i + 1 + mean_num):
                total += raw_actions[d][j]
            # The mean_num points before i
            for j in range(1, 1 + mean_num):
                total += raw_actions[d][i - j]
            filter_actions[d][i] = total / (mean_num * 2.0)
    return filter_actions
```
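As a quick sanity check, the same windowed-mean logic can be exercised on a synthetic spiky signal. The function below is a standalone re-implementation for illustration, not the class method itself:

```python
def mean_filter(seq, mean_num=5):
    """1-D mean filter over a single action dimension, mirroring the method above:
    each interior point becomes the average of the mean_num points before it and
    the mean_num points after it (the point itself is excluded); head and tail
    points are passed through unchanged."""
    T = len(seq)
    out = list(seq)
    for i in range(mean_num, T - mean_num):
        total = sum(seq[i - mean_num:i]) + sum(seq[i + 1:i + 1 + mean_num])
        out[i] = total / (mean_num * 2.0)
    return out

# A flat trajectory with one spike: the spike is strongly attenuated,
# while the head/tail points are left untouched.
raw = [1.0] * 20
raw[10] = 2.0  # spike
smooth = mean_filter(raw, mean_num=5)
print(smooth[10])  # 1.0 — the spike is replaced by its neighbors' mean
print(smooth[9])   # 1.1 — neighbors absorb a fraction of the spike
```

Note that because the center point is excluded from its own window, an isolated spike is removed entirely rather than merely reduced.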
1.2.2 Modify the save_episode method of the LeRobotDataset class:

```python
def save_episode(
    self,
    episode_data: dict | None = None,
    parallel_encoding: bool = True,
) -> None:
    """
    This will save to disk the current episode in self.episode_buffer.

    Video encoding is handled automatically based on batch_encoding_size:
    - If batch_encoding_size == 1: Videos are encoded immediately after each episode
    - If batch_encoding_size > 1: Videos are encoded in batches.

    Args:
        episode_data (dict | None, optional): Dict containing the episode data to save. If None, this will
            save the current episode in self.episode_buffer, which is filled with 'add_frame'. Defaults to
            None.
        parallel_encoding (bool, optional): If True, encode videos in parallel using ProcessPoolExecutor.
            Defaults to True on Linux, False on macOS as it tends to use all the CPU available already.
    """
    episode_buffer = episode_data if episode_data is not None else self.episode_buffer

    validate_episode_buffer(episode_buffer, self.meta.total_episodes, self.features)

    # size and task are special cases that won't be added to hf_dataset
    episode_length = episode_buffer.pop("size")
    tasks = episode_buffer.pop("task")
    episode_tasks = list(set(tasks))
    episode_index = episode_buffer["episode_index"]

    # aug: smooth the actions (if the 'action' key exists)
    if "action" in episode_buffer:
        # episode_buffer['action'] is assumed to be a list of length T,
        # each element a list/np.array of length action_dim
        T = len(episode_buffer["action"])
        if T > 0:
            action_dim = len(episode_buffer["action"][0])
            # Transpose to [action_dim][T]
            raw_actions = [[episode_buffer["action"][t][d] for t in range(T)] for d in range(action_dim)]
            # Call the mean filter defined above
            filtered = self.actions_mean_filtering(raw_actions, mean_num=5)
            # Write back to episode_buffer['action'] in [T][action_dim] layout
            for t in range(T):
                for d in range(action_dim):
                    episode_buffer["action"][t][d] = filtered[d][t]

    episode_buffer["index"] = np.arange(self.meta.total_frames, self.meta.total_frames + episode_length)
    episode_buffer["episode_index"] = np.full((episode_length,), episode_index)

    # Update tasks and task indices with new tasks if any
    self.meta.save_episode_tasks(episode_tasks)

    # Given tasks in natural language, find their corresponding task indices
    episode_buffer["task_index"] = np.array([self.meta.get_task_index(task) for task in tasks])
    # ... (the remainder of the original save_episode is unchanged)
```
Effect of the change:
- Every time an episode is saved, the action time series is mean-filtered once.

Testing:
- Test plan: run the same pick-and-place task and compare how smooth the actions are before and after the change. Because the record pipeline only filters the trajectory at episode-save time, the trajectory shown in the rerun window is the unfiltered one, while the saved trajectory is the filtered one. So a small program that reads a saved episode back is enough to observe the filtering effect.
- Test method: modify the lerobot-replay code so that replay can also play back the arm's recorded motion trajectory:
**Click to view the code: the modified lerobot_replay.py**
```python
import logging
import time
from dataclasses import asdict, dataclass
from pathlib import Path
from pprint import pformat

import rerun as rr

from lerobot.configs import parser
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.processor import (
    make_default_robot_action_processor,
)
from lerobot.robots import (  # noqa: F401
    Robot,
    RobotConfig,
    bi_openarm_follower,
    bi_so_follower,
    earthrover_mini_plus,
    hope_jr,
    koch_follower,
    make_robot_from_config,
    omx_follower,
    openarm_follower,
    reachy2,
    so_follower,
    unitree_g1,
)
from lerobot.utils.constants import ACTION
from lerobot.utils.import_utils import register_third_party_plugins
from lerobot.utils.robot_utils import precise_sleep
from lerobot.utils.utils import (
    init_logging,
    log_say,
)
from lerobot.utils.visualization_utils import init_rerun, log_rerun_data


@dataclass
class DatasetReplayConfig:
    # Episode to replay.
    episode: int
    # Dataset identifier. By convention it should match '{hf_username}/{dataset_name}' (e.g. `lerobot/test`).
    # If using a local dataset, this can be None and root should be provided.
    repo_id: str | None = None
    # Root directory where the dataset will be stored (e.g. 'dataset/path').
    root: str | Path | None = None
    # Limit the frames per second. By default, uses the policy fps.
    fps: int = 30


@dataclass
class ReplayConfig:
    robot: RobotConfig
    dataset: DatasetReplayConfig
    # Use vocal synthesis to read events.
    play_sounds: bool = True
    # Display data in Rerun
    display_data: bool = False
    # Display data on a remote Rerun server
    display_ip: str | None = None
    # Port of the remote Rerun server
    display_port: int | None = None
    # Whether to display compressed images in Rerun
    display_compressed_images: bool = False


@parser.wrap()
def replay(cfg: ReplayConfig):
    init_logging()
    logging.info(pformat(asdict(cfg)))

    # Initialize Rerun if enabled (same as record)
    display_compressed_images = False
    if cfg.display_data:
        init_rerun(session_name="replay", ip=cfg.display_ip, port=cfg.display_port)
        display_compressed_images = (
            True
            if (cfg.display_data and cfg.display_ip is not None and cfg.display_port is not None)
            else cfg.display_compressed_images
        )

    robot_action_processor = make_default_robot_action_processor()
    robot = make_robot_from_config(cfg.robot)
    dataset = LeRobotDataset(cfg.dataset.repo_id, root=cfg.dataset.root, episodes=[cfg.dataset.episode])

    # Filter dataset to only include frames from the specified episode since episodes are chunked in dataset V3.0
    episode_frames = dataset.hf_dataset.filter(lambda x: x["episode_index"] == cfg.dataset.episode)
    actions = episode_frames.select_columns(ACTION)

    robot.connect()
    try:
        log_say("Replaying episode", cfg.play_sounds, blocking=True)
        for idx in range(len(episode_frames)):
            start_episode_t = time.perf_counter()

            action_array = actions[idx][ACTION]
            action = {}
            for i, name in enumerate(dataset.features[ACTION]["names"]):
                action[name] = action_array[i]

            robot_obs = robot.get_observation()
            processed_action = robot_action_processor((action, robot_obs))
            _ = robot.send_action(processed_action)

            # Log to Rerun if enabled (same pattern as record)
            if cfg.display_data:
                log_rerun_data(
                    observation=robot_obs, action=action, compress_images=display_compressed_images
                )

            dt_s = time.perf_counter() - start_episode_t
            precise_sleep(max(1 / dataset.fps - dt_s, 0.0))
    finally:
        robot.disconnect()


def main():
    register_third_party_plugins()
    replay()


if __name__ == "__main__":
    main()
```
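With the modified script in place, a replay can be launched much like lerobot-record. The sketch below is a hypothetical invocation assembled from the DatasetReplayConfig/ReplayConfig fields shown above and the robot settings used elsewhere in this article; the port, ids, and episode number are placeholders for your own setup:

```shell
lerobot-replay \
  --robot.type=lerobot_robot_viola \
  --robot.port=/dev/ttyUSB1 \
  --robot.id=my_awesome_staraiviola_arm \
  --dataset.repo_id=starai/record-test \
  --dataset.episode=0 \
  --display_data=true
```

With --display_data=true, the replayed (filtered) trajectory appears in the Rerun window and can be compared against the unfiltered one seen during recording.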
- Test effect:

Before the change:



After the change:



- Test result: mean filtering does help somewhat and suppresses spike-like noise to a degree. Across several rounds of testing the effect was often subtle; the pair above is the clearest comparison I found.
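To put a number on the smoothing rather than eyeballing the plots, one simple metric is the mean absolute second difference of a joint trajectory (a rough "jerkiness" score). The standalone sketch below applies it to a synthetic noisy trajectory before and after a moving-average filter; it is illustrative only, since the real test would read the saved episode's action column instead of generating data:

```python
import random

def roughness(seq):
    """Mean absolute second difference: higher means a jerkier trajectory."""
    return sum(abs(seq[i + 1] - 2 * seq[i] + seq[i - 1]) for i in range(1, len(seq) - 1)) / (len(seq) - 2)

def moving_average(seq, k=5):
    """Centered moving average with window 2*k+1; edges are kept unfiltered."""
    out = list(seq)
    for i in range(k, len(seq) - k):
        out[i] = sum(seq[i - k:i + k + 1]) / (2 * k + 1)
    return out

random.seed(0)
# Synthetic joint trajectory: a slow ramp plus Gaussian jitter
raw = [0.01 * t + random.gauss(0.0, 0.02) for t in range(200)]
smooth = moving_average(raw, k=5)

print(f"roughness raw:      {roughness(raw):.5f}")
print(f"roughness filtered: {roughness(smooth):.5f}")  # noticeably smaller
```

Computing this score on the action column of an episode saved before and after the change gives a quantitative version of the comparison shown in the figures.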
2. Improving the ACT Algorithm (Optional)
First, open src/lerobot/policies/act/modeling_act.py:
2.1 Modify the reset method
```python
def reset(self):
    """This should be called whenever the environment is reset."""
    if self.config.temporal_ensemble_coeff is not None:
        self.temporal_ensembler.reset()
    else:
        self._action_queue = deque([], maxlen=2 * self.config.n_action_steps)
        self.last_action_list = []
        self.last_action = None
```
2.2 New Methods:

```python
# aug: new method
def begin_mutation_filter(self, actions):
    """Detect a sudden jump between the last executed action and the first action
    of the new chunk, and bridge it with linearly interpolated points."""
    if self.last_action is None:
        return
    first_action = actions[0][0].cpu().tolist()
    # Use the signed jump so that interpolation moves each joint in the
    # correct direction (taking abs() here would ramp every joint upward)
    delta = [a - b for a, b in zip(first_action, self.last_action)]
    max_increment = 0.06  # maximum allowed per-step change for any joint
    add_point_num = int(max(abs(d) for d in delta) / max_increment)
    if add_point_num > 0:
        add_point_increment = [d / add_point_num for d in delta]
        add_point = self.last_action
        for i in range(0, add_point_num):
            add_point = [a + b for a, b in zip(add_point, add_point_increment)]
            tensor = torch.tensor([[add_point]], device=actions.device)
            self._action_queue.extend(tensor)
```
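The interpolation logic can be illustrated in isolation. This standalone sketch (plain Python, no torch) mirrors the sign-aware version of the idea: whenever the jump from the last executed action to the new chunk's first action exceeds a per-step limit, intermediate points are inserted:

```python
def bridge_jump(last_action, first_action, max_increment=0.06):
    """Return intermediate points that ramp linearly from last_action toward
    first_action so that no joint moves more than ~max_increment per step."""
    delta = [a - b for a, b in zip(first_action, last_action)]
    n = int(max(abs(d) for d in delta) / max_increment)
    if n == 0:
        return []  # jump is small enough; no points needed
    step = [d / n for d in delta]
    points, cur = [], list(last_action)
    for _ in range(n):
        cur = [a + b for a, b in zip(cur, step)]
        points.append(cur)
    return points

last = [0.0, 0.5]
first = [0.25, 0.25]            # joint 0 jumps +0.25, joint 1 jumps -0.25
mid = bridge_jump(last, first)  # int(0.25 / 0.06) = 4 intermediate points
print(len(mid))                 # 4
print(mid[-1])                  # [0.25, 0.25] — lands exactly on the chunk's first action
```

The inserted points are pushed into the action queue before the chunk itself, so the arm glides to the new chunk's starting pose instead of snapping to it.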
```python
# aug: new method
def actions_mean_filtering(self):
    """Mean-filter the action trajectory currently held in the queue."""
    mean_actions = []  # the filtered trajectory
    mean_num = 8  # number of points before/after each step used by the filter
    action_step_list = []
    action_num = len(self._action_queue)
    # Convert the queued tensors to plain lists
    for i in range(0, action_num):
        action_step_list.append(self._action_queue[i].cpu().tolist())
    for i in range(0, action_num):
        # Keep the last mean_num points unfiltered
        if i > action_num - mean_num - 1:
            mean_actions.append(action_step_list[i][0])
            continue
        action_total = action_step_list[i][0][:]
        for k in range(0, len(action_total)):
            action_total[k] = 0
        # Sum the mean_num points after i
        for j in range(i + 1, i + 1 + mean_num):
            for k in range(0, len(action_total)):
                action_total[k] += action_step_list[j][0][k]
        # Sum the mean_num points before i
        if i < mean_num + 1:
            if len(self.last_action_list) == 0:
                mean_actions.append(action_step_list[i][0])
                continue
            else:
                # Not enough preceding points in this chunk: take the remainder
                # from the tail of the previously planned trajectory
                for j in range(0, i):
                    for k in range(0, len(action_total)):
                        action_total[k] += action_step_list[j][0][k]
                for j in range(1, mean_num + 1 - i):
                    for k in range(0, len(action_total)):
                        action_total[k] += self.last_action_list[-j][0][k]
        else:
            for j in range(1, 1 + mean_num):
                for k in range(0, len(action_total)):
                    action_total[k] += action_step_list[i - j][0][k]
        action_mean = []
        for k in range(0, len(action_total)):
            action_mean.append(action_total[k] / (mean_num * 2.0))
        mean_actions.append(action_mean)
    # Remember this chunk for the next call and write the filtered actions back
    # into the queue (assumed completion: the original snippet stops before
    # mean_actions is used)
    self.last_action_list = action_step_list
    for i in range(0, action_num):
        self._action_queue[i] = torch.tensor([mean_actions[i]], device=self._action_queue[i].device)
```
2.3 Modify the select_action method

```python
@torch.no_grad()
def select_action(self, batch: dict[str, Tensor]) -> Tensor:
    """Select a single action given environment observations.

    This method wraps `select_actions` in order to return one action at a time for execution in the
    environment. It works by managing the actions in a queue and only calling `select_actions` when the
    queue is empty.
    """
    self.eval()  # keeping the policy in eval mode as it could be set to train mode while queue is consumed

    if self.config.temporal_ensemble_coeff is not None:
        actions = self.predict_action_chunk(batch)
        action = self.temporal_ensembler.update(actions)
        return action

    # aug: remember the last executed action for mutation detection
    if len(self._action_queue) == 1:
        self.last_action = self._action_queue[0].cpu().tolist()[0]

    # Action queue logic for n_action_steps > 1. When the action_queue is depleted, populate it by
    # querying the policy.
    if len(self._action_queue) == 0:
        actions = self.predict_action_chunk(batch)[:, : self.config.n_action_steps]

        # aug: mutation detection and linear interpolation
        self.begin_mutation_filter(actions)

        # `self.model.forward` returns a (batch_size, n_action_steps, action_dim) tensor, but the queue
        # effectively has shape (n_action_steps, batch_size, *), hence the transpose.
        self._action_queue.extend(actions.transpose(0, 1))

        # aug: mean-filter smoothing
        self.actions_mean_filtering()
    return self._action_queue.popleft()
```
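The queue mechanics being modified here can be seen in a minimal standalone sketch. The class below is a toy stand-in for the policy, not the real ACT model; the maxlen headroom mirrors the 2 * n_action_steps change in reset, which leaves room for inserted interpolation points:

```python
from collections import deque

class ChunkedPolicy:
    """Toy illustration of ACT's action-queue pattern: a chunk of
    n_action_steps actions is predicted only when the queue runs dry,
    then actions are served one per control step."""
    def __init__(self, n_action_steps=5):
        self.n_action_steps = n_action_steps
        self.queue = deque(maxlen=2 * n_action_steps)  # headroom for extra points
        self.chunks_predicted = 0

    def predict_chunk(self):
        # Stand-in for the model forward pass: chunk c yields actions c*100 + i
        self.chunks_predicted += 1
        base = self.chunks_predicted * 100
        return [base + i for i in range(self.n_action_steps)]

    def select_action(self):
        if not self.queue:
            self.queue.extend(self.predict_chunk())
        return self.queue.popleft()

policy = ChunkedPolicy(n_action_steps=5)
trace = [policy.select_action() for _ in range(12)]
print(trace)                    # [100..104, 200..204, 300, 301]
print(policy.chunks_predicted)  # 3 — one forward pass per 5 control steps
```

This also shows why the smoothing hooks live where they do: the only moment the whole future trajectory is visible at once is right after the queue is refilled.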
2.4 Modify the forward method:

```python
def forward(self, batch: dict[str, Tensor]) -> tuple[Tensor, dict]:
    """Run the batch through the model and compute the loss for training or validation."""
    if self.config.image_features:
        batch = dict(batch)  # shallow copy so that adding a key doesn't modify the original
        batch[OBS_IMAGES] = [batch[key] for key in self.config.image_features]

    actions_hat, (mu_hat, log_sigma_x2_hat) = self.model(batch)

    l1_loss = (
        F.l1_loss(batch[ACTION], actions_hat, reduction="none") * ~batch["action_is_pad"].unsqueeze(-1)
    ).mean()

    loss_dict = {"l1_loss": l1_loss.item()}
    if self.config.use_vae:
        # Calculate Dₖₗ(latent_pdf || standard_normal). Note: After computing the KL-divergence for
        # each dimension independently, we sum over the latent dimension to get the total
        # KL-divergence per batch element, then take the mean over the batch.
        # (See App. B of https://huggingface.co/papers/1312.6114 for more details).
        mean_kld = (
            (-0.5 * (1 + log_sigma_x2_hat - mu_hat.pow(2) - (log_sigma_x2_hat).exp())).sum(-1).mean()
        )
        loss_dict["kld_loss"] = mean_kld.item()
        loss = l1_loss + mean_kld * self.config.kl_weight
    else:
        loss = l1_loss

    # aug: mean-filter smoothness loss
    kernel_size = 11
    padding = kernel_size // 2
    x = actions_hat.transpose(1, 2)
    weight = torch.ones(actions_hat.size(-1), 1, kernel_size, device=actions_hat.device) / kernel_size
    filtered_x = F.conv1d(x, weight, padding=padding, groups=actions_hat.size(-1))
    filtered_tensor = filtered_x.transpose(1, 2)
    mean_loss = torch.abs(actions_hat - filtered_tensor).mean()
    loss += mean_loss
    loss_dict["mean_loss"] = mean_loss.item()

    return loss, loss_dict
```
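The added term penalizes the distance between the predicted action sequence and a moving average of itself, so jittery predictions are discouraged. The same idea can be sketched in numpy (illustrative only; the model code above performs this per action dimension with a grouped F.conv1d):

```python
import numpy as np

def smoothness_penalty(actions, kernel_size=11):
    """Mean |x - moving_average(x)| over a (T, action_dim) trajectory.
    Uses a same-length convolution with zero padding, analogous to
    F.conv1d(..., padding=kernel_size // 2)."""
    k = np.ones(kernel_size) / kernel_size
    pad = kernel_size // 2
    filtered = np.stack(
        [np.convolve(np.pad(actions[:, d], pad), k, mode="valid") for d in range(actions.shape[1])],
        axis=1,
    )
    return np.abs(actions - filtered).mean()

T, dim = 100, 6
t = np.linspace(0, 2 * np.pi, T)
smooth_traj = np.stack([np.sin(t + d) for d in range(dim)], axis=1)
rng = np.random.default_rng(0)
jittery_traj = smooth_traj + rng.normal(0, 0.2, size=(T, dim))

print(smoothness_penalty(smooth_traj))   # small: a smooth curve is near its own moving average
print(smoothness_penalty(jittery_traj))  # larger: jitter is penalized
```

Because a smooth trajectory is close to its own moving average, this term mainly pushes down high-frequency wiggle in the predictions rather than fighting the L1 imitation loss.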
These modifications do two things:
- During training, a smoothness term is added to the loss.
- During inference, jumps in the action sequence are bridged with linear interpolation, and the whole sequence is then smoothed.

Note: every modification is marked with an "aug:" comment.
3. Training
```shell
accelerate launch --num_processes=1 $(which lerobot-train) \
  --dataset.repo_id=yourdatasetdir \
  --policy.type=act \
  --policy.device=cuda \
  --policy.chunk_size=100 \
  --policy.n_action_steps=50 \
  --policy.use_amp=true \
  --policy.repo_id=starai/my_policy \
  --batch_size=4 \
  --optimizer.lr=2e-05 \
  --num_workers=4 \
  --output_dir=outputs/train/act_viola_test11 \
  --job_name=act_viola_test \
  --wandb.enable=False \
  --steps=20000 \
  --save_checkpoint=True \
  --save_freq=5000
```
3.1 Parameter Explanation
3.1.1 Core Parameters

3.1.2 ACT-Specific Parameters

3.1.3 Training Parameters

4. Evaluation
```shell
lerobot-record \
  --robot.type=lerobot_robot_viola \
  --robot.port=/dev/ttyUSB1 \
  --robot.cameras="{ up: {type: opencv, index_or_path: /dev/video6, width: 640, height: 480, fps: 30, fourcc: MJPG}, front: {type: opencv, index_or_path: /dev/video8, width: 640, height: 480, fps: 30, fourcc: MJPG}}" \
  --robot.id=my_awesome_staraiviola_arm \
  --display_data=false \
  --dataset.repo_id=starai/eval_record-test \
  --dataset.single_task="Pick up the yellow cube to the white box" \
  --policy.path=outputs/train/act_viola_test1/checkpoints/pretrained_model
```
FAQ
Q: What advantages does ACT have over other imitation learning methods?
A: ACT's main advantages include:
- Reduced compounding error: predicting action chunks limits error accumulation
- Higher success rates: strong performance on fine manipulation tasks
- End-to-end training: no hand-designed features required
- Multimodal fusion: effectively fuses visual and state information

Q: How do I choose a suitable chunk_size?
A: The choice of chunk_size depends on the task:
- Fast tasks: chunk_size = 10-30
- Medium tasks: chunk_size = 50-100
- Slow tasks: chunk_size = 100-200
- A good starting point is 50
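Another way to reason about chunk_size is in wall-clock terms: at the control rate used in this article (30 fps), a chunk covers chunk_size / fps seconds of motion, and if n_action_steps of it are executed before re-planning, the policy re-observes the scene every n_action_steps / fps seconds. A quick check, using the values from the training command above as an example:

```python
def chunk_timing(chunk_size, n_action_steps, fps=30):
    """Seconds of motion covered by one predicted chunk, and the interval
    between policy forward passes when n_action_steps of it are executed."""
    assert n_action_steps <= chunk_size, "cannot execute more steps than are predicted"
    return chunk_size / fps, n_action_steps / fps

horizon_s, replan_s = chunk_timing(chunk_size=100, n_action_steps=50, fps=30)
print(f"chunk horizon: {horizon_s:.2f} s")  # ~3.33 s of motion per prediction
print(f"replan every:  {replan_s:.2f} s")   # ~1.67 s between forward passes
```

If the task's environment can change faster than the replan interval, a smaller chunk_size (or smaller n_action_steps) is the safer choice.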
Q: How long does training take?
A: Training time depends on several factors:
- Dataset size: 100 episodes take roughly 4-8 hours (RTX 3070)
- Model complexity: larger models take longer
- Hardware: a better GPU significantly reduces training time
- Convergence: typically 50,000-100,000 steps are needed

Q: How do I handle multi-camera data?
A: Suggestions for multi-camera setups:
- Camera selection: choose viewpoints with complementary information
- Feature fusion: fuse at the feature level
- Attention: let the model learn which views matter
- Compute: note that more cameras increase the computational load

Q: How do I improve the model's generalization?
A: Ways to improve generalization:
- Data diversity: collect data under varied conditions
- Data augmentation: apply image and action augmentation techniques
- Regularization: appropriate weight decay and dropout
- Domain randomization: use domain randomization in simulation
- Multi-task learning: jointly train on several related tasks
Reproducing the ACT algorithm with the StarAI robot arm: a ten-thousand-word walkthrough, thorough yet accessible!