Post archive for December 20, 2025 - Brain404

December 20, 2025

SFTDataset: Verl single-turn Dataset vs Verl multi-turn Dataset vs Parallel-R1 Dataset

Summary: The command for running SFT with verl looks roughly like this. Single node, multiple GPUs: #!/bin/bash set -x nproc_per_node=4 save_path="./checkpoints" torchrun --standalone --nnodes=1 --nproc_per_node=$nproc_per_no ... Read more

posted @ 2025-12-20 22:01 Brain404 Views (31) Comments (0) Recommendations (0)

AgentLoop (Verl) vs ParallelThinkingAgentLoopV3 (Parallel-R1) vs ToRL

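Summary: I recently came across a very interesting paper, Parallel-R1, which uses RL to train a model that reasons in parallel. The format is roughly: <model reasoning process>, then the model suddenly emits a <parallel> tag and enters multi-path reasoning: <parallel> <path> ... </path> The reasoning paths are invisible to one another (using an attention mask to mask ... Read more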

posted @ 2025-12-20 21:49 Brain404 Views (35) Comments (0) Recommendations (0)
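The path isolation mentioned in the summary above can be pictured with a small mask. This is only a minimal sketch and not Parallel-R1's actual implementation: it assumes every token carries a hypothetical segment id (0 for shared context outside any path, 1, 2, ... for the individual paths). The function name `build_path_mask`, the segment-id encoding, and the choice to let shared-context tokens attend to all earlier tokens are assumptions made for illustration.

```python
def build_path_mask(segment_ids):
    """Build a boolean attention mask; mask[i][j] is True if token i may attend to token j.

    segment_ids is a hypothetical per-token labeling (not taken from Parallel-R1):
    0 means shared context, k > 0 means the token belongs to parallel path k.
    """
    n = len(segment_ids)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: only positions j <= i are candidates
            query_seg, key_seg = segment_ids[i], segment_ids[j]
            same_path = query_seg == key_seg   # a path can see its own earlier tokens
            key_is_shared = key_seg == 0       # every token can see the shared context
            query_is_shared = query_seg == 0   # assumption: shared tokens see all earlier tokens
            mask[i][j] = same_path or key_is_shared or query_is_shared
    return mask


# Two shared tokens, two tokens of path 1, two tokens of path 2, one shared token.
mask = build_path_mask([0, 0, 1, 1, 2, 2, 0])
for row in mask:
    print("".join("1" if allowed else "." for allowed in row))
```

In the printed grid, the rows for path 2 have no entries in the columns of path 1, which is the mutual invisibility the summary describes. A real training setup would typically convert such a mask into the tensor format expected by the attention kernel.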

RLLM tools: Python sandbox (LCB sandbox)

Summary: The main function is defined in /rllm/tools/code_tools/python_interpreter.py: def _init_backend(self): """Initialize the sandbox""" # Use LCBPythonInterpreter by default if self.backend_type == "local": s ... Read more

posted @ 2025-12-20 17:45 Brain404 Views (9) Comments (0) Recommendations (0)

