Post archive for February 10, 2025 - zion03

February 10, 2025

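Abstract: After continual pre-training of a Qwen2.5-coder model, the saved model weights were twice the size of the original. The original Qwen2.5-coder 3B model is 5 GB, but the safetensors files saved after training came to more than 10 GB. I noticed the problem as soon as training finished; since I was using vll…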
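The excerpt is cut off before the cause is given. One common explanation for an exact doubling is that the weights were saved in float32 (4 bytes per parameter) instead of the 2-byte bfloat16 the base checkpoint is assumed to use. This is an assumption, not the author's stated finding. Below is a minimal sketch of that arithmetic. It assumes roughly 3.09 billion parameters for the 3B model, and `checkpoint_size_gib` is a hypothetical helper written for illustration.

```python
# Hypothetical sketch: estimate the raw weight size of a checkpoint from its
# parameter count and the number of bytes each parameter takes in a dtype.
# ASSUMPTION: ~3.09e9 parameters for Qwen2.5-Coder-3B; not stated in the post.

BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,
    "float16": 2,
}


def checkpoint_size_gib(num_params: int, dtype: str) -> float:
    """Approximate on-disk size in GiB, ignoring file headers and metadata."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    n = 3_090_000_000  # assumed parameter count
    for dtype in ("bfloat16", "float32"):
        print(f"{dtype}: {checkpoint_size_gib(n, dtype):.1f} GiB")
```

Under these assumptions the estimate is about 5.8 GiB in bfloat16 and about 11.5 GiB in float32. That ratio would match the "5 GB vs. more than 10 GB" observation, but checking the dtype actually stored in the saved safetensors files is the reliable way to confirm it.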

posted @ 2025-02-10 18:38 zion03

CD Yang
