Post archive for September 11, 2025: Using device_map, torch.dtype, bitsandbyt... - 阳光一生

September 11, 2025

Using device_map, torch.dtype, and bitsandbytes to compress model parameters and control device placement

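Summary: device_map. The following draws on the Hugging Face Accelerate documentation on inference methods for very large models. An important keyword in Hugging Face is device_map, which gives simple control over which hardware each model layer is placed on. With device_map="auto", Accelerate automatically detects which ...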
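To make the idea of a device map concrete, here is a minimal sketch in plain Python. It is not Accelerate's actual algorithm: the function `assign_devices`, the layer names, and the sizes and budgets are all hypothetical, chosen only to illustrate the behavior the Accelerate documentation describes for device_map="auto", namely filling GPU memory first, then CPU memory, then spilling to disk. The result has the same shape as a device map: a dictionary from module name to device.

```python
def assign_devices(layer_sizes, budgets):
    """Toy greedy placement (hypothetical helper, not part of Accelerate).

    layer_sizes: dict of module name -> size, in execution order.
    budgets: dict of device -> capacity, in priority order
             (for example GPU 0 first, then "cpu", then "disk").
    Returns a dict of module name -> device.
    """
    devices = list(budgets)
    remaining = dict(budgets)
    device_map = {}
    idx = 0  # only moves forward, so layers stay in execution order
    for name, size in layer_sizes.items():
        # Skip to the next device once the current one cannot hold this layer.
        while idx < len(devices) and remaining[devices[idx]] < size:
            idx += 1
        if idx == len(devices):
            raise ValueError(f"no device has room for {name!r}")
        device_map[name] = devices[idx]
        remaining[devices[idx]] -= size
    return device_map


# Illustrative numbers only (think of them as GiB).
layers = {"embed": 2, "block.0": 4, "block.1": 4, "block.2": 4, "lm_head": 2}
budgets = {0: 8, "cpu": 6, "disk": float("inf")}
device_map = assign_devices(layers, budgets)
print(device_map)
```

With these made-up numbers, the first two modules land on GPU 0, the next on the CPU, and the rest are offloaded to disk. In real use you would not compute this by hand: passing device_map="auto" to `from_pretrained` lets Accelerate build the map from the memory it actually finds.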

posted @ 2025-09-11 17:23 阳光一生

How to Install and Use vLLM

Summary: What is vLLM? vLLM is a high-performance library for LLM (Large Language Model) inference and serving. It is optimized for speed, efficiency, and ease ...

posted @ 2025-09-11 17:21 阳光一生

How to Benchmark vLLM Offline Inference

Summary: Introduction to vLLM. vLLM is an efficient, high-performance inference and serving engine designed for large language models (LLMs). It is optimized fo...

posted @ 2025-09-11 17:21 阳光一生
