Large Models (LLM) - Post Category - linzm14

Nano-vLLM-Ascend (continuously updated)

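Summary: Nano-vLLM-Ascend project link: https://github.com/linzm1007/nano-vllm-ascend nano-vllm is an open-source GPU inference project on GitHub. This is a small Ascend NPU inference demo built on that open-source version, intended to help beginners understand the overall inference flow. Unlike vll Read more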

posted @ 2026-02-13 11:38 linzm14 Views (245) Comments (0) Recommendations (0)

sglang v0.5.5.post3 architecture diagram

Summary: References https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/sglang/code-walk-through/readme-CN.md https://github.com/sgl-project/sglang/tre Read more

posted @ 2025-12-10 14:38 linzm14 Views (86) Comments (0) Recommendations (0)

omniinfer vllm v0.9.0 overall architecture diagram and pangu7b model diagram

Summary: References https://shen-shanshan.github.io/articles/vllm-v1-整体流程从请求到算子执行/ https://gitee.com/omniai/omniinfer/tree/release_v0.6.0/ https://github.com/vllm-proj Read more

posted @ 2025-12-08 22:16 linzm14 Views (886) Comments (0) Recommendations (0)

Nano-vLLM-Ascend

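Summary: Reference https://github.com/linzm1007/nano-vllm-ascend Nano-vLLM-Ascend nano-vllm is an open-source GPU inference project. This is a small Ascend NPU inference demo built on that open-source version, intended to help beginners understand the overall inference flow. Unlike vllm, nano-v Read more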

posted @ 2025-12-07 21:09 linzm14 Views (1126) Comments (0) Recommendations (0)

Resolving errors when running Qwen1.5-7B-Chat inference with text-generation-webui

Summary: Inference code: text-generation-webui. Inference model: Qwen1.5-7B-Chat. sys info gpu: Tesla V100-PCIE-32GB python: 3.10 model: Qwen1.5-7B-Chat docker docker run -it --rm --gpus='" Read more

posted @ 2024-05-09 11:23 linzm14 Views (2349) Comments (0) Recommendations (0)

Resolving errors when training Llama3-Chinese-8B-Instruct with LLaMA-Factory

Summary: Model path: the uploader is the Llama Chinese community. Model URL: https://www.modelscope.cn/models/FlagAlpha/Llama3-Chinese-8B-Instruct/summary sys info gpu: Tesla V100-PCIE-32GB python: 3.10 Read more

posted @ 2024-05-09 11:19 linzm14 Views (2161) Comments (0) Recommendations (0)

Deploying common open-source LLMs (qwen, chatglm, llama3, and others) for inference with vllm using different chat_template settings

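Summary: vllm version 4.0.0; image: the official vllm GitHub image; gpu: v100 32g, a800 80g. Problems that appear when serving through the openai api are resolved with a chat-template. 1. Inference deployment of qwen-series models, tested with the question "Who am I". Problem: the response contains tokenizer delimiter tokens and messy, redundant content. The model files do not have Read more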

posted @ 2024-04-26 15:52 linzm14 Views (6296) Comments (3) Recommendations (0)
