LLM Deployment: vLLM, SGLang, FasterTransformer, lmdeploy

LLM deployment for inference typically combines several parallelism strategies: tensor parallelism, data parallelism, pipeline parallelism, and expert parallelism for MoE models (see the sketch below).
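
As a minimal sketch, vLLM's offline API exposes these strategies as constructor arguments; the model name, prompt, and GPU count here are assumptions, not from the original post:

```python
# A minimal sketch of tensor-parallel inference with vLLM's offline API.
# Assumes vLLM is installed (pip install vllm) and the host has 2 GPUs;
# the model name and prompt are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,  # shard each layer's weights across 2 GPUs
    # pipeline_parallel_size=2,  # optionally also split layers into stages
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```

Data parallelism, by contrast, is usually handled above the engine by running several replicas behind a load balancer rather than inside a single `LLM` instance.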

Common inference backends include a vLLM worker, an SGLang worker, https://github.com/NVIDIA/FasterTransformer, and https://github.com/InternLM/lmdeploy.
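
Both vLLM (`vllm serve <model>`) and SGLang (`python -m sglang.launch_server --model-path <model>`) expose an OpenAI-compatible HTTP endpoint, so a client can switch backends without code changes. A hedged sketch of such a client call, assuming the server listens on localhost:8000 and serves the placeholder model named below:

```python
# A minimal sketch of a client call against an OpenAI-compatible endpoint
# served by vLLM or SGLang. The URL/port and model name are assumptions;
# adjust them to match your server launch command.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed server address
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        "messages": [{"role": "user", "content": "What is expert parallelism?"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```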

