Deploying and running Qwen2.5-VL models with Docker

Deploying and running Qwen2.5-VL models on RTX 4090 GPUs with Docker

Pull the Docker image: docker pull pytorch/pytorch:2.7.0-cuda12.6-cudnn9-runtime

Qwen2.5-VL-7B-Instruct

The 7B model needs only a single RTX 4090 (24 GB of VRAM).
Start the container: docker run -it -v /home/baip:/data --name baip_qwen25_vl --gpus "device=7" pytorch/pytorch:2.7.0-cuda12.6-cudnn9-runtime /bin/bash

Model download

Hugging Face URL: https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct

Set the mirror endpoint for mainland China: export HF_ENDPOINT=https://hf-mirror.com
Install hf_transfer for multi-threaded downloads: pip install hf_transfer
Enable it: export HF_HUB_ENABLE_HF_TRANSFER=1
Download command: huggingface-cli download --resume-download Qwen/Qwen2.5-VL-7B-Instruct --local-dir /data/qwen25_vl/models/Qwen2.5-VL-7B-Instruct

ModelScope URL (mainland China): https://modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct

Install the client: pip install modelscope
Download command: modelscope download --model Qwen/Qwen2.5-VL-7B-Instruct --local_dir /data/qwen25_vl/models/Qwen2.5-VL-7B-Instruct

Running the model with vLLM

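Install the build tools first: apt install build-essential
Install vLLM with pip (dependencies are installed automatically): pip install vllm -i https://pypi.tuna.tsinghua.edu.cn/simple
Run: vllm serve /data/qwen25_vl/models/Qwen2.5-VL-7B-Instruct
If startup fails with an error about the maximum model length, set --max-model-len to the value suggested in the error message: vllm serve /data/qwen25_vl/models/Qwen2.5-VL-7B-Instruct --max-model-len 52224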
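
The limit reported in the error message reflects how many tokens fit into the KV cache that is left once the weights are loaded. The following is a rough back-of-the-envelope sketch of that arithmetic, not what vLLM computes internally. The layer count, KV-head count and head dimension are assumptions chosen to match the 7B model's published configuration (none of them appear in this post), and the 2-byte element size assumes bfloat16.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache one token occupies across all layers."""
    # The leading 2 accounts for storing one key and one value vector per head.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def max_tokens_for_budget(budget_bytes, bytes_per_token):
    """Largest whole number of tokens that fits into a given KV-cache budget."""
    return budget_bytes // bytes_per_token


# Assumed values for Qwen2.5-VL-7B-Instruct (not taken from this post):
# 28 layers, 4 KV heads, head dimension 128, bfloat16 (2 bytes).
per_token = kv_cache_bytes_per_token(num_layers=28, num_kv_heads=4, head_dim=128)

# KV-cache size implied by --max-model-len 52224, in MiB.
needed_mib = 52224 * per_token // 2**20
print(per_token, needed_mib)
```

Under these assumptions one token costs 57344 bytes, so a 52224-token context corresponds to roughly 2856 MiB of KV cache on the 24 GB card.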

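Calls are made over REST using the OpenAI API protocol. The model can also be invoked from a Python script through the API, but initialization takes a long time and calls are slow, so the REST approach is recommended.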

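Start a temporary HTTP server for the test images (run it in the directory that contains them): python3 -m http.server 8088
Test the API with curl:
List the model name: curl http://172.17.0.7:8000/v1/models
Chat test script: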

#!/bin/bash
if [ -z "$1" ]; then
  echo "usage: $0 <image>"
  exit 1
fi
#echo $1
curl -s -X POST "http://172.17.0.7:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "/data/qwen25_vl/models/Qwen2.5-VL-7B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe the content of this image in detail."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "http://172.17.0.1:8088/'"$1"'"
						}
					}
				]
			}
		]
	}' | jq '.choices[0].message.content'
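
The same REST request can also be sent from Python against the already running server, so no model is initialized in the script itself. This is a minimal sketch using only the standard library; the function names build_chat_payload and chat are hypothetical helpers introduced for illustration, and the payload simply mirrors the JSON body of the curl script.

```python
import json
import urllib.request


def build_chat_payload(model, prompt, image_url):
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url, payload, timeout=300):
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the addresses used in this post, a call would look like chat("http://172.17.0.7:8000", build_chat_payload(model_name, "Describe the content of this image in detail.", "http://172.17.0.1:8088/test.jpg")), where model_name is the name returned by /v1/models and test.jpg is a placeholder file name.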

Qwen2.5-VL-32B-Instruct

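The 32B model needs four RTX 4090 cards.
Start the container: docker run -it -v /home/baip:/data --name baip_qwen25_vl --gpus '"device=3,4,5,6"' --ipc=host pytorch/pytorch:2.7.0-cuda12.6-cudnn9-runtime /bin/bash
Note: --ipc=host is required. With multiple GPUs, the processes communicate through shared memory in /dev/shm. Docker gives a container only 64 MB of it by default, whereas --ipc=host lets the container use the host's shared memory directly.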
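
Whether the container really sees the host's shared memory can be checked from inside it before starting vLLM. The sketch below is one possible check using only the standard library; the helper names and the idea of comparing against the 64 MB default are illustrative assumptions, not part of vLLM or Docker.

```python
import shutil

# Docker's default /dev/shm size for a container, as noted above.
DOCKER_DEFAULT_SHM_BYTES = 64 * 1024**2


def shm_total_bytes(path="/dev/shm"):
    """Total size in bytes of the shared-memory filesystem mounted at path."""
    return shutil.disk_usage(path).total


def looks_like_docker_default(total_bytes):
    """True when the size is no larger than Docker's 64 MB default."""
    return total_bytes <= DOCKER_DEFAULT_SHM_BYTES
```

Inside the container, looks_like_docker_default(shm_total_bytes()) returning True suggests that --ipc=host (or a larger --shm-size) was not applied.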

Model download

Hugging Face URL: https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct

ModelScope URL (mainland China): https://modelscope.cn/models/Qwen/Qwen2.5-VL-32B-Instruct

Running the model with vLLM

vllm serve /mnt/models/Qwen2.5-VL-32B-Instruct --tensor-parallel-size 4 --max-model-len 51568

whisper-large-v3-turbo

Model download

ModelScope URL (mainland China): https://www.modelscope.cn/models/openai-mirror/whisper-large-v3-turbo

Running the model with vLLM

Install with pip (dependencies are installed automatically): pip install "vllm[audio]" -i https://pypi.tuna.tsinghua.edu.cn/simple
Run: vllm serve /mnt/models/whisper-large-v3-turbo --served-model-name whisperV3t --task transcription --trust-remote-code --enforce-eager
Note: a single audio file must not be longer than 30 s.
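
Longer recordings therefore have to be cut into pieces before they are sent. A minimal sketch of one way to do that for uncompressed WAV files, using only the standard library, is shown here; split_wav is a hypothetical helper, and other formats would need a tool such as ffmpeg instead.

```python
import wave


def split_wav(src_path, out_prefix, max_seconds=30):
    """Split a WAV file into consecutive chunks of at most max_seconds each.

    Returns the list of chunk file paths, named <out_prefix>_000.wav, ...
    """
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * max_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in the
                # header is corrected automatically when the file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            paths.append(out_path)
            index += 1
    return paths
```

Each resulting chunk can then be passed to the transcription endpoint separately and the returned texts concatenated in order. Cutting at fixed offsets may split a word across two chunks, so this is only a starting point.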

Transcription test script (curl):

#!/bin/bash
if [ -z "$1" ]; then
  echo "usage: $0 <audio>"
  exit 1
fi
#echo $1
curl -s -X POST "http://172.17.0.5:8000/v1/audio/transcriptions" \
	-H "Content-Type: multipart/form-data" \
	-F file="@$1" \
	-F model="whisperV3t" \
	-F response_format="text"
# To set the language explicitly (e.g. Chinese), add -F language="zh"

Posted 2025-06-17 17:38 by evanbp
