LLM Deployment Testing

Check the model path

curl http://127.0.0.1:8000/v1/models
{
  "object": "list",
  "data": [
    {
      "id": "/data/models/Qwen1.5-14B-Chat-AWQ",
      "object": "model",
      "created": 1768828444,
      "owned_by": "vllm",
      "root": "/data/models/Qwen1.5-14B-Chat-AWQ",
      "parent": null,
      "max_model_len": 4096,
      "permission": [
        {
          "id": "modelperm-954558153c0727e8",
          "object": "model_permission",
          "created": 1768828444,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
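The same endpoint can be queried from Python to pull out fields like max_model_len programmatically. A stdlib-only sketch; the URL matches the curl example above, and the helper names are my own:

```python
import json
from urllib.request import urlopen

def extract_models(payload: dict) -> list[tuple[str, int]]:
    """Pull (model id, max_model_len) pairs out of a /v1/models response."""
    return [(m["id"], m.get("max_model_len")) for m in payload.get("data", [])]

def list_models(base_url: str = "http://127.0.0.1:8000") -> list[tuple[str, int]]:
    """Fetch and summarize the models served by the OpenAI-compatible server."""
    with urlopen(f"{base_url}/v1/models") as resp:
        return extract_models(json.load(resp))

# Usage (with the vLLM server above running):
#   for model_id, max_len in list_models():
#       print(model_id, max_len)
```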
curl -X POST http://127.0.0.1:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "/data/models/Qwen1.5-14B-Chat-AWQ",
  "prompt": "Say hello",
  "max_tokens": 10
}'
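The completion request above can also be issued from Python. A stdlib-only sketch (model path and payload taken from the curl example; function names are my own):

```python
import json
from urllib.request import Request, urlopen

def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 10) -> Request:
    """Build a POST to the OpenAI-compatible /v1/completions endpoint."""
    body = json.dumps({"model": model, "prompt": prompt,
                       "max_tokens": max_tokens}).encode()
    return Request(f"{base_url}/v1/completions", data=body,
                   headers={"Content-Type": "application/json"})

def complete(base_url: str, model: str, prompt: str, max_tokens: int = 10) -> str:
    """Send the request and return the generated text of the first choice."""
    req = build_completion_request(base_url, model, prompt, max_tokens)
    with urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

# Usage:
#   print(complete("http://127.0.0.1:8000",
#                  "/data/models/Qwen1.5-14B-Chat-AWQ", "Say hello"))
```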

Load-testing commands

Install a load-testing tool

pip install locust
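With Locust installed you would point a locustfile at the server; as a dependency-free alternative, a rough smoke test can be sketched with just the standard library. URL and model path are taken from the curl example above, the concurrency numbers are arbitrary, and all helper names are my own:

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

URL = "http://127.0.0.1:8000/v1/completions"
PAYLOAD = json.dumps({
    "model": "/data/models/Qwen1.5-14B-Chat-AWQ",
    "prompt": "Say hello",
    "max_tokens": 10,
}).encode()

def one_request() -> float:
    """Send one completion request and return its latency in seconds."""
    req = Request(URL, data=PAYLOAD, headers={"Content-Type": "application/json"})
    start = time.perf_counter()
    with urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start

def percentile(latencies: list[float], pct: float) -> float:
    """Nearest-rank percentile of a list of latencies."""
    ordered = sorted(latencies)
    idx = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[idx]

def run_load_test(concurrency: int = 8, total: int = 64) -> dict:
    """Fire `total` requests with `concurrency` workers; report p50/p95 latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: one_request(), range(total)))
    return {"p50": percentile(latencies, 50), "p95": percentile(latencies, 95)}
```

Locust adds a web UI, ramp-up control, and per-endpoint stats on top of this; the sketch is only a quick sanity check.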

Or use vLLM's official benchmark script. Note: the module path and flags below depend on your vLLM version (recent releases ship the benchmark scripts under benchmarks/ in the source repo, e.g. benchmark_throughput.py), so check --help before running:

python -m vllm.entrypoints.benchmark \
  --model Qwen/Qwen-14B-2.5 \
  --dtype float16 \
  --batch-size 1 \
  --num-batches 10 \
  --max-seq-len 512 \
  --use-8bit

Monitor GPU memory / utilization in real time

watch -n 1 nvidia-smi

Or query a specific GPU in detail

nvidia-smi -i 0 -q -d MEMORY,UTILIZATION
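For programmatic monitoring, nvidia-smi's CSV query mode is easier to parse than the default table. A sketch using only the standard library; the query fields are standard nvidia-smi ones, and the helper names are my own:

```python
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total,utilization.gpu",
         "--format=csv,noheader,nounits"]

def parse_gpu_csv(text: str) -> list[dict]:
    """Parse 'index, mem_used, mem_total, util' CSV rows from nvidia-smi."""
    rows = []
    for line in text.strip().splitlines():
        idx, used, total, util = [f.strip() for f in line.split(",")]
        rows.append({"gpu": int(idx), "mem_used_mib": int(used),
                     "mem_total_mib": int(total), "util_pct": int(util)})
    return rows

def sample_gpus() -> list[dict]:
    """Run nvidia-smi once and return one dict per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)

# Usage (poll once per second, like `watch -n 1 nvidia-smi`):
#   import time
#   while True:
#       print(sample_gpus()); time.sleep(1)
```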

View in real time with top or htop

htop

Or, more precisely

watch -n 1 "ps -eo pid,cmd,%cpu,%mem --sort=-%cpu | head -20"
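The same per-process snapshot can be taken from Python by parsing ps output, which is handy for logging CPU/memory over a test run. A sketch; the ps flags mirror the command above and the helper name is my own:

```python
import subprocess

PS_CMD = ["ps", "-eo", "pid,%cpu,%mem,comm", "--sort=-%cpu"]

def top_processes(text: str, n: int = 5) -> list[tuple[int, float, float, str]]:
    """Parse `ps -eo pid,%cpu,%mem,comm` output (header + rows) into tuples."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        pid, cpu, mem, comm = line.split(None, 3)
        rows.append((int(pid), float(cpu), float(mem), comm))
    return rows[:n]

# Usage:
#   out = subprocess.run(PS_CMD, capture_output=True, text=True).stdout
#   for pid, cpu, mem, comm in top_processes(out):
#       print(pid, cpu, mem, comm)
```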

posted @ 2026-01-19 22:17 by 向着朝阳