Using evalscope

I. Installation

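Basic installation commands:

# Standard installation
pip install evalscope -i https://pypi.tuna.tsinghua.edu.cn/simple/
# If installation fails because some OS components (e.g. gcc) are too old to build numpy and similar packages, add --prefer-binary so pip prefers prebuilt wheels over source builds
pip install evalscope --prefer-binary -i https://pypi.tuna.tsinghua.edu.cn/simple/
# Use [all] to install every optional component; for performance testing alone, the perf extra ('evalscope[perf]') is enough. Quote the brackets so shells such as zsh do not expand them
pip install 'evalscope[all]' --prefer-binary -i https://pypi.tuna.tsinghua.edu.cn/simple/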
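With several Python environments on one machine, it helps to confirm which evalscope version pip actually installed into the active one. The sketch below is minimal and uses only the standard library's `importlib.metadata`. `installed_version` is a hypothetical helper, not part of evalscope.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Hypothetical helper: return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "evalscope" is the distribution name used in the pip commands above.
    print("evalscope:", installed_version("evalscope") or "not installed")
```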

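Dataset download:

# Find the dataset you need on ModelScope, copy its download command, and set a save path (the mmlu_pro directory is created automatically)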
modelscope download --dataset modelscope/MMLU-Pro --local_dir ./data/mmlu_pro
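To confirm that the download finished and landed where expected, you can count the files under the target directory. This is a minimal standard-library sketch. `summarize_dir` is a hypothetical helper, and ./data/mmlu_pro is just the path used in the command above.

```python
import os
from pathlib import Path


def summarize_dir(root: str) -> tuple[int, int]:
    """Hypothetical helper: return (file_count, total_bytes) for all files under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            count += 1
            total += os.path.getsize(os.path.join(dirpath, name))
    return count, total


if __name__ == "__main__":
    root = "./data/mmlu_pro"  # path from the modelscope download command
    if Path(root).is_dir():
        files, size = summarize_dir(root)
        print(f"{files} files, {size / 1e6:.1f} MB under {root}")
    else:
        print(f"{root} does not exist; the download may have failed")
```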

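Create the template file, template.json:

The template below was generated with DeepSeek for vLLM as the inference framework, Qwen3 as the model, and the OpenAI-compatible API type.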
{
  "model": "{model}",
  "messages": [
    {
      "role": "user",
      "content": "{prompt}"
    }
  ],
  "stream": true,
  "max_tokens": "{max_tokens}",
  "temperature": "{temperature}"
}
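Before launching a benchmark, it can be worth sending one request of the same shape by hand to confirm the endpoint is reachable. This minimal sketch uses only the standard library. It assumes an OpenAI-compatible `/v1/chat/completions` endpoint such as the one vLLM serves. `build_body`, `send` and the `CHAT_URL` environment variable are hypothetical names chosen for illustration.

The template above quotes `{max_tokens}` and `{temperature}` as strings. An OpenAI-compatible server expects numbers for these fields, so the sketch fills them with numeric values.

```python
import json
import os
import urllib.request


def build_body(model: str, prompt: str, max_tokens: int = 256,
               temperature: float = 0.7, stream: bool = False) -> dict:
    """Hypothetical helper: same fields as template.json, numeric fields as numbers."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # False so the reply is one JSON object, not SSE chunks
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def send(url: str, body: dict, timeout: float = 120.0) -> dict:
    """Hypothetical helper: POST a non-streaming chat request and parse the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_body("qwen3-14b", "Hello, who are you?")
    # CHAT_URL is a hypothetical variable, e.g. http://<ip>:<port>/v1/chat/completions
    url = os.environ.get("CHAT_URL")
    if url:
        reply = send(url, body)
        print(reply["choices"][0]["message"]["content"])
    else:
        print(json.dumps(body, indent=2))
```

If no URL is set, the script only prints the request body, which is handy for comparing against template.json.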

II. Performance testing

A basic test command; it may need to download a dataset while it runs.

# Against the Qwen3-14B model; template.json must be in the current directory
evalscope perf \
  --url 'http://ip:port/v1/chat/completions' \
  --parallel 2 \
  --model 'qwen3-14b' \
  --log-every-n-query 10 \
  --read-timeout 120 \
  --connect-timeout 120 \
  --number 20 \
  --max-prompt-length 128000 \
  --min-prompt-length 128 \
  --api openai \
  --query-template @template.json \
  --dataset openqa
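In a multi-line shell command like this one, each continued line must end with a bare backslash. Anything after the backslash, including trailing spaces or a comment, breaks the continuation and ends the command early. Writing the invocation as an argument list sidesteps the problem.

This sketch builds the same flags for `subprocess`, reusing only the options from the command above. `perf_argv` and `run_perf` are hypothetical helpers, and http://ip:port is still a placeholder for your server.

```python
import shlex
import subprocess


def perf_argv(url: str, model: str, parallel: int = 2, number: int = 20,
              template: str = "template.json") -> list[str]:
    """Hypothetical helper: the evalscope perf command above as an argument list."""
    return [
        "evalscope", "perf",
        "--url", url,
        "--parallel", str(parallel),
        "--model", model,
        "--log-every-n-query", "10",
        "--read-timeout", "120",
        "--connect-timeout", "120",
        "--number", str(number),
        "--max-prompt-length", "128000",
        "--min-prompt-length", "128",
        "--api", "openai",
        "--query-template", "@" + template,
        "--dataset", "openqa",
    ]


def run_perf(argv: list[str]) -> int:
    """Hypothetical helper: run the benchmark and return the CLI's exit code."""
    return subprocess.run(argv).returncode


if __name__ == "__main__":
    argv = perf_argv("http://ip:port/v1/chat/completions", "qwen3-14b")
    # Print a correctly quoted one-line equivalent; call run_perf(argv) to execute it.
    print(shlex.join(argv))
```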

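III. Using the results

Unless otherwise specified, results are written to the output subdirectory of the current directory. You can ask DeepSeek to interpret them, and it will offer some reasonable assessments and suggestions (experienced users probably won't need this).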

# benchmark_summary.json
{
    "Time taken for tests (s)": 487.9671,
    "Number of concurrency": 2,
    "Total requests": 20,
    "Succeed requests": 20,
    "Failed requests": 0,
    "Output token throughput (tok/s)": 68.314,
    "Total token throughput (tok/s)": 77.0277,
    "Request throughput (req/s)": 0.041,
    "Average latency (s)": 48.2374,
    "Average time to first token (s)": 0.0747,
    "Average time per output token (s)": 0.0289,
    "Average inter-token latency (s)": 0.0289,
    "Average input tokens per request": 212.6,
    "Average output tokens per request": 1666.75
}
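Several of these figures are derived from one another, which makes a quick consistency check possible. The sketch below recomputes request and token throughput from the request count, the per-request token averages and the wall-clock time. It also approximates average latency as the time to first token plus the remaining output tokens times the per-token time.

These formulas are an interpretation of the field names, not taken from evalscope's source, so treat the results as approximate. `check_summary` is a hypothetical helper, and `SAMPLE` holds values copied from the summary above.

```python
import json
import sys

# Values copied from the benchmark_summary.json shown above.
SAMPLE = {
    "Time taken for tests (s)": 487.9671,
    "Succeed requests": 20,
    "Output token throughput (tok/s)": 68.314,
    "Total token throughput (tok/s)": 77.0277,
    "Request throughput (req/s)": 0.041,
    "Average latency (s)": 48.2374,
    "Average time to first token (s)": 0.0747,
    "Average time per output token (s)": 0.0289,
    "Average input tokens per request": 212.6,
    "Average output tokens per request": 1666.75,
}


def check_summary(s: dict) -> dict:
    """Hypothetical helper: recompute derived metrics from the raw summary fields."""
    t = s["Time taken for tests (s)"]
    n = s["Succeed requests"]
    in_tok = s["Average input tokens per request"]
    out_tok = s["Average output tokens per request"]
    return {
        "Request throughput (req/s)": n / t,
        "Output token throughput (tok/s)": n * out_tok / t,
        "Total token throughput (tok/s)": n * (in_tok + out_tok) / t,
        # Approximation: first token, then (out_tok - 1) tokens at the per-token time.
        "Average latency (s)": s["Average time to first token (s)"]
        + (out_tok - 1) * s["Average time per output token (s)"],
    }


if __name__ == "__main__":
    # Pass the path to a real benchmark_summary.json, or fall back to the sample above.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            summary = json.load(f)
    else:
        summary = SAMPLE
    for key, estimate in check_summary(summary).items():
        print(f"{key}: reported {summary[key]}, recomputed {estimate:.4f}")
```

The same arithmetic explains the total run time. With a concurrency of 2, the 20 requests run in about 10 rounds of roughly 48 s each, about 482 s in all, close to the reported 488 s.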

posted @ 2025-09-05 17:14 badwood