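Tinkering Notes [40]: Running the qwen3-30b-a3b model on an ancient A100 GPU
Abstract
Running the qwen3-30b-a3b model through ollama on an ancient A100-SXM4-40GB GPU. "A 30B Q8-quantized model on the GPU answers a one-sentence self-introduction prompt, producing 267 tokens in 28 s at an average power of 55 W, for 0.44 Wh in total; the electricity cost per token is under 1/30,000 ¥, and the energy efficiency is about 6 J/token."
Key information
- Image: ollama/ollama:0.6.6-rc2
- GPU: A100-SXM4-40GB
- GPU driver: NVIDIA-SMI 460.106.00 Driver Version: 460.106.00 CUDA Version: 11.2
- docker: Docker version 24.0.4, build 3713ee1
- Model: modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf
- Host OS: Linux Tesla 5.10.0-60.18.0.50.oe2203.x86_64 #1 SMP Wed Mar 30 03:12:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Implementation
1. Set up ollama in Docker (GPU driver already configured)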
docker pull ollama/ollama:0.6.6-rc2
docker run --restart=always --name ollama -v /lvm-group1/qsbye/ollama:/root/.ollama -p 11435:11434 -e "OLLAMA_HOST=0.0.0.0" -d ollama/ollama:0.6.6-rc2
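2. Change ollama's default model directory (to keep the system disk from filling up)
## Update the system-wide ollama in one step (this simply reinstalls the latest version)
curl -fsSL https://ollama.com/install.sh | sh
## Verify after updating
ollama --version
## Create an ollama data directory on the data disk
sudo mkdir -p /lvm-group1/qsbye/ollama
sudo chmod 777 -R /lvm-group1/qsbye/ollama
## -r is required because models is a directory
sudo cp -r /usr/share/ollama/.ollama/models /lvm-group1/qsbye/ollama
## Point ollama at the new directory
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo vim /etc/systemd/system/ollama.service.d/override.conf
Contents: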
[Service]
Environment="OLLAMA_MODELS=/lvm-group1/qsbye/ollama/models"
User=ollama
Group=ollama
Then:
sudo systemctl daemon-reload
sudo systemctl restart ollama
sudo systemctl status ollama
3. Download the model
# Use a mirror inside China (ModelScope)
ollama pull modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf
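A quick way to confirm the pull worked is to look for the model in the server's model list. The sketch below assumes the Docker server from step 1 is reachable on 127.0.0.1:11435 and that `GET /api/tags` lists each local model under a `name` field matching the pulled reference; both are assumptions about this setup, and `has_model` and `fetch_tags` are illustrative helper names rather than anything from the original post.

```python
import json
import urllib.request

# Assumption: the Docker server from step 1, as seen from the host itself.
OLLAMA_URL = "http://127.0.0.1:11435"
MODEL = "modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf"


def has_model(tags_response, name):
    """Return True if `name` appears in a parsed /api/tags response."""
    return any(m.get("name") == name for m in tags_response.get("models", []))


def fetch_tags(base_url=OLLAMA_URL):
    """GET /api/tags and return the parsed JSON (needs a running server)."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=10) as resp:
        return json.load(resp)


# Offline demonstration with a hand-written response of the assumed shape.
sample = {"models": [{"name": MODEL}]}
print(has_model(sample, MODEL))
```

Against a live server, `has_model(fetch_tags(), MODEL)` would perform the same check on real data.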
3. Run the model
docker exec -d -e OLLAMA_GPU_LAYERS=999 -e OLLAMA_KEEP_ALIVE=-1 -e CUDA_VISIBLE_DEVICES=0 ollama bash -c "ollama run modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf"
# Watch GPU memory in another terminal
watch -n1 nvidia-smi
Output:
Every 1.0s: nvidia-smi Tesla: Sun Jan 18 08:59:50 2026
Sun Jan 18 08:59:51 2026
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00 Driver Version: 460.106.00 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A100-SXM4-40GB Off | 00000000:82:00.0 Off | 0 |
| N/A 39C P0 43W / 400W | 36742MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3751529 C /usr/bin/ollama 36737MiB |
+-----------------------------------------------------------------------------+
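For scripted checks, the same memory figures can be read in machine-friendly form instead of parsing the table above. This is a minimal sketch, assuming `nvidia-smi` is on the PATH; `parse_memory_line` and `query_gpu_memory` are illustrative names, and the sample string simply reuses the numbers from the output above.

```python
import subprocess


def parse_memory_line(line):
    """Parse one 'used, total' CSV line (values in MiB) from nvidia-smi."""
    used, total = (int(field) for field in line.split(","))
    return used, total


def query_gpu_memory(index=0):
    """Ask nvidia-smi for used/total memory of one GPU (needs a real GPU)."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "-i", str(index),
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_memory_line(out.strip().splitlines()[0])


# Offline demonstration using the figures shown in the table above.
used, total = parse_memory_line("36742, 40536")
print(f"{used} MiB / {total} MiB ({used / total:.1%} of GPU memory in use)")
```

On the machine itself, `query_gpu_memory(0)` would return the live values.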
4. Thinking-mode Q&A test (the prompt asks, in Chinese, for one line of Python that prints hello ollama)
python -c "import requests,json,sys;[sys.stdout.write(json.loads(l)['response']) for l in requests.post('http://10.8.8.130:11435/api/generate',json={'model':'modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf','prompt':'一行python代码打印hello ollama;','stream':True},stream=True).iter_lines(decode_unicode=True) if l]"
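Output (model response, translated from Chinese):
<think>
Okay, the user wants me to print "hello ollama" with one line of Python code. This looks simple enough, but I should think carefully about whether there is anything to watch out for.
First, the basic syntax for printing a string in Python is print("content"). So the most direct way is print("hello ollama"). Could the user have other requirements, though? For example, does letter case matter? The "hello ollama" in the example is lowercase, so that should be fine.
Could the user want some other approach, such as using a variable or escape characters? The question clearly says one line of code, so I should use the print function directly. Also, do I need to consider the Python version, such as the differences between Python 2 and 3? Most environments today are Python 3, so that is not a problem.
The user might also want a more complex expression, such as string concatenation, for example print("hello" + " ollama"), but that is more complicated than simply writing the string. The user probably just wants to confirm basic usage, so the simplest form is best.
Also, check for spelling mistakes: is "ollama" correct? The user may have made a typo, but going by the question I should use the string as given. So the correct code is print("hello ollama").
Any other possibilities? A formatted string such as print(f"hello ollama") would work, but it is no different from writing the string directly and is more complex. So print("hello ollama") remains the most concise.
To sum up, the user's request is clear and needs only one line of code, so using the print function to output the string is enough. There are no hidden requirements, so this should be the answer.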
</think>
···python
print("hello ollama")
···
5. Print the token rate
python -c "
import requests, json, sys
url = 'http://10.8.8.130:11435/api/generate'
payload = {
    'model': 'modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf',
    'prompt': '一行python代码打印hello ollama;',
    'stream': True
}
try:
    r = requests.post(url, json=payload, stream=True, timeout=30)
    for line in r.iter_lines(decode_unicode=True):
        if not line:
            continue
        chunk = json.loads(line)
        sys.stdout.write(chunk.get('response', ''))
        sys.stdout.flush()
        # eval_count and eval_duration (nanoseconds) arrive only in the final chunk,
        # so the rate is printed once, at the end of the stream
        cnt = chunk.get('eval_count', 0)
        dur_ns = chunk.get('eval_duration', 0)
        if dur_ns:
            rate = cnt / (dur_ns / 1e9)
            sys.stdout.write('\r[%.1f token/s] ' % rate)
            sys.stdout.flush()
except Exception as e:
    print('\nError:', e, file=sys.stderr)
"
Output:
[16.0 token/s]
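The rate above is just the generated-token count divided by the generation time that the server reports in the last streamed record. A minimal sketch of that arithmetic follows; `tokens_per_second` is an illustrative helper, and the sample record uses made-up round numbers chosen only to reproduce a 16.0 token/s result, not values captured from the actual run.

```python
def tokens_per_second(final_chunk):
    """Compute generation speed from the final /api/generate record.

    eval_count is the number of generated tokens and eval_duration is the
    generation time in nanoseconds. Returns None if the fields are missing.
    """
    count = final_chunk.get("eval_count", 0)
    duration_ns = final_chunk.get("eval_duration", 0)
    if not duration_ns:
        return None
    return count / (duration_ns / 1e9)


# Hypothetical final record (illustrative numbers, not measured data).
sample = {"done": True, "eval_count": 160, "eval_duration": 10_000_000_000}
print(f"[{tokens_per_second(sample):.1f} token/s]")
```

Intermediate records do not carry these fields, which is why the helper returns None for them.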
6. Keep ollama's GPU memory from being reclaimed
- Set the environment variable OLLAMA_KEEP_ALIVE=-1
- Call the model once every 3 minutes (heartbeat)
## If the Go compiler is not installed
pip install go-bin -i https://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
## Go code
vim ollama_heartbeat.go
go build ollama_heartbeat.go
chmod +x ollama_heartbeat
nohup ./ollama_heartbeat &
## View the output
tail nohup.out
Code:
// ollama_heartbeat.go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

const defaultHost = "http://127.0.0.1:11435"
const defaultModel = "modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf"

// once sends a single short prompt and drains the streamed response.
func once() {
	host := os.Getenv("OLLAMA_HOST")
	if host == "" {
		host = defaultHost
	}
	model := os.Getenv("OLLAMA_MODEL")
	if model == "" {
		model = defaultModel
	}
	body, _ := json.Marshal(map[string]interface{}{
		"model":  model,
		"prompt": "你好",
		"stream": true,
	})
	req, err := http.NewRequest("POST", host+"/api/generate", bytes.NewReader(body))
	if err != nil {
		fmt.Printf("[%s] heartbeat fail: %v\n", time.Now().Format("01-02 15:04:05"), err)
		return
	}
	req.Header.Set("Content-Type", "application/json")
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Printf("[%s] heartbeat fail: %v\n", time.Now().Format("01-02 15:04:05"), err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fmt.Printf("[%s] heartbeat fail: status=%d\n", time.Now().Format("01-02 15:04:05"), resp.StatusCode)
		return
	}
	// Read the stream line by line and accumulate the byte count
	reader := bufio.NewReader(resp.Body)
	total := 0
	for {
		line, err := reader.ReadBytes('\n')
		// Count the bytes first so a final line without a trailing newline is included
		total += len(line)
		if err == io.EOF {
			break
		}
		if err != nil {
			fmt.Printf("[%s] heartbeat fail while reading: %v\n", time.Now().Format("01-02 15:04:05"), err)
			return
		}
	}
	fmt.Printf("[%s] heartbeat ok, %d bytes\n", time.Now().Format("01-02 15:04:05"), total)
}

func main() {
	for {
		once()
		time.Sleep(3 * time.Minute)
	}
}
Output:
[01-18 10:17:50] heartbeat ok, 17346 bytes
[01-18 10:20:59] heartbeat ok, 18120 bytes
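Each heartbeat above generates a full answer (about 17 KB of streamed JSON) just to keep the model warm. A lighter option, as far as I understand the Ollama API, is to send a request with no prompt and a `keep_alive` value: an empty prompt only loads the model, and `keep_alive: -1` asks the server to keep it loaded indefinitely. The sketch below only builds that request; `build_preload_request` is an illustrative name, and whether this alone is enough on this particular server version has not been checked here.

```python
import json
import urllib.request

# Assumption: same server address as defaultHost in the Go heartbeat.
HOST = "http://127.0.0.1:11435"
MODEL = "modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf"


def build_preload_request(host, model, keep_alive=-1):
    """Build a POST to /api/generate that loads `model` without generating.

    keep_alive=-1 requests that the model stay in memory indefinitely.
    """
    body = json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")
    return urllib.request.Request(
        host + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_preload_request(HOST, MODEL)
print(req.get_method(), req.full_url)
# To actually send it (needs the server running):
# urllib.request.urlopen(req, timeout=60).read()
```

The heartbeat remains a reasonable fallback if the server still unloads the model for other reasons.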
7. Observe power fluctuation during a Q&A, plus the total token count and energy consumed per Q&A
Code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Sample GPU power in real time during Ollama inference, tally the token count
and energy use, save the samples to CSV, then plot them to a JPG.
"""
import subprocess
import csv
import time
import datetime as dt
import requests
import sys
from PIL import Image, ImageDraw, ImageFont
# -------------------- Parameters --------------------
URL = "http://10.8.8.130:11435/api/generate"
MODEL = "modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf"
PROMPT = "Please introduce yourself in one sentence."
# Timestamp used in the output file names
ts = dt.datetime.now().strftime("%Y%m%d_%H%M%S")
csv_file = f"ollama_statistics_{ts}.csv"
jpg_file = f"ollama_statistics_{ts}.jpg"
# -------------------- Power sampling --------------------
def get_gpu_power():
    """Return the current GPU power draw in watts."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        text=True,
    )
    return float(out.strip())
# -------------------- Inference + sampling --------------------
power_samples = []  # [(timestamp, power_W), ...]
total_tokens = 0
payload = {
    "model": MODEL,
    "prompt": PROMPT,
    "stream": True,
}
print("Starting inference and power sampling...")
start_time = dt.datetime.now()
# Take 50 samples before inference
for _ in range(50):
    power_samples.append((dt.datetime.now().isoformat(timespec="milliseconds"), get_gpu_power()))
    time.sleep(0.1)
# Streaming inference
try:
    resp = requests.post(URL, json=payload, stream=True, timeout=60)
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue
        # Rough token count: one per streamed line (the final "done" record is counted too)
        total_tokens += 1
        # Sample power once per streamed line
        power_samples.append((dt.datetime.now().isoformat(timespec="milliseconds"), get_gpu_power()))
        time.sleep(0.01)
except Exception as e:
    print("Inference error:", e, file=sys.stderr)
# Take another 50 samples after inference
for _ in range(50):
    power_samples.append((dt.datetime.now().isoformat(timespec="milliseconds"), get_gpu_power()))
    time.sleep(0.1)
# elapsed covers the whole window, including both idle sampling phases
elapsed = (dt.datetime.now() - start_time).total_seconds()
avg_power = sum(p[1] for p in power_samples) / len(power_samples)
energy_wh = avg_power * elapsed / 3600  # Wh
# -------------------- Save CSV --------------------
with open(csv_file, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "power_W"])
    writer.writerows(power_samples)
    writer.writerow([])
    writer.writerow(["total_tokens", total_tokens])
    writer.writerow(["elapsed_s", elapsed])
    writer.writerow(["avg_power_W", avg_power])
    writer.writerow(["energy_Wh", energy_wh])
print(f"Saved {csv_file}")
# -------------------- Plot --------------------
W, H = 800, 400
img = Image.new("RGB", (W, H), "white")
draw = ImageDraw.Draw(img)
powers = [p[1] for p in power_samples]
times = [p[0] for p in power_samples]
# Axis ranges
margin = 60
x0, y0 = margin, margin
x1, y1 = W - margin, H - margin
min_p, max_p = min(powers), max(powers)
pad = (max_p - min_p) * 0.1
min_p, max_p = min_p - pad, max_p + pad
# Polyline coordinates
coords = [
    (
        x0 + (i / (len(powers) - 1)) * (x1 - x0),
        y1 - ((p - min_p) / (max_p - min_p)) * (y1 - y0),
    )
    for i, p in enumerate(powers)
]
# Frame
draw.rectangle([x0, y0, x1, y1], outline="black")
# Polyline
for i in range(len(coords) - 1):
    draw.line([coords[i], coords[i + 1]], fill="blue", width=2)
# Title
title = f"Ollama Power Sampling tokens={total_tokens} energy={energy_wh:.2f} Wh"
draw.text((W // 2, 10), title, fill="black", anchor="mt")
# Axis labels
draw.text((x0, y0 - 10), f"{max_p:.1f} W", fill="black", anchor="lt")
draw.text((x0, y1 + 5), f"{min_p:.1f} W", fill="black", anchor="lt")
draw.text((x0 - 5, y1), times[0][-8:], fill="black", anchor="rt")
draw.text((x1, y1), times[-1][-8:], fill="black", anchor="lt")
img.save(jpg_file)
print(f"Plotted {jpg_file}")
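One caveat about the energy figure: the script multiplies the plain mean of all samples by the elapsed time, but the samples are not evenly spaced (roughly every 110 ms while idle and every 60 ms while streaming), so the mean leans toward the inference phase. A trapezoidal integration over the recorded timestamps avoids that bias. The following is a sketch of that alternative, not what the script above does; `integrate_energy_wh` is an illustrative name, and the three demo rows are the first three rows of the CSV in this post.

```python
import datetime as dt


def integrate_energy_wh(samples):
    """Integrate power over time with the trapezoidal rule.

    samples: list of (iso_timestamp, power_W) tuples in chronological order,
    in the same format the script writes to its CSV. Returns energy in Wh.
    """
    total_joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        seconds = (dt.datetime.fromisoformat(t1) - dt.datetime.fromisoformat(t0)).total_seconds()
        total_joules += 0.5 * (p0 + p1) * seconds
    return total_joules / 3600.0


# First three rows of the recorded data, as a small demonstration.
demo = [
    ("2026-01-18T10:44:49.482", 48.52),
    ("2026-01-18T10:44:49.594", 50.4),
    ("2026-01-18T10:44:49.713", 48.52),
]
print(f"{integrate_energy_wh(demo) * 3600:.2f} J over the first three samples")
```

Passing the full `power_samples` list would give a time-weighted total to compare with the script's `energy_wh`.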
View the data:
python -m http.server 8888
sudo firewall-cmd --permanent --add-port=8888/tcp
sudo firewall-cmd --reload
sudo firewall-cmd --list-all
Then open http://10.8.8.130:8888 in a browser.
Data:
[Figure: GPU power (W) over time for one run; image not preserved]
timestamp,power_W
2026-01-18T10:44:49.482,48.52
2026-01-18T10:44:49.594,50.4
2026-01-18T10:44:49.713,48.52
2026-01-18T10:44:49.823,48.52
2026-01-18T10:44:49.934,48.52
2026-01-18T10:44:50.044,48.52
2026-01-18T10:44:50.157,48.52
2026-01-18T10:44:50.267,48.52
2026-01-18T10:44:50.378,48.52
2026-01-18T10:44:50.489,48.52
2026-01-18T10:44:50.600,48.52
2026-01-18T10:44:50.710,50.37
2026-01-18T10:44:50.834,48.52
2026-01-18T10:44:50.944,48.52
2026-01-18T10:44:51.055,48.52
2026-01-18T10:44:51.167,48.52
2026-01-18T10:44:51.277,48.52
2026-01-18T10:44:51.388,48.52
2026-01-18T10:44:51.499,48.52
2026-01-18T10:44:51.610,48.52
2026-01-18T10:44:51.720,48.52
2026-01-18T10:44:51.832,50.37
2026-01-18T10:44:51.957,48.52
2026-01-18T10:44:52.068,48.52
2026-01-18T10:44:52.178,48.52
2026-01-18T10:44:52.289,48.52
2026-01-18T10:44:52.399,48.52
2026-01-18T10:44:52.509,48.52
2026-01-18T10:44:52.620,48.52
2026-01-18T10:44:52.730,48.52
2026-01-18T10:44:52.842,48.52
2026-01-18T10:44:52.952,50.37
2026-01-18T10:44:53.078,49.03
2026-01-18T10:44:53.189,48.52
2026-01-18T10:44:53.299,48.52
2026-01-18T10:44:53.409,48.52
2026-01-18T10:44:53.519,48.52
2026-01-18T10:44:53.630,48.52
2026-01-18T10:44:53.740,48.52
2026-01-18T10:44:53.849,48.52
2026-01-18T10:44:53.959,48.52
2026-01-18T10:44:54.070,50.4
2026-01-18T10:44:54.198,49.03
2026-01-18T10:44:54.308,48.52
2026-01-18T10:44:54.420,48.52
2026-01-18T10:44:54.530,48.52
2026-01-18T10:44:54.640,48.52
2026-01-18T10:44:54.750,48.52
2026-01-18T10:44:54.859,48.52
2026-01-18T10:44:54.969,48.52
2026-01-18T10:44:55.185,50.37
2026-01-18T10:44:55.247,65.36
2026-01-18T10:44:55.308,65.36
2026-01-18T10:44:55.368,57.65
2026-01-18T10:44:55.427,51.72
2026-01-18T10:44:55.488,51.72
2026-01-18T10:44:55.548,62.63
2026-01-18T10:44:55.608,62.63
2026-01-18T10:44:55.668,60.33
2026-01-18T10:44:55.728,51.3
2026-01-18T10:44:55.788,51.3
2026-01-18T10:44:55.848,62.21
2026-01-18T10:44:55.908,62.21
2026-01-18T10:44:55.975,61.27
2026-01-18T10:44:56.035,52.63
2026-01-18T10:44:56.095,52.63
2026-01-18T10:44:56.164,66.71
2026-01-18T10:44:56.225,50.79
2026-01-18T10:44:56.315,50.37
2026-01-18T10:44:56.406,50.37
2026-01-18T10:44:56.478,57.65
2026-01-18T10:44:56.544,52.63
2026-01-18T10:44:56.606,52.63
2026-01-18T10:44:56.666,66.29
2026-01-18T10:44:56.726,66.29
2026-01-18T10:44:56.786,50.79
2026-01-18T10:44:56.848,54.42
2026-01-18T10:44:56.908,54.42
2026-01-18T10:44:56.968,68.59
2026-01-18T10:44:57.028,68.59
2026-01-18T10:44:57.088,53.05
2026-01-18T10:44:57.148,54.42
2026-01-18T10:44:57.206,54.42
2026-01-18T10:44:57.277,69.48
2026-01-18T10:44:57.338,51.72
2026-01-18T10:44:57.399,59.91
2026-01-18T10:44:57.490,59.91
2026-01-18T10:44:57.521,59.91
2026-01-18T10:44:57.589,58.07
2026-01-18T10:44:57.651,53.48
2026-01-18T10:44:57.711,53.48
2026-01-18T10:44:57.777,69.01
2026-01-18T10:44:57.843,51.3
2026-01-18T10:44:57.912,51.3
2026-01-18T10:44:57.973,68.59
2026-01-18T10:44:58.033,68.59
2026-01-18T10:44:58.108,50.79
2026-01-18T10:44:58.184,68.08
2026-01-18T10:44:58.244,51.3
2026-01-18T10:44:58.306,51.3
2026-01-18T10:44:58.375,59.44
2026-01-18T10:44:58.439,59.44
2026-01-18T10:44:58.500,55.35
2026-01-18T10:44:58.620,55.35
2026-01-18T10:44:58.642,55.35
2026-01-18T10:44:58.681,67.23
2026-01-18T10:44:58.743,67.23
2026-01-18T10:44:58.804,53.98
2026-01-18T10:44:58.867,53.05
2026-01-18T10:44:58.948,53.05
2026-01-18T10:44:59.027,49.0
2026-01-18T10:44:59.102,65.87
2026-01-18T10:44:59.170,53.09
2026-01-18T10:44:59.242,53.09
2026-01-18T10:44:59.333,54.93
2026-01-18T10:44:59.404,64.42
2026-01-18T10:44:59.468,52.16
2026-01-18T10:44:59.537,52.16
2026-01-18T10:44:59.606,63.57
2026-01-18T10:44:59.669,53.05
2026-01-18T10:44:59.754,53.05
2026-01-18T10:44:59.791,63.57
2026-01-18T10:44:59.853,63.57
2026-01-18T10:44:59.922,49.42
2026-01-18T10:44:59.983,56.28
2026-01-18T10:45:00.046,56.28
2026-01-18T10:45:00.107,63.57
2026-01-18T10:45:00.170,50.79
2026-01-18T10:45:00.231,50.79
2026-01-18T10:45:00.291,60.37
2026-01-18T10:45:00.351,60.37
2026-01-18T10:45:00.412,62.63
2026-01-18T10:45:00.471,51.3
2026-01-18T10:45:00.533,51.3
2026-01-18T10:45:00.593,59.95
2026-01-18T10:45:00.653,59.95
2026-01-18T10:45:00.714,62.63
2026-01-18T10:45:00.774,52.63
2026-01-18T10:45:00.877,62.67
2026-01-18T10:45:00.901,62.67
2026-01-18T10:45:00.960,62.67
2026-01-18T10:45:01.020,60.77
2026-01-18T10:45:01.081,51.72
2026-01-18T10:45:01.157,51.72
2026-01-18T10:45:01.221,61.27
2026-01-18T10:45:01.283,51.72
2026-01-18T10:45:01.343,51.72
2026-01-18T10:45:01.402,61.74
2026-01-18T10:45:01.469,61.74
2026-01-18T10:45:01.545,49.84
2026-01-18T10:45:01.607,62.25
2026-01-18T10:45:01.669,62.25
2026-01-18T10:45:01.729,57.65
2026-01-18T10:45:01.789,52.16
2026-01-18T10:45:01.849,52.16
2026-01-18T10:45:01.910,63.57
2026-01-18T10:45:02.002,60.33
2026-01-18T10:45:02.032,60.33
2026-01-18T10:45:02.092,52.16
2026-01-18T10:45:02.151,52.16
2026-01-18T10:45:02.212,62.25
2026-01-18T10:45:02.273,62.25
2026-01-18T10:45:02.333,60.37
2026-01-18T10:45:02.393,51.3
2026-01-18T10:45:02.453,51.3
2026-01-18T10:45:02.513,61.31
2026-01-18T10:45:02.575,61.31
2026-01-18T10:45:02.638,60.37
2026-01-18T10:45:02.700,52.67
2026-01-18T10:45:02.767,52.67
2026-01-18T10:45:02.836,62.21
2026-01-18T10:45:02.917,61.74
2026-01-18T10:45:02.997,50.79
2026-01-18T10:45:03.128,66.71
2026-01-18T10:45:03.153,66.71
2026-01-18T10:45:03.203,49.84
2026-01-18T10:45:03.267,49.84
2026-01-18T10:45:03.338,66.71
2026-01-18T10:45:03.409,53.05
2026-01-18T10:45:03.481,53.05
2026-01-18T10:45:03.553,50.79
2026-01-18T10:45:03.619,57.14
2026-01-18T10:45:03.690,57.14
2026-01-18T10:45:03.760,49.42
2026-01-18T10:45:03.825,59.44
2026-01-18T10:45:03.889,59.44
2026-01-18T10:45:03.970,49.42
2026-01-18T10:45:04.031,64.42
2026-01-18T10:45:04.102,50.37
2026-01-18T10:45:04.202,50.79
2026-01-18T10:45:04.267,50.79
2026-01-18T10:45:04.330,59.02
2026-01-18T10:45:04.390,59.02
2026-01-18T10:45:04.451,63.06
2026-01-18T10:45:04.512,50.79
2026-01-18T10:45:04.573,50.79
2026-01-18T10:45:04.634,61.31
2026-01-18T10:45:04.695,61.31
2026-01-18T10:45:04.757,60.81
2026-01-18T10:45:04.817,51.72
2026-01-18T10:45:04.878,51.72
2026-01-18T10:45:04.941,65.36
2026-01-18T10:45:05.002,65.36
2026-01-18T10:45:05.063,57.14
2026-01-18T10:45:05.125,53.09
2026-01-18T10:45:05.190,53.09
2026-01-18T10:45:05.254,64.94
2026-01-18T10:45:05.340,52.16
2026-01-18T10:45:05.437,56.28
2026-01-18T10:45:05.503,56.28
2026-01-18T10:45:05.565,60.37
2026-01-18T10:45:05.639,53.98
2026-01-18T10:45:05.703,53.98
2026-01-18T10:45:05.765,62.21
2026-01-18T10:45:05.836,51.72
2026-01-18T10:45:05.897,51.72
2026-01-18T10:45:05.957,67.66
2026-01-18T10:45:06.022,49.84
2026-01-18T10:45:06.083,49.84
2026-01-18T10:45:06.153,57.14
2026-01-18T10:45:06.214,57.14
2026-01-18T10:45:06.274,58.07
2026-01-18T10:45:06.337,53.05
2026-01-18T10:45:06.398,68.08
2026-01-18T10:45:06.524,68.08
2026-01-18T10:45:06.546,56.72
2026-01-18T10:45:06.581,56.72
2026-01-18T10:45:06.643,53.48
2026-01-18T10:45:06.705,53.48
2026-01-18T10:45:06.766,67.66
2026-01-18T10:45:06.827,67.66
2026-01-18T10:45:06.889,50.37
2026-01-18T10:45:06.953,56.28
2026-01-18T10:45:07.015,56.28
2026-01-18T10:45:07.086,53.48
2026-01-18T10:45:07.146,53.48
2026-01-18T10:45:07.207,53.48
2026-01-18T10:45:07.264,68.08
2026-01-18T10:45:07.333,68.08
2026-01-18T10:45:07.390,52.16
2026-01-18T10:45:07.459,54.42
2026-01-18T10:45:07.518,54.42
2026-01-18T10:45:07.579,52.16
2026-01-18T10:45:07.668,52.16
2026-01-18T10:45:07.715,52.16
2026-01-18T10:45:07.776,67.23
2026-01-18T10:45:07.838,67.23
2026-01-18T10:45:07.899,50.37
2026-01-18T10:45:07.960,55.35
2026-01-18T10:45:08.021,55.35
2026-01-18T10:45:08.083,69.01
2026-01-18T10:45:08.145,52.67
2026-01-18T10:45:08.206,52.67
2026-01-18T10:45:08.267,60.81
2026-01-18T10:45:08.328,60.81
2026-01-18T10:45:08.388,67.23
2026-01-18T10:45:08.466,55.35
2026-01-18T10:45:08.527,55.35
2026-01-18T10:45:08.588,67.66
2026-01-18T10:45:08.648,52.67
2026-01-18T10:45:08.710,53.05
2026-01-18T10:45:08.830,53.05
2026-01-18T10:45:08.890,69.53
2026-01-18T10:45:08.958,52.67
2026-01-18T10:45:09.014,52.67
2026-01-18T10:45:09.071,58.59
2026-01-18T10:45:09.137,58.59
2026-01-18T10:45:09.221,52.12
2026-01-18T10:45:09.283,67.23
2026-01-18T10:45:09.348,67.23
2026-01-18T10:45:09.411,54.93
2026-01-18T10:45:09.476,58.07
2026-01-18T10:45:09.549,58.07
2026-01-18T10:45:09.615,52.16
2026-01-18T10:45:09.676,58.07
2026-01-18T10:45:09.736,58.07
2026-01-18T10:45:09.797,51.72
2026-01-18T10:45:09.935,51.72
2026-01-18T10:45:09.957,51.72
2026-01-18T10:45:09.984,58.07
2026-01-18T10:45:10.048,58.07
2026-01-18T10:45:10.110,62.25
2026-01-18T10:45:10.171,51.72
2026-01-18T10:45:10.242,51.72
2026-01-18T10:45:10.302,67.23
2026-01-18T10:45:10.364,50.37
2026-01-18T10:45:10.424,50.37
2026-01-18T10:45:10.486,56.28
2026-01-18T10:45:10.547,56.28
2026-01-18T10:45:10.613,64.94
2026-01-18T10:45:10.679,51.72
2026-01-18T10:45:10.743,51.72
2026-01-18T10:45:10.804,67.66
2026-01-18T10:45:10.878,50.37
2026-01-18T10:45:10.941,50.37
2026-01-18T10:45:11.070,69.01
2026-01-18T10:45:11.130,49.42
2026-01-18T10:45:11.192,55.35
2026-01-18T10:45:11.263,55.35
2026-01-18T10:45:11.325,56.72
2026-01-18T10:45:11.386,53.09
2026-01-18T10:45:11.448,53.09
2026-01-18T10:45:11.510,68.08
2026-01-18T10:45:11.573,68.08
2026-01-18T10:45:11.635,50.37
2026-01-18T10:45:11.722,63.57
2026-01-18T10:45:11.784,51.3
2026-01-18T10:45:11.846,51.3
2026-01-18T10:45:11.908,64.94
2026-01-18T10:45:11.973,64.94
2026-01-18T10:45:12.034,53.98
2026-01-18T10:45:12.097,55.35
2026-01-18T10:45:12.203,67.23
2026-01-18T10:45:12.232,67.23
2026-01-18T10:45:12.284,50.79
2026-01-18T10:45:12.346,50.79
2026-01-18T10:45:12.416,59.44
2026-01-18T10:45:12.437,59.44
2026-01-18T10:45:12.547,49.03
2026-01-18T10:45:12.658,49.03
2026-01-18T10:45:12.768,49.03
2026-01-18T10:45:12.878,49.03
2026-01-18T10:45:12.988,49.03
2026-01-18T10:45:13.099,49.03
2026-01-18T10:45:13.209,50.82
2026-01-18T10:45:13.420,49.03
2026-01-18T10:45:13.532,49.03
2026-01-18T10:45:13.641,48.52
2026-01-18T10:45:13.752,48.52
2026-01-18T10:45:13.861,48.52
2026-01-18T10:45:13.971,48.52
2026-01-18T10:45:14.082,48.52
2026-01-18T10:45:14.193,48.52
2026-01-18T10:45:14.303,49.03
2026-01-18T10:45:14.413,50.4
2026-01-18T10:45:14.543,48.52
2026-01-18T10:45:14.653,48.52
2026-01-18T10:45:14.763,48.52
2026-01-18T10:45:14.874,48.52
2026-01-18T10:45:14.985,48.52
2026-01-18T10:45:15.095,49.03
2026-01-18T10:45:15.205,49.03
2026-01-18T10:45:15.315,48.52
2026-01-18T10:45:15.425,48.52
2026-01-18T10:45:15.534,50.82
2026-01-18T10:45:15.665,48.52
2026-01-18T10:45:15.775,48.52
2026-01-18T10:45:15.885,48.52
2026-01-18T10:45:15.995,48.52
2026-01-18T10:45:16.106,48.52
2026-01-18T10:45:16.217,49.03
2026-01-18T10:45:16.327,48.52
2026-01-18T10:45:16.437,48.52
2026-01-18T10:45:16.546,48.52
2026-01-18T10:45:16.657,50.4
2026-01-18T10:45:16.788,48.52
2026-01-18T10:45:16.898,48.52
2026-01-18T10:45:17.009,48.52
2026-01-18T10:45:17.119,48.52
2026-01-18T10:45:17.229,49.03
2026-01-18T10:45:17.339,48.52
2026-01-18T10:45:17.449,48.52
2026-01-18T10:45:17.558,48.52
2026-01-18T10:45:17.670,48.52
2026-01-18T10:45:17.781,50.37
2026-01-18T10:45:17.908,49.03
2026-01-18T10:45:18.017,48.52
total_tokens,267
elapsed_s,28.644992
avg_power_W,55.060354223433244
energy_Wh,0.4381120572909476
8. Analysis
The following conclusions can be drawn from this data (all values are for a single Q&A run):
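- Time
  - Total elapsed ≈ 28.6 s, of which about 11 s is the two idle sampling phases (50 samples each, before and after inference).
  - 367 samples in total (50 + 267 + 50) → about 12.8 samples/s on average, covering the before, during, and after phases of inference.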
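- Token efficiency
  - 267 tokens counted. The script counts one token per streamed response line, so this is output only and includes the final `done` record; treat it as approximate.
  - Throughput ≈ 267 / 28.6 ≈ 9.3 token/s over the whole window. Over the roughly 17.4 s of actual streaming (10:44:55 to 10:45:12 in the CSV) it is about 15 token/s, consistent with the 16.0 token/s measured in step 5.
  - Per-token latency ≈ 107 ms over the whole window, or about 65 ms over the streaming phase alone.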
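- Power
  - Idle baseline 48–50 W
  - Inference peak 69.5 W. The 55.1 W average spans the whole window, idle phases included, which puts it about 6.5 W above the 48.5 W baseline.
  - Dynamic range 21 W (48 W → 69 W)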
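- Energy
  - Total energy 0.438 Wh (≈ 1.58 kJ)
  - Energy per token 0.438 Wh / 267 ≈ 1.64 mWh
  - At an assumed 0.6 ¥/kWh, the whole answer costs ≈ 0.00026 ¥ (0.026 fen), which works out to about one millionth of a yuan per token.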
- Energy efficiency
  - 9.3 token/s ÷ 55 W ≈ 0.17 token/J
  - Equivalently about 6 J/token, the same as running a 6 W LED bulb for 1 second.
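- Comparison
  - Quantized models of this size running entirely on the GPU are said to reach 10–20 token/s (no source is given for this range). The 9.3 token/s here falls slightly below it mainly because the figure is averaged over a window that includes idle sampling before and after inference; network/API overhead may also contribute.
  - 1.64 mWh per token is said to match the 1–3 mWh range reported for 30B-class quantized models, though no source is cited. Note that the figure includes idle baseline power, not just the increase caused by inference.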
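The derived figures in this analysis follow directly from the four summary rows at the end of the CSV. As a cross-check, the arithmetic can be written out as below; `derive_metrics` is an illustrative helper, and the 0.6 ¥/kWh price is the same assumed tariff used in the analysis.

```python
def derive_metrics(total_tokens, elapsed_s, energy_wh, price_per_kwh=0.6):
    """Recompute the per-run figures from the CSV summary rows."""
    energy_j = energy_wh * 3600.0  # 1 Wh = 3600 J
    return {
        "throughput_tok_s": total_tokens / elapsed_s,
        "latency_ms_per_token": elapsed_s / total_tokens * 1000.0,
        "energy_kj": energy_j / 1000.0,
        "mwh_per_token": energy_wh * 1000.0 / total_tokens,
        "joules_per_token": energy_j / total_tokens,
        "cost_yuan": energy_wh / 1000.0 * price_per_kwh,
    }


# Summary values copied from the end of the CSV above.
metrics = derive_metrics(
    total_tokens=267,
    elapsed_s=28.644992,
    energy_wh=0.4381120572909476,
)
for name, value in metrics.items():
    print(f"{name}: {value:.4g}")
```

All of these use the full 28.6 s window, so they describe the measured run as a whole rather than the streaming phase alone.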
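One-sentence summary
"A 30B Q8-quantized model on an ancient GPU answers a one-sentence self-introduction prompt, producing 267 tokens in 28 s at an average power of 55 W, for 0.44 Wh in total; the electricity cost per token is under 1/30,000 ¥, and the energy efficiency is about 6 J/token."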
