IPC + softirq + hardirq
Below is a direct, consolidated summary:
✅ A vs B: full IPC + softirq + hardirq comparison (core differences at a glance)
🔥 1. IPC comparison (perf stat)
| Metric | A | B | Conclusion |
|---|---|---|---|
| IPC (instructions per cycle) | 0.77 | 0.76 | Both hosts are low (a healthy workload is usually ≥1.2); B is slightly lower. |
| CPU clock frequency | 1.777 GHz | 2.184 GHz | B runs at a higher frequency yet gains no efficiency → scheduling/interrupt bottleneck. |
| context-switches | 857K | 300K | A switches ~3× more → more threads / more jitter. |
| cpu-migrations | low (42K) | very high (167K) | B migrates tasks across CPUs heavily → larger scheduler overhead (matches its high SCHED share). |
⭐ IPC takeaways
- A: slowed down by interrupts and RCU
- B: slowed down by scheduling and NET_TX / tasklets
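The counters in the table come from `perf stat`; a minimal sketch of collecting them and deriving IPC (the command line and sample numbers below are illustrative, not the actual A/B measurements):

```shell
# System-wide counters for 10 s (needs perf and usually root):
#   perf stat -a -e cycles,instructions,context-switches,cpu-migrations sleep 10

# IPC is instructions / cycles; with the two raw counters in hand the
# division is a one-liner (sample values chosen to give IPC = 0.77):
cycles=1000000000
instructions=770000000
ipc=$(awk -v i="$instructions" -v c="$cycles" 'BEGIN { printf "%.2f", i / c }')
echo "IPC = $ipc"   # → IPC = 0.77
```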
🔥 2. Softirq comparison
## (1) Softirq shares on A
RCU 34.35%
NET_RX 21.35%
SCHED 20.39%
TIMER 20.00%
NET_TX 0.10%
TASKLET 3.78%
## (2) Softirq shares on B
SCHED 24.48%
RCU 23.33%
NET_RX 17.57%
TIMER 14.26%
NET_TX 11.86%
TASKLET 8.37%
## (3) Key softirq differences (conclusions)
| Softirq type | A | B | Conclusion |
|---|---|---|---|
| RCU | much higher (+11 pp) | lower | Heavy RCU-callback pressure on A (kernel-mode busy). |
| NET_RX | higher (+4 pp) | lower | A receives more packets. |
| TIMER | higher (+6 pp) | lower | A carries a heavier kernel-timer load. |
| NET_TX | near zero (0.1%) | 11.86% | B transmits far more (active sender). |
| TASKLET | 3.78% | 8.37% | Heavier driver-layer load on B. |
| SCHED | 20.39% | 24.48% | Heavier thread/scheduling load on B. |
⭐ Softirq takeaways
- A: mostly receive-side pressure (RX/RCU/TIMER → heavy system-level load)
- B: mostly transmit/scheduling pressure (TX/tasklet/SCHED → heavy business-thread and NIC-transmit load)
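The shares above are cumulative since boot; to see what is happening right now, snapshot `/proc/softirqs` twice and diff. A minimal sketch (the 2-second window is an arbitrary choice):

```shell
# Sum each softirq type across all CPU columns of /proc/softirqs.
snapshot() {
  awk 'NR > 1 { t = $1; sub(/:$/, "", t)
                s = 0; for (i = 2; i <= NF; i++) s += $i
                print t, s }' "$1"
}

snapshot /proc/softirqs > /tmp/softirq.1
sleep 2
snapshot /proc/softirqs > /tmp/softirq.2
# Both snapshots have identical line order, so paste them side by side
# and print the delta, i.e. softirqs raised during the window.
paste /tmp/softirq.1 /tmp/softirq.2 | awk '{ printf "%-10s %12d\n", $1, $4 - $2 }'
```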
🔥 3. Hard-interrupt (IRQ) comparison (the most telling)
## (1) Characteristics of A
- Total interrupts: 534,798,109,781
- CPUs: 47 cores
- IRQ sources: 247
- Interrupts spread across the 68157xxx-edge sources, each typically 1%–5%
→ well distributed; NUMA/IRQ balancing looks fine
📌 A's hard interrupts are balanced, with no obvious hotspot.
This reinforces the earlier point:
→ A's bottleneck comes from softirq (RCU, NET_RX), not hard IRQs.
## (2) Characteristics of B
- Total interrupts: 1,734,028,374,169 (3.2× A)
- CPUs: 39 cores (fewer than A)
- IRQ sources: 142 (fewer than A)
- Interrupts concentrate on the 41943xx-edge sources, each around 2%–3%
- Overall IRQ volume far exceeds A (≈3×)
📌 B takes ~3× the hard interrupts on fewer CPUs → per-core IRQ pressure explodes.
This matches the softirq picture:
- B's NET_TX, TASKLET, and SCHED are all elevated
→ driver path + excessive hard interrupts → a heavier softirq processing chain
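The per-source concentration described above can be checked directly from `/proc/interrupts`; a sketch that sums each numbered IRQ line and lists the busiest sources (assumes the usual layout: per-CPU counts first, then the description):

```shell
# Top 10 IRQ lines by total count: sum the leading numeric (per-CPU)
# columns of each numbered IRQ and sort descending.
awk '/^[ \t]*[0-9]+:/ {
  s = 0
  for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) s += $i
  printf "%12d  IRQ %s %s\n", s, $1, $NF
}' /proc/interrupts | sort -rn | head -10
```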
⭐ Final combined comparison table (the one-shot summary)
| Dimension | A | B | Conclusion |
|---|---|---|---|
| IPC | 0.77 | 0.76 | Both low; B slightly lower. |
| context-switches | high | low | A switches threads more; choppier scheduling. |
| cpu-migrations | low | very high | Heavier scheduler pressure on B. |
| RCU softirq | 34% (very high) | 23% | Heavy kernel-mode pressure on A. |
| NET_RX | high | low | A receives more; it is being "hit" harder. |
| NET_TX | near zero | 11.8% | B sends more; it does more "active work". |
| TASKLET | 3.8% | 8.3% | B's driver layer is very busy. |
| SCHED | 20% | 24% | Heavier thread-scheduling pressure on B. |
| TIMER | high | low | More timer interrupts on A. |
| Total hard IRQs | 5.3e11 | 1.7e12 (3.2×) | B's hard-interrupt volume is explosive. |
| IRQ sources | 247 | 142 | A has more devices/queues but spreads them evenly; B has fewer sources, each heavier. |
| CPU cores | 47 | 39 | B has fewer cores but more interrupts → heavier per-core burden. |
🔥 Three-sentence conclusion (the essentials)
① A's bottleneck: RCU + NET_RX → mostly kernel load; passive, inbound pressure.
② B's bottleneck: NET_TX + TASKLET + SCHED + far more interrupts → network-driver plus business-thread scheduling pressure.
③ B takes 3× A's hard interrupts on fewer CPUs, which is the root cause of its dominant softirq shares (NET_TX/TASKLET/SCHED).
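Point ③ can be made concrete with the totals from section 3: normalizing by core count shows B's per-core interrupt load is nearly 4× A's.

```shell
# Interrupts per core, using the totals and core counts quoted above.
awk 'BEGIN {
  a = 534798109781 / 47      # host A: total IRQs / cores
  b = 1734028374169 / 39     # host B
  printf "A: %.2e IRQs/core\nB: %.2e IRQs/core\nB/A: %.1fx\n", a, b, b / a
}'
```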
Possible follow-ups:
✔ Business-level root-cause analysis tied to ES / Logstash
✔ An irqbalance tuning plan
✔ A script to bind NIC queues to NUMA nodes
✔ Deep softirq + perf top analysis of hot instruction paths
#!/bin/bash
# Softirq statistics with fully dynamic type detection
awk '
{
    if (NR == 1) {
        # The header line has one field per CPU
        ncpus = NF
        next
    }
    # Softirq type is the first column; strip the trailing colon
    current_type = $1
    gsub(/:/, "", current_type)
    # Record newly discovered types in order of appearance
    if (!(current_type in softirq_counts)) {
        num_types++
        softirq_names[num_types] = current_type
    }
    # Sum the counts across all CPU columns
    for (i = 2; i <= NF; i++) {
        softirq_counts[current_type] += $i
    }
}
END {
    # Grand total across all types
    total = 0
    for (type in softirq_counts) {
        total += softirq_counts[type]
    }
    if (total == 0) {
        print "Error: no softirq data was read"
        print "Possible causes:"
        print "1. /proc/softirqs does not match the expected format"
        print "2. insufficient permission to read the file"
        print "3. the system has not generated any softirqs"
        exit 1
    }
    printf "%-12s %15s %10s\n", "Type", "Total", "Share"
    printf "==========================================\n"
    # Pipe the per-type rows through sort(1) for descending output;
    # close() flushes the pipe before the footer is printed
    sort_cmd = "sort -k2,2nr"
    for (i = 1; i <= num_types; i++) {
        type = softirq_names[i]
        count = softirq_counts[type]
        ratio = (count / total) * 100
        printf "%-12s %15d %9.2f%%\n", type, count, ratio | sort_cmd
    }
    close(sort_cmd)
    printf "==========================================\n"
    printf "%-12s %15d %10s\n", "Total", total, "100.00%"
    # Emit tuning hints for any type above 30% of the total
    print "\nDiagnostic hints:"
    for (type in softirq_counts) {
        if (softirq_counts[type] > total * 0.3) {
            printf "High-load type: %s (%.2f%%)\n", type, (softirq_counts[type] / total) * 100
            print_suggestions(type)
        }
    }
}
function print_suggestions(type) {
    if (type ~ /NET_/) {
        print "  - Network tuning:"
        print "    * Check NIC stats: ethtool -S eth0"
        print "    * Increase queue count: ethtool -L eth0 combined <N>"
        print "    * Spread RSS across queues: ethtool -X eth0 equal <N>"
        print "    * Check IRQ distribution: grep eth0 /proc/interrupts"
    }
    else if (type ~ /TIMER/) {
        print "  - Timer tuning:"
        print "    * Check the clock source: cat /sys/devices/system/clocksource/clocksource0/current_clocksource"
        print "    * Consider TSC: echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource"
        print "    * Tickless mode: add nohz_full=<cpulist> to the kernel command line"
    }
    else if (type ~ /RCU/) {
        print "  - RCU tuning:"
        print "    * Stall timeout: echo 30 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout"
        print "    * Consider offloading callbacks: rcu_nocbs=<cpulist> on the kernel command line"
    }
    else if (type ~ /SCHED/) {
        print "  - Scheduler tuning:"
        print "    * Scheduling granularity: sysctl -w kernel.sched_min_granularity_ns=1000000"
        print "    * Scheduler stats: cat /proc/schedstat"
        print "    * Performance governor: cpupower frequency-set -g performance"
    }
    else if (type ~ /TASKLET/) {
        print "  - Tasklet tuning:"
        print "    * Check driver load: cat /proc/interrupts"
        print "    * Consider updating the relevant drivers"
        print "    * Check ksoftirqd CPU usage: top -p $(pgrep ksoftirqd | tr \"\\n\" \",\" | sed \"s/,$//\")"
    }
    else if (type ~ /BLOCK/ || type ~ /IO_/) {
        print "  - Block/IO tuning:"
        print "    * Queue depth: cat /sys/block/sd*/queue/nr_requests"
        print "    * IO scheduler: echo deadline > /sys/block/sd*/queue/scheduler"
        print "    * Disk load: iostat -x 1"
    }
    else {
        print "  - General:"
        print "    * Check logs: dmesg | grep -i error"
        print "    * Consider a kernel upgrade"
        print "    * Check hardware health: sensors, smartctl -a /dev/sdX"
    }
    print ""
}
' /proc/softirqs
#!/bin/bash
# node_analysis_report.sh
# Node CPU / load / disk analysis report (tabular, aligned output)
# Disk I/O is sampled twice at a 2-second interval via iostat
echo "============================================"
echo "Node CPU Load Analysis Report"
echo "============================================"
# ------------------- SYSTEM LOAD -------------------
echo "==== SYSTEM LOAD ===="
now=$(date +"%H:%M:%S")
uptime_info=$(uptime -p)
load1=$(uptime | awk -F'load average: ' '{print $2}' | awk -F',' '{print $1}')
cores=$(nproc)
printf "%s up %s, load average: %s\n" "$now" "$uptime_info" "$load1"
printf "CPU cores: %d\n" "$cores"
printf "1-min Load: %s\n\n" "$load1"
# ------------------- 1. TOP CPU PROCESS GROUPS -------------------
echo "============================================"
echo "1. TOP CPU PROCESS GROUPS (by COMMAND)"
printf "%-20s %-8s %-12s %-12s %-12s\n" "COMMAND" "CPU%" "CORES_USED" "PROC_COUNT" "VOL_CTX/NONVOL_CTX"
echo "--------------------------------------------------------------------------------"
ps -eo pid,comm,%cpu --no-headers | \
awk '
{
pid=$1
cmd=$2
cpu=$3
if (!(pid in pid_seen)) {
pid_seen[pid]=1
cpu_sum[cmd]+=cpu
proc_count[cmd]++
vol=0
nonvol=0
status_file="/proc/"pid"/status"
while((getline line < status_file) > 0){
if(line ~ /^voluntary_ctxt_switches:/) {split(line,a," "); vol=a[2]}
if(line ~ /^nonvoluntary_ctxt_switches:/) {split(line,a," "); nonvol=a[2]}
}
close(status_file)
vol_ctx[cmd]+=vol
nonvol_ctx[cmd]+=nonvol
}
}
END {
for(c in cpu_sum){
cores_used=cpu_sum[c]/100
printf "%-20s %-8.2f %-12.2f %-12s %-12s\n", c, cpu_sum[c], cores_used, proc_count[c], vol_ctx[c]"/"nonvol_ctx[c]
}
}' | sort -k2 -nr | head -20
# ------------------- 2. PROCESS GROUP COUNTS WITH THREADS -------------------
echo
echo "============================================"
echo "2. PROCESS GROUP COUNTS WITH THREADS"
printf "%-20s %-12s %-12s\n" "COMMAND" "PROC_COUNT" "THREAD_COUNT"
echo "------------------------------------------------"
declare -A PROC_COUNT
declare -A THREAD_COUNT
while read -r pid cmd; do
PROC_COUNT["$cmd"]=$(( ${PROC_COUNT["$cmd"]:-0} + 1 ))
if [[ -d "/proc/$pid/task" ]]; then
threads=$(ls -1 /proc/$pid/task | wc -l)
THREAD_COUNT["$cmd"]=$(( ${THREAD_COUNT["$cmd"]:-0} + threads ))
fi
done < <(ps -eo pid,comm --no-headers)
for cmd in "${!PROC_COUNT[@]}"; do
printf "%-20s %-12d %-12d\n" "$cmd" "${PROC_COUNT[$cmd]}" "${THREAD_COUNT[$cmd]}"
done | sort -k2 -nr | head -30
# ------------------- 3. PROCESS NAME/STATE SUMMARY -------------------
echo
echo "============================================"
echo "3. PROCESS NAME / STATE STATISTICS"
echo "COUNT NAME STATE"
echo "------------------------------------------------"
for pid in /proc/[0-9]*; do
status="$pid/status"
[[ -r "$status" ]] || continue
name=$(grep "^Name:" "$status" | awk '{print $2}')
state=$(grep "^State:" "$status" | awk '{print $2}')
[[ -n "$name" && -n "$state" ]] && echo "$name $state"
done \
| sort \
| uniq -c \
| sort -nk1 \
| awk '{printf "%-6s %-25s %-10s\n", $1, $2, $3}'
# ------------------- 4. DISK I/O STATISTICS -------------------
echo
echo "============================================"
echo "4. DISK I/O STATISTICS (2s interval, 2 samples)"
echo "-------------------------------------------------------------------------------------"
iostat -xz 2 2
# ------------------- 5. ANALYSIS HINTS -------------------
echo
echo "============================================"
echo "5. ANALYSIS HINTS"
echo "--------------------------------------------"
echo "- Process groups with high CPU% are likely bottlenecks; watch CORES_USED and PROC_COUNT"
echo "- High load with modest CPU% usually points to I/O wait or blocking"
echo "- If load > CPU cores, the system may be CPU-saturated"
echo "- Check DISK I/O %util and await for storage bottlenecks"
echo "- PIDs are de-duplicated so multi-threaded processes are not double-counted in CPU%"
echo "- Use top/htop/perf to drill into hot functions and system bottlenecks"
#!/bin/bash
# flannelcpuset.sh: NIC tuning (offload, IRQ affinity, socket buffers, RPS/RFS)
NIC="eth0"
CPULIST="2-15"
RMEM_MAX=16777216
WMEM_MAX=16777216
TCP_RMEM="4096 87380 16777216"
TCP_WMEM="4096 87380 16777216"
RPS_CPUS="ffff"          # hex cpumask: CPUs 0-15
RPS_FLOW_ENTRIES=32768
echo "=== NIC tuning: $NIC ==="
# -----------------------------
# 1. Show current defaults
# -----------------------------
echo "[1] Current defaults"
echo "[query] Offload:"
ethtool -k $NIC | grep -E 'tcp-segmentation|generic-segmentation|generic-receive'
echo "[query] IRQ:"
for irq in $(grep "$NIC" /proc/interrupts | awk -F: '{print $1}' | tr -d ' '); do
echo "$irq: $(cat /proc/irq/$irq/smp_affinity_list)"
done
echo "[query] TCP/UDP buffers:"
sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem
echo "[query] RPS/RFS:"
for q in /sys/class/net/$NIC/queues/rx-*; do
echo "$q: RPS=$(cat $q/rps_cpus) FLOW=$(cat $q/rps_flow_cnt)"
done
# -----------------------------
# 2. Apply tuned settings
# -----------------------------
echo "[2] Apply tuned settings"
echo "[set] Offload"
ethtool -K $NIC tso on gso on gro on
echo "[set] IRQ affinity"
for irq in $(grep "$NIC" /proc/interrupts | awk -F: '{print $1}' | tr -d ' '); do
CURRENT=$(cat /proc/irq/$irq/smp_affinity_list)
[ "$CURRENT" != "$CPULIST" ] && echo $CPULIST > /proc/irq/$irq/smp_affinity_list
done
echo "[set] TCP/UDP buffers"
CURRENT_RMEM=$(sysctl -n net.core.rmem_max)
CURRENT_WMEM=$(sysctl -n net.core.wmem_max)
if [ "$CURRENT_RMEM" -ne "$RMEM_MAX" ] || [ "$CURRENT_WMEM" -ne "$WMEM_MAX" ]; then
sysctl -w net.core.rmem_max=$RMEM_MAX
sysctl -w net.core.wmem_max=$WMEM_MAX
sysctl -w net.ipv4.tcp_rmem="$TCP_RMEM"
sysctl -w net.ipv4.tcp_wmem="$TCP_WMEM"
fi
echo "[set] RPS/RFS"
for q in /sys/class/net/$NIC/queues/rx-*; do
CURRENT_RPS=$(cat $q/rps_cpus)
CURRENT_FLOW=$(cat $q/rps_flow_cnt)
if [ "$CURRENT_RPS" != "$RPS_CPUS" ] || [ "$CURRENT_FLOW" != "$RPS_FLOW_ENTRIES" ]; then
echo $RPS_CPUS > $q/rps_cpus
echo $RPS_FLOW_ENTRIES > $q/rps_flow_cnt
fi
done
# -----------------------------
# 3. Show state after tuning
# -----------------------------
echo "[3] State after tuning"
echo "NIC offload state:"
ethtool -k $NIC | grep -E 'tcp-segmentation|generic-segmentation|generic-receive'
echo "IRQ affinity:"
for irq in $(grep "$NIC" /proc/interrupts | awk -F: '{print $1}' | tr -d ' '); do
echo "$irq: $(cat /proc/irq/$irq/smp_affinity_list)"
done
echo "TCP/UDP buffers:"
sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem
echo "RPS/RFS:"
for q in /sys/class/net/$NIC/queues/rx-*; do
echo "$q: RPS=$(cat $q/rps_cpus) FLOW=$(cat $q/rps_flow_cnt)"
done
echo "=== Tuning complete ==="
#!/bin/bash
# Hardware interrupt statistics (adapts dynamically to /proc/interrupts)
awk '
BEGIN {
    print "Interrupt report (grouped by controller and device)"
    print "================================================================"
    printf "%-20s %-30s %12s %10s\n", "Controller", "Device/description", "Count", "Share(%)"
    print "----------------------------------------------------------------"
}
NR == 1 {
    # The header line has one field per CPU
    ncpus = NF
    next
}
/^[ \t]*$/ { next }
/:/ && $1 ~ /^[0-9]+:/ {
    # Per-CPU counts occupy fields 2 .. ncpus+1
    sum = 0
    for (i = 2; i <= 1 + ncpus; i++) {
        sum += $i
    }
    if (sum == 0) next
    # Controller and device name are normally the last two fields
    controller = $(NF - 1)
    device = $NF
    # Re-join multi-word controller descriptions (e.g. MSI variants)
    if (controller ~ /^(PCI|MSI|IO-APIC|Reschedule|Function)/) {
        controller = ""
        for (j = NF - 2; j >= ncpus + 2 && $(j) !~ /^[0-9]+$/; j--) {
            controller = $(j) " " controller
        }
        gsub(/[ \t]+$/, "", controller)
    }
    total_irqs += sum
    key = controller "|" device
    counts[key] += sum
    ctrl_totals[controller] += sum
}
END {
    if (total_irqs == 0) {
        print "Error: no valid interrupt data was read"
        print "Possible causes:"
        print "1. /proc/interrupts does not match the expected format"
        print "2. the system has generated no hardware interrupts"
        print "3. root privileges are required"
        print "Sample of the raw file:"
        system("head -n 5 /proc/interrupts")
        exit 1
    }
    # @val_num_desc ordering requires gawk
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (key in counts) {
        split(key, parts, "|")
        ratio = (counts[key] * 100.0) / total_irqs
        printf "%-20s %-30s %12d %9.2f%%\n",
            parts[1], parts[2], counts[key], ratio
    }
    print "\nPer-controller totals:"
    print "-----------------------------------------------"
    for (ctrl in ctrl_totals) {
        ratio = (ctrl_totals[ctrl] * 100.0) / total_irqs
        printf "%-20s %12d %9.2f%%\n", ctrl, ctrl_totals[ctrl], ratio
    }
    print "==============================================="
    printf "%-20s %12d %10s\n", "Total interrupts", total_irqs, "100.00%"
    print "\nDebug info:"
    print "1. detected", ncpus, "CPU cores"
    print "2. processed", length(counts), "distinct interrupt sources"
}
' /proc/interrupts