IPC + Softirq + Hard IRQ

Here is the complete, consolidated comparison:

A vs B: full comparison of IPC + softirq + hard IRQ (core differences at a glance)


🔥 1. IPC comparison (perf stat)

| Metric | A | B | Conclusion |
|---|---|---|---|
| IPC (instructions per cycle) | 0.77 | 0.76 | Both are low (a healthy workload is usually ≥ 1.2); B is slightly lower than A. |
| CPU frequency | 1.777 GHz | 2.184 GHz | B runs at a higher clock but gains little efficiency → scheduling/interrupt bottleneck. |
| context-switches | 857K | 300K | A switches roughly 3× more → more threads / more jitter. |
| cpu-migrations | low (42K) | very high (167K) | B migrates far more → higher scheduler overhead (matches its high SCHED softirq). |

⭐ IPC key takeaways

  • A: slowed down mainly by interrupts and RCU

  • B: slowed down mainly by scheduling and NET_TX / tasklets
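The numbers above are standard `perf stat` counters. For reference, a hedged sketch of a system-wide sample that would produce these metrics (the 10-second window and event list are assumptions, not the exact command used here):

```bash
# 10 s system-wide sample: instructions/cycles yields IPC, and perf also
# reports the effective clock (GHz) plus context-switch / migration counts.
perf stat -a \
  -e instructions,cycles,context-switches,cpu-migrations \
  -- sleep 10
```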


🔥 2. Softirq comparison

## (1) Softirq shares on A

RCU       34.35%
NET_RX    21.35%
SCHED     20.39%
TIMER     20.00%
NET_TX     0.10%
TASKLET    3.78%

## (2) Softirq shares on B

SCHED     24.48%
RCU       23.33%
NET_RX    17.57%
TIMER     14.26%
NET_TX    11.86%
TASKLET    8.37%

## (3) Key softirq differences (direct conclusions)

| Softirq | A | B | Conclusion |
|---|---|---|---|
| RCU | 34.35% (much higher, +11 pp) | 23.33% | Heavy RCU callback pressure on A (kernel side is busy). |
| NET_RX | 21.35% (higher, +4 pp) | 17.57% | A receives more packets. |
| TIMER | 20.00% (higher, +6 pp) | 14.26% | A carries a heavier kernel timer load. |
| NET_TX | 0.10% (very low) | 11.86% | B clearly transmits more (active sending). |
| TASKLET | 3.78% | 8.37% | B's driver layer is busier. |
| SCHED | 20.39% | 24.48% | B has heavier thread/scheduling load. |

⭐ Softirq key takeaways

  • A: pressure skews toward receiving (RX / RCU / TIMER → heavy system-level load)

  • B: pressure skews toward transmit/scheduling (TX / tasklet / SCHED → heavy application-thread and NIC-transmit load)
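Note that the shares above come from the cumulative counters in /proc/softirqs (totals since boot). To see the current mix rather than the whole history, one option is to diff two snapshots; a minimal sketch, with the 5-second interval and /tmp paths as assumptions:

```bash
# Diff two /proc/softirqs snapshots taken 5 s apart -> per-type rates (events/s).
cp /proc/softirqs /tmp/softirqs.1; sleep 5; cp /proc/softirqs /tmp/softirqs.2
awk 'NR==FNR { if (FNR > 1) { t=$1; sub(/:$/,"",t); for (i=2;i<=NF;i++) a[t]+=$i } next }
     FNR > 1 { t=$1; sub(/:$/,"",t); s=0; for (i=2;i<=NF;i++) s+=$i
               printf "%-10s %12.0f/s\n", t, (s-a[t])/5 }' \
    /tmp/softirqs.1 /tmp/softirqs.2 | sort -k2 -nr
```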


🔥 3. Hard interrupt (IRQ) comparison (the most critical part)

## (1) Characteristics of A

  • Total interrupts: 534,798,109,781

  • CPUs: 47 cores

  • IRQ sources: 247

  • Interrupts concentrate in the 68157xxx-edge entries, each typically in the 1%–5% range;
    the distribution is well spread out, and NUMA / IRQ balance looks OK

📌 A's hard interrupts are well balanced, with no obvious hotspot.
This further confirms:
→ A's bottleneck comes from softirq (RCU, NET_RX), not from hard IRQs.
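A quick way to double-check that balance claim is to total the numbered (device) IRQ columns per CPU; a hedged sketch assuming the usual /proc/interrupts layout (a header of CPU labels, then one count per CPU per line):

```bash
# Per-CPU totals of the numbered device IRQ lines in /proc/interrupts;
# a hot CPU would stand out with a much larger total than its peers.
awk 'NR==1 { ncpus = NF; next }
     $1 ~ /^[0-9]+:$/ { for (i = 2; i <= ncpus + 1; i++) percpu[i-2] += $i }
     END { for (c = 0; c < ncpus; c++) printf "CPU%-4d %15d\n", c, percpu[c] }' /proc/interrupts
```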


## (2) Characteristics of B

  • Total interrupts: 1,734,028,374,169 (3.2× A)

  • CPUs: 39 cores (fewer than A)

  • IRQ sources: 142 (fewer than A)

  • Interrupts concentrate in the 41943xx-edge entries, each around 2%–3%

  • The overall IRQ count is far larger than A's (about 3×)

📌 B handles roughly 3× the hard interrupts with fewer CPUs → per-core IRQ pressure explodes (≈1.73e12 / 39 ≈ 4.4e10 per core on B vs ≈5.35e11 / 47 ≈ 1.1e10 on A, about 4×).

This matches the softirq results closely:

  • B's NET_TX, TASKLET, and SCHED are all elevated:
    driver path + too many hard interrupts → a heavier softirq processing chain
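One hedged way to confirm this per-core pressure directly is to watch per-CPU hard-IRQ and softirq time (mpstat/pidstat come from the sysstat package; the 3-second interval is an assumption):

```bash
# %irq / %soft per CPU over one 3 s interval.
mpstat -P ALL 3 1
# Softirq work deferred to ksoftirqd threads shows up here as well.
pidstat -t -p "$(pgrep -d, ksoftirqd)" 3 1
```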


⭐ Final combined comparison table (the one-shot summary)

| Dimension | A | B | Conclusion |
|---|---|---|---|
| IPC | 0.77 | 0.76 | Both low; B slightly lower. |
| context-switches | 857K | 300K | A switches threads more; scheduling is jitterier. |
| cpu-migrations | 42K | 167K (very high) | B has heavier scheduling pressure. |
| RCU softirq | 34% (very high) | 23% | A has heavy kernel-side pressure. |
| NET_RX | 21% | 18% | A receives more packets, i.e. is "hit harder" by inbound traffic. |
| NET_TX | very low (0.1%) | 11.8% | B transmits more, i.e. does more active work. |
| TASKLET | 3.8% | 8.3% | B's driver layer is very busy. |
| SCHED | 20% | 24% | B has heavier thread-scheduling pressure. |
| TIMER | 20% | 14% | A has more timer interrupts. |
| Total hard interrupts | 5.3e11 | 1.7e12 (3.2×) | B's hard-interrupt volume is explosively higher. |
| IRQ sources | 247 | 142 | A has more devices/queues but even interrupts; B has fewer but heavier sources. |
| CPU cores | 47 | 39 | B has fewer cores but more interrupts → a heavier per-core burden. |

🔥 Final three-sentence conclusion (critical)

① A's bottleneck: RCU + NET_RX → mainly kernel load, mostly passive (inbound) pressure.

② B's bottleneck: NET_TX + TASKLET + SCHED + more interrupts → network-driver plus application-thread scheduling pressure.

③ B takes 3× A's hard interrupts on fewer CPUs, which is the core reason its softirq shares (NET_TX / TASKLET / SCHED) are the highest.


Possible follow-ups:
✔ Infer the concrete cause from the ES / Logstash workload
✔ An irqbalance tuning plan
✔ A script to NUMA-bind the NIC queues
✔ A deep dive into the instruction paths with softirq + perf top (a sketch follows below)
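For that last item, a hedged sketch of what such a perf session could look like (duration and the symbol filter are assumptions):

```bash
# Sample kernel+user stacks system-wide for 30 s, then pull out softirq-side symbols.
perf record -a -g -- sleep 30
perf report --stdio --sort symbol | grep -E 'softirq|net_rx_action|net_tx_action|rcu' | head -20

# Live view of the hottest kernel-only functions:
perf top -e cycles:k
```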

 

#!/bin/bash

# Fully dynamic softirq statistics: parses /proc/softirqs and reports per-type shares
awk '
BEGIN {
    # initialize all counters
    delete softirq_counts
    delete softirq_names
    num_types = 0
}
{
    if (NR == 1) {
        # the header line lists only the CPU columns, so the CPU count equals NF
        ncpus = NF
        next
    }
    
    # the softirq type is the first field of each data row
    current_type = $1
    gsub(/:/, "", current_type)  # strip the trailing colon
    
    # remember newly seen types in the order they appear
    if (!(current_type in softirq_counts)) {
        num_types++
        softirq_names[num_types] = current_type
    }
    
    # accumulate this type's counts across all CPU columns
    for (i=2; i<=NF; i++) {
        softirq_counts[current_type] += $i
    }
}
END {
    # grand total across all softirq types
    total = 0
    for (type in softirq_counts) {
        total += softirq_counts[type]
    }
    
    if (total == 0) {
        print "错误: 未读取到任何软中断数据"
        print "可能原因:"
        print "1. /proc/softirqs 文件格式不符合预期"
        print "2. 没有足够的权限读取文件"
        print "3. 系统未产生任何软中断"
        exit 1
    }
    
    # print the detailed report
    printf "%-12s %15s %10s\n", "TYPE", "TOTAL", "SHARE"
    printf "==========================================\n"
    fflush()   # flush our own stdout so the header stays above the sort output
    
    # pipe the data rows through sort so they print ordered by count (descending)
    sort_cmd = "sort -k2,2nr"
    for (i=1; i<=num_types; i++) {
        type = softirq_names[i]
        if (type in softirq_counts) {
            count = softirq_counts[type]
            ratio = (count / total) * 100
            printf "%-12s %15d %9.2f%%\n", type, count, ratio | sort_cmd
        }
    }
    close(sort_cmd)
    
    printf "==========================================\n"
    printf "%-12s %15d %10s\n", "总计", total, "100.00%"
    
    # high-level diagnostic hints for dominant softirq types
    print "\nDiagnostic hints:"
    for (type in softirq_counts) {
        if (softirq_counts[type] > total * 0.3) {  # any type above 30% of the total
            printf "High-load type: %s (%.2f%%)\n", type, (softirq_counts[type]/total)*100
            print_suggestions(type)
        }
    }
}

function print_suggestions(type) {
    if (type ~ /NET_/) {
        print "  - Network tuning hints:"
        print "    * Check NIC counters: ethtool -S eth0"
        print "    * Increase queue count: ethtool -L eth0 combined <N>"
        print "    * Spread RSS across queues: ethtool -X eth0 equal <N>"
        print "    * Check NIC interrupt balance: cat /proc/interrupts | grep eth0"
    }
    else if (type == "TIMER" || type ~ /TIMER/) {
        print "  - Timer tuning hints:"
        print "    * Check the clock source: cat /sys/devices/system/clocksource/clocksource0/current_clocksource"
        print "    * Consider the TSC clock source: echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource"
        print "    * Tune tickless mode: add nohz_full=<cpulist> to the kernel boot parameters"
    }
    else if (type == "RCU" || type ~ /RCU/) {
        print "  - RCU tuning hints:"
        print "    * Adjust the RCU stall timeout: echo 30 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout"
        print "    * Check RCU state: cat /proc/rcu/rcu*/gpstats"
        print "    * Consider tuning RCU callback batching (kernel parameter rcutree.rcu_min_cb_interval)"
    }
    else if (type == "SCHED" || type ~ /SCHED/) {
        print "  - Scheduler tuning hints:"
        print "    * Adjust scheduling granularity: sysctl -w kernel.sched_min_granularity_ns=1000000"
        print "    * Check scheduler statistics: cat /proc/schedstat"
        print "    * Consider the performance governor: cpupower frequency-set -g performance"
    }
    else if (type == "TASKLET" || type ~ /TASKLET/) {
        print "  - Tasklet tuning hints:"
        print "    * Check driver load: cat /proc/interrupts"
        print "    * Consider upgrading the relevant hardware drivers"
        print "    * Check ksoftirqd CPU usage: top -p $(pgrep ksoftirqd | tr \"\\n\" \",\" | sed \"s/,$//\")"
    }
    else if (type ~ /BLOCK/ || type ~ /IO_/) {
        print "  - Block device / I/O tuning hints:"
        print "    * Check device queue depth: cat /sys/block/sd*/queue/nr_requests"
        print "    * Tune the I/O scheduler: echo deadline > /sys/block/sdX/queue/scheduler"
        print "    * Check disk load: iostat -x 1"
    }
    else {
        print "  - General hints:"
        print "    * Check system logs: dmesg | grep -i error"
        print "    * Consider upgrading the kernel"
        print "    * Check hardware health: sensors, smartctl -a /dev/sdX"
    }
    print ""
}
' /proc/softirqs

 

#!/bin/bash
# node_analysis_report.sh
# Node CPU / load / disk analysis report (complete, aligned, table-style output)
# Disk I/O is sampled via iostat at a 2-second interval, 2 samples

echo "============================================"
echo "Node CPU Load Analysis Report"
echo "============================================"

# ------------------- SYSTEM LOAD -------------------
echo "==== SYSTEM LOAD ===="
now=$(date +"%H:%M:%S")
uptime_info=$(uptime -p)
load1=$(uptime | awk -F'load average: ' '{print $2}' | awk -F',' '{print $1}')
cores=$(nproc)

printf "%s up %s, load average: %s\n" "$now" "$uptime_info" "$load1"
printf "CPU cores: %d\n" "$cores"
printf "1-min Load: %s\n\n" "$load1"


# ------------------- 1. TOP CPU PROCESS GROUPS -------------------
echo "============================================"
echo "1. TOP CPU PROCESS GROUPS (by COMMAND)"
printf "%-20s %-8s %-12s %-12s %-12s\n" "COMMAND" "CPU%" "CORES_USED" "PROC_COUNT" "VOL_CTX/NONVOL_CTX"
echo "--------------------------------------------------------------------------------"

ps -eo pid,comm,%cpu --no-headers | \
awk '
{
    pid=$1
    cmd=$2
    cpu=$3
    if (!(pid in pid_seen)) {
        pid_seen[pid]=1
        cpu_sum[cmd]+=cpu
        proc_count[cmd]++

        vol=0
        nonvol=0
        status_file="/proc/"pid"/status"
        while((getline line < status_file) > 0){
            if(line ~ /^voluntary_ctxt_switches:/) {split(line,a," "); vol=a[2]}
            if(line ~ /^nonvoluntary_ctxt_switches:/) {split(line,a," "); nonvol=a[2]}
        }
        close(status_file)

        vol_ctx[cmd]+=vol
        nonvol_ctx[cmd]+=nonvol
    }
}
END {
    for(c in cpu_sum){
        cores_used=cpu_sum[c]/100
        printf "%-20s %-8.2f %-12d %-12s %-12s\n", c, cpu_sum[c], cores_used, proc_count[c], vol_ctx[c]"/"nonvol_ctx[c]
    }
}' | sort -k2 -nr | head -20


# ------------------- 2. PROCESS GROUP COUNTS WITH THREADS -------------------
echo
echo "============================================"
echo "2. PROCESS GROUP COUNTS WITH THREADS"
printf "%-20s %-12s %-12s\n" "COMMAND" "PROC_COUNT" "THREAD_COUNT"
echo "------------------------------------------------"

declare -A PROC_COUNT
declare -A THREAD_COUNT

while read -r pid cmd; do
    PROC_COUNT["$cmd"]=$(( ${PROC_COUNT["$cmd"]:-0} + 1 ))
    if [[ -d "/proc/$pid/task" ]]; then
        threads=$(ls -1 /proc/$pid/task | wc -l)
        THREAD_COUNT["$cmd"]=$(( ${THREAD_COUNT["$cmd"]:-0} + threads ))
    fi
done < <(ps -eo pid,comm --no-headers)

for cmd in "${!PROC_COUNT[@]}"; do
    printf "%-20s %-12d %-12d\n" "$cmd" "${PROC_COUNT[$cmd]}" "${THREAD_COUNT[$cmd]}"
done | sort -k2 -nr | head -30


# ------------------- 3. PROCESS NAME / STATE SUMMARY (new module) -------------------
echo
echo "============================================"
echo "3. PROCESS NAME / STATE STATISTICS"
echo "COUNT  NAME                     STATE"
echo "------------------------------------------------"

for pid in /proc/[0-9]*; do
    status="$pid/status"
    [[ -r "$status" ]] || continue

    name=$(grep "^Name:" "$status" | awk '{print $2}')
    state=$(grep "^State:" "$status" | awk '{print $2}')

    [[ -n "$name" && -n "$state" ]] && echo "$name $state"
done \
| sort \
| uniq -c \
| sort -nk1 \
| awk '{printf "%-6s %-25s %-10s\n", $1, $2, $3}'


# ------------------- 4. DISK I/O STATISTICS -------------------
echo
echo "============================================"
echo "4. DISK I/O STATISTICS (2s interval, 2 samples)"
echo "-------------------------------------------------------------------------------------"

iostat -xz 2 2


# ------------------- 5. ANALYSIS HINTS -------------------
echo
echo "============================================"
echo "5. ANALYSIS HINTS"
echo "--------------------------------------------"
echo "- CPU% 高的进程组可能是性能瓶颈,关注 CORES_USED 和 PROC_COUNT"
echo "- Load 高但 CPU% 不高,可能是 I/O 等待或阻塞"
echo "- 如果 load > CPU cores,总体系统可能 CPU 饱和"
echo "- 检查 DISK I/O %util、await,分析是否存在瓶颈"
echo "- 多线程进程已去重 PID,避免重复累加 CPU%"
echo "- 可结合 top/htop/perf 等工具进一步分析热点函数和系统瓶颈"
/opt/flannelcpuset.sh:
#!/bin/bash

NIC="eth0"
CPULIST="2-15"
RMEM_MAX=16777216
WMEM_MAX=16777216
TCP_RMEM="4096 87380 16777216"
TCP_WMEM="4096 87380 16777216"
RPS_CPUS="ffff"
RPS_FLOW_ENTRIES=32768

echo "=== 网络优化: $NIC ==="

# -----------------------------
# 1. Show current values
# -----------------------------
echo "[1] Current values"

echo "[query] Offload:"
ethtool -k $NIC | grep -E 'tcp-segmentation|generic-segmentation|generic-receive'

echo "[查询] IRQ:"
for irq in $(grep "$NIC" /proc/interrupts | awk -F: '{print $1}' | tr -d ' '); do
    echo "$irq: $(cat /proc/irq/$irq/smp_affinity_list)"
done

echo "[查询] TCP/UDP buffer:"
sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem

echo "[查询] RPS/RFS:"
for q in /sys/class/net/$NIC/queues/rx-*; do
    echo "$q: RPS=$(cat $q/rps_cpus) FLOW=$(cat $q/rps_flow_cnt)"
done

# -----------------------------
# 2. Apply tuning parameters
# -----------------------------
echo "[2] Applying tuning parameters"

echo "[set] Offload"
ethtool -K $NIC tso on gso on gro on

echo "[设置] IRQ 绑核"
for irq in $(grep "$NIC" /proc/interrupts | awk -F: '{print $1}' | tr -d ' '); do
    CURRENT=$(cat /proc/irq/$irq/smp_affinity_list)
    [ "$CURRENT" != "$CPULIST" ] && echo $CPULIST > /proc/irq/$irq/smp_affinity_list
done

echo "[设置] TCP/UDP buffer"
CURRENT_RMEM=$(sysctl -n net.core.rmem_max)
CURRENT_WMEM=$(sysctl -n net.core.wmem_max)
if [ "$CURRENT_RMEM" -ne "$RMEM_MAX" ] || [ "$CURRENT_WMEM" -ne "$WMEM_MAX" ]; then
    sysctl -w net.core.rmem_max=$RMEM_MAX
    sysctl -w net.core.wmem_max=$WMEM_MAX
    sysctl -w net.ipv4.tcp_rmem="$TCP_RMEM"
    sysctl -w net.ipv4.tcp_wmem="$TCP_WMEM"
fi

echo "[设置] RPS/RFS"
for q in /sys/class/net/$NIC/queues/rx-*; do
    CURRENT_RPS=$(cat $q/rps_cpus)
    CURRENT_FLOW=$(cat $q/rps_flow_cnt)
    if [ "$CURRENT_RPS" != "$RPS_CPUS" ] || [ "$CURRENT_FLOW" != "$RPS_FLOW_ENTRIES" ]; then
        echo $RPS_CPUS > $q/rps_cpus
        echo $RPS_FLOW_ENTRIES > $q/rps_flow_cnt
    fi
done

# -----------------------------
# 3. Show state after tuning
# -----------------------------
echo "[3] State after tuning"

echo "NIC offload state:"
ethtool -k $NIC | grep -E 'tcp-segmentation|generic-segmentation|generic-receive'

echo "IRQ 绑定:"
for irq in $(grep "$NIC" /proc/interrupts | awk -F: '{print $1}' | tr -d ' '); do
    echo "$irq: $(cat /proc/irq/$irq/smp_affinity_list)"
done

echo "TCP/UDP buffer:"
sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem

echo "RPS/RFS:"
for q in /sys/class/net/$NIC/queues/rx-*; do
    echo "$q: RPS=$(cat $q/rps_cpus) FLOW=$(cat $q/rps_flow_cnt)"
done

echo "=== 优化完成 ==="

  

  

#!/bin/bash
# Hardware interrupt statistics (dynamically adapts to the local /proc/interrupts layout)

awk '
BEGIN {
    print "中断统计报告(按中断控制器和设备分类)"
    print "================================================================"
    printf "%-20s %-30s %12s %10s\n", "控制器类型", "设备/描述", "中断计数", "占比(%)"
    print "----------------------------------------------------------------"
}
NR == 1 {
    # the header line lists only the CPU columns, so the CPU count equals NF
    ncpus = NF
    next
}
/^[ \t]*$/ { next }
/:/ && $1 ~ /^[0-9]+:/ {
    irq_num = $1
    irq_num = substr(irq_num, 1, length(irq_num) - 1)

    # sum this IRQ's counts across all CPU columns
    sum = 0
    for (i = 2; i <= 1 + ncpus; i++) {
        sum += $i
    }
    if (sum == 0) next

    # controller type and device name (second-to-last and last fields)
    controller = $(NF - 1)
    device = $NF

    # merge fields when the controller description contains spaces (e.g. MSI variants)
    if (controller ~ /^(PCI|MSI|IO-APIC|Reschedule|Function)/) {
        controller = ""
        for (j = NF - 2; j >= 2 + ncpus && $(j) !~ /^[0-9]+$/; j--) {
            controller = $(j) " " controller
        }
        gsub(/[ \t]+$/, "", controller)
    }

    total_irqs += sum
    key = controller "|" device
    counts[key] += sum
    ctrl_totals[controller] += sum
    raw_lines[key] = $0
}
END {
    if (total_irqs == 0) {
        print "错误:未能读取有效中断数据"
        print "可能原因:"
        print "1. /proc/interrupts 文件格式不符合预期"
        print "2. 当前系统没有产生硬件中断"
        print "3. 需要 root 权限访问该数据"
        print "原始文件样例:"
        system("head -n 5 /proc/interrupts")
        exit 1
    }

    PROCINFO["sorted_in"] = "@val_num_desc"
    for (key in counts) {
        split(key, parts, "|")
        ratio = (counts[key] * 100.0) / total_irqs
        printf "%-20s %-30s %12d %9.2f%%\n",
               parts[1], parts[2], counts[key], ratio
    }

    print "\n中断控制器汇总:"
    print "-----------------------------------------------"
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (ctrl in ctrl_totals) {
        ratio = (ctrl_totals[ctrl] * 100.0) / total_irqs
        printf "%-20s %12d %9.2f%%\n", ctrl, ctrl_totals[ctrl], ratio
    }

    print "==============================================="
    printf "%-20s %12d %10s\n", "总中断数", total_irqs, "100.00%"

    print "\n调试信息:"
    print "1. 检测到", ncpus, "个CPU核心"
    print "2. 共处理了", length(counts), "个有效中断源"
}
' /proc/interrupts
