watchdog (Part 2)
Regarding the hardlockup detection discussed in watchdog (Part 1):
In kernel/watchdog.c, watchdog_overflow_callback also calls is_hardlockup(), and watchdog_overflow_callback itself is registered with perf_event as a callback: watchdog_nmi_enable calls perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
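Whether the callback ever runs depends on whether some PMU accepts the event when the counter is created. The sketch below is a simplified userspace model of the lookup loop in perf_init_event() (kernel/events/core.c): each PMU's event_init is tried in turn, 0 means "accepted", -ENOENT means "not mine, try the next one", and any other error aborts. It omits the direct lookup by attr.type, locking and reference counting, and every `model_*` name is hypothetical. It only models the PMUs found in perf_event_init(); in mainline kernels a hardware PMU, if the architecture provides a driver for one, is registered separately by arch code through perf_pmu_register(), so that registration would also need to be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Values match enum perf_type_id in <linux/perf_event.h>. */
enum model_type {
    MODEL_TYPE_HARDWARE = 0,
    MODEL_TYPE_SOFTWARE = 1,
    MODEL_TYPE_TRACEPOINT = 2,
};

/* Hypothetical, trimmed-down stand-ins for struct perf_event_attr,
 * struct pmu and struct perf_event. */
struct model_attr { enum model_type type; };

struct model_pmu {
    const char *name;
    int (*event_init)(const struct model_attr *attr);
};

typedef void (*model_overflow_fn)(void);

struct model_event {
    const struct model_pmu *pmu;
    model_overflow_fn overflow_handler;
};

/* Hypothetical placeholder for watchdog_overflow_callback. */
static void model_overflow_stub(void) {}

/* A software-only PMU answers -ENOENT for any type it does not own. */
static int model_swevent_init(const struct model_attr *attr)
{
    return attr->type == MODEL_TYPE_SOFTWARE ? 0 : -ENOENT;
}

static int model_tracepoint_init(const struct model_attr *attr)
{
    return attr->type == MODEL_TYPE_TRACEPOINT ? 0 : -ENOENT;
}

/* Simplified lookup loop: first PMU returning 0 wins, -ENOENT means
 * "try the next PMU", any other error stops the search.
 * Returns 0 and sets *out, or a negative errno. */
static int model_init_event(const struct model_pmu *pmus, size_t n,
                            const struct model_attr *attr,
                            const struct model_pmu **out)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        int ret = pmus[i].event_init(attr);
        if (ret == 0) {
            *out = &pmus[i];
            return 0;
        }
        if (ret != -ENOENT)
            return ret;
    }
    return -ENOENT;
}

/* Model of perf_event_create_kernel_counter(): the overflow handler is
 * attached only if some PMU accepted the event. */
static int model_create_counter(const struct model_pmu *pmus, size_t n,
                                const struct model_attr *attr,
                                model_overflow_fn handler,
                                struct model_event *ev)
{
    const struct model_pmu *pmu = NULL;
    int ret = model_init_event(pmus, n, attr, &pmu);
    if (ret)
        return ret; /* no event exists, so the handler can never fire */
    ev->pmu = pmu;
    ev->overflow_handler = handler;
    return 0;
}
```

With a PMU table that contains only the software and tracepoint PMUs, creating a MODEL_TYPE_HARDWARE counter returns -ENOENT and the handler is never attached, which is the situation suspected here.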
The call path in watchdog_nmi_enable:

watchdog_nmi_enable
    |--> wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
    |--> perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);

where wd_attr = &wd_hw_attr, defined as:

    static struct perf_event_attr wd_hw_attr = {
        .type     = PERF_TYPE_HARDWARE,
        .config   = PERF_COUNT_HW_CPU_CYCLES,
        .size     = sizeof(struct perf_event_attr),
        .pinned   = 1,
        .disabled = 1,
    };

However, after tracing through the code, it appears that this callback is never registered successfully (this is best confirmed on a real board). None of the PMUs registered in perf_event_init has an event_init that handles events whose attribute type is PERF_TYPE_HARDWARE, so watchdog_overflow_callback should never be called.

watchdog_overflow_callback(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
    |--> if (__this_cpu_read(watchdog_nmi_touch) == true) ... skip this round of hardlockup detection
    |--> if (is_hardlockup()) ... handle the hardlockup
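The documentation states: "An NMI perf event is generated every "watchdog_thresh" seconds to check for hardlockups." According to the code, however, this NMI perf event is never registered successfully. Why? I need to run it on a board to see whether this analysis is wrong.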
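One way to check this on the board is to ask the kernel, from userspace, for the same kind of event that wd_hw_attr describes. The sketch below calls perf_event_open(2) through syscall(2), since glibc provides no wrapper, with PERF_TYPE_HARDWARE / PERF_COUNT_HW_CPU_CYCLES, pinned and disabled, counting on a single CPU. It is only an approximation of what watchdog_nmi_enable does in the kernel, and the helper names `model_perf_event_open` and `probe_hw_cycles` are hypothetical. Opening a per-CPU event for all tasks typically requires root or a low /proc/sys/kernel/perf_event_paranoid. It may also be worth checking whether CONFIG_HARDLOCKUP_DETECTOR is enabled in the board's config, because the real watchdog_nmi_enable() is only compiled in under that option.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no perf_event_open() wrapper, so go through syscall(2). */
static int model_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                 int cpu, int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Try to open an event shaped like wd_hw_attr on one CPU.
 * Returns 0 if the kernel accepted it, otherwise the negative errno. */
static int probe_hw_cycles(int cpu)
{
    struct perf_event_attr attr;

    assert(cpu >= 0); /* pid == -1 requires a real CPU number */
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.size = sizeof(attr);
    attr.pinned = 1;
    attr.disabled = 1;

    /* pid = -1, cpu = N: count every task running on CPU N. */
    int fd = model_perf_event_open(&attr, -1, cpu, -1, 0);
    if (fd < 0)
        return -errno; /* read errno before any other call can clobber it */
    close(fd);
    return 0;
}
```

Run as root on the board, a result of 0 means some PMU accepted a hardware cycles event, which would contradict the analysis above. According to the perf_event_open(2) man page, ENOENT is returned for an invalid type and for some unsupported generic events, and EOPNOTSUPP when the required hardware support is missing; either would be consistent with the analysis. EACCES or EPERM only indicates insufficient privileges.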
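One more thing: this still does not explain the system reboot caused by the watchdog not being fed. Hardlockup and softlockup have names tied to the watchdog, and both can trigger a panic that may in turn reboot the system, but I still feel they have nothing to do with the watchdog problem I am chasing. It looks like I have been analyzing in the wrong direction.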