watchdog (2)

On the hardlockup detection covered in watchdog (1):

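watchdog_overflow_callback (kernel/watchdog.c +/watchdog_overflow_callback) also calls is_hardlockup(), and this callback is registered with perf_event as an overflow handler. The registration happens in watchdog_nmi_enable, which calls perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);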

watchdog_nmi_enable
  |-->wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
  |-->perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);

wd_attr = &wd_hw_attr
static struct perf_event_attr wd_hw_attr = { 
    .type       = PERF_TYPE_HARDWARE,
    .config     = PERF_COUNT_HW_CPU_CYCLES,
    .size       = sizeof(struct perf_event_attr),
    .pinned     = 1,
    .disabled   = 1,
};
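To see why a CPU-cycles counter with this sample_period would fire once per watchdog_thresh seconds, here is a small worked sketch. It assumes the x86 version of hw_nmi_get_sample_period, which as far as I recall returns cpu_khz * 1000 * watchdog_thresh; other architectures may compute it differently. The helper names sample_period_cycles and period_seconds are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed x86 formula: number of CPU cycles between two counter overflows.
 * cpu_khz is the CPU frequency in kHz, so cpu_khz * 1000 is cycles per second.
 */
static uint64_t sample_period_cycles(uint64_t cpu_khz, unsigned int watchdog_thresh)
{
    return cpu_khz * 1000ULL * watchdog_thresh;
}

/* Inverse check: how many seconds one period lasts at the given frequency. */
static double period_seconds(uint64_t period, uint64_t cpu_khz)
{
    assert(cpu_khz > 0); /* avoid dividing by zero */
    return (double)period / ((double)cpu_khz * 1000.0);
}
```

For example, with a 2 GHz CPU (cpu_khz = 2000000) and watchdog_thresh = 10, the period is 20000000000 cycles, which takes 10 seconds to count while the CPU is running at full speed.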

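However, after tracing the code, it looks like this callback is never registered successfully (this is best confirmed on a real board). None of the PMUs registered in perf_event_init has an event_init that accepts events whose attribute type is PERF_TYPE_HARDWARE, so watchdog_overflow_callback should never be called.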
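One way to check this on the board without touching the kernel is to request the same kind of event from userspace. The sketch below is only a userspace analogue, not the kernel path: it opens a PERF_TYPE_HARDWARE / PERF_COUNT_HW_CPU_CYCLES event through the perf_event_open system call with the same attribute fields as wd_hw_attr. The helper names probe_hw_cycles and explain_probe are hypothetical. If no PMU accepts the event, I would expect ENOENT; a success suggests, but does not prove, that the kernel counter could be created as well.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/*
 * Try to open a CPU-cycles hardware counter for the calling thread.
 * Returns 0 on success, or the errno value on failure.
 */
static int probe_hw_cycles(void)
{
    struct perf_event_attr attr;
    long fd;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.size = sizeof(attr);
    attr.pinned = 1;
    attr.disabled = 1;

    /* pid = 0, cpu = -1: this thread on any CPU; no group leader, no flags. */
    fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return errno;
    close((int)fd);
    return 0;
}

/* Map the probe result to a short human-readable explanation. */
static const char *explain_probe(int err)
{
    assert(err >= 0); /* 0 or an errno value */
    switch (err) {
    case 0:
        return "hardware cycles event accepted by a PMU";
    case ENOENT:
        return "no registered PMU accepted the event";
    case EACCES:
    case EPERM:
        return "blocked by permissions";
    default:
        return "other failure";
    }
}
```

Calling explain_probe(probe_hw_cycles()) from a small main() and printing the string is enough for a first check. A permissions failure says nothing about the PMU itself; in that case the probe has to be run as root.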

watchdog_overflow_callback(struct perf_event *event,
                           struct perf_sample_data *data, struct pt_regs *regs)
  |-->if (__this_cpu_read(watchdog_nmi_touch) == true) ... skip this round of hardlockup detection
  |-->if (is_hardlockup()) ... handle the hardlockup
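The decision logic in the call tree above can be tried out in userspace. This is a minimal sketch, assuming is_hardlockup() works the way I remember it from kernel/watchdog.c: it compares the per-CPU hrtimer_interrupts counter with the value saved at the previous check. The struct cpu_state and the function overflow_callback are stand-ins invented here; the real code uses per-CPU variables and runs in NMI context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's per-CPU variables (one simulated CPU). */
struct cpu_state {
    unsigned long hrtimer_interrupts;       /* bumped by the watchdog hrtimer */
    unsigned long hrtimer_interrupts_saved; /* value seen at the previous check */
    bool watchdog_nmi_touch;                /* set when the watchdog is touched */
};

/* Hard lockup: the hrtimer counter has not moved since the last check. */
static bool is_hardlockup(struct cpu_state *cpu)
{
    unsigned long hrint = cpu->hrtimer_interrupts;

    if (cpu->hrtimer_interrupts_saved == hrint)
        return true;
    cpu->hrtimer_interrupts_saved = hrint;
    return false;
}

/* Returns true when this overflow would report a hard lockup. */
static bool overflow_callback(struct cpu_state *cpu)
{
    assert(cpu != NULL);
    if (cpu->watchdog_nmi_touch) {
        /* The watchdog was touched: clear the flag and skip this round. */
        cpu->watchdog_nmi_touch = false;
        return false;
    }
    return is_hardlockup(cpu);
}
```

As long as the hrtimer keeps incrementing hrtimer_interrupts between two overflows, the callback stays quiet. Two overflows in a row with an unchanged counter are reported as a hard lockup, unless watchdog_nmi_touch was set in between.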

 

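The documentation says: "An NMI perf event is generated every "watchdog_thresh" seconds to check for hardlockups." According to the code, however, this NMI perf event is not registered successfully. Why? I need to run it on the board to see whether my analysis is wrong.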

 

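Also: I still have not solved the system reboot caused by the watchdog not being fed. hardlockup and softlockup are related to the watchdog by name, and both can trigger a panic that may in turn reboot the system, but I suspect they have nothing to do with the watchdog problem I am chasing. It looks like my analysis has been heading in the wrong direction.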

posted on 2014-10-19 12:12  阿加
