# The Road to Cloud Computing on Alibaba Cloud: The "Black 1 Second" Problem and the Story of a 2009 Xen Patch

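(Note 1: The Xen patch issue described here only offers one way of thinking about the problem. The "black 1 second" problem we ran into has nothing to do with whether this patch was applied.)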

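(Note 2: For the story behind this Xen patch, we recommend the post shared by Alibaba Cloud: "云计算之路：2009年Xen一个补丁背后那不为人知的故事", roughly "The Road to Cloud Computing: The Untold Story Behind a 2009 Xen Patch".)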

On March 20, 2009, Yu Ke of Intel sent an email through the xen-devel mailing list to Keir Fraser of Citrix (one of the developers responsible for Xen), submitting a Xen patch titled "cpuidle: suspend/resume scheduler tick timer during cpu idle entry/exit". The email read:

cpuidle can collaborate with scheduler to reduce unnecessary timer interrupt. For example, credit scheduler accounting timer doesn't need to be active at
idle time, so it can be stopped at cpuidle entry and resumed at cpuidle exit. This patch implements this function by adding two ops in scheduler:
tick_suspend/tick_resume, and implement them for credit scheduler.

With this patch, under idle scenario, timer interrupt frequency decreased from ~100HZ to ~10HZ, and average C state residency increase from ~10ms to larger than 100ms. Also in a two-socket machine, about 4% idle power saving is observed.

However, one issue is observed with this patch, i.e. there is soft-lockup in dom0 occasionally. This issue is still under debugging. Currently we already find a >1s VCPUOP_set_singleshot_timer timeout, which imply this may be a dom0 issue. we are working hard to figure the root cause.

Considering the very visible effect of this patch, and the issue mentioned above only occurs when cpuidle is enabled, and has no impact to normal user, we
finally decide to send out this patch to see if it is possible for 3.4 inclusion. In the bug fix phase, we will send out bug fix for the issue.

Keir Fraser replied:

I don't really want to take the patch while it is soft locking up. I would expect linux-2.6.18-xen.hg:22 to avoid lockup warnings due to too long
singleshot timeouts (I assume you are testing with the 2.6.18 tree?).

Personally I would rather have cpuidle be enabled by default (or even always with no disable option) and get existing Cx benefits for everyone, rather
than have a slightly broken cpuidle option.

Is there a reason not to turn on cpuidle by default now? Or even enable and
then remove the boot option?

Yu Ke answered:

Right, I am testing it with 2.6.18 tree. I am also looking into the dom0 code, to see if should change dom0.

There is no obvious obstacle to turn on cpuidle by default. According to our testing and measurement, cpuidle is pretty stable now, maybe it is time to enable it by default.

Keir Fraser's insistence persuaded Yu Ke, who decided to track down and fix the bug. He, too, wanted cpuidle to be enabled by default.

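On March 31, 2009, Keir Fraser committed Yu Ke's finished patch to the Xen repository. The commit said nothing about the bug.

The fix itself had arrived a few days earlier: on March 26, 2009, Yu Ke emailed Keir Fraser with the corrected code: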

Hi Keir,

This attached is the version 2 of the patch. The major update is fixing the soft-lockup issue.

The root cause is: sched_tick_suspend will call __stop_timer and may raise TIMER_SOFTIRQ if the timer deadline is changed. In this case, the assumption of
no softirq pending in acpi_processor_idle is broken, and later the hpet broadcast wakeup IPI will not be delivered to this CPU, since its softirq pending bit is set. Then the CPU will be sleeping until random external interrupt happen. To fix this issue, the sched_tick_suspend is moved before the softirq pending bit checking, to keep the assumption correct.

I also measure the performance by SPECJbb in dom0, no performance degradation observed.

Best Regards
Ke

    if ( softirq_pending(smp_processor_id()) )
        do_softirq();

    sched_tick_suspend();
    //...

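The fix for the "black 1 second" problem triggered by that 2009 Xen patch turned out to be surprisingly simple: swap the softirq_pending check and the sched_tick_suspend() call so that sched_tick_suspend() runs first.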

