Duff's device
What is "Duff's Device"? It is a wonderfully devious way of unrolling a loop, devised by Tom Duff while he was at Lucasfilm. In its "classic" form, it was used to copy a sequence of bytes:
register int n = (count + 7) / 8;   /* count > 0 assumed */
switch (count % 8)
{
case 0: do { *to = *from++;
case 7:      *to = *from++;
case 6:      *to = *from++;
case 5:      *to = *from++;
case 4:      *to = *from++;
case 3:      *to = *from++;
case 2:      *to = *from++;
case 1:      *to = *from++;
        } while (--n > 0);
}
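Here count bytes are copied from the array pointed to by from to the memory location pointed to by to (which is a memory-mapped output register, and that is why to is not incremented). The code interleaves a switch statement with a loop that copies 8 bytes per pass, which takes care of the leftover bytes when count is not a multiple of 8: the switch jumps into the middle of the first pass so that only count % 8 copies run there, and every later pass runs all 8. Believe it or not, it is legal to place case labels inside a block nested within a switch statement like this. When he announced the technique to C's developers and to the world, Duff noted that C's switch syntax, and in particular its "fall through" behavior, had long been controversial, and that "This code forms some sort of argument in that debate, but I'm not sure whether it's for or against."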
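To make the interleaving easier to follow, here is a minimal sketch that puts a memory-to-memory variant of the device next to the same work written as two ordinary loops: first the count % 8 leftover bytes, then whole groups of eight. The names duff_copy, plain_unrolled_copy and copies_agree are made up for this illustration and do not come from Duff's original code; both copy routines assume count > 0, as the original does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Duff's device, memory-to-memory variant: both pointers advance.
 * Like the original, it assumes count > 0. */
void duff_copy(char *to, const char *from, size_t count)
{
    size_t n = (count + 7) / 8;      /* number of passes through the loop body */

    switch (count % 8) {             /* jump into the middle of the first pass */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}

/* The same work without the interleaving: copy the count % 8
 * leftover bytes one at a time, then whole groups of eight. */
void plain_unrolled_copy(char *to, const char *from, size_t count)
{
    size_t leftover = count % 8;
    size_t groups = count / 8;

    while (leftover-- > 0)
        *to++ = *from++;

    while (groups-- > 0) {
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
    }
}

/* Returns 1 if both routines produce identical, correct output
 * for every count in 1..max_count, and 0 otherwise.
 * max_count must not exceed the 64-byte scratch buffers. */
int copies_agree(size_t max_count)
{
    char src[64], a[64], b[64];
    size_t i, count;

    if (max_count > sizeof src)
        return 0;

    for (i = 0; i < sizeof src; i++)
        src[i] = (char) ('A' + (i % 26));

    for (count = 1; count <= max_count; count++) {
        memset(a, 0, sizeof a);
        memset(b, 0, sizeof b);
        duff_copy(a, src, count);
        plain_unrolled_copy(b, src, count);
        if (memcmp(a, b, sizeof a) != 0 || memcmp(a, src, count) != 0)
            return 0;
    }
    return 1;
}
```

Calling copies_agree(64) exercises every remainder from 0 to 7 several times. The device folds the two loops of plain_unrolled_copy into one by letting the switch choose where the first pass starts.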
Anoop wrote a program to test this. It is reposted below:
/* The Duff device
*
* An infamous example of how a compiler can accept code that should
* be illegal as per the language definition. To add insult to injury,
* the illegal code actually runs faster.
*
* The functions send and send2 accomplish the same goal (copying a
* string from one location to another) but send2 manages to screw
* with your head and achieve its goal much faster (on most
* architectures).
*
* The answer to the puzzle of how send2 actually works is exposed in
* the function send3 (see the comment above the function send3).
*
* This strange piece of code is named after the programmer who
* discovered this 'optimization' technique.
*
* -- Anoop Sarkar <anoop at cs.sfu.ca>
**/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
/* pick BUFLEN to be a suitably large number to show the speed
difference between send and send2 */
const size_t BUFLEN = 100000000;
void send (register char *to, register char *from, register int count)
{
    do
        *to++ = *from++;
    while(--count>0);
}
void send2 (register char *to, register char *from, register int count)
{
    register int n = (count+7)/8;
    switch (count % 8)
    {
    case 0:
        do
        {
            *to++ = *from++;
    case 7: *to++ = *from++;
    case 6: *to++ = *from++;
    case 5: *to++ = *from++;
    case 4: *to++ = *from++;
    case 3: *to++ = *from++;
    case 2: *to++ = *from++;
    case 1: *to++ = *from++;
        } while(--n>0);
    }
}
/* The answer to the mystery turns out to be simple loop unfolding.
* send2 uses the semantics for switch statements in C to provide a
* mnemonic for how many assignments should occur within the body of
* the do-while loop.
*
* So why is send2 faster than send on some architectures? The
* conditional is a slow instruction to execute on many machine
* architectures.
*
* Try compiling with gcc with and without the -O3 flag. Turning the
* optimizer on (using -O3) shows the power of code optimization: send
* runs as fast as send2 with the optimizer on.
**/
int main (int argc, char **argv)
{
    char *from, *to;
    struct timeval before, after;

    from = (char *) malloc(BUFLEN * sizeof(char));
    to = (char *) malloc(BUFLEN * sizeof(char));
    if (from == NULL || to == NULL) {
        fprintf(stderr, "malloc failed\n");
        free(from);
        free(to);
        return(1);
    }
    memset(from, 'a', (BUFLEN * sizeof(char)));
    printf("array init done\n");

    printf("calling send\n");
    gettimeofday(&before, NULL);
    send(to, from, BUFLEN);
    gettimeofday(&after, NULL);
    /* tv_sec is a time_t, so cast it to match the %ld format */
    printf("secs=%ld\n", (long) (after.tv_sec - before.tv_sec));

    printf("calling send2\n");
    gettimeofday(&before, NULL);
    send2(to, from, BUFLEN);
    gettimeofday(&after, NULL);
    printf("secs=%ld\n", (long) (after.tv_sec - before.tv_sec));

    /* the buffers are not NUL-terminated, so compare them with
       memcmp rather than strcmp */
    if (memcmp(from, to, BUFLEN) == 0) {
        printf("from=to\n");
    } else {
        printf("from!=to\n");
    }
    free(from);
    free(to);
    return(0);
}
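The program reports whole seconds only, so on a fast machine both calls may print the same value. A minimal sketch of a finer measurement, assuming the same POSIX struct timeval values that gettimeofday fills in, is shown here. The helper name elapsed_seconds is made up for this illustration and is not part of Anoop's program.

```c
#include <sys/time.h>

/* Elapsed time between two gettimeofday() readings, in seconds,
 * including the microsecond part that the tv_sec difference drops. */
double elapsed_seconds(const struct timeval *before,
                       const struct timeval *after)
{
    double secs  = (double) (after->tv_sec - before->tv_sec);
    double usecs = (double) (after->tv_usec - before->tv_usec);

    return secs + usecs / 1e6;
}
```

It could be used as printf("secs=%.6f\n", elapsed_seconds(&before, &after)); in place of the whole-second printf calls.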