Duff's device

What is "Duff's Device"? It is a wonderfully devious way of unrolling a loop, devised by Tom Duff while he was at Lucasfilm. In its "classic" form, it copies a run of bytes:

    register n = (count + 7) / 8;   /* count > 0 assumed */
    switch (count % 8)
    {
    case 0:    do { *to = *from++;
    case 7:     *to = *from++;
    case 6:     *to = *from++;
    case 5:     *to = *from++;
    case 4:     *to = *from++;
    case 3:     *to = *from++;
    case 2:     *to = *from++;
    case 1:     *to = *from++;
          } while (--n > 0);
    }

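Here count bytes are copied from the array pointed to by from to the memory location pointed to by to (a memory-mapped output register, which is why to is never incremented). By interleaving a switch statement with a loop that copies 8 bytes per pass, it solves the problem of the leftover bytes (when count is not a multiple of 8): the switch jumps into the middle of the loop body so that the first pass copies only count % 8 bytes, and every later pass copies a full 8. Believe it or not, it is legal to place case labels inside a block nested within a switch statement like this. When he announced the technique to C's developers and the world, Duff noted that C's switch syntax, and in particular its "fall through" behavior, had long been controversial, and that "this code forms some sort of argument in that debate, but I'm not sure whether it's for or against".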
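To make the leftover-byte handling concrete, here is a minimal sketch of the same copy written without the trick, as two separate loops. The names copy_untangled and copy_is_correct are hypothetical and do not come from Duff or the C FAQ. Unlike the classic form, this sketch advances to, on the assumption that the destination is ordinary memory rather than an output register, so that the result can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post): copies count bytes
 * using a remainder loop followed by an 8-way unrolled loop. */
void copy_untangled(char *to, const char *from, size_t count)
{
    size_t rem = count % 8;     /* leftover bytes that do not fill a group */
    size_t groups = count / 8;  /* number of full 8-byte groups */

    /* Duff's device does this part by jumping to `case rem`. */
    while (rem-- > 0)
        *to++ = *from++;

    /* The unrolled body: one loop test per 8 bytes copied. */
    while (groups-- > 0) {
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
    }
}

/* Hypothetical check: copies count bytes (count <= 64) of a test pattern
 * and returns 1 if exactly those bytes arrived in the destination. */
int copy_is_correct(size_t count)
{
    char src[64], dst[64];
    size_t i;

    assert(count <= sizeof src);
    for (i = 0; i < sizeof src; i++)
        src[i] = (char) ('a' + i % 26);
    memset(dst, 0, sizeof dst);

    copy_untangled(dst, src, count);

    /* The copied prefix must match and the next byte must be untouched. */
    return memcmp(dst, src, count) == 0
        && (count == sizeof dst || dst[count] == 0);
}
```

The device folds the two loops into one by letting the switch choose the entry point into the unrolled body, so the remainder pass and the full passes share the same eight statements.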

 

Anoop wrote a program to test it. It is reposted below:

/* The Duff device 
 *
 * An infamous example of how a compiler can accept code that should
 * be illegal as per the language definition. To add insult to injury,
 * the illegal code actually runs faster. 
 *
 * The functions send and send2 accomplish the same goal (copying a
 * string from one location to another) but send2 manages to screw
 * with your head and achieve its goal much faster (on most
 * architectures). 
 * 
 * The answer to the puzzle of how send2 actually works is exposed in
 * the function send3 (see the comment above the function send3).
 *
 * This strange piece of code is named after the programmer who
 * discovered this 'optimization' technique.
 *
 * -- Anoop Sarkar <anoop at cs.sfu.ca>
 **/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* pick BUFLEN to be a suitably large number to show the speed
     difference between send and send2 */

const size_t BUFLEN = 100000000;

void send (register char *to, register char *from, register int count)
{
    do
      *to++ = *from++;
    while(--count>0);
}

void send2 (register char *to, register char *from, register int count)
{
    register int n = (count+7)/8;
    switch (count % 8)
    {
        case 0:
            do
            {
                *to++ = *from++;
                case 7: *to++ = *from++;
                case 6: *to++ = *from++;
                case 5: *to++ = *from++;
                case 4: *to++ = *from++;
                case 3: *to++ = *from++;
                case 2: *to++ = *from++;
                case 1: *to++ = *from++;
            } while(--n>0);
    }
}

/* The answer to the mystery turns out to be simple loop unfolding.
 * send2 uses the semantics for switch statements in C to provide a
 * mnemonic for how many assignments should occur within the body of
 * the do-while loop. 
 *
 * So why is send2 faster than send on some architectures? The
 * conditional is a slow instruction to execute on many machine
 * architectures. 
 *
 * Try compiling with gcc with and without the -O3 flag. Turning the
 * optimizer on (using -O3) shows the power of code optimization: send
 * runs as fast as send2 with the optimizer on.
 **/ 

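int main (int argc, char **argv)
{
    char *from, *to;
    struct timeval before, after;

    (void) argc;    /* command-line arguments are unused */
    (void) argv;

    from = (char *) malloc(BUFLEN * sizeof(char));
    to = (char *) malloc(BUFLEN * sizeof(char));
    if (from == NULL || to == NULL) {
        fprintf(stderr, "malloc failed\n");
        free(from);
        free(to);
        return(1);
    }

    memset(from, 'a', (BUFLEN * sizeof(char)));
    printf("array init done\n");

    printf("calling send\n");
    gettimeofday(&before, NULL);
    send(to, from, BUFLEN);
    gettimeofday(&after, NULL);
    /* tv_sec is a time_t, so cast it to match the %ld format */
    printf("secs=%ld\n", (long) (after.tv_sec - before.tv_sec));

    /* clear the destination so the final check tests send2 alone */
    memset(to, 0, (BUFLEN * sizeof(char)));

    printf("calling send2\n");
    gettimeofday(&before, NULL);
    send2(to, from, BUFLEN);
    gettimeofday(&after, NULL);
    printf("secs=%ld\n", (long) (after.tv_sec - before.tv_sec));

    /* the buffers are not NUL-terminated, so use memcmp, not strcmp */
    if (memcmp(from, to, BUFLEN) == 0) {
      printf("from=to\n");
    } else {
      printf("from!=to\n");
    }

    free(from);
    free(to);

    return(0);
}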
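The program reports only whole seconds, which may be too coarse to separate send from send2 on a fast machine. As a sketch, assuming the same gettimeofday samples are available, a helper along these lines could report the difference in microseconds. The name elapsed_usec is hypothetical and is not part of Anoop's program.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper (not part of the original program): returns the
 * number of microseconds from *before to *after, using both the tv_sec
 * and tv_usec fields filled in by gettimeofday. */
long long elapsed_usec(const struct timeval *before,
                       const struct timeval *after)
{
    long long secs, usecs;

    assert(before != NULL && after != NULL);

    secs = (long long) after->tv_sec - (long long) before->tv_sec;
    /* usecs may be negative when the seconds field has rolled over;
     * the sum below is still correct in that case. */
    usecs = (long long) after->tv_usec - (long long) before->tv_usec;

    return secs * 1000000LL + usecs;
}
```

In main, the result could be printed with the %lld format, for example printf("usecs=%lld\n", elapsed_usec(&before, &after)), after each pair of gettimeofday calls.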

 

posted @ 2012-07-18 11:49  jeff_nie