I normally do not read comp.lang.c, but Jim McKie told me that ``Duff's device'' had come up in comp.lang.c again. I have lost the version that was sent to netnews in May 1984, but I have reproduced below the note in which I originally proposed the device. (If anybody has a copy of the netnews version, I would gratefully receive a copy at research!td or td@research.att.com.)
To clear up a few points:
>>... Dollars to doughnuts this >>was written on a RISC machine. >Nope. Bell Labs Research uses VAXen and 68Ks, mostly.
I was at Lucasfilm when I invented the device.
Here then, is the original document describing Duff's device:
From research!ucbvax!dagobah!td Sun Nov 13 07:35:46 1983
Received: by ucbvax.ARPA (4.16/4.13) id AA18997; Sun, 13 Nov 83 07:35:46 pst
Received: by dagobah.LFL (4.6/4.6b) id AA01034; Thu, 10 Nov 83 17:57:56 PST
Date: Thu, 10 Nov 83 17:57:56 PST
From: ucbvax!dagobah!td (Tom Duff)
Message-Id: <8311110157.AA01034@dagobah.LFL>
To: ucbvax!decvax!hcr!rrg, ucbvax!ihnp4!hcr!rrg, ucbvax!research!dmr, ucbvax!research!rob
Consider the following routine, abstracted from code which copies an array of shorts into the Programmed IO data register of an Evans & Sutherland Picture System II:
send(to, from, count) register short *to, *from; register count; { do *to = *from++; while(--count>0); }
(Obviously, this fails if the count is zero.)
The VAX C compiler compiles the loop into 2 instructions (a movw and
a sobleq,
I think.) As it turns out, this loop was the bottleneck in
a real-time animation playback program which ran too slowly by about 50%.
The standard way to get more speed out of something like this is to unwind
the loop a few times, decreasing the number of sobleqs. When you do that,
you wind up with a leftover partial loop. I usually handle this in C with
a switch that indexes a list of copies of the original loop body. Of
course, if I were writing assembly language code, I'd just jump into the
middle of the unwound loop to deal with the leftovers. Thinking about this
yesterday, the following implementation occurred to me:
send(to, from, count) register short *to, *from; register count; { register n=(count+7)/8; switch(count%8){ case 0: do{ *to = *from++; case 7: *to = *from++; case 6: *to = *from++; case 5: *to = *from++; case 4: *to = *from++; case 3: *to = *from++; case 2: *to = *from++; case 1: *to = *from++; }while(--n>0); } }Disgusting, no? But it compiles and runs just fine. I feel a combination of pride and revulsion at this discovery. If no one's thought of it before, I think I'll name it after myself.
It amazes me that after 10 years of writing C there are still little corners that I haven't explored fully. (Actually, I have another revolting way to use switches to implement interrupt driven state machines but it's too horrid to go into.)
Many people (even bwk?) have said that the worst feature of C is that switches don't break automatically before each case label. This code forms some sort of argument in that debate, but I'm not sure whether it's for or against.
yrs trly
Tom