<rt id="bn8ez"></rt>
<label id="bn8ez"></label>

  • <span id="bn8ez"></span>

    <label id="bn8ez"><meter id="bn8ez"></meter></label>

    posts - 403, comments - 310, trackbacks - 0, articles - 7
      BlogJava :: 首頁 :: 新隨筆 :: 聯系 :: 聚合  :: 管理

    Tom Duff on Duff's Device

    Posted on 2007-11-29 16:02 ZelluX 閱讀(517) 評論(0)  編輯  收藏 所屬分類: Algorithm
    Subject: Re: Explanation, please!
    Summary: Original citation
    From: td@alice.UUCP (Tom Duff)
    Organization: AT&T Bell Laboratories, Murray Hill NJ
    Date: 29 Aug 88 20:33:51 GMT
    Message-ID: <8144@alice.UUCP>

    I normally do not read comp.lang.c, but Jim McKie told me that ``'' had come up in comp.lang.c again.  I have lost the version that was sent to netnews in May 1984, but I have reproduced below the note in which I originally proposed the device.  (If anybody has a copy of the netnews version, I would gratefully receive a copy at research!td or td@research.att.com.)

    To clear up a few points:

    1. The point of the device is to express general loop unrolling directly in C.  People who have posted saying `just use memcpy' have missed the point, as have those who have criticized it using various machine-dependent memcpy implementations as support.  In fact, the example in the message is not implementable as memcpy, nor is any computer likely to have an memcpy-like idiom that implements it.

       

    2. Somebody claimed that while the device was named for me, I probably didn't invent it.  I almost certainly did invent it.  I had definitely not seen or heard of it when I came upon it, and nobody has ever even claimed prior knowledge, let alone provided dates and times.  Note the headers on the message below:  apparently I invented the device on November 9, 1983, and was proud (or disgusted) enough to send mail to dmr Please note that I do not claim to have invented loop unrolling, merely this particular expression of it in C.

       

    3. The device is legal dpANS C.  I cannot quote chapter and verse, but Larry Rosler, who was chairman of the language subcommittee (I think), has assured me that X3J11 considered it carefully and decided that it was legal. Somewhere I have a note from dmr certifying that all the compilers that he believes in accept it.  Of course, the device is also legal C++, since Bjarne uses it in his book.

       

    4. Somebody invoked (or more properly, banished) the `false god of efficiency.'  Careful reading of my original note will put this slur to rest.  The alternative to genuflecting before the god of code-bumming is finding a better algorithm.  It should be clear that none such was available.  If your code is too slow, you must make it faster.  If no better algorithm is available, you must trim cycles.

       

    5. The same person claimed that the device wouldn't exhibit the desired speed-up.  The argument was flawed in two regards:  first, it didn't address the performance of the device, but rather the performance of one of its few uses (implementing memcpy) for which many machines have a high-performance idiom.  Second, the poster made his claims in the absence of timing data, which renders his assertion suspect.  A second poster tried the test, but botched the implementation, proving only that with diligence it is possible to make anything run slowly.

       

    6. Even Henry Spencer, who hit every other nail square on the end with the flat round thing stuck to it, made a mistake (albeit a trivial one).  Here is Henry replying to bill@proxftl.UUCP (T. William Wells):
         >>... Dollars to doughnuts this
          >>was written on a RISC machine.
          >Nope.  Bell Labs Research uses VAXen and 68Ks, mostly.
          

      I was at Lucasfilm when I invented the device.

       

    7. Transformations like this can only be justified by measuring the resulting code.  Be careful when you use this thing that you don't unwind the loop so much that you overflow your machine's instruction cache.  Don't try to be smarter than an over-clever C compiler that recognizes loops that implement block move or block clear and compiles them into machine idioms.

    Here then, is the original document describing Duff's device:

    From research!ucbvax!dagobah!td  Sun Nov 13 07:35:46 1983
    Received: by ucbvax.ARPA (4.16/4.13)  id AA18997; Sun, 13 Nov 83 07:35:46 pst
    Received: by dagobah.LFL (4.6/4.6b)  id AA01034; Thu, 10 Nov 83 17:57:56 PST
    Date: Thu, 10 Nov 83 17:57:56 PST
    From: ucbvax!dagobah!td (Tom Duff)
    Message-Id: <8311110157.AA01034@dagobah.LFL>
    To: ucbvax!decvax!hcr!rrg, ucbvax!ihnp4!hcr!rrg, ucbvax!research!dmr, ucbvax!research!rob

    Consider the following routine, abstracted from code which copies an array of shorts into the Programmed IO data register of an Evans & Sutherland Picture System II:

     
    
    send(to, from, count)
    register 
    short *to, *from;
    register count;
    {
        
    do
            
    *to = *from++;
        
    while (--count>0);
    }

    (Obviously, this fails if the count is zero.)
    The VAX C compiler compiles the loop into 2 instructions (a movw and a sobleq,
    I think.)  As it turns out, this loop was the bottleneck in a real-time animation playback program which ran too slowly by about 50%.  The standard way to get more speed out of something like this is to unwind the loop a few times, decreasing the number of sobleqs.  When you do that, you wind up with a leftover partial loop.  I usually handle this in C with a switch that indexes a list of copies of the original loop body.  Of course, if I were writing assembly language code, I'd just jump into the middle of the unwound loop to deal with the leftovers.  Thinking about this yesterday, the following implementation occurred to me:

     

    send(to, from, count)
        register 
    short *to, *from;
        register count;
    {
        register n
    =(count+7)/8;
        
    switch(count%8{
            
    case 0:    do {    *to = *from++;
            
    case 7:        *to = *from++;
            
    case 6:        *to = *from++;
            
    case 5:        *to = *from++;
            
    case 4:        *to = *from++;
            
    case 3:        *to = *from++;
            
    case 2:        *to = *from++;
            
    case 1:        *to = *from++;
            }
     while(--n>0);
        }

    }

    Disgusting, no?  But it compiles and runs just fine.  I feel a combination of pride and revulsion at this discovery.  If no one's thought of it before, I think I'll name it after myself.

    It amazes me that after 10 years of writing C there are still little corners that I haven't explored fully.  (Actually, I have another revolting way to use switches to implement interrupt driven state machines but it's too horrid to go into.)

    Many people (even bwk?) have said that the worst feature of C is that switches don't break automatically before each case label.  This code forms some sort of argument in that debate, but I'm not sure whether it's for or against.

    yrs trly
    Tom

    主站蜘蛛池模板: 韩国亚洲伊人久久综合影院| 国产亚洲精AA在线观看SEE | 午夜a级成人免费毛片| 亚洲精品人成电影网| 亚洲人成影院在线观看 | 亚洲国产成人久久精品app| 亚洲Aⅴ在线无码播放毛片一线天 亚洲avav天堂av在线网毛片 | 丁香婷婷亚洲六月综合色| 一二三四在线观看免费高清中文在线观看 | 最新亚洲人成无码网站| 国产在线19禁免费观看| 免费播放美女一级毛片| 亚洲视频在线免费| 99在线热播精品免费99热| 久久青青草原亚洲AV无码麻豆| 无码人妻丰满熟妇区免费| 久久亚洲国产精品成人AV秋霞| 久久亚洲精品11p| 亚洲国产精品成人AV无码久久综合影院| 亚洲精品成人无限看| 华人在线精品免费观看| www亚洲一级视频com| 亚洲国产精品久久久久婷婷老年 | 亚洲爆乳无码精品AAA片蜜桃| 国产麻豆免费观看91| a一级爱做片免费| 女性自慰aⅴ片高清免费| 粉色视频免费入口| 亚洲国产精品无码久久久不卡| 59pao成国产成视频永久免费| 亚洲色成人网站WWW永久四虎 | a级毛片毛片免费观看久潮喷| 91亚洲精品第一综合不卡播放| 成年女人免费视频播放77777| 免费无遮挡无码视频在线观看| 亚洲国产精品成人久久| 免费在线看v网址| 一级毛片aaaaaa视频免费看| 精品久久免费视频| 中国在线观看免费的www| 亚洲91精品麻豆国产系列在线|