|
The analysis of MOR(MXOR) instruction implementation in MMIXWare
-- A stupid way to understand the source code.
the implementation of MOR(MXOR) is in file: mmix-arith.w
436 octa bool_mult(y,z,xor)
437 octa y,z; /* the operands */
438 bool xor; /* do we do xor instead of or? */
439 {
440 octa o,x;
441 register tetra a,b,c;
442 register int k;
443 for (k=0,o=y,x=zero_octa;o.h||o.l;k++,o=shift_right(o,8,1))
444 if (o.l&0xff) {
445 a=((z.h>>k)&0x01010101)*0xff;
446 b=((z.l>>k)&0x01010101)*0xff;
447 c=(o.l&0xff)*0x01010101;
448 if (xor) x.h^=a&c, x.l^=b&c;
449 else x.h|=a&c, x.l|=b&c;
450 }
451 return x;
452 }
It takes me several hours to understand the details.
If we treat each octabyte as a matrix, each row corresponds to a byte, then
y MOR z = z (matrix_mulitiply) y
For a=((z.h>>k)&0x01010101)*0xff;
(z.h>>k)&0x01010101 will get the four last bit in (z.h>>k). depends on the bit in last row,
((z.h>>k)&0x01010101)*0xff will expand the bit (either 0 or 1) into the whole row.
e.g.
ff
* 0x01010101
---------------
= ff
ff
ff
ff
----------------
= ffffffff
(depending on the last bit in each row of z, the result could be #ff00ff00. #ff0000ff, etc.)
similarily, b=((z.l>>k)&0x01010101)*0xff; will expand the last bit in each byte into the
whole byte.
over all, after these two step, the z becomes the replication of it's last row, since k vary
from 0 to 7, it will loop on all the rows actually.
For c=(o.l&0xff)*0x01010101, it will get the last byte in o.l and populate it to other three byte.
since it will not only or/xor h but also l. it is not necessary populate it to o.h.
one example,
let (z.h>>k)&0x01010101 = 0x01000101, then a= 0xff00ffff;
let (z.l>>k)&0x01010101 = 0x01010001, then b= 0xffff00ff;
let (o.l&0xff)=0xuv, then c= 0xuvuvuvuv;
then a&c=0xuv00uvuv;
b&c=0xuvuv00uv;
consider the elements [i,j] in result x. in this round, what value was accumalated in by operation
or(xor).
it is the jth bit in last byte of o.l & ith bit in last column of z.(do not consider looping now.)
in this round, the 64 combination of i and j, contirbute the value to the 64 bits in z.
Noticed that o loop on y from last byte to first byte. There are 8 loop/rounds, in another round.
say kth round.
the elements[i,j] will accumuate the jth bit in last (k + 1)th row & the jth bit in last (k+1)th
column.
that means the jth column in y multiply the ith row in z. it conform to the definiton for
z matrix_multiply y.
游戲和數學有密切的聯系。最近在玩九連環,感受更深。
之所以開始玩九連環,是因為在高德納的書中提到了格雷碼和九連環的關系。為了理解生成格雷碼的算法,特意買了九連環來玩。畢竟書上的
描述沒有實際玩起來那么容易理解。
通過這個游戲,我不僅會解九連環了,而且掌握的生成格雷碼的一種算法。
A detailed reading process of a piece of beautiful and trick bitwise operation code.
The following code is from MMIXWare, it is used to implement the Wyde difference between two octabyte.
in file: "mmix-arith.w"
423 tetra wyde_diff(y,z)
424 tetra y,z;
425 {
426 register tetra a=((y>>16)-(z>>16))&0x10000;
427 register tetra b=((y&0xffff)-(z&0xffff))&0x10000;
428 return y-(z^((y^z)&(b-a-(b>>16))));
429 }
It is hard to understand it without any thinking or verification, here is the process I used
to check the correctness of this algorithm.
let y = 0xuuuuvvvv;
z = 0xccccdddd; (please note the [c]s may be different hex number.)
then y>>16 = 0x0000uuuu;
z>>16 = 0x0000cccc;
then ((y>>16)-(z>>16)) = 0x1111gggg if #uuuu < #cccc or
((y>>16)-(z>>16)) = 0x0000gggg if #uuuu >= #cccc
so variable a = 0x00010000 if #uuuu < #cccc or
variable a = 0x00000000 if #uuuu >= #cccc
similarly, we can get
variable b = 0x00010000 if #vvvv < #dddd or
variable b = 0x00000000 if #vvvv >= #dddd
for (b-a-(b>>16)))), there are four different result depending on the relation between a and b.
when #uuuu >= #cccc and #vvvv >= #dddd, (b-a-(b>>16)))) = 0x00000000;
when #uuuu >= #cccc and #vvvv < #dddd, (b-a-(b>>16)))) = 0x00001111;
when #uuuu < #cccc and #vvvv >= #dddd, (b-a-(b>>16)))) = 0x11110000;
when #uuuu < #cccc and #vvvv < #dddd, (b-a-(b>>16)))) = 0x11111111;
You can see that >= map to #0000 and < map to #1111
for y-(z^((y^z)&(b-a-(b>>16)))), when (b-a-(b>>16)))) is 0x00000000, z^((y^z)&(b-a-(b>>16))) is
z^((y^z)& 0) = z^0=z, so y-(z^((y^z)&(b-a-(b>>16))))=y-z.
similarily, when (b-a-(b>>16)))) is 0x11111111, z^((y^z)&(b-a-(b>>16))) is
z^((y^z)& 1) = z^(y^z)=y, so y-(z^((y^z)&(b-a-(b>>16))))=0.
when (b-a-(b>>16)))) is 0x11110000 or 0x11110000, we can treat the y and z as two separate wydes.
each wyde in the result is correct.
You may think it is a little stupid to verify such kind of details. but for my point of view,
without such detailed analysis, I can not understand the algorithm in the code. with the hard
work like this, I successfully understand it. The pleasure deserve the effort.
I am wondering how can the author discover such a genius algorithm.
昨天晚上,瑞瑞睡到十二點的時候,爬出被子,躺在外面。結果咳嗽得很厲害,吐了好幾口。吃了枇杷膏很快就睡著了。
沒過一會,他就開始做惡夢了。下面是他的夢話:
“爸爸的手不見了”
“爸爸掉下去了”
“我親愛的爸爸不見了,我可怎么辦阿”
估計是最近洪恩的GOGO學英語的DVD看多了。
After reading the <<MMIX: A RISC Computer for the New Millennium>>, I am ispired to Create a MMIX simulator in Java.
Donald Knuth already created a high quality MMIX simulater in C, why I still bother to creating a new one in Java.
First, I want to learn more about how the computer works. I think re-implement a simulator for MMIX can
help me gain a better understanding.
Second, I want to exercise my Java skills.
After about one month's work, I realize that I can not finish it by myself. I am looking for the help.
If you are interested in MMIX and know Java, Please give me a hand.
Currently I have finished most of the instructions, but some important and complex one are not completed
yet.
I have developed a few JUnit TestCase for some instructions, but it's way far from covering all the instructions (there are 256 instructions total).
Few of the sample MMIX program in Donald Knuth's MMIXware package, such as cp.mmo, hello.mmo can be
simulated successfully, but there are much more to support.
To help on this project, first you need the access to the current source code. It's hosted on Google
code. Please follow the steps below to access the source code.
Use this command to anonymously check out the latest project source code:
# Non-members may check out a read-only working copy anonymously over HTTP.
svn checkout http://mmix.googlecode.com/svn/trunk/ mmix-read-only
If you are willint to help, please comment on this blog with your email address.
There are
many questions coming into my mind when I read the Linux kernel book and source
code. As time goes by, I become more knowledgeable than before and can address
those questions by myself, here is the first question addressed by myself.
Q: why
kernel have to map the high memory in kernel space, why not just allocate the
high memory and only map it in user process.
A: Because
kernel also need to access the high memory before it returned the allocated
memory to user process. For example, kernel must zero the page or initialized
the page for security reason. Please refer to linux device driver page 9.
Q: why not
let the clib zero the page or initialize it, it saves the kernel's effort and simplifies
the kernel.
A: besides
Requesting memory through clib, user program can also request memory through
direct System call, in this situation, the security is not guaranteed, the
information in memory will be leaked.
9/26/2008 8:57AM
Today I want to research the different ways to substitute text in the file. For records, I written them down.
1. use Ultra Edit, it is super easy for a Windows user if you have Ultra Edit installed.
use Ctrl + R to get Replacement Wizard and follow you intuition.
2. use VI in Unix.
:s/xx/yy/
will replace xx with yy.
3. use filter, such as sed, and awk in Unix.
sed -e 's/xx/yy/g' file.in > file.out
replace xx with yy in all the lines. It seem sed will not change the original input file, so I redirect the out put to file.out
1 WYSIWYM vs WYSIWYG
WYSIWYM stands for What You See is What You Mean; WYSIWYG stands for What You See is What You Get;
Microsoft -- Word is always considered as a example of WYSIWYG. Today I have a look at the tool named LyX, which is an example of WYSIWYM. From an end user's point of view, there are more similarity than difference between them.
They both display the the resulted layout on the fly; they both provide button to typeset the document.
The difference I can see between then is -- LyX use text file, while Word use binary file. But I don't think it matters.
In my humble opinion, the real difference between Word and LyX/LaTeX is as the following. In Word, you typeset in the lower level, you can control all the details but it also need more effort. In LyX/LaTex, you typeset in higher level, you only need to figure out the logic structure of the document. The resulted layout is not decided by you, you actually just share the layout developed by the expert. I think it is the key advantage of WYSIWYM.
Yesterday, we found that the application can not send mail successfully; the performance of the module using email feature is also very bad. I suspect it caused by that the mail server host name can not be resolved in the application server.
I executed the following command
host <mail server host name>
It shows a strange IP. It means it can not properly resolve the mail server host name
Then I execute the command below.
man host
The output tells me to resort to /etc/resolv.conf
open it with
vi /etc/resolv.conf
The context is as following:
nameserver <name server 1>
nameserver <name server 2>
update the config with correct DNS server IP.
Everything is OK.
P.S. It seems that the ping and host commands are different. For some host name, I can ping it but I can not host it.
The reality is far from the idealism - the inelegance in Operating System
I am interested in Operating System, after I know more and more concepts, know more and more details, I realize that the reality is far from the idealism. The root cause is the history and to some extent, it is the back compatibility. we can not afford to make a brand new thing from scratch, we need to include many old things in any things.
Let me give some example about how the history make the current Operation System become complicated and inelegant.
1. DMA
DMA stands for Direct Memory Access, which is a way to improve the parallelism in computer system. Basically, with DMA, peripheral device can access main Memory simultaneously when CPU is running. but for historical reason, in X86 platform, some DMA device only have 24 bit address line. which limit the memory scope to 16M. since X86 platform is also lack of IO-MMU to remap the address, the memory can be used in DMA is [0,16M). It definitely complicated the memory management.
2. High Memory
Since Linux kernel has only 1G linear address space, it can not address all the 4G physical memory in 32 bit machine. This is actual a design issue in Linux for historical reason. it does not predict that some day, the physical memory will become so large. Later in order to support more than 1 G physical memory, CONFIG_HIGHMEM compile option was added. There are also other way to fix this problem, such as 4G kernel space v.s. 4G user space.
3. PAE
PAE stands for Physical Memory Extension, PAE make it possible to support up to 64G physical memory. but to me, it is just a temporary solution, does not deserve the effort. I even do not want to have a look on the corresponding document. It does not make too much sense. I prefer to directly move to 64 bit platform. 64 bit platform has its own problems though.
the above is just some inelegant in hardware. majorly cause by historical reason. I am wondering how can we keep up the quick development under the burden of history. maybe at some point, we finally need to throw away the history and move on with a brand new start.
|