How Uuencoding Works

Learning by working through problems: https://www.zhihu.com/question/26598476/answer/45396765

 

 

http://email.about.com/od/emailbehindthescenes/a/uuencoding.htm

It begins with "begin". It ends with "end". In between, however, are lots and lots of random characters. Or so it seems.

 

Of course, the characters you see in the email attachment are not random.

They are uuencoded.

Uuencoding is a method of representing arbitrary binary data (such as programs, word processor documents, or images) in plain US-ASCII text.

 

Why Uuencode? (uuencode: an encoding program that converts binary files into text files)

Why would anybody need a way of doing that?

Email was designed for textual messages using only English characters and cannot safely transport text in other languages, let alone binary files such as images or programs.

When you insert such binary data into an email, you never know what will come out at the recipient's end, or whether anything will come out at all.

This is why a way to convert binary data to ASCII text (and vice versa) was needed.

Uuencoding is one such method and, in a sense, a predecessor of MIME and its predominant Base64 encoding.

 

What Uuencoding Does

Like Base64 encoding, uuencoding takes three bytes of binary data and converts them to four bytes of ASCII text (which equals four characters).
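As a quick sanity check on the three-to-four ratio, the hypothetical helper below counts the data characters produced for a given number of input bytes. It assumes, as Python's binascii module does, that a final group of one or two bytes is padded out to a full four characters; the per-line length character and the line breaks are not counted.

```python
def uu_data_chars(n_bytes: int) -> int:
    """Data characters produced for n_bytes of input (length characters and newlines excluded)."""
    # Every started group of three input bytes becomes four output characters.
    # Assumption: a short final group is padded to a full group of four characters.
    groups = (n_bytes + 2) // 3
    return 4 * groups

print(uu_data_chars(3))   # 4
print(uu_data_chars(45))  # 60, one full line
print(uu_data_chars(46))  # 64
```

The encoded data is therefore about a third larger than the input, before any line overhead is added.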

In other words, every 3 bytes are split into 4 ASCII characters: 3 × 8 bits are regrouped as 4 × 6 bits.

Each byte consists of eight bits.

For the conversion, uuencoding starts by concatenating the 24 bits (3 times 8) that make up the three binary data bytes and splitting them into four chunks of six bits each.
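The regrouping can be sketched with bit shifts. In this minimal example the function name split_24_bits is a hypothetical illustration; it packs three bytes into one 24-bit integer (first byte most significant) and masks out four six-bit chunks, using the input bytes from the encoding example further down.

```python
def split_24_bits(b0: int, b1: int, b2: int) -> list[int]:
    """Concatenate three bytes into 24 bits and split them into four 6-bit chunks."""
    bits = (b0 << 16) | (b1 << 8) | b2  # b0 supplies the most significant 8 bits
    # Shift each chunk down to the bottom and keep only its 6 bits (0x3F = 0b111111).
    return [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]

chunks = split_24_bits(155, 162, 233)
print(chunks)                              # [38, 58, 11, 41]
print([format(c, "06b") for c in chunks])  # ['100110', '111010', '001011', '101001']
```

Because each chunk ends up in an ordinary integer, the leading zero bits cost nothing: format(38, "08b") gives '00100110', which is still 38.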

To get four complete bytes (a byte needs eight bits, but each chunk has only six), two '0' bits are placed in front of each chunk.

This does not change the value of the chunks, just as 007 has the same value as 7.

So far, we have turned three bytes into four bytes by adding superfluous data and wasting space.

This is necessary to ensure that each output byte can be represented as an ASCII character.

Another step is necessary before we can begin that conversion, however.

 

Nonchalantly, we add 32 (100000 in binary notation) to each of the four bytes.

This makes sure that every output byte is indeed a printable character: the six-bit values 0 to 63 become 32 to 95, and printable ASCII begins with 32, the space character.

To get the character, we look up the byte value in the ASCII codeset (65, for example, corresponds to 'A').
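Putting the split, the addition of 32, and the ASCII lookup together gives a sketch of how one group is encoded. The name encode_group is hypothetical, and zero-padding a final group of one or two bytes is an assumption (it is what Python's binascii module does); the steps described here only cover full groups of three.

```python
def encode_group(group: bytes) -> str:
    """Encode a group of one to three bytes as four printable ASCII characters."""
    if not 1 <= len(group) <= 3:
        raise ValueError("a group holds one to three bytes")
    # Assumption: a short final group is padded with zero bytes before splitting.
    bits = int.from_bytes(group.ljust(3, b"\0"), "big")
    chunks = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Adding 32 moves the values 0..63 into the printable range 32 (space) .. 95 ('_');
    # chr() then looks up the ASCII character for each value.
    return "".join(chr(c + 32) for c in chunks)

print(encode_group(bytes([155, 162, 233])))  # FZ+I
```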

Now we have a way to transform binary data to ASCII text.

 

To make it ready for inclusion in email messages, one more hurdle remains to be cleared.

Although the email format itself places no fixed limit on the size of a message, email servers can and do limit the length of a single line.

We need to insert line breaks into our output data.

 

A line break is inserted after at most 45 bytes of input data, which encode to 60 characters.

Since the last line will not always hold a full 45 bytes, and to provide a basic integrity check, each line begins with a character indicating how many input bytes the line encodes.

To get this character, we apply the procedure that already got us ASCII characters for our binary bits to the line length: we add 32 and look up the character in the ASCII table.

If the line encodes 45 bytes, for example, we look up 77 (45 + 32), which is 'M'. This is why you will find a capital M at the beginning of most lines in a uuencoded file.
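A sketch of the line-building step, under the same assumptions as before (hypothetical function names, zero-padded final group): the input is cut into chunks of at most 45 bytes, and each line is the length character followed by the encoded groups.

```python
def encode_group(group: bytes) -> str:
    """Four printable characters for up to three input bytes (short groups zero-padded)."""
    bits = int.from_bytes(group.ljust(3, b"\0"), "big")
    return "".join(chr(((bits >> shift) & 0x3F) + 32) for shift in (18, 12, 6, 0))

def encode_lines(data: bytes, max_bytes: int = 45) -> list[str]:
    """Cut data into chunks of at most max_bytes and encode each chunk as one line."""
    lines = []
    for start in range(0, len(data), max_bytes):
        chunk = data[start:start + max_bytes]
        length_char = chr(len(chunk) + 32)  # counts input bytes: 45 + 32 = 77 -> 'M'
        body = "".join(encode_group(chunk[i:i + 3]) for i in range(0, len(chunk), 3))
        lines.append(length_char + body)
    return lines

lines = encode_lines(bytes(100))     # 100 zero bytes -> chunks of 45, 45 and 10
print([line[0] for line in lines])   # ['M', 'M', '*']
```

A full line is thus 61 characters long: the 'M' plus 60 data characters.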

 

Encoding Example

Let us assume we have three bytes of input: 155, 162 and 233.

The corresponding bit stream is 100110111010001011101001, which in turn corresponds to the 6-bit values 100110, 111010, 001011 and 101001.

 

Now we turn these six-bit chunks into full bytes (00100110 = 38, 00111010 = 58, 00001011 = 11 and 00101001 = 41) and add 32 to get 70 = 01000110, 90 = 01011010, 43 = 00101011 and 73 = 01001001.

 

Finally, we look up these numbers in the ASCII table: 70 = F, 90 = Z, 43 = + and 73 = I.

Our input stream converts to the printable characters FZ+I.

The line encodes 3 input bytes (the 4 characters FZ+I are only the encoded form), and 3 plus 32 makes 35, which translates to '#'. So the full line of uuencoded data is #FZ+I.
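The result can be cross-checked against Python's standard binascii module, whose b2a_uu function encodes one line of at most 45 bytes and whose a2b_uu function decodes it again. Note that the length character counts the three input bytes rather than the four output characters, so it is chr(3 + 32), which is '#'.

```python
import binascii

line = binascii.b2a_uu(bytes([155, 162, 233]))
print(line)                          # b'#FZ+I\n'
print(chr(3 + 32))                   # prints the length character for 3 input bytes: #
print(list(binascii.a2b_uu(line)))   # [155, 162, 233]
```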

 

Header and Footer

Only the "begin" and "end" are still missing.

The header line of a uuencoded file consists of the word "begin", the Unix file permission value (something like "664") and the file name.

A header line could look like this, for example: "begin 664 test.bin".
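Reading such a header back is a small parsing job. The sketch below is only an illustration with a hypothetical function name; it assumes a three- or four-digit octal mode and takes everything after the mode as the file name.

```python
import re

HEADER_RE = re.compile(r"^begin ([0-7]{3,4}) (.+)$")

def parse_header(line: str) -> tuple[int, str]:
    """Return (permission bits, file name) from a 'begin <mode> <name>' line."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"not a uuencode header line: {line!r}")
    mode = int(match.group(1), 8)  # the mode is written in octal
    return mode, match.group(2)

mode, name = parse_header("begin 664 test.bin")
print(oct(mode), name)  # 0o664 test.bin
```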

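The footer is the word "end". It is preceded by a zero-length data line, which holds only the length character for 0 bytes: a space (0 + 32) or, in many implementations, a backtick (`), which cannot be lost as trailing whitespace. A complete example, thus, would be: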

begin 664 test.bin
#FZ+I
`
end
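The pieces can be assembled into a complete file. This sketch, with a hypothetical function name, relies on Python's binascii.b2a_uu for each line and passes backtick=True (available since Python 3.7) so that zero values come out as '`' rather than spaces; that choice, and writing the zero-length line as a single '`', are assumptions that follow common Unix practice.

```python
import binascii

def uuencode_text(data: bytes, name: str, mode: str = "664") -> str:
    """Build a complete uuencoded file: header, data lines, zero-length line, footer."""
    lines = [f"begin {mode} {name}"]
    for start in range(0, len(data), 45):
        # b2a_uu encodes at most 45 bytes and appends a newline, which we strip here.
        encoded = binascii.b2a_uu(data[start:start + 45], backtick=True)
        lines.append(encoded.decode("ascii").rstrip("\n"))
    lines.append("`")    # zero-length line: the length character for 0 bytes
    lines.append("end")
    return "\n".join(lines) + "\n"

print(uuencode_text(bytes([155, 162, 233]), "test.bin"), end="")
```

Run as shown, this prints the same four lines as the complete example.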

 

posted @ 2016-02-10 10:08  ChuckLu