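Bencoding is the format BitTorrent (BT) uses to describe and organize data. BitTorrent-4.1.6\BitTorrent\bencode.py implements it. Bencoding is quite simple, but I still wrote a unit test to check my understanding.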

import unittest
# bencode/bdecode come from BitTorrent-4.1.6 (a Python 2 code base)
from BitTorrent.bencode import bencode, bdecode

class testBencodeFun(unittest.TestCase):
    # decoding: bencoded text -> Python value, one case per type
    def test_decode(self):
        self.assertEqual(bdecode('6:1234ab'), '1234ab')
        self.assertEqual(bdecode('i100e'), 100)
        self.assertEqual(bdecode('l5:hello3:p2pe'), ['hello', 'p2p'])
        self.assertEqual(bdecode('d5:hello3:p2pe'), {'hello': 'p2p'})

    # encoding: Python value -> bencoded text, the inverse of the cases above
    def test_encode(self):
        self.assertEqual(bencode('1234ab'), '6:1234ab')
        self.assertEqual(bencode(100), 'i100e')
        self.assertEqual(bencode(['hello', 'p2p']), 'l5:hello3:p2pe')
        self.assertEqual(bencode({'hello': 'p2p'}), 'd5:hello3:p2pe')

if __name__ == '__main__':
    unittest.main()

    Result of running the unit tests:

Ran 2 tests in 0.000s

OK

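    The unit test above covers decoding and encoding of all four bencoding data types, and everything passes ^_^ The four types are laid out as follows: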
    string:      <string length>:<string data>
    integer:     i<integer>e
    list:        l<bencoded element>...e
    dictionary:  d(<bencoded string><bencoded element>)...e
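To make these four rules concrete, here is a minimal encoder sketch in Python 3. It is not the code from BitTorrent-4.1.6\BitTorrent\bencode.py (that code base is Python 2 and works on str); the function name `encode`, the bytes output, and UTF-8 encoding of text are my own assumptions. Dictionary keys are sorted as raw bytes, as the specification quoted below requires.

```python
def encode(value):
    """Encode an int, str/bytes, list or dict as bencoded bytes (sketch)."""
    if isinstance(value, bool):
        # bool is a subclass of int, but bencoding has no boolean type
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        # integer: i<integer>e
        return b"i%de" % value
    if isinstance(value, str):
        # assumption: text is stored as UTF-8 bytes
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # string: <string length>:<string data>
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        # list: l<bencoded element>...e
        return b"l" + b"".join(encode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = []
        for k, v in value.items():
            kb = k.encode("utf-8") if isinstance(k, str) else k
            if not isinstance(kb, bytes):
                raise TypeError("dictionary keys must be strings")
            items.append((kb, v))
        # keys appear in sorted order, compared as raw bytes
        items.sort(key=lambda kv: kv[0])
        # dictionary: d(<bencoded string><bencoded element>)...e
        return b"d" + b"".join(encode(k) + encode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))
```

For example, `encode({'hello': 'p2p'})` returns `b'd5:hello3:p2pe'`, the same bytes the unit test above expects from BitTorrent's `bencode`.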

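    Here is the beginning of a .torrent file: d8:announce26:http://bt.5qzone.net:8080/ ... ... It starts with 'd', so it is a bencoded dictionary. The 8:announce right after it is a string of length 8, and next comes a string of length 26. There is more content after that, but there is no need to go through it; just match it piece by piece against the four types.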
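The hand-matching above can be sketched in a few lines of Python 3. The helper name `read_string` is hypothetical: it reads one `<length>:<data>` string starting at a given offset and returns the bytes together with the offset just past them, which is enough to walk the visible prefix of the file (everything after it is elided, so only the first key and value are read).

```python
def read_string(data, pos):
    """Read '<length>:<bytes>' starting at data[pos]; return (bytes, next_pos)."""
    colon = data.index(b":", pos)      # first ':' after the length digits
    length = int(data[pos:colon])      # int() accepts ASCII digits as bytes
    start = colon + 1
    end = start + length
    if end > len(data):
        raise ValueError("string runs past end of data")
    return data[start:end], end

prefix = b"d8:announce26:http://bt.5qzone.net:8080/"
assert prefix[0:1] == b"d"             # 'd' -> a dictionary starts here
key, pos = read_string(prefix, 1)      # 8:announce
value, pos = read_string(prefix, pos)  # 26:http://bt.5qzone.net:8080/
print(key, value)                      # b'announce' b'http://bt.5qzone.net:8080/'
```

Because the length comes first, the colons inside the URL never confuse the reader: it jumps straight over the 26 bytes.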

 Description of bencoding (quoted from http://www.bittorrent.com/protocol.html):

    Metainfo file and tracker responses are both sent in a simple, efficient, and extensible format called bencoding (pronounced 'bee encoding'). Bencoded messages are nested dictionaries and lists (as in Python), which can contain strings and integers. Extensibility is supported by ignoring unexpected dictionary keys, so additional optional ones can be added later. 

    Bencoding is done as follows:

  • Strings are length-prefixed base ten followed by a colon and the string. For example 4:spam corresponds to 'spam'.
  • Integers are represented by an 'i' followed by the number in base 10 followed by an 'e'. For example i3e corresponds to 3 and i-3e corresponds to -3. Integers have no size limitation. i-0e is invalid. All encodings with a leading zero, such as i03e , are invalid, other than i0e , which of course corresponds to 0.
  • Lists are encoded as an 'l' followed by their elements (also bencoded) followed by an 'e'. For example l4:spam4:eggse corresponds to ['spam', 'eggs'].
  • Dictionaries are encoded as a 'd' followed by a list of alternating keys and their corresponding values followed by an 'e'. For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow': 'moo', 'spam': 'eggs'} and d4:spaml1:a1:bee corresponds to {'spam': ['a', 'b']} . Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics).
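The rules in the quoted text (no i-0e, no leading zeros such as i03e, string keys in sorted raw-byte order) can be enforced by a small recursive decoder. This is a sketch in Python 3 operating on bytes, not BitTorrent's own `bdecode`; the names `bdecode_strict` and `_decode` are hypothetical, and rejecting duplicate keys and trailing data is extra strictness of my own beyond the quoted text.

```python
import re

def bdecode_strict(data):
    """Decode one complete bencoded value from bytes, enforcing the spec rules."""
    value, pos = _decode(data, 0)
    if pos != len(data):
        # assumption: anything after the single top-level value is an error
        raise ValueError("trailing data after position %d" % pos)
    return value

def _decode(data, pos):
    """Decode the value starting at data[pos]; return (value, next_pos)."""
    if pos >= len(data):
        raise ValueError("unexpected end of data")
    c = data[pos:pos + 1]
    if c == b"i":
        end = data.index(b"e", pos)
        digits = data[pos + 1:end]
        # i0e is fine; i-0e, leading zeros (i03e) and empty bodies are not
        if not re.fullmatch(rb"0|-?[1-9][0-9]*", digits):
            raise ValueError("invalid integer %r" % digits)
        return int(digits), end + 1
    if c.isdigit():
        colon = data.index(b":", pos)
        length_text = data[pos:colon]
        if not length_text.isdigit():
            raise ValueError("invalid string length %r" % length_text)
        start = colon + 1
        end = start + int(length_text)
        if end > len(data):
            raise ValueError("string runs past end of data")
        return data[start:end], end
    if c == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos + 1
    if c == b"d":
        pos += 1
        result = {}
        last_key = None
        while data[pos:pos + 1] != b"e":
            key, pos = _decode(data, pos)
            if not isinstance(key, bytes):
                raise ValueError("dictionary keys must be strings")
            # strictly increasing raw-byte order, so duplicates are rejected too
            if last_key is not None and key <= last_key:
                raise ValueError("key %r out of sorted order" % key)
            last_key = key
            value, pos = _decode(data, pos)
            result[key] = value
        return result, pos + 1
    raise ValueError("unexpected byte %r at position %d" % (c, pos))
```

For example, `bdecode_strict(b'd4:spaml1:a1:bee')` returns `{b'spam': [b'a', b'b']}`, while `bdecode_strict(b'i03e')` raises `ValueError`.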