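Analysis of the pyinstxtractor Execution Flow
[To be revised]
A Python script can be packaged into an exe with PyInstaller, and pyinstxtractor.py is a tool dedicated to unpacking such an executable back into its Python files. This article walks through a sample file to trace how pyinstxtractor works and to document the file format of the packaged exe.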
Sample exe analyzed
name: box_2.exe
Download: https://github.com/Done163/pyinstxtractor/blob/master/box_2.exe
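1. First, read the trailing bytes of the file (24 or 88 bytes)
PYINST20_COOKIE_SIZE = 24 # For pyinstaller 2.0
PYINST21_COOKIE_SIZE = 24 + 64 # For pyinstaller 2.1+
The rest of this walkthrough uses the newer format (PyInstaller 2.1+).
filename: box_2.exe
fileSize 3477206
2. Read the last 88 bytes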

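3. Determine the version by checking whether the first 8 of those bytes are b'MEI\014\013\012\013\016' (4D 45 49 0C 0B 0A 0B 0E). The script looks for this magic at fileSize - 24 first (PyInstaller 2.0) and then at fileSize - 88 (PyInstaller 2.1+).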
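The magic check in steps 1 to 3 can be sketched as follows. This is a minimal illustration rather than the script's own code: the function name detect_cookie_size and the synthetic fake_exe buffer are assumptions made so the example runs without box_2.exe.

```python
import io

MAGIC = b"MEI\014\013\012\013\016"   # 4D 45 49 0C 0B 0A 0B 0E
PYINST20_COOKIE_SIZE = 24            # pyinstaller 2.0
PYINST21_COOKIE_SIZE = 24 + 64       # pyinstaller 2.1+


def detect_cookie_size(stream, file_size):
    """Return 24 or 88 depending on where the magic is found, else None."""
    for size in (PYINST20_COOKIE_SIZE, PYINST21_COOKIE_SIZE):
        if file_size < size:
            continue
        stream.seek(file_size - size)
        if stream.read(len(MAGIC)) == MAGIC:
            return size
    return None


# Synthetic stand-in for an exe: 100 filler bytes followed by an 88-byte cookie
# (8 bytes of magic plus 80 zero bytes).
fake_exe = b"\x00" * 100 + MAGIC + b"\x00" * 80
detected = detect_cookie_size(io.BytesIO(fake_exe), len(fake_exe))
print(detected)  # 88
```

With a real file, the stream would be the exe opened in 'rb' mode and file_size its size on disk.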

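4. Unpack the 88 bytes into magic, lengthofPackage, toc, tocLen, self.pyver and pylibname
Field sizes in bytes: magic(8), lengthofPackage(4), toc(4), tocLen(4), self.pyver(4), pylibname(64)
Values for the sample: MEI 3232982 3232126 768 27 python27.dll

The fields are: magic (the identifying signature), lengthofPackage (size of the appended package that carries the Python content), toc (offset of the table of contents from the start of the package, which equals the total size of the payload files stored before it), tocLen (length of the table of contents in bytes), self.pyver (Python version, where 27 means 2.7), and pylibname (name of the Python dynamic library).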
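As a sketch of this unpacking step, the snippet below rebuilds an 88-byte cookie from the sample values and unpacks it again with the format string '!8siiii64s' used by the script. The snake_case variable names are illustrative; only the format string and the values come from the article.

```python
import struct

COOKIE_FORMAT = "!8siiii64s"  # big-endian, no padding: 8 + 4*4 + 64 = 88 bytes

# Rebuild a cookie from the sample values so the example runs without box_2.exe.
cookie = struct.pack(
    COOKIE_FORMAT,
    b"MEI\014\013\012\013\016",  # magic
    3232982,                     # lengthofPackage
    3232126,                     # toc
    768,                         # tocLen
    27,                          # pyver
    b"python27.dll",             # pylibname, NUL-padded to 64 bytes by struct
)

magic, length_of_package, toc, toc_len, pyver, pylibname = struct.unpack(
    COOKIE_FORMAT, cookie
)
pylibname = pylibname.rstrip(b"\0").decode("ascii")
print(len(cookie), length_of_package, toc, toc_len, pyver, pylibname)
```

The '!' prefix matters: it selects network byte order and disables alignment padding, which is why the field sizes add up to exactly 88.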

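5. Next, read every entry in the table of contents
Position: fileSize - lengthofPackage + toc = 3476350 (0x350b7e)
Size (tocLen): 768 (0x300)
Each entry starts with a 4-byte entrySize, followed by entryPos(4), cmprsdDataSize(4), uncmprsdDataSize(4), cmprsFlag(1), typeCmprsData(1) and name (the remainder). Entries vary in length, so the name occupies entrySize minus the 18 bytes of fixed fields (4 + 4 + 4 + 4 + 1 + 1). In the output below, the value printed as nameLen is this fixed length of 18, not the length of the name itself.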
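The entry layout described above can be parsed with a short loop. The sketch below works on an in-memory byte string; parse_toc and make_entry are hypothetical helpers written for this illustration, and the two test entries reuse values from the sample output.

```python
import struct

FIXED_LEN = struct.calcsize("!iiiiBc")  # 18 bytes, including the entrySize field


def parse_toc(toc_bytes):
    """Yield (entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, type, name)."""
    offset = 0
    while offset < len(toc_bytes):
        (entry_size,) = struct.unpack_from("!i", toc_bytes, offset)
        if entry_size < FIXED_LEN:
            raise ValueError("corrupt entry size: {0}".format(entry_size))
        # Everything after the fixed fields is the NUL-padded name.
        fmt = "!iiiBc{0}s".format(entry_size - FIXED_LEN)
        pos, csize, usize, flag, typ, name = struct.unpack_from(
            fmt, toc_bytes, offset + 4
        )
        yield pos, csize, usize, flag, typ, name.rstrip(b"\0").decode("utf-8")
        offset += entry_size


def make_entry(entry_size, pos, csize, usize, flag, typ, name):
    """Hypothetical helper: build one entry, NUL-padding the name to entry_size."""
    fmt = "!iiiiBc{0}s".format(entry_size - FIXED_LEN)
    return struct.pack(fmt, entry_size, pos, csize, usize, flag, typ, name)


toc_bytes = (
    make_entry(32, 0, 169, 234, 1, b"m", b"struct")
    + make_entry(48, 169, 1131, 2480, 1, b"m", b"pyimod01_os_path")
)
entries = list(parse_toc(toc_bytes))
for entry in entries:
    print(entry)
```

Summing the entrySize values in the sample output (eleven entries of 32, seven of 48 and one of 80) gives 768, which matches tocLen.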

entrySize 32
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 0 169 234 1 m struct
entrySize 48
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 169 1131 2480 1 m pyimod01_os_path
entrySize 48
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 1300 4381 11725 1 m pyimod02_archive
entrySize 48
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 5681 7501 22100 1 m pyimod03_importers
entrySize 48
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 13182 1838 5263 1 s pyiboot01_bootstrap
entrySize 32
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 15020 179 237 1 s box_2
entrySize 48
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 15199 544 1050 1 b Microsoft.VC90.CRT.manifest
entrySize 32
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 15743 41260 91648 1 b _ctypes.pyd
entrySize 32
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 57003 479925 1016832 1 b _hashlib.pyd
entrySize 48
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 536928 466 1009 1 b box_2.exe.manifest
entrySize 32
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 537394 36734 71168 1 b bz2.pyd
entrySize 32
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 574128 67070 225280 1 b msvcm90.dll
entrySize 32
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 641198 157574 569680 1 b msvcp90.dll
entrySize 32
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 798772 317309 653136 1 b msvcr90.dll
entrySize 32
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 1116081 1203463 2640384 1 b python27.dll
entrySize 32
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 2319544 5389 10240 1 b select.pyd
entrySize 48
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 2324933 257730 687104 1 b unicodedata.pyd
entrySize 80
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 2582663 0 0 0 o pyi-windows-manifest-filename box_2.exe.manifest
entrySize 32
nameLen 18
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 2582663 649463 649463 0 z PYZ-00.pyz
[*] Found 19 files in CArchive
6. Next, use the table of contents to read each file from its recorded position
Go back to the start of the payload
Position: fileSize - lengthofPackage = 244224 (0x3ba00)
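The offsets used so far follow from three cookie values and the file size. The worked arithmetic below uses only numbers from the sample; the variable names are illustrative.

```python
file_size = 3477206          # size of box_2.exe
length_of_package = 3232982  # from the cookie
toc = 3232126                # from the cookie
toc_len = 768                # from the cookie
cookie_size = 88             # PyInstaller 2.1+ cookie

overlay_pos = file_size - length_of_package  # start of the payload
toc_pos = overlay_pos + toc                  # start of the table of contents
cookie_pos = file_size - cookie_size         # start of the cookie

print(overlay_pos, hex(overlay_pos))  # 244224 0x3ba00
print(toc_pos, hex(toc_pos))          # 3476350 0x350b7e
print(cookie_pos - toc_pos)           # 768

# Absolute position of the box_2 entry (entryPos 15020 in the listing).
box_2_pos = overlay_pos + 15020
print(box_2_pos)                      # 259244
```

The last subtraction shows that, in this sample, the table of contents ends exactly where the cookie begins, so lengthofPackage = toc + tocLen + 88.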
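Example:
The first entry
entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name 0 169 234 1 m struct
Meaning of the fields:
entryPos (position relative to the start of the payload);
cmprsdDataSize (size of the stored, compressed data);
uncmprsdDataSize (size after decompression);
cmprsFlag (compression flag; 1 means the data must be decompressed);
typeCmprsData (entry type: 's' is a Python script and a possible entry point, 'z' is the PYZ archive of bundled pyc files, 'b' is a binary such as a DLL or .pyd, 'm' is a Python module, and 'o' is a runtime option that carries no data);
name (file name)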
The first file:

Decompress it with zlib:
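Reading and decompressing one entry might look like the sketch below. It runs on a synthetic in-memory file instead of box_2.exe; read_entry and its parameter names are assumptions for this example, and the size check mirrors the assertion in the script.

```python
import io
import zlib


def read_entry(stream, overlay_pos, entry_pos, cmprsd_size, uncmprsd_size, cmprs_flag):
    """Read one CArchive entry and decompress it when the flag is 1."""
    stream.seek(overlay_pos + entry_pos)
    data = stream.read(cmprsd_size)
    if cmprs_flag == 1:
        data = zlib.decompress(data)
    if len(data) != uncmprsd_size:
        raise ValueError(
            "size mismatch: got {0}, expected {1}".format(len(data), uncmprsd_size)
        )
    return data


# Synthetic file: 16 bytes standing in for the PE stub, then one compressed entry.
original = b"print('hello from the payload')\n" * 4
compressed = zlib.compress(original)
fake_file = io.BytesIO(b"\x00" * 16 + compressed)

restored = read_entry(fake_file, 16, 0, len(compressed), len(original), 1)
print(restored == original)  # True
```

For the first entry of the sample the call would use overlay_pos 244224, entry_pos 0, cmprsd_size 169 and uncmprsd_size 234.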

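7. Extract the box_2 entry in the same way and decompress it

Prepend the 8-byte pyc header (03 F3 0D 0A 70 79 69 30): the first four bytes are the Python 2.7 magic number and the last four fill the timestamp field

Decompile the result with an online pyc decompiler; the source code is recovered successfully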

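8. Finally, entries whose type flag (typeCmprsData) is 'z' or 'Z' are unpacked one level further. They mostly hold the Python libraries that were bundled in.
The file starts with the signature PYZ\0 (50 59 5A 00, bytes 0x0-0x4). Read pycHeader, 03 F3 0D 0A (bytes 0x4-0x8), and then tocPosition, 640075 = 0x9c44b (bytes 0x8-0xC).

The difference from the previous step is that the table of contents here is stored as a marshalled Python object, so it is loaded directly instead of being parsed field by field
marshal.load(f) # f is the PYZ file with its read position moved to tocPosition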
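The PYZ header read can be sketched on a synthetic archive. The function read_pyz_toc and the one-module archive are made up for this illustration. The sketch also calls marshal.loads on the bytes from tocPosition onward, which here is equivalent to marshal.load on the positioned file because trailing bytes are ignored.

```python
import io
import marshal
import struct
import zlib


def read_pyz_toc(stream):
    """Return (pycHeader, toc dict) from a PYZ archive laid out as described."""
    if stream.read(4) != b"PYZ\0":
        raise ValueError("not a PYZ archive")
    pyc_header = stream.read(4)                          # Python magic value
    (toc_position,) = struct.unpack("!i", stream.read(4))
    stream.seek(toc_position)
    toc = marshal.loads(stream.read())
    # The script converts a list of tuples into a dict before use.
    if isinstance(toc, list):
        toc = dict(toc)
    return pyc_header, toc


# Synthetic archive: 12-byte header, one compressed module, marshalled TOC.
body = zlib.compress(b"fake code object bytes")
toc_list = [("demo", (0, 12, len(body)))]                # (ispkg, pos, length)
archive = (
    b"PYZ\0"
    + b"\x03\xf3\r\n"
    + struct.pack("!i", 12 + len(body))
    + body
    + marshal.dumps(toc_list)
)

pyc_header, parsed = read_pyz_toc(io.BytesIO(archive))
print(pyc_header, parsed)
```

Unmarshalling a TOC written by a different Python version may fail, which is why the script warns when the running interpreter's magic differs from pycHeader.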
The unmarshalled result is shown below, 196 entries in total:
[('StringIO', (0, 17L, 4501)), ('UserDict', (0, 4518L, 2980)), ('__future__', (0, 7498L, 1743)), ('_abcoll', (0, 9241L, 7452)), ('_strptime', (0, 16693L, 6561)), ('_threading_local', (0, 23254L, 2576)), ('_weakrefset', (0, 25830L, 2508)), ('abc', (0, 28338L, 2528)), ('atexit', (0, 30866L, 1036)), ('base64', (0, 31902L, 4425)), ('bdb', (0, 36327L, 7025)), ('calendar', (0, 43352L, 9228)), ('cmd', (0, 52580L, 6001)), ('codecs', (0, 58581L, 10507)), ('collections', (0, 69088L, 9252)), ('copy', (0, 78340L, 5026)), ('copy_reg', (0, 83366L, 2366)), ('ctypes', (1, 85732L, 6928)), ('ctypes._endian', (0, 92660L, 1042)), ('difflib', (0, 93702L, 21957)), ('dis', (0, 115659L, 2966)), ('doctest', (0, 118625L, 28519)), ('dummy_thread', (0, 147144L, 2148)), ('encodings', (1, 149292L, 2112)), ('encodings.aliases', (0, 151404L, 3084)), ('encodings.ascii', (0, 154488L, 780)), ('encodings.base64_codec', (0, 155268L, 1235)), ('encodings.big5', (0, 156503L, 599)), ('encodings.big5hkscs', (0, 157102L, 607)), ('encodings.bz2_codec', (0, 157709L, 1508)), ('encodings.charmap', (0, 159217L, 1050)), ('encodings.cp037', (0, 160267L, 1234)), ('encodings.cp1006', (0, 161501L, 1309)), ('encodings.cp1026', (0, 162810L, 1243)), ('encodings.cp1140', (0, 164053L, 1215)), ('encodings.cp1250', (0, 165268L, 1281)), ('encodings.cp1251', (0, 166549L, 1278)), ('encodings.cp1252', (0, 167827L, 1273)), ('encodings.cp1253', (0, 169100L, 1260)), ('encodings.cp1254', (0, 170360L, 1273)), ('encodings.cp1255', (0, 171633L, 1257)), ('encodings.cp1256', (0, 172890L, 1293)), ('encodings.cp1257', (0, 174183L, 1272)), ('encodings.cp1258', (0, 175455L, 1277)), ('encodings.cp424', (0, 176732L, 1197)), ('encodings.cp437', (0, 177929L, 3710)), ('encodings.cp500', (0, 181639L, 1233)), ('encodings.cp720', (0, 182872L, 1340)), ('encodings.cp737', (0, 184212L, 3848)), ('encodings.cp775', (0, 188060L, 3729)), ('encodings.cp850', (0, 191789L, 3509)), ('encodings.cp852', (0, 195298L, 3729)), ('encodings.cp855', (0, 199027L, 
3829)), ('encodings.cp856', (0, 202856L, 1222)), ('encodings.cp857', (0, 204078L, 3517)), ('encodings.cp858', (0, 207595L, 3480)), ('encodings.cp860', (0, 211075L, 3691)), ('encodings.cp861', (0, 214766L, 3704)), ('encodings.cp862', (0, 218470L, 3814)), ('encodings.cp863', (0, 222284L, 3705)), ('encodings.cp864', (0, 225989L, 3826)), ('encodings.cp865', (0, 229815L, 3704)), ('encodings.cp866', (0, 233519L, 3842)), ('encodings.cp869', (0, 237361L, 3740)), ('encodings.cp874', (0, 241101L, 1272)), ('encodings.cp875', (0, 242373L, 1250)), ('encodings.cp932', (0, 243623L, 601)), ('encodings.cp949', (0, 244224L, 601)), ('encodings.cp950', (0, 244825L, 602)), ('encodings.euc_jis_2004', (0, 245427L, 610)), ('encodings.euc_jisx0213', (0, 246037L, 610)), ('encodings.euc_jp', (0, 246647L, 600)), ('encodings.euc_kr', (0, 247247L, 600)), ('encodings.gb18030', (0, 247847L, 605)), ('encodings.gb2312', (0, 248452L, 604)), ('encodings.gbk', (0, 249056L, 597)), ('encodings.hex_codec', (0, 249653L, 1248)), ('encodings.hp_roman8', (0, 250901L, 1715)), ('encodings.hz', (0, 252616L, 600)), ('encodings.idna', (0, 253216L, 2443)), ('encodings.iso2022_jp', (0, 255659L, 607)), ('encodings.iso2022_jp_1', (0, 256266L, 609)), ('encodings.iso2022_jp_2', (0, 256875L, 609)), ('encodings.iso2022_jp_2004', (0, 257484L, 614)), ('encodings.iso2022_jp_3', (0, 258098L, 609)), ('encodings.iso2022_jp_ext', (0, 258707L, 610)), ('encodings.iso2022_kr', (0, 259317L, 607)), ('encodings.iso8859_1', (0, 259924L, 1226)), ('encodings.iso8859_10', (0, 261150L, 1255)), ('encodings.iso8859_11', (0, 262405L, 1291)), ('encodings.iso8859_13', (0, 263696L, 1259)), ('encodings.iso8859_14', (0, 264955L, 1269)), ('encodings.iso8859_15', (0, 266224L, 1242)), ('encodings.iso8859_16', (0, 267466L, 1261)), ('encodings.iso8859_2', (0, 268727L, 1252)), ('encodings.iso8859_3', (0, 269979L, 1248)), ('encodings.iso8859_4', (0, 271227L, 1251)), ('encodings.iso8859_5', (0, 272478L, 1240)), ('encodings.iso8859_6', (0, 273718L, 
1181)), ('encodings.iso8859_7', (0, 274899L, 1255)), ('encodings.iso8859_8', (0, 276154L, 1187)), ('encodings.iso8859_9', (0, 277341L, 1232)), ('encodings.johab', (0, 278573L, 599)), ('encodings.koi8_r', (0, 279172L, 1301)), ('encodings.koi8_u', (0, 280473L, 1280)), ('encodings.latin_1', (0, 281753L, 797)), ('encodings.mac_arabic', (0, 282550L, 3617)), ('encodings.mac_centeuro', (0, 286167L, 1289)), ('encodings.mac_croatian', (0, 287456L, 1303)), ('encodings.mac_cyrillic', (0, 288759L, 1287)), ('encodings.mac_farsi', (0, 290046L, 1233)), ('encodings.mac_greek', (0, 291279L, 1279)), ('encodings.mac_iceland', (0, 292558L, 1296)), ('encodings.mac_latin2', (0, 293854L, 2087)), ('encodings.mac_roman', (0, 295941L, 1297)), ('encodings.mac_romanian', (0, 297238L, 1304)), ('encodings.mac_turkish', (0, 298542L, 1299)), ('encodings.mbcs', (0, 299841L, 820)), ('encodings.palmos', (0, 300661L, 1172)), ('encodings.ptcp154', (0, 301833L, 2049)), ('encodings.punycode', (0, 303882L, 3193)), ('encodings.quopri_codec', (0, 307075L, 1221)), ('encodings.raw_unicode_escape', (0, 308296L, 758)), ('encodings.rot_13', (0, 309054L, 1408)), ('encodings.shift_jis', (0, 310462L, 604)), ('encodings.shift_jis_2004', (0, 311066L, 612)), ('encodings.shift_jisx0213', (0, 311678L, 613)), ('encodings.string_escape', (0, 312291L, 700)), ('encodings.tis_620', (0, 312991L, 1276)), ('encodings.undefined', (0, 314267L, 872)), ('encodings.unicode_escape', (0, 315139L, 749)), ('encodings.unicode_internal', (0, 315888L, 751)), ('encodings.utf_16', (0, 316639L, 1703)), ('encodings.utf_16_be', (0, 318342L, 761)), ('encodings.utf_16_le', (0, 319103L, 761)), ('encodings.utf_32', (0, 319864L, 1828)), ('encodings.utf_32_be', (0, 321692L, 675)), ('encodings.utf_32_le', (0, 322367L, 674)), ('encodings.utf_7', (0, 323041L, 702)), ('encodings.utf_8', (0, 323743L, 752)), ('encodings.utf_8_sig', (0, 324495L, 1605)), ('encodings.uu_codec', (0, 326100L, 1906)), ('encodings.zlib_codec', (0, 328006L, 1409)), ('fnmatch', 
(0, 329415L, 1706)), ('functools', (0, 331121L, 1799)), ('genericpath', (0, 332920L, 1405)), ('getopt', (0, 334325L, 2989)), ('gettext', (0, 337314L, 6903)), ('hashlib', (0, 344217L, 3180)), ('heapq', (0, 347397L, 5906)), ('inspect', (0, 353303L, 14267)), ('io', (0, 367570L, 1694)), ('keyword', (0, 369264L, 1181)), ('linecache', (0, 370445L, 1679)), ('locale', (0, 372124L, 20586)), ('logging', (1, 392710L, 18649)), ('ntpath', (0, 411359L, 5695)), ('opcode', (0, 417054L, 2459)), ('optparse', (0, 419513L, 18405)), ('os', (0, 437918L, 8300)), ('os2emxpath', (0, 446218L, 2103)), ('pdb', (0, 448321L, 15685)), ('pickle', (0, 464006L, 13496)), ('posixpath', (0, 477502L, 4986)), ('pprint', (0, 482488L, 4296)), ('quopri', (0, 486784L, 3024)), ('random', (0, 489808L, 10202)), ('re', (0, 500010L, 5153)), ('repr', (0, 505163L, 2004)), ('shlex', (0, 507167L, 3285)), ('sre', (0, 510452L, 322)), ('sre_compile', (0, 510774L, 5482)), ('sre_constants', (0, 516256L, 2699)), ('sre_parse', (0, 518955L, 8274)), ('stat', (0, 527229L, 950)), ('string', (0, 528179L, 6800)), ('stringprep', (0, 534979L, 5892)), ('subprocess', (0, 540871L, 12988)), ('tempfile', (0, 553859L, 7428)), ('textwrap', (0, 561287L, 4924)), ('threading', (0, 566211L, 13594)), ('token', (0, 579805L, 1857)), ('tokenize', (0, 581662L, 6392)), ('traceback', (0, 588054L, 4583)), ('types', (0, 592637L, 1232)), ('unittest', (1, 593869L, 1656)), ('unittest.case', (0, 595525L, 13296)), ('unittest.loader', (0, 608821L, 4755)), ('unittest.main', (0, 613576L, 3404)), ('unittest.result', (0, 616980L, 2951)), ('unittest.runner', (0, 619931L, 2674)), ('unittest.signals', (0, 622605L, 1152)), ('unittest.suite', (0, 623757L, 3802)), ('unittest.util', (0, 627559L, 2110)), ('warnings', (0, 629669L, 5799)), ('weakref', (0, 635468L, 4607))]
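Each key is the name of a pyc module. In the value tuple, the first item marks whether the module is a package (1) or not (0), the second is the position relative to the start of the PYZ file, and the third is the length. Use these to locate the data for each module; as before, the data must be decompressed (zlib.decompress(data)).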
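A minimal sketch of pulling modules out by (ispkg, pos, length) follows, again on synthetic data. The name extract_module and the two-module buffer are assumptions; the final check uses the first four positions and lengths from the listing above.

```python
import io
import zlib


def extract_module(stream, toc_value):
    """Return the decompressed payload for one (ispkg, pos, length) TOC value."""
    ispkg, pos, length = toc_value
    stream.seek(pos)
    return zlib.decompress(stream.read(length))


# Synthetic PYZ body: 17 filler bytes, then two compressed modules back to back.
mod_a = zlib.compress(b"module a payload")
mod_b = zlib.compress(b"module b payload")
pyz = io.BytesIO(b"\x00" * 17 + mod_a + mod_b)
toc = {
    "a": (0, 17, len(mod_a)),
    "b": (0, 17 + len(mod_a), len(mod_b)),
}
payloads = {name: extract_module(pyz, value) for name, value in toc.items()}
print(payloads)

# In the sample listing the modules are also stored back to back:
# each position equals the previous position plus the previous length.
sample = [(17, 4501), (4518, 2980), (7498, 1743), (9241, 7452)]
contiguous = all(p + n == q for (p, n), (q, _) in zip(sample, sample[1:]))
print(contiguous)  # True
```

The same pattern holds at the end of the listing: the last module, weakref, sits at 635468 with length 4607, and 635468 + 4607 = 640075, which is tocPosition.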
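Prepend the pycHeader read earlier, 03 F3 0D 0A, followed by 00 00 00 00 for the timestamp field
(for Python 3.3 or later, that is self.pyver >= 33, a further 00 00 00 00 is needed for the size field, giving a 12-byte header: the magic followed by eight zero bytes)
For this sample the header is 03 F3 0D 0A 00 00 00 00; save the result as a .pyc file.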
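The header rule can be written as a small helper. The sketch below follows only the rule stated in this article (8 bytes before Python 3.3, 12 bytes from 3.3 on). The name build_pyc is an assumption, and newer pyc layouts that add further header fields are outside its scope.

```python
def build_pyc(pyc_header, code_bytes, pyver):
    """Assemble a .pyc image from the magic, a zeroed header and the payload."""
    header = pyc_header + b"\0" * 4   # magic + zeroed timestamp
    if pyver >= 33:
        header += b"\0" * 4           # size field added in Python 3.3
    return header + code_bytes


py27 = build_pyc(b"\x03\xf3\r\n", b"<decompressed module data>", 27)
print(" ".join("{0:02X}".format(b) for b in py27[:8]))  # 03 F3 0D 0A 00 00 00 00
```

Writing the returned bytes to a file opened in 'wb' mode gives a .pyc that a decompiler for the matching Python version can read.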
At this point every file has been unpacked.
Appendix: main code of pyinstxtractor.py
from __future__ import print_function
import os
import struct
import marshal
import zlib
import sys
import imp
import types
from uuid import uuid4 as uniquename


class CTOCEntry:
    def __init__(self, position, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name):
        self.position = position
        self.cmprsdDataSize = cmprsdDataSize
        self.uncmprsdDataSize = uncmprsdDataSize
        self.cmprsFlag = cmprsFlag
        self.typeCmprsData = typeCmprsData
        self.name = name


class PyInstArchive:
    PYINST20_COOKIE_SIZE = 24           # For pyinstaller 2.0
    PYINST21_COOKIE_SIZE = 24 + 64      # For pyinstaller 2.1+
    MAGIC = b'MEI\014\013\012\013\016'  # Magic number which identifies pyinstaller
    print (type(MAGIC))

    def __init__(self, path):
        self.filePath = path
        print ('path:',self.filePath)

    def open(self):
        try:
            self.fPtr = open(self.filePath, 'rb')
            self.fileSize = os.stat(self.filePath).st_size
            print ('fileSize',self.fileSize)
        except:
            print('[*] Error: Could not open {0}'.format(self.filePath))
            return False
        return True

    def close(self):
        try:
            self.fPtr.close()
        except:
            pass

    def checkFile(self):
        print('[*] Processing {0}'.format(self.filePath))
        # Check if it is a 2.0 archive
        self.fPtr.seek(self.fileSize - self.PYINST20_COOKIE_SIZE, os.SEEK_SET)
        magicFromFile = self.fPtr.read(len(self.MAGIC))
        print ('magicFromFile:',type(magicFromFile))

        if magicFromFile == self.MAGIC:
            self.pyinstVer = 20     # pyinstaller 2.0
            print('[*] Pyinstaller version: 2.0')
            return True

        # Check for pyinstaller 2.1+ before bailing out
        # self.fPtr.seek(self.fileSize - 16 , os.SEEK_SET)
        # print ('readTest:',self.fPtr.read(8))
        # for x in self.fPtr.read(4):
        #     print (type(x))
        #     print (ord(x))
        #     print (hex(ord(x)))
        self.fPtr.seek(self.fileSize - self.PYINST21_COOKIE_SIZE, os.SEEK_SET)
        magicFromFile = self.fPtr.read(len(self.MAGIC))
        # print ('size and location',self.fileSize,self.PYINST21_COOKIE_SIZE)
        # print ('len(self.MAGIC)',len(self.MAGIC),self.MAGIC)
        # print ('magicFromFile:',type(magicFromFile))

        if magicFromFile == self.MAGIC:
            print('[*] Pyinstaller version: 2.1+')
            self.pyinstVer = 21     # pyinstaller 2.1+
            return True

        print('[*] Error : Unsupported pyinstaller version or not a pyinstaller archive')
        return False

    def getCArchiveInfo(self):
        try:
            if self.pyinstVer == 20:
                self.fPtr.seek(self.fileSize - self.PYINST20_COOKIE_SIZE, os.SEEK_SET)

                # Read CArchive cookie
                (magic, lengthofPackage, toc, tocLen, self.pyver) = \
                struct.unpack('!8siiii', self.fPtr.read(self.PYINST20_COOKIE_SIZE))

            elif self.pyinstVer == 21:
                self.fPtr.seek(self.fileSize - self.PYINST21_COOKIE_SIZE, os.SEEK_SET)

                # Read CArchive cookie
                (magic, lengthofPackage, toc, tocLen, self.pyver, pylibname) = \
                struct.unpack('!8siiii64s', self.fPtr.read(self.PYINST21_COOKIE_SIZE))
                print ('magic, lengthofPackage, toc, tocLen, self.pyver, pylibname',magic, lengthofPackage, toc, tocLen, self.pyver, pylibname)

        except:
            print('[*] Error : The file is not a pyinstaller archive')
            return False

        print('[*] Python version: {0}'.format(self.pyver))

        # Overlay is the data appended at the end of the PE
        self.overlaySize = lengthofPackage                      #3232982
        self.overlayPos = self.fileSize - self.overlaySize      #3477206-3232982=244224
        self.tableOfContentsPos = self.overlayPos + toc         #244224+3232126=3476350
        self.tableOfContentsSize = tocLen                       #768
        print ('tableOfContentsSize:',self.tableOfContentsSize)

        print('[*] Length of package: {0} bytes'.format(self.overlaySize))
        return True

    def parseTOC(self):
        # Go to the table of contents
        self.fPtr.seek(self.tableOfContentsPos, os.SEEK_SET)    #3476350 #856 #35 0b80h

        self.tocList = []
        parsedLen = 0

        # Parse table of contents
        while parsedLen < self.tableOfContentsSize:     ##768
            (entrySize, ) = struct.unpack('!i', self.fPtr.read(4))
            print ('entrySize',entrySize)
            nameLen = struct.calcsize('!iiiiBc')        #18
            print ('nameLen',nameLen)

            (entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name) = \
            struct.unpack( \
                '!iiiBc{0}s'.format(entrySize - nameLen), \
                self.fPtr.read(entrySize - 4))
            # print ('entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name',entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name)

            name = name.decode('utf-8').rstrip('\0')
            if len(name) == 0:
                name = str(uniquename())
                print('[!] Warning: Found an unamed file in CArchive. Using random name {0}'.format(name))

            #self.overlayPos=244224+
            self.tocList.append( \
                                CTOCEntry(                      \
                                    self.overlayPos + entryPos, \
                                    cmprsdDataSize,             \
                                    uncmprsdDataSize,           \
                                    cmprsFlag,                  \
                                    typeCmprsData,              \
                                    name                        \
                                ))

            parsedLen += entrySize
        print('[*] Found {0} files in CArchive'.format(len(self.tocList)))

    def extractFiles(self):
        print('[*] Beginning extraction...please standby')
        extractionDir = os.path.join(os.getcwd(), os.path.basename(self.filePath) + '_extracted')

        if not os.path.exists(extractionDir):
            os.mkdir(extractionDir)

        os.chdir(extractionDir)

        for entry in self.tocList:
            basePath = os.path.dirname(entry.name)
            if basePath != '':
                # Check if path exists, create if not
                if not os.path.exists(basePath):
                    os.makedirs(basePath)

            self.fPtr.seek(entry.position, os.SEEK_SET)
            data = self.fPtr.read(entry.cmprsdDataSize)
            # print ('cmprsdData',data)

            if entry.cmprsFlag == 1:
                data = zlib.decompress(data)
                # print ('decompressData',data)
                # Malware may tamper with the uncompressed size
                # Comment out the assertion in such a case
                assert len(data) == entry.uncmprsdDataSize  # Sanity Check

            with open(entry.name, 'wb') as f:
                f.write(data)
                f.close()

            if entry.typeCmprsData == b's':
                print('[+] Possible entry point: {0}'.format(entry.name))

            elif entry.typeCmprsData == b'z' or entry.typeCmprsData == b'Z':
                self._extractPyz(entry.name)

    def _extractPyz(self, name):
        dirName = name + '_extracted'
        # Create a directory for the contents of the pyz
        if not os.path.exists(dirName):
            os.mkdir(dirName)

        with open(name, 'rb') as f:
            pyzMagic = f.read(4)            #50 59 5A 00
            assert pyzMagic == b'PYZ\0'     # Sanity Check

            pycHeader = f.read(4)           # Python magic value #03 F3 0D 0A
            # print ('imp.get_magic(),pycHeader',imp.get_magic(),pycHeader)

            if imp.get_magic() != pycHeader:    #'\x03\xf3\r\n'
                print('[!] Warning: The script is running in a different python version than the one used to build the executable')
                print(' Run this script in Python{0} to prevent extraction errors(if any) during unmarshalling'.format(self.pyver))

            (tocPosition, ) = struct.unpack('!i', f.read(4))    #640075 0x9c44b
            print ('tocPosition',tocPosition)
            f.seek(tocPosition, os.SEEK_SET)

            try:
                toc = marshal.load(f)
            except:
                print('[!] Unmarshalling FAILED. Cannot extract {0}. Extracting remaining files.'.format(name))
                return

            print ('toc',toc)
            print('[*] Found {0} files in PYZ archive'.format(len(toc)))

            # From pyinstaller 3.1+ toc is a list of tuples
            if type(toc) == list:
                toc = dict(toc)

            for key in toc.keys():
                (ispkg, pos, length) = toc[key]
                f.seek(pos, os.SEEK_SET)

                fileName = key
                try:
                    # for Python > 3.3 some keys are bytes object some are str object
                    fileName = key.decode('utf-8')
                except:
                    pass

                # Make sure destination directory exists, ensuring we keep inside dirName
                destName = os.path.join(dirName, fileName.replace("..", "__"))
                destDirName = os.path.dirname(destName)
                if not os.path.exists(destDirName):
                    os.makedirs(destDirName)

                try:
                    data = f.read(length)
                    data = zlib.decompress(data)
                except:
                    print('[!] Error: Failed to decompress {0}, probably encrypted. Extracting as is.'.format(fileName))
                    open(destName + '.pyc.encrypted', 'wb').write(data)
                    continue

                with open(destName + '.pyc', 'wb') as pycFile:
                    pycFile.write(pycHeader)        # Write pyc magic
                    pycFile.write(b'\0' * 4)        # Write timestamp
                    if self.pyver >= 33:
                        pycFile.write(b'\0' * 4)    # Size parameter added in Python 3.3
                    pycFile.write(data)


def main():
    if len(sys.argv) < 2:
        print('[*] Usage: pyinstxtractor.py <filename>')

    else:
        arch = PyInstArchive(sys.argv[1])
        if arch.open():
            if arch.checkFile():
                if arch.getCArchiveInfo():
                    arch.parseTOC()
                    arch.extractFiles()
                    arch.close()
                    print('[*] Successfully extracted pyinstaller archive: {0}'.format(sys.argv[1]))
                    print('')
                    print('You can now use a python decompiler on the pyc files within the extracted directory')
                    return

            arch.close()


if __name__ == '__main__':
    main()
