UnicodeEncodeError: 'gbk' codec can't encode character '\u01f3' in position 79: illegal multibyte sequence
//2019-09-04
Recently, while writing a Python automation script for operations work, the call logging.info(buff) in the code block below raised this error:
UnicodeEncodeError: 'gbk' codec can't encode character '\u01f3' in position 79: illegal multibyte sequence
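-- Code block
import logging
import time

# input_shell: the shell channel opened elsewhere in the script
def ends_with(end_txt):
    buff = ''
    # Keep reading until the output ends with the expected prompt
    while not buff.endswith(end_txt):
        resp = input_shell.recv(9999)
        buff += resp.decode('utf8', 'ignore')
        time.sleep(.5)
    logging.info(buff)
    print('Prompt received: %s' % buff)
    return buff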
-- Error output
[oracle@localhost ~]$ --- Logging error ---
Traceback (most recent call last):
  File "E:\Program Files\Python37\lib\logging\__init__.py", line 1037, in emit
    stream.write(msg + self.terminator)
UnicodeEncodeError: 'gbk' codec can't encode character '\u01f3' in position 79: illegal multibyte sequence
Call stack:
  File "E:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1664, in <module>
    main()
  File "E:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "E:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "E:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/test_py/shareplex_new.py", line 941, in <module>
    t.ssh_inspection_all()
  File "D:/test_py/shareplex_new.py", line 888, in ssh_inspection_all
    do_ssh_oracle_inspection(shaplx_ip, self.input_shell, self.ftm_ends_with, self.close)
  File "D:/test_py/shareplex_new.py", line 655, in do_ssh_oracle_inspection
    ftm_ends_with(['$ ', '>', '$'])
  File "D:/test_py/shareplex_new.py", line 899, in ftm_ends_with
    output = ends_with(end_txt, self.input_shell)
  File "D:/test_py/shareplex_new.py", line 137, in ends_with
    logging.info(buff)
Message: 'exit\r\ndz\r\nConnection to 10.1.95.120 closed.\r\r\n[ybpic@localhost ~]$ '
Arguments: ()
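The traceback shows that the exception is raised inside the logging handler's stream.write, that is, when the text is encoded for the console and not when the bytes are received. A minimal sketch of that step, assuming an in-memory stream with encoding gbk as a stand-in for the Windows console (the names console and try_write are hypothetical and not part of the original script):

```python
import io

# Stand-in for a console stream whose encoding is gbk (assumption: the
# real stream in the traceback was the Windows console).
console = io.TextIOWrapper(io.BytesIO(), encoding='gbk')


def try_write(stream, text):
    """Return None on success, or the UnicodeEncodeError raised by write()."""
    try:
        stream.write(text)
        stream.flush()
        return None
    except UnicodeEncodeError as exc:
        return exc


# '\u01f3' is the character named in the error message.
err = try_write(console, 'exit\r\n\u01f3\r\n')
print(type(err).__name__)
```

Plain ASCII text passes through the same stream without error, so only the unexpected character triggers the failure.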
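Searching online showed that many web crawlers run into the same error. Working from the causes described there, I found the problem:
-- Checking the character set of the Linux servers showed that the server that triggered the error uses gb2312: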
[oracle@localhost ~]$ echo $LANG
zh_CN.UTF-8
[oracle@localhost ~]$ echo $LANG
zh_CN.gb2312
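Output from a server whose locale is gb2312 has to be decoded as gb2312 (gb18030 is a more complete superset). Decoding those bytes as utf8 with 'ignore' silently drops the invalid bytes and can turn the remaining ones into unrelated characters such as '\u01f3', which the gbk console on the Windows side then cannot encode.
-- So I changed the code and added a code_flag parameter to distinguish servers by encoding and decode each one with the matching codec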
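The effect of the wrong codec can be reproduced in a few lines. This is only a sketch under an assumption: the source never shows the raw bytes, so the example guesses that the server printed the Chinese logout message "登出" in gb2312.

```python
# Assumption: the gb2312 server sent the logout message "登出".
raw = '登出'.encode('gb2312')            # b'\xb5\xc7\xb3\xf6'

# Wrong codec: B5 and F6 are invalid in UTF-8 and get dropped by 'ignore',
# while C7 B3 happens to be a valid UTF-8 sequence for U+01F3.
wrong = raw.decode('utf8', 'ignore')

# Matching codec: gb18030 is a superset of gb2312.
right = raw.decode('gb18030', 'ignore')


def gbk_encodable(text):
    """True if text can be written to a gbk stream without an error."""
    try:
        text.encode('gbk')
        return True
    except UnicodeEncodeError:
        return False


print(ascii(wrong), gbk_encodable(wrong))
print(ascii(right), gbk_encodable(right))
```

Under that assumption the mis-decoded text is exactly the character from the error message, and the correctly decoded text encodes to gbk without trouble.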
# Check whether the command has finished, and log the output
def ends_with(end_txt, code_flag):
    buff = ''
    # Read until the prompt appears
    while not buff.endswith(end_txt):
        resp = input_shell.recv(9999)
        # code_flag: encoding of the server's output ('0' - gb18030, '1' - utf8)
        if code_flag == '1':
            buff += resp.decode('utf8', 'ignore')
        elif code_flag == '0':
            buff += resp.decode('gb18030', 'ignore')
        time.sleep(.5)
    logging.info(buff)
    print(buff, end='')
    return buff
Pass a different argument for servers with different encodings:
ends_with(self, end_txt, code_flag='1')
That solved the problem!
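Instead of setting code_flag by hand for each server, the codec could in principle be derived from the server's $LANG value. The following is a hedged sketch and not what the original script does; codec_from_lang and decode_output are hypothetical helpers, and treating every non-GB locale as UTF-8 is an assumption.

```python
def codec_from_lang(lang):
    """Map a $LANG value such as 'zh_CN.gb2312' to a Python codec name."""
    # The charset is the part after the dot; drop any '@modifier' suffix.
    charset = lang.partition('.')[2].partition('@')[0].strip().lower()
    if charset in ('gb2312', 'gbk', 'gb18030'):
        # gb18030 is a superset of gb2312 and gbk, so it decodes all three.
        return 'gb18030'
    # Assumption: anything else, including a missing charset, is UTF-8.
    return 'utf8'


def decode_output(resp, lang):
    """Decode bytes received from a server whose $LANG is `lang`."""
    return resp.decode(codec_from_lang(lang), 'ignore')


print(codec_from_lang('zh_CN.UTF-8'))
print(codec_from_lang('zh_CN.gb2312'))
```

The two sample values are the ones reported by echo $LANG on the two servers above.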