UnicodeEncodeError: 'gbk' codec can't encode character '\u01f3' in position 79: illegal multibyte sequence

//2019-09-04
Recently, while writing a Python automated operations script, execution reached logging.info(buff) in the code block below and raised this error:
UnicodeEncodeError: 'gbk' codec can't encode character '\u01f3' in position 79: illegal multibyte sequence
 
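--Code block
import logging
import time

# input_shell is the interactive shell channel created elsewhere in the script
def ends_with(end_txt):
    buff = ''
    # Keep reading until the output ends with the expected prompt
    while not buff.endswith(end_txt):
        resp = input_shell.recv(9999)
        buff += resp.decode('utf8', 'ignore')
        time.sleep(.5)
    logging.info(buff)
    print('Prompt received: %s' % buff)
    return buff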
 
--Error output
[oracle@localhost ~]$ --- Logging error ---
Traceback (most recent call last):
  File "E:\Program Files\Python37\lib\logging\__init__.py", line 1037, in emit
    stream.write(msg + self.terminator)
UnicodeEncodeError: 'gbk' codec can't encode character '\u01f3' in position 79: illegal multibyte sequence
Call stack:
  File "E:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1664, in <module>
    main()
  File "E:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "E:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "E:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/test_py/shareplex_new.py", line 941, in <module>
    t.ssh_inspection_all()
  File "D:/test_py/shareplex_new.py", line 888, in ssh_inspection_all
    do_ssh_oracle_inspection(shaplx_ip, self.input_shell, self.ftm_ends_with, self.close)
  File "D:/test_py/shareplex_new.py", line 655, in do_ssh_oracle_inspection
    ftm_ends_with(['$ ', '>', '$'])
  File "D:/test_py/shareplex_new.py", line 899, in ftm_ends_with
    output = ends_with(end_txt, self.input_shell)
  File "D:/test_py/shareplex_new.py", line 137, in ends_with
    logging.info(buff)
Message: 'exit\r\ndz\r\nConnection to 10.1.95.120 closed.\r\r\n[ybpic@localhost ~]$ '
Arguments: ()
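
The traceback points at stream.write(msg + self.terminator) inside logging: the log stream on this Windows machine encodes text as gbk, and gbk has no mapping for '\u01f3'. The sketch below reproduces that in isolation and shows one way to make the output side tolerant. It is only an illustration: the post does not show how logging was configured, so the in-memory stream, the helper name make_stream, and the logger name are assumptions.

```python
import io
import logging


def make_stream(errors):
    # Stand-in for the real log stream (assumption): an in-memory text
    # stream that encodes to gbk, like the stream in the traceback.
    return io.TextIOWrapper(io.BytesIO(), encoding='gbk', errors=errors)


# 1) Strict stream: writing '\u01f3' fails the same way as in the traceback.
strict_stream = make_stream('strict')
try:
    strict_stream.write('\u01f3')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 2) Tolerant stream: characters gbk cannot encode are written as
#    backslash escapes instead of raising.
tolerant_stream = make_stream('backslashreplace')
logger = logging.getLogger('gbk_demo')  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(tolerant_stream))
logger.propagate = False
logger.info('prompt: \u01f3')
tolerant_stream.flush()
logged = tolerant_stream.buffer.getvalue().decode('gbk')
```

This only hides the symptom on the logging side; the text in buff is still wrong if the bytes were decoded with the wrong codec.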

 

Searching online showed that many web-scraper programs hit the same error. Working from the causes described there, I found the problem:

 
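--Checking the Linux locale (the first output is from a UTF-8 server for comparison; the server that triggers the error reports gb2312)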
[oracle@localhost ~]$ echo $LANG
zh_CN.UTF-8
 
[oracle@localhost ~]$ echo $LANG
zh_CN.gb2312
 
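Output from a server whose locale is gb2312 must be decoded with gb2312 (gb18030 covers more characters). Decoding those bytes as utf8 instead produces wrong characters such as '\u01f3', which the gbk log stream on Windows then cannot encode.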
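
The character in the error message can be traced back to this mismatch. As a minimal sketch (the two-byte sample is chosen for illustration and is not taken from the actual server output): the bytes C7 B3 are a valid GB2312 character, but they also form a valid UTF-8 sequence that decodes to '\u01f3', which gbk cannot encode.

```python
# Two bytes that are valid in both encodings but mean different things.
raw = b'\xc7\xb3'

# Decoded as UTF-8 (what the original code did): yields U+01F3,
# the character named in the UnicodeEncodeError.
wrong = raw.decode('utf8', 'ignore')

# Decoded as gb18030 (a superset of gb2312): yields the intended character.
right = raw.decode('gb18030', 'ignore')


def gbk_can_encode(text):
    # True if the text can be written to a gbk-encoded stream.
    try:
        text.encode('gbk')
        return True
    except UnicodeEncodeError:
        return False


# ascii() keeps the demo printable on any console encoding.
print(ascii(wrong), gbk_can_encode(wrong))
print(ascii(right), gbk_can_encode(right))
```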
 
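--So I changed the code, adding a code_flag parameter to tell servers with different encodings apart and decode each with the matching codec
# Check whether the command has finished, and output the execution record
def ends_with(end_txt, code_flag):
    buff = ''

    # Read until the prompt appears
    while not buff.endswith(end_txt):
        resp = input_shell.recv(9999)

        # code_flag - encoding of the server's output ('0' - gb18030, '1' - utf8)
        if code_flag == '1':
            buff += resp.decode('utf8', 'ignore')
        elif code_flag == '0':
            buff += resp.decode('gb18030', 'ignore')
        else:
            # Fail fast: with an unknown flag buff would never be filled
            raise ValueError('unknown code_flag: %r' % code_flag)
        time.sleep(.5)

    logging.info(buff)
    print(buff, end='')
    return buff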
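
One thing to watch in a loop like this: recv() can return a chunk that ends in the middle of a multi-byte character, and decoding each chunk on its own with 'ignore' silently drops or garbles the split character. A possible refinement, sketched here with the standard codecs module, is an incremental decoder that carries the incomplete bytes over to the next chunk. The function name decode_chunks and the sample chunks are illustrative assumptions, not part of the original script.

```python
import codecs


def decode_chunks(chunks, encoding='gb18030'):
    # Decode an iterable of byte chunks as one continuous stream.
    # The incremental decoder buffers a trailing partial character
    # until the next chunk completes it.
    decoder = codecs.getincrementaldecoder(encoding)(errors='ignore')
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b'', final=True))  # flush any leftovers
    return ''.join(parts)


text = '\u4e2d\u6587'  # two Chinese characters, two bytes each in gb18030
data = text.encode('gb18030')

# Simulate recv() boundaries that cut through both characters.
chunks = [data[:1], data[1:3], data[3:]]

streamed = decode_chunks(chunks)
per_chunk = ''.join(c.decode('gb18030', 'ignore') for c in chunks)
```

Here streamed matches the original text, while per_chunk does not, because each chunk was decoded without knowledge of its neighbours.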
 
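Pass a different argument for servers with different encodings:
ends_with(end_txt, code_flag='1')  # UTF-8 server
ends_with(end_txt, code_flag='0')  # gb2312 server
With that, the problem was solved!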
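
Rather than setting code_flag by hand for each server, the codec could in principle be derived from the `echo $LANG` output shown earlier. The helper below is a hypothetical sketch (codec_from_lang is not in the original script) and assumes the value has the usual language_TERRITORY.codeset form.

```python
import codecs


def codec_from_lang(lang, default='utf-8'):
    # Map a LANG value such as 'zh_CN.gb2312' to a Python codec name.
    # Take the codeset after '.', dropping any '@modifier' suffix.
    _, _, codeset = lang.strip().partition('.')
    codeset = codeset.partition('@')[0].lower()
    if not codeset:
        return default
    # gb18030 covers gb2312 and gbk, so use it for all three.
    if codeset in ('gb2312', 'gbk', 'gb18030'):
        return 'gb18030'
    try:
        return codecs.lookup(codeset).name
    except LookupError:
        # Unknown codeset: fall back rather than fail.
        return default
```

The result could then be passed straight to decode, for example resp.decode(codec_from_lang('zh_CN.gb2312'), 'ignore').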
 
posted on 2019-09-05 00:07  爱love9