hello!python!

Python encode, decode, and unicode: differences and usage

decode: decoding, turns a byte string (str) into unicode

encode: encoding, turns unicode into a byte string (str)

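Unicode is a character set that assigns a code point to every character; UTF-8 and GBK are encodings that store those code points as bytes. In Python 2 the unicode type holds decoded text. Search online (for example on Baidu) for the details.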

# coding: UTF-8
 
u = u'汉'
print repr(u) # u'\u6c49'
s = u.encode('UTF-8')
print repr(s) # '\xe6\xb1\x89'
u2 = s.decode('UTF-8')
print repr(u2) # u'\u6c49'
 
# Calling decode on a unicode object is a mistake
# s2 = u.decode('UTF-8')
# Likewise, calling encode on a str is a mistake
# u2 = s.encode('UTF-8')
s = u.encode('UTF-8') encodes the unicode object u into a UTF-8 byte string.
u2 = s.decode('UTF-8') decodes the UTF-8 byte string s back into unicode.
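The round trip described above can be sketched in a form that should run on both Python 2 and Python 3. The variable names text, data and back are only illustrative, and the character is written as an escape so the snippet does not depend on the encoding of the source file.

```python
# -*- coding: utf-8 -*-
# Minimal sketch of the encode/decode round trip (names are illustrative).

text = u'\u6c49'                # unicode text: the single character 汉
data = text.encode('utf-8')     # unicode -> UTF-8 byte string
back = data.decode('utf-8')     # UTF-8 byte string -> unicode

print(repr(data))               # the three UTF-8 bytes e6 b1 89
print(back == text)             # True: the round trip loses nothing
```

The rule of thumb is to call encode only on unicode text and decode only on byte strings, always naming the encoding explicitly.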
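On Windows (Chinese-locale systems) the default encoding is usually GBK, so a str typed at the prompt has to be decoded with u.decode('gbk'). In the session below, u is a plain str of GBK bytes despite its name: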
>>> u='格式'
>>> u.decode('gbk')
u'\u683c\u5f0f'
>>> u.decode('utf-8')

Traceback (most recent call last):
  File "<pyshell#111>", line 1, in <module>
    u.decode('utf-8')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb8 in position 0: invalid start byte
>>> 
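When the encoding of incoming bytes is not known in advance, one possible approach is to try a short list of candidates in order. The helper decode_bytes below is hypothetical (it is not part of Python or of the original post), and the candidate list ('utf-8', 'gbk') is an assumption chosen to match the session above.

```python
# -*- coding: utf-8 -*-
# Sketch: try several encodings until one decodes the bytes.
# decode_bytes is a hypothetical helper, not a standard library function.

def decode_bytes(data, encodings=('utf-8', 'gbk')):
    """Return (text, encoding_name) for the first encoding that works."""
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError('none of the encodings matched: %r' % (encodings,))

text = u'\u683c\u5f0f'              # the two characters from the session above
gbk_bytes = text.encode('gbk')      # what a GBK console would pass to Python 2
utf8_bytes = text.encode('utf-8')   # the same text stored as UTF-8

decoded, used = decode_bytes(gbk_bytes)
print(used)                         # 'gbk': UTF-8 rejects the leading 0xb8 byte
```

UTF-8 is tried first because it is strict and rejects most GBK byte sequences, whereas GBK will often accept UTF-8 bytes and silently produce wrong characters. This is a heuristic, not a guarantee: some byte sequences are valid in both encodings.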

 

 
 
posted @ 2013-12-20 11:29  你坚持了吗