Python Encoding

Unicode and UTF-8

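　　Unicode (known in Chinese as 统一码 or 万国码) is an industry standard in computing that covers a character set, encoding schemes, and more.

　　　　Storage: Unicode itself only assigns a code point to each character and does not fix a byte width. The "2 bytes per character" rule describes the old fixed-width UCS-2 encoding; UTF-16 uses 2 or 4 bytes per character, and UTF-32 always uses 4.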

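　　UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode.

　　　　Storage: 1 to 4 bytes per character. ASCII characters such as English letters take 1 byte, and most Chinese characters take 3 bytes.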

　　Summary: Unicode is a character set, and UTF-8 is one way of encoding Unicode as bytes.
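
　　The byte counts above can be checked directly in Python 3. This is a minimal sketch; the three sample characters are chosen only for illustration and are written as escapes so the snippet does not depend on the file's own encoding.

```python
# Compare how many bytes one character needs under UTF-8 and UTF-16.
samples = [
    "a",            # ASCII letter
    "\u4e2d",       # the Chinese character U+4E2D
    "\U0001F600",   # an emoji outside the Basic Multilingual Plane
]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")  # the -le variant avoids the 2-byte BOM prefix
    print("U+%04X utf-8=%d bytes, utf-16=%d bytes" % (ord(ch), len(utf8), len(utf16)))

utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]
```

　　The same code point has a different length in each encoding, which is why a byte count only makes sense once the encoding is named.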

Python 3 and Python 2

　　Python 3

　　　　Python 3 has two types that represent sequences of characters: bytes (raw 8-bit values) and str (Unicode characters).

　　　　Inside the Python 3 interpreter, str values are always Unicode text.

　　　　Python 3 source files are read as UTF-8 by default.

　　　　Text is held in memory as Unicode and is encoded to bytes (typically UTF-8) only when it is saved to disk or sent over a network.

　　　　decode turns bytes into str; encode turns str into bytes.
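
　　The points in this Python 3 list can be sketched as follows. The sample text and the temporary file name `demo.txt` are assumptions made for illustration; the encoding is passed explicitly because the default used by `open()` can depend on the platform.

```python
import os
import tempfile

text = "caf\u00e9"                # str: 4 Unicode characters
data = text.encode("utf-8")       # bytes: 5 bytes, because U+00E9 takes 2 bytes
restored = data.decode("utf-8")   # back to str

# bytes and str never compare equal, even for pure-ASCII content.
same = (b"abc" == "abc")

# Mixing the two types raises TypeError instead of converting silently.
try:
    b"abc" + "abc"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Text mode encodes on write; binary mode shows the raw bytes on disk.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, "rb") as f:
    on_disk = f.read()
```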

　　Python 2

　　　　Python 2 also has two types that represent sequences of characters: str (raw 8-bit values) and unicode (Unicode characters).

Writing Python 3 helper functions

def to_str(bytes_or_str):
    """Return a str, decoding from UTF-8 if given bytes."""
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value


def to_bytes(bytes_or_str):
    """Return bytes, encoding to UTF-8 if given a str."""
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value
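
A possible way to use these helpers is to convert at the edges of a program so that the core logic only ever handles str. The sketch below repeats the two helpers in condensed form so that it runs on its own; `greet` and the sample inputs are hypothetical names introduced for illustration.

```python
# Condensed copies of the helpers above, repeated so this snippet is runnable alone.
def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        return bytes_or_str.decode("utf-8")
    return bytes_or_str


def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        return bytes_or_str.encode("utf-8")
    return bytes_or_str


def greet(name):
    """Hypothetical core function: normalizes its input to str first."""
    return "hello, " + to_str(name)


from_network = b"\xe4\xb8\xad"   # UTF-8 bytes for U+4E2D
from_user = "\u4e2d"             # the same character as a str

greeting_a = greet(from_network)
greeting_b = greet(from_user)
wire = to_bytes(greeting_b)      # encode again just before output
```

Both inputs produce the same greeting, and the result is turned back into bytes only at the point where it leaves the program.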

Writing Python 2 helper functions

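# Python 2 only: the unicode type does not exist in Python 3.
def to_str(unicode_or_str):
    """Return a str (bytes), encoding to UTF-8 if given unicode."""
    if isinstance(unicode_or_str, unicode):
        value = unicode_or_str.encode('utf-8')
    else:
        value = unicode_or_str
    return value


def to_unicode(unicode_or_str):
    """Return unicode, decoding from UTF-8 if given a str (bytes)."""
    if isinstance(unicode_or_str, str):
        value = unicode_or_str.decode('utf-8')
    else:
        value = unicode_or_str
    return value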

posted @ 2017-01-21 11:21 Vincen_shen
