A Collection of Python Encoding Issues

How do you avoid errors like UnicodeEncodeError: 'ascii' codec can't…?

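1. First, declare the encoding of the source file at the top of the .py file, for example: # coding: utf8

2. When you save the file, use the same encoding as the one declared at the top of the .py file.

3. When you call decode and encode, always make sure you know the original encoding of the text you are converting.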

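For example, a web page usually declares its encoding (<meta http-equiv=content-type content="text/html; charset=gb2312">). When you fetch such a page and convert its HTML, you have to take that declaration into account: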

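import urllib2

# urlopen() returns a response object; read() gives the raw bytes of the page
raw = urllib2.urlopen(url).read()

# decode with the charset the page declares (gb2312 here) to get a unicode object
html = raw.decode('gb2312')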
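As a rough sketch of rule 3, the helper below looks for a charset declaration in the raw bytes before decoding them. The name decode_html, the regular expression, and the fallback parameter are illustrative assumptions rather than part of any library, and a real page may declare its encoding in the HTTP headers instead. The sketch avoids the network and is written so that it also runs on Python 3.

```python
import re

# Hypothetical pattern: matches charset=... inside a <meta> tag.
_CHARSET_RE = re.compile(br'charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def decode_html(raw, fallback="utf-8"):
    """Decode raw HTML bytes with the charset the page declares, if any."""
    match = _CHARSET_RE.search(raw[:2048])  # declarations sit near the top
    encoding = match.group(1).decode("ascii") if match else fallback
    return raw.decode(encoding)


# A tiny stand-in for a downloaded page, encoded as gb2312.
page = (b'<meta http-equiv=content-type content="text/html; charset=gb2312">'
        + u"\u4e2d\u6587".encode("gb2312"))
text = decode_html(page)
```

If the page declares no charset, the helper falls back to UTF-8. That fallback is only a guess and can still raise UnicodeDecodeError.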

If you follow these three rules, most encoding conversion errors go away.
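For reference, here is a minimal sketch of how the error in the question arises and how naming a suitable codec avoids it. The variable names are illustrative, and the non-ASCII text is written with escapes so the snippet does not depend on the source file's encoding.

```python
text = u"\u4e2d\u6587"  # two Chinese characters, held as a unicode string

try:
    text.encode("ascii")  # ASCII cannot represent these characters
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__

# Naming a codec that covers the characters succeeds.
utf8_bytes = text.encode("utf-8")
```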

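The Python recommendation is to keep every string inside your Python code as unicode. The flow looks like this: input (decode to unicode) -> Python code -> output (encode to the target encoding)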
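The flow above can be sketched as a small function that decodes at the input edge, works on unicode in the middle, and encodes at the output edge. The function name transcode_upper and the choice of upper() as the "processing" step are assumptions made for illustration.

```python
def transcode_upper(raw, source_encoding, target_encoding):
    """Decode on the way in, process unicode, encode on the way out."""
    text = raw.decode(source_encoding)   # input edge: bytes -> unicode
    text = text.upper()                  # all processing happens on unicode
    return text.encode(target_encoding)  # output edge: unicode -> bytes


gb_bytes = u"python \u7f16\u7801".encode("gb2312")
utf8_bytes = transcode_upper(gb_bytes, "gb2312", "utf-8")
```

Keeping the decode and encode calls at the edges means the middle of the program never has to know which encoding the data arrived in.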

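sys.getdefaultencoding() returns the system default encoding (usually ascii in Python 2). Python 2 also assumes ASCII for source files by default, which is why the encoding has to be declared at the top of the .py file: # coding:utf-8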

Functions for reading the system encoding settings in Python

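Default system encoding (usually ascii): sys.getdefaultencoding()

Current system locale and encoding: locale.getdefaultlocale()

Locale after a temporary change in code (via locale.setlocale(locale.LC_ALL, "zh_CN.UTF-8")): locale.getlocale()

Filesystem encoding: sys.getfilesystemencoding()

Terminal input encoding: sys.stdin.encoding

Terminal output encoding: sys.stdout.encoding

Default source-code encoding: the declaration at the top of the file, # -*- coding: utf-8 -*-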
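A minimal sketch that gathers most of these values in one place is shown below. The function name encoding_report is an assumption. locale.getdefaultlocale() is left out because newer Python 3 releases deprecate it, and getattr is used because stdin or stdout may be replaced or missing in some environments.

```python
import locale
import sys


def encoding_report():
    """Collect the interpreter's encoding-related settings in a dict."""
    return {
        "default": sys.getdefaultencoding(),
        "locale": locale.getlocale(),  # (language code, encoding) tuple
        "filesystem": sys.getfilesystemencoding(),
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


report = encoding_report()
for name in sorted(report):
    print("%s: %s" % (name, report[name]))
```

The values depend on the interpreter version and the environment, so no particular output should be expected.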

Strings

Python 2 has two kinds of strings

byteString = "hello world! (in my default locale)"
unicodeString = u"hello Unicode world!"
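One way to see the difference between the two kinds is to compare lengths: a unicode string counts characters, while a byte string counts bytes. This is a small sketch with illustrative variable names, and it uses escapes so that it does not depend on the source file's encoding.

```python
unicode_string = u"\u4e2d\u6587"                  # two characters
byte_string = unicode_string.encode("utf-8")      # the same text as UTF-8 bytes

char_count = len(unicode_string)  # counts characters (code points)
byte_count = len(byte_string)     # counts bytes: three per character here
```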

Converting between them

s = "hello normal string"
u = unicode( s, "utf-8" )
backToBytes = u.encode( "utf-8" )
backToUtf8 = backToBytes.decode('utf-8')  # same result as the second line: a unicode object
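The round trip only works when decode is given the codec the bytes were really encoded with. The sketch below, with illustrative variable names, encodes the same text two ways and shows that decoding with the wrong codec fails.

```python
u = u"\u4e2d\u6587"

as_utf8 = u.encode("utf-8")
as_gb = u.encode("gb2312")  # same text, different bytes

# Decoding with the matching codec restores the original text.
round_trip_ok = (as_utf8.decode("utf-8") == u
                 and as_gb.decode("gb2312") == u)

# Decoding gb2312 bytes as UTF-8 raises an error for this input.
try:
    as_gb.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
```

A mismatched codec does not always raise. For some inputs it silently produces garbled text, which is why the original encoding has to be known rather than guessed.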

How to tell them apart

isinstance(s, str)         # False for unicode strings
isinstance(s, unicode)     # True for unicode strings
isinstance(s, basestring)  # True for both kinds of strings
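These checks can be combined into a helper that accepts either kind of string and always returns unicode. The names to_text, text_type and binary_type are assumptions made for this sketch. The version switch is there only so that the same code runs on Python 3, where unicode and basestring no longer exist.

```python
import sys

if sys.version_info[0] >= 3:
    text_type, binary_type = str, bytes
else:
    text_type, binary_type = unicode, str  # Python 2 names


def to_text(value, encoding="utf-8"):
    """Return value as a unicode string, decoding byte strings if needed."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    if isinstance(value, text_type):
        return value
    raise TypeError("expected a string, got %s" % type(value).__name__)


from_bytes = to_text(b"hello")
from_text = to_text(u"hello")
```

The byte-string check comes first because in Python 2 a plain str is a byte string.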

posted @ 2015-04-02 14:01 jeffkuang
