Python Encoding Issues

Posted on 2012-09-29 16:11 by WebClerk

Using UTF-8 in source code

Add an encoding declaration comment, either # coding:utf-8 or # -*- coding: utf-8 -*-, on the first or second line of the file, as follows:

# -*- coding: utf-8 -*-
# Equivalent short form: "# coding:utf-8". Only one declaration is needed.
__author__ = 'WebClerk'
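To illustrate what the declaration makes possible, here is a minimal sketch that puts a non-ASCII literal directly in the source. The variable names and the sample text are assumptions for illustration, not part of the original post.

```python
# -*- coding: utf-8 -*-
# Illustrative values only; these names are not from the original post.
greeting = u'你好'                   # non-ASCII literal, readable thanks to the declaration
encoded = greeting.encode('utf-8')   # text -> UTF-8 bytes

print(repr(encoded))
```

Without the declaration, Python 2 rejects a source file that contains non-ASCII bytes like these.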

Handling non-ASCII encodings

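In Python 2 the default encoding is ASCII, so implicit conversions between str and unicode fail on non-ASCII data. One workaround is to change the default encoding to the one you need. (This applies to Python 2 only: sys.setdefaultencoding was removed in Python 3, where the default is UTF-8.) There are two main ways to set it, followed by a way to check the result: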

1. Setting it in code

import sys
# site.py removes sys.setdefaultencoding at startup; reload(sys) restores it (Python 2 only)
reload(sys)
sys.setdefaultencoding('gb2312')

2. Setting it globally

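Create a sitecustomize.py file in Python's Lib\site-packages folder. sitecustomize.py is a special file that Python tries to import at startup, so its contents run before any of your own code and the encoding is set automatically.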

import sys
sys.setdefaultencoding('gb2312')
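If the same sitecustomize.py might be picked up by more than one interpreter, a guarded variant is one option. This is a sketch under that assumption, not something the original post prescribes; the hasattr check is my addition.

```python
import sys

# setdefaultencoding exists only in Python 2, and only while site.py is still
# importing sitecustomize.py. On Python 3 this check is False and nothing happens.
if hasattr(sys, 'setdefaultencoding'):
    sys.setdefaultencoding('gb2312')

current = sys.getdefaultencoding()
```

The guard keeps the file from raising AttributeError where the function is missing.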

3. Checking the current encoding

import sys
sys.getdefaultencoding()
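An alternative to changing the default encoding is to convert explicitly at the point where bytes enter or leave the program. The sketch below assumes the input is GB2312; the sample text and variable names are illustrative only.

```python
import sys

# Show the interpreter's default encoding for reference.
print(sys.getdefaultencoding())

# Stand-in for bytes read from a GB2312 file or socket.
raw = u'编码'.encode('gb2312')

text = raw.decode('gb2312')        # bytes -> text, naming the encoding explicitly
utf8_bytes = text.encode('utf-8')  # text -> bytes in the target encoding
```

Because every conversion names its encoding, the result does not depend on the interpreter's default.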

Detecting character encodings

The chardet library can detect the encoding of a string or file.

1. Installing chardet

chardet can be installed quickly with the easy_install tool: easy_install.exe chardet (with pip, the equivalent command is pip install chardet).

2. Using chardet

chardet's detect function checks the encoding of the bytes you pass it. It returns a dictionary that holds the detected encoding and the confidence of that guess.

import urllib
import chardet
# Python 2: fetch the page as raw bytes, then ask chardet to guess their encoding
rawdata = urllib.urlopen('http://www.sina.com.cn/').read()
print chardet.detect(rawdata)
# result: {'confidence': 0.99, 'encoding': 'GB2312'}
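The detection result is usually fed straight into a decode call. The helper below is a hypothetical sketch of that step: decode_detected, its detect parameter, and the utf-8 fallback are my assumptions, not part of chardet or the original post. Only chardet.detect itself is the library's API.

```python
def decode_detected(raw, detect=None, fallback='utf-8'):
    """Decode bytes using the encoding reported by a chardet-style detector."""
    if detect is None:
        # Imported lazily so the helper can be exercised with a stub detector.
        import chardet
        detect = chardet.detect
    result = detect(raw)
    # chardet reports None as the encoding when it cannot make a guess.
    encoding = result.get('encoding') or fallback
    # 'replace' avoids an exception if the guess turns out to be slightly off.
    return raw.decode(encoding, 'replace')
```

With the rawdata from the example above, decode_detected(rawdata) would return the page as text. Passing a custom detect callable makes it possible to try the helper without network access or chardet installed.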
