Python Study Notes

urllib2

urllib2 provides a basic function, urlopen, which fetches data by sending a request to a given URL. In its simplest form:

import urllib2
response = urllib2.urlopen('http://www.douban.com')
html = response.read()

Note: read() and readlines() differ. read() returns the whole response body as a single string, while readlines() returns a list with one entry per line. For a one-line JSON body, readlines() gives:
['{"code":0,"data":"http://xxxxx/fileserver?method=getfile&uuid=F7847BBC260B4A23B5787D9E758CC50F","success":true}']
and read() gives:
{"code":0,"data":"http://xxxxx/fileserver?method=getfile&uuid=F7847BBC260B4A23B5787D9E758CC50F","success":true}
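The difference between read() and readlines() can be checked without a network connection. A minimal sketch, assuming an in-memory io.BytesIO object as a stand-in for the response returned by urlopen (both are file-like objects); the short JSON body is a made-up value for illustration:

```python
import io

# Made-up one-line body standing in for a server reply (assumption).
body = b'{"code":0,"success":true}'

# io.BytesIO is file-like, as is the object that urlopen returns.
whole = io.BytesIO(body).read()        # the entire body as one string
lines = io.BytesIO(body).readlines()   # a list with one entry per line

print(repr(whole))
print(repr(lines))
```

Because the body is a single line, readlines() yields a list with exactly one element, which matches the output shown in the note above.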
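Besides fetching pages, we often submit data to a page as well. With urllib2, submitting data looks like this (passing data to Request makes urllib2 send a POST instead of a GET):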
import urllib
import urllib2

url = 'http://www.douban.com'
info = {'name': 'Michael Foord',
        'location': 'Northampton'}
# info must be encoded into a format urllib2 understands; urllib.urlencode does this
data = urllib.urlencode(info)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
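To see what urlencode produces and how the presence of data changes the request, the Request object can be inspected before anything is sent. This is a sketch under assumptions: the try/except imports are a compatibility shim (not part of the original notes) so the same code also runs on Python 3, where urllib2 became urllib.request and urlencode moved to urllib.parse; a list of pairs replaces the dict only to keep the output order fixed.

```python
try:                                   # Python 2, as used in these notes
    import urllib2
    from urllib import urlencode
except ImportError:                    # Python 3 equivalents (shim added for illustration)
    import urllib.request as urllib2
    from urllib.parse import urlencode

url = 'http://www.douban.com'
info = [('name', 'Michael Foord'), ('location', 'Northampton')]

encoded = urlencode(info)              # spaces become '+', pairs are joined with '&'
data = encoded.encode('ascii')         # Python 3 requires bytes for the request body

get_req = urllib2.Request(url)         # no data: a GET request
post_req = urllib2.Request(url, data)  # with data: a POST request

print(encoded)
print('%s %s' % (get_req.get_method(), post_req.get_method()))
```

Nothing is sent here; only urlopen(req) would perform the actual request.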
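Sometimes your program is correct, yet the server still refuses the request. Why? The problem lies in the request headers.
Some servers are picky and reject requests that come from programs. In that case, disguise your program as a browser. The client identifies itself in the User-Agent header.
A common case: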
import urllib
import urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'
# Browser-like identifier to be written into the request headers
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
headers = {'User-Agent': user_agent}
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()
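Whether the User-Agent value actually ends up on the request can be verified offline. A minimal sketch, reusing the URL and user-agent string from the listing above; the try/except import is an assumed compatibility shim (not in the original notes) so the code also runs on Python 3, where urllib2 became urllib.request. Note that Request stores header names in capitalized form, so the lookup key is 'User-agent'.

```python
try:                                   # Python 2, as used in these notes
    import urllib2
except ImportError:                    # Python 3 equivalent (shim added for illustration)
    import urllib.request as urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

# Build the request without sending it.
req = urllib2.Request(url, headers={'User-Agent': user_agent})

# Request normalizes header names with str.capitalize(),
# so 'User-Agent' is stored as 'User-agent'.
print(req.has_header('User-agent'))
print(req.get_header('User-agent'))
```

Looking the header up as 'User-Agent' would find nothing, which is an easy mistake to make when debugging a rejected request.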

posted @ 2015-03-27 22:04 ilsas
