Writing a web crawler in Python with BeautifulSoup

Installing the Python BeautifulSoup module

Download page: http://www.crummy.com/software/BeautifulSoup/#Download

Documentation: http://www.crummy.com/software/BeautifulSoup/documentation.html

After downloading, unpack the archive, change into the extracted directory, and run:

python setup.py build 
python setup.py install

Import the package with:

 import bs4
 from bs4 import BeautifulSoup
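
A quick way to confirm the installation worked is to parse a small inline document. This is only a sanity-check sketch, not part of the original steps: the sample HTML and the variable names are made up for illustration, and it assumes Python's built-in html.parser is used as the parser.

```python
import bs4
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only to confirm that bs4 imports and parses.
sample_html = "<html><head><title>Test page</title></head><body><p>Hello</p></body></html>"

soup = BeautifulSoup(sample_html, "html.parser")

title_text = soup.title.string   # text inside <title>
para_text = soup.p.get_text()    # text inside the first <p>

print("bs4 version:", bs4.__version__)
print(title_text, para_text)
```

If the import fails with ImportError, the install step above most likely did not complete for the interpreter being used.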

Fetching web page content with BeautifulSoup

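 # coding=utf-8
 # Python 3 version: urlencode and urlopen live in urllib.parse and urllib.request.
 from bs4 import BeautifulSoup
 from urllib.parse import urlencode
 from urllib.request import urlopen
 import re

 # Build the search URL; urlencode percent-encodes the UTF-8 query text.
 url = 'http://www.baidu.com/s'
 values = {'wd': '渗透'}
 full_url = url + '?' + urlencode(values)

 # Fetch the page and parse it with the standard-library HTML parser.
 response = urlopen(full_url)
 soup = BeautifulSoup(response, 'html.parser')

 # Collect every <a> tag whose href starts with "http" or "/".
 alinks = soup.find_all('a', href=re.compile('^http|^/'))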
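
Once the links are collected, a typical next step is to pull out each href and turn relative paths into full URLs. The following is a minimal Python 3 sketch of that idea, not something from the original post: sample_html is a hypothetical stand-in for the fetched page so the example runs offline, and base_url and absolute_links are illustrative names.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for a fetched page, so the example runs offline.
sample_html = """
<a href="http://example.com/a">absolute</a>
<a href="/s?wd=test">relative</a>
<a href="javascript:void(0)">skipped</a>
<a>no href</a>
"""
base_url = "http://www.baidu.com/s"  # URL of the page the links were found on

soup = BeautifulSoup(sample_html, "html.parser")

# Same filter as above: keep <a> tags whose href starts with "http" or "/".
alinks = soup.find_all("a", href=re.compile("^http|^/"))

# Resolve each href against the page URL so relative links become absolute.
absolute_links = [urljoin(base_url, a["href"]) for a in alinks]

for link in absolute_links:
    print(link)
```

Anchors without an href, and hrefs such as javascript: links, do not match the pattern and are left out of the result.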

 

posted @ 2013-10-07 02:18  bamb00