Web Crawler Basics_张三

1. A first crawler program

import urllib2

url = "http://www.baidu.com"
# Open the URL; the return value is a file-like response object
response1 = urllib2.urlopen(url)
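The snippet above targets Python 2. As a rough sketch of the same idea on Python 3, where urllib2 became urllib.request, the request can be wrapped in a small helper that also reads and decodes the body. The name fetch, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration, not part of the original notes.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return the decoded body of url, or None if the request fails.

    Hypothetical helper: the name, timeout and fallback charset are
    illustrative choices.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Use the charset declared by the server, else fall back to UTF-8
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.URLError as exc:
        # URLError also covers HTTPError (4xx/5xx responses)
        print("request failed:", exc)
        return None
```

Calling fetch("http://www.baidu.com") would return the page source as a string when the site is reachable.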

 

2. Adding data and headers to the crawler, then sending a POST request

import urllib
import urllib2

url = 'http://www.server.com/login'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
# Form fields to submit, and the request headers
values = {'username': 'cqc', 'password': 'XXXX'}
headers = {'User-Agent': user_agent}
# URL-encode the form fields; supplying data makes the request a POST
data = urllib.urlencode(values)
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)
page = response.read()
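On Python 3 the same request can be built with urllib.parse and urllib.request. The sketch below only constructs the Request object and inspects it, without sending anything, since www.server.com is a placeholder host. One difference worth noting is that the form data must be encoded to bytes in Python 3.

```python
import urllib.parse
import urllib.request

url = 'http://www.server.com/login'  # placeholder host from the text above
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'username': 'cqc', 'password': 'XXXX'}
headers = {'User-Agent': user_agent}

# urlencode returns a str; Request expects bytes for the body
data = urllib.parse.urlencode(values).encode('utf-8')
request = urllib.request.Request(url, data, headers)

# Supplying data switches the method from GET to POST
print(request.get_method())
print(request.data)
# Request stores header names capitalized as "User-agent"
print(request.get_header('User-agent'))
```

Passing request to urllib.request.urlopen would then send the POST.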

 

3. Adding cookies to the crawler

import cookielib
import urllib2

# File the cookies are saved to: cookie.txt in the same directory
filename = 'cookie.txt'
# Create a MozillaCookieJar instance to hold the cookies and later write them to the file
cookie = cookielib.MozillaCookieJar(filename)
# Use urllib2's HTTPCookieProcessor to create a cookie handler
handler = urllib2.HTTPCookieProcessor(cookie)
# Build an opener from the handler
opener = urllib2.build_opener(handler)
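The snippet stops once the opener is built. A minimal Python 3 sketch of the remaining steps, saving the jar to disk and loading it back, might look like the following. Here cookielib is http.cookiejar, the temporary directory replaces the same-directory cookie.txt so the example leaves no file behind, and the hand-built cookie (name session, value abc123, domain example.com) is a made-up stand-in for one a real server would set through opener.open.

```python
import os
import tempfile
import urllib.request
from http.cookiejar import Cookie, MozillaCookieJar

# Assumption: write to a temp directory instead of ./cookie.txt
filename = os.path.join(tempfile.mkdtemp(), "cookie.txt")

cookie = MozillaCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
# opener.open(some_url) would store any cookies the server sets in `cookie`

# Made-up session cookie standing in for a server-set one
cookie.set_cookie(Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
))

# ignore_discard keeps session cookies that would otherwise be skipped
cookie.save(ignore_discard=True, ignore_expires=True)

# Load the file into a fresh jar, as a later run of the crawler would
restored = MozillaCookieJar()
restored.load(filename, ignore_discard=True, ignore_expires=True)
names = [c.name for c in restored]
print(names)
```

A jar restored this way can be passed to HTTPCookieProcessor again so that later requests reuse the saved cookies.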

 

4. Regular expressions

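import re

# Compile the regular expression into a Pattern object
pattern = re.compile(r'xxxxx')
# re.match only matches at the start of the string and returns None on failure
paxg = re.match(pattern, "xxxx")
print(paxg)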
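Because the pattern and input above are placeholders, a concrete case may help. The sketch below uses a made-up HTML fragment and a simple href pattern (both assumptions for illustration) to contrast match, which only tries the start of the string, with search and findall, which scan the whole string.

```python
import re

# Made-up page fragment for illustration
html = '<a href="http://a.example/1">one</a> <a href="http://a.example/2">two</a>'

# Non-greedy group captures the text between the quotes
pattern = re.compile(r'href="(.*?)"')

# match anchors at position 0, and the string starts with "<a ", so no match
at_start = pattern.match(html)

# search finds the first occurrence anywhere in the string
first = pattern.search(html)

# findall returns every captured group as a list of strings
links = pattern.findall(html)

print(at_start)
print(first.group(1))
print(links)
```

For crawling, findall or finditer is usually the more useful call, since the interesting text rarely sits at the very start of a page.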

posted @ 2022-03-13 16:06  三重丶刘德华