不知火语

March 10, 2020

Summary:
import urllib.request
import urllib.error
# try...except is the usual construct for exception handling:
# run the main code in try and catch the exception information in except
try:
    urllib.request.urlopen("http://blog.csdn.net")
exc…

posted @ 2020-03-10 16:18 不知火语
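The excerpt is cut off at `except`. Here is a minimal sketch of the try/except pattern it introduces, assuming you want to tell HTTP error responses apart from connection failures. `fetch` is a hypothetical helper name, not the author's code:

```python
import urllib.error
import urllib.request


def fetch(url, timeout=5):
    """Hypothetical helper: return (status, payload) instead of raising.

    payload is the page bytes on success, or a description of the failure.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status (404, 500, ...).
        # HTTPError is a subclass of URLError, so it must be caught first.
        return e.code, e.reason
    except urllib.error.URLError as e:
        # No HTTP response at all: DNS failure, refused connection, etc.
        return None, str(e.reason)
```

For example, `fetch("http://blog.csdn.net")` returns the status and page bytes on success instead of raising. The order of the two `except` clauses is the main design point: reversing them would let the `URLError` clause swallow HTTP errors as well.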

Python Crawler 006 - Printing debug logs while the program runs, using DebugLog

Summary:
# To print debug logs while the program runs, turn on DebugLog
import urllib.request
# (1) Use HTTPHandler and HTTPSHandler to set debuglevel to 1
httphd = urllib.request.HTTPHandler(debuglevel…

posted @ 2020-03-10 15:46 不知火语
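Here is a minimal sketch of the DebugLog setup the excerpt starts. The wrapper function `build_debug_opener` is a hypothetical name. The `debuglevel` argument on `HTTPHandler` and `HTTPSHandler` is part of the standard library:

```python
import urllib.request


def build_debug_opener():
    """Hypothetical helper: an opener whose HTTP/HTTPS connections print debug output."""
    # debuglevel=1 is passed down to http.client. It then prints the request
    # it sends ("send: ...") and the response status line and headers
    # ("reply: ...", "header: ...") to stdout while the request runs.
    httphd = urllib.request.HTTPHandler(debuglevel=1)
    httpshd = urllib.request.HTTPSHandler(debuglevel=1)
    # build_opener keeps its other default handlers and uses ours for http/https
    return urllib.request.build_opener(httphd, httpshd)
```

Calling `urllib.request.install_opener(build_debug_opener())` makes plain `urllib.request.urlopen("http://www.baidu.com")` calls go through these handlers too, so the debug lines appear for every request.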

March 6, 2020

Python Crawler 005 - Configuring a proxy server

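Summary:
import urllib.request
def use_proxy(proxy_addr, url):
    """
    Crawl a URL through a proxy server
    :param proxy_addr: proxy server address
    :param url: address of the page to crawl
    :return: the full content of the page
    """
    # Set up the corresponding proxy…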

posted @ 2020-03-06 19:29 不知火语
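The excerpt stops right after the docstring. One standard way to fill in the body uses `urllib.request.ProxyHandler` and `build_opener`. This is a sketch and not necessarily the author's exact code. The address format `"host:port"` and the example address are assumptions:

```python
import urllib.request


def use_proxy(proxy_addr, url):
    """Crawl url through an HTTP proxy and return the page text.

    proxy_addr is "host:port", e.g. "127.0.0.1:8888" (placeholder, not a real proxy).
    """
    # Route http:// URLs through the proxy; https:// would need its own "https" key
    proxy = urllib.request.ProxyHandler({"http": proxy_addr})
    opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)
    with opener.open(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Passing the opener to `urllib.request.install_opener(opener)` would make every later `urlopen` call use the proxy. Keeping the opener local, as here, limits the proxy to this one function.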

Python Crawler 004 - HTTP requests

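Summary: The post covers six types of HTTP request:
1. GET: passes information through the URL; the data can be written directly into the URL or submitted through a form.
2. POST: submits data to the server.
3. PUT: asks the server to store a resource, usually at a specified location.
4. DELETE: asks the server to delete a resource.
5. HEAD: asks to obtain…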

posted @ 2020-03-06 15:59 不知火语
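urllib's `Request` class covers these request types. The sketch below assumes placeholder URLs and form fields, and the three helper names are hypothetical. `Request.get_method()` reports GET when there is no body, POST when `data` is set, and whatever is passed as `method=` otherwise:

```python
import urllib.parse
import urllib.request


def build_get(url, params):
    # GET: the parameters are URL-encoded into the query string
    return urllib.request.Request(url + "?" + urllib.parse.urlencode(params))


def build_post(url, form):
    # POST: the form fields travel in the request body, as bytes
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url, data=data)


def build_with_method(url, method):
    # PUT, DELETE, HEAD, ... must be named explicitly via method=
    return urllib.request.Request(url, method=method)
```

These helpers only build requests. Passing any of them to `urllib.request.urlopen(req)` actually sends it.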

Python Crawler 003 - Request timeouts

Summary:
import urllib.request
for i in range(1, 10):
    try:
        # Use timeout to limit each request to 1 second
        file = urllib.request.urlopen("http://www.baidu.com", timeout=1)
        data = file.read(…

posted @ 2020-03-06 14:58 不知火语
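Here is a sketch of the timeout loop from the excerpt. It assumes that a slow or failed attempt should be recorded and skipped rather than stop the crawl. `fetch_with_timeout` and `crawl_repeatedly` are hypothetical names. Note that `urlopen` reports a timeout in two shapes: wrapped in `URLError` while connecting, and as a bare `socket.timeout` when the server is slow to answer:

```python
import socket
import urllib.error
import urllib.request


def fetch_with_timeout(url, timeout=1.0):
    """Return the page bytes, or None if the request fails or exceeds timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as file:
            return file.read()
    except socket.timeout:
        # The server accepted the connection but did not answer in time
        return None
    except urllib.error.URLError:
        # Connection-phase failures, including connect timeouts, arrive wrapped in URLError
        return None


def crawl_repeatedly(url, attempts=9, timeout=1.0):
    """Mirror the excerpt's range(1, 10) loop: one success flag per attempt."""
    return [fetch_with_timeout(url, timeout) is not None for _ in range(attempts)]
```

`crawl_repeatedly("http://www.baidu.com")` returns nine booleans, and `False` marks an attempt that timed out or failed. The exception is caught inside each attempt, so one timeout does not stop the remaining attempts.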

March 3, 2020

Python Crawler 002 - Using the headers attribute to pose as a browser

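Summary: First get your browser's User-Agent string. Open any web page, such as the Baidu home page, press F12, and switch to the Network tab. Click "百度一下" (the search button) so the page makes a request, then click the www.baidu.com entry shown in the screenshot. Open the Headers tab and scroll down to find User-Agent, which you can copy.
import urlli…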

posted @ 2020-03-03 21:01 不知火语
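Here is a sketch of sending the copied User-Agent with urllib. The string below is an example value, not one taken from the post; substitute whatever your own browser shows. Because the excerpt is cut off, both common approaches are shown: a per-request header and an opener-wide `addheaders` list. The helper names are hypothetical:

```python
import urllib.request

# Example value only: replace with the User-Agent copied from your browser's Network tab
UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36")


def build_browser_request(url, user_agent=UA):
    # Per-request header; urllib normalizes the key to "User-agent" internally
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def build_browser_opener(user_agent=UA):
    # Opener-wide header: sent with every request made through opener.open()
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    return opener
```

Either `urllib.request.urlopen(build_browser_request("http://www.baidu.com")).read()` or `build_browser_opener().open("http://www.baidu.com").read()` works. Replacing `addheaders` also drops urllib's default `Python-urllib/x.y` User-Agent, which is what identifies the request as a script.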

Python Crawler 001 - Crawling a web page with urllib

Summary:
import urllib.request  # import the module
import urllib.parse

# Assign the page to the variable file
file = urllib.request.urlopen("http://www.baidu.com")

# Read the page
data = fil…

posted @ 2020-03-03 20:01 不知火语
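The excerpt stops mid-line after reading the page. Here is a sketch of the full fetch, read, and decode flow with `urlopen`. It assumes the page is text and falls back to UTF-8 when the server sends no charset. `crawl_page` and its optional `save_path` argument are hypothetical:

```python
import urllib.request


def crawl_page(url, save_path=None):
    """Fetch url, return it as text, and optionally save the raw bytes to save_path."""
    with urllib.request.urlopen(url, timeout=10) as file:
        data = file.read()  # raw bytes of the page
        # Use the charset the server declares, defaulting to UTF-8
        charset = file.headers.get_content_charset() or "utf-8"
    if save_path is not None:
        with open(save_path, "wb") as f:
            f.write(data)
    return data.decode(charset, errors="replace")
```

`crawl_page("http://www.baidu.com", "baidu.html")` would fetch the home page and keep a copy on disk. Because the function goes through `urlopen`, it also accepts `data:` URLs, which is handy for trying it out without a network.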
