山…隹

January 21, 2019

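Summary: These are not lines in a Python file. They are command-line commands run inside the virtual environment, so first enter it with workon.
2. Using the Scrapy framework
  - 1. Create a new project: scrapy startproject <project_name> [project_dir]
    Note: first cd into the directory where you want the project to be created.
  - 2. Write the spider
    - By hand
      - 1 ...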

posted @ 2019-01-21 12:07 山…隹 Views (176) Comments (0) Recommendations (0)

Scrapy: a hand-written spider template

Summary:
    import scrapy

    class Tzspider(scrapy.Spider):
        # Name of the spider; must be unique
        name = 'tz'
        # List of start URLs
        start_urls = ['https://www.shiguangkey.com/course/list']
        # Called with the response after each URL has been crawled
        def parse(self,...

posted @ 2019-01-21 12:05 山…隹 Views (298) Comments (0) Recommendations (0)

Installing Scrapy

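Summary:
1. Installing Scrapy
  - Prerequisite: preferably install it into a virtual environment created with virtualenv.
  - Windows
    - The official recommendation is to use Anaconda.
    - Custom installation
      - 1. Download the matching Twisted package from https://www.lfd.uci.edu/~gohlke/pythonlibs/, paying attention to your Python version and whether it is 32- or 64-bit.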

posted @ 2019-01-21 11:28 山…隹 Views (105) Comments (0) Recommendations (0)
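Picking the right Twisted package means knowing the interpreter's version and bitness. Below is a small stdlib sketch that reports both. It assumes CPython, whose wheels are tagged cpXY. It also assumes x86 Windows, where 64-bit wheels are usually tagged win_amd64 and 32-bit ones win32.

```python
import struct
import sys


def wheel_hint():
    """Return (python tag, bitness, platform tag) for choosing a Twisted wheel."""
    # CPython wheel file names contain cpXY, e.g. cp36 for Python 3.6
    py_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # Pointer size in bytes times 8 gives the interpreter's bitness (32 or 64)
    bits = struct.calcsize("P") * 8
    # Assumption: x86 Windows naming, win_amd64 for 64-bit and win32 for 32-bit
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return py_tag, bits, platform_tag


if __name__ == "__main__":
    print(wheel_hint())
```

Match both tags against the file name, e.g. a file containing the same cpXY and win_amd64 on a 64-bit Python.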

January 19, 2019

Web scraping: using XPath

Summary: Import it with: from lxml import etree

posted @ 2019-01-19 18:17 山…隹 Views (136) Comments (0) Recommendations (0)
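The summary only shows the import. Below is a short sketch of typical lxml XPath queries. It runs against a hypothetical HTML snippet rather than a real page: text() selects text nodes, @href selects attribute values, and a path starting with "." is relative to the current element.

```python
from lxml import etree

# Hypothetical HTML snippet standing in for a downloaded page
HTML = """
<html><body>
  <ul id="courses">
    <li><a href="/c/1">Python</a></li>
    <li><a href="/c/2">Scrapy</a></li>
  </ul>
</body></html>
"""

# etree.HTML parses (and repairs, if needed) HTML into an element tree
tree = etree.HTML(HTML)

# Both queries return lists of strings
titles = tree.xpath('//ul[@id="courses"]/li/a/text()')
hrefs = tree.xpath('//ul[@id="courses"]/li/a/@href')

# xpath() also works on a single element, with a relative path
for li in tree.xpath('//ul[@id="courses"]/li'):
    print(li.xpath("./a/text()")[0], li.xpath("./a/@href")[0])
```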

January 18, 2019

Web scraping: BeautifulSoup4

Summary:
1. Basic usage
2. Parsers
3. Detailed usage
4. The find_all method
5. Traversing the document tree

posted @ 2019-01-18 22:57 山…隹 Views (157) Comments (0) Recommendations (0)
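Below is a compact sketch touching three of the listed topics: choosing a parser, find_all, and walking up the tree. It uses a hypothetical HTML snippet. "html.parser" is chosen only because it ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a downloaded page
HTML = """
<ul id="courses">
  <li class="course"><a href="/c/1">Python</a></li>
  <li class="course"><a href="/c/2">Scrapy</a></li>
</ul>
"""

# "html.parser" is built in; "lxml" is faster but must be installed separately
soup = BeautifulSoup(HTML, "html.parser")

# find_all returns every matching tag; class_ (with underscore) filters on the class attribute
courses = soup.find_all("li", class_="course")
links = soup.find_all("a")
titles = [a.get_text() for a in links]
hrefs = [a["href"] for a in links]

# Traversal: go up from the first link to its <li>, then to the enclosing <ul>
first_link = links[0]
print(first_link.parent.name)              # li
print(first_link.find_parent("ul")["id"])  # courses
```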

January 17, 2019

Packet capture tool: Fiddler

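Summary:
1. Use Find to search for a domain.
2. Use Filters to filter sessions.
3. Command-line (QuickExec) queries: select text lists sessions with a text content type; ?domain searches by domain; =status code lists sessions whose status code is ...
4. To set a global breakpoint (After || Before): Rules > Automatic Breakpoints.
5. Command line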

posted @ 2019-01-17 15:37 山…隹 Views (139) Comments (0) Recommendations (0)

January 15, 2019

12306 login scraper (session version)

Summary:
    import requests
    import re
    import base64
    # Define the session
    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3664.3 Safari/537.36'
    }
    ...

posted @ 2019-01-15 18:46 山…隹 Views (614) Comments (0) Recommendations (0)
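The session approach relies on requests.Session keeping headers and cookies between requests. Below is a minimal offline sketch. It uses example.com and a hypothetical JSESSIONID cookie instead of 12306. Session.prepare_request shows what the session would attach to the next request without sending anything.

```python
import requests

UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

session = requests.Session()
# Headers set on the session are sent with every request it makes
session.headers.update({"User-Agent": UA})
# Cookies set by the server are stored on session.cookies; simulated here with a hypothetical cookie
session.cookies.set("JSESSIONID", "abc123", domain="example.com")

# Merge the session's headers and cookies into a request without sending it
prepared = session.prepare_request(requests.Request("GET", "https://example.com/login"))
print(prepared.headers["User-Agent"])
print(prepared.headers["Cookie"])
```

In real use you would call session.get(...) and session.post(...). Cookies returned by each response are then stored on the session automatically, so later requests stay logged in.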

12306 login scraper (cookies version)

Summary:
    import requests
    import re
    import base64
    cookies = None
    # Visit the home page and keep its cookies
    login_url = 'https://kyfw.12306.cn/otn/resources/login.html'
    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; ...

posted @ 2019-01-15 18:40 山…隹 Views (1238) Comments (0) Recommendations (0)
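Without a session, cookies have to be carried from one response to the next request by hand via the cookies= argument. Below is an offline sketch. A hypothetical RequestsCookieJar stands in for a previous response's res.cookies, and example.com stands in for the real site.

```python
import requests
from requests.cookies import RequestsCookieJar

# Stand-in for res.cookies from an earlier response; name and value are hypothetical
jar = RequestsCookieJar()
jar.set("JSESSIONID", "abc123", domain="example.com", path="/")

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# Every call must pass cookies and headers explicitly,
# e.g. requests.get(url, headers=headers, cookies=jar); here the request is only prepared
prepared = requests.Request(
    "GET", "https://example.com/login", headers=headers, cookies=jar
).prepare()
print(prepared.headers["Cookie"])

# The jar also supports dict-style lookups
print(jar.get("JSESSIONID"))
```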

January 14, 2019

Scraping 3: requests and JSON, converting JSON data into a dict

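Summary:
    import requests

    # json: convert JSON data into a dict so the data is easier to work with
    res = requests.get('http://httpbin.org/get')
    print(res.json())
    # res.json() returns a dict here because the body is a JSON object
    print(type(res.json()))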

posted @ 2019-01-14 19:36 山…隹 Views (1232) Comments (0) Recommendations (0)
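res.json() decodes the response body roughly the way json.loads(res.text) would. Below is a stdlib-only sketch. It uses a hard-coded body shaped like an httpbin.org/get reply; the body is hypothetical and not fetched. It shows that a JSON object becomes a dict and a top-level JSON array becomes a list.

```python
import json

# Hypothetical body shaped like an httpbin.org/get reply (hard-coded, not fetched)
body = '{"args": {"page": "1"}, "headers": {"Host": "httpbin.org"}, "url": "http://httpbin.org/get?page=1"}'

# Roughly what res.json() does with the response text
data = json.loads(body)

# A JSON object decodes to a dict, so nested keys can be read directly
print(type(data))
print(data["args"]["page"])
# .get() avoids a KeyError when a key may be missing
print(data.get("origin", "n/a"))

# A top-level JSON array decodes to a list, not a dict
print(type(json.loads("[1, 2, 3]")))
```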

Scraping 3: requests basics, downloading an image with content (binary content)

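Summary:
    import requests

    res = requests.get('http://soso3.gtimg.cn/sosopic/0/11129365531347748413/640')
    # print(res.content)
    # res.content is raw bytes, so open the file in binary mode ('wb'); the img/ directory must already exist
    with open('img/test.jpg', 'wb') as f:
        f.write(res.content)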

posted @ 2019-01-14 19:29 山…隹 Views (572) Comments (0) Recommendations (0)
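For larger files, the same download can be streamed in chunks instead of holding res.content in memory. Below is a sketch of that variant. The helper name download_file and its chunk_size/timeout defaults are hypothetical choices. It also creates the target directory and refuses to save an HTTP error page as an image.

```python
import os

import requests


def download_file(url, path, chunk_size=8192, timeout=10):
    """Stream url into path as binary data, creating the parent directory if needed."""
    parent = os.path.dirname(path)
    if parent:
        # exist_ok avoids an error when the directory already exists
        os.makedirs(parent, exist_ok=True)
    # stream=True fetches the body in chunks instead of loading it all at once
    with requests.get(url, stream=True, timeout=timeout) as res:
        # Raise on 4xx/5xx instead of writing an error page to disk
        res.raise_for_status()
        with open(path, "wb") as f:
            for chunk in res.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return path
```

Usage with the URL from the post: download_file('http://soso3.gtimg.cn/sosopic/0/11129365531347748413/640', 'img/test.jpg').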
