山…隹

March 14, 2019

Summary: To enter the HBase shell command line, run bin/hbase shell. It prints: HBase Shell; enter 'help' for list of supported commands. Type "exit" to leave the HBase Shell Version 0.92.1, r12989

posted @ 2019-03-14 19:30 山…隹 Views (424) Comments (0) Recommendations (0)

March 7, 2019

Scrapy FormRequest usage (Douban login example)

Summary: # -*- coding: utf-8 -*- import scrapy from scrapy.http import Request, FormRequest class DbSpider(scrapy.Spider): name = 'db' allowed_domains = ['douban.com'] start_urls = ['https://accounts.d...

posted @ 2019-03-07 22:04 山…隹 Views (375) Comments (0) Recommendations (0)

Scrapy: general logging configuration

Summary:
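
A general logging setup in Scrapy usually lives in the project's settings.py. The fragment below is a minimal sketch, not the configuration from the post, whose body is not shown on this page. The setting names are Scrapy's built-in logging settings; the WARNING level and the file name scrapy.log are assumptions picked for illustration.

```python
# settings.py (fragment): illustrative values, not the post's own
LOG_ENABLED = True         # logging is on by default; shown for completeness
LOG_LEVEL = "WARNING"      # assumed choice: hide DEBUG and INFO messages
LOG_FILE = "scrapy.log"    # hypothetical file name; leave unset to log to stderr
LOG_ENCODING = "utf-8"     # encoding used for the log output
LOG_FORMAT = "%(asctime)s [%(name)s] %(levelname)s: %(message)s"
LOG_DATEFORMAT = "%Y-%m-%d %H:%M:%S"
```

Lowering LOG_LEVEL to "DEBUG" while developing a spider and raising it again for long crawls is a common pattern.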

posted @ 2019-03-07 21:47 山…隹 Views (105) Comments (0) Recommendations (0)

February 8, 2019

Scrapy meta: exporting with the -o command-line option instead of a pipeline

Summary: 1. Spider code: 2. Items code: 3. Command (job.jl is the output file name)
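
The .jl extension makes Scrapy's feed export write JSON Lines, that is, one JSON object per line. The command from step 3 is not shown in this summary; the usual form is scrapy crawl followed by the spider name and -o job.jl. As a hedged sketch, the snippet below reads such a file back with the standard library. The field names title and city and the sample rows are hypothetical and stand in for whatever the item actually defines.

```python
import io
import json
from typing import Iterator, TextIO


def read_json_lines(stream: TextIO) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical contents of job.jl, held in memory for the example.
sample = io.StringIO(
    '{"title": "Python Engineer", "city": "Shenzhen"}\n'
    '{"title": "Data Engineer", "city": "Beijing"}\n'
)
jobs = list(read_json_lines(sample))
print(len(jobs), jobs[0]["title"])
```

For a real export, pass an opened file instead of the in-memory sample, for example open("job.jl", encoding="utf-8").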

posted @ 2019-02-08 21:18 山…隹 Views (220) Comments (0) Recommendations (0)

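January 28, 2019

Scrapy CrawlSpider: a hands-on Tencent recruitment example

Summary: 1. In the virtual machine, cd to the project directory, then run the following command to create the spider file: scrapy genspider -t crawl test www.baidu.com 2. spider.py code 3. Items code: 4. Pipelines code:

posted @ 2019-01-28 16:52 山…隹 Views (184) Comments (0) Recommendations (0)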

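January 27, 2019

Scrapy selectors: a summary

Summary: Python crawlers: XPath and CSS selector syntax in the Scrapy framework. Basic XPath syntax. I. Common path expressions (the examples use an element tag named artical). II. Predicates: a predicate is written inside square brackets and is used to find a specific node or a node that contains a specified value. III. Wildcards: XPath uses wildcards to select unknown XML elements. Expression | Result //

posted @ 2019-01-27 19:50 山…隹 Views (208) Comments (0) Recommendations (0)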

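January 26, 2019

Scrapy: response.xpath can extract tags whose attribute contains the string XX

Summary: 1. Select tags whose attribute contains a given string (for example, with id = 'bigbaong', a query for the substring 'big' matches it)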
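
In XPath this kind of match is written with the contains() function, for example //*[contains(@id, "big")], which in Scrapy would be passed to response.xpath. The sketch below builds that expression and, because the standard library's ElementTree does not implement contains(), mimics its effect with a plain Python substring test. The helper names and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def contains_xpath(attr: str, needle: str, tag: str = "*") -> str:
    """Build an XPath matching tags whose attribute contains needle."""
    return f'//{tag}[contains(@{attr}, "{needle}")]'


def match_contains(root: ET.Element, attr: str, needle: str) -> list:
    """Stdlib stand-in for the XPath above: substring test on the attribute."""
    return [el for el in root.iter() if needle in el.get(attr, "")]


# Hypothetical markup: only the first div has 'big' inside its id.
root = ET.fromstring(
    '<body><div id="bigbaong">hit</div><div id="small">miss</div></body>'
)
expr = contains_xpath("id", "big")
matched = [el.text for el in match_contains(root, "id", "big")]
print(expr)
print(matched)
```

Elements without the attribute are skipped, which matches how contains() treats a missing attribute in XPath.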

posted @ 2019-01-26 19:06 山…隹 Views (2278) Comments (0) Recommendations (0)

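Crawler example 1, follow-up (pipelines optimization)

Summary: 1. First open settings.py and enable 'ITEM_PIPELINES' (just uncomment it) 2. Spider code 3. pipelines.py code 4. Addendum 2: to keep items well-formed, items.py can be used to constrain them (the item code in the spider must also be changed) (the code in pipelines must also be changed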
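
Steps 1 and 3 can be sketched as follows. ITEM_PIPELINES maps a pipeline class path to a run order, and the pipeline class implements process_item. Opening the output file once in open_spider rather than once per item is one common optimization, assumed here because the post's own code is not shown. Scrapy pipelines are plain classes, so the sketch runs without Scrapy installed. The project name myproject, the class name JsonLinesPipeline, and the file name items.jl are hypothetical.

```python
import json
import os
import tempfile

# settings.py: uncomment ITEM_PIPELINES and point it at the pipeline class.
# "myproject" is a hypothetical project name; 300 is the run order (0-1000).
ITEM_PIPELINES = {"myproject.pipelines.JsonLinesPipeline": 300}


class JsonLinesPipeline:
    """Write each item as one JSON line, opening the file once per crawl."""

    def __init__(self, path="items.jl"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Called for every item the spider yields.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item  # hand the item on to the next pipeline

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()


# Drive the hooks by hand, in the order Scrapy calls them.
out_path = os.path.join(tempfile.mkdtemp(), "items.jl")
pipeline = JsonLinesPipeline(out_path)
pipeline.open_spider(spider=None)
pipeline.process_item({"title": "Python Engineer"}, spider=None)
pipeline.close_spider(spider=None)
with open(out_path, encoding="utf-8") as f:
    written = [json.loads(line) for line in f]
```

Returning the item from process_item matters: a pipeline that returns nothing passes None to any pipeline that runs after it.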

posted @ 2019-01-26 14:46 山…隹 Views (254) Comments (0) Recommendations (0)

January 21, 2019

Scrapy: first example (crawling Tencent job postings)

Summary: import scrapy import json class TzcSpider(scrapy.Spider): # spider name, must be unique name = 'tzc' # start URL start_urls = ['https://hr.tencent.com/position.php?keywords=python&tid=0&lid=2268'] ...

posted @ 2019-01-21 16:56 山…隹 Views (176) Comments (0) Recommendations (0)

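Scrapy: debugging code with the shell

Summary: In the virtual machine, cd to the directory of one of your Scrapy projects, then: 1. Run scrapy shell followed by the URL in quotes (note the quotes) 2. Use response.xpath(' ') to extract data, for example: response.xpath('//table[@class="tablelist"]/tr[2]/td/a/text()').ex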
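
To see what the expression in step 2 selects without starting scrapy shell, the sketch below evaluates the same path on a small hypothetical table with the standard library's ElementTree. ElementTree supports only a subset of XPath: there is no text() step, so the link text is read from the .text attribute, and the path starts with .// rather than //. In scrapy shell the expression would be passed to response.xpath unchanged. The table contents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: a header row followed by two job rows.
HTML = """
<html><body>
<table class="tablelist">
  <tr><td>Position</td></tr>
  <tr><td><a href="/p/1">Python Engineer</a></td></tr>
  <tr><td><a href="/p/2">Data Engineer</a></td></tr>
</table>
</body></html>
"""

root = ET.fromstring(HTML)

# tr[2] is the second row of the table (XPath positions start at 1).
links = root.findall(".//table[@class='tablelist']/tr[2]/td/a")
titles = [a.text for a in links]
print(titles)
```

Because positions are 1-based, tr[2] skips the header row and lands on the first job row.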

posted @ 2019-01-21 15:23 山…隹 Views (210) Comments (0) Recommendations (0)
