KD_131 - cnblogs (博客园)

July 3, 2019

The difference between cur.execute(sql, args) and cur.execute(sql)

Summary: Reposted from: https://blog.csdn.net/mjjyszazc/article/details/88932664 Method 1: userid = "123"; sql = "select id,name from user where id = '%s'" % userid; cur.execute(sql ... Read more

posted @ 2019-07-03 10:04 KD_131 Views (1536) Comments (0) Recommended (0)
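The two call styles named in the post title can be compared with a small sketch. It uses the standard-library sqlite3 module so that it runs without a database server; this is an assumption, since the excerpt does not say which driver the post uses. sqlite3 takes `?` placeholders, while MySQL drivers such as pymysql take `%s`. The table contents and the function names `find_by_formatting` and `find_by_args` are made up for illustration.

```python
import sqlite3

# In-memory database so the sketch is self-contained (rows are made up).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE user (id TEXT, name TEXT)")
cur.executemany("INSERT INTO user VALUES (?, ?)", [("123", "alice"), ("456", "bob")])


def find_by_formatting(cur, userid):
    # cur.execute(sql): the value is spliced into the SQL text with %,
    # so quotes inside userid become part of the statement.
    sql = "select id,name from user where id = '%s'" % userid
    cur.execute(sql)
    return cur.fetchall()


def find_by_args(cur, userid):
    # cur.execute(sql, args): the driver binds the value separately,
    # so it is treated as data and never parsed as SQL.
    cur.execute("select id,name from user where id = ?", (userid,))
    return cur.fetchall()


malicious = "' OR '1'='1"
rows_formatted = find_by_formatting(cur, malicious)  # condition becomes always true
rows_bound = find_by_args(cur, malicious)            # no id equals this string
```

With the crafted input, the string-formatted query returns every row, while the bound query returns none. That is the usual reason to prefer passing `args` to `execute`.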

July 2, 2019

Fetching data from the China Weather website (中国天气网)

Summary: # China Weather website # Practice parsing with BeautifulSoup # Data visualization import requests from bs4 import BeautifulSoup import html5lib from pyecharts import Bar ALL_DATA = [] def parse_page(url): headers = { "User-... Read more

posted @ 2019-07-02 23:11 KD_131 Views (2090) Comments (0) Recommended (0)

Scheduled crawling of Xueqiu (雪球) data

Summary: Optimized to fetch data incrementally using Redis. Read more

posted @ 2019-07-02 23:09 KD_131 Views (516) Comments (0) Recommended (0)
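The summary only says the crawler was changed to fetch incrementally with Redis, so the following is a rough sketch of the general idea rather than the post's code. A plain Python set stands in for a Redis set so that the example runs without a server. With redis-py, the membership check and insert could be done in one call, since `sadd` returns the number of members that were newly added. The function name `fetch_incrementally` and the item shape are hypothetical.

```python
def fetch_incrementally(items, seen_ids):
    """Return only the items whose id is new, and remember those ids."""
    new_items = []
    for item in items:
        if item["id"] not in seen_ids:
            seen_ids.add(item["id"])  # with Redis this would be an SADD on a set key
            new_items.append(item)
    return new_items


seen = set()  # stand-in for a Redis set that persists between scheduled runs

# Two scheduled runs over overlapping (made-up) data.
first_run = fetch_incrementally([{"id": 1}, {"id": 2}], seen)
second_run = fetch_incrementally([{"id": 2}, {"id": 3}], seen)
```

The second run keeps only the item that was not stored by the first, which is what makes each scheduled crawl incremental.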

Crawler downloader middleware

Summary: # Set a random request header and a proxy IP # Write a class in the middleware.py file class MiddlewearproDownloaderMiddleware(object): user_agent_list = [ "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 " "(KHTML,... Read more

posted @ 2019-07-02 23:03 KD_131 Views (287) Comments (0) Recommended (0)
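Since the excerpt is cut off before the middleware body, here is a minimal sketch of how a random User-Agent and proxy might be set in a Scrapy downloader middleware. The class name `RandomUserAgentMiddleware`, the User-Agent strings, and the proxy address are placeholders, not values from the post. A Scrapy downloader middleware is a plain class with a `process_request(self, request, spider)` method, so the sketch is exercised here with a stand-in request object instead of a running crawler.

```python
import random
from types import SimpleNamespace


class RandomUserAgentMiddleware:
    # Placeholder values for illustration only.
    user_agent_list = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
    ]
    proxy_list = ["http://127.0.0.1:8888"]

    def process_request(self, request, spider):
        # Pick a fresh User-Agent for every outgoing request.
        request.headers["User-Agent"] = random.choice(self.user_agent_list)
        # Scrapy's built-in HttpProxyMiddleware reads the proxy from request.meta.
        request.meta["proxy"] = random.choice(self.proxy_list)
        # Returning None lets Scrapy continue processing the request normally.
        return None


# Stand-in for scrapy.Request, just enough to exercise the method.
fake_request = SimpleNamespace(headers={}, meta={})
middleware = RandomUserAgentMiddleware()
result = middleware.process_request(fake_request, spider=None)
```

In a real project the class would also need to be enabled through the `DOWNLOADER_MIDDLEWARES` setting in settings.py.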

Crawling the WeChat Mini Program community (Scrapy framework)

Summary: Read more

posted @ 2019-07-02 23:02 KD_131 Views (774) Comments (0) Recommended (0)

Full-site CrawlSpider crawl of Jianshu (简书) with asynchronous MySQL storage

Summary: # Jianshu website # Save data to MySQL; integrate selenium+chromedriver into scrapy; crawl the whole site's data # Scrape ajax data # Spider file # -*- coding: utf-8 -*- import scrapy from scrapy.linkextractors impor Read more

posted @ 2019-07-02 23:01 KD_131 Views (421) Comments (0) Recommended (0)

Fang.com (房天下): new and second-hand homes

Summary: # Spider file # -*- coding: utf-8 -*- import scrapy import re from soufangwang.items import NewHouseItem,SecondhandHouseItem class FangspiderSpider(scrapy.Spider): name = 'fangSpider' allowed_doma... Read more

posted @ 2019-07-02 22:59 KD_131 Views (391) Comments (0) Recommended (0)

Multithreading

Summary: # Image downloads are slow, so use multithreading # threading module import threading import time def coding(): for i in range(3): print("Writing code %s"%i) time.sleep(1) def drawing(): for i in range(3): print(... Read more

posted @ 2019-07-02 22:57 KD_131 Views (207) Comments (0) Recommended (0)
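The excerpt stops partway through `drawing()`, so the following is a sketch of how two such functions might be run concurrently with the threading module. The function names `coding` and `drawing` come from the excerpt. Collecting messages in a shared list instead of printing them, the lock, and the shortened sleep are changes made here to keep the example quick and easy to check.

```python
import threading
import time

results = []             # shared list the two threads append to
lock = threading.Lock()  # guards the shared list


def coding():
    for i in range(3):
        with lock:
            results.append("coding %s" % i)
        time.sleep(0.01)  # stands in for slow work such as an image download


def drawing():
    for i in range(3):
        with lock:
            results.append("drawing %s" % i)
        time.sleep(0.01)


threads = [threading.Thread(target=coding), threading.Thread(target=drawing)]
for t in threads:
    t.start()  # both loops now run concurrently
for t in threads:
    t.join()   # wait for both to finish before reading results
```

Because each thread sleeps between steps, the two loops overlap rather than running one after the other, so the order of entries in `results` can vary between runs.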

selenium+chromedriver: crawling dynamic web pages

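Summary: # There are 2 ways to get the data behind 'Load more' # The first is to find the data API directly: click 'Load more', watch the Network tab, and locate the data endpoint # The second is to use selenium+chromedriver Read more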

posted @ 2019-07-02 22:53 KD_131 Views (631) Comments (0) Recommended (0)

Downloading files and images from Autohome (汽车之家)

Summary: # Downloading files and images in the scrapy framework # Check whether the folder and path exist # Spider file import scrapy from bmw.items import BmwItem class Bme5Spider(scrapy.Spider): name = 'bme5' allowed_domains = ['car.autohome.com.cn'] start_urls ... Read more

posted @ 2019-07-02 22:49 KD_131 Views (922) Comments (0) Recommended (0)
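The summary mentions checking whether the folder and path exist before saving downloads, but the code for that step is not in the excerpt. One common way to do it, shown here as an assumption rather than the post's approach, is `os.makedirs` with `exist_ok=True`, which creates the directory when it is missing and does nothing when it is already there. The helper name `ensure_dir` and the category name are hypothetical.

```python
import os
import tempfile


def ensure_dir(base_dir, category):
    """Return the folder for a category, creating it if it does not exist."""
    path = os.path.join(base_dir, category)
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return path


# A temporary directory stands in for the project's download folder.
base = tempfile.mkdtemp()
first = ensure_dir(base, "images")
second = ensure_dir(base, "images")  # calling again is safe and returns the same path
```

This replaces a separate `os.path.exists` check followed by `os.mkdir`, and it avoids the failure that occurs when another process creates the folder between the check and the create.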