Crawlers - Post Category - willowj

A simple, reusable, home-built Python crawler framework (asynchronous with gevent)

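Summary: A typical crawler can be broken into the following steps: 1. open the target page; 2. parse the page; 3. process/store the data and add new task pages. For asynchronous crawling, a scheduler is also needed. For a simple crawler that does not have to deal with complex CAPTCHAs, where requests/urllib with modified cookies and headers is enough to access the site, writing one opener and one parser is sufficient; data processing and new tasks can be written directly in the parser class...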

posted @ 2017-08-18 19:03 willowj Views (899) Comments (0) Recommendations (0)
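The open, parse, and store steps described in the summary above can be sketched roughly as follows. This is a minimal illustration and not the framework from the post: the names `open_page`, `LinkParser`, and `crawl` are hypothetical, the "data" is simply each page's title, and a plain sequential queue stands in for the gevent-based scheduler the post describes.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def open_page(url, headers=None):
    """Step 1: open a page, sending custom headers (e.g. User-Agent, Cookie)."""
    request = Request(url, headers=headers or {"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class LinkParser(HTMLParser):
    """Step 2: parse a page, collecting its <title> text and its links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, opener=open_page, max_pages=10):
    """Step 3: store results and queue newly found pages (sequential scheduler)."""
    pending = deque([start_url])
    seen = {start_url}
    results = {}
    while pending and len(results) < max_pages:
        url = pending.popleft()
        parser = LinkParser(url)
        parser.feed(opener(url))
        results[url] = parser.title.strip()
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return results
```

Because `opener` is a parameter, the crawl loop can be exercised without network access by passing a function that returns canned HTML. An asynchronous variant would replace the `while` loop with concurrently scheduled tasks, for example greenlets from a gevent pool.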

Fetching residential community information from Anjuke

Summary:
    # -*- coding: utf-8 -*-
    """
    Created on Sat Jun 24 22:03:17 2017
    @author: willowj
    """
    import sys
    stdout, stdin, stderr = sys.stdout, sys.stdin, sys.stderr
    reload(sys)
    sys.stdout, sys.stdin, sys.std...

posted @ 2017-08-07 19:02 willowj Views (625) Comments (0) Recommendations (0)

Speeding up Python requests with Session objects

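Summary: Requests Session objects. A Session object lets you persist certain parameters across requests. It also persists cookies across all requests made from the same Session instance, and it uses urllib3's connection pooling. So if you send several requests to the same host, the underlying TCP connection is reused, which can bring a significant performance increase...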

posted @ 2017-08-02 18:50 willowj Views (1012) Comments (0) Recommendations (0)
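The Session behaviour summarized above can be illustrated with a short sketch. The helper names `build_session` and `fetch_status_codes` and the User-Agent value are assumptions made for this example, not code from the post.

```python
import requests


def build_session():
    """Create a Session whose headers are sent with every request it makes."""
    session = requests.Session()
    # Hypothetical header value, chosen only for illustration.
    session.headers.update({"User-Agent": "example-crawler/0.1"})
    return session


def fetch_status_codes(session, urls):
    """Fetch each URL through one Session so pooled connections can be reused."""
    return [session.get(url, timeout=10).status_code for url in urls]
```

A typical call would be `fetch_status_codes(build_session(), urls)` with several URLs on the same host. Compared with calling `requests.get` once per URL, this lets later requests reuse the connection opened by the first one; how much time that saves depends on the server and the network.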

Fetching the latest news from China Securities Journal with Python

Summary:
    # -*- coding: utf-8 -*-
    import urllib
    from bs4 import BeautifulSoup
    from time import time
    from time import ctime
    def get_last_info():
        url='http://ggjd.cnstock.com/gglist/search/g...

posted @ 2017-01-03 21:16 willowj Views (340) Comments (0) Recommendations (0)
