Summary:
import requests, re, time, pymongo
from bs4 import BeautifulSoup as bs
# counter
num = 0
str_time = time.time()
# connect to MongoDB
client = pymongo.MongoClient(host='localh… Read more
posted @ 2018-11-06 18:13 cwkcwk Views(671) Comments(0) Recommended(0)
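The preview breaks off at the MongoDB connection. A minimal sketch of the same requests + BeautifulSoup + pymongo pattern, assuming a local MongoDB on the default port and hypothetical database/collection names:

import requests, time, pymongo
from bs4 import BeautifulSoup as bs

num = 0                                  # page counter
str_time = time.time()                   # start time, for measuring the crawl
client = pymongo.MongoClient(host='localhost', port=27017)
collection = client['spider']['pages']   # database/collection names are assumptions

def crawl(url):
    """Fetch one page, parse it, and store its title in MongoDB."""
    global num
    r = requests.get(url, timeout=10)
    r.encoding = r.apparent_encoding
    soup = bs(r.text, 'html.parser')
    title = soup.title.string if soup.title else ''
    collection.insert_one({'url': url, 'title': title})
    num += 1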
Summary:
import requests
import re
from bs4 import BeautifulSoup as bs
import traceback
def getHTMLtext(url, code="utf-8"):
    try:
        r = requests.get(url)
        r.raise_fo… Read more
posted @ 2018-10-13 21:25 cwkcwk Views(596) Comments(0) Recommended(0)
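The helper is cut off mid-definition, but it follows the common getHTMLtext pattern. A sketch of how that helper is usually completed (the timeout value is an assumption):

import requests
import traceback

def getHTMLtext(url, code="utf-8"):
    """Return the page text, or an empty string on any failure."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()     # raise HTTPError for 4xx/5xx responses
        r.encoding = code        # force the expected page encoding
        return r.text
    except Exception:
        traceback.print_exc()
        return ""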
Summary:
import requests
import re
headers = {'cookie': 'l=Aj8/z1CVFeqHt7/Nk9kSI9v3TxnJEZPG; miid=5178119511105888855; cna=cDBEEgUJsxMCARsRgoXUNkvN; x=e%3D1%26p… Read more
posted @ 2018-10-13 21:18 cwkcwk Views(1157) Comments(0) Recommended(0)
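The point of this snippet is that the request only works when the headers carry a cookie copied from a logged-in browser session. A short sketch of that pattern (the URL and the cookie value are placeholders, not the post's actual values):

import requests

headers = {
    'cookie': 'PASTE_YOUR_LOGGED_IN_COOKIE_HERE',   # copy from the browser after logging in
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)'
}

def get_page(url):
    """Request a page that requires a logged-in cookie and return its text."""
    r = requests.get(url, headers=headers, timeout=10)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    return r.text

print(get_page('https://example.com/search?q=python')[:200])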
Summary:
import scrapy
import re
from collections import Counter
from lianjia.items import LianjiaItem
class LianjiaSpiderSpider(scrapy.Spider):
    name = 'lianjia_sp… Read more
posted @ 2018-09-26 23:48 cwkcwk Views(420) Comments(0) Recommended(0)
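Only the spider's name survives the cut. A minimal sketch of what such a Scrapy spider typically looks like; it assumes the lianjia project layout from the import above, and the start URL, CSS selectors and item fields are illustrative guesses, not the post's actual code:

import scrapy
from lianjia.items import LianjiaItem

class LianjiaSpiderSpider(scrapy.Spider):
    name = 'lianjia_spider'
    allowed_domains = ['lianjia.com']                      # assumption
    start_urls = ['https://bj.lianjia.com/ershoufang/']    # assumption

    def parse(self, response):
        # Selectors below are placeholders for the real listing markup
        for house in response.css('li.clear'):
            item = LianjiaItem()
            item['title'] = house.css('div.title a::text').get()
            item['price'] = house.css('div.totalPrice span::text').get()
            yield item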
Summary:
import scrapy
import json, time, re
from zhihuinfo.items import ZhihuinfoItem
class ZhihuSpider(scrapy.Spider):
    name = 'zhihu'
    allowed_domains = ['www.zhih… Read more
posted @ 2018-09-26 23:39 cwkcwk Views(300) Comments(0) Recommended(0)
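The json import suggests this spider reads Zhihu's JSON API rather than HTML. A hedged sketch of that approach; the endpoint, query parameters and item fields below are assumptions, not the post's actual code:

import scrapy
import json
from zhihuinfo.items import ZhihuinfoItem

class ZhihuSpider(scrapy.Spider):
    name = 'zhihu'
    allowed_domains = ['www.zhihu.com']
    # Placeholder endpoint; the real post may use a different API URL
    start_urls = ['https://www.zhihu.com/api/v4/members/some-user/followees?offset=0&limit=20']

    def parse(self, response):
        data = json.loads(response.text)       # the API returns JSON, not HTML
        for user in data.get('data', []):
            item = ZhihuinfoItem()
            item['name'] = user.get('name')
            item['follower_count'] = user.get('follower_count')
            yield item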
Summary:
import requests, time, re, json, pymongo
from urllib.parse import urlencode
from requests.exceptions import RequestException
from bs4 import BeautifulSoup as… Read more
posted @ 2018-09-26 23:27 cwkcwk Views(294) Comments(0) Recommended(0)
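This post combines urlencode for building Ajax query strings with RequestException for error handling. A small sketch of that combination (the parameter names are illustrative):

import requests
from urllib.parse import urlencode
from requests.exceptions import RequestException

def get_json(base_url, offset, keyword):
    """Build the query string with urlencode and return the JSON response, or None on failure."""
    params = {'offset': offset, 'keyword': keyword, 'count': 20}
    url = base_url + '?' + urlencode(params)
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        return r.json()
    except RequestException:
        return None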
Summary:
import requests
from bs4 import BeautifulSoup as bs
import re
import time
import pandas as pd
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64)… Read more
posted @ 2018-09-03 00:21 cwkcwk Views(2294) Comments(0) Recommended(0)
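pandas in the import list suggests the scraped rows end up in a DataFrame. A sketch of that pattern; the tag and class names are placeholders for whatever the target page actually uses:

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)"}

def scrape_table(url):
    """Collect table rows from one page into a pandas DataFrame."""
    r = requests.get(url, headers=headers, timeout=10)
    r.encoding = r.apparent_encoding
    soup = bs(r.text, 'html.parser')
    rows = []
    for tr in soup.find_all('tr'):               # placeholder selector
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if cells:
            rows.append(cells)
    return pd.DataFrame(rows)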
Summary:
import requests, re, time
header = {
    "Cookie": "cookie from a logged-in account, must be filled in",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Geck… Read more
posted @ 2018-09-03 00:11 cwkcwk Views(293) Comments(0) Recommended(0)
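Same idea as the earlier cookie-based scraper: nothing works until the Cookie header holds a value copied from a logged-in account. One variation is to put it on a requests.Session so every request carries it automatically (URL and cookie value are placeholders):

import requests

session = requests.Session()
session.headers.update({
    "Cookie": "PASTE_YOUR_LOGGED_IN_COOKIE_HERE",   # copy from the browser after logging in
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)"
})

resp = session.get("https://example.com/protected-page", timeout=10)
print(resp.status_code)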
Summary: Some processes spend the vast majority of their time on computation; these are called compute-bound (CPU-bound). Others spend most of their time on input and output; these are called I/O-bound. A search-engine spider, for instance, spends most of its time waiting for responses, which makes it I/O-bound. So CPU-bound projects are suited to multiprocessing, while I/O-… Read more
posted @ 2018-08-23 14:45 cwkcwk Views(1248) Comments(0) Recommended(0)
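The preview cuts off, but the rule of thumb it is heading toward is the usual one: use multiple processes for CPU-bound work (each process has its own interpreter, so the GIL is not a bottleneck) and threads for I/O-bound work (threads spend most of their time waiting anyway). A small sketch contrasting the two; the workloads and worker counts are illustrative:

from multiprocessing import Pool
from threading import Thread
import requests

def cpu_task(n):
    # CPU-bound: pure computation, benefits from separate processes
    return sum(i * i for i in range(n))

def io_task(url):
    # I/O-bound: mostly waiting on the network, benefits from threads
    return requests.get(url, timeout=10).status_code

if __name__ == '__main__':
    with Pool(4) as pool:                              # 4 worker processes is an arbitrary choice
        print(pool.map(cpu_task, [10 ** 6] * 4))

    threads = [Thread(target=io_task, args=('https://example.com',)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()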
Summary: Stumbled upon this by accident. Read more
posted @ 2018-08-23 01:50 cwkcwk Views(82) Comments(0) Recommended(0)