February 2019 post archive - st--st

Summary: Using the fake-useragent module (https://github.com/hellysmile/fake-useragent). 1. Install the module. 2. Configure it ...

posted @ 2019-02-27 16:47 st--st Views (1130) Comments (0) Recommendations (0)
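The two steps in the summary above (install, then configure) might look roughly like the sketch below. It assumes that "configure" means putting a random User-Agent into the request headers, which the truncated summary does not confirm. `pick_user_agent`, `build_headers`, and the fallback strings are hypothetical names and values introduced only for illustration; `UserAgent().random` is the attribute documented by fake-useragent.

```python
import random

# Hypothetical fallback strings, used only when fake-useragent is unavailable.
FALLBACK_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0",
]


def pick_user_agent():
    """Return a random User-Agent string."""
    try:
        # Step 1 (install): pip install fake-useragent
        from fake_useragent import UserAgent
        return str(UserAgent().random)
    except Exception:
        # The library is missing, or its browser data could not be loaded.
        return random.choice(FALLBACK_USER_AGENTS)


def build_headers():
    """Step 2 (configure, assumed): request headers with a random User-Agent."""
    return {"User-Agent": pick_user_agent()}


headers = build_headers()
print(headers["User-Agent"])
```

The resulting dictionary can be passed as the `headers` argument of an HTTP client such as `requests.get`.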

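Summary: How to improve Scrapy's crawling efficiency. Increase concurrency: Scrapy allows 16 concurrent requests by default, and this can be raised where appropriate. In the settings file, set CONCURRENT_REQUESTS = 100 to raise the limit to 100. Lower the log level: a running Scrapy crawl emits a large volume of log output; to reduce CPU us...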

posted @ 2019-02-24 15:20 st--st Views (1014) Comments (0) Recommendations (0)
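As a rough illustration of the two settings mentioned in the summary above, a `settings.py` fragment could look like this. `CONCURRENT_REQUESTS` and `LOG_LEVEL` are real Scrapy settings; the value `"INFO"` is an assumption, because the summary is cut off before it names a level.

```python
# settings.py (fragment) for a Scrapy project

# Raise the cap on concurrent requests, as the post suggests.
CONCURRENT_REQUESTS = 100

# Log less than the default DEBUG level to cut logging overhead.
# "INFO" is an assumed choice; the summary does not name a level.
LOG_LEVEL = "INFO"
```

A higher request cap only helps if the target site and the network can keep up, so the value is worth tuning per crawl rather than copying as-is.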

[Crawler] Scraping Qiushibaike with multiple threads and writing to a file

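Summary: ''' Scrape jokes from Qiushibaike, collecting each joke's content and link, and write them to CSV. Techniques used: multithreading, locks, queues, XPath, csv ''' import requests import csv from queue import Queue from lxml import etree import threading class Creeper(threading.Thread): def __ini...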

posted @ 2019-02-21 16:46 st--st Views (163) Comments (0) Recommendations (0)
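The post's code is truncated, so the following is only a sketch of how threads, a queue, a lock, and the csv module can fit together. It is not the author's `Creeper` class: `CsvWriterThread`, `write_rows`, and the sample rows are hypothetical, and the network and XPath steps are replaced by in-memory data so the example runs offline.

```python
import csv
import io
import threading
from queue import Queue


class CsvWriterThread(threading.Thread):
    """Consumer: takes (content, link) rows off a queue and writes them to CSV."""

    def __init__(self, row_queue, writer, lock):
        super().__init__()
        self.row_queue = row_queue
        self.writer = writer
        self.lock = lock

    def run(self):
        while True:
            row = self.row_queue.get()
            if row is None:  # sentinel: no more rows for this thread
                break
            # csv writers are not thread-safe, so serialize the writes.
            with self.lock:
                self.writer.writerow(row)


def write_rows(rows, out_file, n_threads=3):
    """Write rows to out_file using several writer threads."""
    row_queue = Queue()
    lock = threading.Lock()
    writer = csv.writer(out_file)
    writer.writerow(["content", "link"])  # header row

    threads = [CsvWriterThread(row_queue, writer, lock) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for row in rows:
        row_queue.put(row)
    for _ in threads:
        row_queue.put(None)  # one sentinel per thread
    for thread in threads:
        thread.join()


# Hypothetical rows standing in for scraped jokes and their links.
sample = [("joke %d" % i, "https://example.com/article/%d" % i) for i in range(10)]
buffer = io.StringIO()
write_rows(sample, buffer)
lines = buffer.getvalue().splitlines()
print(len(lines))
```

Row order in the output depends on thread scheduling; only the header is guaranteed to come first.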

[Crawler] Scraping meme images with multiple threads

Summary: ''' Scrape meme images using multithreading and queues. URL: http://www.bbsnet.com/doutu/page/1 ''' import requests from lxml import etree import os import re from urllib import request from queue import Queue import threading class Pr...

posted @ 2019-02-21 09:53 st--st Views (179) Comments (0) Recommendations (0)
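One way to combine the imports listed in the summary above is a two-stage pipeline: parser threads turn page URLs into image URLs, and saver threads turn image URLs into files. The sketch below is an assumption about that structure, not the post's code. `PageParser`, `ImageSaver`, `run_pipeline`, and the `FAKE_PAGES` data are hypothetical; the fetching, parsing, and downloading steps are replaced by in-memory stand-ins so it runs offline.

```python
import os
import re
import threading
from queue import Queue

# Hypothetical stand-in for fetching and parsing: page URL -> [(title, image URL)].
FAKE_PAGES = {
    "https://example.com/doutu/page/1": [
        ("happy cat!", "https://img.example.com/a.jpg"),
        ("what?", "https://img.example.com/b.gif"),
    ],
    "https://example.com/doutu/page/2": [
        ("so sad...", "https://img.example.com/c.png"),
    ],
}


class PageParser(threading.Thread):
    """Stage 1: take page URLs and queue the images found on each page."""

    def __init__(self, page_queue, img_queue):
        super().__init__()
        self.page_queue = page_queue
        self.img_queue = img_queue

    def run(self):
        while True:
            page_url = self.page_queue.get()
            if page_url is None:  # sentinel
                break
            for title, img_url in FAKE_PAGES[page_url]:
                self.img_queue.put((title, img_url))


class ImageSaver(threading.Thread):
    """Stage 2: build a safe file name per image and record it as saved."""

    def __init__(self, img_queue, saved, lock):
        super().__init__()
        self.img_queue = img_queue
        self.saved = saved
        self.lock = lock

    def run(self):
        while True:
            item = self.img_queue.get()
            if item is None:  # sentinel
                break
            title, img_url = item
            suffix = os.path.splitext(img_url)[1]  # e.g. ".jpg"
            safe_title = re.sub(r"[^\w]+", "_", title).strip("_")
            with self.lock:
                # A real scraper would download the image here.
                self.saved.append(safe_title + suffix)


def run_pipeline(page_urls, n_parsers=2, n_savers=3):
    page_queue, img_queue = Queue(), Queue()
    saved, lock = [], threading.Lock()
    parsers = [PageParser(page_queue, img_queue) for _ in range(n_parsers)]
    savers = [ImageSaver(img_queue, saved, lock) for _ in range(n_savers)]
    for thread in parsers + savers:
        thread.start()
    for url in page_urls:
        page_queue.put(url)
    for _ in parsers:
        page_queue.put(None)
    for thread in parsers:
        thread.join()  # every image URL is queued once the parsers finish
    for _ in savers:
        img_queue.put(None)
    for thread in savers:
        thread.join()
    return sorted(saved)


result = run_pipeline(list(FAKE_PAGES))
print(result)
```

Stopping the savers only after the parsers have joined avoids a saver exiting while images are still being queued.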

[Crawler] Producer-consumer pattern: Condition version

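Summary: Producer-consumer pattern, Condition version. With threading.Condition, a thread can block while there is no data; once data is available, the notify functions wake the waiting threads. threading.Condition is built on an underlying lock (an RLock by default) rather than inheriting from threading.Lock, and it exposes that lock's methods. acquire: acquires the lock. release: releases the lock. ...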

posted @ 2019-02-20 20:38 st--st Views (176) Comments (0) Recommendations (0)
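A minimal sketch of the wait and notify behaviour described in the summary above, assuming a shared balance that producers top up and consumers spend. The names `balance`, `producer`, `consumer`, and the amounts are illustrative and not taken from the post. `with condition:` is equivalent to calling `acquire()` and `release()` around the block.

```python
import threading

balance = 0  # shared state, guarded by the condition's lock
condition = threading.Condition()
purchases = []


def producer(deposits):
    """Add each deposit to the balance and wake any waiting consumers."""
    global balance
    for amount in deposits:
        with condition:
            balance += amount
            condition.notify_all()


def consumer(price, times):
    """Buy `times` items, blocking while the balance is too low."""
    global balance
    for _ in range(times):
        with condition:
            # wait_for releases the lock while blocked and re-checks
            # the predicate each time the thread is woken.
            condition.wait_for(lambda: balance >= price)
            balance -= price
            purchases.append(price)


threads = [
    threading.Thread(target=producer, args=([100] * 15,)),
    threading.Thread(target=consumer, args=(150, 4)),
    threading.Thread(target=consumer, args=(150, 4)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(balance, len(purchases))
```

The deposits total 1500 and the purchases total 1200, so every consumer can eventually proceed and the program always terminates.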

[Crawler] Producer-consumer pattern: Lock version

Summary: ''' Producer-consumer pattern, Lock version ''' import threading import random import time gMoney = 1000 # starting balance gLoad = threading.Lock() gTime = 0 # number of production rounds class Producer(threading.Thread): def run(self...

posted @ 2019-02-20 20:06 st--st Views (126) Comments (0) Recommendations (0)
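Since the post's code is cut off, here is a small sketch of the Lock-based variant under stated assumptions. It keeps the starting balance of 1000 from the summary, but uses fixed amounts instead of `random` so the final balance is predictable. `lock`, `produce`, and `consume` are illustrative names, not the post's.

```python
import threading

gMoney = 1000  # starting balance, as in the post
lock = threading.Lock()


def produce(times, amount):
    """Add `amount` to the balance `times` times."""
    global gMoney
    for _ in range(times):
        with lock:  # only one thread may update the balance at a time
            gMoney += amount


def consume(times, amount):
    """Spend `amount` up to `times` times, skipping when funds are short."""
    global gMoney
    for _ in range(times):
        with lock:
            # With a plain Lock there is no way to wait for money to
            # arrive; the consumer can only check and move on.
            if gMoney >= amount:
                gMoney -= amount


threads = [threading.Thread(target=produce, args=(10, 100)) for _ in range(2)]
threads += [threading.Thread(target=consume, args=(10, 50)) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(gMoney)
```

With these amounts the consumers spend at most 1000 in total, which the starting balance already covers, so no purchase is skipped and the result does not depend on thread scheduling.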

Python newbie
