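Python web crawlers: speeding up the crawler (8)

Speeding up the crawler

The previous posts covered fetching pages, parsing them, and storing the data, which is enough to build a basic crawler. This post records a more advanced topic: making the crawler faster. There are three main approaches: a multithreaded crawler, a multiprocess crawler, and a coroutine-based crawler. Compared with an ordinary single-threaded crawler, each of them can make crawling several times faster.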

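Concurrency and parallelism

Concurrency means that several events occur within the same period of time.
Parallelism means that several events occur at the same instant.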

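Synchronous and asynchronous

Synchronous means that the concurrent or parallel tasks do not run independently: they hand over to one another in a fixed order, like runners in a relay race.
Asynchronous means that the concurrent or parallel tasks run independently without interfering with one another: each runner has its own lane, and its speed is not affected by the other runners.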

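Multithreaded crawler

A multithreaded crawler runs concurrently. In other words, the threads do not truly execute at the same time; the interpreter switches between threads quickly, so one thread can run while another is waiting on the network, and that is what speeds the crawler up.
Multithreading improves efficiency when the program spends its time on I/O.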
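
To make the I/O point concrete, here is a minimal sketch that needs no network access. It assumes a hypothetical fake_download() that simply sleeps in place of a real request; run_serial() and run_threaded() are illustrative names, not code from this series.

```python
import threading
import time


def fake_download(delay):
    # Stand-in for a network request: the thread only waits
    time.sleep(delay)


def run_serial(delays):
    # Perform the waits one after another and return the elapsed time
    start = time.time()
    for d in delays:
        fake_download(d)
    return time.time() - start


def run_threaded(delays):
    # Start one thread per wait so the waiting periods overlap
    start = time.time()
    threads = [threading.Thread(target=fake_download, args=(d,)) for d in delays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


delays = [0.2, 0.2, 0.2]
serial_time = run_serial(delays)
threaded_time = run_threaded(delays)
print("serial:", round(serial_time, 2), "threaded:", round(threaded_time, 2))
```

The serial run takes roughly the sum of the delays, while the threaded run takes roughly the longest single delay, because the threads wait at the same time.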

A simple single-threaded crawler

	http://www.baidu.com
	http://www.qq.com
	http://www.naver.com
	http://www.taobao.com
	http://www.reddit.com
	http://www.sohu.com
	http://www.tmall.com
	http://www.sina.com.cn
	http://www.daum.net
	http://www.jd.com
	http://www.360.cn
	http://www.weibo.com
	http://www.aliexpress.com
	http://www.linkedin.com
	http://www.alipay.com
	http://www.hao123.com
	http://www.csdn.net
	http://www.youth.cn
	http://www.live.com
	http://www.tianya.cn
	http://www.microsoftonline.com
	http://www.office.com
	http://www.soso.com
	http://www.so.com
	http://www.gmw.cn
	http://www.china.com
	http://www.nate.com
	http://www.huaban.com
	http://www.bing.com
	http://www.xinhuanet.com
	http://www.youku.com
	http://www.zhihu.com
	http://www.cctv.com
	http://www.airasia.com
	http://www.douyu.com
	http://www.babytree.com
	http://www.apple.com
	http://www.sogou.com
	http://www.china.com.cn
	http://www.yelp.com
	http://www.ocbc.com
	http://www.microsoft.com
	http://www.mama.cn
	http://www.bitauto.com
	http://www.bankofamerica.com
	http://www.1688.com
	http://www.stackoverflow.com
	http://www.163.com
	http://www.39.net
	http://www.cnblogs.com
	http://www.bilibili.com
	http://www.interpark.com
	http://www.huanqiu.com
	http://www.cnzz.com
	http://www.chinadaily.com.cn
	http://www.openrice.com
	http://www.msn.com
	http://www.k618.cn
	http://www.yesky.com
	http://www.caijing.com.cn
	http://www.emirates.com
	http://www.amazon.cn
	http://www.aliyun.com
	http://www.eastday.com
	http://www.youdao.com
	http://www.oeeee.com
	http://www.ci123.com
	http://www.baike.com
	http://www.adobe.com
	http://www.rednet.cn
	http://www.iqiyi.com
	http://www.wemakeprice.com
	http://www.douban.com
	http://www.familydoctor.com.cn
	http://www.agoda.com
	http://www.jrj.com.cn
	http://www.read01.com
	http://www.17ok.com
	http://www.chinaz.com
	http://www.youboy.com
	http://www.tesco.com
	http://www.alibaba.com
	http://www.gearbest.com
	http://www.51sole.com
	http://www.dbs.com
	http://www.suning.com
	http://www.oschina.net
	http://www.voc.com.cn
	http://www.zol.com.cn
	http://www.asos.com
	http://www.chinaso.com
	http://www.jianshu.com
	http://www.ifeng.com
	http://www.stockstar.com
	http://www.zhanqi.tv
	http://www.52pk.com
	http://www.whatsbuying.com
	http://www.cqnews.net
	http://www.gongchang.com
	http://www.godaddy.com
	http://www.godaddy.com
	http://www.wtoip.com
	http://www.segmentfault.com
	http://www.evernote.com
	http://www.dianping.com
	http://www.qingdaonews.com
	http://www.guancha.cn
	http://www.standardchartered.com
	http://www.singaporeair.com
	http://www.toutiao.com
	http://www.jiameng.com
	http://www.dm5.com
	http://www.w3school.com.cn
	http://www.zhaopin.com
	http://www.99.com
	http://www.mi.com
	http://www.b2b.cn
	http://www.cathaypacific.com
	http://www.southcn.com
	http://www.battle.net
	http://www.ups.com
	http://www.jb51.net
	http://www.comcast.net
	http://www.alicdn.com
	http://www.v2ex.com
	http://www.firefoxchina.cn
	http://www.360doc.com
	http://www.xunlei.com
	http://www.sharepoint.com
	http://www.scol.com.cn
	http://www.admaimai.com
	http://www.v1.cn
	http://www.51cto.com
	http://www.jqw.com
	http://www.bzw315.com
	http://www.126.com
	http://www.beanfun.com
	http://www.chooseauto.com.cn
	http://www.renren.com
	http://www.taleo.net
	http://www.51.la
	http://www.zcool.com.cn
	http://www.4399.com
	http://www.duba.com
	http://www.globaltimes.cn
	http://www.ycwb.com
	http://www.sfacg.com
	http://www.hotelscombined.com
	http://www.mydrivers.com
	http://www.taoche.com
	http://www.runoob.com
	http://www.tlscontact.com
	http://www.nba.com
	http://www.gamebase.com.tw
	http://www.zhibo8.cc
	http://www.hexun.com
	http://www.xiami.com
	http://www.finnair.com
	http://www.feng.com
	http://www.cdstm.cn
	http://www.uniqlo.com
	http://www.iciba.com
	http://www.qudong.com
	http://www.panda.tv
	http://www.cnbeta.com
	http://www.nipic.com
	http://www.sznews.com
	http://www.huawei.com
	http://www.tuicool.com
	http://www.baimao.com
	http://www.umeng.com
	http://www.ccidnet.com
	http://www.klm.com
	http://www.qcloud.com
	http://www.hupu.com
	http://www.ikanman.com
	http://www.3dmgame.com
	http://www.icolor.com.cn
	http://www.360.com
	http://www.36kr.com
	http://www.miui.com
	http://www.boc.cn
	http://www.gamersky.com
	http://www.joyme.com
	http://www.17173.com
	http://www.uc.cn
	http://www.alimama.com
	http://www.oasgames.com
	http://www.focus.cn
	http://www.cnr.cn
	http://www.miomio.tv
	http://www.jjwxc.net
	http://www.5dcar.com
	http://www.hjenglish.com
	http://www.dangdang.com
	http://www.springer.com
	http://www.to8to.com
	http://www.xiaomi.com
	http://www.ctrip.com
	http://www.delta.com
	http://www.anjuke.com
	http://www.cnki.net
	http://www.surveymonkey.com
	http://www.tower.im
	http://www.baiducontent.com
	http://www.acfun.cn
	http://www.people.com.cn
	http://www.jmw.com.cn
	http://www.worktile.com
	http://www.newsmth.net
	http://www.vmall.com
	http://www.07073.com
	http://www.qyer.com
	http://www.hujiang.com
	http://www.cnnic.cn
	http://www.meituan.com
	http://www.yinxiang.com
	http://www.ngacn.cc
	http://www.smzdm.com
	http://www.ccb.com
	http://www.ali213.net
	http://www.alibaba-inc.com
	http://www.3158.cn
	http://www.vmall.com
	http://www.nike.com
	http://www.eqxiu.com
	http://www.jandan.net
	http://www.office365.com
	http://www.imooc.com
	http://www.ikea.com
	http://www.united.com
	http://www.ly.com
	http://www.epwk.com
	http://www.tudou.com
	http://www.leagueoflegends.com
	http://www.aa.com
	http://www.garena.com
	http://www.mafengwo.cn
	http://www.ifensi.com
	http://www.pptv.com
	http://www.fobshanghai.com
	http://www.asiamiles.com
	http://www.znds.com
	http://www.hc360.com
	http://www.job853.com
	http://www.sf-express.com
	http://www.lianjia.com
	http://www.guokr.com
	http://www.cmbchina.com
	http://www.modernweekly.com
	http://www.ynet.com
	http://www.dell.com
	http://www.dict.cn
	http://www.yinyuetai.com
	http://www.aizhan.com
	http://www.gome.com.cn
	http://www.meishichina.com
	http://www.51hejia.com
	http://www.ule.com
	http://www.ea3w.com
	http://www.saraba1st.com
	http://www.chsi.com.cn
	http://www.vlive.tv
	http://www.sonhoo.com
	http://www.hongkongairlines.com
	http://www.jxnews.com.cn
	http://www.free.com.tw
	http://www.docin.com
	http://www.liepin.com
	http://www.chinaunix.net
	http://www.weibo.cn
	http://www.ifanr.com
	http://www.51auto.com
	http://www.ebrun.com
	http://www.10010.com
	http://www.hebei.com.cn
	http://www.tgbus.com
	http://www.mtime.com
	http://www.vip.com
	http://www.kdslife.com
	http://www.www.gov.cn
	http://www.cncn.org.cn
	http://www.techcrunch.com
	http://www.zbj.com
	http://www.ip138.com
	http://www.cyol.com
	http://www.pc6.com
	http://www.joox.com
	http://www.178.com
	http://www.lagou.com
	http://www.18183.com
	http://www.365jia.cn
	http://www.autohome.com.cn
	http://www.battlenet.com.cn
	http://www.oracle.com
	http://www.miaopai.com
	http://www.sina.cn
	http://www.ch.com
	http://www.yxdown.com
	http://www.etao.com
	http://www.vietnamairlines.com
	http://www.iyiou.com
	http://www.shop.com
	http://www.588ku.com
	http://www.le.com
	http://www.sina.com
	http://www.jstv.com
	http://www.ceconline.com
	http://www.koreanair.com
	http://www.skype.com
	http://www.ih5.cn
	http://www.ems.com.cn
	http://www.efu.com.cn
	http://www.pcbaby.com.cn
	http://www.shimo.im
	http://www.macaolife.com
	http://www.xiu.com
	http://www.eastmoney.com
	http://www.xiumi.us
	http://www.yhd.com
	http://www.jiemian.com
	http://www.daikuan.com
	http://www.ximalaya.com
	http://www.marriott.com
	http://www.d1ev.com
	http://www.xitek.com
	http://www.chuansong.me
	http://www.alitrip.com
	http://www.xiaomi.cn
	http://www.51job.com
	http://www.91jm.com
	http://www.2cto.com
	http://www.qoo10.com
	http://www.centadata.com
	http://www.lufthansa.com
	http://www.techweb.com.cn
	http://www.kugou.com
	http://www.80018.cn
	http://www.tmtpost.com
	http://www.house365.com
	http://www.hp.com
	http://www.unity3d.com
	http://www.zoom.us
	http://www.kafan.cn
	http://www.liansuo.com
	http://www.netease.com
	http://www.10jqka.com.cn
	http://www.xiazaiba.com
	http://www.fang.com
	http://www.smartisan.com
	http://www.photofans.cn
	http://www.ooopic.com
	http://www.zybang.com
	http://www.gw-ec.com
	http://www.wed114.cn
	http://www.huomao.com
	http://www.ithome.com
	http://www.ccb.com.cn
	http://www.chinanews.com
	http://www.doc88.com
	http://www.sanguosha.com
	http://www.evaair.com
	http://www.icbc.com.cn
	http://www.youxidudu.com
	http://www.verycd.com
	http://www.netcoc.com
	http://www.pepper.com
	http://www.dygang.com
	http://www.liaoxuefeng.com
	http://www.flyasiana.com
	http://www.sciencenet.cn
	http://www.feiyang.com
	http://www.800hr.com
	http://www.iconfont.cn
	http://www.youzan.com
	http://www.360kan.com
	http://www.chinabyte.com
	http://www.samsung.com
	http://www.zxart.cn
	http://www.gucheng.com
	http://www.bootcss.com
	http://www.cankaoxiaoxi.com
	http://www.58pic.com
	http://www.81.cn
	http://www.csair.com
	http://www.chiphell.com
	http://www.antpedia.com
	http://www.xiachufang.com
	http://www.winshang.com
	http://www.fzg360.com
	http://www.chaduo.com
	http://www.12306.cn
	http://www.morningpost.com.cn
	http://www.soku.com
	http://www.sspai.com
	http://www.yoox.com
	http://www.huxiu.com
	http://www.nyu.edu
	http://www.jiwu.com
	http://www.u17.com
	http://www.jiayuan.com
	http://www.yy.com
	http://www.duowan.com
	http://www.mbalib.com
	http://www.wanfangdata.com.cn
	http://www.ibuying.com
	http://www.chouti.com
	http://www.71.net
	http://www.hrloo.com
	http://www.meizu.com
	http://www.miercn.com
	http://www.fengniao.com
	http://www.fangdd.com
	http://www.htc.com
	http://www.jdzj.com
	http://www.pcauto.com.cn
	http://www.kaola.com
	http://www.kuaidi100.com
	http://www.yougov.com
	http://www.ku6.com
	http://www.sanwen8.cn
	http://www.yiwugou.com
	http://www.lottedfs.com
	http://www.cisco.com
	http://www.wallstreetcn.com
	http://www.gamedog.cn
	http://www.tencent.com
	http://www.tvhome.com
	http://www.xbox.com
	http://www.cr173.com
	http://www.onlinedown.net
	http://www.ebay.com.hk
	http://www.searchs.cn
	http://www.17track.net
	http://www.hyundai.com
	http://www.baixing.com
	http://www.258.com
	http://www.cn2che.com
	http://www.pudn.com
	http://www.dv37.com
	http://www.dv37.com
	http://www.uisdc.com
	http://www.sojump.com
	http://www.d1net.com
	http://www.ganji.com
	http://www.jobbole.com
	http://www.pearsoncmg.com
	http://www.kongfz.com
	http://www.365jilin.com
	http://www.strawberrynet.com
	http://www.11467.com
	http://www.jobui.com
	http://www.hh010.com
	http://www.teambition.com
	http://www.woshipm.com
	http://www.lge.com
	http://www.kanxi.cc
	http://www.leiphone.com
	http://www.d1com.com
	http://www.114so.cn
	http://www.d1com.com
	http://www.114so.cn
	http://www.duomai.com
	http://www.win007.com
	http://www.weidian.com
	http://www.qiku.com
	http://www.cli.im
	http://www.flyertea.com
	http://www.lenovo.com.cn
	http://www.aso100.com
	http://www.xueqiu.com
	http://www.bp.com
	http://www.dingtalk.com
	http://www.processon.com
	http://www.flyme.cn
	http://www.a9vg.com
	http://www.sinaimg.cn
	http://www.saic.gov.cn
	http://www.mgtv.com
	http://www.nuomi.com
	http://www.tiexue.net
	http://www.vvvdj.com
	http://www.tvmao.com
	http://www.panduoduo.net
	http://www.wechat.com
	http://www.52pojie.cn
	http://www.miwifi.com
	http://www.iteye.com
	http://www.kanzhun.com
	http://www.mango.com
	http://www.cheaa.com
	http://www.13322.com
	http://www.jikexueyuan.com
	http://www.taisha.org
	http://www.mydigit.cn
	http://www.gusuwang.com
	http://www.pinggu.org
	http://www.lbldy.com
	http://www.sgcn.com
	http://www.misumi-ec.com
	http://www.lofter.com
	http://www.unrealengine.com
	http://www.gao7.com
	http://www.leju.com
	http://www.home77.com
	http://www.qunar.com
	http://www.xdowns.com
	http://www.oa.com
	http://www.sgcn.com
	http://www.szjy188.com
	http://www.tuniu.com
	http://www.135editor.com
	http://www.f.com
	http://www.zhibo.tv
	http://www.jiyoujia.com
	http://www.95516.com
	http://www.yiqifa.com
	http://www.cocoachina.com
	http://www.babyschool.com.cn
	http://www.iweihai.cn
	http://www.haowu.com
	http://www.hm.com
	http://www.wish.com
	http://www.fitbit.com
	http://www.taojindi.com
	http://www.koolearn.com
	http://www.xabbs.com
	http://www.020.com
	http://www.qiniu.com
	http://www.25pp.com
	http://www.nga.cn
	http://www.educity.cn
	http://www.zealer.com
	http://www.xdowns.com
	http://www.liqu.com
	http://www.qichacha.com
	http://www.51credit.com
	http://www.duomai.com
	http://www.juooo.com
	http://www.shanbay.com
	http://www.juooo.com
	http://www.shanbay.com
	http://www.meishij.net
	http://www.th7.cn
	http://www.jia400.com
	http://www.cas.cn
	http://www.wenwuchina.com
	http://www.189.cn
	http://www.liuxue86.com
	http://www.klook.com
	http://www.shfft.com
	http://www.8264.com
	http://www.china.cn
	http://www.zhifang.com
	http://www.made-in-china.com
	http://www.rabbitpre.com
	http://www.sap.com
	http://www.macx.cn
	http://www.everychina.com
	http://www.9game.cn
	http://www.ca800.com
	http://www.dgtle.com
	http://www.cloudscar.com
	http://www.bdhome.cn
	http://www.news18a.com
	http://www.shilladfs.com
	http://www.net-a-porter.com
	http://www.zealer.com
	http://www.discoverhongkong.com
	http://www.80s.tw
	http://www.9ku.com
	http://www.33lc.com
	http://www.thepaper.cn
	http://www.scswl.cn
	http://www.officedepot.com
	http://www.fx678.com
	http://www.banma.com
	http://www.eee114.com
	http://www.9384.com
	http://www.xuexila.com
	http://www.9384.com
	http://www.xuexila.com
	http://www.cheshen.cn
	http://www.mr-world.com
	http://www.fx112.com
	http://www.97665.com
	http://www.chinahr.com
	http://www.acs.org
	http://www.mikecrm.com
	http://www.checheng.com
	http://www.appgame.com
	http://www.linkhaitao.com
	http://www.meipai.com
	http://www.linuxidc.com
	http://www.fliggy.com
	http://www.amap.com
	http://www.4px.com
	http://www.qpic.cn
	http://www.modao.cc
	http://www.dianxiaomi.com
	http://www.56.com
	http://www.java.com
	http://www.hdpfans.com
	http://www.thinkphp.cn
	http://www.2345.com
	http://www.baoku.com
	http://www.tiancity.com
	http://www.bcsh.com
	http://www.bozhong.com
	http://www.zhiding.cn
	http://www.longzhu.com
	http://www.xjtour.com
	http://www.kancloud.cn
	http://www.open-open.com
	http://www.itpub.net
	http://www.elong.com
	http://www.pchome.net
	http://www.pps.tv
	http://www.qinqinbaby.com
	http://www.chuandong.com
	http://www.coding.net
	http://www.yidianzixun.com
	http://www.51nb.com
	http://www.dhgate.com
	http://www.10086.cn
	http://www.6vhao.com
	http://www.5acbd.com
	http://www.atobo.com.cn
	http://www.kubo365.com
	http://www.111cn.net
	http://www.zhongmin.cn
	http://www.weiyangx.com
	http://www.juesheng.com
	http://www.uuu9.com
	http://www.siilu.com
	http://www.pconline.com.cn
	http://www.dji.com
	http://www.west.cn
	http://www.ctfile.com
	http://www.idianfa.com
	http://www.smm.cn
	http://www.shejis.com
	http://www.zhangyu.tv
	http://www.17zwd.com
	http://www.dhl.com
	http://www.shfft.com
	http://www.wanmei.com
	http://www.122.gov.cn
	http://www.51nb.com
	http://www.xici.net
	http://www.cnki.com.cn
	http://www.redocn.com
	http://www.qvc.com
	http://www.aipai.com
	http://www.dapenti.com
	http://www.3lian.com
	http://www.guidechem.com
	http://www.jiankang.com
	http://www.tgfcer.com
	http://www.freebuf.com
	http://www.sodao.com
	http://www.zhcw.com
	http://www.sh.com
	http://www.ablesky.com
	http://www.microsoftstore.com.cn
	http://www.7k7k.com
	http://www.southmoney.com
	http://www.btc123.com
	http://www.digitaling.com
	http://www.meitu.com
	http://www.chinaaet.com
	http://www.kaoyan.com
	http://www.aipai.com
	http://www.tripadvisor.cn
	http://www.colg.cn
	http://www.admin5.com
	http://www.ncar.cc
	http://www.intel.com
	http://www.wanyx.com
	http://www.chmotor.cn
	http://www.mxhichina.com
	http://www.jzb.com
	http://www.it168.com
	http://www.1kkk.com
	http://www.cnodejs.org
	http://www.hudong.com
	http://www.ucweb.com
	http://www.xyw.gov.cn
	http://www.airasiago.com
	http://www.damai.cn
	http://www.farnell.com
	http://www.hi-pda.com
	http://www.wenku1.com
	http://www.haosou.com
	http://www.ishuhui.com
	http://www.paopaoche.net
	http://www.csai.cn
	http://www.zhaoshangbao.com
	http://www.eol.cn
	http://www.excelhome.net
	http://www.missevan.com
	http://www.cncv.org.cn
	http://www.365yg.com
	http://www.huim.com
	http://www.zxxk.com
	http://www.51yes.com
	http://www.cainiao.com
	http://www.nh87.cn
	http://www.b0yp.com
	http://www.qdaily.com
	http://www.kongzhong.com
	http://www.shangc.net
	http://www.dongqiudi.com
	http://www.jiankang.com
	http://www.dzsc.com
	http://www.chinaacc.com
	http://www.vcg.com
	http://www.oneplusbbs.com
	http://www.xuetangx.com
	http://www.fz222.com
	http://www.cnwnews.com
	http://www.chinadmd.com
	http://www.b2b168.com
	http://www.pingan.com
	http://www.pushauction.com
	http://www.sdo.com
	http://www.9978.cn
	http://www.ltaaa.com
	http://www.gxyj.com
	http://www.kuaizhan.com
	http://www.airchina.com.cn
	http://www.gcl-power.com
	http://www.medsci.cn
	http://www.lbxcn.com
	http://www.lzgd.com.cn
	http://www.oray.com
	http://www.taobao.org
	http://www.btbtdy.com
	http://www.i2ya.com
	http://www.istar.cn
	http://www.xgo.com.cn
	http://www.66law.cn
	http://www.heiguang.com
	http://www.ao.com
	http://www.jq22.com
	http://www.qidian.com
	http://www.goldcarpet.cn
	http://www.zxbtz.cn
	http://www.jiushang.cn
	http://www.cicpa.org.cn
	http://www.wowenda.com
	http://www.coursera.org
	http://www.fangdr.com
	http://www.cps.com.cn
	http://www.kmf.com
	http://www.cri.cn
	http://www.lmjx.net
	http://www.lonshinetech.cn
	http://www.infoq.com
	http://www.gushiwen.org
	http://www.ecp888.com
	http://www.tongtool.com
	http://www.dajie.com
	http://www.co188.com
	http://www.fumanhua.net
	http://www.maiche168.com
	http://www.sankuai.com
	http://www.ucas.ac.cn
	http://www.lamabang.com
	http://www.huajiao.com
	http://www.accorhotels.com
	http://www.wendangku.net
	http://www.dragonparking.com
	http://www.6789.com
	http://www.xdf.cn
	http://www.tucao.tv
	http://www.91yunxiao.com
	http://www.liebiao.com
	http://www.9lianmeng.com
	http://www.51240.com
	http://www.zhiyoo.com
	http://www.silkair.com
	http://www.313.cn
	http://www.ssl-images-amazon.com
	http://www.eepw.com.cn
	http://www.gs307.com
	http://www.yindou.com
	http://www.i1515.com
	http://www.imiker.com
	http://www.lvmama.com
	http://www.louisvuitton.com
	http://www.nowgoal.com
	http://www.makeding.com
	http://www.xz7.com
	http://www.guitarchina.com
	http://www.wto168.net
	http://www.abchina.com
	http://www.fzdm.com
	http://www.ichacha.net
	http://www.1024sj.com
	http://www.ef43.com.cn
	http://www.newrank.cn
	http://www.ceair.com
	http://www.zimuku.net
	http://www.ppkoo.com
	http://www.jc35.com
	http://www.dnspod.cn
	http://www.hsw.cn
	http://www.caixin.com
	http://www.manmanbuy.com
	http://www.23us.com
	http://www.asus.com
	http://www.zoosnet.net
	http://www.xp510.com
	http://www.vgtime.com
	http://www.qiushibaike.com
	http://www.jinshuju.net
	http://www.115.com
	http://www.3367.com
	http://www.fanli.com
	http://www.newcger.com
	http://www.kepu.net.cn
	http://www.findlaw.cn
	http://www.jiumei.com
	http://www.gkstk.com
	http://www.ihg.com
	http://www.blizzard.com
	http://www.lenovo.com
	http://www.longau.com
	http://www.seedit.com
	http://www.ofweek.com
	http://www.61baobao.com
	http://www.400.cn
	http://www.wines-info.com
	http://www.innisfree.com
	http://www.weather.com.cn
	http://www.che168.com
	http://www.dilidili.wang
	http://www.7po.com
	http://www.qiushibaike.com
	http://www.9r.cn
	http://www.weather.com.cn
	http://www.107cine.com
	http://www.coolapk.com
	http://www.ixueshu.com
	http://www.iplaysoft.com
	http://www.blizzard.cn
	http://www.dangbei.com
	http://www.hellorf.com
	http://www.21food.cn
	http://www.libaclub.com
	http://www.outofmemory.cn
	http://www.ele.me
	http://www.shihuo.cn
	http://www.zmz2017.com
	http://www.zybuluo.com
	http://www.66ys.tv
	http://www.sczw.com
	http://www.xtx6.com
	http://www.tutorabc.com
	http://www.zhipin.com
	http://www.cgdc.com.cn
	http://www.61learn.com
	http://www.sm.cn
	http://www.571xz.com
	http://www.sobt5.org
	http://www.starwoodhotels.com
	http://www.qqtn.com
	http://www.sgamer.com
	http://www.120ask.com
	http://www.appinn.com
	http://www.qianzhan.com
	http://www.888pic.com
	http://www.tianyancha.com
	http://www.k73.com
	http://www.yiibai.com
	http://www.downxia.com
	http://www.managershare.com
	http://www.downcc.com
	http://www.biquge.tw
	http://www.fgowiki.com
	http://www.p2peye.com
	http://www.haosou.com
	http://www.yimu100.com
	http://www.fox.com
	http://www.mrporter.com
	http://www.genshuixue.com
	http://www.jisutiyu.com
	http://www.topfo.com
	http://www.right.com.cn
	http://www.5ewin.com
	http://www.dongnanshan.com
	http://www.jizhangla.com
	http://www.laawoo.com
	http://www.3618med.com
	http://www.ahgame.com
	http://www.mamicode.com
	http://www.wugu.com.cn
	http://www.115.com
	http://www.genshuixue.com
	http://www.57mh.com
	http://www.oiegg.com
	http://www.21csp.com.cn
	http://www.kekenet.com
	http://www.c5game.com
	http://www.juejin.im
	http://www.baofeng.com
	http://www.kuwo.cn
	http://www.6.cn
	http://www.chayu.com
	http://www.sanwen.net
	http://www.962.net
	http://www.etest.net.cn
	http://www.innisfree.com
	http://www.dragonair.com
	http://www.vjshi.com
	http://www.lawtime.cn
	http://www.sccnn.com
	http://www.qqbaobao.com
	http://www.dragonair.com
	http://www.vjshi.com
	http://www.lawtime.cn
	http://www.sccnn.com
	http://www.qqbaobao.com
	http://www.chinaswitch.com
	http://www.5118.com
	http://www.cntv.cn
	http://www.knowsky.com
	http://www.skyscanner.com
	http://www.wrz.com
	http://www.wasu.cn
	http://www.mojifen.com
	http://www.nvidia.com
	http://www.oceanpark.com.hk
	http://www.pcbeta.com
	http://www.psnine.com
	http://www.228.com.cn
	http://www.zhuixinfan.com
	http://www.okcoin.cn
	http://www.huya.com
	http://www.1ppt.com
	http://www.fyber.com
	http://www.72byte.com
	http://www.cpic.com.cn
	http://www.wlmq.com
	http://www.lusongsong.com
	http://www.fanjian.net
	http://www.hopetrip.com.hk
	http://www.hnjy.com.cn
	http://www.8kana.com
	http://www.8d.cc
	http://www.linux.cn
	http://www.enterprise.com
	http://www.iqing.in
	http://www.sg560.com
	http://www.mnw.cn
	http://www.trendmicro.com
	http://www.sipo.gov.cn
	http://www.a.com.cn
	http://www.hangame.com
	http://www.cngold.org
	http://www.95095.com
	http://www.ishuo.cn
	http://www.tecenet.com
	http://www.jinti.com
	http://www.sobaidupan.com
	http://www.ichunqiu.com
	http://www.xilu.com
	http://www.3987.com
	http://www.rr-sc.com
	http://www.99114.com
	http://www.haodou.com
	http://www.wolfram.com
	http://www.expreview.com
	http://www.myexception.cn
	http://www.shixiseng.com
	http://www.bjjs.gov.cn
	http://www.xxbiquge.com
	http://www.lesports.com
	http://www.hea.cn
	http://www.24home.com
	http://www.yeah.net
	http://www.qcw.com
	http://www.shoes.net.cn
	http://www.9c9v.com
	http://www.bjhjyd.gov.cn
	http://www.ecvv.com
	http://www.fanlibang.com
	http://www.jxmall.com
	http://www.xcar.com.cn
	http://www.go108.com.cn
	http://www.divcss5.com
	http://www.sc.com
	http://www.watchstore.com.cn
	http://www.mexgroup.com
	http://www.xunyingwang.com
	http://www.chinagate.cn
	http://www.zdic.net
	http://www.bdimg.com

import time

import requests

# Read the tab-separated file and keep the second column (the URL)
link_list = []
with open(r'C:\Users\K1567\Desktop\alexa.txt', 'r') as file:
    file_list = file.readlines()
    for e in file_list:
        link = e.split('\t')[1]
        link = link.replace('\n', '')
        link_list.append(link)

# Request each link in turn and time the whole run
stat = time.time()
for e in link_list:
    try:
        r = requests.get(e)
        print(r.status_code, e)
    except Exception as erro:
        print('Error:', erro)
end = time.time()
print('Total time for the serial crawler:', end - stat)
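
The file-reading loop is repeated in every example in this post, so it may help to see it in isolation. The sketch below assumes each line has the form "rank, tab, URL"; parse_links() is a hypothetical helper name, and the sample lines are made up for illustration.

```python
def parse_links(lines):
    # Keep the second tab-separated column of each line, skipping malformed lines
    links = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2 or not parts[1]:
            continue
        links.append(parts[1])
    return links


# Sample lines in the assumed "rank<TAB>url" layout
sample = ["1\thttp://www.baidu.com\n", "2\thttp://www.qq.com\n", "\n"]
print(parse_links(sample))
```

Skipping lines without a second column avoids the IndexError that e.split('\t')[1] would raise on a blank line.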

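Learning Python multithreading

Python offers two ways to use threads:
1. Function style: call start_new_thread() from the _thread module.
2. Class style: use the threading module to create threads by subclassing threading.Thread.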

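import _thread
import time

# Define the function that each thread runs
def print_time(threadName, delay):
    count = 0
    while count < 3:
        time.sleep(delay)
        count += 1
        print(threadName, time.ctime())

# Function-style usage. The main thread does not wait for threads started
# this way, so it has to stay alive until they finish.
# _thread.start_new_thread(print_time, ("Thread-1", 1))
# _thread.start_new_thread(print_time, ("Thread-2", 2))
# print("Main Finished")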

import threading
import time

class myThread(threading.Thread):
    def __init__(self, name, delay):
        threading.Thread.__init__(self)
        self.name = name
        self.delay = delay

    def run(self):
        print("Starting " + self.name)
        print_time(self.name, self.delay)
        print("Exiting " + self.name)

def print_time(threadName, delay):
    counter = 0
    while counter < 3:
        time.sleep(delay)
        print(threadName, time.ctime())
        # Increment inside the loop; otherwise the loop never ends
        counter += 1

threads = []

# Create new threads
thread1 = myThread("Thread-1", 1)
thread2 = myThread("Thread-2", 2)

# Start the new threads
thread1.start()
thread2.start()

# Add the threads to the thread list
threads.append(thread1)
threads.append(thread2)

# Wait for all threads to finish
for t in threads:
    t.join()
print("Exiting Main Thread")

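run(): the method that defines the thread's activity
start(): starts the thread's activity
join([timeout]): blocks the calling thread until the thread whose join() is called terminates, or until the optional timeout elapses
is_alive(): returns whether the thread is still alive (the older isAlive() spelling was removed in Python 3.9)
getName(): returns the thread's name
setName(): sets the thread's name
In the code above, thread1 = myThread("Thread-1", 1) creates a thread object. The myThread class configures the thread, and run() defines what it does: while counter is less than 3, it prints the thread name and the current time. thread1.start() starts the thread, threads.append(thread1) adds it to the thread list, and t.join() makes the main thread wait until all threads have finished before it continues.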
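
The following sketch exercises start(), join() with and without a timeout, and is_alive() on one thread. The worker() function and the thread name "Thread-demo" are made up for illustration.

```python
import threading
import time


def worker():
    # Simulated work: sleep long enough to observe the thread while it runs
    time.sleep(0.3)


t = threading.Thread(target=worker, name="Thread-demo")

alive_before_start = t.is_alive()      # the thread has not been started yet
t.start()
alive_while_running = t.is_alive()     # the worker is still sleeping

t.join(0.05)                           # give up waiting after 0.05 seconds
alive_after_short_join = t.is_alive()  # the timeout expired, the thread did not

t.join()                               # block until the thread really finishes
alive_after_join = t.is_alive()

print(t.name, alive_before_start, alive_while_running,
      alive_after_short_join, alive_after_join)
```

A join() with a timeout returns when the timeout expires even if the thread is still running, so is_alive() is the way to tell the two cases apart.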

A simple multithreaded crawler example

import threading
import time

import requests

link_list = []
with open(r'C:\Users\K1567\Desktop\alexa.txt', 'r') as file:
	file_list = file.readlines()
	for e in file_list:
		link = e.split('\t')[1]
		link = link.replace('\n', '')
		link_list.append(link)
stat = time.time()


class myThread(threading.Thread):
	def __init__(self, name, link_range):
		threading.Thread.__init__(self)
		self.name = name
		self.link_range = link_range

	def run(self):
		print("Starting" + self.name)
		crawler(self.name, self.link_range)
		print("Exiting" + self.name)


def crawler(threaName, link_range):
	for i in range(link_range[0], link_range[1] + 1):
		try:
			r = requests.get(link_list[i], timeout=20)
			print(threaName, r.status_code, link_list[i])
		except Exception as e:
			print(threaName, 'Error:', e)


thread_list = []

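# Inclusive (start, end) index pairs covering the 1000 links (indices 0 to 999)
link_range_list = [(0, 199), (200, 399), (400, 599), (600, 799), (800, 999)]

# Create and start the threads
for i in range(1, 6):
	thread = myThread("Thread-" + str(i), link_range_list[i - 1])
	thread.start()
	thread_list.append(thread)
# Wait for all threads to finish
for i in thread_list:
	i.join()
end = time.time()

print('Total time for the simple multithreaded crawler:', end - stat)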

The code above splits the 1000 pages into five parts, creates five threads in a for loop, and assigns one part to each thread.
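
Writing the index ranges by hand is easy to get wrong by one. As a sketch, the ranges could be computed instead; split_ranges() is a hypothetical helper that returns inclusive (start, end) pairs, matching the way crawler() loops up to link_range[1] + 1.

```python
def split_ranges(total, parts):
    # Split the indices 0..total-1 into up to `parts` inclusive (start, end) pairs
    size, extra = divmod(total, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each
        count = size + (1 if i < extra else 0)
        if count == 0:
            continue
        ranges.append((start, start + count - 1))
        start += count
    return ranges


print(split_ranges(1000, 5))
```

Because the pairs are inclusive, the last pair ends at total - 1, which is the last valid index into the link list.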

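Multithreaded crawler using Queue

Python's queue module (named Queue in Python 2) provides synchronized, thread-safe queue classes: a FIFO (first in, first out) queue, a LIFO (last in, first out) queue, and the priority queue PriorityQueue.
Example:
Start five threads and share the 1000 pages among them through a queue.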

import queue
import threading
import time

import requests

link_list = []  # page links
with open(r'C:\Users\K1567\Desktop\alexa.txt', 'r') as file:
	file_list = file.readlines()
	for e in file_list:
		link = e.split('\t')[1]
		link = link.replace('\n', '')
		link_list.append(link)
# Start time
stat = time.time()
# Subclass of Thread
class myThread(threading.Thread):
	def __init__(self, name, q):
		threading.Thread.__init__(self)
		self.name = name
		self.q = q

	def run(self):
		print("Starting " + self.name)
		while True:
			try:
				crawler(self.name, self.q)
			except queue.Empty:
				# No URL arrived within the timeout: the queue is drained
				break
		print("Exiting " + self.name)


def crawler(threaName, q):
	# Take one link from the queue; raises queue.Empty after 2 seconds
	url = q.get(timeout=2)
	try:
		r = requests.get(url, timeout=20)
		print(q.qsize(), threaName, r.status_code, url)
	except Exception as e:
		print(q.qsize(), threaName, url, 'Error', e)


threadlist = ['Thread-1', 'Thread-2', 'Thread-3', 'Thread-4', 'Thread-5']
# Create a queue object (queue.Queue in Python 3)
workQueue = queue.Queue(1000)
threads = []

# Create and start the threads; they wait on the queue until it is filled
for tName in threadlist:
	thread = myThread(tName, workQueue)
	thread.start()
	threads.append(thread)

# Fill the queue
for url in link_list:
	workQueue.put(url)

# Wait for all threads to finish
for t in threads:
	t.join()

end = time.time()
print('Total time for the Queue-based multithreaded crawler:', end - stat)
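
The three queue classes mentioned in this section differ only in the order in which items come back out. A small sketch, with a hypothetical drain() helper and made-up integer items:

```python
import queue


def drain(q):
    # Remove and return every item currently in the queue, in retrieval order
    items = []
    while not q.empty():
        items.append(q.get())
    return items


fifo = queue.Queue()
lifo = queue.LifoQueue()
prio = queue.PriorityQueue()

# Put the same items into each queue in the same order
for n in [3, 1, 2]:
    fifo.put(n)
    lifo.put(n)
    prio.put(n)

fifo_order = drain(fifo)   # insertion order
lifo_order = drain(lifo)   # reverse insertion order
prio_order = drain(prio)   # smallest item first
print(fifo_order, lifo_order, prio_order)
```

For a crawler the FIFO queue is the natural choice, since links are then fetched in the order they were added.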

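Multiprocess crawler

A multithreaded Python crawler effectively runs on a single core, with its threads running concurrently and asynchronously. Because of the GIL (global interpreter lock), multithreading cannot take advantage of a multi-core CPU.
A multiprocess crawler is another way to speed up a Python crawler, and it can use several CPU cores. Multiprocessing relies on the multiprocessing library.
There are two ways to use multiprocessing: Process + Queue, and Pool + Queue.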

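Multiprocess crawler using multiprocessing

When there are more processes than CPU cores, the extra processes have to wait for a core to become free, so it helps to know how many cores the machine has.

Check the number of CPU cores on the current machine: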
from multiprocessing import cpu_count
print(cpu_count())

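Multiprocess crawler examples:
1. Process + Queue: with multiple processes, each process can have its attributes set individually. If daemon is set to True, the child process is terminated automatically when the parent process ends.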

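import queue
import time
from multiprocessing import Queue, Process

import requests

link_list = []
with open(r'C:\Users\K1567\Desktop\alexa.txt', 'r') as file:
	file_list = file.readlines()
	for e in file_list:
		link = e.split('\t')[1]
		link = link.replace('\n', '')
		link_list.append(link)
stat = time.time()

# Process subclass: each worker pulls URLs from the shared queue
class MyProcess(Process):
	def __init__(self, q):
		Process.__init__(self)
		self.q = q

	def run(self):
		print("Starting", self.pid)
		while True:
			try:
				crawler(self.q)
			except queue.Empty:
				# No URL arrived within the timeout: the queue is drained
				break
		print("Exiting " + str(self.pid))


def crawler(q):
	# Take one link from the queue; raises queue.Empty after 2 seconds
	url = q.get(timeout=2)
	try:
		r = requests.get(url, timeout=20)
		# Note: qsize() is not implemented on some platforms, such as macOS
		print(q.qsize(), r.status_code, url)
	except Exception as e:
		print(q.qsize(), url, 'Error', e)


if __name__ == '__main__':
	workQueue = Queue(1000)

	# Fill the queue
	for url in link_list:
		workQueue.put(url)

	processes = []
	for i in range(0, 5):
		p = MyProcess(workQueue)
		p.daemon = True
		p.start()
		processes.append(p)
	# Join only after every process has started; joining inside the loop
	# above would run the processes one after another
	for p in processes:
		p.join()
	end = time.time()
	print('Total time for the simple multiprocess crawler:', end - stat)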

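2. Pool + Queue: when the number of targets is small, you can use Process from multiprocessing to create the processes dynamically. A dozen or so is fine, but with hundreds or thousands of targets, limiting the number of processes by hand becomes tedious. That is where a process pool (Pool) is useful.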
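
The post gives no code for this approach, so the following is only a rough sketch of what Pool + Queue could look like, not the author's implementation. It assumes a Manager().Queue(), because a plain multiprocessing.Queue cannot be passed as an argument to pool tasks. The names drain(), crawl_with_pool(), and the fetch parameter are hypothetical; fetch stands in for the real request and must be a module-level function so it can be sent to the worker processes.

```python
import queue
from multiprocessing import Manager, Pool


def drain(q, fetch):
    # Worker task: keep taking URLs until the queue is empty,
    # and return a list of (url, result) pairs
    results = []
    while True:
        try:
            url = q.get_nowait()
        except queue.Empty:
            break
        results.append((url, fetch(url)))
    return results


def crawl_with_pool(urls, fetch, processes=3):
    # A manager queue is a proxy object that can be handed to pool workers
    manager = Manager()
    work_queue = manager.Queue()
    for url in urls:
        work_queue.put(url)

    with Pool(processes=processes) as pool:
        # One draining task per worker process; apply_async does not block
        pending = [pool.apply_async(drain, (work_queue, fetch))
                   for _ in range(processes)]
        # Collect the results before the pool is closed
        return [item for task in pending for item in task.get()]


# Usage, from a script (my_fetch is a placeholder for a real request function):
# if __name__ == '__main__':
#     print(crawl_with_pool(link_list, my_fetch, processes=3))
```

The pool fixes the number of processes once, so the same code handles ten links or a thousand without creating a process per link.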

posted @ 2023-02-20 10:04  小旺first