Crawling HD 王者荣耀 (Honor of Kings) Wallpapers with Multiple Threads via threading

I don't know whether any of you feel the same as I do: staring at the same dull Windows desktop wallpaper gives me the urge to punch the screen. Below I'll show you, in code, how to download HD wallpapers for 王者荣耀.

Required modules

  • requests
  • urllib
  • queue
  • threading
  • os
  • time

Implementation

  1. First, we need to import the following modules:
import requests
from urllib import request
from urllib import parse
import queue
import threading
import os
import time
  2. To keep the 王者荣耀 backend from identifying our crawler, we disguise our requests as a browser. In Chrome's Network panel, find the user-agent and referer values and put them into a headers dict (a quick sanity check follows the dict below):
headers={
    'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
    'referer':'https://pvp.qq.com/web201605/wallpaper.shtm'
}
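As an optional sanity check, you can request page 0 of the wallpaper API with these headers and confirm it returns JSON. This is just a sketch: the URL is the same workList_inc.cgi endpoint used in main() later, with page fixed to 0, and it assumes the endpoint still responds the way it did when this post was written.

# Optional: verify the headers work against one page of the wallpaper API.
test_url = 'https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi?activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=20&totalpage=0&page=0&iOrder=0&iSortNumClose=1&_everyRead=true&iTypeId=2&iFlowId=267733&iActId=2735&iModuleId=2735&_=1584692023502'
resp = requests.get(test_url, headers=headers)
print(resp.status_code)                     # expect 200
print(len(resp.json().get("List", [])))     # number of wallpaper entries on this page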
  3. This project uses the thread-safe queue.Queue together with the producer-consumer pattern. I define two classes, both inheriting from threading.Thread:

    1. Producer: Getinfo
    class Getinfo(threading.Thread):
        def __init__(self, page_queue, image_queue, *args, **kwargs):
            super(Getinfo, self).__init__(*args, **kwargs)
            self.page_queue = page_queue
            self.image_queue = image_queue

        @staticmethod
        def extract_images(data):
            # Each wallpaper entry exposes 8 image URLs (sProdImgNo_1 .. sProdImgNo_8);
            # replacing '200' with '0' in the URL yields the full-size image.
            images = []
            for x in range(1, 9):
                image_url = parse.unquote(data['sProdImgNo_%d' % x]).replace('200', '0')
                images.append(image_url)
            return images

        def run(self) -> None:
            while not self.page_queue.empty():
                page_url = self.page_queue.get()
                resp = requests.get(page_url, headers=headers)
                datas = resp.json().get("List")
                for data in datas:
                    img_urls = Getinfo.extract_images(data)
                    name = parse.unquote(data['sProdName']).replace('1:1', '').strip()
                    dir_path = os.path.join("image", name)
                    # makedirs also creates the parent "image" folder, and
                    # exist_ok=True avoids a race when threads hit the same name.
                    os.makedirs(dir_path, exist_ok=True)
                    for index, img_url in enumerate(img_urls):
                        self.image_queue.put({"name": name, "img_url": img_url, "index": index})

            print("%s finished" % threading.current_thread().name)
    
    2. Consumer: Saveinfo
    class Saveinfo(threading.Thread):
        def __init__(self, page_queue, image_queue, *args, **kwargs):
            super(Saveinfo, self).__init__(*args, **kwargs)
            self.page_queue = page_queue
            self.image_queue = image_queue

        def run(self) -> None:
            while True:
                try:
                    img_obj = self.image_queue.get(timeout=30)
                    dir_name = img_obj.get("name")
                    img_url = img_obj.get("img_url")
                    index = img_obj.get("index")
                    dir_path = os.path.join("image", dir_name)
                    try:
                        file_path = os.path.join(dir_path, "{}.jpg".format(index + 1))
                        request.urlretrieve(img_url, file_path)
                        print(file_path + " downloaded!")
                    except Exception as e:
                        # A failed download only skips this image; the thread keeps going.
                        print("=" * 30)
                        print(e)
                        print(img_url)
                        print("=" * 30)
                except queue.Empty:
                    # Nothing arrived for 30 seconds: the producers are done, so exit.
                    break
    

    When saving the images we nest two try-except blocks: the inner one catches download errors so a single bad URL does not kill the thread, and the outer one catches queue.Empty so the consumer exits cleanly once no new images arrive within 30 seconds.

  4. In main(), we set the sizes of the wallpaper page-URL queue and the image queue.

  5. To download the wallpapers quickly and efficiently, we start 8 threads each for Getinfo and Saveinfo (a join() variant is sketched after main() below).


def main():
    page_queue = queue.Queue(21)      # one URL per wallpaper list page
    image_queue = queue.Queue(1000)   # buffer of image download tasks
    for x in range(21):
        page_url = 'https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi?activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=20&totalpage=0&page={}&iOrder=0&iSortNumClose=1&_everyRead=true&iTypeId=2&iFlowId=267733&iActId=2735&iModuleId=2735&_=1584692023502'.format(x)
        page_queue.put(page_url)
    for x in range(8):
        th = Getinfo(page_queue, image_queue, name="producer-%d" % x)
        th.start()
    for x in range(8):
        th = Saveinfo(page_queue, image_queue, name="consumer-%d" % x)
        th.start()

if __name__ == "__main__":
    main()
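If you want main() to block until every wallpaper has been saved, one option (not in the original post, just a sketch replacing the two start loops above) is to keep the thread handles and join() them. This works because the Saveinfo threads break out of their loop once the queue has been empty for 30 seconds.

    threads = []
    for x in range(8):
        threads.append(Getinfo(page_queue, image_queue, name="producer-%d" % x))
        threads.append(Saveinfo(page_queue, image_queue, name="consumer-%d" % x))
    for th in threads:
        th.start()
    for th in threads:
        th.join()          # returns only after each thread's run() has finished
    print("all wallpapers downloaded")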

Summary:

That's the multithreaded file-download approach I wanted to share, illustrated by downloading 王者荣耀 wallpapers. The heart of the article is the use of queue.Queue, in particular its get and put methods.
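To isolate that idea, here is a minimal, standalone producer-consumer sketch (the names task_queue, producer and consumer are made up for illustration) showing how put and get coordinate threads without any explicit locking:

import queue
import threading

task_queue = queue.Queue(maxsize=10)       # put() blocks once 10 items are waiting

def producer():
    for i in range(20):
        task_queue.put(i)                  # thread-safe, no lock needed

def consumer():
    while True:
        try:
            item = task_queue.get(timeout=2)   # blocks until an item arrives
        except queue.Empty:
            break                              # nothing for 2 seconds: stop
        print("consumed", item)

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start(); c.start()
p.join(); c.join()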
