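Scraping Honor of Kings HD Wallpapers with Multithreading (threading)
I don't know whether anyone else feels the same way, but the monotonous default Windows desktop wallpaper makes me want to punch the screen. Below I'll walk through the code for downloading HD wallpapers from Honor of Kings.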
Required modules
- requests
- urllib
- queue
- threading
- os
- time
Implementation
- First, import the following modules
import requests
from urllib import request
from urllib import parse
import queue
import threading
import os
import time
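- To keep the Honor of Kings site's backend from identifying the crawler, we need to disguise it as a browser. In Chrome's Network panel, look up the request headers and copy the user-agent and referer entries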
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
    'referer': 'https://pvp.qq.com/web201605/wallpaper.shtm'
}
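The headers dict only has an effect on calls that actually receive it. As a minimal sketch using only the standard library, the same kind of headers can be attached to a urllib request object. The URL below is a placeholder, the user-agent string is shortened, and build_request is a hypothetical helper name that is not part of the script in this article.

```python
from urllib import request

# Shortened, illustrative header values (assumption: any browser-like string works the same way)
HEADERS = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "referer": "https://pvp.qq.com/web201605/wallpaper.shtm",
}


def build_request(url):
    """Return a urllib Request carrying the browser-like headers (nothing is sent yet)."""
    return request.Request(url, headers=HEADERS)


req = build_request("https://example.com/wallpaper.jpg")  # placeholder URL
# urllib normalizes header names with str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))
print(req.get_header("Referer"))
```

Passing req to request.urlopen would send the request with these headers. request.urlretrieve, when called with a plain URL string, does not pick up the headers dict.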
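- In this project I use Queue, a thread-safe queue, together with the producer-consumer pattern. I define two classes, both of which inherit from threading.Thread.
- Producer: Getinfo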
class Getinfo(threading.Thread):
    def __init__(self, page_queue, image_queue, *args, **kwargs):
        super(Getinfo, self).__init__(*args, **kwargs)
        self.page_queue = page_queue
        self.image_queue = image_queue

    @staticmethod
    def extract_images(data):
        # Each entry carries 8 URL-encoded image links: sProdImgNo_1 .. sProdImgNo_8
        images = []
        for x in range(1, 9):
            # Decode the link, then replace '200' with '0' in the URL
            image_url = parse.unquote(data['sProdImgNo_%d' % x]).replace('200', '0')
            images.append(image_url)
        return images

    def run(self) -> None:
        while True:
            try:
                # get_nowait avoids blocking forever if another thread takes the last page
                page_url = self.page_queue.get_nowait()
            except queue.Empty:
                break
            resp = requests.get(page_url, headers=headers)
            datas = resp.json().get("List")
            for data in datas:
                img_urls = Getinfo.extract_images(data)
                name = parse.unquote(data['sProdName']).replace('1:1', '').strip()
                dir_path = os.path.join("image", name)
                # makedirs also creates the parent "image" folder;
                # exist_ok tolerates another thread creating it first
                os.makedirs(dir_path, exist_ok=True)
                for index, img_url in enumerate(img_urls):
                    self.image_queue.put({"name": name, "img_url": img_url, "index": index})
        print("%s finished" % threading.current_thread().name)
- Consumer: Saveinfo
class Saveinfo(threading.Thread):
    def __init__(self, page_queue, image_queue, *args, **kwargs):
        super(Saveinfo, self).__init__(*args, **kwargs)
        self.page_queue = page_queue
        self.image_queue = image_queue

    def run(self) -> None:
        while True:
            try:
                img_obj = self.image_queue.get(timeout=30)
                dir_name = img_obj.get("name")
                img_url = img_obj.get("img_url")
                index = img_obj.get("index")
                dir_path = os.path.join("image", dir_name)
                file_path = os.path.join(dir_path, "{}.jpg".format(index + 1))
                try:
                    request.urlretrieve(img_url, file_path)
                    print(file_path + " downloaded!")
                except Exception as e:
                    # A failed download is reported and skipped
                    print("=" * 30)
                    print(e)
                    print(img_url)
                    print("=" * 30)
            except queue.Empty:
                # No image arrived within 30 seconds: treat the work as done and exit
                break
When saving images, we nest two try-except blocks. The inner one catches download errors, so a single failed image does not stop the thread. The outer one catches queue.Empty, so the thread exits once no image has arrived for 30 seconds instead of looping forever.
- In the main function, we set the sizes of the wallpaper page-URL queue and the image queue.
- To download the wallpapers quickly and efficiently, we start 8 threads each for Getinfo and Saveinfo.
def main():
    page_queue = queue.Queue(21)
    image_queue = queue.Queue(1000)
    for x in range(21):
        page_url = 'https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi?activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=20&totalpage=0&page={}&iOrder=0&iSortNumClose=1&_everyRead=true&iTypeId=2&iFlowId=267733&iActId=2735&iModuleId=2735&_=1584692023502'.format(x)
        page_queue.put(page_url)
    # 8 producer threads collect image URLs from the listing pages
    for x in range(8):
        th = Getinfo(page_queue, image_queue, name="data-thread-%d" % x)
        th.start()
    # 8 consumer threads download the images
    for x in range(8):
        th = Saveinfo(page_queue, image_queue, name="download-thread-%d" % x)
        th.start()


if __name__ == "__main__":
    main()
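The consumer threads here decide when to stop by waiting on get with a 30-second timeout. An alternative often used with the producer-consumer pattern is to put one sentinel value per consumer into the queue once the producers are done. The following is only a sketch of that idea, not the method used in this article: it squares numbers instead of downloading images, and the names SENTINEL, producer, consumer, and run are all hypothetical.

```python
import queue
import threading

SENTINEL = None  # hypothetical marker meaning "no more work"


def producer(numbers, work_queue):
    for n in numbers:
        work_queue.put(n)  # blocks while the bounded queue is full


def consumer(work_queue, results, lock):
    while True:
        item = work_queue.get()  # blocks until an item arrives
        if item is SENTINEL:
            break  # the producers are finished, so this thread can exit
        with lock:
            results.append(item * item)


def run(numbers, n_consumers=4):
    work_queue = queue.Queue(maxsize=8)
    results, lock = [], threading.Lock()
    consumers = [
        threading.Thread(target=consumer, args=(work_queue, results, lock))
        for _ in range(n_consumers)
    ]
    for t in consumers:
        t.start()
    prod = threading.Thread(target=producer, args=(numbers, work_queue))
    prod.start()
    prod.join()  # all real items are queued at this point
    for _ in consumers:
        work_queue.put(SENTINEL)  # one sentinel per consumer
    for t in consumers:
        t.join()
    return sorted(results)


print(run(range(10)))
```

With this arrangement the consumers stop as soon as the work runs out, rather than after the timeout expires.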
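Summary:
That is my approach to downloading files with multiple threads, illustrated by downloading Honor of Kings wallpapers. The focus of the article is the use of the Queue class, in particular its get and put methods.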
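To make the behavior of get and put concrete, here is a small single-threaded sketch of a bounded Queue. The item strings are made up for illustration. With a timeout, put raises queue.Full when the queue has no free slot, and get raises queue.Empty when nothing arrives in time.

```python
import queue

q = queue.Queue(maxsize=2)  # bounded queue with room for 2 items
q.put("page-1")
q.put("page-2")

overflow = False
try:
    # The queue is full and nobody is consuming, so this times out
    q.put("page-3", timeout=0.1)
except queue.Full:
    overflow = True

first = q.get()   # items come out in FIFO order
second = q.get()

drained = False
try:
    # The queue is now empty, so this times out as well
    q.get(timeout=0.1)
except queue.Empty:
    drained = True

print(overflow, first, second, drained)
```

Without a timeout, both calls would block until another thread frees a slot or adds an item, which is what lets producers and consumers pace each other.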
