Learning Web Scraping: Scraping Kugou Music

This time we use Python to scrape Kugou Music (酷狗音乐); all of the data comes from Kugou.

1. The module for sending requests

import requests

The URL of the music file

m_url = 'https://webfs.ali.kugou.com/202305172335/695e4719686e024397958a7eb3f7d89c/KGTX/CLTX001/413e3ae5346ea60b3850927602aa7a18.mp3'

This URL can be found under the Media filter in the Network tab of the browser's developer tools.
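The first path segment of this URL, 202305172335, looks like a timestamp. That is only a guess from the URL's shape, not documented behaviour, but if it holds, the link is probably short-lived, which would explain why it has to be looked up again rather than hard-coded. A small sketch that parses that segment (url_timestamp is a hypothetical helper name):

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urlparse


def url_timestamp(url: str) -> Optional[datetime]:
    """Parse the first path segment as YYYYmmddHHMM; return None if it does not fit."""
    segments = [s for s in urlparse(url).path.split('/') if s]
    if not segments:
        return None
    try:
        return datetime.strptime(segments[0], '%Y%m%d%H%M')
    except ValueError:
        return None


m_url = 'https://webfs.ali.kugou.com/202305172335/695e4719686e024397958a7eb3f7d89c/KGTX/CLTX001/413e3ae5346ea60b3850927602aa7a18.mp3'
print(url_timestamp(m_url))
```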

2. Send the request to the server and fetch the music data

m_resp = requests.get(m_url,headers=headers)

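The request above passes headers so that it looks like it comes from a browser. Note that headers has to be defined before requests.get is called: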

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.42'
}

3. The server responds; save the data

# The ./mp3 directory must already exist, otherwise open() raises FileNotFoundError
with open('./mp3/zj.mp3', 'wb') as f:
    f.write(m_resp.content)
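The save step fails if the ./mp3 directory does not exist yet. A minimal sketch of one way around that, using pathlib to create the directory first; save_audio is a hypothetical helper name, and the bytes written in the demo are placeholders rather than real audio:

```python
import tempfile
from pathlib import Path


def save_audio(data: bytes, path: str) -> Path:
    """Write data to path, creating missing parent directories first."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


# Demo with placeholder bytes in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    saved = save_audio(b'placeholder bytes', f'{tmp}/mp3/zj.mp3')
    print(saved.name, saved.stat().st_size)
```

In the script itself the call would be save_audio(m_resp.content, './mp3/zj.mp3').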

That is all it takes to download a single song. Next, we want to handle multiple songs.

Scraping multiple songs

# First, get the URL of the song list

list_url = 'https://complexsearch.kugou.com/v2/search/song?callback=callback123&srcappid=2919&clientver=1000&clienttime=1684327843988&mid=52ee050d4b6d1ba7acb9a1a05a60c98d&uuid=52ee050d4b6d1ba7acb9a1a05a60c98d&dfid=4FHmC13bOoFu3sMxa31Eh55n&keyword=%E5%86%8D%E8%A7%81&page=1&pagesize=30&bitrate=0&isfuzzy=0&inputtype=0&platform=WebFilter&userid=0&iscorrection=1&privilege_filter=0&filter=10&token=&appid=1014&signature=154588196dcaaa686211037f95d9d68b'
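The keyword parameter in this URL is the percent-encoded search term. The sketch below decodes it and shows how a small part of the query string could be rebuilt with urllib.parse. One caveat that is an assumption, not something the page documents: the signature parameter is probably derived from the other parameters, so swapping in a different keyword by hand may be rejected by the server.

```python
from urllib.parse import quote, unquote, urlencode

# Decode the keyword that appears in list_url
print(unquote('%E5%86%8D%E8%A7%81'))  # 再见

# Encoding goes the other way
print(quote('再见'))  # %E5%86%8D%E8%A7%81

# Rebuild a subset of the query string from a dict (only three of the parameters)
params = {'keyword': '再见', 'page': 1, 'pagesize': 30}
print(urlencode(params))  # keyword=%E5%86%8D%E8%A7%81&page=1&pagesize=30
```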

Then send the request

list_resp = requests.get(list_url,headers=headers)

Extract the data

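import json
# The response is JSONP: drop the 12-character prefix callback123( and the last two characters
song_list = json.loads(list_resp.text[12:-2])['data']['lists']
for i, s in enumerate(song_list):
    print(f'{i+1}----{s.get("FileName")}----{s.get("EMixSongID")}')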
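The fixed slice [12:-2] only works while the callback name is exactly callback123 and the response ends with exactly two trailing characters. A slightly more tolerant sketch locates the outer parentheses instead. parse_jsonp is a hypothetical helper name, and the sample text is made up for illustration; only the FileName and EMixSongID keys and the data/lists nesting come from the code above.

```python
import json


def parse_jsonp(text: str):
    """Parse a JSONP response of the form callback(...) into Python objects."""
    start = text.index('(') + 1   # raises ValueError if there is no wrapper
    end = text.rindex(')')
    return json.loads(text[start:end])


# Made-up sample in the same shape as the search response
sample = 'callback123({"data": {"lists": [{"FileName": "demo", "EMixSongID": "j4r7s30"}]}})\n'
song_list = parse_jsonp(sample)['data']['lists']
for i, s in enumerate(song_list):
    print(f'{i + 1}----{s.get("FileName")}----{s.get("EMixSongID")}')
```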

Why extract EMixSongID? Each song's details are served from their own URL, and EMixSongID is what tells that URL which song is meant, so we need it to fetch the right song.

So how do we find out what EMixSongID is used for?

On the song's playback page, search for mp3 in the Network tab to find the URL that holds the song's information.

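Then open that URL in a new tab to see the information about the song, and remove query parameters one at a time until only the ones needed to return the content are left.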

For example:

The URL above simplifies to: https://wwwapi.kugou.com/yy/index.php?r=play/getdata&encode_album_audio_id=j4r7s30

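The part that uniquely identifies the content is encode_album_audio_id=j4r7s30. The value j4r7s30 is the song identifier we need; changing it returns the information for a different song.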

This identifier is stored in the EMixSongID field of the search results, which is why we extract EMixSongID.

The song info URL: num selects which song to download.

num = input("Enter the number of the song to download: ")

info_url = f'https://wwwapi.kugou.com/yy/index.php?r=play/getdata&encode_album_audio_id={song_list[int(num) - 1].get("EMixSongID")}'
# print(info_url)
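As written, int(num) - 1 turns an input of 0 into index -1, which silently selects the last song, and a non-numeric or too-large input raises an exception. The sketch below validates the choice and builds the info URL from its parts. pick_song, build_info_url and INFO_ENDPOINT are hypothetical names, and the one-item song_list is sample data.

```python
from urllib.parse import urlencode

INFO_ENDPOINT = 'https://wwwapi.kugou.com/yy/index.php'


def pick_song(song_list: list, raw: str) -> dict:
    """Return the song at the 1-based position typed by the user."""
    try:
        index = int(raw.strip())
    except ValueError:
        raise ValueError(f'not a number: {raw!r}') from None
    if not 1 <= index <= len(song_list):
        raise ValueError(f'choose a number between 1 and {len(song_list)}')
    return song_list[index - 1]


def build_info_url(emix_song_id: str) -> str:
    """Build the play/getdata URL for one song identifier."""
    # safe='/' keeps r=play/getdata unescaped, matching the URL used above
    query = urlencode(
        {'r': 'play/getdata', 'encode_album_audio_id': emix_song_id},
        safe='/',
    )
    return f'{INFO_ENDPOINT}?{query}'


song_list = [{'FileName': 'demo', 'EMixSongID': 'j4r7s30'}]  # sample data
song = pick_song(song_list, '1')
print(build_info_url(song['EMixSongID']))
```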

Send a request to the song info URL:

info_resp = requests.get(info_url, headers=headers)

Get the actual mp3 URL

m_url = info_resp.json()['data']['play_url']
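Indexing ['data']['play_url'] directly raises an exception if the response does not have that shape. The sketch below returns None instead. It assumes only the data/play_url nesting used above; that the field can be missing or empty for some songs is an assumption, and extract_play_url is a hypothetical helper name.

```python
from typing import Optional


def extract_play_url(payload: dict) -> Optional[str]:
    """Return payload['data']['play_url'], or None if it is missing or empty."""
    data = payload.get('data')
    if not isinstance(data, dict):
        return None
    url = data.get('play_url')
    return url if isinstance(url, str) and url else None


# Made-up payloads for illustration
print(extract_play_url({'data': {'play_url': 'https://example.com/a.mp3'}}))
print(extract_play_url({'data': {'play_url': ''}}))
print(extract_play_url({'data': []}))
```

In the script this would be called as extract_play_url(info_resp.json()), with the download skipped when the result is None.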

Finally, send a request to that URL to fetch the music data and save the server's response, exactly as in the single-song case.

Complete code:

import requests  # module for sending requests
import json

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.42',
    'cookie': 'kg_mid=52ee050d4b6d1ba7acb9a1a05a60c98d; kg_dfid=4FHmC13bOoFu3sMxa31Eh55n; kg_dfid_collect=d41d8cd98f00b204e9800998ecf8427e'
}

# Song list (adjacent string literals are joined into one URL)
list_url = (
    'https://complexsearch.kugou.com/v2/search/song?callback=callback123&srcappid=2919&clientver=1000&clienttime=1684327843988'
    '&mid=52ee050d4b6d1ba7acb9a1a05a60c98d&uuid=52ee050d4b6d1ba7acb9a1a05a60c98d&dfid=4FHmC13bOoFu3sMxa31Eh55n&keyword=%E5%86%8D%E8%A7%81'
    '&page=1&pagesize=30&bitrate=0&isfuzzy=0&inputtype=0&platform=WebFilter&userid=0&iscorrection=1&privilege_filter=0&filter=10&token='
    '&appid=1014&signature=154588196dcaaa686211037f95d9d68b'
)

list_resp = requests.get(list_url, headers=headers)
# Extract the data (strip the JSONP wrapper callback123(...) before parsing)
song_list = json.loads(list_resp.text[12:-2])['data']['lists']
for i, s in enumerate(song_list):
    print(f'{i + 1}----{s.get("FileName")}----{s.get("EMixSongID")}')

num = input("Enter the number of the song to download: ")

# Song info URL
info_url = f'https://wwwapi.kugou.com/yy/index.php?r=play/getdata&encode_album_audio_id={song_list[int(num) - 1].get("EMixSongID")}'
# print(info_url)

info_resp = requests.get(info_url, headers=headers)
# print("mp3---", info_resp.json()['data']['play_url'])

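# 1. URL of the music file
m_url = info_resp.json()['data']['play_url']

# 2. Send the request to the server and fetch the music data
m_resp = requests.get(m_url, headers=headers)

# 3. The server responds; save the data (the ./mp3 directory must already exist)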
with open('./mp3/zj.mp3', 'wb') as f:
    f.write(m_resp.content)
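The script always writes to zj.mp3, so each download overwrites the previous one. One option is to name the file after the song's FileName field. The sketch below cleans such a name for use on disk; safe_filename is a hypothetical helper, the sample titles are made up, and the set of replaced characters follows Windows filename restrictions.

```python
import re

# Characters that are not allowed in Windows filenames, plus control characters
_INVALID = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(name: str, ext: str = '.mp3') -> str:
    """Replace characters that are unsafe in filenames and append the extension."""
    cleaned = _INVALID.sub('_', name).strip(' .')
    return (cleaned or 'untitled') + ext


print(safe_filename('Artist - Title'))
print(safe_filename('AC/DC - Back?'))
print(safe_filename('  '))
```

The save path could then be built as './mp3/' + safe_filename(song.get("FileName") or ""), where song is the selected entry of song_list.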

Note:

The cookie is included in headers because the site applies anti-scraping checks; sending the cookie lets our requests get past them.
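The cookie string used in the complete code is a list of name=value pairs, and two of them repeat values from the list URL: kg_mid matches the mid parameter and kg_dfid matches dfid. The sketch below splits the string into a dict with the standard library. cookie_header_to_dict is a hypothetical helper name; requests.get also accepts such a dict through its cookies= argument as an alternative to a raw cookie header.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header: str) -> dict:
    """Split a 'name=value; name=value' cookie string into a dict."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


cookie = ('kg_mid=52ee050d4b6d1ba7acb9a1a05a60c98d; '
          'kg_dfid=4FHmC13bOoFu3sMxa31Eh55n; '
          'kg_dfid_collect=d41d8cd98f00b204e9800998ecf8427e')
cookies = cookie_header_to_dict(cookie)

# These two values also appear as mid= and dfid= in list_url
print(cookies['kg_mid'] == '52ee050d4b6d1ba7acb9a1a05a60c98d')
print(cookies['kg_dfid'] == '4FHmC13bOoFu3sMxa31Eh55n')
```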

posted @ 2023-05-18 10:00 慧眼识辰
