Bilibili short-review scraper, optimized version (original code from Zhihu)

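I'm a complete beginner, and I spent about a day optimizing this code (it didn't really need a whole day; most of the time went into looking things up and debugging).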

Published on cnblogs (博客园):

import json
import random
import time
import tkinter as tk
from tkinter import filedialog

import requests

root = tk.Tk()
root.withdraw()  # hide the empty main Tk window; only the file dialog is needed
FilePath = filedialog.askopenfilename()  # pick the output file in a dialog, which matches what most users expect
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"}  # send a browser User-Agent to get past basic anti-scraping checks
url = 'https://api.bilibili.com/pgc/review/short/list?media_id=28223053&ps=20&sort=1'  # media_id is the anime's ID; sort sets the ordering (0 = default, 1 = newest)
# Send the GET request for the first page
w = requests.get(url, headers=headers).text
json_comment = json.loads(w)
total = json_comment['data']['list']  # the reviews on this page
num = json_comment['data']['total']  # the total count reported by the API (a number of reviews, not of pages)
s = json_comment['data']  # the whole data object of the response
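j = 0  # number of reviews processed so far
# Open the output file once; 'a' appends to the end, whereas 'w' would overwrite existing content
with open(FilePath, 'a', encoding='utf-8') as f:
    while j < num:
        total = json_comment['data']['list']  # reviews on the current page
        if not total:
            break  # empty page: nothing more to fetch
        for item in total:
            comment = item['content']  # the review text
            print(comment)
            f.write(comment + '\n')  # write every review, not only the last one on each page
            j += 1
        next_cursor = json_comment['data']['next']  # cursor pointing at the next page
        if not next_cursor:
            break  # no usable cursor: treat this as the last page
        url1 = url + '&cursor=' + str(next_cursor)
        time.sleep(random.choice([0.3, 0.5]))  # random delay of 0.3 s or 0.5 s; adjust it, or delete this line for no delay
        response = requests.get(url1, headers=headers).text
        json_comment = json.loads(response)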
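
The cursor paging used above can also be pulled out into a small generator, which makes the "follow `next` until the list runs out" logic easier to test without touching the network. This is only a sketch under a few assumptions of mine: the names `iter_short_reviews`, `fetch_page`, `fetch_page_http`, `API`, `HEADERS` and the `max_pages` cap are all hypothetical, cursor `0` is used here to mean "first page, send no cursor", and a missing or zero `next` is treated as the end of the data.

```python
from typing import Callable, Iterator

API = "https://api.bilibili.com/pgc/review/short/list"  # same endpoint as the script above
HEADERS = {"User-Agent": "Mozilla/5.0"}  # shortened placeholder; use a full browser UA in practice


def iter_short_reviews(fetch_page: Callable[[int], dict], max_pages: int = 1000) -> Iterator[str]:
    """Yield review texts page by page, following the `next` cursor.

    `fetch_page(cursor)` is a hypothetical helper returning the decoded JSON
    of one page; cursor 0 means "first page" in this sketch.
    """
    cursor = 0
    for _ in range(max_pages):  # hard cap so a misbehaving cursor cannot loop forever
        data = fetch_page(cursor).get("data") or {}
        reviews = data.get("list") or []
        if not reviews:
            break  # an empty page means there is nothing left to read
        for review in reviews:
            yield review["content"]
        next_cursor = data.get("next")
        if not next_cursor or next_cursor == cursor:
            break  # no further cursor, or the cursor did not advance
        cursor = next_cursor


def fetch_page_http(cursor: int) -> dict:
    """One possible fetch_page, using the same query parameters as the script above."""
    import requests  # imported lazily so the generator itself can be tested offline

    params = {"media_id": 28223053, "ps": 20, "sort": 1}
    if cursor:
        params["cursor"] = cursor  # only sent from the second page onwards
    resp = requests.get(API, params=params, headers=HEADERS, timeout=10)
    resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return resp.json()
```

To use it, loop over `iter_short_reviews(fetch_page_http)` and write each yielded string to the file; adding the `time.sleep(...)` delay inside `fetch_page_http` keeps the same pacing as the original script.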

posted @ 2021-05-02 13:06 cc1236
