Live Classroom (课堂直播) Videos: Scraper, Part 1

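Recently my niece has been struggling with math, so I bought her a tutoring book online called 课堂直播 (Live Classroom). Each worked example has a code you can scan with a phone to watch a teacher explain it. That gets tedious fast, so I wanted to see whether the videos could be downloaded automatically. I opened one up and... Flash.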

 

Oh well, it has to work one way or another, so let's just dive in. First, find the top-level URL:

http://ktzb.feelmoore.com/LiveClassroom/nodes/getRootNode?isbn=9787552258608&nodeType=5
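That URL is just an endpoint plus two query parameters: `isbn`, the book's ISBN, and `nodeType`. I haven't found any documentation for what `nodeType=5` means; it is simply the value the page sends, so treat it as an assumption. A minimal stdlib sketch that rebuilds the URL (the helper name `build_root_url` is hypothetical):

```python
from urllib.parse import urlencode

# Endpoint as seen in the browser's network panel
BASE = "http://ktzb.feelmoore.com/LiveClassroom/nodes/getRootNode"


def build_root_url(isbn: str, node_type: int = 5) -> str:
    """Hypothetical helper: rebuild the root-node URL from its two query parameters."""
    return BASE + "?" + urlencode({"isbn": isbn, "nodeType": node_type})


print(build_root_url("9787552258608"))
# http://ktzb.feelmoore.com/LiveClassroom/nodes/getRootNode?isbn=9787552258608&nodeType=5
```

`requests.get(url, params={...})` performs the same encoding, so either form produces the same request.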

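One step at a time: get all the data first. A quick look at the Network panel in DevTools (F12) turns up the request that returns it.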

Enough talk, here's the code.
import requests


def get_parent_url(isbn, node_type):
    """Fetch the book's root nodes (chapters) and return [{'name': ..., 'id': ...}, ...]."""
    # Browser-like headers copied from DevTools
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36",
        "Host": "ktzb.feelmoore.com",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Connection": "keep-alive",
        "Referer": "http://ktzb.feelmoore.com/LivingClassroom/index.html?isbn=" + str(isbn),
        "X-Requested-With": "XMLHttpRequest",
    }
    try:
        resp = requests.get(
            "http://ktzb.feelmoore.com/LiveClassroom/nodes/getRootNode",
            params={"isbn": isbn, "nodeType": node_type},
            headers=headers,
            timeout=10,
        )
        resp.raise_for_status()
        data = resp.json()  # parse the body; .text is a str and would be looped over char by char
    except (requests.RequestException, ValueError) as exc:
        print("failed:", exc)
        return []
    # Assumes the body is a JSON array of nodes that each have "title" and "id"
    return [{"name": element["title"], "id": element["id"]} for element in data]


root_nodes = get_parent_url("9787552258608", "5")
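A note on parsing: the response body has to be decoded as JSON before looping over it. A raw string (what `.text` returns) is iterated character by character, so `element['title']` raises a `TypeError`, and a bare `except` silently turns that into a "failed" message. The sketch below uses a made-up payload (hypothetical; I'm assuming the endpoint returns a JSON array of objects with `title` and `id`, which is what the loop expects) to show the difference with the stdlib `json` module:

```python
import json

# Hypothetical response body; the real payload's shape is an assumption
raw = '[{"id": 101, "title": "Chapter 1"}, {"id": 102, "title": "Chapter 2"}]'

# Looping over the raw text yields characters, not node objects
print(list(raw)[:3])  # ['[', '{', '"']


def extract_nodes(body: str) -> list:
    """Decode the JSON text, then keep each node's title (as "name") and id."""
    return [{"name": node["title"], "id": node["id"]} for node in json.loads(body)]


print(extract_nodes(raw))
```

With `requests`, `resp.json()` does the same decoding as `json.loads(resp.text)`.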

Open the next page and hit F12 again. Wow, the data is served up directly, so there's nothing to actually scrape.

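import json
import os

import requests


def get_children_url(isbn, parent_id):
    """Fetch the child nodes of one chapter and save them to data/<parent_id>.json."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36",
        "Host": "ktzb.feelmoore.com",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Connection": "keep-alive",
        # Not crawling live, so I just copied the browser's session cookie; it only works until it expires
        "Cookie": "PHPSESSID=ctjqa2ka7s5cie0crq023segs4",
        "Referer": "http://ktzb.feelmoore.com/LivingClassroom/index.html?isbn=" + str(isbn),
        "X-Requested-With": "XMLHttpRequest",
    }
    try:
        resp = requests.get(
            "http://ktzb.feelmoore.com/LiveClassroom/nodes/getChildrenNode",
            params={"parentId": parent_id},
            headers=headers,
            timeout=10,
        )
        resp.raise_for_status()
        data = {"data": resp.json()}  # store parsed JSON rather than one escaped string
    except (requests.RequestException, ValueError) as exc:
        print(str(parent_id) + " failed:", exc)  # str() avoids a TypeError for numeric ids
        return None
    os.makedirs("data", exist_ok=True)  # open() fails if the folder doesn't exist
    with open(os.path.join("data", str(parent_id) + ".json"), "w", encoding="utf8") as dump_f:
        json.dump(data, dump_f, ensure_ascii=False, indent=2)
    return data

# for element in get_parent_url("9787552258608", "5"):
#     get_children_url("9787552258608", element["id"])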
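Putting the two steps together means taking every root node, fetching its children, and saving each response to `data/<parentId>.json`. The sketch below is a hypothetical restructuring, not the original code: it takes the fetch step as a parameter (`fetch_children`, an assumed callable from a parent id to parsed JSON) so the loop can be exercised offline with a stub, and it skips a failing chapter instead of stopping.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Iterable


def crawl_children(
    root_nodes: Iterable[dict],
    fetch_children: Callable[[Any], Any],
    out_dir: str = "data",
) -> list:
    """Fetch each root node's children and dump them to <out_dir>/<id>.json.

    fetch_children is a hypothetical callable (parentId -> parsed JSON);
    a failure for one node is reported and skipped instead of stopping the loop.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    written = []
    for node in root_nodes:
        parent_id = node["id"]
        try:
            children = fetch_children(parent_id)
        except Exception as exc:  # one bad chapter shouldn't abort the rest
            print(f"{parent_id} failed: {exc}")
            continue
        path = out / f"{parent_id}.json"
        with path.open("w", encoding="utf8") as f:
            json.dump({"data": children}, f, ensure_ascii=False, indent=2)
        written.append(path)
    return written


# Offline demo: a stub fetcher and a throwaway directory instead of the real site
demo_dir = tempfile.mkdtemp()
demo_files = crawl_children(
    [{"name": "Chapter 1", "id": 101}],
    lambda pid: [{"id": pid * 10, "title": "Example 1"}],
    demo_dir,
)
print(demo_files[0].read_text(encoding="utf8"))
```

For a real run, `fetch_children` could wrap the getChildrenNode request, e.g. `lambda pid: requests.get(url, params={"parentId": pid}, headers=headers, timeout=10).json()`, with `url` and `headers` set up as in `get_children_url`.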

The final result:

I got smug at this point, thinking "how hard could it be?", and then reality slapped me in the face. Next time: how exactly I got humbled.

 

posted @ 2020-08-09 17:53  boblee12