JS reverse-engineering challenges, 1-5

####

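Challenge 1

First locate the data API. Replaying the captured request for page 1 works, but replaying the requests for page 2 onward fails.

Comparing a working request with a failing one, the working request carries a safe parameter in its headers, and its value changes on every request.

Since it changes every time, it most likely depends on a random number or on the time. The probable flow: JS computes safe, puts it in the request headers, and the backend validates that field.

So the task is to find out how the JS generates safe: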

var a = '9622';
var timestamp = String(Date.parse(new Date()) / 1000);
var tokens = hex_md5(window.btoa(a + timestamp));
request.setRequestHeader("safe", tokens);
request.setRequestHeader("timestamp", timestamp)

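So the logic behind safe is: take the timestamp in seconds (milliseconds / 1000), prepend the string '9622', Base64-encode the result with window.btoa, then take the MD5 hash of that.

Note 1: the timestamp can be found in the request headers, so don't naively generate it yourself with new Date().

Note 2: window.btoa corresponds to Base64 encoding in Python (base64.b64encode). It is an encoding, not encryption.

Note 3: the MD5 step corresponds to MD5 in Python (hashlib.md5). Strictly speaking it is a hash, not encryption.

Note 4: a site may ship a modified MD5 implementation. To tell whether it is standard MD5 or a modified one, use the debugger to capture the input right before hashing, hash it yourself, and check whether your result matches the site's.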
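Notes 2 to 4 can be tried offline with a short Python sketch. The helper names compute_safe and is_standard_md5 are illustrative and do not come from the site, and the sample timestamp is only an example value. The sketch assumes the site's hex_md5 is unmodified MD5, which is exactly what the second helper is meant to check against a digest captured in the debugger.

```python
import base64
import hashlib


def compute_safe(timestamp: str, prefix: str = "9622") -> str:
    """Mirror hex_md5(window.btoa(a + timestamp)) from the page's JS."""
    plaintext = (prefix + timestamp).encode("ascii")
    b64 = base64.b64encode(plaintext)    # same output as window.btoa for ASCII input
    return hashlib.md5(b64).hexdigest()  # lowercase hex digest


def is_standard_md5(md5_input: str, observed_hex: str) -> bool:
    """Compare a digest captured in the debugger with Python's standard MD5."""
    expected = hashlib.md5(md5_input.encode("utf-8")).hexdigest()
    return expected == observed_hex.lower()


if __name__ == "__main__":
    print(compute_safe("1629274784"))
    # "abc" is an RFC 1321 test vector for unmodified MD5.
    print(is_standard_md5("abc", "900150983cd24fb0d6963f7d28e17f72"))
```

If is_standard_md5 returns False for an input/output pair captured from the page, the site's MD5 has been altered and the JS implementation has to be extracted instead of using hashlib.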

The Python code:

 

import requests
import urllib3
import base64
import hashlib
import time
import re
import json

urllib3.disable_warnings()

url = "https://www.python-spider.com/cityjson"

cookies = "vaptchaNetway=cn; Hm_lvt_337e99a01a907a08d00bed4a1a52e35d=1628248083,1629106799;" \
          " sessionid=a7ckvdtsz5p6i1udfggnkn5tk6je3dgr; _i=MTYyOTI2NDQ3M35ZV2xrYVc1blgzZHBiakUyTWpreU5qUTBOek16TXpR" \
          "PXw1MmRkNzJhMDk4NDNkNGRmNz$wNDM1Zj$xYjhiOTBlYQ; " \
          "_v=TVRZeU9USTJORFEzTTM1WlYyeHJZVmMxYmxnelpIQmlha1V5VFdwcmVVNX" \
          "FVVEJPZWsxNlRYcFJQWHcxTW1Sa056SmhNRGs0TkROa05HUm1OeiR3TkRNMVpqJHhZamhpT1R$bFlR; " \
          "sign=1629264618748~ca1c4ad08c0e246bfc23632a09b1ef64; Hm_lpvt_337e99a01a907a08d00bed4a1a52e35d=1629264744"

cookies_dict = {cookie.split("=")[0].strip(): cookie.split("=")[1].strip() for cookie in cookies.split(";")}

count_sum = 0

for i in range(1, 86):

    res = requests.get(url, verify=False, cookies=cookies_dict)
    # s = 'var returnCitySN = {"cip": "123.112.20.12", "timestamp": "1629274784"};'
    print(res.text)
    # Raw string so that \d is passed to the regex engine unchanged.
    timestamp = re.findall(r'"(\d{10})"', res.text)[0]
    # print(timestamp)

    safe_s = "9622" + timestamp
    safe_b64 = base64.b64encode(safe_s.encode())
    safe_md5 = hashlib.md5(safe_b64).hexdigest()

    print(safe_md5)
#
    headers = {
        "Host": "www.python-spider.com",
        "Connection": "keep-alive",
        "Content-Length": "0",
        "timestamp": timestamp,
        "safe": safe_md5,
        "Origin": "https://www.python-spider.com",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
        "Accept": "*/*",
        "Referer": "https://www.python-spider.com/challenge/1",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9",
    }
#
    url2 = "https://www.python-spider.com/challenge/api/json?page={}&count=14".format(i)
    print(url2)
    res2 = requests.get(url2, verify=False, headers=headers, cookies=cookies_dict)
    # res2 = res2.content.decode("utf-8")
    # res2 = json.loads(res2)
    # print(res2)
    # print(type(res2))

    print(res2.json()["infos"])

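    for item in res2.json()["infos"]:
        # NOTE: the search string is empty here, and "" is contained in every
        # string, so this condition is true for every message.
        if "" in item["message"]:
            print(item["message"])
            count_sum += 1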

print("count_sum =",count_sum)

 

####

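Challenge 2

The goal is to find how the encrypted sign string in the cookie is generated.

Step 1: hook document.cookie to see where the code that sets this cookie is entered.

Breakpoint debugging is extremely powerful for this.

The hook pauses execution when the cookie is written. From there, walk up the call stack to find the function entry point.

How to write the cookie hook: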

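// Keep a copy of the current cookie string before overriding the property.
document.cookie_bak = document.cookie;

Object.defineProperty(document, "cookie", {
    set: function (value) {
        debugger;  // pauses whenever a script writes document.cookie
        return value;
    }
});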

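If the console returns document at this point, the hook has been installed successfully.

Then debug: walk up the call stack to see where the setter was called from.

This approach has one limitation: the hook can only be installed once per page load. Object.defineProperty creates a non-configurable property by default, so running the snippet a second time throws an error. Refresh the page and run it again.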

 

 

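Step 2: extract the code

Note 1: patch in whatever is missing, depth first.

Note 2: debugging in PyCharm requires the Node.js plugin.

Note 3: window does not exist in Node.js, so use window = this; instead.

Note 4: supply btoa with window.btoa = require("btoa");

Step 3: wrap everything in a function that Python can call.

First install the package: pip install PyExecJS -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com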

 

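Step 2 in more detail: patch whatever is missing, depth first. Find every variable and function the entry point needs and fill them in, tracing each piece of code back to where it starts.

Collect all the variables and functions needed to execute that one statement.

For example, a variable you find may itself use other variables and functions, and those have to be extracted too.

Likewise, a function you find has to be patched from the inside, because its body may use further variables and functions that also need to be extracted.

Tip 1: run the code in PyCharm to see what is missing. Anything missing raises an error, such as a variable being undefined, which makes it easy to fill in.

Tip 2: if the code uses btoa, use the Node.js btoa, i.e. assign the Node.js btoa to window.btoa.

Tip 3: if the code uses MD5 or something similar, copy the whole library file and place it above our code so it works like a package.

Tip 4: once the code runs without errors, extraction is finished. Print the value you want, such as the cookie, and you should get a result.

The JS code with the environment patched in: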

function SDK_1() {
    window = this

    function md5_ii(a, b, c, d, x, s, t) {
        return md5_cmn(c ^ (b | (~d)), a, b, x, s, t);
    }

    function md5_hh(a, b, c, d, x, s, t) {
        return md5_cmn(b ^ c ^ d, a, b, x, s, t);
    }

    function md5_gg(a, b, c, d, x, s, t) {
        return md5_cmn((b & d) | (c & (~d)), a, b, x, s, t);
    }

    function bit_rol(num, cnt) {
        return (num << cnt) | (num >>> (32 - cnt));
    }

    function safe_add(x, y) {
        var lsw = (x & 0xFFFF) + (y & 0xFFFF);
        var msw = (x >> 16) + (y >> 16) + (lsw >> 16);
        return (msw << 16) | (lsw & 0xFFFF);
    }

    function md5_ff(a, b, c, d, x, s, t) {
        return md5_cmn((b & c) | ((~b) & d), a, b, x, s, t);
    }

    function md5_cmn(q, a, b, x, s, t) {
        return safe_add(bit_rol(safe_add(safe_add(a, q), safe_add(x, t)), s), b);
    }

    function hex_md5(s) {
        return binl2hex(core_md5(str2binl(s), s.length * chrsz));
    }

    var chrsz = 8
    var b64pad = ""

    function str2binl(str) {
        var bin = Array();
        var mask = (1 << chrsz) - 1;
        for (var i = 0; i < str.length * chrsz; i += chrsz)
            bin[i >> 5] |= (str.charCodeAt(i / chrsz) & mask) << (i % 32);
        return bin;
    }

    function core_md5(x, len) {
        /* append padding */
        x[len >> 5] |= 0x80 << ((len) % 32);
        x[(((len + 64) >>> 9) << 4) + 14] = len;

        var a = 1732584193;
        var b = -271733879;
        var c = -1732584194;
        var d = 271733878;

        for (var i = 0; i < x.length; i += 16) {
            var olda = a;
            var oldb = b;
            var oldc = c;
            var oldd = d;

            a = md5_ff(a, b, c, d, x[i + 0], 7, -680876936);
            d = md5_ff(d, a, b, c, x[i + 1], 12, -389564586);
            c = md5_ff(c, d, a, b, x[i + 2], 17, 606105819);
            b = md5_ff(b, c, d, a, x[i + 3], 22, -1044525330);
            a = md5_ff(a, b, c, d, x[i + 4], 7, -176418897);
            d = md5_ff(d, a, b, c, x[i + 5], 12, 1200080426);
            c = md5_ff(c, d, a, b, x[i + 6], 17, -1473231341);
            b = md5_ff(b, c, d, a, x[i + 7], 22, -45705983);
            a = md5_ff(a, b, c, d, x[i + 8], 7, 1770035416);
            d = md5_ff(d, a, b, c, x[i + 9], 12, -1958414417);
            c = md5_ff(c, d, a, b, x[i + 10], 17, -42063);
            b = md5_ff(b, c, d, a, x[i + 11], 22, -1990404162);
            a = md5_ff(a, b, c, d, x[i + 12], 7, 1804603682);
            d = md5_ff(d, a, b, c, x[i + 13], 12, -40341101);
            c = md5_ff(c, d, a, b, x[i + 14], 17, -1502002290);
            b = md5_ff(b, c, d, a, x[i + 15], 22, 1236535329);

            a = md5_gg(a, b, c, d, x[i + 1], 5, -165796510);
            d = md5_gg(d, a, b, c, x[i + 6], 9, -1069501632);
            c = md5_gg(c, d, a, b, x[i + 11], 14, 643717713);
            b = md5_gg(b, c, d, a, x[i + 0], 20, -373897302);
            a = md5_gg(a, b, c, d, x[i + 5], 5, -701558691);
            d = md5_gg(d, a, b, c, x[i + 10], 9, 38016083);
            c = md5_gg(c, d, a, b, x[i + 15], 14, -660478335);
            b = md5_gg(b, c, d, a, x[i + 4], 20, -405537848);
            a = md5_gg(a, b, c, d, x[i + 9], 5, 568446438);
            d = md5_gg(d, a, b, c, x[i + 14], 9, -1019803690);
            c = md5_gg(c, d, a, b, x[i + 3], 14, -187363961);
            b = md5_gg(b, c, d, a, x[i + 8], 20, 1163531501);
            a = md5_gg(a, b, c, d, x[i + 13], 5, -1444681467);
            d = md5_gg(d, a, b, c, x[i + 2], 9, -51403784);
            c = md5_gg(c, d, a, b, x[i + 7], 14, 1735328473);
            b = md5_gg(b, c, d, a, x[i + 12], 20, -1926607734);

            a = md5_hh(a, b, c, d, x[i + 5], 4, -378558);
            d = md5_hh(d, a, b, c, x[i + 8], 11, -2022574463);
            c = md5_hh(c, d, a, b, x[i + 11], 16, 1839030562);
            b = md5_hh(b, c, d, a, x[i + 14], 23, -35309556);
            a = md5_hh(a, b, c, d, x[i + 1], 4, -1530992060);
            d = md5_hh(d, a, b, c, x[i + 4], 11, 1272893353);
            c = md5_hh(c, d, a, b, x[i + 7], 16, -155497632);
            b = md5_hh(b, c, d, a, x[i + 10], 23, -1094730640);
            a = md5_hh(a, b, c, d, x[i + 13], 4, 681279174);
            d = md5_hh(d, a, b, c, x[i + 0], 11, -358537222);
            c = md5_hh(c, d, a, b, x[i + 3], 16, -722521979);
            b = md5_hh(b, c, d, a, x[i + 6], 23, 76029189);
            a = md5_hh(a, b, c, d, x[i + 9], 4, -640364487);
            d = md5_hh(d, a, b, c, x[i + 12], 11, -421815835);
            c = md5_hh(c, d, a, b, x[i + 15], 16, 530742520);
            b = md5_hh(b, c, d, a, x[i + 2], 23, -995338651);

            a = md5_ii(a, b, c, d, x[i + 0], 6, -198630844);
            d = md5_ii(d, a, b, c, x[i + 7], 10, 1126891415);
            c = md5_ii(c, d, a, b, x[i + 14], 15, -1416354905);
            b = md5_ii(b, c, d, a, x[i + 5], 21, -57434055);
            a = md5_ii(a, b, c, d, x[i + 12], 6, 1700485571);
            d = md5_ii(d, a, b, c, x[i + 3], 10, -1894986606);
            c = md5_ii(c, d, a, b, x[i + 10], 15, -1051523);
            b = md5_ii(b, c, d, a, x[i + 1], 21, -2054922799);
            a = md5_ii(a, b, c, d, x[i + 8], 6, 1873313359);
            d = md5_ii(d, a, b, c, x[i + 15], 10, -30611744);
            c = md5_ii(c, d, a, b, x[i + 6], 15, -1560198380);
            b = md5_ii(b, c, d, a, x[i + 13], 21, 1309151649);
            a = md5_ii(a, b, c, d, x[i + 4], 6, -145523070);
            d = md5_ii(d, a, b, c, x[i + 11], 10, -1120210379);
            c = md5_ii(c, d, a, b, x[i + 2], 15, 718787259);
            b = md5_ii(b, c, d, a, x[i + 9], 21, -343485551);

            a = safe_add(a, olda);
            b = safe_add(b, oldb);
            c = safe_add(c, oldc);
            d = safe_add(d, oldd);
        }
        return Array(a, b, c, d);

    }

    var hexcase = 0

    function binl2hex(binarray) {
        var hex_tab = hexcase ? "0123456789ABCDEF" : "0123456789abcdef";
        var str = "";
        for (var i = 0; i < binarray.length * 4; i++) {
            str += hex_tab.charAt((binarray[i >> 2] >> ((i % 4) * 8 + 4)) & 0xF) +
                hex_tab.charAt((binarray[i >> 2] >> ((i % 4) * 8)) & 0xF);
        }
        return str;
    }

    var _$oa = [
        "WFpLV0k=",
        "Y29pRlM=",
        "YXpEbnE=",
        "OyBwYXRoPS8=",
        "RER6V2o=",
        "cGZkekg=",
        "Z2dlcg==",
        "WEpaVEs=",
        "aW5pdA==",
        "VXdNUUw=",
        "bVVvd0U=",
        "amtsS3A=",
        "Y2hhaW4=",
        "TEFDT0Y=",
        "cm91bmQ=",
        "SGRETEU=",
        "VGpsR04=",
        "TUtHaFk=",
        "TlNsalk=",
        "S2h5YUc=",
        "ZGVidQ==",
        "d25MZ3A=",
        "bHFvT0M=",
        "c2lnbj0=",
        "V3pZd3A=",
        "Y1JFV3Q=",
        "dXdQYUs=",
        "T1RFR2M=",
        "T1hMZ04=",
        "TndnQlc=",
        "SHNRVGQ=",
        "dXRmc3o=",
        "Y291bnRlcg==",
        "UHVLTlI=",
        "R29IeVM=",
        "TU9QeWY=",
        "bG9n",
        "d01oYVU=",
        "aUh5RWQ=",
        "cmVsb2Fk",
        "a1lucGw=",
        "bG92WVk=",
        "Uk1CdVo=",
        "bmdtb3k=",
        "TWhZd2g=",
        "dGVzdA==",
        "b1pjVXI=",
        "WU54dEQ=",
        "aGxoVEE=",
        "cXNSZnY=",
        "XCtcKyAqKD86W2EtekEtWl8kXVswLTlhLXpBLVpfJF0qKQ==",
        "bVJZSWc=",
        "ZnVuY3Rpb24gKlwoICpcKQ==",
        "dVZ3emc=",
        "T0VIZHo=",
        "c3RhdGVPYmplY3Q=",
        "Y2JyRFU=",
        "bGVuZ3Ro",
        "dGJ1elA=",
        "a1p6dXQ=",
        "YXBwbHk=",
        "aW5wdXQ=",
        "S05zbWI=",
        "TEFkVmE=",
        "ZGhvTUg=",
        "Q21BbUQ=",
        "SmlmQ0o=",
        "c3RyaW5n",
        "YWN0aW9u",
        "U05nV3E=",
        "Y29va2ll",
        "Y29uc3RydWN0b3I=",
        "SXlMaWE=",
        "d2hpbGUgKHRydWUpIHt9",
        "aktGdkU=",
        "dXpiVXg=",
        "YUlLVnk=",
        "5q2k572R6aG15Y+X44CQ54ix6ZSt5LqR55u+IFYxLjAg5Yqo5oCB54mI44CR5L+d5oqk",
        "amxnWlU=",
        "SFF6RmY=",
        "U0FYVGc=",
        "RGR2Wnk=",
        "dmFsdWVPZg==",
        "VmNoR2U=",
        "ckdSaEc="
    ]

    var _$ob = function (a, b) {
        a = a - 0x0;
        var c = _$oa[a];
        if (_$ob['fVeoOz'] === undefined) {
            (function () {
                var f;
                try {
                    var h = Function('return\x20(function()\x20' + '{}.constructor(\x22return\x20this\x22)(\x20)' + ');');
                    f = h();
                } catch (i) {
                    f = window;
                }
                var g = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';
                f['atob'] || (f['atob'] = function (j) {
                        var k = String(j)['replace'](/=+$/, '');
                        var l = '';
                        for (var m = 0x0, n, o, p = 0x0; o = k['charAt'](p++); ~o && (n = m % 0x4 ? n * 0x40 + o : o,
                        m++ % 0x4) ? l += String['fromCharCode'](0xff & n >> (-0x2 * m & 0x6)) : 0x0) {
                            o = g['indexOf'](o);
                        }
                        return l;
                    }
                );
            }());
            _$ob['rYGugk'] = function (e) {
                var f = atob(e);
                var g = [];
                for (var h = 0x0, j = f['length']; h < j; h++) {
                    g += '%' + ('00' + f['charCodeAt'](h)['toString'](0x10))['slice'](-0x2);
                }
                return decodeURIComponent(g);
            }
            ;
            _$ob['okJzdh'] = {};
            _$ob['fVeoOz'] = !![];
        }
        var d = _$ob['okJzdh'][a];
        if (d === undefined) {
            c = _$ob['rYGugk'](c);
            _$ob['okJzdh'][a] = c;
        } else {
            c = d;
        }
        return c;
    };
    var a = {
        'uzbUx': function (d, e) {
            return d + e;
        },
        'yTrWo': _$ob('0x14'),
        'RZoQG': _$ob('0x6'),
        'HQzFf': _$ob('0x37'),
        'IJeEt': function (d, e) {
            return d !== e;
        },
        'mRYIg': _$ob('0x4e'),
        'dhoMH': _$ob('0x34'),
        'oZcUr': _$ob('0x32'),
        'pfdzH': function (d, e) {
            return d(e);
        },
        'PuKNR': _$ob('0x8'),
        'hfxlo': _$ob('0xc'),
        'DdvZy': function (d, e) {
            return d + e;
        },
        'baKIo': _$ob('0x3d'),
        'mUowE': function (d, e) {
            return d !== e;
        },
        'YNxtD': 'RcOux',
        'FFiEx': function (d) {
            return d();
        },
        'NwgBW': 'while\x20(true)\x20{}',
        'kZzut': 'counter',
        'QHHVn': function (d, e, f) {
            return d(e, f);
        },
        'xvdvK': _$ob('0x4d'),
        'jKFvE': 'aiding_win',
        'JifCJ': function (d, e) {
            return d(e);
        },
        'MhYwh': function (d, e) {
            return d(e);
        },
        'aIKVy': function (d, e) {
            return d + e;
        },
        'azDnq': function (d, e) {
            return d(e);
        },
        'WzYwp': function (d, e) {
            return d / e;
        },
        'ngmoy': function (d, e) {
            return d + e;
        },
        'ASPPX': function (d, e) {
            return d + e;
        },
        'DDzWj': _$ob('0x17'),
        'knFPT': function (d, e) {
            return d / e;
        },
        'ZIATq': _$ob('0x3')
    };

    var c = new Date()[_$ob('0x52')]();
    // var c = '1587102734000';
    // console.log(c)

    window.btoa = require('btoa')
    var token = window['btoa'](a[_$ob('0x51')](a[_$ob('0x4a')], a[_$ob('0x42')](String, c)));


    var md = a[_$ob('0x2c')](hex_md5, window['btoa'](a['aIKVy'](a[_$ob('0x4a')], a[_$ob('0x2')](String, Math[_$ob('0xe')](a[_$ob('0x18')](c, 0x3e8))))));

    var cookie = a[_$ob('0x4c')](a[_$ob('0x4c')](a[_$ob('0x4c')](a[_$ob('0x2b')](a[_$ob('0x2b')](a['ASPPX'](a[_$ob('0x4')], Math[_$ob('0xe')](a['knFPT'](c, 0x3e8))), '~'), token), '|'), md), a['ZIATq']);
    // console.log(cookie)
    return cookie
}

console.log(SDK_1())
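
Decoding the string table by hand (for example, index 0x17 is "sign=", 0x3 is "; path=/", 0xe is "round", and a['jKFvE'] is 'aiding_win'), the last three statements of SDK_1 appear to reduce to a few lines. The following Python sketch reproduces that reading. It assumes hex_md5 is unmodified MD5 and that the page still uses this logic; build_sign_cookie is an illustrative name, not something from the site.

```python
import base64
import hashlib


def build_sign_cookie(c_ms: int, prefix: str = "aiding_win") -> str:
    """Rebuild the cookie string that SDK_1() returns for timestamp c_ms (ms)."""
    # Math.round(c / 1000) for a non-negative integer c
    seconds = (c_ms + 500) // 1000
    # token = btoa('aiding_win' + String(c))
    token = base64.b64encode(f"{prefix}{c_ms}".encode("ascii")).decode("ascii")
    # md = hex_md5(btoa('aiding_win' + String(Math.round(c / 1000))))
    md_input = base64.b64encode(f"{prefix}{seconds}".encode("ascii"))
    md = hashlib.md5(md_input).hexdigest()
    # cookie = 'sign=' + seconds + '~' + token + '|' + md + '; path=/'
    return f"sign={seconds}~{token}|{md}; path=/"


if __name__ == "__main__":
    # Timestamp taken from the commented-out line in SDK_1.
    print(build_sign_cookie(1587102734000))
```

Comparing this output with what SDK_1() prints for the same fixed timestamp is a quick way to confirm whether the extraction and the decoding agree.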

 

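Step 3: once patching is finished, wrap the code so Python can call it. The execjs library compiles and runs the JS script and returns the value we want, and then we can move on to the next step.

First save the extracted code as a file, with the function returning the value you want at the end.

Python code: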

import requests
import execjs

# Read the extracted JS; only the file read needs to stay inside the with block.
with open('sdk.js', 'r', encoding='utf-8') as f:
    js_text = f.read()

# print(js_text)
ctx = execjs.compile(js_text)
# SDK_1() returns "sign=<value>; path=/"; keep only <value>.
cookie = ctx.call("SDK_1").split(';')[0].replace('sign=', '')
print(cookie)

cookies = {
    # 'Hm_lvt_337e99a01a907a08d00bed4a1a52e35d': '1615456972',
    # 'no-alert': 'true',
    'sessionid': '47wd3fm32bz79kezyq1t7dqqzdoahq0v',
    # 'Hm_lpvt_337e99a01a907a08d00bed4a1a52e35d': '1615513367',
    'sign': cookie
}

headers = {
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Referer': 'http://www.python-spider.com/challenge/2',
    'Accept-Language': 'zh-CN,zh;q=0.9',
}

response = requests.get('http://www.python-spider.com/challenge/2', headers=headers, cookies=cookies, verify=False)
print(response.text)

 

The Python code:

import requests
import urllib3
import execjs

urllib3.disable_warnings()

url = "http://www.python-spider.com/challenge/2"

with open("./2.js", "r") as f:
    js_text = f.read()
    # print(js_text)
    js = execjs.compile(js_text)
    cookie = js.call("SDK_2").split(";")[0].replace("sign=","")

    print(cookie)

cookies = {
    "sessionid": "xm64ecbvpwv036ycfnw07vg6oyqpluxi",
    "sign": cookie,
}

session = requests.session()
res = session.get(url, verify=False, cookies=cookies)

print(res.content.decode())
 

#####

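Challenge 4

IP banning.

Each page can only be fetched by one IP, that is, a fresh IP is needed for every page, so the IP ban has to be worked around.

I solved this with a VPS dial-up server, redialing to get a new IP.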

 

import os
import time
import requests
import json
import redis


class Spider4:

    def __init__(self):
        self.conn = redis.Redis(host='127.0.0.1', port=6379)

    def net_control(self):
        print("Redialing")
        os.system("ifdown ppp0")
        # os.system("pppoe-status")
        time.sleep(6)
        os.system("ifup ppp0")
        # os.system("pppoe-status")
        print("Redial complete")

    def get_url(self, url, page):
        headers = {
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                          '(KHTML, like Gecko) Ubuntu Chromium/60.0.3112.113 Chrome/60.0.3112.113 Safari/537.36',
        }

        data = {
            "page": page
        }

        resp = requests.post(url, headers=headers, data=data, timeout=5, verify=False)

        count = 0
        while True:
            count += 1
            if count > 3:
                print("Retried more than 3 times")
                break

            # "爱锭云盾" is the text shown on the ban page.
            if "爱锭云盾" in resp.text:
                print("IP banned, retrying")
                self.net_control()
                resp = requests.post(url, headers=headers, data=data, timeout=5, verify=False)
            else:
                print("Passed on attempt {}, no retry needed".format(count))
                break


        return resp.url, resp.status_code, resp.text

        # print(resp.text)

    def process_data(self, text, page):
        # print(type(json.loads(text,strict=False)))

        try:
            data_dict = json.loads(text, strict=False)
            # print(data_dict["data"])
        except Exception as e:
            # Show the unparseable body, then re-raise: without data_dict
            # there is nothing to sum.
            print(text)
            print(e)
            raise

        num_list = []

        for i in data_dict["data"]:
            num_list.append(int(i["value"]))
            # print(i["value"])

        self.conn.hset("allPageNum", page, sum(num_list))

        print("Sum of page {}:".format(page), sum(num_list))

        return sum(num_list)

    def start(self):
        url = "https://www.python-spider.com/api/challenge4"

        all_page_num_sum = []

        for page in range(87, 101):
            # Do not rebind url: resp.url could differ from the API URL after a redirect.
            _, status_code, text = self.get_url(url, page)

            page_num = self.process_data(text, page)
            all_page_num_sum.append(page_num)
            self.net_control()

        print("Total sum over all pages", sum(all_page_num_sum))


spider4 = Spider4()
spider4.start()
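
The retry-and-redial pattern used in get_url can also be factored into a small helper that is easy to test without a network or a PPPoE link. This is only a sketch: fetch_with_redial, fetch and redial are hypothetical names, and the callables stand in for the requests.post call and net_control above.

```python
from typing import Callable

BLOCK_MARKER = "爱锭云盾"  # text the spider above uses to detect the ban page


def fetch_with_redial(fetch: Callable[[], str],
                      redial: Callable[[], None],
                      max_retries: int = 3) -> str:
    """Call fetch(); while the ban page comes back, redial and fetch again."""
    text = fetch()
    for _ in range(max_retries):
        if BLOCK_MARKER not in text:
            return text
        redial()        # get a new IP
        text = fetch()  # retry with the new IP
    if BLOCK_MARKER in text:
        raise RuntimeError("still blocked after {} redials".format(max_retries))
    return text


if __name__ == "__main__":
    # Fake responses: blocked once, then a normal JSON body.
    fake = iter([BLOCK_MARKER, '{"data": []}'])
    print(fetch_with_redial(lambda: next(fake), lambda: print("redial")))
```

Raising once the retries are exhausted keeps a ban page from being passed on to the JSON parsing step.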

 

###

posted @ 2021-08-16 19:41  技术改变命运Andy