Scraping JD.com product detail page data with Python: the JD API endpoint and the h5st signature (2023.08.20)
I. Principle and Analysis
1. Target page
https://item.jd.com/6515029.html
Open the page in Chrome, press F12 to enter developer mode, and find the request to the product detail data endpoint, shown below:

2. Request URL:
https://api.m.jd.com/?appid=pc-item-soa&functionId=pc_detailpage_wareBusiness&client=pc&clientVersion=1.0.0&t=1692499380806&body=%7B%22skuId%22%3A6515029%2C%22cat%22%3A%221316%2C1381%2C1391%22%2C%22area%22%3A%2225_2258_0_0%22%2C%22shopId%22%3A%221000099941%22%2C%22venderId%22%3A1000099941%2C%22paramJson%22%3A%22%7B%5C%22platform2%5C%22%3A%5C%221%5C%22%2C%5C%22specialAttrStr%5C%22%3A%5C%22p0ppppppppppp1pppppppppppp%5C%22%2C%5C%22skuMarkStr%5C%22%3A%5C%2200%5C%22%7D%22%2C%22num%22%3A1%2C%22bbTraffic%22%3A%22%22%7D&h5st=20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3Btk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3Bed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3Bee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d6272cc6f198b41da73fbe26adfe0d7e3723450ed4c906efbd52e0671d7ab8bd9af7bfc208a38071126c8c70d775962c87b10b611b4f8489070e9d264c47c25dbd35aabe0addff39a3c732105c114056f93a71acfb90156d61b39e11217d5bf21c2e&x-api-eid-token=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX&loginType=3&uuid=122270672.16893052418291576334291.1689305242.1692440521.1692498368.14
3. Request headers:
:authority: api.m.jd.com
:method: GET
:path: (the path and query string of the URL in item 2 above, from /?appid= onward)
:scheme: https
Accept: application/json, text/javascript, */*; q=0.01
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Cookie: shshshfpa=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; shshshfpx=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; __jdc=122270672; __jdu=16893052418291576334291; mba_muid=16893052418291576334291; wlfstk_smdl=4qftb0r6lu47t0sx6ovvi37no1pu4y49; 3AB9D23F7A4B3C9B=GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNA; retina=0; appCode=msc588d6d5; webp=1; visitkey=8718662230147716920; sc_width=1536; wxa_level=1; cid=9; jxsid=16924405174098442434; __jdv=122270672%7Cdirect%7C-%7Cnone%7C-%7C1692440521537; equipmentId=GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNA; fingerprint=ba1afe80c24e71237978e1b005ec6a48; deviceVersion=115.0.0.0; deviceOS=; deviceOSVersion=; deviceName=Chrome; warehistory="10072773656365,10072773656365,10072773656365,10072773656365,"; autoOpenApp_downCloseDate_autoOpenApp_autoPromptly=1692441025259_1; __wga=1692441027033.1692440547180.1691914712301.1691914712301.4.2; PPRD_P=UUID.16893052418291576334291-LOGID.1692441027044.644926152; __jd_ref_cls=MProductdetail_CouponFloorExpo; jsavif=1; __jda=122270672.16893052418291576334291.1689305242.1692440521.1692498368.14; token=a4d78cd04f402b3f7ad6a29e8af8aa6f,2,940277; __tk=krazkYhsAcgzjrhtAuewjueDjufpArg5BVoz4zttAzG,2,940277; 3AB9D23F7A4B3CSS=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX; _gia_d=1; __jdb=122270672.2.16893052418291576334291|14.1692498368; shshshfpb=xbVnfPmoZnca-0u5O8YJzHQ; areaId=25; ipLoc-djd=25-2258-0-0
Origin: https://item.jd.com
Referer: https://item.jd.com/
Sec-Ch-Ua: "Not/A)Brand";v="99", "Google Chrome";v="115", "Chromium";v="115"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Windows"
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
X-Referer-Page: https://item.jd.com/6515029.html
X-Rp-Client: h5_1.0.0
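4. Data returned by the endpoint:
It includes the product image URLs, the price, the title, and other information, which is exactly what we need.
(The response is large, so only a small excerpt is shown.)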
{ "extendWarrantyInfo": { "descUrl": "https://baozhang.jd.com/static/serviceDesc", "detailUrl": "https://b.jr.jd.com/service/serveIntroduce/#/introduce3?mainSkuId={mainSkuId}&brandId={brandId}&thirdCategoryId={cid3}&bindSkuId={bindSku}", "serviceItems": [ {
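5. Parameter analysis
(1) The body parameter
Analysis shows that the body parameter in the URL carries the request details. It is URL-encoded; decoded, it reads:
{"skuId":6515029,"cat":"1316,1381,1391","area":"25_2258_0_0","shopId":"1000099941","venderId":1000099941,"paramJson":"{\"platform2\":\"1\",\"specialAttrStr\":\"p0ppppppppppp1pppppppppppp\",\"skuMarkStr\":\"00\"}","num":1,"bbTraffic":""}
"skuId":6515029 is the product ID and "shopId":"1000099941" is the shop ID. The other parameters depend on the browser and similar environment settings (for example, area matches the ipLoc-djd cookie value 25-2258-0-0) and can be kept fixed.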
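(2) The appid parameter
Indicates the endpoint category. Observed values:
appid=pc-item-soa: PC product detail data;
appid=item-v3: data version v3.
(3) The functionId parameter
Indicates what the endpoint does:
functionId=pc_detailpage_wareBusiness: PC product page details
functionId=pc_club_productCommentSummaries: PC review data
functionId=recDivinerApi: data related to the product page
functionId=pctradesoa_getprice: returns price information
The fields inside body differ depending on functionId.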
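Because the body fields vary by functionId, it helps to decode the body of any captured request before trying to reproduce it. The following is a minimal sketch that decodes the body value from the URL above; the helper name decode_body is illustrative and not part of JD's code.

```python
import json
from urllib.parse import unquote

# Value of the body query parameter, copied from the captured request URL.
ENCODED_BODY = (
    "%7B%22skuId%22%3A6515029%2C%22cat%22%3A%221316%2C1381%2C1391%22%2C"
    "%22area%22%3A%2225_2258_0_0%22%2C%22shopId%22%3A%221000099941%22%2C"
    "%22venderId%22%3A1000099941%2C%22paramJson%22%3A%22%7B%5C%22platform2"
    "%5C%22%3A%5C%221%5C%22%2C%5C%22specialAttrStr%5C%22%3A%5C%22"
    "p0ppppppppppp1pppppppppppp%5C%22%2C%5C%22skuMarkStr%5C%22%3A%5C%2200"
    "%5C%22%7D%22%2C%22num%22%3A1%2C%22bbTraffic%22%3A%22%22%7D"
)


def decode_body(encoded):
    """URL-decode the body parameter and parse it as JSON."""
    return json.loads(unquote(encoded))


body = decode_body(ENCODED_BODY)
# paramJson is itself a JSON document stored as a string, so parse it again.
param_json = json.loads(body["paramJson"])
print(body["skuId"], body["shopId"], param_json)
```

Note that paramJson needs a second json.loads call because it is a JSON string nested inside the outer JSON object.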
(4) The x-api-eid-token parameter
x-api-eid-token=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNAAAAAMKCDJFEVIAAAAAC5FNEJMJ5UGYTMX
In testing, the server does not validate this parameter, so it can be omitted without affecting data collection.
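If a captured URL is being replayed, the unvalidated parameter can simply be stripped. Below is a small sketch using only the standard library; drop_query_param is a hypothetical helper name, and the sample URL is shortened with a placeholder token value.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def drop_query_param(url, name):
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    return urlunsplit(parts._replace(query=urlencode(kept, quote_via=quote)))


# Shortened sample; EXAMPLE_TOKEN is a placeholder, not a real token.
sample = (
    "https://api.m.jd.com/?appid=pc-item-soa"
    "&functionId=pc_detailpage_wareBusiness"
    "&x-api-eid-token=EXAMPLE_TOKEN&loginType=3"
)
cleaned = drop_query_param(sample, "x-api-eid-token")
print(cleaned)
```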
(5) The h5st parameter (request signature)
h5st=20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3Btk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3Bed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3Bee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d6272cc6f198b41da73fbe26adfe0d7e3723450ed4c906efbd52e0671d7ab8bd9af7bfc208a38071126c8c70d775962c87b10b611b4f8489070e9d264c47c25dbd35aabe0addff39a3c732105c114056f93a71acfb90156d61b39e11217d5bf21c2e
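h5st is JD's request signature parameter, and every endpoint requires it. The server returns data reliably only when the signature is correct; otherwise, out of many requests only an occasional one returns data.
So to collect data, we need a correctly signed h5st. The signing process is analyzed below.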
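Before looking at how the signature is produced, it is worth looking at its shape. The sketch below URL-decodes the captured h5st value and splits it on semicolons. Two readings in it are inferences from the captured value, not documented behavior: that the sixth field (4.1) is the signature version, and that the first field is the seventh field's millisecond timestamp rendered as a UTC+8 date string. The long final hex field is replaced by a placeholder to keep the example short.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import unquote

# First seven fields of the captured h5st value; the eighth (a long hex
# string) is replaced by a placeholder.
ENCODED_H5ST = (
    "20230820104308635%3B9m99mz6itng955u3%3Bfb5df%3B"
    "tk02w99fb1bc541lMisxd2I5N0tm7s66XeObtysPWoIPlRdJ92-R1cXDBQzPnH5QrNdDMfm18N7zHpJuWML8dwJhOORi%3B"
    "ed6048632bdcf647c9a4db5b69b49569%3B4.1%3B1692499388635%3B"
    "FINAL_HEX_FIELD_OMITTED"
)


def split_h5st(encoded):
    """URL-decode an h5st value and split it into its ';'-separated fields."""
    return unquote(encoded).split(";")


def format_utc8(ms):
    """Render a millisecond Unix timestamp as yyyyMMddHHmmssSSS in UTC+8."""
    moment = datetime.fromtimestamp(ms // 1000, tz=timezone(timedelta(hours=8)))
    return moment.strftime("%Y%m%d%H%M%S") + f"{ms % 1000:03d}"


fields = split_h5st(ENCODED_H5ST)
print(len(fields), fields[5])
# Compare the first field with the seventh field rendered in UTC+8.
print(format_utc8(int(fields[6])) == fields[0])
```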
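II. Analyzing the h5st signature
1. Locating the h5st signing code
Do a global search for getDataColor. Why getDataColor? Because the h5st signing code sits right next to this function.
Set a breakpoint there and refresh the page. Screenshot: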

The signing process is plainly visible:
try {
    var d = JSON.parse(JSON.stringify(r));
    d.body = SHA256(s).toString(),
    window.PSign.sign(d).then(function(e) {
        r.h5st = encodeURI(e.h5st);
        //......................
    }
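The signing call: window.PSign.sign(d);
The result is then assigned: r.h5st = encodeURI(e.h5st);
The call is asynchronous: sign returns a promise, and h5st is set in the then callback.
2. The individual signing parameters:
(1) The body parameter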
{"skuId":6515029,"cat":"1316,1381,1391","area":"25_2258_2261_6568","shopId":"1000099941","venderId":1000099941,"paramJson":"{\"platform2\":\"1\",\"specialAttrStr\":\"p0ppppppppppp1pppppppppppp\",\"skuMarkStr\":\"00\"}","num":1,"bbTraffic":""}
(2) The d parameter:
{ "appid": "pc-item-soa", "functionId": "pc_detailpage_wareBusiness", "client": "pc", "clientVersion": "1.0.0", "t": 1692498783586, "body": "dddd48059b91f87eb42b080167bd70b5303b3df8c4b71a3967372fcda60cd496" }
d.body = SHA256(s).toString()
Press F11 to step into the call, locate the SHA256 implementation, and extract it.
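In Python the extracted SHA256 routine is not needed, because hashlib provides the same hash. The sketch below assumes that s is the JSON text of the body object and that toString() yields lowercase hex; neither is confirmed by the captured code, so the digest only matches the browser's value if the serialized string is byte-for-byte identical.

```python
import hashlib
import json


def body_digest(body):
    """Hex SHA-256 of the compact JSON serialization of body."""
    # Compact separators mimic JSON.stringify, which emits no extra spaces.
    serialized = json.dumps(body, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


body = {
    "skuId": 6515029,
    "cat": "1316,1381,1391",
    "area": "25_2258_2261_6568",
    "shopId": "1000099941",
    "venderId": 1000099941,
    "paramJson": '{"platform2":"1","specialAttrStr":"p0ppppppppppp1pppppppppppp","skuMarkStr":"00"}',
    "num": 1,
    "bbTraffic": "",
}
digest = body_digest(body)
print(digest)
```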

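(3) The t parameter
t: a
a = (new Date).getTime()
The t parameter is a timestamp in milliseconds.
That covers the signing parameters. Next, we look for the h5st signing algorithm itself.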
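Putting the three pieces together, the object handed to the signer can be assembled as follows. This is a sketch: build_sign_input is a hypothetical helper, the field names come from the captured d object, and the digest passed in is the value captured above.

```python
import time


def build_sign_input(appid, function_id, body_sha256,
                     client="pc", client_version="1.0.0"):
    """Assemble the object passed to the signer, mirroring the captured d."""
    return {
        "appid": appid,
        "functionId": function_id,
        "client": client,
        "clientVersion": client_version,
        # Milliseconds since the epoch, the same unit as (new Date).getTime().
        "t": int(time.time() * 1000),
        "body": body_sha256,
    }


d = build_sign_input(
    "pc-item-soa",
    "pc_detailpage_wareBusiness",
    "dddd48059b91f87eb42b080167bd70b5303b3df8c4b71a3967372fcda60cd496",
)
print(d)
```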
3. The h5st signing algorithm
Set a breakpoint at window.PSign.sign(d) and press F11 to step in:

Once inside the JS file that implements the h5st signature, save the whole file. Its name is js_security_v3_0.1.4.js
4. The object returned by the h5st signing call:
{ "appid": "pc-item-soa", "functionId": "pc_detailpage_wareBusiness", "client": "pc", "clientVersion": "1.0.0", "t": 1692498783586, "body": "dddd48059b91f87eb42b080167bd70b5303b3df8c4b71a3967372fcda60cd496", "_stk": "appid,body,client,clientVersion,functionId,t", "_ste": 1, "h5st": "20230820131419818;9m99mz6itng955u3;fb5df;tk03w9d441cbf18nk990HQLMH0ehQyR5j8EBXtSrYlGtY8KzYUkKCoUctg6u1pqtBeAqYw-t1yFcromGuN17RlgILtyk;65001318ffed0d17ee21652afb01a996;4.1;1692508459818;ee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980acde1a10effcc27fd84ad69b6a255fa2bacbfc5a0cc8222e4ac53b669906820b1461c75971601a3f031b5c1f40b721502f3b79e32d29b726ebec75a213493a818f67211b187fcf51e032e0b772bee8c70e4a1d7502aa775b148a504a31d627d6db4fde5974622b566cdace3d88a8999574369ad4a27c752e256a8a6d92a5fdfa8633dae1aa5d17f9ea6a859ed6b22c920d7881227b2f7f61f3bbf82c17afd340c42be154e8e3ad1d39c2d8ba94acb84c25299080b5545acc894168647303ed" }
The h5st field is the one we need.
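The signed value then has to go back into the request URL. In the captured URL the body parameter is the URL-encoded JSON itself, while the SHA-256 digest appears only in the copy d that was signed. The sketch below rebuilds a URL on that basis; build_request_url is a hypothetical helper, and the h5st and body values are placeholders rather than a real signature.

```python
import json
from urllib.parse import parse_qs, quote, urlencode, urlsplit

API_BASE = "https://api.m.jd.com/"


def build_request_url(signed, body):
    """Combine the signer's output with the original JSON body into a URL."""
    params = {
        "appid": signed["appid"],
        "functionId": signed["functionId"],
        "client": signed["client"],
        "clientVersion": signed["clientVersion"],
        "t": signed["t"],
        # The request carries the JSON body; the digest was only signing input.
        "body": json.dumps(body, separators=(",", ":"), ensure_ascii=False),
        "h5st": signed["h5st"],
    }
    # quote (not quote_plus) percent-encodes ';' as %3B and spaces as %20.
    return API_BASE + "?" + urlencode(params, quote_via=quote)


signed = {
    "appid": "pc-item-soa",
    "functionId": "pc_detailpage_wareBusiness",
    "client": "pc",
    "clientVersion": "1.0.0",
    "t": 1692498783586,
    "h5st": "PLACEHOLDER_A;PLACEHOLDER_B;4.1",  # placeholder, not a real signature
}
url = build_request_url(signed, {"skuId": 6515029, "num": 1})
print(url)
```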
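III. Calling the signing code from Python and other languages
js_security_v3_0.1.4.js is the actual signing file, but it cannot be called directly from Python: it fails with an error about the missing window object, so the browser environment has to be stubbed out first.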
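One common way to stub the environment is to prepend a small shim that defines the missing browser globals before the signing script runs. The sketch below only shows the mechanics of combining the two sources; the shim contents are hypothetical, and the real signing file very likely reads more browser properties than the three stubbed here.

```python
# Hypothetical minimal shim; the real script may need many more globals.
BROWSER_SHIM = """
var window = globalThis;
var navigator = { userAgent: "Mozilla/5.0" };
var document = { cookie: "" };
"""


def with_browser_stubs(js_source, shim=BROWSER_SHIM):
    """Return js_source with the shim prepended so its globals are defined first."""
    return shim + "\n" + js_source


# Stand-in for the saved signing file; a real run would read it from disk.
combined = with_browser_stubs("function sign() { return typeof window; }")
print(combined)
```

The combined string is what would then be handed to a JavaScript runtime, for example through execjs.compile.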
The Python code below calls the signer and requests the endpoint (with the environment already stubbed):
# -*- coding: UTF-8 -*-
import execjs
import requests


def savetofile(text, filename):
    """Write text to filename as UTF-8."""
    with open(filename, "w", encoding='utf-8') as file:
        file.write(text)


def jd(skuid):
    appid = 'item-v3'
    functionId = 'recDivinerApi'
    body = {"lid": 27, "lim": 15, "ec": "utf-8", "uuid": "16900368971511636315768", "pin": "", "p": 902029, "sku": skuid, "ck": "pin,ipLocation,atw,aview", "c1": 1316, "c2": 1387, "c3": 11932, "securityToken": "iJJJBrR7BAxWWavOluQxmMQ", "clientChannel": "3", "clientPageId": "item.jd.com"}
    # h5st.js is the environment-patched signing script; its sign() function
    # returns the complete signed request URL.
    with open("h5st.js", "r", encoding='utf-8') as js_file:
        js = js_file.read()
    exc = execjs.compile(js)
    url = exc.call("sign", appid, functionId, body)
    print('url=' + url)
    # Header names and values must not contain stray spaces. The :authority
    # pseudo-header is omitted because requests derives the host from the URL.
    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        # "br" is left out: requests decodes Brotli only if the brotli package is installed.
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Cookie": "shshshfpb=i0ZU6VlHi9tt1RukWDDyR0w; 3AB9D23F7A4B3C9B=GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTJKNBR32WP5NA7JKC4CLDZDF5AIRXNA; shshshfpa=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; shshshfpx=cb3af5e3-c2cf-dae5-48e3-c2331a38092a-1653253655; __jdc=122270672; __jdv=122270672|direct|-|none|-|1689305241830; __jdu=16893052418291576334291; areaId=25; ipLoc-djd=25-2258-2261-6568; token=7a3a5010c8ea7250057d9168270daacd,2,939221; __tk=be32047e11adf495830ad564f7c34cd6,2,939221; 3AB9D23F7A4B3CSS=jdd03GZSZ6SPDPJZS6ARBGAUDIS7NMVC2A24XK6SN4JCWH44HGMYJVGXZIEY2SHDTRiDY9CRQSU93J9SUTiPmFy3PTP7N8itsNd7DLuiPzfoEjAAACXCBKUWUQMP7FMX; _gia_d=1; jsavif=1; __jda=122270672.16893052418291576334291.1689305242.1690550636.1690599310.7; __jdb=122270672.1.16893052418291576334291|7.1690599310",
        "Origin": "https://item.jd.com",
        "Referer": "https://item.jd.com/",
        "Sec-Ch-Ua": '"Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"',
        "Sec-Ch-Ua-Mobile": "?0",
        "Sec-Ch-Ua-Platform": '"Windows"',
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-site",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        "X-Referer-Page": f"https://item.jd.com/{skuid}.html",
        "X-Rp-Client": "h5_1.0.0",
    }
    # A timeout keeps the script from hanging on a stalled connection.
    res = requests.get(url=url, headers=headers, timeout=10)
    print(res)
    text = res.text
    savetofile(text, "sku.txt")
    print(text)
    return text


if __name__ == '__main__':
    print('Fetching product data with an h5st 4.1 signature')
    jd(100019322424)
IV. Product detail data successfully returned in Python

All done!
