Data Extraction for Web Scraping: the jsonpath Module
3. Data extraction: the jsonpath module
Key points
- Understand when the jsonpath module is useful
- Be able to use the jsonpath module
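3.1. When to use the jsonpath module
Given a deeply nested dictionary, extracting values in bulk by key and index is tedious, because every level needs its own lookup or loop. The jsonpath module addresses exactly this problem, and the rest of this section shows how to use it.
jsonpath extracts data from a Python dictionary in bulk, by key.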
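To make "bulk extraction by key" concrete, here is a minimal pure-Python sketch of the idea: walk the nested structure and collect every value stored under a given key. The function name `find_values` and the sample data are made up for illustration. This is not how the jsonpath library is implemented, and the order of its results may differ from the library's.

```python
def find_values(obj, key):
    """Collect every value stored under `key`, at any depth."""
    found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(v)
            # Keep descending: the same key may appear again deeper down.
            found.extend(find_values(v, key))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_values(item, key))
    return found


# Hypothetical sample data, only for this sketch.
nested = {
    "shop": {
        "items": [{"name": "pen", "price": 1.5}, {"name": "ink", "price": 4.0}],
        "owner": {"name": "Li"},
    }
}

names = find_values(nested, "name")
print(names)
```

Writing and maintaining a helper like this for every query pattern is the chore that a path expression such as `$..name` is meant to replace.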
3.2. How to use the jsonpath module
3.2.1 Installing the jsonpath module
jsonpath is a third-party module and has to be installed separately:
    pip install jsonpath
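Import the jsonpath function and call it with the data and a JSONPath expression string:
    from jsonpath import jsonpath

    # a: the Python object (dict or list) to search
    ret = jsonpath(a, 'JSONPath expression string')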
    from jsonpath import jsonpath

    data = {'key1': {'key2': {'key3': {'key4': {'key5': {'key6': 'python'}}}}}}

    # Plain indexing: one lookup per level to reach key6
    print(data['key1']['key2']['key3']['key4']['key5']['key6'])

    # The same value via jsonpath.
    # jsonpath returns a list of matches (False if nothing matches).
    print(jsonpath(data, '$.key1.key2.key3.key4.key5.key6'))  # full path from the root
    print(jsonpath(data, '$..key6'))  # key6 at any depth
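Because `jsonpath()` returns a list on success but `False` when nothing matches, code that indexes or loops over the result directly can fail on a miss. The helpers below are a small sketch for normalizing that return value. The names `as_list` and `first_match` are hypothetical and not part of the jsonpath module, and the example feeds them literal values so that it runs without the library installed.

```python
def as_list(result):
    """Turn a jsonpath-style result into a list: matches stay as is, False becomes []."""
    return result if isinstance(result, list) else []


def first_match(result, default=None):
    """Return the first match, or `default` when there was no match."""
    matches = as_list(result)
    return matches[0] if matches else default


# Stand-ins for what jsonpath() hands back on a hit and on a miss.
hit = ['python']
miss = False

print(first_match(hit))
print(first_match(miss, default='n/a'))
for value in as_list(miss):  # safe: loops zero times
    print(value)
```

In real use the argument would be the call itself, for example `first_match(jsonpath(data, '$..key6'))`.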
Exercise 1: basic usage
    # Exercise: parse the JSON text into a dict, then extract data from it
    import json
    from jsonpath import jsonpath

    # Note: book_dict holds JSON text (a str); json.loads() turns it into a dict
    book_dict = '''{
      "store": {
        "book": [
          { "category": "reference",
            "author": "Nigel Rees",
            "title": "Sayings of the Century",
            "price": 8.95
          },
          { "category": "fiction",
            "author": "Evelyn Waugh",
            "title": "Sword of Honour",
            "price": 12.99
          },
          { "category": "fiction",
            "author": "Herman Melville",
            "title": "Moby Dick",
            "isbn": "0-553-21311-3",
            "price": 8.99
          },
          { "category": "fiction",
            "author": "J. R. R. Tolkien",
            "title": "The Lord of the Rings",
            "isbn": "0-395-19395-8",
            "price": 22.99
          }
        ],
        "bicycle": {
          "color": "red",
          "price": 19.95
        }
      }
    }'''

    data = json.loads(book_dict)

    # Get the colour of the bicycle
    ret = jsonpath(data, '$.store.bicycle.color')
    print(ret)
    print(jsonpath(data, '$..color'))

    # Get the price of every product in the store
    ret = jsonpath(data, '$..price')
    print(ret)
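An explicit path such as `$.store.bicycle.color` names every step from the root, unlike `$..color`, which searches at any depth. As a rough illustration of what following an explicit path amounts to, the sketch below walks a list of keys and indexes one level at a time. The helper `walk` and the trimmed-down `store` data are assumptions made for this example. The helper returns a bare value or a default, which is not the jsonpath module's list-or-`False` convention.

```python
def walk(obj, steps, default=None):
    """Follow `steps` (dict keys or list indexes) one level at a time."""
    current = obj
    for step in steps:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # The path does not exist in this document.
            return default
    return current


# A cut-down version of the store data, only for this sketch.
store = {
    "store": {
        "book": [{"title": "Moby Dick", "price": 8.99}],
        "bicycle": {"color": "red", "price": 19.95},
    }
}

color = walk(store, ["store", "bicycle", "color"])
title = walk(store, ["store", "book", 0, "title"])
missing = walk(store, ["store", "car", "color"])
print(color, title, missing)
```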

Exercise 2: extracting data from Lagou
Using Lagou's city JSON file <http://www.lagou.com/lbs/getAllCitySearchLabels.json> as an example, get a list of the names of all cities.
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    import json

    import jsonpath
    import requests

    # Send a browser-like User-Agent with the request
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/91.0.4472.114 Safari/537.36"
    }

    response = requests.get('https://www.lagou.com/lbs/getAllCitySearchLabels.json', headers=headers)

    # Parse the response body (JSON) into a Python dict
    dict_data = json.loads(response.content)

    # '$..name' collects the value of every "name" key at any depth
    name_list = jsonpath.jsonpath(dict_data, '$..name')
    print(name_list)
Key point: be able to use the jsonpath module

