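[Python] A Simple Introduction to XPath
XPath is another way to parse web pages. Its main convenience is that you can copy an element's path directly, for example from the browser's developer tools, and paste it into your code.
Path obtained by copy and paste: /html/body/div[2]/div[6]/div[1]/div/ul/li[1]/a[2]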
# Import the modules and fetch the page
import requests
from lxml import etree

url = "https://www.qidian.com/"
headers = {
    "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Mobile Safari/537.36"
}
response = requests.get(url, headers=headers)
response.encoding = "utf-8"
# Parse the HTML text into an element tree that supports XPath queries
html = etree.HTML(response.text)
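Using the copied path with xpath():
print(html.xpath('/html/body/div[2]/div[6]/div[1]/div/ul/li[1]/a[2]/text()'))  # get the tag's text content
# Output: ['云罱']
Note that the result is a list. To get a string, take the first element:
print(html.xpath('/html/body/div[2]/div[6]/div[1]/div/ul/li[1]/a[2]/text()')[0])  # get the tag's text content as a string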
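Indexing with [0] raises an IndexError when the path matches nothing, which happens easily once the page layout changes. The following is a minimal sketch of a guard against that. The HTML fragment and the helper name first_or_default are hypothetical and only stand in for the fetched page.

```python
from lxml import etree

# Hypothetical HTML fragment standing in for the fetched page.
SAMPLE = '<html><body><ul><li><a href="/a">first</a></li></ul></body></html>'
tree = etree.HTML(SAMPLE)


def first_or_default(results, default=None):
    """Return the first XPath result, or `default` when the list is empty."""
    return results[0] if results else default


# This path matches one text node.
found = first_or_default(tree.xpath('/html/body/ul/li[1]/a/text()'))
# There is no ninth <li>, so xpath() returns an empty list.
missing = first_or_default(tree.xpath('/html/body/ul/li[9]/a/text()'), default='')

print(repr(found), repr(missing))
```

Returning a default keeps a scraping loop running when one element is absent, at the cost of having to check for the default value later.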
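// selects matching elements at any depth below the current node, so it can skip intermediate levels regardless of where the element sits. / selects only direct children, so a path built entirely from / is an absolute path that must name every level from the root and cannot skip any.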
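The difference between / and // is easiest to see on a small document. The sketch below uses a hypothetical HTML fragment (not the real page) and only illustrates the two axes.

```python
from lxml import etree

# Hypothetical HTML fragment, used only to illustrate / versus //.
SAMPLE = """
<html><body>
  <div class="wrap">
    <ul>
      <li><a data-eid="x1" href="/one">one</a></li>
      <li><a data-eid="x1" href="/two">two</a></li>
    </ul>
  </div>
  <a data-eid="x1" href="/three">three</a>
</body></html>
"""
tree = etree.HTML(SAMPLE)

# "/" only steps to direct children: just the <a> directly under <body>.
direct = tree.xpath('/html/body/a/text()')

# "//" matches descendants at any depth: every <a> under <body>.
anywhere = tree.xpath('/html/body//a/text()')

# An absolute path that leaves out a level (the <div>) matches nothing.
skipped_level = tree.xpath('/html/body/ul/li/a/text()')

# "//" can appear anywhere in the path, here to find links inside any <ul>.
in_lists = tree.xpath('//ul//a/@href')

print(direct, anywhere, skipped_level, in_lists)
```

A path copied from the browser names every level, so it tends to break when any ancestor changes. A // path anchored on an attribute is usually more tolerant of layout changes.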
## Specify an absolute path ##
print(html.xpath('/html/body/div[@class="wrap"]/div[@class="index-two-wrap box-center mb40 cf"]/div[@class="book-list-wrap mr30 fl"]/div[@class="book-list"]/ul/li[@data-rid="1"]/a[@data-eid="qd_A104"]/text()'))
# Output: ['云罱']

print(html.xpath('/html/body/div[2]/div[6]/div[1]/div/ul/li[1]/a[2]/@href'))  # get the tag's href attribute
# Output: ['//me.qidian.com/authorIndex.aspx?id=2852726']

print(html.xpath('/html/body//a[@data-eid="qd_A104"]/text()'))  # text of every <a> tag with data-eid="qd_A104"
# Output: ['云罱', '中二的菌菇', '夜夜烨烨', '愁啊愁', '翔炎', '寸夕日', '皇家雇佣猫', '红丸辣椒酱', '蚂蚁吃萝卜', '永恒的恒星', '唐家三少', '屠鸡剑神', '山俪', '老翻译家', '鸭不先知', '真愚老人', '南天有雪']

print(html.xpath('/html/body//a[@data-eid="qd_A104"]/@href'))  # href of every <a> tag with data-eid="qd_A104"
# Output: ['//me.qidian.com/authorIndex.aspx?id=2852726', '//me.qidian.com/authorIndex.aspx?id=9588053', '//me.qidian.com/authorIndex.aspx?id=8138524', '//me.qidian.com/authorIndex.aspx?id=9057974', '//me.qidian.com/authorIndex.aspx?id=29652', '//me.qidian.com/authorIndex.aspx?id=402401261', '//me.qidian.com/authorIndex.aspx?id=400339097', '//me.qidian.com/authorIndex.aspx?id=430543775', '//me.qidian.com/authorIndex.aspx?id=430546303', '//me.qidian.com/authorIndex.aspx?id=10627245', '//me.qidian.com/authorIndex.aspx?id=4921', '//me.qidian.com/authorIndex.aspx?id=430503963', '//me.qidian.com/authorIndex.aspx?id=430559705', '//me.qidian.com/authorIndex.aspx?id=10877310', '//me.qidian.com/authorIndex.aspx?id=430174122', '//me.qidian.com/authorIndex.aspx?id=402352953', '//me.qidian.com/authorIndex.aspx?id=430295861']
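Querying the text and the href separately gives two parallel lists, and the links come back protocol-relative (they start with //). One possible way to keep each name together with its link is to select the <a> elements themselves and read both values from each one. This is a sketch under assumptions: the HTML below is a hand-built stand-in containing two of the entries shown above, and BASE_URL is assumed to be the page that was fetched.

```python
from urllib.parse import urljoin

from lxml import etree

# Assumed to be the URL the page was fetched from.
BASE_URL = "https://www.qidian.com/"

# Hand-built stand-in for the real page, reusing two entries from the output above.
SAMPLE = """
<html><body><ul>
  <li><a data-eid="qd_A104" href="//me.qidian.com/authorIndex.aspx?id=2852726">云罱</a></li>
  <li><a data-eid="qd_A104" href="//me.qidian.com/authorIndex.aspx?id=9588053">中二的菌菇</a></li>
</ul></body></html>
"""
tree = etree.HTML(SAMPLE)

authors = []
# Select the elements, not their text, so both values come from the same node.
for a in tree.xpath('//a[@data-eid="qd_A104"]'):
    name = a.xpath('string(.)').strip()  # all text inside this <a>
    href = a.get('href')                 # attribute value, or None if absent
    # urljoin fills in the scheme of BASE_URL for protocol-relative links.
    authors.append((name, urljoin(BASE_URL, href)))

for name, link in authors:
    print(name, link)
```

Reading both values from one element avoids misaligned pairs if some tag happens to lack text or an href.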