[Python] Basic XPath Usage

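XPath is another way to parse web pages. Its biggest advantage is that you can copy an element's path directly from the browser's developer tools.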

 

Copying and pasting gives the path: /html/body/div[2]/div[6]/div[1]/div/ul/li[1]/a[2]

# Import the modules and fetch the page
import requests
from lxml import etree

url = "https://www.qidian.com/"
headers = {
    "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Mobile Safari/537.36"
}
response = requests.get(url, headers=headers)
response.encoding = "utf-8"
html = etree.HTML(response.text)

Using the copied XPath:

print(html.xpath('/html/body/div[2]/div[6]/div[1]/div/ul/li[1]/a[2]/text()'))  # get the tag's text content
# Output: ['云罱']

Note that the result is a list. To get a string instead, take the first element:

print(html.xpath('/html/body/div[2]/div[6]/div[1]/div/ul/li[1]/a[2]/text()')[0])  # get the tag's text content as a string
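
Indexing with [0] raises an IndexError when the path matches nothing, which can happen whenever the page layout changes. The following is a minimal sketch of one way to guard against that. The helper name first_or_default and the inline sample markup are assumptions made for illustration; they are not part of the original page or of lxml.

```python
from lxml import etree

# Inline sample markup (hypothetical, stands in for the live page)
sample = '<html><body><ul><li><a href="/a">first</a><a href="/b">second</a></li></ul></body></html>'
doc = etree.HTML(sample)


def first_or_default(results, default=None):
    """Return the first XPath result, or `default` when the result list is empty."""
    return results[0] if results else default


# The second <a> exists, so its text is returned as a string
found = first_or_default(doc.xpath('/html/body/ul/li[1]/a[2]/text()'))
# There is no third <a>, so the default is returned instead of raising IndexError
missing = first_or_default(doc.xpath('/html/body/ul/li[1]/a[3]/text()'), default='')

print(found)
print(repr(missing))
```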

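// selects matching descendants at any depth below the current node, regardless of position; / steps down exactly one level, to direct children only. A path that begins with a single / is absolute, starting from the document root.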
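
The difference between / and // is easiest to see on a small document. The sketch below uses an inline HTML string invented for illustration (the tag layout and the attribute value "x" are assumptions, not taken from qidian.com).

```python
from lxml import etree

# Hypothetical sample: one <a> nested inside ul/li, one <a> directly under the div
sample = '''
<html><body>
  <div class="wrap">
    <ul>
      <li><a data-eid="x">inner link</a></li>
    </ul>
    <a data-eid="x">direct link</a>
  </div>
</body></html>
'''
doc = etree.HTML(sample)

# "/" moves one level at a time: only <a> elements that are direct children of the div
direct = doc.xpath('/html/body/div/a/text()')

# "//" matches <a> elements at any depth below the div
any_depth = doc.xpath('/html/body/div//a/text()')

# A leading "//" searches the whole document, here filtered by attribute
everywhere = doc.xpath('//a[@data-eid="x"]/text()')

print(direct)
print(any_depth)
print(everywhere)
```

Results come back in document order, so the nested link appears before the direct one in the last two lists.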

## Specifying an absolute path ##
print(html.xpath('/html/body/div[@class="wrap"]/div[@class="index-two-wrap box-center mb40 cf"]/div[@class="book-list-wrap mr30 fl"]/div[@class="book-list"]/ul/li[@data-rid="1"]/a[@data-eid="qd_A104"]/text()'))
# Output: ['云罱']
print(html.xpath('/html/body/div[2]/div[6]/div[1]/div/ul/li[1]/a[2]/@href'))  # get the tag's href link
# Output: ['//me.qidian.com/authorIndex.aspx?id=2852726']
print(html.xpath('/html/body//a[@data-eid="qd_A104"]/text()'))  # text of every <a> tag with data-eid="qd_A104"
# Output: ['云罱', '中二的菌菇', '夜夜烨烨', '愁啊愁', '翔炎', '寸夕日', '皇家雇佣猫', '红丸辣椒酱', '蚂蚁吃萝卜', '永恒的恒星', '唐家三少', '屠鸡剑神', '山俪', '老翻译家', '鸭不先知', '真愚老人', '南天有雪']
print(html.xpath('/html/body//a[@data-eid="qd_A104"]/@href'))  # href of every <a> tag with data-eid="qd_A104"
# Output: ['//me.qidian.com/authorIndex.aspx?id=2852726', '//me.qidian.com/authorIndex.aspx?id=9588053', '//me.qidian.com/authorIndex.aspx?id=8138524', '//me.qidian.com/authorIndex.aspx?id=9057974', '//me.qidian.com/authorIndex.aspx?id=29652', '//me.qidian.com/authorIndex.aspx?id=402401261', '//me.qidian.com/authorIndex.aspx?id=400339097', '//me.qidian.com/authorIndex.aspx?id=430543775', '//me.qidian.com/authorIndex.aspx?id=430546303', '//me.qidian.com/authorIndex.aspx?id=10627245', '//me.qidian.com/authorIndex.aspx?id=4921', '//me.qidian.com/authorIndex.aspx?id=430503963', '//me.qidian.com/authorIndex.aspx?id=430559705', '//me.qidian.com/authorIndex.aspx?id=10877310', '//me.qidian.com/authorIndex.aspx?id=430174122', '//me.qidian.com/authorIndex.aspx?id=402352953', '//me.qidian.com/authorIndex.aspx?id=430295861']
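
The href values above are protocol-relative (they start with //), so they cannot be passed to requests.get as they are. One possible way to turn them into full URLs is urljoin from the standard library, sketched below. The two hrefs are copied from the output above; the variable names are only illustrative.

```python
from urllib.parse import urljoin

base_url = "https://www.qidian.com/"

# Two of the protocol-relative links returned by the @href query
hrefs = [
    '//me.qidian.com/authorIndex.aspx?id=2852726',
    '//me.qidian.com/authorIndex.aspx?id=9588053',
]

# urljoin borrows the scheme (https) from the base URL and keeps the host from the href
absolute = [urljoin(base_url, href) for href in hrefs]

for link in absolute:
    print(link)
```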


posted @ 2021-07-02 20:06  山鬼谣`