Case study: scraping the Chouti hot list (抽屉新热榜) with XPath
URL: https://dig.chouti.com/
Code (requests + lxml XPath):
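import requests
from lxml import etree

# Send a desktop-browser User-Agent so the site returns the normal HTML page
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}
url = 'https://dig.chouti.com/'
resp = requests.get(url, headers=headers, timeout=10)
resp.raise_for_status()  # stop early on a 4xx/5xx response
resp.encoding = 'UTF-8'  # decode the body as UTF-8 before parsing
html_tree = etree.HTML(resp.text)

# Each child <div> of this container is one entry in the hot list
data = html_tree.xpath('//div[@class="main"]/div[2]/div[1]/div')
for item in data:
    # Paths start with "." so they are evaluated relative to the current entry
    print("Source==>", item.xpath('.//div[@class="link-detail"]/div/a/span/text()')[0])
    print("URL==>", item.xpath('.//div[@class="link-detail"]/a/@href')[0])
    print("Content==>", item.xpath('.//div[@class="link-detail"]/a/text()')[0])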
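To try the same three XPath queries without a network request, the sketch below runs them against a small hand-written HTML string. The markup in `SAMPLE_HTML` and the helper names `first` and `parse_items` are hypothetical, made up only so that the article's paths match; the real page's HTML may be structured differently. The sketch also swaps the bare `[0]` indexing for a first-or-default helper, so an entry that is missing a field yields an empty string instead of raising `IndexError`.

```python
from lxml import etree

# Hypothetical markup: a simplified stand-in shaped only so that the
# article's XPath expressions match it. Not the real page's HTML.
SAMPLE_HTML = """
<html><body>
<div class="main">
  <div>header</div>
  <div>
    <div>
      <div>
        <div class="link-detail">
          <a href="https://example.com/a">First headline</a>
          <div><a><span>example.com</span></a></div>
        </div>
      </div>
      <div>
        <div class="link-detail">
          <a href="https://example.com/b">Second headline</a>
          <div><a><span>example.org</span></a></div>
        </div>
      </div>
    </div>
  </div>
</div>
</body></html>
"""


def first(nodes, default=""):
    """Return the first XPath string result, stripped, or a default if none matched."""
    return nodes[0].strip() if nodes else default


def parse_items(html_text):
    """Apply the article's XPath queries and collect one dict per list entry."""
    tree = etree.HTML(html_text)
    records = []
    for item in tree.xpath('//div[@class="main"]/div[2]/div[1]/div'):
        # The leading "." keeps each query inside the current entry;
        # a bare "//" would search the whole document again.
        records.append({
            "source": first(item.xpath('.//div[@class="link-detail"]/div/a/span/text()')),
            "url": first(item.xpath('.//div[@class="link-detail"]/a/@href')),
            "content": first(item.xpath('.//div[@class="link-detail"]/a/text()')),
        })
    return records


if __name__ == "__main__":
    for record in parse_items(SAMPLE_HTML):
        print(record)
```

Returning dicts rather than printing inside the loop makes the extracted fields easy to test or write to JSON/CSV later. The relative paths matter: calling `item.xpath('//div[@class="link-detail"]/a/@href')` without the leading dot would return the links of every entry on the page, not just the current one.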