Web Scraping Basics: XPath Basic Syntax

1. Path lookup
  //: selects descendant nodes at any depth
  /: selects direct child nodes only
1. Predicate queries
  //div[@id="content"]
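The difference between `/`, `//`, and a predicate can be sketched with Python's standard-library `xml.etree.ElementTree`, which supports only a small subset of XPath but enough for these three cases. The sample document and its `id` values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<html>
  <body>
    <div id="content">
      <p>first</p>
      <div id="inner"><p>second</p></div>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(DOC)
body = root.find("body")

# "/" step: only direct children of <body> are considered.
direct_divs = body.findall("./div")

# "//" step: descendants at any depth are considered.
all_divs = body.findall(".//div")

# Predicate: keep only the <div> whose id attribute equals "content".
content_divs = root.findall(".//div[@id='content']")

print([d.get("id") for d in direct_divs])   # ['content']
print([d.get("id") for d in all_divs])      # ['content', 'inner']
print([d.get("id") for d in content_divs])  # ['content']
```

ElementTree requires relative paths (hence the leading `.`); a full XPath engine such as lxml accepts `//div[@id="content"]` directly.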
1. Attribute queries
  //@class
1. Fuzzy matching
  //div[contains(@id,"he")]
  //div[starts-with(@id, "he")]
1. Text content queries
  //div/h1/text()
1. Logical queries
  //div[@id="head" and @class="s_down"]
  //text | //price
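The attribute, fuzzy, text, and logical queries above can be tried out with a short sketch. It assumes the third-party `lxml` package is installed (the `html.xpath(...)` calls in this post suggest that library); the sample markup is hypothetical, and the union is shown with `//h1 | //p` because the sample has no `text` or `price` elements.

```python
from lxml import etree

# Hypothetical sample markup, invented for this illustration.
DOC = """
<html><body>
  <div id="head" class="s_down"><h1>Title</h1></div>
  <div id="header" class="s_up"><h1>Other</h1></div>
  <div id="the-end" class="s_down"><p>bye</p></div>
</body></html>
"""
html = etree.HTML(DOC)

# Attribute query: every class attribute value in the document.
classes = html.xpath('//@class')

# Fuzzy matching: contains() matches anywhere, starts-with() only at the start.
contains_ids = html.xpath('//div[contains(@id, "he")]/@id')
starts_ids = html.xpath('//div[starts-with(@id, "he")]/@id')

# Text content query: text nodes directly under div/h1.
titles = html.xpath('//div/h1/text()')

# Logical "and": both conditions must hold on the same element.
both = html.xpath('//div[@id="head" and @class="s_down"]')

# Union "|": nodes matched by either path.
union_tags = [e.tag for e in html.xpath('//h1 | //p')]

print(classes)       # ['s_down', 's_up', 's_down']
print(contains_ids)  # ['head', 'header', 'the-end']
print(starts_ids)    # ['head', 'header']
print(titles)        # ['Title', 'Other']
print(len(both))     # 1
```

Note that `"the-end"` matches `contains` but not `starts-with`, which is the practical difference between the two functions.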
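1. Node axis selection
The ancestor axis selects all ancestor nodes
result = html.xpath('//li[1]/ancestor::*')
result = html.xpath('//li[1]/ancestor::div')
The attribute axis selects all attribute values
result = html.xpath('//li[1]/attribute::*')
The child axis selects direct child nodes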
result = html.xpath('//li[1]/child::a[@href="link1.html"]')
The descendant axis selects all descendant nodes
result = html.xpath('//li[1]/descendant::span')
The following axis selects all nodes after the current node
result = html.xpath('//li[1]/following::*[2]')
The following-sibling axis selects sibling nodes after the current node
result = html.xpath('//li[1]/following-sibling::*')
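The axis queries above need an `html` object to run against. A minimal sketch, assuming `lxml` is installed and using a hypothetical list whose structure and `href` values are invented to fit the queries:

```python
from lxml import etree

# Hypothetical sample markup; etree.HTML wraps it in <html><body>.
DOC = """
<div>
  <ul>
    <li class="item-0"><a href="link1.html"><span>first</span></a></li>
    <li class="item-1"><a href="link2.html">second</a></li>
    <li class="item-2"><a href="link3.html">third</a></li>
  </ul>
</div>
"""
html = etree.HTML(DOC)

# ancestor: every element above the first <li>, or only the <div> ancestors.
ancestors = [e.tag for e in html.xpath('//li[1]/ancestor::*')]
ancestor_divs = [e.tag for e in html.xpath('//li[1]/ancestor::div')]

# attribute: all attribute values of the first <li>.
attrs = html.xpath('//li[1]/attribute::*')

# child: direct children only, filtered by a predicate.
child_a = html.xpath('//li[1]/child::a[@href="link1.html"]')

# descendant: matches at any depth below the first <li>.
spans = html.xpath('//li[1]/descendant::span')

# following: the second element after the first <li> in document order.
second_following = html.xpath('//li[1]/following::*[2]')

# following-sibling: later siblings that share the same parent.
siblings = [e.get("class") for e in html.xpath('//li[1]/following-sibling::*')]

print(sorted(ancestors))                # ['body', 'div', 'html', 'ul']
print(attrs)                            # ['item-0']
print(second_following[0].get("href"))  # link2.html
print(siblings)                         # ['item-1', 'item-2']
```

`following::*[2]` returns the `<a>` inside the second `<li>` rather than the third `<li>`, because the following axis walks all later elements in document order, not just siblings.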
1. Get all the text under a tag, including its child tags
  .xpath('normalize-space(string())')
  Match an element by its text content
  //a[normalize-space(text())='货运表现']
1. Exclude a tag
  ul[not(@style="display: none;")]
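The last two items can be illustrated together. This is a sketch assuming `lxml` is installed; the markup is hypothetical, and the English link text "Shipping performance" stands in for the literal used in the query above.

```python
from lxml import etree

# Hypothetical sample markup, invented for this illustration.
DOC = """
<div id="box">
  <p>Hello <b>big</b>
     world</p>
  <a href="/a">  Shipping performance  </a>
  <a href="/b">Other</a>
  <ul style="display: none;"><li>hidden</li></ul>
  <ul><li>visible</li></ul>
</div>
"""
html = etree.HTML(DOC)

# string() concatenates the text of the node and all its descendants;
# normalize-space() trims it and collapses runs of whitespace.
p = html.xpath('//p')[0]
full_text = p.xpath('normalize-space(string())')

# Match a link by its text, ignoring surrounding whitespace.
hrefs = html.xpath('//a[normalize-space(text())="Shipping performance"]/@href')

# not(...) drops the <ul> that is hidden via its inline style.
visible_items = html.xpath('//ul[not(@style="display: none;")]/li/text()')

print(full_text)      # Hello big world
print(hrefs)          # ['/a']
print(visible_items)  # ['visible']
```

The `not(@style=...)` test compares the attribute string exactly, so a style written as `display:none` (no space) would not be excluded by this predicate.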

posted on 2024-01-23 15:27 HelloJacker
