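Scrapy XPath: matching elements with multiple class values

XPath has no native way to look up an element by class. However, I came across a very clever answer on Stack Overflow: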

This selector should work but will be more efficient if you replace it with your suited markup:

//*[contains(@class, 'Test')]  

However, this will also match cases like class="Testvalue" or class="newTest". A stricter version pads the attribute value with spaces and matches the space-delimited name:

//*[contains(concat(' ', @class, ' '), ' Test ')]  
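The difference between the two expressions above can be reproduced with plain Python string operations. This is only a minimal sketch of the string logic, not how an XPath engine is implemented; the function names contains_substring and contains_token are hypothetical and chosen for illustration.

```python
def contains_substring(class_attr: str, name: str) -> bool:
    # Mirrors contains(@class, 'Test'): a plain substring test.
    return name in class_attr


def contains_token(class_attr: str, name: str) -> bool:
    # Mirrors contains(concat(' ', @class, ' '), ' Test '):
    # pad both sides with a space so only whole, space-delimited names match.
    return f" {name} " in f" {class_attr} "


# Compare both checks on a few class attribute values.
for value in ["Test", "big Test box", "Testvalue", "newTest"]:
    print(repr(value), contains_substring(value, "Test"), contains_token(value, "Test"))
```

The substring check accepts all four values, while the padded check accepts only the first two.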

If you want to be really certain that it matches correctly, you can also use the normalize-space function to clean up stray whitespace characters around the class name (as mentioned by @Terry):

//*[contains(concat(' ', normalize-space(@class), ' '), ' Test ')]  
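What normalize-space adds can also be sketched in plain Python, assuming the class attribute reaches XPath with tabs or newlines between the class names (whether it does depends on the parser). The helpers normalize_space and has_class are hypothetical names that only approximate the XPath functions on strings.

```python
import re


def normalize_space(value: str) -> str:
    # Approximates XPath normalize-space(): strip leading/trailing whitespace
    # and collapse each run of space, tab, CR or LF into a single space.
    return " ".join(part for part in re.split(r"[ \t\r\n]+", value) if part)


def has_class(class_attr: str, name: str) -> bool:
    # Mirrors contains(concat(' ', normalize-space(@class), ' '), ' Test ').
    return f" {name} " in f" {normalize_space(class_attr)} "


# A tab or newline next to the class name defeats the space-padded check
# unless the value is normalized first.
raw = "foo\tTest\nbar"
print(f" Test " in f" {raw} ")   # padded check without normalization
print(has_class(raw, "Test"))    # padded check after normalization
```

Without normalization the padded check misses the class here, because the characters around Test are a tab and a newline rather than spaces.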

Note that in all these versions, the * should best be replaced by whatever element name you actually wish to match, unless you wish to search each and every element in the document for the given condition.



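html = """
<div class="view_fields">
    <div class="row view_row">
        <!-- row header -->
        <div class="view_field col-xs-4 col-sm-5 col-md-4">Organization</div>
        <!-- row value -->
        <div class="view_value col-xs-8 col-sm-7 col-md-8">
            <a href="/org/14607">INTERNET HARBOR</a>
        </div>
    </div>
</div>
"""

# `response` is the Scrapy response for a page containing the markup above.
# Select the outer container by one of its class values.
items = response.xpath('//div[contains(@class,"view_fields")]')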





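// C# alternatives (use one at a time):
// span elements with no class attribute
var result = node.SelectNodes(".//span[not(@class)]");
// span elements with neither a class nor an id attribute
var result = node.SelectNodes(".//span[not(@class) and not(@id)]");
// span elements whose class does not contain 'expire'
var result = node.SelectNodes(".//span[not(contains(@class,'expire'))]");
// span elements whose class contains 'expire'
var result = node.SelectNodes(".//span[contains(@class,'expire')]");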


//td[.!= '']

# Use the text of the second cell if it is non-empty, otherwise fall back to "未知" ("unknown")
if each.xpath("./td[2]/text()[.!= '']"):
    self.positionType = each.xpath("./td[2]/text()").extract()[0]
else:
    self.positionType = "未知"



posted @ 2018-06-03 14:27  焦国峰的随笔日记