Python Week 2 (Day 9) My Python Growth Journal: Master Python Data Mining in One Month! (16) - The Scrapy Framework
The Scrapy framework
Parsing the response
>>> response.css('title::text').extract()
['Quotes to Scrape']
There are two things to note here:
(1) One is that we’ve added ::text to the CSS query, meaning we want to select only the text elements directly inside the <title> element. If we don’t specify ::text, we’d get the full title element, including its tags:
>>> response.css('title').extract()
['<title>Quotes to Scrape</title>']
(2) The other thing is that the result of calling .extract() is a list, because we’re dealing with an instance of SelectorList. When you know you just want the first result, as in this case, you can do:
>>> response.css('title::text').extract_first()
'Quotes to Scrape'
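Both points above (text-only versus whole-element selection, and results always arriving as a list) can be tried without a running spider. The sketch below swaps in Python's standard-library xml.etree.ElementTree for Scrapy's selectors and runs it on a tiny, well-formed snippet that is only assumed to resemble the page's <head>. The helpers select and extract_first are hypothetical and mimic the described behaviour rather than reproduce Scrapy's code. The sketch also shows the practical reason to prefer .extract_first() over .extract()[0]: when nothing matches, the former returns None, while indexing an empty list raises IndexError.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Assumption: a tiny, well-formed stand-in for the page. ElementTree needs
# valid XML, which real HTML often is not; Scrapy uses its own HTML-tolerant
# selectors instead.
PAGE = "<html><head><title>Quotes to Scrape</title></head><body/></html>"


def select(root: ET.Element, path: str, text_only: bool) -> List[str]:
    """Hypothetical helper: return every match as a list, like .extract().

    text_only=True mimics a '::text' query (only the text inside the element).
    text_only=False mimics a plain element query (the element with its tags).
    Note: ET.tostring also appends any text following the closing tag
    (the element's tail); <title> has none in this snippet.
    """
    results: List[str] = []
    for element in root.findall(path):
        if text_only:
            if element.text is not None:
                results.append(element.text)
        else:
            results.append(ET.tostring(element, encoding="unicode"))
    return results


def extract_first(matches: List[str], default: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: the first match, or `default` when nothing matched."""
    return matches[0] if matches else default


root = ET.fromstring(PAGE)
text_matches = select(root, "head/title", text_only=True)
tag_matches = select(root, "head/title", text_only=False)
missing = select(root, "head/h1", text_only=True)  # no <h1> in the snippet

print(text_matches)                 # ['Quotes to Scrape']
print(tag_matches)                  # ['<title>Quotes to Scrape</title>']
print(extract_first(text_matches))  # Quotes to Scrape
print(extract_first(missing))       # None
```

Newer Scrapy releases expose the same behaviour under the names .getall() and .get(); .extract() and .extract_first() remain supported.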
Besides the extract() and extract_first() methods, you can also use the re() method to extract using regular expressions:
>>> response.css('title::text').re(r'Quotes.*')
['Quotes to Scrape']
>>> response.css('title::text').re(r'Q\w+')
['Quotes']
>>> response.css('title::text').re(r'(\w+) to (\w+)')
['Quotes', 'Scrape']
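The three .re() results above are consistent with re.findall semantics applied to each extracted string: a pattern without groups yields whole matches, and a pattern with several groups yields one string per captured group, flattened into a single list. The stdlib-only sketch below approximates that under this assumption; re_like_scrapy is a hypothetical helper, not Scrapy's implementation, and it ignores any extra details the real method may handle.

```python
import re
from typing import List


def re_like_scrapy(texts: List[str], pattern: str) -> List[str]:
    """Hypothetical approximation of .re(): findall on each string, then flatten.

    Without groups, findall returns whole matches; with several groups it
    returns one tuple per match, which is flattened into individual strings.
    """
    regex = re.compile(pattern)
    results: List[str] = []
    for text in texts:
        for match in regex.findall(text):
            if isinstance(match, tuple):
                results.extend(match)  # several groups: one entry per group
            else:
                results.append(match)  # whole match or a single group
    return results


title_texts = ["Quotes to Scrape"]  # the strings that title::text selects

print(re_like_scrapy(title_texts, r"Quotes.*"))        # ['Quotes to Scrape']
print(re_like_scrapy(title_texts, r"Q\w+"))            # ['Quotes']
print(re_like_scrapy(title_texts, r"(\w+) to (\w+)"))  # ['Quotes', 'Scrape']
```

Because .re() hands back plain strings rather than selectors, it is usually the last step in a selection chain.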
