Python Week 2 (Day 9): My Python Growth Journal, Master Python Data Mining in One Month! (16) - The Scrapy Framework

The Scrapy framework

Parsing the response

>>> response.css('title::text').extract()
['Quotes to Scrape']

There are two things to note here:
　　(1) We've added ::text to the CSS query, which means we want to select only the text nodes directly inside the <title> element. If we don't specify ::text, we get the full title element, including its tags.
　　(2) The result of calling .extract() is a list, because we're dealing with an instance of SelectorList. When you know you just want the first result, as in this case, you can do:

>>> response.css('title::text').extract_first()
'Quotes to Scrape'

Besides the extract() and extract_first() methods, you can also use the re() method to extract using regular expressions:

>>> response.css('title::text').re(r'Quotes.*')
['Quotes to Scrape']
>>> response.css('title::text').re(r'Q\w+')
['Quotes']
>>> response.css('title::text').re(r'(\w+) to (\w+)')
['Quotes', 'Scrape']
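
The three results above follow one pattern: when the regular expression has no capture groups, the whole match is returned; when it has groups, only the groups come back, flattened into one list. The sketch below approximates that behaviour with the standard library `re` module. The helper `re_like_selector` is a hypothetical name used for illustration only; it is not Scrapy's implementation, which handles more cases.

```python
import re

def re_like_selector(pattern, text):
    """Approximate the result shape of .re(): groups if present, else whole matches."""
    results = []
    for match in re.finditer(pattern, text):
        if match.groups():
            # Capture groups present: return each group, flattened into one list.
            results.extend(match.groups())
        else:
            # No groups: return the full matched text.
            results.append(match.group(0))
    return results

title = 'Quotes to Scrape'

whole = re_like_selector(r'Quotes.*', title)
word = re_like_selector(r'Q\w+', title)
groups = re_like_selector(r'(\w+) to (\w+)', title)

print(whole)   # ['Quotes to Scrape']
print(word)    # ['Quotes']
print(groups)  # ['Quotes', 'Scrape']
```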

posted @ 2017-08-01 21:07 yugengde
