scrapy shell
shell
- Syntax: scrapy shell [url]
- Requires project: no
Starts the Scrapy shell for the given URL (if given), or an empty shell if no URL is given. Also supports UNIX-style local file paths, either relative with ./ or ../ prefixes or absolute file paths. See Scrapy shell for more info.
Supported options:
- --spider=SPIDER: bypass spider autodetection and force use of specific spider
- -c code: evaluate the code in the shell, print the result and exit
- --no-redirect: do not follow HTTP 3xx redirects (default is to follow them); this only affects the URL you may pass as argument on the command line; once you are inside the shell, fetch(url) will still follow HTTP redirects by default.
Usage example:
1. Basic shell usage
scrapy shell -s ROBOTSTXT_OBEY=False --no-redirect "https://jigsaw.w3.org/HTTP/300/301.html"
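-s: override a setting for this run; here ROBOTSTXT_OBEY=False (do not obey robots.txt)
--no-redirect: do not follow HTTP 3xx redirects for the URL given on the command line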
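When the same command is built from a script, the quoting can be left to Python. The sketch below assembles the argument list and lets shlex.join (Python 3.8+) quote whatever the shell would misread. The helper name build_shell_cmd is hypothetical; only the flags themselves (-s NAME=VALUE, --no-redirect, -c) come from the Scrapy command line shown above.

```python
import shlex


def build_shell_cmd(url=None, settings=None, no_redirect=False, code=None):
    """Assemble a `scrapy shell` command line (hypothetical helper).

    settings: dict of Scrapy setting names to values, each passed as -s NAME=VALUE
    no_redirect: add --no-redirect so the start URL's 3xx response is kept
    code: optional code for -c (evaluate it, print the result, exit)
    """
    argv = ["scrapy", "shell"]
    for name, value in (settings or {}).items():
        argv += ["-s", f"{name}={value}"]
    if no_redirect:
        argv.append("--no-redirect")
    if code is not None:
        argv += ["-c", code]
    if url is not None:
        argv.append(url)
    # shlex.join quotes only the arguments that contain shell metacharacters
    return shlex.join(argv)


print(build_shell_cmd("https://jigsaw.w3.org/HTTP/300/301.html",
                      settings={"ROBOTSTXT_OBEY": False}, no_redirect=True))
```

Run as-is, it prints the same command as example 1; a URL containing & or a User-Agent containing spaces would come back wrapped in single quotes.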
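2. When the URL contains query parameters, it must be quoted; otherwise the shell treats the first & as an operator and the remaining parameters are cut off, as the two logs below show: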
Quoted URL, all parameters are sent:
scrapy shell -s ROBOTSTXT_OBEY=False --no-redirect "https://www.toutiao.com/search_content/?offset=20&format=json&keyword=%E8%A1%97%E6%8B%8D&autoload=true&count=20&cur_tab=3&from=gallery"
#2018-09-10 15:45:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.toutiao.com/search_content/?offset=20&format=json&keyword=%E8%A1%97%E6%8B%8D&autoload=true&count=20&cur_tab=3&from=gallery> (referer: None)
Unquoted URL, the request stops at offset=20:
scrapy shell -s ROBOTSTXT_OBEY=False --no-redirect https://www.toutiao.com/search_content/?offset=20&format=json&keyword=%E8%A1%97%E6%8B%8D&autoload=true&count=20&cur_tab=3&from=gallery
#2018-09-10 15:55:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.toutiao.com/search_content/?offset=20> (referer: None)
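The truncation comes from the shell, not from Scrapy: an unquoted & ends the command and runs it in the background, so scrapy only receives the text before it. Python's shlex module can approximate that tokenization; in the sketch below (split_like_shell is a hypothetical name, and shlex is only an approximation of a real shell's parser) the unquoted URL is split at the first &, while the quoted one stays a single argument.

```python
import shlex


def split_like_shell(cmd):
    """Tokenize cmd roughly as a POSIX shell would, keeping operators like & separate."""
    lexer = shlex.shlex(cmd, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # split on whitespace and on operator characters
    return list(lexer)


url = "https://www.toutiao.com/search_content/?offset=20&format=json"
# Unquoted: the first '&' becomes its own operator token, so the URL argument ends at offset=20
print(split_like_shell(f"scrapy shell {url}"))
# Quoted: the whole URL is one argument
print(split_like_shell(f'scrapy shell "{url}"'))
```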
3. Handling redirects in the shell
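scrapy shell -s ROBOTSTXT_OBEY=False --no-redirect "https://jigsaw.w3.org/HTTP/300/301.html"
With --no-redirect the shell starts on the 3xx response itself. Inside the shell, fetch(url, redirect=False) does the same for any URL; to follow the redirect by hand, fetch the URL from the Location header (Scrapy header values are bytes and the location may be relative, hence decode() and response.urljoin()):
fetch('https://www.toutiao.com/group/6599226921567912452/', redirect=False)
fetch(response.urljoin(response.headers['Location'].decode()))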
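Following a redirect by hand comes down to two steps: read the Location header of the 3xx response and resolve it against the URL that was requested. The sketch below models those steps with the standard library only, outside Scrapy; resolve_redirect is a hypothetical name, headers is assumed to be a plain dict with str keys, and bytes values are accepted because Scrapy returns header values as bytes.

```python
from urllib.parse import urljoin


def resolve_redirect(request_url, status, headers):
    """Return the absolute URL a 3xx response points to, or None.

    headers is a plain dict with str keys (an assumption); values may be str or bytes.
    """
    if not 300 <= status < 400:
        return None
    # HTTP header names are case-insensitive, so compare lowercased keys
    location = next((v for k, v in headers.items() if k.lower() == "location"), None)
    if location is None:
        return None
    if isinstance(location, bytes):
        location = location.decode()
    # a relative Location is resolved against the URL that was requested
    return urljoin(request_url, location)


print(resolve_redirect("https://example.com/old/page.html", 301, {"Location": b"/new/"}))
# https://example.com/new/
```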
4. Setting the User-Agent
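For the whole session, set it when starting the shell:
scrapy shell -s USER_AGENT='Mozilla/5.0'
For a single request, pass headers to fetch():
fetch('http://www.baidu.com', headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36'})
Or build a Request object with custom headers (the URL needs a scheme such as https://, otherwise Request raises an error):
from scrapy import Request
req = Request('https://yoururl.com', headers={"header1": "value1"})
fetch(req)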
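If both approaches are used at once, the per-request header wins: Scrapy's UserAgentMiddleware fills in User-Agent with setdefault, so the USER_AGENT setting only applies when the request carries no User-Agent of its own. The sketch below models that rule with a plain dict; it is a simplification (Scrapy's Headers class is case-insensitive and stores bytes), and apply_default_user_agent is a hypothetical name.

```python
def apply_default_user_agent(headers, default_user_agent):
    """Return a copy of headers with User-Agent added only if it is missing."""
    merged = dict(headers)
    if not default_user_agent:
        # an empty default is ignored, as the middleware does for a falsy setting
        return merged
    # header names are case-insensitive, so check every key, not just "User-Agent"
    if not any(name.lower() == "user-agent" for name in merged):
        merged["User-Agent"] = default_user_agent
    return merged


# a header passed to fetch()/Request is kept; the session default fills the gap otherwise
print(apply_default_user_agent({"User-Agent": "custom"}, "Mozilla/5.0"))
print(apply_default_user_agent({"header1": "value1"}, "Mozilla/5.0"))
```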
5. Clearing the shell screen
import os
os.system("clear")  # Linux/macOS; on Windows use os.system("cls")
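Since the clear command differs by platform, a small helper can pick it from os.name ("nt" on Windows). This is a sketch; clear_command and clear_screen are hypothetical names, not part of Scrapy.

```python
import os


def clear_command(os_name=os.name):
    """Return the console command that clears the screen for the given os.name."""
    # Windows consoles use `cls`; Linux and macOS terminals use `clear`
    return "cls" if os_name == "nt" else "clear"


def clear_screen():
    """Clear the terminal; os.system returns the command's exit status."""
    return os.system(clear_command())
```

Inside the shell, calling clear_screen() once it has been pasted in has the same effect as the os.system line above, on either platform.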
