A simple search engine demo.
https://github.com/bobbyz3g/Chihiro
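A full-stack application demo based on Elasticsearch.
It uses Scrapy to crawl data and load it into Elasticsearch,
and uses Django as the web search interface, which calls the Elasticsearch search API.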
Introduction
Chihiro is a simple search engine demo. It shows you how to build a search engine website using Scrapy, Django, and Elasticsearch.
Chihiro consists of ChihiroSearch and ChihiroSpider.
- ChihiroSearch: the website backend.
- ChihiroSpider: the spiders.
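To make the search side more concrete, here is a minimal sketch of the kind of request body a web backend could send to the Elasticsearch search API. It is not taken from the Chihiro code: the function name build_quote_query and the field names "text", "author", and "tags" are assumptions made for illustration. It uses only the standard library, so it builds the query without contacting a cluster.

```python
import json


def build_quote_query(keyword, page=1, page_size=10):
    """Build an Elasticsearch search request body for a keyword search.

    The field names "text", "author" and "tags" are assumptions for
    illustration; use whatever fields the spider actually indexes.
    """
    if page < 1:
        raise ValueError("page must be >= 1")
    return {
        # Full-text match across several fields at once.
        "query": {
            "multi_match": {
                "query": keyword,
                "fields": ["text", "author", "tags"],
            }
        },
        # Offset-based pagination: skip the results of earlier pages.
        "from": (page - 1) * page_size,
        "size": page_size,
    }


if __name__ == "__main__":
    print(json.dumps(build_quote_query("life", page=2), indent=2))
```

With the elasticsearch-py 7.x client, a dict like this would typically be passed as the body argument of the client's search method, together with an index name of your choosing.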
Problem
The example above is not fully packaged with Docker.
The repo below improves on this.
https://github.com/fanqingsong/Chihiro
To run the project, only the following commands are needed:
    docker-compose build
    docker-compose up
Related topics
Scrapy selectors
https://scrapy-chs.readthedocs.io/zh-cn/latest/topics/selectors.html
Crawl target
http://quotes.toscrape.com/page/1/
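As a rough illustration of selector-style extraction, the sketch below pulls quote text, author, and tags out of a small hand-written fragment using XPath expressions. Scrapy itself is deliberately not used, so that the example runs with only the standard library: xml.etree.ElementTree stands in for Scrapy's selectors and supports only a subset of XPath on well-formed markup. The sample fragment and its class names ("quote", "text", "author", "tag") are assumptions about the page markup, not a copy of the live page.

```python
import xml.etree.ElementTree as ET

# A hand-written fragment imitating one quote block on the crawl target.
# The class names ("quote", "text", "author", "tag") are assumptions
# about the page markup; check the live page before relying on them.
SAMPLE = """
<div>
  <div class="quote">
    <span class="text">A sample quote.</span>
    <span>by <small class="author">Sample Author</small></span>
    <div class="tags">
      <a class="tag" href="/tag/sample/">sample</a>
      <a class="tag" href="/tag/demo/">demo</a>
    </div>
  </div>
</div>
"""


def extract_quotes(markup):
    """Return one dict per quote block found in well-formed markup."""
    root = ET.fromstring(markup)
    quotes = []
    for block in root.findall(".//div[@class='quote']"):
        quotes.append({
            # Direct child <span class="text"> of the quote block.
            "text": block.findtext("span[@class='text']"),
            # The author sits in a nested <small> element.
            "author": block.findtext(".//small[@class='author']"),
            # Collect the text of every tag link inside the block.
            "tags": [a.text for a in block.findall(".//a[@class='tag']")],
        })
    return quotes


if __name__ == "__main__":
    for quote in extract_quotes(SAMPLE):
        print(quote)
```

In a real Scrapy spider, similar expressions would go through response.xpath or response.css inside the parse callback, which also cope with HTML that is not well-formed XML.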
Elasticsearch
https://elasticsearch-py.readthedocs.io/en/v7.12.0/api.html
https://elasticsearch-dsl.readthedocs.io/en/latest/index.html
Scrapy-Redis is a powerful open source Scrapy extension that enables you to run distributed crawls/scrapes across multiple servers and scale up your data processing pipelines.
https://www.baeldung.com/linux/docker-cmd-multiple-commands
6. Run Multiple Commands With a Shell Script
Sometimes, we need to do more complex processing, instead of chaining a few commands. If this is the case, we can create a shell script that contains all the necessary logic, and copy it to the container’s filesystem. We can use the Dockerfile COPY directive to copy the shell script file to our container’s filesystem.
Let’s create a simple Dockerfile for our image:
    FROM ubuntu:latest
    COPY startup.sh .
    CMD ["/bin/bash","-c","./startup.sh"]
We’ll use the COPY directive to copy the shell script to the container’s filesystem. We’ll execute the script with the CMD directive.
Next, we’ll create the startup.sh shell script:
    #!/bin/bash
    echo $HOME
    date
The above script prints our home directory in the container and the current date. An important note is that we should grant the execute privilege to the shell script on the host machine. This is because the execute privilege will be transferred to our container when we copy the file. Otherwise, the container won’t be able to execute the script when it starts.
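The execute-permission step can also be scripted. The following is a minimal sketch, assuming a Unix-like host. The helper name make_executable is hypothetical, and the demo writes startup.sh into a temporary directory rather than a real build context. It has roughly the same effect as running chmod +x on the file before building the image.

```python
import os
import stat
import tempfile


def make_executable(path):
    """Add execute permission for user, group and others, keeping other bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Demo in a throwaway directory so no real file is modified.
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "startup.sh")
        with open(script, "w") as handle:
            handle.write("#!/bin/bash\necho $HOME\ndate\n")
        make_executable(script)
        # Report whether the current user may now execute the script.
        print(os.access(script, os.X_OK))
```

Because COPY carries the file mode into the image, setting the bit on the host before the build is enough for the CMD line to run the script.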