Setting up Elasticsearch and Kibana with Docker: a hands-on movie search system

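Elasticsearch stores and retrieves data. Kibana runs structured queries and visualizes the data. Their relationship is similar to that of MySQL and Navicat.
The Docker installation steps are as follows: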

The available parameters are documented on the official image pages:

https://hub.docker.com/_/kibana/
https://hub.docker.com/_/elasticsearch/

Pull the images

Check the latest version: https://hub.docker.com/_/elasticsearch?tab=tags

docker pull elasticsearch:7.12.0
docker pull kibana:7.12.0

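Create a bridge network so that es and kibana can reach each other by container name, without worrying about IP changes. The network name must match the --net value used in the docker run commands below:

docker network create esnet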

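Start es and kibana

You can add -v to bind a file path on the host to a path inside the container. After you edit the file on the host, restarting the container is enough for the change to take effect.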
docker run -d --name es --net esnet -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.12.0
docker run -d --name kibana --net esnet -p 5601:5601 -e ELASTICSEARCH_HOSTS=http://es:9200 kibana:7.12.0 
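# The "es" in es:9200 above is the --name of the elasticsearch container. Name resolution works because both containers are on the same bridge network. Inspect the docker network with: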
docker network inspect esnet


You can also check connectivity with ping/curl directly from inside a container.
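You can run the same check from a program on the host. The following is a minimal Go sketch. It assumes the `-p 9200:9200` mapping above, so ES answers on `http://localhost:9200`. From another container on `esnet` the address would be `http://es:9200` instead. `esVersion` is a hypothetical helper that reads `cluster_name` and `version.number` from the JSON that `GET /` returns.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// rootInfo holds the fields we need from the response of GET / on Elasticsearch.
type rootInfo struct {
	ClusterName string `json:"cluster_name"`
	Version     struct {
		Number string `json:"number"`
	} `json:"version"`
}

// esVersion extracts the cluster name and version number from a GET / response body.
func esVersion(body []byte) (string, string, error) {
	var info rootInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return "", "", err
	}
	return info.ClusterName, info.Version.Number, nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get("http://localhost:9200") // assumes the host port mapping above
	if err != nil {
		fmt.Println("ES not reachable:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	cluster, version, err := esVersion(body)
	if err != nil {
		fmt.Println("unexpected response:", err)
		return
	}
	fmt.Printf("cluster=%s version=%s\n", cluster, version)
}
```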


Operating ES through Kibana:


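Creating the mapping

Install the analyzer plugins: https://www.cnblogs.com/gwyy/p/12205257.html

Chinese analysis

ik_smart: coarse-grained segmentation; ik_max_word: fine-grained segmentation
https://github.com/medcl/elasticsearch-analysis-ik/tags

Pinyin analysis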

https://github.com/medcl/elasticsearch-analysis-pinyin
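After installing the plugins, you can compare the analyzers by sending the same text to the `_analyze` API with each analyzer name. The following Go sketch assumes ES is on `localhost:9200`. `ik_smart` and `ik_max_word` come from the IK plugin. `pinyin` is assumed to be the analyzer name that the pinyin plugin registers. `analyzeBody` and `tokens` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// analyzeBody builds the JSON body for POST /_analyze.
func analyzeBody(analyzer, text string) ([]byte, error) {
	return json.Marshal(map[string]string{"analyzer": analyzer, "text": text})
}

// tokens pulls the token strings out of an _analyze response.
func tokens(body []byte) ([]string, error) {
	var resp struct {
		Tokens []struct {
			Token string `json:"token"`
		} `json:"tokens"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(resp.Tokens))
	for _, t := range resp.Tokens {
		out = append(out, t.Token)
	}
	return out, nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	text := "中韩渔警冲突调查"
	for _, analyzer := range []string{"ik_smart", "ik_max_word", "pinyin"} {
		reqBody, err := analyzeBody(analyzer, text)
		if err != nil {
			fmt.Println(err)
			return
		}
		resp, err := client.Post("http://localhost:9200/_analyze", "application/json", bytes.NewReader(reqBody))
		if err != nil {
			fmt.Println("ES not reachable:", err)
			return
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			fmt.Println("read error:", err)
			return
		}
		toks, err := tokens(body)
		if err != nil {
			fmt.Println("unexpected response:", err)
			return
		}
		fmt.Printf("%-12s %v\n", analyzer, toks)
	}
}
```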

Hands-on

1. Chinese search system

Create the index

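curl -XDELETE http://localhost:9200/index
# create the (empty) index first; POSTing a mapping to an index that does not exist fails
curl -XPUT http://localhost:9200/index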
curl -XPOST http://localhost:9200/index/_mapping -H 'Content-Type:application/json' -d' {
	"properties": {
		"content": {
			"type": "text",
			"analyzer": "ik_max_word",
			"search_analyzer": "ik_smart"
		}
	}
}'
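Instead of creating the index and then posting the mapping, you can also create the index and its mapping in a single `PUT /index` request whose body carries a `mappings` object. The following Go sketch shows that variant and assumes ES is on `localhost:9200`. The index name `index` and the analyzer choices are copied from the curl commands above. `mappingBody` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// mappingBody returns the create-index body: ik_max_word at index time
// (finer tokens, more terms indexed), ik_smart at search time (coarser query tokens).
func mappingBody() ([]byte, error) {
	body := map[string]interface{}{
		"mappings": map[string]interface{}{
			"properties": map[string]interface{}{
				"content": map[string]string{
					"type":            "text",
					"analyzer":        "ik_max_word",
					"search_analyzer": "ik_smart",
				},
			},
		},
	}
	return json.Marshal(body)
}

func main() {
	payload, err := mappingBody()
	if err != nil {
		fmt.Println(err)
		return
	}
	req, err := http.NewRequest(http.MethodPut, "http://localhost:9200/index", bytes.NewReader(payload))
	if err != nil {
		fmt.Println(err)
		return
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := (&http.Client{Timeout: 5 * time.Second}).Do(req)
	if err != nil {
		fmt.Println("ES not reachable:", err)
		return
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(out))
}
```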

Write data

curl -XPOST http://localhost:9200/index/_doc/1 -H 'Content-Type:application/json' -d'  {"content":"美国留给伊拉克的是个烂摊子吗"}  '
curl -XPOST http://localhost:9200/index/_doc/3 -H 'Content-Type:application/json' -d'  {"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}  '
curl -XPOST http://localhost:9200/index/_doc/4 -H 'Content-Type:application/json' -d'  {"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}  '
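You can also issue the three writes above from Go. The following sketch assumes ES is on `localhost:9200`. `newIndexRequest` is a hypothetical helper that builds `POST /{index}/_doc/{id}` with a JSON body, the same call the curl lines make.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// newIndexRequest builds POST {base}/{index}/_doc/{id} with doc as the JSON body.
func newIndexRequest(base, index, id string, doc map[string]string) (*http.Request, error) {
	payload, err := json.Marshal(doc)
	if err != nil {
		return nil, err
	}
	u := fmt.Sprintf("%s/%s/_doc/%s", base, url.PathEscape(index), url.PathEscape(id))
	req, err := http.NewRequest(http.MethodPost, u, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	docs := map[string]string{
		"1": "美国留给伊拉克的是个烂摊子吗",
		"3": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船",
		"4": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首",
	}
	client := &http.Client{Timeout: 5 * time.Second}
	for id, content := range docs {
		req, err := newIndexRequest("http://localhost:9200", "index", id, map[string]string{"content": content})
		if err != nil {
			fmt.Println(err)
			return
		}
		resp, err := client.Do(req)
		if err != nil {
			fmt.Println("ES not reachable:", err)
			return
		}
		resp.Body.Close()
		fmt.Println(id, resp.Status)
	}
}
```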

Run a query

curl -XGET http://localhost:9200/index/_search -H 'Content-Type:application/json' -d'  {
  "query":
          {"match":{ "content":"中国"}}
}'
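The response lists the matches under `hits.hits`, each with `_id`, `_score` and `_source`. The following Go sketch sends the same `match` query and prints the matched `content`. It assumes ES is on `localhost:9200`. `searchBody` and `parseHits` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// Hit is the subset of a search hit printed below.
type Hit struct {
	ID      string
	Score   float64
	Content string
}

// searchBody builds {"query":{"match":{field: text}}}.
func searchBody(field, text string) ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"query": map[string]interface{}{"match": map[string]string{field: text}},
	})
}

// parseHits reads hits.hits[]._id, _score and _source.content from a _search response.
func parseHits(body []byte) ([]Hit, error) {
	var resp struct {
		Hits struct {
			Hits []struct {
				ID     string  `json:"_id"`
				Score  float64 `json:"_score"`
				Source struct {
					Content string `json:"content"`
				} `json:"_source"`
			} `json:"hits"`
		} `json:"hits"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	hits := make([]Hit, 0, len(resp.Hits.Hits))
	for _, h := range resp.Hits.Hits {
		hits = append(hits, Hit{ID: h.ID, Score: h.Score, Content: h.Source.Content})
	}
	return hits, nil
}

func main() {
	payload, err := searchBody("content", "中国")
	if err != nil {
		fmt.Println(err)
		return
	}
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Post("http://localhost:9200/index/_search", "application/json", bytes.NewReader(payload))
	if err != nil {
		fmt.Println("ES not reachable:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	hits, err := parseHits(body)
	if err != nil {
		fmt.Println("unexpected response:", err)
		return
	}
	for _, h := range hits {
		fmt.Printf("%s %.3f %s\n", h.ID, h.Score, h.Content)
	}
}
```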

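2. Movie search system

Get the dataset

You can crawl the data yourself or download an existing dataset.
Search Google for "The Movies Dataset": https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset/code
It usually comes as CSV, which is more compact than JSON. To import it into ES you need to parse the CSV and emit JSON, for example: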

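package main

import (
	"encoding/csv"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log"
	"os"
)

// Columns kept from movies_metadata.csv (the file has 20+ columns)
var keys = []string{"id", "imdb_id", "original_language", "original_title", "overview", "popularity", "release_date", "runtime", "status", "tagline", "title", "vote_average", "vote_count"}

func panicError(err error) {
	if err != nil {
		panic(err)
	}
}

// JsonMarshal serializes v to a JSON string (map keys come out sorted)
func JsonMarshal(v interface{}) string {
	b, err := json.Marshal(v)
	panicError(err)
	return string(b)
}

// extractCSV prints the CSV to stdout in _bulk format:
// an action line followed by a document line for every movie
func extractCSV() {
	fName := "data/the_movies_dataset/movies_metadata.csv"
	f, err := os.Open(fName)
	panicError(err)
	defer f.Close()

	r := csv.NewReader(f)
	r.FieldsPerRecord = -1 // some rows are malformed; row length is checked below
	header, err := r.Read()
	panicError(err)

	// Look up each wanted key by name instead of assuming header order
	colIndex := make(map[string]int, len(header))
	for idx, name := range header {
		colIndex[name] = idx
	}
	keysIndex := make([]int, len(keys))
	maxIndex := 0
	for i, k := range keys {
		idx, ok := colIndex[k]
		if !ok {
			log.Fatalf("column %q not found in header", k)
		}
		keysIndex[i] = idx
		if idx > maxIndex {
			maxIndex = idx
		}
	}
	log.Printf("debug:%v,%v", keysIndex, keys)

	for {
		record, err := r.Read()
		if err == io.EOF {
			break
		}
		var perr *csv.ParseError
		if errors.As(err, &perr) {
			log.Printf("skip bad row: %v", err)
			continue
		}
		panicError(err)
		// Skip short rows; record[maxIndex] must exist
		if len(record) <= maxIndex {
			continue
		}
		result := make(map[string]interface{}, len(keys))
		for idx, i := range keysIndex {
			result[keys[idx]] = record[i]
		}
		fmt.Printf("{\"index\":{\"_id\":\"%s\"}}\n", record[keysIndex[0]])
		fmt.Println(JsonMarshal(result))
	}
}

func main() {
	extractCSV()
}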

Create the index

Pick the useful fields for the index. The dataset has 20+ fields by default, and only the useful ones are exported.

DELETE /movie
PUT /movie
{
  "mappings": {
    "properties": {
      "id":                { "type": "long" },
      "imdb_id":           { "type": "keyword" },
      "original_language": { "type": "keyword" },
      "original_title":    { "type": "text" },
      "title":             { "type": "text" },
      "overview":          { "type": "text" },
      "popularity":        { "type": "double" },
      "vote_average":      { "type": "double" },
      "vote_count":        { "type": "long" },
      "release_date":      { "type": "date" }
    }
  }
}

original_language is mapped as keyword because the search below filters on it with an exact term query.
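The CSV export emits every value as a JSON string (for example `"popularity":"21.946943"`). Elasticsearch coerces numeric strings into `long`/`double` fields by default (the `coerce` mapping parameter). Converting on the client side keeps the documents typed, though. Dropping empty values also avoids depending on how the `date` field treats an empty `release_date`. The following Go sketch does this conversion. `typedMovie` is a hypothetical helper, and the two field sets are assumptions that mirror the mapping above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Field types mirroring the movie mapping; every other field stays a string.
var longFields = map[string]bool{"id": true, "vote_count": true}
var doubleFields = map[string]bool{"popularity": true, "vote_average": true}

// typedMovie converts CSV string values into typed values. Empty strings and
// values that fail to parse are skipped, so the field is simply absent.
func typedMovie(row map[string]string) map[string]interface{} {
	out := make(map[string]interface{}, len(row))
	for k, v := range row {
		if v == "" {
			continue
		}
		switch {
		case longFields[k]:
			if n, err := strconv.ParseInt(v, 10, 64); err == nil {
				out[k] = n
			}
		case doubleFields[k]:
			if f, err := strconv.ParseFloat(v, 64); err == nil {
				out[k] = f
			}
		default:
			out[k] = v
		}
	}
	return out
}

func main() {
	doc := typedMovie(map[string]string{
		"id": "862", "title": "Toy Story", "popularity": "21.946943",
		"vote_count": "5415", "vote_average": "7.7", "release_date": "",
	})
	b, err := json.Marshal(doc)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(b))
}
```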

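Bulk import with the _bulk API. The Go program above writes the _bulk format to standard output; redirect that output into es_bulk.txt. Each movie becomes an action line followed by a document line: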

{"index":{"_id":"862"}}
{"id":"862","imdb_id":"tt0114709","original_language":"en","original_title":"Toy Story","overview":"Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences.","popularity":"21.946943","release_date":"1995-10-30","runtime":"81.0","status":"Released","tagline":"","title":"Toy Story","vote_average":"7.7","vote_count":"5415"}
{"index":{"_id":"8844"}}
{"id":"8844","imdb_id":"tt0113497","original_language":"en","original_title":"Jumanji","overview":"When siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures.","popularity":"17.015539","release_date":"1995-12-15","runtime":"104.0","status":"Released","tagline":"Roll the dice and unleash the excitement!","title":"Jumanji","vote_average":"6.9","vote_count":"2413"}

curl -XPOST "http://localhost:9200/movie/_bulk" -H 'Content-Type: application/json' --data-binary '@es_bulk.txt' 
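`_bulk` returns HTTP 200 even when individual documents fail. Check the top-level `errors` flag and each item's `error` object to find the failures. The following Go sketch posts the same `es_bulk.txt` and reports the failed items. It assumes ES is on `localhost:9200`. `bulkFailures` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// bulkItem is the per-action result inside a _bulk response.
type bulkItem struct {
	ID     string `json:"_id"`
	Status int    `json:"status"`
	Error  *struct {
		Type   string `json:"type"`
		Reason string `json:"reason"`
	} `json:"error"`
}

// bulkFailures returns "id: type: reason" for every failed item in a _bulk response.
func bulkFailures(body []byte) ([]string, error) {
	var resp struct {
		Errors bool                  `json:"errors"`
		Items  []map[string]bulkItem `json:"items"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	var failed []string
	if !resp.Errors {
		return failed, nil
	}
	for _, item := range resp.Items {
		for _, r := range item { // the single key is the action name, e.g. "index"
			if r.Error != nil {
				failed = append(failed, fmt.Sprintf("%s: %s: %s", r.ID, r.Error.Type, r.Error.Reason))
			}
		}
	}
	return failed, nil
}

func main() {
	data, err := os.ReadFile("es_bulk.txt")
	if err != nil {
		fmt.Println(err)
		return
	}
	if len(data) > 0 && data[len(data)-1] != '\n' {
		data = append(data, '\n') // the _bulk body must end with a newline
	}
	client := &http.Client{Timeout: 60 * time.Second}
	resp, err := client.Post("http://localhost:9200/movie/_bulk", "application/json", bytes.NewReader(data))
	if err != nil {
		fmt.Println("ES not reachable:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	failed, err := bulkFailures(body)
	if err != nil {
		fmt.Println("unexpected response:", err)
		return
	}
	fmt.Printf("%d failed items\n", len(failed))
	for _, f := range failed {
		fmt.Println(f)
	}
}
```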

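Search

A bool query combines several clause types. must clauses are required to match. should clauses are optional and add to the score; boost raises a clause's weight. filter clauses restrict the results without affecting the score. match runs a full-text search on a field.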

GET /movie/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "title": {
            "query": "the"
          }
        }}
      ], 
      "should": [
        {"match": {"title": {"query":"Love and peace","boost":10}}},
        {"match": {"overview": {"query": "LOVE and peace"}}}
      ],
      "filter": [
        {"term": {
          "original_language": "zh"
        }}
      ]
    }
  },
  "_source": ["title","overview","original_title"],
  "aggs": {
    "types_count" : { "terms": { "field" : "vote_average"} }
  }
}
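You can also assemble the same bool query in Go and send it to `POST /movie/_search`. The following is a sketch. `movieQuery` and the alias `M` are hypothetical. The parameters mirror the example above: a required title match, two optional `should` clauses with a boost on the title, a `term` filter on `original_language`, `_source` filtering and the `vote_average` terms aggregation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// M is shorthand for a JSON object.
type M = map[string]interface{}

// movieQuery mirrors the bool query above: must = required match,
// should = optional matches that raise the score (title boosted),
// filter = non-scoring exact match on the language code.
func movieQuery(mustTitle, phrase, lang string, titleBoost float64) M {
	return M{
		"query": M{
			"bool": M{
				"must": []M{
					{"match": M{"title": M{"query": mustTitle}}},
				},
				"should": []M{
					{"match": M{"title": M{"query": phrase, "boost": titleBoost}}},
					{"match": M{"overview": M{"query": phrase}}},
				},
				"filter": []M{
					{"term": M{"original_language": lang}},
				},
			},
		},
		"_source": []string{"title", "overview", "original_title"},
		"aggs": M{
			"types_count": M{"terms": M{"field": "vote_average"}},
		},
	}
}

func main() {
	b, err := json.MarshalIndent(movieQuery("the", "Love and peace", "zh", 10), "", "  ")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(b))
}
```

The printed body can be pasted into Kibana after `GET /movie/_search`.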
posted @ 2021-04-04 17:18  dancingwolves