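Elasticsearch Quick Start
I. Introduction
Elasticsearch ("ES" for short) is a distributed search and analytics engine. Logstash and Beats collect data and store it in ES. Kibana provides visualization and a user-friendly interface for exploring, monitoring, and reporting on the data held in ES.
- Elasticsearch: the warehouse, where the data is stored
- Logstash/Beats: the buyers and porters, which collect and sort the data
- Kibana: the warehouse manager, which analyzes the data and presents it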
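The data model in ES: Document and Index
ES stores data as documents serialized to JSON. An index is an optimized collection of documents, and a document is a collection of fields (key-value pairs). Text fields (text) are stored in an inverted index, which supports fast full-text search, while numeric and geo fields are stored in BKD trees.
The key point about an inverted index is that it lists every unique term that appears in any document, however often it occurs, and records all the documents that contain that term.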
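The idea of an inverted index can be sketched in a few lines of plain Java. This is only a toy model under simple assumptions (lowercasing and splitting on non-word characters stand in for a real analyzer, and the class name InvertedIndexSketch is made up for illustration); it is not how Lucene stores its index internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Maps each unique lowercase term to the sorted set of IDs of the documents that contain it.
    static Map<String, SortedSet<Integer>> build(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            // Naive tokenizer (an assumption): lowercase, then split on non-word characters.
            for (String term : docs.get(docId).toLowerCase().split("\\W+")) {
                if (term.isEmpty()) {
                    continue;
                }
                // A term is recorded once per document, no matter how often it occurs there.
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "the quick brown fox",
                "the lazy dog",
                "quick quick dog");
        Map<String, SortedSet<Integer>> index = build(docs);
        System.out.println("quick -> " + index.get("quick"));
        System.out.println("dog   -> " + index.get("dog"));
    }
}
```

Looking up a term is then a single map access that returns every matching document, which is why full-text search on text fields is fast.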
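Schema-less
The constraints on written documents are flexible: you do not have to declare in advance how many fields a document has or what their types are. Even when fields have been declared, documents can still carry undeclared ones. For example:
Suppose you want to store book information and have declared the fields id, name, and price. When writing a document, you can still include data for a description field.
A relational database would not allow this: you must model the data (define a schema) before storing anything, and writes are strictly constrained by it.
The closest counterpart to a schema in ES is the mapping.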
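The book example can be modeled with a small Java sketch. It assumes a heavily simplified version of dynamic mapping (integers become long, floating-point numbers become float, strings become text); real ES also adds a keyword sub-field for strings and attempts date detection, which is ignored here. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DynamicMappingSketch {
    // Simplified type inference, loosely following ES dynamic mapping defaults (an assumption).
    static String inferType(Object value) {
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Integer || value instanceof Long) return "long";
        if (value instanceof Float || value instanceof Double) return "float";
        if (value instanceof Map) return "object";
        return "text"; // strings; real ES would also add a keyword sub-field
    }

    // Adds every field the mapping does not know yet; declared fields keep their declared type.
    static void applyDocument(Map<String, String> mapping, Map<String, Object> doc) {
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (e.getValue() == null) {
                continue; // null values carry no type information
            }
            mapping.putIfAbsent(e.getKey(), inferType(e.getValue()));
        }
    }

    public static void main(String[] args) {
        // Fields declared up front: id, name, price.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("id", "long");
        mapping.put("name", "text");
        mapping.put("price", "float");

        // The written document carries an extra, undeclared field.
        Map<String, Object> book = new LinkedHashMap<>();
        book.put("id", 1L);
        book.put("name", "Some Book");
        book.put("price", 59.9);
        book.put("description", "a field that was never declared");

        applyDocument(mapping, book);
        System.out.println(mapping); // {id=long, name=text, price=float, description=text}
    }
}
```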
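Search and analytics
- Search: structured queries through the REST API, written in a JSON-style domain-specific language (Query DSL)
- Analytics: aggregation queries that summarize the data, such as averages, medians, and so on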
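Scalability and resilience
ES is a distributed search and analytics engine. Replicas kept on multiple nodes (and clusters) provide fault tolerance. Sharding spreads an index's data fairly evenly across nodes so that no single node is overloaded, which keeps the service available as demand changes.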
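How sharding spreads documents can be illustrated with a short Java sketch. ES picks the primary shard by hashing the routing value (the document ID by default) modulo the number of primary shards. The sketch below assumes String.hashCode as a stand-in for the Murmur3 hash that ES actually uses, so the shard numbers will not match a real cluster.

```java
import java.util.Arrays;

public class ShardRoutingSketch {
    // Picks a shard for a routing value.
    // String.hashCode is an assumption for illustration; ES itself uses a Murmur3 hash.
    static int shardFor(String routing, int numPrimaryShards) {
        // floorMod keeps the result in [0, numPrimaryShards) even for negative hash codes.
        return Math.floorMod(routing.hashCode(), numPrimaryShards);
    }

    public static void main(String[] args) {
        int shards = 3;
        int[] counts = new int[shards];
        // Route 9000 document IDs and count how many land on each shard.
        for (int id = 1; id <= 9000; id++) {
            counts[shardFor(String.valueOf(id), shards)]++;
        }
        System.out.println(Arrays.toString(counts));
    }
}
```

Because the shard is derived from the number of primary shards, that number is fixed when the index is created, while the number of replicas can be changed later.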
II. Installing Elasticsearch
Prerequisites
Kibana is installed alongside ES to make ES easier to work with.
Docker must be installed before you begin.
1. Create a network
docker network create elastic
2. Create the esdatadir directory
mkdir esdatadir
mkdir esdatadir/config
touch esdatadir/config/elasticsearch.yml
mkdir esdatadir/data
mkdir esdatadir/logs
mkdir esdatadir/plugins
# Grant read/write permissions
chmod -R 777 esdatadir
3. Edit elasticsearch.yml
http.host: 0.0.0.0
transport.host: 0.0.0.0
cluster.name: "docker-cluster"
node.name: es01
http.cors.enabled: true
http.cors.allow-origin: "*"
Installing Elasticsearch 7.17.1 with Docker
1. Pull the image
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.17.1
2. Run the container
The container is named es01.
cd esdatadir
docker run -id --name es01 \
-p 9200:9200 \
-p 9300:9300 \
--net elastic \
-v $PWD/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v $PWD/data:/usr/share/elasticsearch/data \
-v $PWD/logs:/usr/share/elasticsearch/logs \
-v $PWD/plugins:/usr/share/elasticsearch/plugins \
-e "discovery.type=single-node" \
docker.elastic.co/elasticsearch/elasticsearch:7.17.1
cd ../
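3. Verify that it is running
Wait roughly 10 seconds for startup to complete, then run:
# curl -XGET http://ip:9200
# The command below reads the host IP from the enp0s3 interface; adjust the interface name for your machine.
curl -XGET "http://$(ifconfig enp0s3 | head -n2 | grep inet | awk '{print$2}'):9200"
The result looks roughly like this: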
{
"name" : "ea912245d40f",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "VpQjM1qHQyup2DUxdJu0mQ",
"version" : {
"number" : "7.17.1",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "e5acb99f822233d62d6444ce45a4543dc1c8059a",
"build_date" : "2022-02-23T22:20:54.153567231Z",
"build_snapshot" : false,
"lucene_version" : "8.11.1",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
Installing elasticsearch-analysis-ik
1. Download the archive
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.17.1/elasticsearch-analysis-ik-7.17.1.zip
2. Unpack it
mkdir ik
unzip elasticsearch-analysis-ik-7.17.1.zip -d ik
3. Copy it into the plugins directory
cp -r ik esdatadir/plugins/ik
4. Verify the copy
docker exec -it es01 ls plugins/ik -alh
If the listing shows 8 main entries, the copy succeeded.
5. Restart the container
docker restart es01
Installing Kibana 7.17.1 with Docker
1. Pull the image
docker pull docker.elastic.co/kibana/kibana:7.17.1
2. Run the container
docker run --name kib01 \
--net elastic \
-p 5601:5601 \
-e "ELASTICSEARCH_HOSTS=http://es01:9200" \
docker.elastic.co/kibana/kibana:7.17.1
Press Ctrl+C to exit.
To run it again, just use docker start kib01.
3. Open Kibana
Browse to http://{ip/host}:5601.
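III. Search
Searches can be issued in two ways: through the REST API or through the Java Client.
Front-end UI components can access ES directly by calling the REST API. Back-end code can access ES through the Java Client, which under the hood makes its calls through a REST HTTP client.
3.1 REST API
In Kibana, follow the menu path "Management" -> "Dev Tools" -> "Console" to reach the panel for calling the API. "Help" lists the keyboard shortcuts and explains how to send requests.
Working with an index
These are the requests for creating an index. When creating an index you can specify its settings, its field mappings, and its aliases.
# Create an index with defaults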
PUT /my-index-000001
# Delete an index
DELETE /my-index-000001
# Create an index with settings; static settings such as number_of_shards cannot be updated on a live index
PUT /my-index-000001
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
# Get the settings
GET /my-index-000001/_settings
# Create an index with mappings; common field types include text, long, and so on
PUT /test
{
"mappings": {
"properties": {
"field1": {
"type": "text"
}
}
}
}
# Get the mapping
GET /test/_mapping
# Create an index with aliases
PUT /logs
{
"aliases": {
"<logs_{now/M}>": {}
}
}
# Get the aliases
GET /logs/_alias
Updating the mapping
PUT /my-index-000001/_mapping
{
"properties": {
"email": {
"type": "keyword"
}
}
}
Getting the mapping of a single field
PUT /publications
{
"mappings": {
"properties": {
"id": { "type": "text" },
"title": { "type": "text" },
"abstract": { "type": "text" },
"author": {
"properties": {
"id": { "type": "text" },
"name": { "type": "text" }
}
}
}
}
}
GET /publications/_mapping/field/title
Working with a single document
# Create a document (ID generated automatically)
POST my-index-000001/_doc/
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
# Save a document with the specified ID 1 (creates it, or replaces an existing document with that ID)
PUT my-index-000001/_doc/1
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
# Create a document with the specified ID 2; succeeds only if the index has no document with that ID
PUT my-index-000001/_create/2
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
# Create a document with the specified ID 3; succeeds only if the index has no document with that ID
PUT my-index-000001/_doc/3?op_type=create
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
# Fetch only the _source field
GET my-index-000001/_source/1
# Fetch the whole document
GET my-index-000001/_doc/1
# Update a document (first index the document that the updates below operate on)
PUT test/_doc/1
{
"counter" : 1,
"tags" : ["red"]
}
## counter += 4
POST test/_update/1
{
"script" : {
"source": "ctx._source.counter += params.count",
"lang": "painless",
"params" : {
"count" : 4
}
}
}
## Add a new element, blue, to tags
POST test/_update/1
{
"script": {
"source": "ctx._source.tags.add(params.tag)",
"lang": "painless",
"params": {
"tag": "blue"
}
}
}
## Conditionally remove one element from tags
POST test/_update/1
{
"script": {
"source": "if (ctx._source.tags.contains(params.tag)) { ctx._source.tags.remove(ctx._source.tags.indexOf(params.tag)) }",
"lang": "painless",
"params": {
"tag": "blue"
}
}
}
# Add a new field
POST test/_update/1
{
"script" : "ctx._source.new_field = 'value_of_new_field'"
}
# Add a field through a partial document; an update that changes nothing is detected as a no-op
POST test/_update/1
{
"doc": {
"name": "new_name"
}
}
# Remove a field
POST test/_update/1
{
"script" : "ctx._source.remove('new_field')"
}
# Remove a subfield from an object field
POST test/_update/1
{
"script": "ctx._source['my-object'].remove('my-subfield')"
}
# If the document exists, run the script; otherwise insert the upsert content as a new document
POST test/_update/1
{
"script": {
"source": "ctx._source.counter += params.count",
"lang": "painless",
"params": {
"count": 4
}
},
"upsert": {
"counter": 1
}
}
Working with multiple documents
# Fetch several documents in one request (the deprecated _type field is omitted)
GET /my-index-000001/_mget
{
"docs": [
{
"_id": "1"
},
{
"_id": "2"
}
]
}
GET /my-index-000001/_mget
{
"ids" : ["1", "2"]
}
# Several different operations in one bulk request
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
# Delete all documents that match a query
POST my-index-000001/_delete_by_query?scroll_size=5000
{
"query": {
"term": {
"user.id": "kimchy"
}
}
}
# Update all documents that match a query
POST my-index-000001/_update_by_query
{
"script": {
"source": "ctx._source.count++",
"lang": "painless"
},
"query": {
"term": {
"user.id": "kimchy"
}
}
}
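The body of the _bulk request shown earlier is newline-delimited JSON: one action line, followed by a source line where the action needs one (index, create, update), and the final line must also end with a newline. A minimal Java sketch of assembling such a body follows; it builds the strings by hand and assumes the values need no JSON escaping, so treat it as an illustration rather than production code. The class and helper names are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkBodySketch {
    // Builds an action line such as {"index":{"_index":"test","_id":"1"}}.
    // Assumes index and id contain no characters that need JSON escaping.
    static String action(String op, String index, String id) {
        return "{\"" + op + "\":{\"_index\":\"" + index + "\",\"_id\":\"" + id + "\"}}";
    }

    // Joins the lines with '\n' and terminates the last line with '\n' as well,
    // because the bulk API requires the body to end with a newline.
    static String toNdjson(List<String> lines) {
        StringBuilder body = new StringBuilder();
        for (String line : lines) {
            body.append(line).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add(action("index", "test", "1"));
        lines.add("{\"field1\":\"value1\"}");   // source for the index action
        lines.add(action("delete", "test", "2")); // delete has no source line
        lines.add(action("create", "test", "3"));
        lines.add("{\"field1\":\"value3\"}");   // source for the create action
        System.out.print(toNdjson(lines));
    }
}
```

When such a body is sent outside Kibana, for example with curl, the Content-Type header should be application/x-ndjson (application/json is also accepted).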
Search APIs
# match_all search with pagination
GET /my-index-000001/_search?from=0&size=20
{
"query": {
"match_all": {}
}
}
# term search with pagination
GET /my-index-000001/_search?from=0&size=20
{
"query": {
"term": {
"user.id": "kimchy"
}
}
}
# match search
GET /my-index-000001/_search
{
"query": {
"match": {
"user.id": {
"query": "kimchy"
}
}
}
}
# range search
GET /my-index-000001/_search
{
"query": {
"range": {
"@timestamp": {
"gte": "now-1d/d"
}
}
}
}
# Sorting
GET /my-index-000001/_search
{
"query": {
"match": {
"user.id": {
"query": "kimchy"
}
}
},
"sort": {
"_id": "desc"
}
}
GET /my-index-000001/_search?sort=_id:desc
{
"query": {
"match": {
"user.id": {
"query": "kimchy"
}
}
}
}
# prefix search
GET /my-index-000001/_search
{
"query": {
"prefix": {
"user.id": {
"value": "ki"
}
}
}
}
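# bool search
## must: the clause must match, and it contributes to the score
## must_not: the clause must not match (the complement of must)
## should: the clause may match; it is fine if it does not
## filter: the clause must match but, unlike must, it does not contribute to the score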
GET _search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"term": {
"count": 2
}
}
}
}
}
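The four bool clause types can be modeled in memory to make their roles concrete. The Java sketch below treats each clause as a predicate over a document's fields. The "score" is only a count of matching must and should clauses, which is an assumption made for illustration and has nothing to do with how ES actually computes relevance. All names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class BoolQuerySketch {
    // A term clause: the field must hold exactly the given value.
    static Predicate<Map<String, Object>> term(String field, Object value) {
        return doc -> value.equals(doc.get(field));
    }

    // A document is a hit when every must and filter clause matches and no must_not clause matches.
    static boolean matches(Map<String, Object> doc,
                           List<Predicate<Map<String, Object>>> must,
                           List<Predicate<Map<String, Object>>> filter,
                           List<Predicate<Map<String, Object>>> mustNot) {
        return must.stream().allMatch(c -> c.test(doc))
                && filter.stream().allMatch(c -> c.test(doc))
                && mustNot.stream().noneMatch(c -> c.test(doc));
    }

    // Toy score: only must and should clauses add to it; filter and must_not never do.
    static int toyScore(Map<String, Object> doc,
                        List<Predicate<Map<String, Object>>> must,
                        List<Predicate<Map<String, Object>>> should) {
        int score = 0;
        for (Predicate<Map<String, Object>> c : must) {
            if (c.test(doc)) score++;
        }
        for (Predicate<Map<String, Object>> c : should) {
            if (c.test(doc)) score++;
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("user.id", "kimchy");
        doc.put("count", 2);

        List<Predicate<Map<String, Object>>> none = Collections.emptyList();
        List<Predicate<Map<String, Object>>> filter = Collections.singletonList(term("count", 2));
        List<Predicate<Map<String, Object>>> should = Collections.singletonList(term("user.id", "kimchy"));

        System.out.println("hit:   " + matches(doc, none, filter, none)); // hit:   true
        System.out.println("score: " + toyScore(doc, none, should));      // score: 1
    }
}
```

One detail the sketch leaves out: when a bool query has no must or filter clause at all, at least one should clause has to match for a document to be returned.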
3.2 Java Client
First steps with elasticsearch-java 7.17.1
Add the Maven dependencies
<dependencies>
<dependency>
<groupId>co.elastic.clients</groupId>
<artifactId>elasticsearch-java</artifactId>
<version>7.17.1</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.12.3</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.12.3</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.12.3</version>
</dependency>
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.2</version>
</dependency>
<dependency>
<groupId>jakarta.json</groupId>
<artifactId>jakarta.json-api</artifactId>
<version>2.0.1</version>
</dependency>
</dependencies>
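The application code below connects the Java client to ES, checks whether the products index exists, and creates it (and indexes one document) if it does not. It then runs a series of searches in turn: term, match, match all, prefix, and bool.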
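import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.mapping.Property;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.endpoints.BooleanResponse;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
public class ESNativeClient7Application {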
private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper().enable(SerializationFeature.INDENT_OUTPUT);
public static void main(String[] args) throws IOException, InterruptedException {
// Create the low-level client
try (RestClient restClient = RestClient.builder(
new HttpHost("10.119.6.176", 9200)).build();
// Create the transport with a Jackson mapper
ElasticsearchTransport transport = new RestClientTransport(
restClient, new JacksonJsonpMapper())) {
// And create the API client
ElasticsearchClient client = new ElasticsearchClient(transport);
// Create Index
BooleanResponse resp = client.indices().exists(e -> e.index("products"));
if (!resp.value()) {
client.indices().create(c -> c
.index("products")
.mappings(m -> m
.properties("name", Property.of(o -> o
.text(t -> t
.store(true)
.index(true)
.analyzer("ik_smart"))
)
)
).settings(s -> s
.numberOfShards("3")
.numberOfReplicas("2")
).aliases("<products{now/M}>", a -> a)
);
client.index(c -> c
.index("products")
.id("1")
.document(Product.builder().name("bicycle")
.build()));
}
// Search
SearchResponse<Product> search1 = client.search(s -> s
.index("products")
.query(q -> q
.term(t -> t
.field("name")
.value(v -> v.stringValue("bicycle"))
)),
Product.class);
for (Hit<Product> hit : search1.hits().hits()) {
processProduct(hit.source());
}
SearchResponse<Product> search2 = client.search(s -> s
.index("products")
.query(q -> q
.match(m -> m
.field("name")
.query("bicycle")
)),
Product.class);
for (Hit<Product> hit : search2.hits().hits()) {
processProduct(hit.source());
}
SearchResponse<Product> search3 = client.search(s -> s
.index("products")
.query(q -> q.matchAll(v -> v.queryName("name"))),
Product.class);
for (Hit<Product> hit : search3.hits().hits()) {
processProduct(hit.source());
}
SearchResponse<Product> search4 = client.search(s -> s
.index("products")
.query(q -> q
.prefix(p -> p
.field("name")
.value("bi"))),
Product.class);
for (Hit<Product> hit : search4.hits().hits()) {
processProduct(hit.source());
}
SearchResponse<Product> search5 = client.search(s -> s
.index("products")
.query(q -> q
.bool(b -> b
.must(m -> m
.matchAll(v -> v))
.filter(f -> f
.term(t -> t
.field("name")
.value(v -> v.stringValue("bicycle")))))),
Product.class);
for (Hit<Product> hit : search5.hits().hits()) {
processProduct(hit.source());
}
TimeUnit.SECONDS.sleep(1);
}
}
private static void processProduct(Product source) throws JsonProcessingException {
String jsonStr = OBJECT_MAPPER.writeValueAsString(source);
System.out.println(jsonStr);
}
}
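The code uses the Product entity. Its Lombok annotations require the Lombok dependency, which is not in the dependency list above.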
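import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
@Data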
@Builder
@AllArgsConstructor
@NoArgsConstructor
@JsonIgnoreProperties(ignoreUnknown = true)
public class Product {
@JsonProperty("name")
private String name;
}
Integrating with Spring Boot
Create a Spring Boot Maven project at start.spring.io: choose version 2.6.4, Java 8, and jar packaging.
Add the dependencies:
- spring-boot-starter-web
- spring-boot-configuration-processor
- spring-boot-starter-data-elasticsearch
- spring-boot-starter-test
- lombok
- joda-money 1.0.1
Create the domain model
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Document(indexName = "products", writeTypeHint = WriteTypeHint.DEFAULT)
public class Product {
@Id
private Long id;
@Field(type = FieldType.Text, store = true, analyzer = "ik_smart")
private String name;
@Field(type = FieldType.Long, store = true)
private Money price;
}
The @Document annotation sets the index name, and @Field configures the field mapping.
Create the repository
public interface ProductRepository extends ElasticsearchRepository<Product, Long> {
Product findByName(String name);
}
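ElasticsearchRepository is used much like a JPA repository: define an interface that extends it and add custom query methods as the business requires. The naming convention is the same as in Spring Data JPA. Method names generally start with find, followed by By and the field to filter on; multiple fields are joined with And/Or, and each field can be followed by an operator such as Like, In, or GreaterThan.
Create the service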
@Service
@Slf4j
public class ProductService {
@Resource
private ProductRepository productRepository;
public Optional<Product> queryProductByName(String name) {
Optional<Product> queriedProduct = Optional.ofNullable(productRepository.findByName(name));
queriedProduct.ifPresent(o -> {
log.info("query product by repository: {}", o);
});
return queriedProduct;
}
public void deleteAll() {
productRepository.deleteAll();
log.info("index products deleted all");
}
public void save(Product product) {
productRepository.save(product);
log.info("repository save Product: {}", product);
}
}
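ProductService implements the business logic on top of the repository's read and write operations; the logic here is fairly simple.
Write the application context configuration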
@SpringBootApplication
@EnableElasticsearchRepositories
public class ESSpringClientApplication {
public static void main(String[] args) {
SpringApplication app = new SpringApplicationBuilder()
.sources(ESSpringClientApplication.class)
.web(WebApplicationType.NONE)
.build();
app.run(args);
}
@Bean
public Jackson2ObjectMapperBuilderCustomizer customizer() {
return builder -> builder.indentOutput(true);
}
@Bean
public ElasticsearchCustomConversions elasticsearchCustomConversions() {
return new ElasticsearchCustomConversions(
Arrays.asList(new NumberToMoney(), new MoneyToNumber()));
}
@Bean
CommandLineRunner run() {
return new ClientRunner();
}
}
Write the read and write converters for the Money type
@WritingConverter
public class MoneyToNumber implements Converter<Money, Number> {
@Override
public Number convert(Money source) {
long value = source.getAmountMinorLong();
return value;
}
}
@ReadingConverter
public class NumberToMoney implements Converter<Number, Money> {
@Override
public Money convert(Number source) {
return Money.ofMinor(CurrencyUnit.of("CNY"), source.longValue());
}
}
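The converters above store a Money value as its amount in minor units (for CNY, fen), which is why the price can be mapped as a long. The arithmetic behind that can be sketched with java.math.BigDecimal alone. This sketch deliberately does not use Joda-Money, and the class name, method names, and the fixed scale of 2 for CNY are assumptions made for illustration.

```java
import java.math.BigDecimal;

public class MinorUnitsSketch {
    // CNY has two decimal places, so 1 yuan = 100 minor units (fen).
    static final int CNY_SCALE = 2;

    // Converts a major-unit amount such as 120.00 into minor units such as 12000.
    // longValueExact throws ArithmeticException if the amount has more than two decimals.
    static long toMinor(BigDecimal amount) {
        return amount.movePointRight(CNY_SCALE).longValueExact();
    }

    // Converts minor units back into a major-unit amount with two decimal places.
    static BigDecimal fromMinor(long minor) {
        return BigDecimal.valueOf(minor, CNY_SCALE);
    }

    public static void main(String[] args) {
        System.out.println(toMinor(new BigDecimal("120.00"))); // 12000
        System.out.println(fromMinor(300000L));                 // 3000.00
    }
}
```

Storing whole minor units avoids floating-point rounding, at the cost of fixing the currency on the read side, as NumberToMoney does with "CNY".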
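The Jackson2ObjectMapperBuilderCustomizer bean enables indented output on the ObjectMapper, which ClientRunner later uses to print JSON.
The ElasticsearchCustomConversions bean registers the custom converters for the Money type: a Money value is converted to a Number when written to ES, and a Number read from ES is converted back to Money.
The CommandLineRunner bean runs its run() method once the application has started.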
@Slf4j
public class ClientRunner implements CommandLineRunner {
@Resource
private ElasticsearchRestTemplate elasticsearchRestTemplate;
@Resource
private ProductService productService;
@Resource
private ObjectMapper objectMapper;
private static final String LINE_SEP = System.getProperty("line.separator");
private ThreadPoolExecutor poolExecutor = new ThreadPoolExecutor(Runtime.getRuntime().availableProcessors() - 1,
Runtime.getRuntime().availableProcessors(), 1, TimeUnit.SECONDS, new ArrayBlockingQueue<>(100));
private CountDownLatch cdl = new CountDownLatch(1);
@Override
public void run(String... args) throws Exception {
productService.deleteAll();
// Prepare the test data
Product product = Product.builder()
.id(1L)
.name("Bicycle")
.price(Money.ofMinor(CurrencyUnit.of("CNY"), 12000))
.build();
Product product2 = Product.builder()
.id(2L)
.name("Motorcycle")
.price(Money.ofMinor(CurrencyUnit.of("CNY"), 300000))
.build();
poolExecutor.execute(() -> {
// [1]
productService.save(product);
productService.queryProductByName("Bicycle");
// [2]
saveProduct(product2);
log.info("Product(id=2) exists: {}", elasticsearchRestTemplate.exists("2", Product.class));
Criteria criteria = new Criteria("name").is("Motorcycle");
CriteriaQuery criteriaQuery = new CriteriaQuery(criteria);
for (SearchHit<Product> hit : elasticsearchRestTemplate.search(criteriaQuery, Product.class).getSearchHits()) {
processProduct(hit.getContent());
}
// [3]
NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
.withQuery(QueryBuilders.matchAllQuery())
.withPageable(PageRequest.of(0, 20))
.withSorts(SortBuilders.fieldSort("price").order(SortOrder.ASC))
.build();
for (SearchHit<Product> hit : elasticsearchRestTemplate.search(nativeSearchQuery, Product.class).getSearchHits()) {
processProduct(hit.getContent());
}
cdl.countDown();
});
cdl.await(1, TimeUnit.MINUTES);
System.exit(0);
}
private void saveProduct(Product product) {
IndexQuery idxQuery = new IndexQueryBuilder()
.withId(String.valueOf(product.getId()))
.withObject(product)
.build();
elasticsearchRestTemplate.index(idxQuery, IndexCoordinates.of("products"));
log.info("template save Product: {}", product);
try {
TimeUnit.SECONDS.sleep(1);
} catch (InterruptedException e) {
log.error(e.getMessage());
return;
}
}
private void processProduct(Product content) {
try {
log.info("query data by template:{}{}", LINE_SEP, objectMapper.writeValueAsString(content));
} catch (JsonProcessingException e) {
log.error(e.getMessage());
return;
}
}
}
Pay particular attention to CriteriaQuery and NativeSearchQuery; with the REST API usage above in mind, they are easy to follow.
