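Elasticsearch Quick Start

1. Introduction

Elasticsearch ("ES" for short) is a distributed search and analytics engine. Logstash and Beats store the data they collect in ES. Kibana provides a visual, interactive way to explore and monitor the data in ES and to build visual reports from it.

  • Elasticsearch: the data warehouse, where the data is stored
  • Logstash/Beats: the warehouse's purchasers and porters, which collect and sort the data
  • Kibana: the warehouse manager, which analyzes the data and presents it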

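The data model in ES: documents and indices

ES serializes data into JSON documents for storage. An index is an optimized collection of documents, and a document is a collection of fields (key-value pairs). A field of the text type is stored in an inverted index, which supports fast full-text search, while numeric and geo fields are stored in BKD trees.
An inverted index lists every unique term that appears in any document, however many times it occurs, and records all the documents that contain it.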
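
To make the inverted index idea concrete, here is a minimal in-memory sketch. The class InvertedIndexSketch and its methods are hypothetical names chosen for illustration, and the lowercase-and-split tokenization is an assumption that is far simpler than a real ES analyzer.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration; not how Lucene stores its index internally.
class InvertedIndexSketch {
    // term -> sorted IDs of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Naive analysis: lowercase, then split on anything that is not a letter or digit.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Looking up a term is a single map access, regardless of how many documents exist.
    SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Elasticsearch is a search engine");
        index.add(2, "Kibana visualizes search results");
        System.out.println(index.lookup("search")); // [1, 2]
        System.out.println(index.lookup("kibana")); // [2]
    }
}
```

Each term is stored once, together with the set of documents it occurs in, which is why a full-text lookup does not need to scan every document.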

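Schema-less

ES is flexible about the schema of the documents you write: you do not have to agree in advance on how many fields a document has or what their types are. Even when you have, a document can by default still carry fields that were never declared. For example:
suppose you want to store book information and have declared the fields id, name, and price in advance. When writing a document you can still include a description field.
A relational database does not allow this: you must model the data (define a schema) before storing anything, and writes are strictly constrained by it.

The closest counterpart to a schema in ES is the mapping.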

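Search and analytics

  • Search: structured queries through the REST API, written in Query DSL, a JSON-style domain-specific language for queries
  • Analytics: aggregation queries that summarize the data, such as averages, medians, and so on

Scalability and resilience

ES is a distributed search and analytics engine. Replica copies kept on multiple nodes and clusters provide disaster tolerance. Sharding spreads one data set fairly evenly across nodes so that no single node is overloaded, and the system stays available as demand changes.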
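
The way sharding spreads documents can be sketched as "hash the routing value, then take it modulo the number of primary shards". The snippet below is only an illustration under stated assumptions: ShardRoutingSketch and shardFor are hypothetical names, and String.hashCode() stands in for the hash function ES actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of hash-based shard routing.
class ShardRoutingSketch {
    // shard = hash(routing value) mod number of primary shards.
    // String.hashCode() is a stand-in; it is not the hash Elasticsearch uses.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int shards = 3;
        Map<Integer, List<String>> placement = new TreeMap<>();
        for (int i = 1; i <= 9; i++) {
            String id = "doc-" + i;
            placement.computeIfAbsent(shardFor(id, shards), s -> new ArrayList<>()).add(id);
        }
        // Print which document IDs landed on which shard.
        placement.forEach((shard, ids) -> System.out.println("shard " + shard + " -> " + ids));
    }
}
```

Because the shard number depends on the shard count, changing the count would send existing IDs to different shards, which is one reason the number of primary shards is fixed when an index is created.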

2. Installing Elasticsearch

Before you install

To make ES easier to work with, install Kibana as well.

Docker must be installed before you begin.

1. Create a network

docker network create elastic

2. Create the esdatadir directory

mkdir esdatadir
mkdir esdatadir/config
touch esdatadir/config/elasticsearch.yml
mkdir esdatadir/data
mkdir esdatadir/logs
mkdir esdatadir/plugins
# Grant read/write permissions (wide open; suitable for a local test setup only)
chmod -R 777 esdatadir

3. Edit elasticsearch.yml

http.host: 0.0.0.0
transport.host: 0.0.0.0
cluster.name: "docker-cluster"
node.name: es01
http.cors.enabled: true
http.cors.allow-origin: "*"

Installing Elasticsearch 7.17.1 with Docker

1. Pull the image

docker pull docker.elastic.co/elasticsearch/elasticsearch:7.17.1

2. Run the container
Name the container es01.

cd esdatadir
docker run -id --name es01 \
-p 9200:9200 \
-p 9300:9300 \
--net elastic \
-v $PWD/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v $PWD/data:/usr/share/elasticsearch/data \
-v $PWD/logs:/usr/share/elasticsearch/logs \
-v $PWD/plugins:/usr/share/elasticsearch/plugins \
-e "discovery.type=single-node" \
docker.elastic.co/elasticsearch/elasticsearch:7.17.1
cd ../

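3. Verify that it is running
Wait about 10 seconds for startup to finish, then:

# General form: curl -XGET http://<ip>:9200
# The command below reads the host IP from the interface enp0s3; substitute your own interface name
curl -XGET "http://$(ifconfig enp0s3 | head -n2 | grep inet | awk '{print $2}'):9200"

The response looks roughly like this: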

{
  "name" : "ea912245d40f",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "VpQjM1qHQyup2DUxdJu0mQ",
  "version" : {
    "number" : "7.17.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "e5acb99f822233d62d6444ce45a4543dc1c8059a",
    "build_date" : "2022-02-23T22:20:54.153567231Z",
    "build_snapshot" : false,
    "lucene_version" : "8.11.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Installing elasticsearch-analysis-ik

1. Download the archive

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.17.1/elasticsearch-analysis-ik-7.17.1.zip 

2. Unzip it

mkdir ik
unzip elasticsearch-analysis-ik-7.17.1.zip -d ik

3. Copy it into the plugins directory

cp -r ik esdatadir/plugins/ik

4. Verify the copy

docker exec -it es01 ls plugins/ik -alh

If the listing shows 8 main entries, the copy succeeded.

5. Restart the container

docker restart es01

Installing Kibana 7.17.1 with Docker

1. Pull the image

docker pull docker.elastic.co/kibana/kibana:7.17.1

2. Run the container

docker run --name kib01 \
--net elastic \
-p 5601:5601 \
-e "ELASTICSEARCH_HOSTS=http://es01:9200" \
docker.elastic.co/kibana/kibana:7.17.1

Press Ctrl+C to exit.
To run it again, just use docker start kib01.

3. Access Kibana
Open http://{ip/host}:5601 in a browser.

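3. Search

You can search in two ways: through the REST API or through the Java Client.
Front-end UI components can reach ES directly by calling the REST API. Back-end code can reach ES through the Java Client, which under the hood makes its calls through a REST HTTP client.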

3.1 REST API

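In Kibana, follow the menu path "Management" -> "Dev Tools" -> "Console" to open the panel for calling the API. "Help" lists the keyboard shortcuts and explains how to send requests.

Working with an index

The create-index request can specify the index settings, the field mappings, and index aliases.

# Create an index with defaults
PUT /my-index-000001
# Delete the index
DELETE /my-index-000001
# Create an index with settings; static settings cannot be updated on an open index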
PUT /my-index-000001
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
# Get the settings
GET /my-index-000001/_settings
# Create an index with mappings; common field types include text, long, and so on
PUT /test
{
  "mappings": {
    "properties": {
      "field1": {
        "type": "text"
      }
    }
  }
}
# Get the mapping
GET /test/_mapping
# Create an index with an alias (alias names support date math)
PUT /logs
{
  "aliases": {
    "<logs_{now/M}>": {}
  }
}
# Get the aliases
GET /logs/_alias
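
The alias name <logs_{now/M}> above uses date math: {now/M} is the current time rounded down to the start of the month, and by default it is rendered in UTC with the format yyyy.MM.dd. The following sketch reproduces that resolution for this one pattern only; DateMathNameSketch and resolveMonthly are hypothetical names, not part of any ES API.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of how a "<prefix{now/M}>" name resolves.
class DateMathNameSketch {
    // Round "now" down to the first day of its month (UTC) and append it as yyyy.MM.dd.
    static String resolveMonthly(String prefix, Instant now) {
        LocalDate firstOfMonth = now.atZone(ZoneOffset.UTC).toLocalDate().withDayOfMonth(1);
        return prefix + firstOfMonth.format(DateTimeFormatter.ofPattern("uuuu.MM.dd"));
    }

    public static void main(String[] args) {
        // A fixed instant keeps the example reproducible.
        System.out.println(resolveMonthly("logs_", Instant.parse("2022-03-21T16:50:00Z"))); // logs_2022.03.01
    }
}
```

The name is resolved once, when the request is processed, so an alias created this way keeps the month in which the index was created.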

Updating the mapping

PUT /my-index-000001/_mapping
{
  "properties": {
    "email": {
      "type": "keyword"
    }
  }
}

Getting the mapping of a field

PUT /publications
{
  "mappings": {
    "properties": {
      "id": { "type": "text" },
      "title": { "type": "text" },
      "abstract": { "type": "text" },
      "author": {
        "properties": {
          "id": { "type": "text" },
          "name": { "type": "text" }
        }
      }
    }
  }
}
GET /publications/_mapping/field/title

Working with a single document

# Create a document (auto-generated ID)
POST my-index-000001/_doc/
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}
# Index a document with ID 1 (creates it, or replaces it if it already exists)
PUT my-index-000001/_doc/1
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}
# Create a document with ID 2; succeeds only if the index has no document with that ID
PUT my-index-000001/_create/2
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}
# Create a document with ID 3; succeeds only if the index has no document with that ID
PUT my-index-000001/_doc/3?op_type=create
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}
# Get only the _source
GET my-index-000001/_source/1
# Get the whole document, metadata included
GET my-index-000001/_doc/1
# Update a document (first index a sample document to work on)
PUT test/_doc/1
{
  "counter" : 1,
  "tags" : ["red"]
}
## counter += 4
POST test/_update/1
{
  "script" : {
    "source": "ctx._source.counter += params.count",
    "lang": "painless",
    "params" : {
      "count" : 4
    }
  }
}
## Append the element blue to tags
POST test/_update/1
{
  "script": {
    "source": "ctx._source.tags.add(params.tag)",
    "lang": "painless",
    "params": {
      "tag": "blue"
    }
  }
}
## Conditionally remove one element from tags
POST test/_update/1
{
  "script": {
    "source": "if (ctx._source.tags.contains(params.tag)) { ctx._source.tags.remove(ctx._source.tags.indexOf(params.tag)) }",
    "lang": "painless",
    "params": {
      "tag": "blue"
    }
  }
}
# Add a field
POST test/_update/1
{
  "script" : "ctx._source.new_field = 'value_of_new_field'"
}
# Add a field with a partial document; an update that changes nothing is detected as a no-op
POST test/_update/1
{
  "doc": {
    "name": "new_name"
  }
}
# Remove a field
POST test/_update/1
{
  "script" : "ctx._source.remove('new_field')"
}
# Remove one subfield from an object field
POST test/_update/1
{
  "script": "ctx._source['my-object'].remove('my-subfield')"
}
# Run the script if the document exists; otherwise index the upsert content
POST test/_update/1
{
  "script": {
    "source": "ctx._source.counter += params.count",
    "lang": "painless",
    "params": {
      "count": 4
    }
  },
  "upsert": {
    "counter": 1
  }
}

Working with multiple documents

# Get several documents in one request
GET /my-index-000001/_mget
{
  "docs": [
    {
      "_type": "_doc",
      "_id": "1"
    },
    {
      "_type": "_doc",
      "_id": "2"
    }
  ]
}
GET /my-index-000001/_mget
{
  "ids" : ["1", "2"]
}
# Bulk: several different operations in one request
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
# Delete all documents that match a query
POST my-index-000001/_delete_by_query?scroll_size=5000
{
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}
# Update all documents that match a query
POST my-index-000001/_update_by_query
{
  "script": {
    "source": "ctx._source.count++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}

Search APIs

# match_all search with pagination
GET /my-index-000001/_search?from=0&size=20
{
  "query": {
    "match_all": {}
  }
}
# term search with pagination
GET /my-index-000001/_search?from=0&size=20
{
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}
# match search
GET /my-index-000001/_search
{
  "query": {
    "match": {
      "user.id": {
        "query": "kimchy"
      }
    }
  }
}
# range search
GET /my-index-000001/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d"
      }
    }
  }
}
# Sorting (in the request body, or with the sort URL parameter)
GET /my-index-000001/_search
{
  "query": {
    "match": {
      "user.id": {
        "query": "kimchy"
      }
    }
  },
  "sort": {
    "_id": "desc"
  }
}
GET /my-index-000001/_search?sort=_id:desc
{
  "query": {
    "match": {
      "user.id": {
        "query": "kimchy"
      }
    }
  }
}
# prefix search
GET /my-index-000001/_search
{
  "query": {
    "prefix": {
      "user.id": {
        "value": "ki"
      }
    }
  }
}
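# bool query
## must: the clause must match, and it contributes to the score
## must_not: the clause must not match (the complement of must)
## should: the clause may match; it is fine if it does not
## filter: the clause must match, but unlike must it does not contribute to the score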
GET _search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "count": 2
        }
      }
    }
  }
}
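
The four clause types of the bool query can be modeled in a few lines of plain Java. This is only a sketch under stated assumptions: BoolQuerySketch, Clause, and term are hypothetical names, documents are plain field maps, and the "score" simply counts matching must and should clauses, which is far cruder than real relevance scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory model of bool query semantics; not part of any Elasticsearch API.
class BoolQuerySketch {
    // A clause is a test against a document represented as a field map.
    interface Clause extends Predicate<Map<String, Object>> {}

    final List<Clause> must = new ArrayList<>();
    final List<Clause> filter = new ArrayList<>();
    final List<Clause> mustNot = new ArrayList<>();
    final List<Clause> should = new ArrayList<>();

    // Exact-value clause, in the spirit of a term query.
    static Clause term(String field, Object value) {
        return doc -> value.equals(doc.get(field));
    }

    // Returns a toy score (one point per matching must or should clause),
    // or -1 when the document is excluded.
    int score(Map<String, Object> doc) {
        for (Clause c : filter) {
            if (!c.test(doc)) return -1;   // filter: required, adds no score
        }
        for (Clause c : mustNot) {
            if (c.test(doc)) return -1;    // must_not: a match excludes the document
        }
        int score = 0;
        for (Clause c : must) {
            if (!c.test(doc)) return -1;   // must: required, adds to the score
            score++;
        }
        int shouldHits = 0;
        for (Clause c : should) {
            if (c.test(doc)) shouldHits++; // should: optional, adds to the score
        }
        // With no must or filter clause, at least one should clause has to match.
        if (must.isEmpty() && filter.isEmpty() && !should.isEmpty() && shouldHits == 0) {
            return -1;
        }
        return score + shouldHits;
    }

    public static void main(String[] args) {
        BoolQuerySketch query = new BoolQuerySketch();
        query.must.add(doc -> true);            // plays the role of match_all
        query.filter.add(term("count", 2));     // plays the role of term count = 2

        Map<String, Object> a = new HashMap<>();
        a.put("count", 2);
        Map<String, Object> b = new HashMap<>();
        b.put("count", 3);

        System.out.println(query.score(a)); // 1
        System.out.println(query.score(b)); // -1
    }
}
```

The example mirrors the request above: every document passes the must clause, and the filter then keeps only those whose count is 2 without changing their score.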

3.2 Java Client

First steps with elasticsearch-java 7.17.1

Add the Maven dependencies

<dependencies>
    <dependency>
        <groupId>co.elastic.clients</groupId>
        <artifactId>elasticsearch-java</artifactId>
        <version>7.17.1</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-core</artifactId>
        <version>2.12.3</version>
    </dependency>

    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.12.3</version>
    </dependency>

    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-annotations</artifactId>
        <version>2.12.3</version>
    </dependency>
    <dependency>
        <groupId>commons-logging</groupId>
        <artifactId>commons-logging</artifactId>
        <version>1.2</version>
    </dependency>
    <dependency>
        <groupId>jakarta.json</groupId>
        <artifactId>jakarta.json-api</artifactId>
        <version>2.0.1</version>
    </dependency>
</dependencies>

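Write the application code. It shows the Java client connecting to ES, checking whether the products index exists, and creating it if it does not. It then runs a series of searches in turn: term, match, match all, and so on.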

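// Imports, assuming the package layout of elasticsearch-java 7.17.x; adjust them if your version differs
import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.mapping.Property;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.endpoints.BooleanResponse;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class ESNativeClient7Application {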

    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper().enable(SerializationFeature.INDENT_OUTPUT);

    public static void main(String[] args) throws IOException, InterruptedException {
        // Create the low-level client
        try (RestClient restClient = RestClient.builder(
                new HttpHost("10.119.6.176", 9200)).build();
             // Create the transport with a Jackson mapper
             ElasticsearchTransport transport = new RestClientTransport(
                     restClient, new JacksonJsonpMapper())) {

            // And create the API client
            ElasticsearchClient client = new ElasticsearchClient(transport);

            // Create Index
            BooleanResponse resp = client.indices().exists(e -> e.index("products"));
            if (!resp.value()) {
                client.indices().create(c -> c
                        .index("products")
                        .mappings(m -> m
                                .properties("name", Property.of(o -> o
                                                .text(t -> t
                                                        .store(true)
                                                        .index(true)
                                                        .analyzer("ik_smart"))
                                        )
                                )
                        ).settings(s -> s
                                .numberOfShards("3")
                                .numberOfReplicas("2")
                        ).aliases("<products{now/M}>", a -> a)
                );

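                // Note: a newly indexed document becomes searchable only after the next
                // index refresh (about 1 second by default), so on the very first run
                // the searches below may return no hits.
                client.index(c -> c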
                        .index("products")
                        .id("1")
                        .document(Product.builder().name("bicycle")
                                .build()));
            }
            // Search
            SearchResponse<Product> search1 = client.search(s -> s
                            .index("products")
                            .query(q -> q
                                    .term(t -> t
                                            .field("name")
                                            .value(v -> v.stringValue("bicycle"))
                                    )),
                    Product.class);

            for (Hit<Product> hit : search1.hits().hits()) {
                processProduct(hit.source());
            }

            SearchResponse<Product> search2 = client.search(s -> s
                            .index("products")
                            .query(q -> q
                                    .match(m -> m
                                            .field("name")
                                            .query("bicycle")
                                    )),
                    Product.class);

            for (Hit<Product> hit : search2.hits().hits()) {
                processProduct(hit.source());
            }

            SearchResponse<Product> search3 = client.search(s -> s
                            .index("products")
                            .query(q -> q.matchAll(v -> v.queryName("name"))),
                    Product.class);

            for (Hit<Product> hit : search3.hits().hits()) {
                processProduct(hit.source());
            }

            SearchResponse<Product> search4 = client.search(s -> s
                            .index("products")
                            .query(q -> q
                                    .prefix(p -> p
                                            .field("name")
                                            .value("bi"))),
                    Product.class);
            for (Hit<Product> hit : search4.hits().hits()) {
                processProduct(hit.source());
            }

            SearchResponse<Product> search5 = client.search(s -> s
                            .index("products")
                            .query(q -> q
                                    .bool(b -> b
                                            .must(m -> m
                                                    .matchAll(v -> v))
                                            .filter(f -> f
                                                    .term(t -> t
                                                            .field("name")
                                                            .value(v -> v.stringValue("bicycle")))))),
                    Product.class);
            for (Hit<Product> hit : search5.hits().hits()) {
                processProduct(hit.source());
            }

            TimeUnit.SECONDS.sleep(1);
        }
    }
    
    private static void processProduct(Product source) throws JsonProcessingException {
        String jsonStr = OBJECT_MAPPER.writeValueAsString(source);
        System.out.println(jsonStr);
    }
}

The code uses the Product entity:

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@JsonIgnoreProperties(ignoreUnknown = true)
public class Product {
    @JsonProperty("name")
    private String name;
}

Integrating with Spring Boot

Create a Spring Boot Maven project at start.spring.io: choose version 2.6.4, JDK 8, and jar packaging.
Add the dependencies:

  • spring-boot-starter-web
  • spring-boot-configuration-processor
  • spring-boot-starter-data-elasticsearch
  • spring-boot-starter-test
  • lombok
  • joda-money 1.0.1

Create the domain model

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Document(indexName = "products", writeTypeHint = WriteTypeHint.DEFAULT)
public class Product {

    @Id
    private Long id;

    @Field(type = FieldType.Text, store = true, analyzer = "ik_smart")
    private String name;

    @Field(type = FieldType.Long, store = true)
    private Money price;
}

The @Document annotation configures the index name, and @Field configures the mapping.

Create the repository

public interface ProductRepository extends ElasticsearchRepository<Product, Long> {
    Product findByName(String name);
}

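ElasticsearchRepository is used much like a JPA repository: define an interface that extends it and add custom query methods as the business requires. The naming convention is the same as in Spring Data JPA. A method name usually starts with find, followed by By and the fields to filter on; multiple fields are joined with And/Or, and each field can be followed by an operator such as Like, In, or GreaterThan.

Create the service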

@Service
@Slf4j
public class ProductService {
    @Resource
    private ProductRepository productRepository;

    public Optional<Product> queryProductByName(String name) {
        Optional<Product> queriedProduct = Optional.ofNullable(productRepository.findByName(name));
        queriedProduct.ifPresent(o -> {
            log.info("query product by repository: {}", o);
        });

        return queriedProduct;
    }

    public void deleteAll() {
        productRepository.deleteAll();
        log.info("index products deleted all");
    }

    public void save(Product product) {
        productRepository.save(product);
        log.info("repository save Product: {}", product);
    }
}

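ProductService builds the business logic on top of the repository's read and write operations; the logic here is quite simple.

Write the application context configuration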

@SpringBootApplication
@EnableElasticsearchRepositories
public class ESSpringClientApplication {
    public static void main(String[] args) {
        SpringApplication app = new SpringApplicationBuilder()
                .sources(ESSpringClientApplication.class)
                .web(WebApplicationType.NONE)
                .build();
        app.run(args);
    }

    @Bean
    public Jackson2ObjectMapperBuilderCustomizer customizer() {
        return builder -> builder.indentOutput(true);
    }

    @Bean
    public ElasticsearchCustomConversions elasticsearchCustomConversions() {
        return new ElasticsearchCustomConversions(
                Arrays.asList(new NumberToMoney(), new MoneyToNumber()));
    }

    @Bean
    CommandLineRunner run() {
        return new ClientRunner();
    }
}

Write the read and write converters for the Money type

@WritingConverter
public class MoneyToNumber implements Converter<Money, Number> {
    @Override
    public Number convert(Money source) {
        long value = source.getAmountMinorLong();
        return value;
    }
}

@ReadingConverter
public class NumberToMoney implements Converter<Number, Money> {
    @Override
    public Money convert(Number source) {
        return Money.ofMinor(CurrencyUnit.of("CNY"), source.longValue());
    }
}
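
The two converters above store a Money value as its amount in minor units (for CNY, 1 yuan = 100 minor units) and rebuild it on read. The arithmetic behind that round trip can be sketched with the JDK alone; MinorUnitsSketch, toMinor, and fromMinor are hypothetical names, and the sketch does not use joda-money.

```java
import java.math.BigDecimal;
import java.util.Currency;

// Hypothetical illustration of the minor-unit round trip; does not use joda-money.
class MinorUnitsSketch {
    // Major amount -> minor units, e.g. 120.00 CNY -> 12000.
    // Throws ArithmeticException if the amount has more decimals than the currency allows.
    static long toMinor(BigDecimal amount, Currency currency) {
        return amount.movePointRight(currency.getDefaultFractionDigits()).longValueExact();
    }

    // Minor units -> major amount, e.g. 12000 -> 120.00 CNY.
    static BigDecimal fromMinor(long minor, Currency currency) {
        return BigDecimal.valueOf(minor, currency.getDefaultFractionDigits());
    }

    public static void main(String[] args) {
        Currency cny = Currency.getInstance("CNY");
        long stored = toMinor(new BigDecimal("120.00"), cny);
        System.out.println(stored);                 // 12000
        System.out.println(fromMinor(stored, cny)); // 120.00
    }
}
```

Storing a whole number of minor units keeps the ES field a plain long, so sorting and range queries on price work without floating-point rounding concerns.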

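The Jackson2ObjectMapperBuilderCustomizer bean enables indented output on the ObjectMapper, which ClientRunner later uses to print JSON.
The ElasticsearchCustomConversions bean registers the custom converters for the Money type: a Money is converted to a Number when it is written to ES, and a Number read from ES is converted back to a Money.
The CommandLineRunner bean has its run() method executed once the application has started.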

@Slf4j
public class ClientRunner implements CommandLineRunner {

    @Resource
    private ElasticsearchRestTemplate elasticsearchRestTemplate;

    @Resource
    private ProductService productService;

    @Resource
    private ObjectMapper objectMapper;

    private static final String LINE_SEP = System.getProperty("line.separator");

    private ThreadPoolExecutor poolExecutor = new ThreadPoolExecutor(Runtime.getRuntime().availableProcessors() - 1,
            Runtime.getRuntime().availableProcessors(), 1, TimeUnit.SECONDS, new ArrayBlockingQueue<>(100));

    private CountDownLatch cdl = new CountDownLatch(1);

    @Override
    public void run(String... args) throws Exception {
        productService.deleteAll();
        // Prepare the data
        Product product = Product.builder()
                .id(1L)
                .name("Bicycle")
                .price(Money.ofMinor(CurrencyUnit.of("CNY"), 12000))
                .build();

        Product product2 = Product.builder()
                .id(2L)
                .name("Motorcycle")
                .price(Money.ofMinor(CurrencyUnit.of("CNY"), 300000))
                .build();

        poolExecutor.execute(() -> {
            // [1]
            productService.save(product);
            productService.queryProductByName("Bicycle");

            // [2]
            saveProduct(product2);
            log.info("Product(id=2) exists: {}", elasticsearchRestTemplate.exists("2", Product.class));

            Criteria criteria = new Criteria("name").is("Motorcycle");
            CriteriaQuery criteriaQuery = new CriteriaQuery(criteria);
            for (SearchHit<Product> hit : elasticsearchRestTemplate.search(criteriaQuery, Product.class).getSearchHits()) {
                processProduct(hit.getContent());
            }
            // [3]
            NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
                    .withQuery(QueryBuilders.matchAllQuery())
                    .withPageable(PageRequest.of(0, 20))
                    .withSorts(SortBuilders.fieldSort("price").order(SortOrder.ASC))
                    .build();
            for (SearchHit<Product> hit : elasticsearchRestTemplate.search(nativeSearchQuery, Product.class).getSearchHits()) {
                processProduct(hit.getContent());
            }

            cdl.countDown();
        });
        cdl.await(1, TimeUnit.MINUTES);
        System.exit(0);
    }

    private void saveProduct(Product product) {
        IndexQuery idxQuery = new IndexQueryBuilder()
                .withId(String.valueOf(product.getId()))
                .withObject(product)
                .build();
        elasticsearchRestTemplate.index(idxQuery, IndexCoordinates.of("products"));
        log.info("template save Product: {}", product);
        try {
            TimeUnit.SECONDS.sleep(1);
        } catch (InterruptedException e) {
            log.error(e.getMessage());
            return;
        }
    }

    private void processProduct(Product content) {
        try {
            log.info("query data by template:{}{}", LINE_SEP, objectMapper.writeValueAsString(content));
        } catch (JsonProcessingException e) {
            log.error(e.getMessage());
            return;
        }
    }
}

Pay particular attention to CriteriaQuery and NativeSearchQuery; after the REST API examples earlier, they should be easy to follow.

posted @ 2022-03-21 16:50  槎城侠客