elasticsearch

1. ES API usage: https://www.cnblogs.com/sunny1009/articles/7887568.html

Source code walkthrough: https://cloud.tencent.com/developer/article/1154813

Tuning article: https://cloud.tencent.com/developer/article/1156231

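1. Solr vs. Elasticsearch

Summary:

1. Both are easy to install.

2. Solr relies on ZooKeeper for distributed coordination, whereas Elasticsearch has distributed coordination built in.

3. Solr accepts more input formats, such as JSON, XML, and CSV, whereas Elasticsearch accepts only JSON.

4. Solr ships more features out of the box, whereas Elasticsearch focuses on core functionality and leaves many advanced features to third-party plugins.

5. Solr performs better than Elasticsearch in traditional search applications, but is noticeably less efficient for real-time search.

6. Solr is a solid solution for traditional search applications, while Elasticsearch is better suited to newer real-time search applications.

7. Solr concentrates on text search, while Elasticsearch is also commonly used for querying, filtering, and grouped analytics.

2. Elastic is built on the open-source library Lucene, requires a Java 8 environment, and listens on port 9200 by default.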

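3. By default Elastic only accepts connections from the local machine. To allow remote access, edit config/elasticsearch.yml in the Elastic installation directory, uncomment network.host, set its value to 0.0.0.0, and restart Elastic.

network.host: 0.0.0.0 (anyone can connect; in production, change this to a specific IP)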

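3. Elastic is essentially a distributed database that lets multiple servers work together; each server can run multiple Elastic instances.

Elastic indexes every field and, after processing, writes the result to an inverted index. Searches look up that index directly.

A single record in an Index is called a Document. Many Documents make up an Index.

Documents can be grouped by Type, a virtual logical grouping used to filter Documents. Different Types should have similar structures (schemas).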
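
To make the inverted index idea concrete, here is a minimal sketch in plain Java. It is not how Lucene stores its index; the class name InvertedIndexSketch and the naive whitespace/punctuation splitting are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Elasticsearch or Lucene.
final class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    private final List<String> documents = new ArrayList<>();

    // Stores the document and records each lowercased word in the postings map.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Looks the term up in the index directly instead of scanning every document.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Elasticsearch is built on Lucene");
        index.add("Solr is built on Lucene too");
        index.add("Elasticsearch speaks JSON");
        System.out.println(index.search("lucene"));        // prints [0, 1]
        System.out.println(index.search("elasticsearch")); // prints [0, 2]
    }
}
```

The point of the structure is that a search cost depends on the number of matching terms, not on the number of documents stored.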

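4. mapping: comparable to data types in a statically typed language. It defines how input data is turned into searchable index terms (the set of instructions for making data searchable).

A mapping refers to one or more analyzers, and each analyzer is made up of a tokenizer plus one or more filters. When ES indexes a document, it passes each field's content to the corresponding analyzer, which in turn passes it through its filters.

The standard analyzer (the default) has three filters: the standard token filter, the lowercase filter, and the stop token filter.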
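
As a rough illustration of an analyzer handing field content through a chain of filters, the sketch below tokenizes a string, lowercases the tokens, and drops stop words. The class AnalyzerSketch and its stop-word list are assumptions for this example; the real analyzers live in Lucene and behave differently in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an analyzer; not the Lucene implementation.
final class AnalyzerSketch {
    // Assumed stop-word list, chosen only for this example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "is", "the"));

    // Step 1: split the raw field value into tokens on non-letter, non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Step 2: the lowercase filter normalizes case.
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase());
        }
        return out;
    }

    // Step 3: the stop filter drops tokens found in the stop-word list.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // The analyzer is simply the steps applied in order.
    static List<String> analyze(String text) {
        return removeStopWords(lowercase(tokenize(text)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick Fox is a Bad Manz")); // prints [quick, fox, bad, manz]
    }
}
```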

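5. Index: sharding (shard routing strategy) -> pick the node that holds the primary shard and index the document there -> replicate to the nodes that hold the replica shards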
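
The indexing flow can be sketched as follows. Elasticsearch chooses the shard from a hash of the routing value (the document id by default) modulo the number of primary shards; this sketch substitutes String.hashCode() for the real hash function, and the ShardRoutingSketch and ShardGroup names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of routing a write to a primary shard and its replicas.
final class ShardRoutingSketch {
    // Maps a document id to a primary shard number.
    // Assumption: String.hashCode() stands in for the hash Elasticsearch really uses.
    static int shardFor(String docId, int primaryShards) {
        return Math.floorMod(docId.hashCode(), primaryShards);
    }

    // One primary copy plus its replica copies, each modeled as a plain list of ids.
    static final class ShardGroup {
        final List<String> primary = new ArrayList<>();
        final List<List<String>> replicas = new ArrayList<>();

        ShardGroup(int replicaCount) {
            for (int i = 0; i < replicaCount; i++) {
                replicas.add(new ArrayList<>());
            }
        }

        // Write to the primary first, then copy the write to every replica.
        void index(String docId) {
            primary.add(docId);
            for (List<String> replica : replicas) {
                replica.add(docId);
            }
        }
    }

    public static void main(String[] args) {
        int primaryShards = 3;
        List<ShardGroup> shards = new ArrayList<>();
        for (int i = 0; i < primaryShards; i++) {
            shards.add(new ShardGroup(2)); // two replicas per primary
        }
        for (String id : new String[] {"1", "2", "3", "4"}) {
            int shard = shardFor(id, primaryShards);
            shards.get(shard).index(id);
            System.out.println("doc " + id + " -> shard " + shard);
        }
    }
}
```

Because the shard number depends on the count of primary shards, that count cannot be changed freely after the index is created without moving documents.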

Search: pick a node from the replica set (load-balancing strategy) -> fan the request out -> merge the result sets
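
A minimal sketch of the search flow, under simplifying assumptions: a round-robin choice stands in for the load-balancing strategy, each shard is assumed to return scored hits, and the coordinating side merges them and keeps the best ones. ScatterGatherSketch, Hit, and pickCopy are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of fanning a search out to shards and merging the results.
final class ScatterGatherSketch {
    static final class Hit {
        final String docId;
        final double score;

        Hit(String docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Round-robin choice among the copies (primary and replicas) of one shard.
    static int pickCopy(int requestNumber, int copies) {
        return requestNumber % copies;
    }

    // Merges the per-shard hit lists and keeps the `size` highest-scoring hits.
    static List<Hit> merge(List<List<Hit>> perShardHits, int size) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShardHits) {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(size, all.size())));
    }

    public static void main(String[] args) {
        List<List<Hit>> perShard = Arrays.asList(
                Arrays.asList(new Hit("a", 0.9), new Hit("b", 0.2)),
                Arrays.asList(new Hit("c", 0.7)));
        for (Hit hit : merge(perShard, 2)) {
            System.out.println(hit.docId + " " + hit.score);
        }
    }
}
```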

6. Concepts such as Index and Document

================================================================================

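1. Data categories:

Structured (relational databases, full-text search): tables with a fixed number of fields and fixed field types

Unstructured: text documents, images, video, music

Semi-structured: JSON, HTML, XML

2. Built on Lucene and its inverted index: each Field can be configured as stored or not, indexed or not, and tokenized or not

Add a user and grant permissions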

[root@node01 ~]# mkdir /opt/lzx/es

[root@node01 lzx]# chown lzx:lzx es

[root@node01 lzx]# yum install unzip -y    # install unzip

[root@node01 es]# su lzx

[lzx@node01 es]$ unzip elasticsearch-2.2.1.zip

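3. ES built-in endpoints:

4. Install the head plugin for a friendlier front end to the REST API:

[root@node02 bin]# chown -R lzx /opt/lzx/es/    # give lzx ownership of the whole directory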

[lzx@node01 bin]$ ./plugin install mobz/elasticsearch-head

[root@node01 es]# scp -r elasticsearch-2.2.1/ node02:`pwd`

5. Java API usage: https://www.cnblogs.com/sunny1009/articles/7887568.html

6. ES admin console: http://192.168.50.11:9200/_plugin/head/

Crawl web pages: wget -o /tmp/wget.log -P /root/data --no-parent --no-verbose -m -D news.cctv.com -N --convert-links --random-wait -A html,HTML,shtml,SHTML http://news.cctv.com

@Before
public void createConn() throws Exception {
    System.out.printf("---create----------");
    Settings settings = Settings.settingsBuilder().put("cluster.name", "lzxcluster").build();
    client = TransportClient.builder().settings(settings).build().
            addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("node01"), 9300)).
            addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("node02"), 9300)).
            addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("node03"), 9300));
}

@After
public void closeConn() {
    System.out.printf("---close----------");
    client.close();
}

@Test
public void createIndex() {
    System.out.printf("---create--index--------");
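    // Check whether the index already exists
    IndicesExistsResponse indicesExistsResponse = client.admin().indices().prepareExists("lzxtest").execute().actionGet();
    if (indicesExistsResponse.isExists()) {
        // actionGet() blocks until the delete has finished, so the create below cannot race with it
        client.admin().indices().prepareDelete("lzxtest").execute().actionGet();
    }
    Map<String, Object> sets = new HashMap<>();
    // Set the number of replicas to 2
    sets.put("number_of_replicas", 2);
    // Wait for the create to complete before the test ends and the client is closed
    client.admin().indices().prepareCreate("lzxtest").setSettings(sets).execute().actionGet();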
}

@Test
public void addData(){
    Map<String,Object> dataMap=new HashMap<>();
    dataMap.put("name","aaa");
    dataMap.put("content","wqlwx is a bad manz");
    dataMap.put("size",28);
    // prepareIndex(index, type)
    IndexResponse indexResponse=client.prepareIndex("lzxtest","testfields")
            .setSource(dataMap)
            .execute().actionGet();
    System.out.println("id:"+indexResponse.getId());
}

@Test
public void queryData(){
    QueryBuilder queryBuilder=new MatchQueryBuilder("content","lzx");
    SearchResponse searchResponse=client.prepareSearch("lzxtest")
            .setTypes("testfields")
            .setQuery(queryBuilder)
            .execute()
            .actionGet();
    SearchHits searchHits= searchResponse.getHits();
    System.out.println("Total hits: "+searchHits.getTotalHits());
    for (SearchHit searchHit:searchHits){
        System.out.println("Full source: "+searchHit.getSourceAsString());
        System.out.println("Content: "+searchHit.getSource().get("content"));
    }
}

@Test
public void queryDataByPage(){
    QueryBuilder queryBuilder=new MatchQueryBuilder("content","lzx");
    SearchResponse searchResponse=client.prepareSearch("lzxtest")
            .setTypes("testfields")
            .addHighlightedField("content") // highlight matches in the content field
            .setHighlighterPreTags("<font color=red>")
            .setHighlighterPostTags("</font>")
            .setQuery(queryBuilder)
            .setFrom(0)   // offset of the first hit
            .setSize(2)   // return two hits
            .execute()
            .actionGet();
    SearchHits searchHits= searchResponse.getHits();
    System.out.println("Total hits: "+searchHits.getTotalHits());
    for (SearchHit searchHit:searchHits){
        System.out.println("Full source: "+searchHit.getSourceAsString());
        System.out.println("Content: "+searchHit.getSource().get("content"));
        System.out.println("Highlighted content: "+searchHit.getHighlightFields().get("content").getFragments()[0]);
    }
}
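
What from/size paging and the highlighter pre/post tags do can be sketched without a cluster. The code below is only an approximation: the real highlighter works on analyzed tokens, while this hypothetical PagingHighlightSketch uses a case-insensitive whole-word regex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of from/size paging and tag-based highlighting.
final class PagingHighlightSketch {
    // Returns at most `size` hits starting at offset `from`.
    static <T> List<T> page(List<T> hits, int from, int size) {
        if (from >= hits.size()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(hits.subList(from, Math.min(from + size, hits.size())));
    }

    // Wraps every whole-word, case-insensitive occurrence of `term` in the given tags.
    static String highlight(String text, String term, String preTag, String postTag) {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(term) + "\\b", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            matcher.appendReplacement(out, Matcher.quoteReplacement(preTag + matcher.group() + postTag));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("doc1", "doc2", "doc3");
        System.out.println(page(hits, 0, 2)); // prints [doc1, doc2]
        System.out.println(highlight("lzx wrote this", "lzx", "<font color=red>", "</font>"));
        // prints <font color=red>lzx</font> wrote this
    }
}
```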

1. Dependencies

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>5.5.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>5.5.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-analysis-ik</artifactId>
    <version>5.5.1</version>
</dependency>

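1. Connecting to Elasticsearch from a Java client: the client version should match the server version

TransportClient: connects to the ES cluster as an external client. It is slated to be replaced by the Java High Level REST Client, which sends HTTP requests instead of serialized Java requests.

NodeClient: joins the ES cluster as a node, so the other nodes are aware of it

XPackTransportClient: for servers that have the x-pack plugin installed

2. Connecting with TransportClient:

/**
 * cluster.name: the name of the ES cluster to connect to
 * client.transport.sniff: automatically sniff the cluster state and add the IPs of the other ES nodes to the client's local node list
 * PreBuiltTransportClient: client initialization differs from older versions; the constructor has several overloads, for example for initializing plugins.
 * */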
Settings esSettings = Settings.builder()
        .put("cluster.name", clusterName) 
        .put("client.transport.sniff", true) 
        .build();
client = new PreBuiltTransportClient(esSettings);
client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(ip), esPort));

/**
 * If the Elasticsearch server has the x-pack plugin installed, a PreBuiltXPackTransportClient instance is required to access it
 * */
Settings settings = Settings.builder().put("cluster.name", "xxx")
        .put("xpack.security.transport.ssl.enabled", false)
        .put("xpack.security.user", "xxx:xxx")
        .put("client.transport.sniff", true).build();
try {
    client = new PreBuiltXPackTransportClient(settings)
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("xxx.xxx.xxx.xxx"), 9300))
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("xxx.xxx.xxx.xxx"), 9300));
} catch (UnknownHostException e) {
    e.printStackTrace();
}

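2. The Index API lets us store a JSON document and make it searchable. A document is uniquely identified by its index, type, and id. We can supply the id ourselves or let the Index API generate one.

Four ways to build the JSON document:

a. Manually, using a native byte[] or String

b. Using a Map, which is automatically converted to the equivalent JSON

c. Using a third-party library such as Jackson to serialize beans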

CsdnBlog csdn=new CsdnBlog();
csdn.setTag("C");
csdn.setView("100");
csdn.setTitile("Programming");
csdn.setDate(new Date().toString());
ObjectMapper mapper = new ObjectMapper();
byte[] json = mapper.writeValueAsBytes(csdn);
IndexResponse response = client.prepareIndex("fendo", "fendodate").setSource(json).get();

d. Using the built-in helper XContentFactory.jsonBuilder()

XContentBuilder builder = XContentFactory.jsonBuilder().startObject()
        .field("user", "ccse")
        .field("postDate", new Date())
        .field("message", "this is Elasticsearch").endObject();
IndexResponse response = client.prepareIndex("fendo", "fendodata").setSource(builder).get();

You can also add arrays with the startArray(String) and endArray() methods. The .field() method accepts many object types: you can pass it numbers, dates, and even other XContentBuilder objects.
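
For the Map approach (option b), it may help to see what "converted to the equivalent JSON" amounts to. The sketch below is a hand-rolled, flat converter written only for illustration; MapToJsonSketch is a hypothetical class, it handles only strings, numbers, booleans, and null, and it is not the conversion code the client library uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of turning a flat Map into a JSON object string.
final class MapToJsonSketch {
    // Escapes the characters that may not appear raw inside a JSON string.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Numbers, booleans, and null are written bare; everything else becomes a quoted string.
    static String toJson(Map<String, Object> doc) {
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> entry : doc.entrySet()) {
            if (!first) {
                json.append(',');
            }
            first = false;
            json.append('"').append(escape(entry.getKey())).append("\":");
            Object value = entry.getValue();
            if (value == null || value instanceof Number || value instanceof Boolean) {
                json.append(value);
            } else {
                json.append('"').append(escape(value.toString())).append('"');
            }
        }
        return json.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "aaa");
        doc.put("size", 28);
        System.out.println(toJson(doc)); // prints {"name":"aaa","size":28}
    }
}
```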

2. Get API:

Fetch a document by id. Setting operationThreaded to true runs the operation on a different thread.

GetResponse response = client.prepareGet("twitter", "tweet", "1").setOperationThreaded(false).get();

3. Delete API:

DeleteResponse response = client.prepareDelete("twitter", "tweet", "1").setOperationThreaded(false).get();

4. Delete by query:

BulkByScrollResponse response =
        DeleteByQueryAction.INSTANCE.newRequestBuilder(client).filter(QueryBuilders.matchQuery("gender", "male")) // query condition
                .source("persons") // index name
                .get(); // execute
long deleted = response.getDeleted(); // number of deleted documents

Asynchronous version:

DeleteByQueryAction.INSTANCE.newRequestBuilder(client)
        .filter(QueryBuilders.matchQuery("gender", "male")) // query condition
        .source("persons") // index name
        .execute(new ActionListener<BulkByScrollResponse>() { // callback listener
            @Override
            public void onResponse(BulkByScrollResponse response) {
                long deleted = response.getDeleted(); // number of deleted documents
            }
            @Override
            public void onFailure(Exception e) {
            }
        });

Upsert: update or insert

IndexRequest indexRequest = new IndexRequest("index", "type", "1").source(jsonBuilder()
        .startObject()
        .field("name", "Joe Smith")
        .field("gender", "male")
        .endObject());
UpdateRequest updateRequest = new UpdateRequest("index", "type", "1").doc(jsonBuilder()
        .startObject()
        .field("gender", "male")
        .endObject())
        .upsert(indexRequest); // if the document does not exist, `indexRequest` is indexed instead
client.update(updateRequest).get();
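
The upsert semantics can be shown with an in-memory stand-in: if the document exists, the partial doc is merged into it; if not, the upsert document is stored as-is. UpsertSketch is a hypothetical class, and the shallow putAll merge is a simplification of how nested fields are merged by the real API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory illustration of update-or-insert semantics.
final class UpsertSketch {
    private final Map<String, Map<String, Object>> store = new HashMap<>();

    // Returns true when a new document was created, false when an existing one was updated.
    boolean upsert(String id, Map<String, Object> partialDoc, Map<String, Object> upsertDoc) {
        Map<String, Object> existing = store.get(id);
        if (existing == null) {
            // Document is missing: store the upsert document and ignore the partial doc.
            store.put(id, new HashMap<>(upsertDoc));
            return true;
        }
        // Document exists: merge the partial doc into it (shallow merge in this sketch).
        existing.putAll(partialDoc);
        return false;
    }

    Map<String, Object> get(String id) {
        return store.get(id);
    }

    public static void main(String[] args) {
        UpsertSketch index = new UpsertSketch();

        Map<String, Object> full = new HashMap<>();
        full.put("name", "Joe Smith");
        full.put("gender", "male");

        Map<String, Object> partial = new HashMap<>();
        partial.put("age", 30);

        System.out.println(index.upsert("1", partial, full)); // prints true (document created)
        System.out.println(index.get("1").containsKey("age")); // prints false (partial doc was not applied)
        System.out.println(index.upsert("1", partial, full)); // prints false (document updated)
        System.out.println(index.get("1").get("age"));         // prints 30
    }
}
```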

Multi Get: fetch several documents at once

MultiGetResponse multiGetItemResponses = client.prepareMultiGet()
        .add("twitter", "tweet", "1") // a single id
        .add("twitter", "tweet", "2", "3", "4") // several ids at once
        .add("another", "type", "foo") // documents can come from another index too
        .get();
for (MultiGetItemResponse itemResponse : multiGetItemResponses) { // iterate over the results
    GetResponse response = itemResponse.getResponse();
    if (response.isExists()) { // check that the document exists
        String json = response.getSourceAsString(); // the _source field
    }
}

posted on 2018-03-06 18:19 xiaojiayu0011