elasticsearch-jdbc调研

1.之前有个项目需要从mysql/oralce数据库中的数据同步到elasticsearch中,之前的实现方式是逐条读写,将五六张表,转换成elasticsearch的一个文档。

部署步骤:
以2017年3月20号最新版V2.3.4.1为例:
操作系统;
Ubuntu14.04
1.下载

wget http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.4.1/elasticsearch-jdbc-2.3.4.1-dist.zip

wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.3.3/elasticsearch-2.3.3.tar.gz

2.解压

tar –zxvf elasticsearch-2.3.4.tar.gz
unzip elasticsearch-jdbc-2.3.4.1-dist.zip

3.修改文件名称

4.设置环境变量。
vi /etc/profile 
export JDBC_IMPORTER_HOME=/jmhan/elasticsearch-jdbc

使环境变量生效: 
source /etc/profile

假设有一张表book
字段如下

5.添加 执行脚本
vi mysql_import_es.sh 
内容:

!/bin/sh

bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
echo '{
"type" : "jdbc",
"jdbc": {
"elasticsearch.autodiscover":true,
"elasticsearch.cluster":"test_cluster",
"url":"jdbc:mysql://127.0.0.1:3306/res", #mysql数据库地址
"user":"root",
"password":"root",
"sql":"select * from book",
"elasticsearch" : {
"host" : "127.0.0.1",
"port" : 9300
},
"index" : "test_index", #新的index
"type" : "test_index" #新的type
}
}'| java
-cp "${lib}/*"
-Dlog4j.configurationFile=${bin}/log4j2.xml
org.xbib.tools.Runner
org.xbib.tools.JDBCImporter

6.为 mysql_import_es.sh 添加可执行权限。 
chmod a+x mysql_import_es.sh

7.head插件下载
./bin/plugin install mobz/elasticsearch-head
8.启动 root用户
./bin/elasticsearch -Des.insecure.allow.root=true
9.执行shell脚本
./mysql_import_es.sh

7.验证
http://172.31.6.18:9200/_plugin/head/

二、扩展

1.mysql数据变化后自动更新es中个数据,定时器

给shell脚本增加定时器。

2.数据新增
之前如果在shell脚本里没有的sql没有没有加限制是多次执行shell脚本es数据会多次叠加的(亲测),所以在实际应用在需要加上数据insert的具体时间,同步的sql修改为:

vi mysql_import_es.sh 
内容:

!/bin/sh

bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
echo '{
"type" : "jdbc",
"jdbc": {
"elasticsearch.autodiscover":true,
"elasticsearch.cluster":"test_cluster",
"url":"jdbc:mysql://127.0.0.1:3306/res", #mysql数据库地址
"user":"root",
"password":"root",
"sql":"select * from book",
"elasticsearch" : {
"host" : "127.0.0.1",
"port" : 9300
},
"index" : "test_index", #新的index
"type" : "test_index" #新的type
}
}'| java
-cp "${lib}/*"
-Dlog4j.configurationFile=${bin}/log4j2.xml
org.xbib.tools.Runner
org.xbib.tools.JDBCImporter

{

"type" : "jdbc",

"jdbc" : {

"password" : "",

"index" : "my_jdbc_index",

"statefile" : "statefile.json",

"metrics" : { 

  "lastexecutionstart" : "2017-03-21T06:37:09.165Z",

  "lastexecutionend" : "2017-03-21T06:37:09.501Z",

  "counter" : "1" 

},  

"type" : "my_jdbc_type",

"user" : "",

"url" : "jdbc:mysql://127.0.0.1:3306/res",

"sql" : [ { 

  "statement" : "select * from td_a where time > ?", 

  "parameter" : [ "$metrics.lastexecutionstart" ]

} ] 

}
}
}

2.很多表

新增shell脚本
其他方法待调研

3.es中field 名称可以指定

vi mysql_import_es.sh

select a.id,a.name,a.age from a
select a.id,a.name,a.age,b.math,b.classroom from a,b

4.官网说的,新增field的时候最好加上这次字段
There are column labels with an underscore as prefix that are mapped to special Elasticsearch document parameters for indexing:
_index the index this object should be indexed into
_
_type the type this object should be indexed into
_
_id the id of this object
_
_version the version of this object
_
_parent the parent of this object
_
_ttl the time-to-live of this object
_
_routing the routing of this object

...

/时间分割线2017/03/21***************/

posted @ 2017-03-22 12:18  jmhan  阅读(147)  评论(0)    收藏  举报