Bulk Inserting into ES with Python
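When you insert data into ES (Elasticsearch) from Python one document at a time, throughput is poor because every document costs its own request. In that case, use ES's bulk feature, which sends many documents in a single request.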
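To make the idea concrete, here is a minimal sketch of what a bulk request carries: the `_bulk` endpoint takes newline-delimited JSON in which each document line is preceded by an action line. The function name `build_bulk_body`, the index name `demo_index`, and the sample documents are hypothetical and only serve the illustration; in practice the elasticsearch-py bulk helper serializes the actions for you.

```python
import json


def build_bulk_body(index_name, docs):
    """Serialize docs into newline-delimited JSON for the _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line: index the document on the next line into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical sample documents (addresses from the documentation range).
docs = [
    {"ip": "192.0.2.1", "port": "80"},
    {"ip": "192.0.2.2", "port": "443"},
]
body = build_bulk_body("demo_index", docs)
print(body)
```

Two documents become four lines in one request body, instead of two separate index requests.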
The example script below reads a masscan result file, extracts each IP and port, attaches a timestamp, and inserts the records into ES.
#!/usr/bin/python
# coding=utf-8
import json
import time

from elasticsearch import Elasticsearch
from elasticsearch import helpers

# "xx.xx.xx.xx" and "xx" are placeholders for your cluster's host and port.
es = Elasticsearch(
    [{"host": "xx.xx.xx.xx", "port": "xx"}])
print(es.info())

# Build the @timestamp value. The trailing "Z" denotes UTC, so the current
# time is formatted with gmtime() rather than localtime().
time_format = time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime())
print(time_format)

ip_ports = []


# Extract the ip:port pairs from masscan.json
def handle_masscan(target):
    with open(target, 'r') as f:
        for line in f:
            # Only record lines start with "{ "; other lines are skipped.
            if line.startswith('{ '):
                # Strip the newline and the optional trailing comma before parsing.
                temp = json.loads(line.strip().rstrip(','))
                ip = str(temp["ip"]).strip()
                port = str(temp["ports"][0]["port"]).strip()
                ip_ports.append([ip, port])


# Decorator that prints how long the wrapped function took
def timer(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        res = func(*args, **kwargs)
        print('Took about {:.2f} seconds'.format(time.time() - start))
        return res
    return wrapper
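

@timer
def gen():
    actions = []
    for line in ip_ports:
        # Build one bulk action per ip:port pair
        action = {
            "_index": "server_port_info_2020_q4",
            # Mapping types are deprecated in ES 7 and removed in ES 8;
            # drop "_type" on newer clusters.
            "_type": "doc",
            "_source": {
                "ip": line[0],
                "port": line[1],
                "@timestamp": time_format,
            }
        }
        actions.append(action)
    # Send all actions to ES through the bulk helper
    helpers.bulk(es, actions)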


if __name__ == '__main__':
    target = '../port_info_2_es/masscan.json'
    handle_masscan(target)
    gen()
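
The script above collects every action in a list before sending it. For large scan files, one possible variation is to pass a generator instead: `helpers.bulk` accepts any iterable of actions and sends them in chunks (`chunk_size`, 500 by default), so the whole file never has to sit in memory. The sketch below is an illustration under assumptions, not the original author's code: the names `iter_actions`, `bulk_index_file`, and `demo_index` are hypothetical, the sample lines only imitate the record shape the script expects, and `_type` is left out.

```python
import json
import time


def iter_actions(lines, index_name, timestamp):
    """Yield one bulk action per masscan record line."""
    for line in lines:
        # Skip anything that is not a record line (for example "[" and "]").
        if not line.startswith('{ '):
            continue
        record = json.loads(line.strip().rstrip(','))
        yield {
            "_index": index_name,
            "_source": {
                "ip": str(record["ip"]).strip(),
                # As in the script above, only the first port entry is used.
                "port": str(record["ports"][0]["port"]).strip(),
                "@timestamp": timestamp,
            },
        }


def bulk_index_file(es, path, index_name, chunk_size=1000):
    """Stream a masscan file into ES; returns what helpers.bulk returns."""
    # Imported lazily so iter_actions can be tried without the client installed.
    from elasticsearch import helpers
    timestamp = time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime())
    with open(path, 'r') as f:
        return helpers.bulk(es, iter_actions(f, index_name, timestamp),
                            chunk_size=chunk_size)


# Hypothetical sample lines, used only to exercise the parser.
sample = [
    '[\n',
    '{ "ip": "192.0.2.10", "ports": [ {"port": 80, "proto": "tcp"} ] },\n',
    '{ "ip": "192.0.2.11", "ports": [ {"port": 443, "proto": "tcp"} ] }\n',
    ']\n',
]
actions = list(iter_actions(sample, "demo_index", "2020-12-01T00:00:00.000Z"))
print(len(actions), actions[0]["_source"])
```

With a real client, `bulk_index_file(es, '../port_info_2_es/masscan.json', 'server_port_info_2020_q4')` would take the place of the `handle_masscan` and `gen` calls.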
References:
Elasticsearch - Writing data in bulk with Python:
https://www.cnblogs.com/Neeo/articles/10788573.html
Using Python elasticsearch bulk to quickly insert data into Elasticsearch in batches:
https://blog.csdn.net/weixin_39198406/article/details/82983256
Bulk helpers:
https://elasticsearch-py.readthedocs.io/en/7.10.0/helpers.html