Welcome to starnight_cyber's blog

Bulk Inserting into ES with Python

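  Inserting data into ES one document at a time from Python is slow, because every document costs a separate request. For larger volumes, use ES's bulk insert feature instead.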
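  To make the difference concrete, here is a rough sketch that only counts round trips and needs no ES connection. The `chunked` helper and the 1200 sample documents are hypothetical; the batch size of 500 is chosen to match what I understand to be the default `chunk_size` of `helpers.bulk`.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# 1200 made-up documents, standing in for parsed scan results
docs = [{"ip": "10.0.0.{}".format(i % 256), "port": "80"} for i in range(1200)]

# One request per document when inserting one by one
one_by_one_requests = len(docs)
# One request per batch when inserting in bulk
bulk_requests = sum(1 for _ in chunked(docs, 500))

print(one_by_one_requests, bulk_requests)  # prints: 1200 3
```

  The real saving depends on network latency and document size, so treat the numbers as an illustration of request counts only.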

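  The sample code below reads a masscan output file, extracts the IP and port of each record, adds a timestamp taken when the script runs, and inserts the results into ES.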

#!/usr/bin/python
# coding=utf-8

import json
import time
from elasticsearch import Elasticsearch
from elasticsearch import helpers

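# Replace the placeholder host and port with those of your ES node
es = Elasticsearch(
    [{"host": "xx.xx.xx.xx", "port": "xx"}])

# Print the cluster info to confirm the connection works
print(es.info())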


# Build the @timestamp value. The trailing "Z" marks the time as UTC,
# so take the current time with gmtime() rather than localtime()
time_format = time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime())
print(time_format)


ip_ports = []


# Extract the ip:port pairs from masscan.json
def handle_masscan(target):
    with open(target, 'r') as f:
        for line in f:
            if line.startswith('{ '):
                # Strip the trailing newline and comma so the line parses as a
                # standalone JSON object, whether or not the comma is present
                temp = json.loads(line.rstrip().rstrip(','))
                ip = str(temp["ip"]).strip()
                port = str(temp["ports"][0]["port"]).strip()
                ip_port = [ip, port]
                ip_ports.append(ip_port)


def timer(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        res = func(*args, **kwargs)
        print('Took about {:.2f} seconds'.format(time.time() - start))
        return res

    return wrapper

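@timer
def gen():
    actions = []
    for line in ip_ports:
        # Build one bulk action per ip:port pair
        action = {
            "_index": "server_port_info_2020_q4",
            # Mapping types are deprecated in ES 7.x and removed in 8.x;
            # drop "_type" when targeting a newer cluster
            "_type": "doc",
            "_source": {
                "ip": line[0],
                "port": line[1],
                "@timestamp": time_format,
            }
        }
        actions.append(action)
    # Send all actions to ES through the bulk helper
    helpers.bulk(es, actions)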


if __name__ == '__main__':
    target = '../port_info_2_es/masscan.json'
    handle_masscan(target)
    gen()
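The script above collects every action in a list before sending. Since `helpers.bulk` accepts any iterable of actions, a generator can be passed instead, which keeps memory use flat for large scan files. The following is a minimal sketch of that variant, not a drop-in replacement: `parse_masscan_line`, `iter_actions`, and `bulk_insert` are hypothetical names, and `_type` is left out on the assumption that the target cluster no longer needs it (add it back if yours does).

```python
import json


def parse_masscan_line(line):
    """Return (ip, port) for one masscan record line, or None for other lines."""
    text = line.strip().rstrip(',')
    if not text.startswith('{'):
        return None  # e.g. the enclosing "[" and "]" lines
    try:
        record = json.loads(text)
        return str(record["ip"]), str(record["ports"][0]["port"])
    except (ValueError, KeyError, IndexError):
        return None  # skip anything that is not a complete record


def iter_actions(path, index_name, timestamp):
    """Lazily yield one bulk action per record in the masscan file."""
    with open(path, 'r') as f:
        for line in f:
            parsed = parse_masscan_line(line)
            if parsed is None:
                continue
            ip, port = parsed
            yield {
                "_index": index_name,
                "_source": {"ip": ip, "port": port, "@timestamp": timestamp},
            }


def bulk_insert(es_client, path, index_name, timestamp):
    """Send the actions with helpers.bulk; needs the elasticsearch package."""
    # Imported here so the parsing functions work without the package installed
    from elasticsearch import helpers
    return helpers.bulk(es_client, iter_actions(path, index_name, timestamp))
```

With the names from the script above, the call would look like `bulk_insert(es, target, "server_port_info_2020_q4", time_format)`. The helper still sends the documents to ES in batches; only the action list is no longer held in memory all at once.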

References:

  Elasticsearch - Bulk writing data with Python:

    https://www.cnblogs.com/Neeo/articles/10788573.html

  Using Python elasticsearch bulk to quickly insert data into Elasticsearch in batches:

    https://blog.csdn.net/weixin_39198406/article/details/82983256

  Bulk helpers:

    https://elasticsearch-py.readthedocs.io/en/7.10.0/helpers.html

posted @ 2021-02-05 10:35  starnight_cyber