Adding new search features to a web search system built on Elasticsearch and Flask

Preface

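It has been 20 days since my last blog post. In that time I added new search features to the system I built earlier. The earlier search system is described in this post: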

Internship work summary: downloading and importing the Elasticsearch module to implement automatic comparison
https://timtian.blog.csdn.net/article/details/124319573?spm=1001.2014.3001.5502

Let's look at the result first:

System demo

Video: demo of the search system built on Flask and ES

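In the video above you can see that we implemented multi-field search. entity_id and entity_type are searched separately, but the two templates are essentially the same: both have a rename function (for naming the output file) and both support adding and removing fields.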

Implementing field add/remove on the front end

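The first thing to get right is that clicking the "-" button on a row must delete that row and no other. My idea is to do this with the parentNode property of the DOM tree in JS. First we need to know what the page's HTML looks like: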

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>search entity id</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='css/bootstrap.min.css') }}">
    <script src="//cdn.bootcss.com/jquery/1.11.3/jquery.min.js"></script>
    <script src="{{ url_for('static', filename='js/bootstrap.bundle.min.js') }}"></script>
</head>
<body>
<nav aria-label="breadcrumb">
  <ol class="breadcrumb">
    <li class="breadcrumb-item"><a href="{{ url_for('index') }}">Home</a></li>
    <li class="breadcrumb-item active" aria-current="page">entity_id</li>
    <li class="breadcrumb-item"><a href="{{ url_for('entity_type') }}">entity_type</a></li>
  </ol>
</nav>
<form method="post" enctype="multipart/form-data" action="{{ url_for('search_entity_id') }}">
    <div class="jumbotron">
    <h1 class="display-4">Enter Entity_id</h1>
    <hr class="my-4">
<div class="input-group mb-3">
  <select class="custom-select" name="luoji_1">
    <option value="AND" selected>AND</option>
    <option value="OR">OR</option>
    <option value="NOT">NOT</option>
  </select>
  <div class="input-group-prepend">
    <span class="input-group-text">entity_id</span>
  </div>
  <input name="entity_id_1" type="text" class="form-control" aria-describedby="basic-addon1">
       <div class="input-group-append" id="1">
           <span class="input-group-text" id="add">+</span>
  </div><br/></div><div class="input-group mb-3" id="flag1">
</div>

  <div class="input-group mb-3">
  <div class="input-group-prepend">
    <span class="input-group-text" id="basic-addon3" >Rename your file</span>
  </div>
  <input type="text" class="form-control" id="basic-url" name="rename">
</div>
    <hr class="my-4">
    <input class="btn btn-primary btn-lg" type="submit" onclick="getNumber()">
    <ul>
    {% for message in get_flashed_messages() %}
        <div class="alert alert-warning alert-dismissible fade show" role="alert">
            {{ message }}
  <button type="button" class="close" data-dismiss="alert" aria-label="Close">
    <span aria-hidden="true">&times;</span>
  </button>

</div>
    {% endfor %}
    </ul>
    </div>

</form>
</body>
</html>

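As you can see, the button of a row (the "+" in the first row; added rows carry a "-" in the same position) sits two levels below the row's div, so the row is the grandparent of the "-" button. We can therefore get the great-grandparent node of "-" and remove the grandparent from it. The code is as follows: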

function jian(obj){
    // obj is the clicked "-" span: its parent is the input-group-append div
    // (which carries the row id) and its grandparent is the whole row.
    var parentOfGrandparent = obj.parentNode.parentNode.parentNode;
    var grandparent = obj.parentNode.parentNode;
    parentOfGrandparent.removeChild(grandparent);

    // Remove the deleted row's number from the array a.
    var aim = Number(obj.parentNode.id);
    var pos = a.indexOf(aim);
    if (pos !== -1) {   // only splice when the number was actually found
        a.splice(pos, 1);
    }
}

You will notice an extra piece:

    var aim = Number(obj.parentNode.id);
    var pos = a.indexOf(aim);
    if (pos !== -1) {
        a.splice(pos, 1);
    }

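This piece removes the row number from the array a. The front end has to interact with the back end, and the back end needs to know which rows were deleted and which were kept. We store the row numbers in an array that records the rows currently on the page, and the step above removes the number of the deleted row: it looks up aim, the number of the deleted row, in a and removes the matching entry. aim is read from the id attribute of the "-" button's parent node, so every row must be given this id when it is created, for use at deletion time.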

Next, the JS code that adds a row:

var flag1 = document.getElementById("flag1");  // container that receives the added rows
var add = document.getElementById("add");      // the "+" button of the first row
var cnt = 1;                                    // highest row number handed out so far
var a = [];                                     // numbers of the rows currently on the page
a.push(cnt);
add.onclick = function(){
    cnt++;
    a.push(cnt);

    // The row itself
    var div_tmp = document.createElement("div");
    div_tmp.setAttribute("class", "input-group mb-3");
    flag1.appendChild(div_tmp);

    // Logic operator drop-down, named luoji_<row number>
    var select = document.createElement("select");
    select.setAttribute("class", "custom-select");
    select.setAttribute("name", "luoji_" + cnt);
    div_tmp.appendChild(select);

    var option1 = document.createElement("option");
    option1.setAttribute("value", "AND");
    option1.innerHTML = "AND";
    select.appendChild(option1);

    var option2 = document.createElement("option");
    option2.setAttribute("value", "OR");
    option2.innerHTML = "OR";
    select.appendChild(option2);

    var option3 = document.createElement("option");
    option3.setAttribute("value", "NOT");
    option3.innerHTML = "NOT";
    select.appendChild(option3);

    // "entity_id" label
    var div_tmp3 = document.createElement("div");
    div_tmp3.setAttribute("class", "input-group-prepend");
    div_tmp.appendChild(div_tmp3);

    var span_tmp = document.createElement("span");
    span_tmp.setAttribute("class", "input-group-text");
    span_tmp.innerHTML = "entity_id";
    div_tmp3.appendChild(span_tmp);

    // Text input, named entity_id_<row number>
    var input_tmp = document.createElement("input");
    input_tmp.setAttribute("class", "form-control");
    input_tmp.setAttribute("type", "text");
    input_tmp.setAttribute("name", "entity_id_" + cnt);
    div_tmp.appendChild(input_tmp);

    // Wrapper of the "-" button; its id is the row number that jian() reads
    var div_tmp2 = document.createElement("div");
    div_tmp2.setAttribute("class", "input-group-append");
    div_tmp2.setAttribute("id", cnt + "");
    div_tmp.appendChild(div_tmp2);

    // The "-" button
    var span_tmp3 = document.createElement("span");
    span_tmp3.setAttribute("class", "input-group-text");
    span_tmp3.innerHTML = "-";
    span_tmp3.setAttribute("onclick", "jian(this)");
    div_tmp2.appendChild(span_tmp3);
}

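Rows are also created through the DOM tree. There is not much to explain here: a new row is built by mirroring the HTML of the existing first row. What matters are the attributes. Their numbering comes from cnt, which is incremented every time a row is added, so it tells us which row number is being created. Suppose the page has five rows; then a is [1, 2, 3, 4, 5]. If the third row is deleted, a becomes [1, 2, 4, 5]. Adding another row now gives [1, 2, 4, 5, 6], not [1, 2, 4, 5, 3]. In short, a number that has been deleted is never reused.

The back end can then use a to get the id of every row that currently exists on the page, and from there the content of each field. All field names are tied to the id, for example entity_id_5 and luoji_5: luoji_5 is the logic operator (NOT, AND, or OR) of row 5, and entity_id_5 is the entity_id entered in row 5.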
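To make the bookkeeping concrete, here is a small Python model of what cnt and the array a do in the browser. It is only a sketch: the real logic lives in the JS above, and the class name RowTracker and its methods are hypothetical names introduced for illustration.

```python
class RowTracker:
    """Python model of the browser-side `cnt` counter and array `a`."""

    def __init__(self):
        self.cnt = 1      # the first row always exists
        self.rows = [1]   # plays the role of array a

    def add(self):
        # A new row always gets a fresh number; deleted numbers are never reused.
        self.cnt += 1
        self.rows.append(self.cnt)
        return self.cnt

    def remove(self, row_id):
        # Drop the number only if it is actually present.
        if row_id in self.rows:
            self.rows.remove(row_id)

    def field_names(self):
        # Form fields the back end would read for the rows that still exist.
        return [(f"luoji_{n}", f"entity_id_{n}") for n in self.rows]


tracker = RowTracker()
for _ in range(4):
    tracker.add()      # rows: [1, 2, 3, 4, 5]
tracker.remove(3)      # rows: [1, 2, 4, 5]
tracker.add()          # rows: [1, 2, 4, 5, 6]
print(tracker.rows)
print(tracker.field_names()[-1])
```

Because the counter only ever grows, a field name such as entity_id_3 can never refer to two different rows during the lifetime of the page.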

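There is one more piece of JS, which sends an ajax request to deliver the contents of the array a to the back end. The code is below; I won't go into detail.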

function getNumber(){
    // Send the numbers of the rows that still exist (array a) to the back end.
    $.ajax({
        type: 'POST',
        url: '/testGet',
        dataType: 'json',
        data: {
            'array': a
        },
        success: function(res){
            console.log(res)
        },
        error: function(){
            console.log('error')
        }
    })
}
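The post does not show the /testGet handler, so the following is only a sketch of how the row numbers could be decoded on the Python side. With its default settings jQuery serializes an array field under the key array[] (URL-encoded as array%5B%5D), so that is the key to look for. The helper name parse_row_ids is hypothetical; inside a Flask view the same values should be available through request.form.getlist('array[]').

```python
from urllib.parse import parse_qs


def parse_row_ids(form_body):
    """Extract the row numbers from a URL-encoded ajax request body."""
    fields = parse_qs(form_body)
    # jQuery names an array field "array[]"; accept plain "array" as a fallback.
    values = fields.get("array[]") or fields.get("array") or []
    return [int(v) for v in values]


# Body that jQuery would send for a = [1, 2, 4] ("%5B%5D" is the encoded "[]").
body = "array%5B%5D=1&array%5B%5D=2&array%5B%5D=4"
row_ids = parse_row_ids(body)
print(row_ids)
```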

The entity_type page is implemented in a similar way, so I won't repeat it here.

Back-end implementation

The code is as follows:

import csv
import subprocess

from elasticsearch import Elasticsearch
from flask import flash, render_template, request

# `app` (the Flask application) and `lis` are defined elsewhere and are not
# shown in this post. `lis` is presumably the list of live row numbers that
# the /testGet handler receives from the ajax call above.

# Logic operator chosen on the page -> Elasticsearch bool clause.
CLAUSES = {"AND": "must", "OR": "should", "NOT": "must_not"}


@app.route('/search_entity_id/', methods=["POST", "GET"])
def search_entity_id():
    if request.method == "POST":
        entity_id = []
        luoji = []
        for num in lis:
            # str() so that integer row numbers work as well as strings
            entity_id.append(request.form["entity_id_" + str(num)])
            luoji.append(request.form["luoji_" + str(num)])
        rename = request.form["rename"]
        search_by_entity_id(entity_id, luoji, rename)
    return render_template('search_entity_id.html')


def build_query(entity_id, luoji):
    """Translate the AND / OR / NOT rows into an ES bool query."""
    bool_query = {"must": [], "should": [], "must_not": []}
    for value, op in zip(entity_id, luoji):
        clause = CLAUSES.get(op)
        if clause is not None:
            bool_query[clause].append({"match": {"entities.entity_id": value}})
    return {"bool": bool_query}


def search_by_entity_id(entity_id, luoji, rename):
    es = Elasticsearch(hosts=['http://52.14.194.191:9200'], request_timeout=60)
    index = "pubmed-paper-index-2"
    query = build_query(entity_id, luoji)
    for op in luoji:
        if op in CLAUSES:
            flash(op)  # echo the chosen operators back to the page

    # First request: only used to learn the total number of hits.
    res = es.search(index=index, body={"query": query, "track_total_hits": True})
    total_value = res['hits']['total']['value']

    # request.form yields "" (never None) for an empty field, so test truthiness.
    path = f"query/{rename}.csv" if rename else "query/data.csv"

    page_size = 10
    with open(path, 'a', newline='') as f:
        writer = csv.writer(f)
        # "from" is a document offset, so step by page_size; this also covers
        # the last partial page. Note that from + size is capped by
        # index.max_result_window (10,000 by default).
        for offset in range(0, total_value, page_size):
            body = {
                "query": query,
                "track_total_hits": True,
                "from": offset,
                "size": page_size,
            }
            results = es.search(index=index, body=body)
            for result in results['hits']['hits']:
                try:
                    source = result['_source']
                    for entity in source['entities']:
                        writer.writerow([
                            source['pmid'],
                            source['title'],
                            source['abstract'],
                            entity['entity_string'],
                            entity['entity_type'],
                            entity['entity_id'],
                            str(entity['span']),
                        ])
                except KeyError:
                    # Skip hits that lack an expected field.
                    continue

    # Copy the single result file to S3. --recursive is only for directories,
    # and passing an argument list (no shell) keeps the user-supplied file
    # name from being interpreted as a shell command.
    subprocess.run(["aws", "s3", "cp", path, "s3://meta-adhoc/nlp/es/"], check=False)
    flash('Over!!!')

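The back-end logic maps AND, NOT and OR to the ES must, must_not and should clauses respectively and then runs the search. The results are written to a CSV file at the corresponding location on the EC2 instance, which is then copied to S3. It is not very complicated.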
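As a worked example of this mapping, the sketch below turns three form rows into a request body and prints it. It runs without an Elasticsearch connection. The names CLAUSE_FOR and to_bool_query are hypothetical, and the entity IDs are made up for illustration.

```python
import json

# Operator chosen on the page -> Elasticsearch bool clause.
CLAUSE_FOR = {"AND": "must", "OR": "should", "NOT": "must_not"}


def to_bool_query(rows):
    """rows is a list of (operator, entity_id) pairs, one per form row."""
    clauses = {"must": [], "should": [], "must_not": []}
    for operator, entity_id in rows:
        clause = CLAUSE_FOR.get(operator)
        if clause is None:
            continue  # ignore unknown operators
        clauses[clause].append({"match": {"entities.entity_id": entity_id}})
    return {"query": {"bool": clauses}}


# Made-up entity IDs, for illustration only.
rows = [("AND", "MESH:D000001"), ("OR", "MESH:D000002"), ("NOT", "MESH:D000003")]
query = to_bool_query(rows)
print(json.dumps(query, indent=2))
```

One thing to keep in mind with this mapping: when a bool query contains a must clause, its should clauses are optional by default and only influence scoring. If an OR row is meant to be required alongside AND rows, adding "minimum_should_match": 1 to the bool query is one way to get that behavior.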


That's all for this post.

END

posted @ 2022-06-21 15:51  TIM3347_Tian