Case-insensitive search on Elasticsearch keyword fields with a custom normalizer

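Elasticsearch's keyword field type stores text that should not be analyzed: the whole value is treated as a single term at search time, and the field can also be used for sorting. However, keyword matching is case-sensitive, which is inconvenient when storing values such as GUIDs or MD5 hashes. A custom normalizer makes the field case-insensitive, but it has to be defined in the settings and mapping when the index is created. For example: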

PUT /test_index
{
    "settings": {
        "analysis": {
            "normalizer": {
                "case_insensitive_normalizer": {
                    "type": "custom",
                    "filter": [ "lowercase" ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "md5": {
                "type": "keyword",
                "normalizer": "case_insensitive_normalizer"
            }
        }
    }
}
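
To make the mechanism concrete, here is a minimal Python sketch of what a lowercase normalizer does to a keyword field: the value is normalized once when it is indexed and again when it is used in a query, so both sides end up in the same case. This is only a toy model, not Elasticsearch code. The names KeywordFieldSim and lowercase_normalizer are made up for illustration, and str.lower() is assumed to be a close enough stand-in for the lowercase filter.

```python
from typing import Callable, Dict, Optional, Set


def lowercase_normalizer(value: str) -> str:
    # Stand-in for the "lowercase" filter; str.lower() is an approximation.
    return value.lower()


class KeywordFieldSim:
    """Toy model of a keyword field: one term per value, exact term lookup."""

    def __init__(self, normalizer: Optional[Callable[[str], str]] = None) -> None:
        self.normalizer = normalizer or (lambda v: v)
        self.terms: Dict[str, Set[int]] = {}   # indexed term -> doc ids
        self.source: Dict[int, str] = {}       # doc id -> original value

    def index(self, doc_id: int, value: str) -> None:
        self.source[doc_id] = value
        # The normalizer runs at index time...
        self.terms.setdefault(self.normalizer(value), set()).add(doc_id)

    def search(self, value: str) -> Set[int]:
        # ...and again on the query value at search time.
        return self.terms.get(self.normalizer(value), set())


plain = KeywordFieldSim()                       # keyword without normalizer
folded = KeywordFieldSim(lowercase_normalizer)  # keyword with normalizer
for field in (plain, folded):
    field.index(1, "92609868BE899029B0453F98A17BF51B")

query = "92609868be899029b0453f98a17bf51b"
print(plain.search(query))   # no hit: the terms differ in case
print(folded.search(query))  # hit: both sides were lowercased
```

Note that in this model the original value is kept unchanged in source while the indexed term is lowercased. Elasticsearch behaves the same way, which is why aggregations on a normalized keyword field return the lowercased values.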

 

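With this mapping, md5 values can be found regardless of the case used when they were written or when they are queried. This approach is not very convenient, though: every new index needs its mapping created up front, which is a poor fit for setups that create one index per date. In that case, an index template containing dynamic_templates can apply the settings uniformly.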

PUT /_template/default
{
    "order": -1,
    "index_patterns": [ "*" ],
    "settings": {
        "analysis": {
            "normalizer": {
                "case_insensitive_normalizer": {
                    "type": "custom",
                    "filter": [ "lowercase" ]
                }
            }
        }
    },
    "mappings": {
        "dynamic_templates": [
            {
                "id_as_keyword": {
                    "mapping": {
                        "type": "keyword",
                        "normalizer": "case_insensitive_normalizer"
                    },
                    "match_mapping_type": "string",
                    "path_match": "*.id"
                }
            },
            {
                "md5_as_keyword": {
                    "mapping": {
                        "type": "keyword",
                        "normalizer": "case_insensitive_normalizer"
                    },
                    "match_mapping_type": "string",
                    "match": "md5"
                }
            },
            {
                "strings_as_text": {
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 64,
                                "normalizer": "case_insensitive_normalizer"
                            }
                        }
                    }
                }
            }
        ]
    }
}
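
One detail of the third dynamic template is worth spelling out: ignore_above does not decide whether the .keyword subfield exists, only which values get indexed into it. The following Python sketch models that rule under the assumption that length is counted in characters. The function name keyword_subfield_term is hypothetical and exists only for this illustration.

```python
from typing import Optional

IGNORE_ABOVE = 64  # same value as in the strings_as_text template


def keyword_subfield_term(value: str, ignore_above: int = IGNORE_ABOVE) -> Optional[str]:
    """Return the term a normalized .keyword subfield would index, or None if skipped."""
    if len(value) > ignore_above:
        # Longer than ignore_above: the keyword subfield skips this value.
        return None
    # Otherwise the lowercase normalizer is applied before indexing.
    return value.lower()


print(keyword_subfield_term("abc.exe"))  # short value, indexed as-is (already lowercase)
print(keyword_subfield_term("A" * 64))   # exactly 64 characters, still indexed
print(keyword_subfield_term("A" * 65))   # 65 characters, skipped
```

A value that is skipped here is still indexed by the parent text field, so it remains searchable through full-text queries. It just cannot be matched, sorted, or aggregated through .keyword.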

 

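The request above creates a default template with three dynamic templates. The first two map every *.id field and every md5 field to a case-insensitive keyword. The third maps all other string fields to text and adds a .keyword subfield, also case-insensitive, that indexes only values of 64 characters or fewer. Note that on recent Elasticsearch versions /_template is the legacy template API, superseded by composable templates under /_index_template. Now create a new index, write one document to it, and look at the resulting mappings: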

POST /new_index/_doc
{
    "md5": "92609868BE899029B0453F98A17BF51B",
    "name": "abc.exe",
    "user": {
        "id": "AAA",
        "name": "administrator"
    }
}

GET /new_index/_mappings

{
    "new_index": {
        "mappings": {
            "dynamic_templates": [
                {
                    "id_as_keyword": {
                        "path_match": "*.id",
                        "match_mapping_type": "string",
                        "mapping": {
                            "normalizer": "case_insensitive_normalizer",
                            "type": "keyword"
                        }
                    }
                },
                {
                    "md5_as_keyword": {
                        "match": "md5",
                        "match_mapping_type": "string",
                        "mapping": {
                            "normalizer": "case_insensitive_normalizer",
                            "type": "keyword"
                        }
                    }
                },
                {
                    "strings_as_text": {
                        "match_mapping_type": "string",
                        "mapping": {
                            "fields": {
                                "keyword": {
                                    "normalizer": "case_insensitive_normalizer",
                                    "ignore_above": 64,
                                    "type": "keyword"
                                }
                            },
                            "type": "text"
                        }
                    }
                }
            ],
            "properties": {
                "md5": {
                    "type": "keyword",
                    "normalizer": "case_insensitive_normalizer"
                },
                "name": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 64,
                            "normalizer": "case_insensitive_normalizer"
                        }
                    }
                },
                "user": {
                    "properties": {
                        "id": {
                            "type": "keyword",
                            "normalizer": "case_insensitive_normalizer"
                        },
                        "name": {
                            "type": "text",
                            "fields": {
                                "keyword": {
                                    "type": "keyword",
                                    "ignore_above": 64,
                                    "normalizer": "case_insensitive_normalizer"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
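
The mappings returned above are what you would expect if dynamic templates are tried in order and the first one that matches wins, with path_match tested against the full dotted path and match tested against the field name alone. The Python sketch below simulates that selection for the string fields of the sample document. It is a rough model rather than Elasticsearch's implementation: pick_template is a made-up helper, fnmatchcase is assumed to approximate the * wildcard matching, and match_mapping_type is left out because every field here is a string.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Same order and conditions as the dynamic_templates in the index template.
TEMPLATES = [
    ("id_as_keyword", {"path_match": "*.id"}),
    ("md5_as_keyword", {"match": "md5"}),
    ("strings_as_text", {}),  # no name condition: catches every other string
]


def pick_template(full_path: str) -> Optional[str]:
    """Return the name of the first dynamic template matching a string field."""
    field_name = full_path.rsplit(".", 1)[-1]
    for name, cond in TEMPLATES:
        # path_match is compared with the full dotted path.
        if "path_match" in cond and not fnmatchcase(full_path, cond["path_match"]):
            continue
        # match is compared with the last path segment only.
        if "match" in cond and not fnmatchcase(field_name, cond["match"]):
            continue
        return name
    return None


for path in ("md5", "name", "user.id", "user.name", "id"):
    print(path, "->", pick_template(path))
```

Under this model a top-level field named id falls through to strings_as_text, because the pattern *.id requires a dot in the path. If top-level id fields should also become keywords, a match condition on the field name would be needed instead.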

 

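In the properties section of the returned mappings, md5 and user.id are keyword fields, while name and user.name are text fields with an added .keyword subfield. Now run a query: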

GET /new_index/_search
{
    "query": {
        "match": {
            "user.id": "aaa"
        }
    }
}

The document added earlier is returned, even though the stored user.id is the uppercase AAA.

 

posted on 2025-11-11 11:07  BoyTNT