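Case-Insensitive Search on Elasticsearch keyword Fields with a Custom Normalizer

In Elasticsearch, a keyword field stores text that should not be tokenized. The whole value is indexed as a single term and matched as a whole at search time, and the field can also be used for sorting. However, keyword matching is case-sensitive. That is inconvenient for values such as GUIDs or MD5 hashes, which may be written in either upper or lower case. A custom normalizer can make the field case-insensitive, but it has to be defined in the mapping when the index is created. For example: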
```
PUT /test_index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "case_insensitive_normalizer": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "md5": {
        "type": "keyword",
        "normalizer": "case_insensitive_normalizer"
      }
    }
  }
}
```
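The toy Python model below is not Elasticsearch code and not the author's. It is only meant to make the index-time and search-time behavior concrete. A keyword field stores each whole value as one term, and an optional normalizer is applied both when a value is indexed and to the query text. KeywordField is a hypothetical class. The model also assumes that str.lower() behaves like the lowercase filter, which holds for ASCII hex strings such as MD5 values.

```python
# Toy model of keyword-field matching with and without a lowercase normalizer.
# NOT Elasticsearch code: str.lower() stands in for the "lowercase" filter
# (assumed equivalent for ASCII hex strings such as MD5 values).
from typing import Callable, Optional, Set

Normalizer = Optional[Callable[[str], str]]


class KeywordField:
    """Hypothetical field that stores each value as a single, untokenized term."""

    def __init__(self, normalizer: Normalizer = None) -> None:
        self.normalizer = normalizer
        self.terms: Set[str] = set()

    def _normalize(self, value: str) -> str:
        return self.normalizer(value) if self.normalizer else value

    def index(self, value: str) -> None:
        # The normalizer runs once at index time...
        self.terms.add(self._normalize(value))

    def matches(self, query: str) -> bool:
        # ...and again on the query text, so both sides end up lowercased.
        return self._normalize(query) in self.terms


plain = KeywordField()
ci = KeywordField(normalizer=str.lower)
for field in (plain, ci):
    field.index("92609868BE899029B0453F98A17BF51B")

print(plain.matches("92609868be899029b0453f98a17bf51b"))  # False
print(ci.matches("92609868be899029b0453f98a17bf51b"))     # True
```

The same uppercase MD5 is indexed into both fields. A lowercase query misses it in the plain field but finds it in the normalized one.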
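With this mapping, an md5 value written in either case can be found by a query in either case. The drawback is that the mapping must be created in advance for every new index, which is awkward when indices are created per day. In that case, an index template that contains dynamic_templates can apply the same settings to every new index automatically: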
```
PUT /_template/default
{
  "order": -1,
  "index_patterns": [ "*" ],
  "settings": {
    "analysis": {
      "normalizer": {
        "case_insensitive_normalizer": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "dynamic_templates": [
      {
        "id_as_keyword": {
          "mapping": {
            "type": "keyword",
            "normalizer": "case_insensitive_normalizer"
          },
          "match_mapping_type": "string",
          "path_match": "*.id"
        }
      },
      {
        "md5_as_keyword": {
          "mapping": {
            "type": "keyword",
            "normalizer": "case_insensitive_normalizer"
          },
          "match_mapping_type": "string",
          "match": "md5"
        }
      },
      {
        "strings_as_text": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 64,
                "normalizer": "case_insensitive_normalizer"
              }
            }
          }
        }
      }
    ]
  }
}
```
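When indices are created per day, it can be handy to push this template from a script. The following is a minimal sketch that uses only the Python standard library. It assumes a cluster reachable at the hypothetical address http://localhost:9200 with no authentication. build_template_body and build_put_request are illustrative names, and the body mirrors the template request shown above.

```python
# Sketch: build the PUT /_template/default request with only the standard
# library. ES_URL is a hypothetical address (assumption); the request is
# constructed here but not sent.
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to your cluster
NORMALIZER = "case_insensitive_normalizer"


def ci_keyword() -> dict:
    """Mapping for a case-insensitive keyword field."""
    return {"type": "keyword", "normalizer": NORMALIZER}


def build_template_body() -> dict:
    return {
        "order": -1,
        "index_patterns": ["*"],
        "settings": {
            "analysis": {
                "normalizer": {
                    NORMALIZER: {"type": "custom", "filter": ["lowercase"]}
                }
            }
        },
        "mappings": {
            # Order matters: the first dynamic template that matches wins.
            "dynamic_templates": [
                {"id_as_keyword": {"mapping": ci_keyword(),
                                   "match_mapping_type": "string",
                                   "path_match": "*.id"}},
                {"md5_as_keyword": {"mapping": ci_keyword(),
                                    "match_mapping_type": "string",
                                    "match": "md5"}},
                {"strings_as_text": {
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "text",
                        # Case-insensitive .keyword sub-field, skipped above 64 chars.
                        "fields": {"keyword": dict(ci_keyword(), ignore_above=64)},
                    },
                }},
            ]
        },
    }


def build_put_request(name: str = "default") -> urllib.request.Request:
    data = json.dumps(build_template_body()).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/_template/{name}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_put_request()
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/_template/default
```

The request is only built here. Once ES_URL points at a real cluster, urllib.request.urlopen(req) sends it.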
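The request above creates a default template that applies to every index (index_patterns "*"). Its order of -1 means any other matching template with a higher order overrides it. The template defines three dynamic templates:

- The first maps every field whose full path ends in .id (for example user.id) to a case-insensitive keyword. A top-level field named id is not matched.
- The second maps every field named md5 to a case-insensitive keyword.
- The third maps every other string field to text and adds a .keyword sub-field that is also case-insensitive. Because of ignore_above: 64, values longer than 64 characters are not indexed in that sub-field.

Dynamic templates are evaluated in order and the first match wins, so the catch-all strings_as_text must come last. This example uses the legacy _template API; since Elasticsearch 7.8, the composable _index_template API is the recommended replacement. Now create a new index by writing a document to it, then check its mappings: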
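The selection rules can be illustrated with a simplified Python model, built on these assumptions. Only string values are considered. Only the match condition (leaf field name) and the path_match condition (full dotted path) are modeled. Python's fnmatchcase approximates Elasticsearch's * wildcard. pick_template and leaf_strings are hypothetical helpers, not Elasticsearch APIs.

```python
# Simplified model of dynamic-template selection for the three templates above.
# Assumptions: string values only; only "match" (leaf name) and "path_match"
# (full dotted path) are modeled; fnmatchcase approximates ES's "*" wildcard.
# Real Elasticsearch also supports unmatch, match_pattern, and more.
from fnmatch import fnmatchcase
from typing import Dict, Iterator, Tuple

DYNAMIC_TEMPLATES = [
    ("id_as_keyword", {"path_match": "*.id"}),
    ("md5_as_keyword", {"match": "md5"}),
    ("strings_as_text", {}),  # no condition besides match_mapping_type
]


def pick_template(path: str) -> str:
    leaf = path.rsplit(".", 1)[-1]
    for name, cond in DYNAMIC_TEMPLATES:  # evaluated in order; first match wins
        if "path_match" in cond and not fnmatchcase(path, cond["path_match"]):
            continue
        if "match" in cond and not fnmatchcase(leaf, cond["match"]):
            continue
        return name
    raise LookupError(path)  # unreachable: the last template matches any string


def leaf_strings(doc: dict, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted path, value) for every string leaf of a JSON object."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from leaf_strings(value, path + ".")
        elif isinstance(value, str):
            yield path, value


doc = {"md5": "92609868BE899029B0453F98A17BF51B", "name": "abc.exe",
       "user": {"id": "AAA", "name": "administrator"}}
chosen: Dict[str, str] = {p: pick_template(p) for p, _ in leaf_strings(doc)}
print(chosen)
# {'md5': 'md5_as_keyword', 'name': 'strings_as_text',
#  'user.id': 'id_as_keyword', 'user.name': 'strings_as_text'}
```

In this model, a top-level id field falls through to strings_as_text because "*.id" requires a dot before id. A field such as user.md5 still hits md5_as_keyword, because match looks only at the leaf name.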
```
POST /new_index/_doc
{
  "md5": "92609868BE899029B0453F98A17BF51B",
  "name": "abc.exe",
  "user": {
    "id": "AAA",
    "name": "administrator"
  }
}
```
```
GET /new_index/_mappings
```

Response:

```
{
  "new_index": {
    "mappings": {
      "dynamic_templates": [
        {
          "id_as_keyword": {
            "path_match": "*.id",
            "match_mapping_type": "string",
            "mapping": {
              "normalizer": "case_insensitive_normalizer",
              "type": "keyword"
            }
          }
        },
        {
          "md5_as_keyword": {
            "match": "md5",
            "match_mapping_type": "string",
            "mapping": {
              "normalizer": "case_insensitive_normalizer",
              "type": "keyword"
            }
          }
        },
        {
          "strings_as_text": {
            "match_mapping_type": "string",
            "mapping": {
              "fields": {
                "keyword": {
                  "normalizer": "case_insensitive_normalizer",
                  "ignore_above": 64,
                  "type": "keyword"
                }
              },
              "type": "text"
            }
          }
        }
      ],
      "properties": {
        "md5": {
          "type": "keyword",
          "normalizer": "case_insensitive_normalizer"
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 64,
              "normalizer": "case_insensitive_normalizer"
            }
          }
        },
        "user": {
          "properties": {
            "id": {
              "type": "keyword",
              "normalizer": "case_insensitive_normalizer"
            },
            "name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 64,
                  "normalizer": "case_insensitive_normalizer"
                }
              }
            }
          }
        }
      }
    }
  }
}
```
Under properties, md5 and user.id were mapped as keyword. name and user.name were mapped as text, each with a .keyword sub-field. Now run a query:
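The next sketch shows what ends up in a .keyword sub-field such as name.keyword. keyword_subfield_term is a hypothetical function, not an Elasticsearch API. It assumes that str.lower() matches the lowercase filter and that ignore_above is compared against the length of the original value in characters.

```python
# Toy model of the term stored in a .keyword sub-field mapped as
# {"type": "keyword", "ignore_above": 64, "normalizer": ...}.
# Assumptions: str.lower() stands in for the lowercase filter, and the length
# limit is checked on the original value, counted in characters.
from typing import Optional

IGNORE_ABOVE = 64


def keyword_subfield_term(value: str, ignore_above: int = IGNORE_ABOVE) -> Optional[str]:
    """Return the term indexed in .keyword, or None if the value is skipped."""
    if len(value) > ignore_above:
        # Too long: not indexed in the sub-field. The value is still in _source,
        # and the parent text field is analyzed and indexed as usual.
        return None
    return value.lower()


print(keyword_subfield_term("abc.exe"))        # abc.exe
print(keyword_subfield_term("Administrator"))  # administrator
print(keyword_subfield_term("x" * 65))         # None
```

A 64-character value is still indexed, because only values strictly longer than the limit are skipped.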
```
GET /new_index/_search
{
  "query": {
    "match": {
      "user.id": "aaa"
    }
  }
}
```
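The query returns the document indexed earlier, even though the stored user.id value is the uppercase AAA. The normalizer lowercases the value at index time and the query text at search time, so both sides compare as aaa. The _source of the returned document still shows the original AAA, because a normalizer changes only the indexed term.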