Asked by: RobertPro Asked: 11/16/2023 Updated: 11/18/2023 Views: 35
Fix fuzziness for ElasticSearch/OpenSearch query
Q:
I'm running into a problem with a simple query. Given this data:
POST test/_doc/1
{
  "id": 1,
  "title": "Test Name"
}
POST test/_doc/2
{
  "id": 2,
  "title": "TestName"
}
And this query:
GET test/_search
{
  "query": {
    "match": {
      "title": {
        "query": "TestName",
        "fuzziness": "AUTO"
      }
    }
  }
}
I get this output:
{
  ...
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.605183,
    "hits": [
      {
        "_index": "test",
        "_id": "2",
        "_score": 1.605183,
        "_source": {
          "id": 2,
          "title": "TestName"
        }
      }
    ]
  }
}
Why doesn't the output return both records?
How can I fix this?
A:
1 upvote
Paulo
11/16/2023
#1
TL;DR
Fuzziness in Elasticsearch is limited: the maximum Levenshtein distance it allows is 2.
This means you cannot match anything that is more than 2 edits away.
To illustrate:
POST 77491663/_doc/1
{
  "id": 1,
  "title": "Test Name"
}
POST 77491663/_doc/2
{
  "id": 2,
  "title": "TestName"
}
POST 77491663/_doc/3
{
  "id": 2,
  "title": "TestNa"
}
GET 77491663/_search
{
  "query": {
    "match": {
      "title": {
        "query": "TestName",
        "fuzziness": "2"
      }
    }
  }
}
This should give you:
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.0925692,
    "hits": [
      {
        "_index": "77491663",
        "_id": "2",
        "_score": 1.0925692,
        "_source": {
          "id": 2,
          "title": "TestName"
        }
      },
      {
        "_index": "77491663",
        "_id": "3",
        "_score": 0.7283795,
        "_source": {
          "id": 2,
          "title": "TestNa"
        }
      }
    ]
  }
}
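The edit distances behind this result can be checked by hand. The sketch below assumes the default analyzer lowercases the text and splits it on whitespace, so document 1 is indexed as the two terms "test" and "name" and fuzziness is evaluated against each term separately. It uses a plain Levenshtein distance (Elasticsearch also counts transpositions by default, which makes no difference for these strings). The helpers `levenshtein` and `fuzzy_hits` and the `indexed_terms` table are illustrative names, not part of any Elasticsearch API.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


MAX_EDITS = 2  # the upper limit on fuzziness described in the answer

# Assumed index contents: one list of lowercased terms per document id.
indexed_terms = {
    "1": ["test", "name"],  # "Test Name" split into two terms
    "2": ["testname"],      # "TestName" stays one term
    "3": ["testna"],        # "TestNa" stays one term
}


def fuzzy_hits(query_term: str, docs: dict, max_edits: int = MAX_EDITS) -> list:
    """Return ids of documents with at least one term within max_edits."""
    return sorted(
        doc_id
        for doc_id, terms in docs.items()
        if any(levenshtein(query_term, term) <= max_edits for term in terms)
    )


for doc_id, terms in indexed_terms.items():
    for term in terms:
        print(doc_id, term, levenshtein("testname", term))

print(fuzzy_hits("testname", indexed_terms))  # ['2', '3']
```

Document 1 is missing because both of its terms are 4 edits away from "testname", which is above the limit of 2, while "testna" is exactly 2 edits away.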
To fix
You may want to look into what analyzers can do.
For example, if you use ngrams, you should be able to make this work.
Comments
0 upvotes
RobertPro
11/17/2023
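I was able to get it working with ngram, but it seems to work only when the text is all uppercase or all lowercase. I could change that, but I would prefer to run case-insensitive queries.
0 upvotes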
Paulo
11/17/2023
You may need to add a lowercase processor? elastic.co/guide/en/elasticsearch/reference/current/......
0 upvotes
RobertPro
11/17/2023
Yes, I was able to get it working. I will post an update here soon.
0 upvotes
Mouad Slimane
11/16/2023
#2
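If you are using the default Elasticsearch analyzer, the value "Test Name" is split and stored in the inverted index as separate tokens. This means that when you search with the value "TestName", Elasticsearch checks whether "TestName" matches "Test" or "Name" within the fuzziness limit. Matching happens at the token level, not the phrase level, which is why you do not get the first document, "Test Name".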
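A rough way to picture this splitting is the sketch below. It assumes that, for this input, the default analyzer behaves like "split on non-alphanumeric characters, then lowercase" (the real analyzer follows Unicode text segmentation rules). `simple_analyze` and `inverted_index` are hypothetical stand-ins, not Elasticsearch APIs.

```python
import re


def simple_analyze(text: str) -> list:
    """Approximate the default analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


documents = {"1": "Test Name", "2": "TestName"}

# Build a toy inverted index: term -> set of document ids containing it.
inverted_index = {}
for doc_id, title in documents.items():
    for token in simple_analyze(title):
        inverted_index.setdefault(token, set()).add(doc_id)

print(inverted_index)              # terms "test" and "name" point at doc 1 only
print(simple_analyze("TestName"))  # the query becomes the single term "testname"
```

The query term "testname" exists in the index only for document 2, so document 1 can be reached only if "test" or "name" is close enough to "testname", which neither is.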
0 upvotes
RobertPro
11/18/2023
#3
So the solution was to use the Elasticsearch edge n-gram tokenizer, and I also had to add the lowercase filter to the analyzer.
Thanks @paulo!
PUT test
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "edge_ngram": {
            "type": "text",
            "analyzer": "edge_ngram_analyzer"
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  }
}
POST test/_doc/1
{
  "title": "Test Name"
}
POST test/_doc/2
{
  "title": "TestName"
}
GET test/_search
{
  "query": {
    "match": {
      "title.edge_ngram": {
        "query": "Test Name",
        "fuzziness": "AUTO"
      }
    }
  }
}
Now it returns the expected output:
{
  ...
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 3.1782691,
    "hits": [
      {
        "_index": "test",
        "_id": "1",
        "_score": 3.1782691,
        "_source": {
          "title": "Test Name"
        }
      },
      {
        "_index": "test",
        "_id": "2",
        "_score": 0.68817455,
        "_source": {
          "title": "TestName"
        }
      }
    ]
  }
}
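To see why both documents now match, it may help to approximate what the edge n-gram analyzer emits. The sketch below is a simplification under stated assumptions: it treats any run of letters and digits as a word (mirroring `token_chars`), lowercases it, and emits prefixes of length 3 to 10 (`min_gram` and `max_gram` from the mapping above). `edge_ngrams` is a hypothetical helper, not an Elasticsearch API.

```python
import re


def edge_ngrams(text: str, min_gram: int = 3, max_gram: int = 10) -> list:
    """Approximate an edge n-gram tokenizer followed by a lowercase filter."""
    tokens = []
    # [^\W_]+ matches runs of letters and digits; anything else is a boundary.
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()
        # Emit every prefix from min_gram up to max_gram characters.
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:size])
    return tokens


query_tokens = edge_ngrams("Test Name")
doc2_tokens = edge_ngrams("TestName")
shared = set(query_tokens) & set(doc2_tokens)

print(query_tokens)    # ['tes', 'test', 'nam', 'name']
print(doc2_tokens)     # ['tes', 'test', 'testn', 'testna', 'testnam', 'testname']
print(sorted(shared))  # ['tes', 'test']
```

Under these assumptions the query and document 2 share the prefixes "tes" and "test" exactly, so document 2 matches even before fuzziness is considered, and lowercasing makes the comparison case-insensitive.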