ElasticSearch - Match Query with fuzziness searching alphanumeric
I am using a match query with fuzziness to search for an alphanumeric term, but the results are not what I expect.
Below is the query I am running in Kibana:
GET index_name/_search
{
  "query": {
    "match": {
      "values": {
        "query": "A661752110",
        "operator": "and",
        "fuzziness": 1,
        "boost": 1.0,
        "prefix_length": 0,
        "max_expansions": 100
      }
    }
  }
}
I am expecting results such as:
A661752110
A66175211012
A661752110111
A661752110-12
A661752110-111
But I am getting results like:
A661752110
A661752111
A661752119
Here are my mapping details:
PUT index_name
{
  "settings": {
    "analysis": {
      "analyzer": {
        "attr_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "values": {
          "type": "text",
          "analyzer": "attr_analyzer"
        },
        "id": {
          "type": "text"
        }
      }
    }
  }
}
java elasticsearch curl kibana elastic-stack
asked Nov 7 at 12:23
Karthikeyan
1 Answer
Fuzzy matching treats two words that are "fuzzily" similar as if they were the same word. Elasticsearch uses the Damerau-Levenshtein distance to measure the similarity of two strings. This distance counts the number of single-character edits needed to turn one string into the other, allowing four kinds of edits:
- Substitution of one character for another: _f_ox → _b_ox
- Insertion of a new character: sic → sic_k_
- Deletion of a character: b_l_ack → back
- Transposition of two adjacent characters: _st_ar → _ts_ar
The edit distance is controlled in the search request with the fuzziness parameter. You specified a fuzziness of 1, which means Elasticsearch will only return terms that can be obtained by performing at most one edit (substitution, insertion, deletion or transposition) on "A661752110".
The values you were expecting that did not show up have an edit distance strictly greater than 1. Note that the maximum fuzziness Elasticsearch allows is 2.
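To make the distances concrete, here is a minimal Python sketch that computes the edit distance (with adjacent transpositions) between the query term and each value from the question. It assumes each value is indexed as a single lowercased term, as the whitespace tokenizer and lowercase filter in the mapping suggest. The function name `osa_distance` is made up for this illustration, and the dynamic-programming table is only a stand-in for what Lucene does internally.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where substitution, insertion, deletion and
    transposition of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


# Values from the question, lowercased as the analyzer would index them.
QUERY = "a661752110"
CANDIDATES = [
    "a661752110", "a661752111", "a661752119",
    "a66175211012", "a661752110111",
    "a661752110-12", "a661752110-111",
]

distances = {term: osa_distance(QUERY, term) for term in CANDIDATES}

if __name__ == "__main__":
    for term, dist in distances.items():
        print(f"{term}: distance {dist}")
```

Under these assumptions only the first three values are within distance 1 of the query term. A66175211012 is at distance 2, and the remaining values are at distance 3 or 4, which is beyond the maximum fuzziness.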
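Some suggestions to achieve what you want:
- If you want A661752110-12 and A661752110-111 to match, you can use a tokenizer that splits text when it finds a -. This is what the standard tokenizer does, for example.
- If you further want A66175211012 and A661752110111 to match, the best choice is a regexp query like this:
{ "query": { "regexp": { "values": { "value": "a661752110.{0,3}" } } } }
The pattern is written in lowercase because a regexp query is not analyzed, while your attr_analyzer lowercases the indexed terms. The quantifier is {0,3} rather than {,3} because Lucene's regexp syntax expects an explicit lower bound.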
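Since the term in the question is only one example among many, the same kind of regexp query can be built for any alphanumeric term. The sketch below is one way to do that in Python. The helper names `build_regexp_query` and `would_match` and the `max_extra` parameter are hypothetical. The term is lowercased on the assumption that the indexed terms are lowercased by the analyzer, and the quantifier is written with an explicit lower bound of 0. Python's `re.fullmatch` is used only as an approximation of how the pattern is matched against a whole indexed term, not as a replacement for testing against Elasticsearch.

```python
import re


def build_regexp_query(term: str, field: str = "values", max_extra: int = 3) -> dict:
    """Build a regexp query body matching `term` followed by up to
    `max_extra` arbitrary characters (hypothetical helper)."""
    if not term.isalnum():
        # Keeps the sketch simple: no regexp metacharacters to escape.
        raise ValueError("this sketch only handles alphanumeric terms")
    # Lowercase to line up with terms indexed through a lowercase filter.
    pattern = f"{term.lower()}.{{0,{max_extra}}}"
    return {"query": {"regexp": {field: {"value": pattern}}}}


def would_match(pattern: str, indexed_term: str) -> bool:
    """Approximate check: the pattern must match the whole term."""
    return re.fullmatch(pattern, indexed_term) is not None


if __name__ == "__main__":
    body = build_regexp_query("A661752110")
    pattern = body["query"]["regexp"]["values"]["value"]
    print(body)
    for term in ["a66175211012", "a661752110111", "a661752110-111", "a661752111"]:
        print(term, would_match(pattern, term))
```

With `max_extra` left at 3, a suffix such as -111 (four extra characters) falls outside the pattern when the value is indexed as a single term, so that case still depends on the tokenizer suggestion or on a larger bound.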
regexp won't satisfy my requirement because A661752110 is just an example. I have many different values to search for, all of them alphanumeric, e.g. bcf84729 or xc948xcs90.
– Karthikeyan Nov 7 at 17:10
How many of them do you have? You can use a bool query to combine multiple queries into an AND query: elastic.co/guide/en/elasticsearch/reference/current/…
– Benoit Guigal Nov 8 at 8:43
edited Nov 12 at 20:07
martin-g
answered Nov 7 at 14:28
Benoit Guigal