ElasticSearch - Match Query with fuzziness searching alphanumeric











I am using a match query with fuzziness to search for an alphanumeric term, but the results are not what I expect.

Here is the query I am running in Kibana:



GET index_name/_search
{
  "query": {
    "match": {
      "values": {
        "query": "A661752110",
        "operator": "and",
        "fuzziness": 1,
        "boost": 1.0,
        "prefix_length": 0,
        "max_expansions": 100
      }
    }
  }
}


I am expecting results like these:



A661752110
A66175211012
A661752110111
A661752110-12
A661752110-111


But I am getting results like these:



A661752110
A661752111
A661752119


Here are my mapping details:



PUT index_name
{
  "settings": {
    "analysis": {
      "analyzer": {
        "attr_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "values": {
          "type": "text",
          "analyzer": "attr_analyzer"
        },
        "id": {
          "type": "text"
        }
      }
    }
  }
}









      java elasticsearch curl kibana elastic-stack






      asked Nov 7 at 12:23









      Karthikeyan

          1 Answer
          Fuzzy matching lets you treat two words that are "fuzzily" similar as if they were the same word. Elasticsearch uses the Damerau-Levenshtein distance to measure the similarity of two strings. This distance counts the number of single-character edits needed to turn one string into the other, allowing four kinds of edits:




          • Substitution of one character for another: _f_ox → _b_ox

          • Insertion of a new character: sic → sic_k_

          • Deletion of a character: b_l_ack → back

          • Transposition of two adjacent characters: _st_ar → _ts_ar


          The maximum edit distance is controlled in the search request with the fuzziness parameter. You specified a fuzziness of 1, which means Elasticsearch will only return terms that can be obtained by performing at most one edit (substitution, insertion, deletion or transposition) on "A661752110".



          The values you were expecting that did not show up have an edit distance strictly greater than 1: each of them adds at least two characters to the query term. Note that the maximum fuzziness Elasticsearch allows is 2.
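
To see why, here is a minimal Python sketch that computes the edit distance between the lowercased query term and each value from the question. It implements the optimal string alignment variant of Damerau-Levenshtein, which I am assuming is close enough to what Lucene does for this illustration. The function name osa_distance is mine, not an Elasticsearch API, and I am assuming each value is indexed as one lowercase term (whitespace tokenizer plus lowercase filter).

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


query = "a661752110"
candidates = [
    "a661752110", "a661752111", "a661752119",      # returned by the query
    "a66175211012", "a661752110111",               # expected, not returned
    "a661752110-12", "a661752110-111",
]
distances = {value: osa_distance(query, value) for value in candidates}
for value, dist in distances.items():
    print(f"{value}: distance {dist}, matches with fuzziness 1: {dist <= 1}")
```

Only the first three values are within one edit. A66175211012 is two edits away, so it would be reachable with a fuzziness of 2, but the remaining values need three or four edits.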



          Some suggestions to achieve what you want:




          • If you want A661752110-12 and A661752110-111 to match, you can use a tokenizer that splits text on -. The standard tokenizer does this, for example.

          • If you also want A66175211012 and A661752110111 to match, the best choice is a regexp query like the one below. The regexp query is not analyzed and your analyzer lowercases the indexed terms, so the pattern has to be lowercase. The repetition also needs an explicit lower bound ({0,3}, not {,3}):


          { "query": { "regexp": { "values": { "value": "a661752110.{0,3}" } } } }
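
If you have many different terms to search for, the request body can be generated rather than written by hand. The sketch below is only an illustration: the helper names (escape_lucene, suffix_regexp_query) and the max_extra parameter are hypothetical, the list of reserved characters is my reading of the regexp syntax docs, and Python's re.fullmatch is used as a rough stand-in for Lucene's whole-term matching.

```python
import re

# Characters assumed to be reserved in Lucene regexp syntax.
LUCENE_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene(term: str) -> str:
    """Backslash-escape reserved characters so the term is matched literally."""
    return "".join("\\" + ch if ch in LUCENE_RESERVED else ch for ch in term)


def suffix_regexp_query(field: str, term: str, max_extra: int = 3) -> dict:
    """Build a regexp query body: the term followed by up to max_extra characters."""
    # Lowercase the term because the field's analyzer lowercases indexed terms
    # and the regexp query itself is not analyzed.
    pattern = f"{escape_lucene(term.lower())}.{{0,{max_extra}}}"
    return {"query": {"regexp": {field: {"value": pattern}}}}


body = suffix_regexp_query("values", "A661752110")
pattern = body["query"]["regexp"]["values"]["value"]

# Values from the question, as they would be indexed (lowercased, one term each).
indexed_terms = [
    "a661752110", "a661752111", "a66175211012",
    "a661752110111", "a661752110-12", "a661752110-111",
]
matches = {t: bool(re.fullmatch(pattern, t)) for t in indexed_terms}
print(body)
print(matches)
```

With max_extra set to 3, a661752110-111 does not match because it has four extra characters; raising the bound to 4 would include it.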






          • A regexp won't satisfy my requirement, because A661752110 is just an example. I have many different values to search for, all of them alphanumeric, e.g. bcf84729 or xc948xcs90.
            – Karthikeyan
            Nov 7 at 17:10










          • How many of them do you have? You can use a bool query to combine multiple queries into an AND query: elastic.co/guide/en/elasticsearch/reference/current/…
            – Benoit Guigal
            Nov 8 at 8:43











          edited Nov 12 at 20:07









          martin-g

          answered Nov 7 at 14:28









          Benoit Guigal
