Elasticsearch aggregation on values in nested list (array)












I have stored some values in an Elasticsearch nested data type (an array), but without using key/value pairs. An example record would be:

{
  "categories": [
    "Category1",
    "Category2"
  ],
  "product_name": "productx"
}

Now I want to run an aggregation query to find the unique list of categories available, but all the examples I've seen use a mapping with key/value pairs. Is there any way I can use the above schema as is, or do I need to change my schema to something like this to run an aggregation query?

{
  "categories": [
    {"name": "Category1"},
    {"name": "Category2"}
  ],
  "product_name": "productx"
}









      elasticsearch elasticsearch-aggregation






asked Nov 16 '18 at 3:24 by Sameera Godakanda
























          1 Answer














Regarding the JSON structure, take a step back and decide whether you want a plain list or key-value pairs.

Looking at your example, I don't think you need key-value pairs, but it's something you may want to clarify from your domain: would categories ever carry more properties than a name?

Regarding aggregation: as far as I know, aggregations work on array fields as they are. Elasticsearch has no dedicated array type, so each element of the list is treated as a separate value of the field.

For the data you've mentioned, you can use the aggregation query below. I'm assuming the fields are of type keyword.



Aggregation Query

POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "myaggs": {
      "terms": {
        "size": 100,
        "script": {
          "source": """
            // Emit one "category, product_name" key per element of the categories array
            def list = new ArrayList();
            for (int i = 0; i < doc['categories'].length; i++) {
              list.add(doc['categories'][i] + ", " + doc['product_name'].value);
            }
            return list;
          """
        }
      }
    }
  }
}


Aggregation Response

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "myaggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "category1, productx",
          "doc_count": 1
        },
        {
          "key": "category2, productx",
          "doc_count": 1
        }
      ]
    }
  }
}
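
If the goal is only the unique list of categories, a script may not be needed: a plain terms aggregation on the array field buckets each element separately. The sketch below builds such a request body as a Python dict and emulates the bucketing locally, so it runs without a cluster. The aggregation name `unique_categories`, the helper `emulate_terms`, the second sample record, and the assumption that `categories` is mapped as keyword are illustrative and not part of the answer above.

```python
import json
from collections import Counter

# Request body for: POST <your_index_name>/_search
# Assumes `categories` is a keyword field; use "categories.keyword" if it was
# dynamically mapped as text with a keyword sub-field.
unique_categories_body = {
    "size": 0,  # return no hits, only aggregation results
    "aggs": {
        "unique_categories": {  # hypothetical aggregation name
            "terms": {"field": "categories", "size": 100}
        }
    },
}


def emulate_terms(docs, field, size=10):
    """Local stand-in for a terms aggregation: count documents per distinct value.

    Each distinct element of an array field counts once per document.
    """
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(set(values))
    # Order by doc_count descending, then by key, and keep the top `size` buckets.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]


docs = [
    {"categories": ["Category1", "Category2"], "product_name": "productx"},
    {"categories": ["Category2"], "product_name": "producty"},  # made-up second record
]

print(json.dumps(unique_categories_body, indent=2))
print(emulate_terms(docs, "categories", size=100))
```

The bucket keys of the real response would then be the distinct category values, with no product name attached.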


          Hope it helps!






• I read more about the array data type. If I use the nested type, I cannot put a plain array field into the nested field, so my example wouldn't work. In my original mapping (the categories field was created dynamically) it was set to a text field with keyword only as a sub-field, so the aggregation wouldn't work. I guess I will settle for key/value pairs and the nested data type.

– Sameera Godakanda
Nov 16 '18 at 8:56
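
For context, dynamic mapping typically indexes string values as a `text` field with a `keyword` sub-field, which is why an aggregation has to target `categories.keyword` rather than `categories`. The sketch below shows that shape as a Python dict next to an explicit keyword mapping, with a hypothetical helper `aggregatable_path` that picks the field name to aggregate on. The real mapping of the index is not shown in this thread, so both dicts are assumptions.

```python
# Shape that dynamic mapping usually produces for string values (assumed here;
# the real one can be checked with GET <your_index_name>/_mapping).
dynamic_properties = {
    "categories": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "product_name": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
}

# Explicit alternative: map both fields as keyword up front.
explicit_properties = {
    "categories": {"type": "keyword"},
    "product_name": {"type": "keyword"},
}


def aggregatable_path(properties, field):
    """Return the field path a terms aggregation can use (hypothetical helper)."""
    spec = properties[field]
    if spec.get("type") == "keyword":
        return field
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"
    raise ValueError(f"no keyword mapping found for {field!r}")


print(aggregatable_path(dynamic_properties, "categories"))
print(aggregatable_path(explicit_properties, "categories"))
```

With the explicit mapping, the list-of-strings schema from the question can be aggregated without switching to key/value pairs.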













• The aggregation I've provided works if you replace categories with categories.keyword. That said, I'd suggest playing around with nested and non-nested fields and queries to understand how they work, seeing how they match your requirements and use cases, and making changes accordingly. Also note that for the nested type, you would be required to use nested aggregations.

– Kamal
Nov 16 '18 at 9:19
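
To illustrate the nested case, here is a rough sketch of what the key/value schema from the question might look like with a nested mapping, and the nested aggregation that would wrap the terms aggregation. The aggregation names `categories` and `names`, the helper `emulate_nested_terms`, and the choice of keyword for `name` are assumptions for illustration.

```python
from collections import Counter

# "properties" section of a mapping for the key/value variant (assumed shape).
nested_properties = {
    "categories": {
        "type": "nested",
        "properties": {"name": {"type": "keyword"}},
    },
    "product_name": {"type": "keyword"},
}

# Request body for: POST <your_index_name>/_search
nested_agg_body = {
    "size": 0,
    "aggs": {
        "categories": {  # hypothetical name for the nested wrapper
            "nested": {"path": "categories"},
            "aggs": {
                "names": {  # hypothetical name for the inner terms aggregation
                    "terms": {"field": "categories.name", "size": 100}
                }
            },
        }
    },
}


def emulate_nested_terms(docs, path, key):
    """Local stand-in: count nested objects per distinct value of `key`."""
    counts = Counter(obj[key] for doc in docs for obj in doc.get(path, []))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered]


docs = [
    {
        "categories": [{"name": "Category1"}, {"name": "Category2"}],
        "product_name": "productx",
    }
]

print(emulate_nested_terms(docs, "categories", "name"))
```

The extra `nested` level is the main cost of this schema; it pays off mostly when categories carry more properties than a name.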








• Yes, I had used the keyword before. However, your query above was missing the size parameter inside "terms": {, so it was only showing 10 results. Apparently you cannot use size: 0, and I had to use some large value there.

– Sameera Godakanda
Nov 16 '18 at 11:18











• That's a good observation. Yes, a terms aggregation only returns the top 10 buckets by default; you need to set size explicitly to get a larger list. Thanks for bringing this up, I've updated my answer accordingly and set size to 100.

– Kamal
Nov 16 '18 at 11:41
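
When the number of distinct categories is unknown, picking a large `size` is a guess. One alternative, assuming an Elasticsearch version that supports the composite aggregation, is to page through all buckets using the `after` parameter. The sketch below builds such request bodies and runs the paging loop against a local stand-in for the search call; `composite_body`, `fake_search`, `all_categories`, the aggregation name, and the sample values are hypothetical, and the stand-in returns a simplified response without doc counts.

```python
def composite_body(page_size, after_key=None):
    """Build a _search body that pages through distinct `categories` values."""
    composite = {
        "size": page_size,
        "sources": [{"category": {"terms": {"field": "categories"}}}],
    }
    if after_key is not None:
        composite["after"] = after_key  # resume after the last key of the previous page
    return {"size": 0, "aggs": {"all_categories": {"composite": composite}}}


def fake_search(body, values):
    """Stand-in for the real search call: pages through sorted distinct values."""
    composite = body["aggs"]["all_categories"]["composite"]
    after = composite.get("after", {}).get("category")
    keys = sorted(set(values))
    if after is not None:
        keys = [k for k in keys if k > after]
    page = keys[: composite["size"]]
    result = {"buckets": [{"key": {"category": k}} for k in page]}
    if page:
        result["after_key"] = {"category": page[-1]}
    return result


def all_categories(values, page_size=2):
    """Collect every distinct category by requesting pages until one comes back empty."""
    found, after_key = [], None
    while True:
        agg = fake_search(composite_body(page_size, after_key), values)
        if not agg["buckets"]:
            return found
        found.extend(bucket["key"]["category"] for bucket in agg["buckets"])
        after_key = agg["after_key"]


values = ["Category3", "Category1", "Category2", "Category1", "Category5", "Category4"]
print(all_categories(values))
```

Against a real cluster, `fake_search` would be replaced by the actual search request, reading the buckets and `after_key` from the `all_categories` entry of the response's aggregations.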











Answer posted by Kamal: answered Nov 16 '18 at 7:03, edited Nov 16 '18 at 11:41
