Loop around and select rows by a subset of the multi-index























I have a data frame with a multi-level index (a MultiIndex), and I want to loop over it, pulling out groups of rows for processing.

I want to loop over every combination of values in a subset of the index levels, not all of the levels. I do not know beforehand what the key/index values will be, but I do know how many there are.



For example:



                data1
key1 key2 key3
A    A    A        10
A    A    B        11
A    B    A        12
A    B    C        13
A    C    A        14


Assume I am only interested in key1 + key2.



There are 3 unique combinations of key1 + key2:



(A A)
(A B)
(A C)


First time around the loop I would want to extract:



                data1
key1 key2 key3
A    A    A        10
A    A    B        11


Second time around the loop I would want to extract:



                data1
key1 key2 key3
A    B    A        12
A    B    C        13


Third time around the loop I would want to extract:



                data1
key1 key2 key3
A    C    A        14


How do I do this?
I am a COMPLETE newbie at Python, so the more explanation the better.



Thanks



**EDIT IN RESPONSE TO A COMMENT BELOW**

In pseudo-code, I was originally thinking of something along the lines of:



[1] groups = <get the set/list of unique key1+key2 groups in the main dataframe>
[2] for each group in groups
[3]     df_thisGroup = <extract the rows of data for this group from the main dataframe>
[4]     <process df_thisGroup, and save the results out into a new dataframe. No need to alter the main dataframe>
[5]     <optional: remove this group from the main dataframe as we no longer need it, we have finished processing it. This might make processing later groups faster?>
[6]     move to next group

My question is how to do steps [1], [2] and [3].










      python pandas dataframe














      edited Nov 5 at 21:39

























      asked Nov 5 at 3:36









      Andrewfreestuff

























          1 Answer













          You need to think about how you are going to store your dataframes. I would recommend a dictionary. To populate it, you can use groupby with the level argument set to the index levels you are interested in.



          keys = ['key1','key2']

          dfs = {f'df{i}': data for i, (g,data) in enumerate(df.groupby(level=keys))}


          Here, the frame is grouped by key1 and key2, and the dictionary comprehension stores one dataframe for each combination of those keys. They are labeled df0, df1, and so on. You can see all of the dataframes you created using:



          >>> dfs.keys()
          dict_keys(['df0', 'df1', 'df2'])


          And you can access them as you would any normal dictionary values:

          >>> dfs['df0']
                          data1
          key1 key2 key3
          A    A    A        10
                    B        11

          >>> dfs['df1']
                          data1
          key1 key2 key3
          A    B    A        12
                    C        13

          ....
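If the groups do not all need to be kept around, the same groupby can also be iterated directly, so that only one group's frame is in hand at a time. The following is a minimal sketch, assuming the frame looks like the example in the question; `process_group`, `results`, `summary` and the `total` column are hypothetical names standing in for whatever processing is actually needed:

```python
import pandas as pd

# Rebuild the sample frame from the question (values taken from its example).
index = pd.MultiIndex.from_tuples(
    [("A", "A", "A"), ("A", "A", "B"), ("A", "B", "A"), ("A", "B", "C"), ("A", "C", "A")],
    names=["key1", "key2", "key3"],
)
df = pd.DataFrame({"data1": [10, 11, 12, 13, 14]}, index=index)

keys = ["key1", "key2"]


def process_group(group_df):
    # Hypothetical processing step: sum data1 over the rows of one group.
    return group_df["data1"].sum()


results = []
# Iterating a groupby yields one (name, group_df) pair per loop pass.
for name, group_df in df.groupby(level=keys):
    # name is a tuple such as ("A", "B"); group_df holds only that group's rows.
    results.append({"key1": name[0], "key2": name[1], "total": process_group(group_df)})

# Collect the per-group results into a new frame; the main frame is untouched.
summary = pd.DataFrame(results).set_index(keys)
print(summary)
```

Because the loop variable is rebound on each pass, the previous group's frame can be garbage-collected once nothing else refers to it, which is the main difference from storing every group in a dictionary.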





          answered Nov 5 at 3:43









          sacul













          • Thanks sacul. My dataset is very large, the dataframe is about 2 gigs. Does making the dictionary make new copies of each group of records, or does it just point to the existing dataframe objects and therefore not increase the total size by much?
            – Andrewfreestuff
            Nov 5 at 8:15










          • It would create copies rather than point to the original data, but since you said "I want to loop around this data frame pulling out groups of rows for processing", I figured that's what you wanted. What were you envisioning?
            – sacul
            Nov 5 at 15:54










          • Taking copies may not end up being unworkable; I was just checking, because if it were just pointing to the original data it would have been perfect. What I was thinking of was looping over one "group" at a time: extract just that group's data, process it, then move on to the next group and extract just its data, and so on. This way I only ever have one group's extra copy of the data at a time, rather than a full copy. Is that possible? As noted before, I'm new to Python, so all help is greatly appreciated.
            – Andrewfreestuff
            Nov 5 at 19:21










          • There probably are ways to do this, also via groupby, but you'll have to explain how you want to process the groups...
            – sacul
            Nov 5 at 20:07










          • I have edited my question to try and add the explanation
            – Andrewfreestuff
            Nov 5 at 21:36
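For the one-group-at-a-time loop described in the comments, steps [1] to [3] of the pseudo-code in the question could also be written out explicitly. This is only a sketch, assuming the frame looks like the example in the question; `row_counts` is a hypothetical placeholder for the real processing, and whether pandas hands back a view or a copy for each selection is not something this sketch relies on:

```python
import pandas as pd

# Rebuild the sample frame from the question and sort the index for lookups.
index = pd.MultiIndex.from_tuples(
    [("A", "A", "A"), ("A", "A", "B"), ("A", "B", "A"), ("A", "B", "C"), ("A", "C", "A")],
    names=["key1", "key2", "key3"],
)
df = pd.DataFrame({"data1": [10, 11, 12, 13, 14]}, index=index).sort_index()

# [1] Unique (key1, key2) combinations, found without knowing the values in advance.
groups = df.index.droplevel("key3").unique()

row_counts = {}
# [2] Loop over each combination; iterating a MultiIndex yields tuples.
for key1, key2 in groups:
    # [3] Rows for this combination; drop_level=False keeps all three index levels.
    df_this_group = df.xs((key1, key2), level=["key1", "key2"], drop_level=False)
    # Placeholder for the real processing: record how many rows the group has.
    row_counts[(key1, key2)] = len(df_this_group)

print(row_counts)
```

Only `df_this_group` for the current combination is held inside the loop, so at most one group's selection exists alongside the main frame at any moment.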

















