Loop around and select rows by a subset of the multi-index

I have a data frame with a multi-level index (a MultiIndex) and I want to loop around this data frame pulling out groups of rows for processing.

I want to loop over all of the combinations of a subset of the index levels, not all of the levels. I do not know beforehand what the key/index values will be, but I do know how many there are.

For example:

                  data1
key1  key2  key3
A     A     A      10
A     A     B      11
A     B     A      12
A     B     C      13
A     C     A      14

Assume I am only interested in key1 + key2.

There are 3 unique combinations of key1 + key2:

(A A)
(A B)
(A C)

First time around the loop I would want to extract:

                  data1
key1  key2  key3
A     A     A      10
A     A     B      11

Second time around the loop I would want to extract:

                  data1
key1  key2  key3
A     B     A      12
A     B     C      13

Third time around the loop I would want to extract:

                  data1
key1  key2  key3
A     C     A      14

How do I do this?
I am a COMPLETE newbie at Python, so the more explanation the better.

Thanks

**EDIT IN RESPONSE TO A COMMENT BELOW**

In pseudo-code, I was originally thinking something along the lines of:

[1] groups = <get the set/list of unique key1+key2 groups in the main dataframe>
[2] for each group in groups
[3]      df_thisGroup = <extract the rows of data for this group from the main dataframe>
[4]      <process df_thisGroup, and save the results out into a new dataframe.  No need to alter the main dataframe>
[5]      <optional: remove this group from the main dataframe as we no longer need it, we have finished processing it.  This might make processing later groups faster?>
[6] move to next group

My question is how to do steps [1], [2] and [3].

edited Nov 5 at 21:39

asked Nov 5 at 3:36

Andrewfreestuff

python pandas dataframe


1 Answer

You need to think about how you are going to store your dataframes. I would recommend a dictionary. In order to populate your dictionary, you can use groupby, with the level argument set to your keys of interest.

keys = ['key1', 'key2']

# One entry per unique (key1, key2) combination, labeled 'df0', 'df1', ...
dfs = {f'df{i}': data for i, (g, data) in enumerate(df.groupby(level=keys))}

Here, you have grouped by key1 and key2, and you are creating a dictionary that holds one dataframe for each combination of those keys. They will be labeled df0, df1, and so on. You can see all of the dataframes you created using:

>>> dfs.keys()
dict_keys(['df0', 'df1', 'df2'])

And you can access them as you would any normal dictionary values:

>>> dfs['df0']
                data1
key1 key2 key3
A    A    A        10
          B        11

>>> dfs['df1']
                data1
key1 key2 key3
A    B    A        12
          C        13

....
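
For anyone who wants to try the approach above, here is a minimal self-contained sketch. The construction of `df` is an assumption, rebuilt from the example table in the question; the `groupby(level=keys)` call and the first dictionary comprehension are the ones from this answer. The second dictionary, `dfs_by_key`, is an illustrative variation that is not part of the original answer: it labels each group by its key values instead of by a running number.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed layout: a
# three-level MultiIndex named key1/key2/key3 and one column, data1).
index = pd.MultiIndex.from_tuples(
    [("A", "A", "A"), ("A", "A", "B"), ("A", "B", "A"),
     ("A", "B", "C"), ("A", "C", "A")],
    names=["key1", "key2", "key3"],
)
df = pd.DataFrame({"data1": [10, 11, 12, 13, 14]}, index=index)

keys = ["key1", "key2"]

# One sub-frame per unique (key1, key2) combination, labeled df0, df1, ...
dfs = {f"df{i}": data for i, (g, data) in enumerate(df.groupby(level=keys))}

# Variation (not in the answer): use the group key itself as the label,
# so a group can be looked up by its (key1, key2) values.
dfs_by_key = {g: data for g, data in df.groupby(level=keys)}

print(list(dfs))
print(dfs_by_key[("A", "B")])
```

Keying by the group tuple may be more convenient when the key values are not known in advance, because the label then says which combination each sub-frame belongs to.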

answered Nov 5 at 3:43

sacul


Thanks sacul. My dataset is very large, the dataframe is about 2 gigs. Does making the dictionary make new copies of each group of records, or does it just point to the existing dataframe objects and therefore not increase the total size by much?
– Andrewfreestuff
Nov 5 at 8:15

No, it would create copies, but since you said "I want to loop around this data frame pulling out groups of rows for processing", I figured that's what you wanted. What were you envisioning?
– sacul
Nov 5 at 15:54

Taking copies may not end up being unworkable; I was just checking, because if it was just pointing to the original data it would have been perfect. What I was thinking was along the lines of looping around one "group" at a time and extracting just that group's data, processing it, then looping around to the next group and extracting just that data, and so on. This way I only ever have one group's extra copy of data at a time, rather than a full copy. Is that possible? As noted before, I'm new to Python, so all help is greatly appreciated.
– Andrewfreestuff
Nov 5 at 19:21

There probably are ways to do this, also via groupby, but you'll have to explain how you want to process the groups...
– sacul
Nov 5 at 20:07

I have edited my question to try and add the explanation
– Andrewfreestuff
Nov 5 at 21:36
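
The one-group-at-a-time loop discussed in these comments, and steps [1] to [4] of the edited question, can be sketched with the same `groupby` call, because iterating over a groupby object yields one `(key, sub-frame)` pair per iteration. This is only a sketch under assumptions: the example frame is rebuilt from the question, and `process_group` is a hypothetical placeholder for the real processing (here it just sums `data1`).

```python
import pandas as pd

# Example frame rebuilt from the question (assumed layout).
index = pd.MultiIndex.from_tuples(
    [("A", "A", "A"), ("A", "A", "B"), ("A", "B", "A"),
     ("A", "B", "C"), ("A", "C", "A")],
    names=["key1", "key2", "key3"],
)
df = pd.DataFrame({"data1": [10, 11, 12, 13, 14]}, index=index)


def process_group(group_df):
    """Hypothetical processing step: total of data1 for one group."""
    return group_df["data1"].sum()


results = []
# Steps [1] and [2]: groupby finds the unique (key1, key2) combinations
# and the for loop visits them one at a time.
for (key1, key2), df_this_group in df.groupby(level=["key1", "key2"]):
    # Step [3]: df_this_group holds only the rows for this combination.
    # Step [4]: process it and keep just the (small) result.
    results.append(
        {"key1": key1, "key2": key2, "total": process_group(df_this_group)}
    )

# Collect the per-group results into a new frame; df itself is unchanged.
df_results = pd.DataFrame(results).set_index(["key1", "key2"])
print(df_results)
```

With this pattern your own code only holds a reference to the current group's sub-frame, so step [5] (removing processed rows from the main frame) should not be necessary. Whether pandas allocates additional internal buffers while iterating is something I have not measured, so it is worth checking memory use on the real 2 GB data.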
