Loop around and select rows by a subset of the multi-index
I have a data frame with a multi-level index (a MultiIndex) and I want to loop over it, pulling out groups of rows for processing.
I want to loop over all of the combinations of a subset of the index levels, not all of them. I do not know beforehand what the key/index values will be, but I do know how many there are.
For example:
                    data1
    key1 key2 key3
    A    A    A        10
    A    A    B        11
    A    B    A        12
    A    B    C        13
    A    C    A        14
Assume I am only interested in key1 + key2.
There are 3 unique combinations of key1 + key2:
    (A A)
    (A B)
    (A C)
First time around the loop I would want to extract:
                    data1
    key1 key2 key3
    A    A    A        10
    A    A    B        11
Second time around the loop I would want to extract:
                    data1
    key1 key2 key3
    A    B    A        12
    A    B    C        13
Third time around the loop I would want to extract:
                    data1
    key1 key2 key3
    A    C    A        14
How do I do this?
I am a COMPLETE newbie at Python, so the more explanation the better.
Thanks
**EDIT IN RESPONSE TO A COMMENT BELOW**
In pseudo-code, I was originally thinking something along the lines of:
    [1] groups = <get the set/list of unique key1+key2 groups in the main dataframe>
    [2] for each group in groups
    [3]     df_thisGroup = <extract the rows of data for this group from the main dataframe>
    [4]     <process df_thisGroup, and save the results out into a new dataframe. No need to alter the main dataframe>
    [5]     <optional: remove this group from the main dataframe as we no longer need it, we have finished processing it. This might make processing later groups faster?>
    [6]     move to next group
My question is how to do steps [1], [2] and [3].
python pandas dataframe
edited Nov 5 at 21:39
asked Nov 5 at 3:36
Andrewfreestuff
1 Answer
You need to think about how you are going to store your dataframes. I would recommend a dictionary. In order to populate your dictionary, you can use groupby, with the level argument set to your keys of interest.
    keys = ['key1','key2']
    dfs = {f'df{i}': data for i, (g,data) in enumerate(df.groupby(level=keys))}
Here, you have grouped by key1 and key2, and then you are creating a dictionary that holds a dataframe for each combination of those keys. They will be labeled df0, df1, etc. You can see all of the dataframes you created using:
    >>> dfs.keys()
    dict_keys(['df0', 'df1', 'df2'])
And you can access them as you would any normal dictionary values:
    >>> dfs['df0']
                    data1
    key1 key2 key3
    A    A    A        10
              B        11
    >>> dfs['df1']
                    data1
    key1 key2 key3
    A    B    A        12
              C        13
    ....
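For anyone who wants to try the dictionary approach end to end, here is a minimal sketch that rebuilds the question's example frame and groups it by key1 and key2. The frame construction and the name dfs_by_key are assumptions made for illustration; keying the dictionary by the (key1, key2) tuple instead of df0, df1, ... is a variation on the answer, not what the answer itself does.

```python
import pandas as pd

# Rebuild the question's example frame with a three-level MultiIndex.
index = pd.MultiIndex.from_tuples(
    [
        ("A", "A", "A"),
        ("A", "A", "B"),
        ("A", "B", "A"),
        ("A", "B", "C"),
        ("A", "C", "A"),
    ],
    names=["key1", "key2", "key3"],
)
df = pd.DataFrame({"data1": [10, 11, 12, 13, 14]}, index=index)

keys = ["key1", "key2"]

# groupby yields (group_key, sub_frame) pairs; with two levels the key is a tuple.
# Here the tuple itself is used as the dictionary key (illustrative variation).
dfs_by_key = {group_key: group for group_key, group in df.groupby(level=keys)}

print(list(dfs_by_key))
print(dfs_by_key[("A", "B")])
```

Keying by the tuple keeps the link between each sub-frame and the key1/key2 values it came from, which a positional label such as df1 does not carry.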
Thanks sacul. My dataset is very large; the dataframe is about 2 gigs. Does making the dictionary make new copies of each group of records, or does it just point to the existing dataframe objects and therefore not increase the total size by much?
– Andrewfreestuff
Nov 5 at 8:15
No, it would create copies, but since you said "I want to loop around this data frame pulling out groups of rows for processing", I figured that's what you wanted. What were you envisioning?
– sacul
Nov 5 at 15:54
Taking copies may not end up being unworkable; I was just checking, because if it was just pointing to the original data it would have been perfect. What I was thinking was along the lines of looping around one "group" at a time and extracting just that group's data, processing it, then looping around to the next group and extracting just that data, and so on. This way I only ever have one group's extra copy of data at a time, rather than a full copy. Is that possible? As noted before, I'm new to Python, so all help is greatly appreciated.
– Andrewfreestuff
Nov 5 at 19:21
There probably are ways to do this, also via groupby, but you'll have to explain how you want to process the groups...
– sacul
Nov 5 at 20:07
I have edited my question to try and add the explanation
– Andrewfreestuff
Nov 5 at 21:36
answered Nov 5 at 3:43
sacul
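The one-group-at-a-time loop described in the comments and in steps [1] to [4] of the question's pseudo-code could be sketched as below. This is only an illustration under stated assumptions: process_group and its "rows"/"total" outputs are hypothetical stand-ins for the real processing, and the example frame is rebuilt from the question. Because the groupby object is iterated lazily, only the group currently being processed needs to be referenced, provided the loop does not store the groups; how much extra memory each group actually takes has not been measured here.

```python
import pandas as pd


def process_group(group: pd.DataFrame) -> dict:
    # Hypothetical processing step: summarise one group's rows.
    return {"rows": len(group), "total": int(group["data1"].sum())}


# Example frame from the question (three-level MultiIndex).
index = pd.MultiIndex.from_tuples(
    [
        ("A", "A", "A"),
        ("A", "A", "B"),
        ("A", "B", "A"),
        ("A", "B", "C"),
        ("A", "C", "A"),
    ],
    names=["key1", "key2", "key3"],
)
df = pd.DataFrame({"data1": [10, 11, 12, 13, 14]}, index=index)

# Step [1]: the unique (key1, key2) combinations, if they are needed on their own.
unique_keys = df.index.droplevel("key3").unique().tolist()

# Steps [2]-[4]: visit one group at a time and keep only the processed result.
results = []
for (key1, key2), group in df.groupby(level=["key1", "key2"]):
    summary = process_group(group)
    summary.update({"key1": key1, "key2": key2})
    results.append(summary)

# Collect the per-group results into a new dataframe; df itself is left untouched.
result_df = pd.DataFrame(results).set_index(["key1", "key2"])
print(result_df)
```

With this pattern step [5] (removing processed rows from the main dataframe) should not be necessary, since groupby already visits each key1/key2 combination exactly once.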