Split multiple pandas dataframes according to thresholds and produce a count of binary classes between...
I have a series of dataframes, each containing a year's worth of continuous rainfall data and binary data indicating whether or not a flood occurred. For each rainfall interval (under 5mm, 5-10mm, 10-20mm, over 20mm), I want to count the number of days on which a flood did and did not occur. My data looks a bit like this:
date ppt fld
01/01/2016 0.23 0
02/01/2016 1.6 0
03/01/2016 10.5 1
04/01/2016 25.4 1
05/01/2016 0.3 0
06/01/2016 6.5 1
07/01/2016 11.2 1
08/01/2016 5.5 0
...
I have applied the following code to split a single dataframe using a mask:
mask5 = df['ppt3'] >= 5
ppt5 = df[~mask5] #Under 5mm
ppt5p = df[mask5] #Over 5mm
mask10 = ppt5p['ppt3'] >= 10
ppt10 = ppt5p[~mask10] #5-10mm
ppt10p = ppt5p[mask10] #Over 10mm
mask20 = ppt10p['ppt3'] >= 20
ppt20 = ppt10p[~mask20] #10-20mm
ppt20p = ppt10p[mask20] #Over 20mm
And then used the following to produce counts of each interval:
print(ppt5['fld'].value_counts()) #Under 5mm
print(ppt10['fld'].value_counts()) #5-10mm
print(ppt20['fld'].value_counts()) #10-20mm
print(ppt20p['fld'].value_counts()) #Over 20mm
Which produces the following:
0.0 3
1.0 0
Name: SzT, dtype: int64
0.0 1
1.0 1
Name: SzT, dtype: int64
0.0 0
1.0 2
Name: SzT, dtype: int64
0.0 0
1.0 1
Name: SzT, dtype: int64
So what this tells me is that on all the days with less than 5mm no floods occurred; on the days with between 5 and 10mm there was one day with a flood and one with no flood; on both days with between 10 and 20mm a flood occurred, and on the day with over 20mm a flood occurred. Great stuff.
But I have 20 dataframes to do this for. Is there a way to speed this process up or do it more efficiently?
Thanks so much
python pandas
asked Nov 23 '18 at 15:33
SHV_la
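One possible approach, sketched here rather than offered as a definitive answer: bin the rainfall once with `pd.cut`, cross-tabulate the bins against the flood flag with `pd.crosstab`, and wrap that in a function that can be applied to every dataframe. The column names `ppt` and `fld` are taken from the sample data (the masking code in the question uses `ppt3`, so adjust `ppt_col` as needed). The function names, the `LABELS` strings and the dictionary keys are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Bin edges in mm. With right=False the intervals are
# [0, 5), [5, 10), [10, 20), [20, inf), matching the >= masks in the question.
BINS = [0, 5, 10, 20, np.inf]
LABELS = ["<5mm", "5-10mm", "10-20mm", ">=20mm"]


def flood_counts(df, ppt_col="ppt", fld_col="fld"):
    """Count flood (1) and no-flood (0) days in each rainfall interval."""
    bands = pd.cut(df[ppt_col], bins=BINS, labels=LABELS, right=False)
    counts = pd.crosstab(bands, df[fld_col])
    # Make sure every interval and both classes appear, even with zero days.
    return counts.reindex(index=LABELS, columns=[0, 1], fill_value=0)


def flood_counts_all(frames):
    """Stack the per-interval counts for a dict of {name: DataFrame}."""
    tables = {name: flood_counts(df) for name, df in frames.items()}
    return pd.concat(tables, names=["dataset"])


# The eight sample rows from the question.
sample = pd.DataFrame({
    "ppt": [0.23, 1.6, 10.5, 25.4, 0.3, 6.5, 11.2, 5.5],
    "fld": [0, 0, 1, 1, 0, 1, 1, 0],
})

print(flood_counts(sample))

# With 20 dataframes, collect them in a dict and process them in one call.
# The keys here are placeholders; the same frame is reused twice for the demo.
print(flood_counts_all({"site_a": sample, "site_b": sample}))
```

Rows with missing or negative rainfall fall outside every bin and are left out of the counts, so it may be worth checking for those separately if they can occur in the real data.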