Split multiple pandas dataframes according to thresholds and produce a count of binary classes between...

I have a series of dataframes, each containing a year's worth of continuous rainfall data and binary data indicating whether or not a flood occurred. For each rainfall interval (under 5 mm, 5-10 mm, 10-20 mm, and 20 mm or more), I want to count the days on which a flood did and did not occur. My data looks a bit like this:

    date        ppt    fld
    01/01/2016  0.23   0
    02/01/2016  1.6    0
    03/01/2016  10.5   1
    04/01/2016  25.4   1
    05/01/2016  0.3    0
    06/01/2016  6.5    1
    07/01/2016  11.2   1
    08/01/2016  5.5    0
    ...

I have applied the following code to split a single dataframe using a mask:

    mask5 = df['ppt'] >= 5
    ppt5 = df[~mask5]          # under 5 mm
    ppt5p = df[mask5]          # 5 mm and over

    mask10 = ppt5p['ppt'] >= 10
    ppt10 = ppt5p[~mask10]     # 5-10 mm
    ppt10p = ppt5p[mask10]     # 10 mm and over

    mask20 = ppt10p['ppt'] >= 20
    ppt20 = ppt10p[~mask20]    # 10-20 mm
    ppt20p = ppt10p[mask20]    # 20 mm and over

And then used the following to produce counts of each interval:

    print(ppt5['fld'].value_counts())    # under 5 mm
    print(ppt10['fld'].value_counts())   # 5-10 mm
    print(ppt20['fld'].value_counts())   # 10-20 mm
    print(ppt20p['fld'].value_counts())  # 20 mm and over

Which produces the following:

    0.0     3
    1.0     0
    Name: fld, dtype: int64
    0.0     1
    1.0     1
    Name: fld, dtype: int64
    0.0     0
    1.0     2
    Name: fld, dtype: int64
    0.0     0
    1.0     1
    Name: fld, dtype: int64

This tells me that no floods occurred on any of the three days with less than 5 mm; of the two days with 5-10 mm, one had a flood and one did not; both days with 10-20 mm had a flood; and the one day with 20 mm or more had a flood. Great stuff.
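For comparison, the same four intervals can probably be produced in one step rather than through chained masks. The following is only a sketch: it assumes the rainfall column is named `ppt` and the flood column `fld`, as in the sample rows, and the `edges` and `labels` names are made up for illustration. It uses `pd.cut` to assign each day to a band and `pd.crosstab` to count flood and no-flood days per band.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "date": ["01/01/2016", "02/01/2016", "03/01/2016", "04/01/2016",
             "05/01/2016", "06/01/2016", "07/01/2016", "08/01/2016"],
    "ppt": [0.23, 1.6, 10.5, 25.4, 0.3, 6.5, 11.2, 5.5],
    "fld": [0, 0, 1, 1, 0, 1, 1, 0],
})

# Band edges (assumed, taken from the thresholds 5, 10 and 20 mm).
# right=False makes each band include its lower edge, like the `>=` masks.
edges = [0, 5, 10, 20, float("inf")]
labels = ["<5mm", "5-10mm", "10-20mm", ">=20mm"]
bands = pd.cut(df["ppt"], bins=edges, labels=labels, right=False)

# Rows are rainfall bands, columns are the flood flag (0 or 1).
counts = pd.crosstab(bands, df["fld"])
print(counts)
```

On the eight sample rows this should give the same counts as the output above. One difference to be aware of: a negative rainfall value would fall outside the first band and be left out, whereas the mask version would count it as under 5 mm.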

But I have 20 dataframes to process this way. Is there a faster or more efficient way to do this?
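One way this might be handled is to wrap the binning in a function and apply it to every dataframe in a dictionary. This is a minimal sketch, not a tested solution for the real data: the function name `flood_counts`, the dictionary `frames`, and its two small stand-in dataframes are all hypothetical, and the column names `ppt` and `fld` are assumed from the sample rows.

```python
import pandas as pd

# Assumed band edges and labels, based on the 5, 10 and 20 mm thresholds.
EDGES = [0, 5, 10, 20, float("inf")]
LABELS = ["<5mm", "5-10mm", "10-20mm", ">=20mm"]


def flood_counts(df):
    """Count flood (1) and no-flood (0) days per rainfall band for one dataframe."""
    # right=False: each band includes its lower edge, like the `>=` masks.
    bands = pd.cut(df["ppt"], bins=EDGES, labels=LABELS, right=False)
    return pd.crosstab(bands, df["fld"])


# Hypothetical stand-ins for the 20 yearly dataframes, keyed by a label.
frames = {
    "year_a": pd.DataFrame({"ppt": [0.23, 1.6, 10.5, 25.4], "fld": [0, 0, 1, 1]}),
    "year_b": pd.DataFrame({"ppt": [0.3, 6.5, 11.2, 5.5], "fld": [0, 1, 1, 0]}),
}

# One table of counts per dataframe.
results = {name: flood_counts(frame) for name, frame in frames.items()}

for name, table in results.items():
    print(name)
    print(table)
```

Keeping the dataframes in a dictionary (or a list) means the same function call covers all 20 without copying the mask code for each one.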

Thanks so much

asked Nov 23 '18 at 15:33

SHV_la

718

python pandas
