Crosstab on multiple columns

I have a DataFrame with name, day, and location columns. For each name-day-location triple, I want to know what proportion of the rows with that name-day pair have that location.

In code, I am starting with df and want to produce expected.

import pandas as pd

df = pd.DataFrame(
    [
        {"name": "Alice", "day": "friday", "location": "left"},
        {"name": "Alice", "day": "friday", "location": "right"},
        {"name": "Bob", "day": "monday", "location": "left"},
    ]
)

print(df)

expected = pd.DataFrame(
    [
        {"name": "Alice", "day": "friday", "location": "left", "row_percent": 50.0},
        {"name": "Alice", "day": "friday", "location": "right", "row_percent": 50.0},
        {"name": "Bob", "day": "monday", "location": "left", "row_percent": 100.0},
    ]
).set_index(["name", "day"])

print(expected)

Printed:

In [13]: df
Out[13]:
      day location   name
0  friday     left  Alice
1  friday    right  Alice
2  monday     left    Bob

In [12]: expected
Out[12]:
             location  row_percent
name  day
Alice friday     left         50.0
      friday    right         50.0
Bob   monday     left        100.0

asked Nov 5 at 3:49 by Hatshepsut
edited Nov 5 at 4:06 by user3483203


Tags: python, pandas


1 Answer

accepted

Using groupby and value_counts:

df.groupby(['name', 'day']).location.value_counts(normalize=True).mul(100)

name   day     location
Alice  friday  left         50.0
               right        50.0
Bob    monday  left        100.0
Name: location, dtype: float64
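To make it clearer what value_counts(normalize=True) is doing inside each group, here is a rough by-hand equivalent. It is only a sketch: the variable names (triple_counts, pair_totals, manual_percent) are made up for illustration and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {"name": "Alice", "day": "friday", "location": "left"},
        {"name": "Alice", "day": "friday", "location": "right"},
        {"name": "Bob", "day": "monday", "location": "left"},
    ]
)

# Number of rows for each (name, day, location) triple.
triple_counts = df.groupby(["name", "day", "location"]).size()

# Number of rows for each (name, day) pair, broadcast back onto the triple index.
pair_totals = triple_counts.groupby(level=["name", "day"]).transform("sum")

# Share of each location within its (name, day) group, as a percentage.
manual_percent = triple_counts.div(pair_totals).mul(100)
print(manual_percent)
```

The one-liner with value_counts does the counting and the division in a single step, which is why it is usually the more convenient choice.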

With a bit more cleaning for your desired output:

out = (df.groupby(['name', 'day']).location.value_counts(normalize=True).mul(100)
          .rename('row_percent').reset_index(2))

             location  row_percent
name  day
Alice friday     left         50.0
      friday    right         50.0
Bob   monday     left        100.0

out == expected

              location  row_percent
name  day
Alice friday      True         True
      friday      True         True
Bob   monday      True         True
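Since the question title mentions crosstab, it may be worth noting that pd.crosstab can give the same numbers when passed a list of index columns and normalize="index". The sketch below is an alternative, not the approach used in this answer; the names table and crosstab_out are hypothetical. The crosstab is wide (one column per location) and fills combinations that never occur with 0, so it is melted back to long format and the zero cells are dropped.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {"name": "Alice", "day": "friday", "location": "left"},
        {"name": "Alice", "day": "friday", "location": "right"},
        {"name": "Bob", "day": "monday", "location": "left"},
    ]
)

# Row-normalised crosstab: one row per (name, day), one column per location.
table = pd.crosstab(
    [df["name"], df["day"]], df["location"], normalize="index"
).mul(100)

# Back to long format, dropping the zero cells for combinations that never occur.
crosstab_out = (
    table.reset_index()
    .melt(id_vars=["name", "day"], var_name="location", value_name="row_percent")
    .query("row_percent > 0")
    .sort_values(["name", "day", "location"])
    .set_index(["name", "day"])
)
print(crosstab_out)
```

For checking a result against expected, pd.testing.assert_frame_equal is a stricter option than the elementwise == comparison, since it also compares dtypes and index names and raises with a description of the first mismatch.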

answered Nov 5 at 3:52 by user3483203


