Crosstab on multiple columns












I have a DataFrame with name, day, and location columns. For each name-day-location triple, I want to know what percentage of the rows with that name-day pair have that location.



In code, I am starting with df and looking for expected.



import pandas as pd

df = pd.DataFrame(
    [
        {"name": "Alice", "day": "friday", "location": "left"},
        {"name": "Alice", "day": "friday", "location": "right"},
        {"name": "Bob", "day": "monday", "location": "left"},
    ]
)

print(df)



expected = pd.DataFrame(
    [
        {"name": "Alice", "day": "friday", "location": "left", "row_percent": 50.0},
        {"name": "Alice", "day": "friday", "location": "right", "row_percent": 50.0},
        {"name": "Bob", "day": "monday", "location": "left", "row_percent": 100.0},
    ]
).set_index(['name', 'day'])
print(expected)


Printed:



In [13]: df
Out[13]:
      day location   name
0  friday     left  Alice
1  friday    right  Alice
2  monday     left    Bob

In [12]: expected
Out[12]:
             location  row_percent
name  day
Alice friday     left         50.0
      friday    right         50.0
Bob   monday     left        100.0









      python pandas














      edited Nov 5 at 4:06 by user3483203

      asked Nov 5 at 3:49 by Hatshepsut
























          1 Answer



          accepted










          Using groupby and value_counts:



          df.groupby(['name', 'day']).location.value_counts(normalize=True).mul(100)




          name   day     location
          Alice  friday  left         50.0
                         right        50.0
          Bob    monday  left        100.0
          Name: location, dtype: float64
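To make visible what `value_counts(normalize=True)` does inside each group, here is a minimal sketch that derives the same percentages by hand. It is not part of the original answer; the names `counts`, `totals`, and `manual` are illustrative only, and `df` is rebuilt from the question so the block runs on its own.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    [
        {"name": "Alice", "day": "friday", "location": "left"},
        {"name": "Alice", "day": "friday", "location": "right"},
        {"name": "Bob", "day": "monday", "location": "left"},
    ]
)

# Number of rows for each (name, day, location) triple.
counts = df.groupby(["name", "day", "location"]).size()

# Number of rows for each (name, day) pair, broadcast back onto every triple.
totals = counts.groupby(level=["name", "day"]).transform("sum")

# Share of each location within its name-day group, as a percentage.
manual = counts.div(totals).mul(100).rename("row_percent")
print(manual)
```

The one-liner above is the shorter route; this longer form only spells out the arithmetic (triple count divided by pair count, times 100).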




          With a bit more cleaning for your desired output:



          out = (df.groupby(['name', 'day']).location.value_counts(normalize=True).mul(100)
                   .rename('row_percent').reset_index(2))




                       location  row_percent
          name  day
          Alice friday     left         50.0
                friday    right         50.0
          Bob   monday     left        100.0




          out == expected




                        location  row_percent
          name  day
          Alice friday      True         True
                friday      True         True
          Bob   monday      True         True
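Since the question title mentions crosstab, here is a hedged sketch of how `pd.crosstab` with `normalize="index"` could reach the same shape. This is an alternative and not what the answer above uses; the names `wide` and `tidy` are illustrative. Note that crosstab produces a wide table and fills combinations that never occur (here Bob/monday/right) with 0, so the sketch filters those out after stacking, which assumes zero-percent rows are not wanted.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    [
        {"name": "Alice", "day": "friday", "location": "left"},
        {"name": "Alice", "day": "friday", "location": "right"},
        {"name": "Bob", "day": "monday", "location": "left"},
    ]
)

# One row per (name, day) pair, one column per location.
# normalize="index" divides each row by its own total.
wide = pd.crosstab(
    [df["name"], df["day"]], df["location"], normalize="index"
).mul(100)

# Back to long form: one entry per (name, day, location) triple.
tidy = wide.stack().rename("row_percent")

# Drop the combinations crosstab filled with 0, then move
# location out of the index to match the layout of `expected`.
tidy = tidy[tidy > 0].reset_index("location")
print(tidy)
```

For this particular long-format result the groupby/value_counts route is more direct; crosstab is mainly convenient when the wide name-day by location table is itself the goal.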





                answered Nov 5 at 3:52 by user3483203
