Concatenate the strings of multiple columns across multiple rows in pandas?
I have two data frames, as below:



import pandas as pd
df1 = pd.DataFrame({'serialNo':['aaaa','bbbb','cccc','ffff','aaaa','bbbb','aaaa'],
'Name':['Sayonti','Ruchi','Tony','Gowtam','Toffee','Tom','Sayonti'],
'testName': [4402, 3747 ,5555,8754,1234,9876,3602],
'moduleName': ['singing', 'dance','booze', 'vocals','drama','paint','singing'],
'endResult': ['WARNING', 'FAILED', 'WARNING', 'FAILED','WARNING','FAILED','WARNING'],
'Date':['2018-10-5','2018-10-6','2018-10-7','2018-10-8','2018-10-9','2018-10-10','2018-10-8'],
'Time_df1':['23:26:39','22:50:31','22:15:28','21:40:19','21:04:15','20:29:11','19:54:03']})

df2 = pd.DataFrame({'serialNo':['aaaa','bbbb','aaaa','ffff','xyzy','aaaa'],
'Food':['Strawberry','Coke','Pepsi','Nuts','Apple','Candy'],
'Work': ['AP', 'TC','OD', 'PU','NO','PM'],
'Date':['2018-10-4','2018-10-6','2018-10-5','2018-10-7','2018-10-5','2018-10-10'],
'Time_df2':['09:00:00','10:00:00','11:00:00','12:00:00','13:00:00','14:00:00']
})


Now I have merged the two frames as below:



df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])
result = pd.merge(df1,df2,on=['serialNo'],how='inner')


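Then I keep only the rows where the df1 date is 0 to 3 days after the df2 date, drop the date and time columns, and group by serialNo: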



result = result[result.Date_x.sub(result.Date_y).dt.days.between(0,3)]
result = result.drop(['Date_x','Date_y','Time_df1','Time_df2'], axis=1)
result = result.groupby(['serialNo'])['Food'].apply(','.join).reset_index()


But I want the output to look like this:



output = pd.DataFrame({'serialNo':['aaaa','bbbb','ffff'],
'Name':['Sayonti,Sayonti,Sayonti','Ruchi','Gowtam'],
'testName': ['4402,4402,3602','3747','8754'],
'moduleName': ['singing,singing,singing', 'dance','vocals'],
'endResult': ['WARNING,WARNING,WARNING','FAILED','FAILED'],
'Food':['Strawberry,Pepsi,Pepsi','Coke','Nuts'],
'Work':['AP,OD,OD','TC','PU']})


How do I achieve this? Basically, I need to figure out how to apply ','.join to multiple columns at once.
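One reason the single-column .apply(','.join) does not carry over unchanged to every column is that str.join accepts only strings, while testName holds integers. A minimal sketch on a hypothetical two-row frame (the key, Food and testName values are made up for illustration) shows the failure and the astype(str) cast that avoids it:

```python
import pandas as pd

# Hypothetical frame: one string column and one integer column per key.
df = pd.DataFrame({'key': ['a', 'a'], 'Food': ['Coke', 'Nuts'], 'testName': [1, 2]})

# Joining a string column works, as in the question's 'Food' example.
food = df.groupby('key')['Food'].apply(','.join)

# Joining an integer column raises TypeError: str.join needs str items.
error = None
try:
    df.groupby('key')['testName'].apply(','.join)
except TypeError as exc:
    error = exc

# Casting each group to str first makes the join work for any dtype.
joined_ids = df.groupby('key')['testName'].apply(lambda s: ','.join(s.astype(str)))
print(food['a'], joined_ids['a'], type(error).__name__)
```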


  • According to the output you define, it seems that 'serialNo' is not the only column you want to group by; otherwise you would not have 2 rows with 'aaaa'. Can you explain?
    – Ben.T
    Nov 8 at 20:57












  • Yes @Ben.T, you are correct: the 'aaaa' rows should all be one row. I have changed my question accordingly.
    – sayo
    Nov 8 at 22:04

pandas pandas-groupby pandas-apply

edited Nov 8 at 22:06

asked Nov 8 at 20:28

sayo

1 Answer

Starting from result after the filter and drop steps (that is, before the final groupby), you can use either:



result.groupby('serialNo').agg(list)  # one list of values per column


Output:



                                 Name            testName
serialNo
aaaa      [Sayonti, Sayonti, Sayonti]  [4402, 4402, 3602]
bbbb                          [Ruchi]              [3747]
ffff                         [Gowtam]              [8754]

                           moduleName                    endResult
serialNo
aaaa      [singing, singing, singing]  [WARNING, WARNING, WARNING]
bbbb                          [dance]                     [FAILED]
ffff                         [vocals]                     [FAILED]

                                Food          Work
serialNo
aaaa      [Strawberry, Pepsi, Pepsi]  [AP, OD, OD]
bbbb                          [Coke]          [TC]
ffff                          [Nuts]          [PU]
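For reference, here is a self-contained sketch of the full pipeline with the first option, reusing the question's frames. The only change to the question's steps is that drop(..., inplace=True) is replaced by reassignment, which sidesteps a possible SettingWithCopyWarning on the filtered slice; as_lists is an illustrative name:

```python
import pandas as pd

# The two frames from the question.
df1 = pd.DataFrame({'serialNo': ['aaaa', 'bbbb', 'cccc', 'ffff', 'aaaa', 'bbbb', 'aaaa'],
                    'Name': ['Sayonti', 'Ruchi', 'Tony', 'Gowtam', 'Toffee', 'Tom', 'Sayonti'],
                    'testName': [4402, 3747, 5555, 8754, 1234, 9876, 3602],
                    'moduleName': ['singing', 'dance', 'booze', 'vocals', 'drama', 'paint', 'singing'],
                    'endResult': ['WARNING', 'FAILED', 'WARNING', 'FAILED', 'WARNING', 'FAILED', 'WARNING'],
                    'Date': ['2018-10-5', '2018-10-6', '2018-10-7', '2018-10-8', '2018-10-9', '2018-10-10', '2018-10-8'],
                    'Time_df1': ['23:26:39', '22:50:31', '22:15:28', '21:40:19', '21:04:15', '20:29:11', '19:54:03']})
df2 = pd.DataFrame({'serialNo': ['aaaa', 'bbbb', 'aaaa', 'ffff', 'xyzy', 'aaaa'],
                    'Food': ['Strawberry', 'Coke', 'Pepsi', 'Nuts', 'Apple', 'Candy'],
                    'Work': ['AP', 'TC', 'OD', 'PU', 'NO', 'PM'],
                    'Date': ['2018-10-4', '2018-10-6', '2018-10-5', '2018-10-7', '2018-10-5', '2018-10-10'],
                    'Time_df2': ['09:00:00', '10:00:00', '11:00:00', '12:00:00', '13:00:00', '14:00:00']})

df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])
result = pd.merge(df1, df2, on=['serialNo'], how='inner')

# Keep pairs where the df1 date falls 0 to 3 days after the df2 date.
result = result[result.Date_x.sub(result.Date_y).dt.days.between(0, 3)]
# Reassign rather than drop(..., inplace=True) on the filtered slice.
result = result.drop(columns=['Date_x', 'Date_y', 'Time_df1', 'Time_df2'])

# One list of values per column for each serialNo.
as_lists = result.groupby('serialNo').agg(list)
print(as_lists)
```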


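Or, to get comma-separated strings (use ',' instead of ', ' to match the requested output exactly, and append .reset_index() to turn serialNo back into a regular column):



result.groupby('serialNo').agg(lambda x: ', '.join(x.astype(str)))  # comma-separated strings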


Output:



                               Name          testName
serialNo
aaaa      Sayonti, Sayonti, Sayonti  4402, 4402, 3602
bbbb                          Ruchi              3747
ffff                         Gowtam              8754

                         moduleName                  endResult
serialNo
aaaa      singing, singing, singing  WARNING, WARNING, WARNING
bbbb                          dance                     FAILED
ffff                         vocals                     FAILED

                              Food        Work
serialNo
aaaa      Strawberry, Pepsi, Pepsi  AP, OD, OD
bbbb                          Coke          TC
ffff                          Nuts          PU
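To reproduce the requested output exactly, a small helper can wrap the second option with a configurable separator (defaulting to ',', as in the question) and a reset_index() call. This is a sketch, not the answerer's code: join_columns is a hypothetical name, and filtered is a hand-built stand-in holding the five rows that the question's 0 to 3 day filter keeps, with the date and time columns already dropped:

```python
import pandas as pd


def join_columns(df, key, sep=','):
    """Group df by key and join each remaining column's values into one string."""
    # astype(str) lets numeric columns such as testName be joined as well.
    return (df.groupby(key)
              .agg(lambda s: sep.join(s.astype(str)))
              .reset_index())


# Hand-built stand-in for the filtered result after the drop step.
filtered = pd.DataFrame({
    'serialNo': ['aaaa', 'aaaa', 'aaaa', 'bbbb', 'ffff'],
    'Name': ['Sayonti', 'Sayonti', 'Sayonti', 'Ruchi', 'Gowtam'],
    'testName': [4402, 4402, 3602, 3747, 8754],
    'moduleName': ['singing', 'singing', 'singing', 'dance', 'vocals'],
    'endResult': ['WARNING', 'WARNING', 'WARNING', 'FAILED', 'FAILED'],
    'Food': ['Strawberry', 'Pepsi', 'Pepsi', 'Coke', 'Nuts'],
    'Work': ['AP', 'OD', 'OD', 'TC', 'PU'],
})

output = join_columns(filtered, 'serialNo')
print(output)
```

Calling join_columns(filtered, 'serialNo', sep=', ') gives the spaced strings shown above instead.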

        answered Nov 8 at 22:26

        Scott Boston