Pandas DataFrame turn a list of jsons column into informative row, per “id”





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







1















Consider the following DataFrame:



import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],
[{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],
[{'aa': 3, 'ac': 2}] ]})
df
Out[134]:
id json_col
0 1 [{'aa': 1, 'ab': 1}, {'aa': 3, 'ab': 2, 'ac': 6}]
1 2 [{'aa': 1, 'ab': 2, 'ac': 1}, {'aa': 5}]
2 3 [{'aa': 3, 'ac': 2}]


We can see that we have a list of jsons for each id.



I'd like, for each 'id' and for each corresponding json in its list, to have a 'row' in the DataFrame. So the following DataFrame will look like this:



   id  aa   ab   ac
0 1 1 1.0 NaN
1 1 3 2.0 6.0
2 2 1 2.0 1.0
3 2 5 NaN NaN
4 3 3 NaN 2.0


We can see, id '1' had 2 corresponding jsons in it's list and therefor it gets 2 rows in the new DataFrame



Is there a pythonic way to do so using panda, numpy or json functionality?





Adding the run times of the solutions



setup = """
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3],
'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],
[{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],
[{'aa': 3, 'ac': 2}] ]})
"""

s1 = """
df = pd.concat(
[pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(df['json_col'], 1)],
sort=False
)
"""

s2 = """
recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()
df2 = pd.DataFrame.from_records(recs)
"""

%timeit(s1, setup)
52.3 ns ± 2.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit(s2, setup)
50.6 ns ± 3.28 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)









share|improve this question































    1















    Consider the following DataFrame:



    import pandas as pd

    df = pd.DataFrame({'id': [1, 2, 3],
    'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],
    [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],
    [{'aa': 3, 'ac': 2}] ]})
    df
    Out[134]:
    id json_col
    0 1 [{'aa': 1, 'ab': 1}, {'aa': 3, 'ab': 2, 'ac': 6}]
    1 2 [{'aa': 1, 'ab': 2, 'ac': 1}, {'aa': 5}]
    2 3 [{'aa': 3, 'ac': 2}]


    We can see that we have a list of jsons for each id.



    I'd like, for each 'id' and for each corresponding json in its list, to have a 'row' in the DataFrame. So the following DataFrame will look like this:



       id  aa   ab   ac
    0 1 1 1.0 NaN
    1 1 3 2.0 6.0
    2 2 1 2.0 1.0
    3 2 5 NaN NaN
    4 3 3 NaN 2.0


    We can see, id '1' had 2 corresponding jsons in it's list and therefor it gets 2 rows in the new DataFrame



    Is there a pythonic way to do so using panda, numpy or json functionality?





    Adding the run times of the solutions



    setup = """
    import pandas as pd
    df = pd.DataFrame({'id': [1, 2, 3],
    'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],
    [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],
    [{'aa': 3, 'ac': 2}] ]})
    """

    s1 = """
    df = pd.concat(
    [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(df['json_col'], 1)],
    sort=False
    )
    """

    s2 = """
    recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()
    df2 = pd.DataFrame.from_records(recs)
    """

    %timeit(s1, setup)
    52.3 ns ± 2.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
    %timeit(s2, setup)
    50.6 ns ± 3.28 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)









    share|improve this question



























      1












      1








      1








      Consider the following DataFrame:



      import pandas as pd

      df = pd.DataFrame({'id': [1, 2, 3],
      'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],
      [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],
      [{'aa': 3, 'ac': 2}] ]})
      df
      Out[134]:
      id json_col
      0 1 [{'aa': 1, 'ab': 1}, {'aa': 3, 'ab': 2, 'ac': 6}]
      1 2 [{'aa': 1, 'ab': 2, 'ac': 1}, {'aa': 5}]
      2 3 [{'aa': 3, 'ac': 2}]


      We can see that we have a list of jsons for each id.



      I'd like, for each 'id' and for each corresponding json in its list, to have a 'row' in the DataFrame. So the following DataFrame will look like this:



         id  aa   ab   ac
      0 1 1 1.0 NaN
      1 1 3 2.0 6.0
      2 2 1 2.0 1.0
      3 2 5 NaN NaN
      4 3 3 NaN 2.0


      We can see, id '1' had 2 corresponding jsons in it's list and therefor it gets 2 rows in the new DataFrame



      Is there a pythonic way to do so using panda, numpy or json functionality?





      Adding the run times of the solutions



      setup = """
      import pandas as pd
      df = pd.DataFrame({'id': [1, 2, 3],
      'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],
      [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],
      [{'aa': 3, 'ac': 2}] ]})
      """

      s1 = """
      df = pd.concat(
      [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(df['json_col'], 1)],
      sort=False
      )
      """

      s2 = """
      recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()
      df2 = pd.DataFrame.from_records(recs)
      """

      %timeit(s1, setup)
      52.3 ns ± 2.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
      %timeit(s2, setup)
      50.6 ns ± 3.28 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)









      share|improve this question
















      Consider the following DataFrame:



      import pandas as pd

      df = pd.DataFrame({'id': [1, 2, 3],
      'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],
      [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],
      [{'aa': 3, 'ac': 2}] ]})
      df
      Out[134]:
      id json_col
      0 1 [{'aa': 1, 'ab': 1}, {'aa': 3, 'ab': 2, 'ac': 6}]
      1 2 [{'aa': 1, 'ab': 2, 'ac': 1}, {'aa': 5}]
      2 3 [{'aa': 3, 'ac': 2}]


      We can see that we have a list of jsons for each id.



      I'd like, for each 'id' and for each corresponding json in its list, to have a 'row' in the DataFrame. So the following DataFrame will look like this:



         id  aa   ab   ac
      0 1 1 1.0 NaN
      1 1 3 2.0 6.0
      2 2 1 2.0 1.0
      3 2 5 NaN NaN
      4 3 3 NaN 2.0


      We can see, id '1' had 2 corresponding jsons in it's list and therefor it gets 2 rows in the new DataFrame



      Is there a pythonic way to do so using panda, numpy or json functionality?





      Adding the run times of the solutions



      setup = """
      import pandas as pd
      df = pd.DataFrame({'id': [1, 2, 3],
      'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],
      [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],
      [{'aa': 3, 'ac': 2}] ]})
      """

      s1 = """
      df = pd.concat(
      [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(df['json_col'], 1)],
      sort=False
      )
      """

      s2 = """
      recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()
      df2 = pd.DataFrame.from_records(recs)
      """

      %timeit(s1, setup)
      52.3 ns ± 2.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
      %timeit(s2, setup)
      50.6 ns ± 3.28 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)






      python json pandas numpy






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 25 '18 at 10:32







      Eran Moshe

















      asked Nov 25 '18 at 9:09









      Eran MosheEran Moshe

      1,418723




      1,418723
























          2 Answers
          2






          active

          oldest

          votes


















          1














          Here is one quick way by converting all the json_col's lists of dictionaries to DataFrame and concatenating them together plus some tweaks to create the id column:



          In [51]: df = pd.concat(
          [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(json_col, 1)],
          sort=False
          )

          In [52]: df.index.name = 'id'

          In [53]: df.reset_index()
          Out[53]:
          id aa ab ac
          0 1 1 1.0 NaN
          1 1 3 2.0 6.0
          2 2 1 2.0 1.0
          3 2 5 NaN NaN
          4 3 3 NaN 2.0





          share|improve this answer


























          • Ali solution work faster, but it's more python and easy to understand, So I'll accept this one

            – Eran Moshe
            Nov 25 '18 at 10:12











          • @EranMoshe It's very likely that the other answer works a little bit slower though.

            – Kasrâmvd
            Nov 25 '18 at 10:16











          • I thought so too.. But timeit proved me wrong. I might misused it.. You wanna give it a go ?

            – Eran Moshe
            Nov 25 '18 at 10:17













          • @EranMoshe Oh, that's interesting! Can you please update your question with the benchmarks? Thanks.

            – Kasrâmvd
            Nov 25 '18 at 10:20






          • 1





            After checking it again, they run roughly the same. I'll edit it though.

            – Eran Moshe
            Nov 25 '18 at 10:30



















          1














          a short way to accomplish this would be the following, although I don't personally consider it very pythonic as the code is a little hard to read, and not terribly performant, but for small data wrangling this should do the trick:



          recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()
          df2 = pd.DataFrame.from_records(recs)
          # outputs:
          aa ab ac id
          0 1 1.0 NaN 1
          1 3 2.0 6.0 1
          2 1 2.0 1.0 2
          3 5 NaN NaN 2
          4 3 NaN 2.0 3




          How it works:




          1. The applied lambda creates a new dictionary by merging the contents of {id: x.id} to each dictionary in the list of dictionaries in x.json_col (where x is a row).



          2. This is then summed. Since summing a lists of list of elements unites them into a big list of elements, recs has the following form



            [{'id': 1, 'aa': 1, 'ab': 1},
            {'id': 1, 'aa': 3, 'ab': 2, 'ac': 6},
            {'id': 2, 'aa': 1, 'ab': 2, 'ac': 1},
            {'id': 2, 'aa': 5},
            {'id': 3, 'aa': 3, 'ac': 2}]


          3. A new data frame is then simply constructed from the records.







          share|improve this answer


























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53466076%2fpandas-dataframe-turn-a-list-of-jsons-column-into-informative-row-per-id%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            1














            Here is one quick way by converting all the json_col's lists of dictionaries to DataFrame and concatenating them together plus some tweaks to create the id column:



            In [51]: df = pd.concat(
            [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(json_col, 1)],
            sort=False
            )

            In [52]: df.index.name = 'id'

            In [53]: df.reset_index()
            Out[53]:
            id aa ab ac
            0 1 1 1.0 NaN
            1 1 3 2.0 6.0
            2 2 1 2.0 1.0
            3 2 5 NaN NaN
            4 3 3 NaN 2.0





            share|improve this answer


























            • Ali solution work faster, but it's more python and easy to understand, So I'll accept this one

              – Eran Moshe
              Nov 25 '18 at 10:12











            • @EranMoshe It's very likely that the other answer works a little bit slower though.

              – Kasrâmvd
              Nov 25 '18 at 10:16











            • I thought so too.. But timeit proved me wrong. I might misused it.. You wanna give it a go ?

              – Eran Moshe
              Nov 25 '18 at 10:17













            • @EranMoshe Oh, that's interesting! Can you please update your question with the benchmarks? Thanks.

              – Kasrâmvd
              Nov 25 '18 at 10:20






            • 1





              After checking it again, they run roughly the same. I'll edit it though.

              – Eran Moshe
              Nov 25 '18 at 10:30
















            1














            Here is one quick way by converting all the json_col's lists of dictionaries to DataFrame and concatenating them together plus some tweaks to create the id column:



            In [51]: df = pd.concat(
            [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(json_col, 1)],
            sort=False
            )

            In [52]: df.index.name = 'id'

            In [53]: df.reset_index()
            Out[53]:
            id aa ab ac
            0 1 1 1.0 NaN
            1 1 3 2.0 6.0
            2 2 1 2.0 1.0
            3 2 5 NaN NaN
            4 3 3 NaN 2.0





            share|improve this answer


























            • Ali solution work faster, but it's more python and easy to understand, So I'll accept this one

              – Eran Moshe
              Nov 25 '18 at 10:12











            • @EranMoshe It's very likely that the other answer works a little bit slower though.

              – Kasrâmvd
              Nov 25 '18 at 10:16











            • I thought so too.. But timeit proved me wrong. I might misused it.. You wanna give it a go ?

              – Eran Moshe
              Nov 25 '18 at 10:17













            • @EranMoshe Oh, that's interesting! Can you please update your question with the benchmarks? Thanks.

              – Kasrâmvd
              Nov 25 '18 at 10:20






            • 1





              After checking it again, they run roughly the same. I'll edit it though.

              – Eran Moshe
              Nov 25 '18 at 10:30














            1












            1








            1







            Here is one quick way by converting all the json_col's lists of dictionaries to DataFrame and concatenating them together plus some tweaks to create the id column:



            In [51]: df = pd.concat(
            [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(json_col, 1)],
            sort=False
            )

            In [52]: df.index.name = 'id'

            In [53]: df.reset_index()
            Out[53]:
            id aa ab ac
            0 1 1 1.0 NaN
            1 1 3 2.0 6.0
            2 2 1 2.0 1.0
            3 2 5 NaN NaN
            4 3 3 NaN 2.0





            share|improve this answer















            Here is one quick way by converting all the json_col's lists of dictionaries to DataFrame and concatenating them together plus some tweaks to create the id column:



            In [51]: df = pd.concat(
            [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(json_col, 1)],
            sort=False
            )

            In [52]: df.index.name = 'id'

            In [53]: df.reset_index()
            Out[53]:
            id aa ab ac
            0 1 1 1.0 NaN
            1 1 3 2.0 6.0
            2 2 1 2.0 1.0
            3 2 5 NaN NaN
            4 3 3 NaN 2.0






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Nov 25 '18 at 10:08

























            answered Nov 25 '18 at 9:29









            KasrâmvdKasrâmvd

            80.1k1093131




            80.1k1093131













            • Ali solution work faster, but it's more python and easy to understand, So I'll accept this one

              – Eran Moshe
              Nov 25 '18 at 10:12











            • @EranMoshe It's very likely that the other answer works a little bit slower though.

              – Kasrâmvd
              Nov 25 '18 at 10:16











            • I thought so too.. But timeit proved me wrong. I might misused it.. You wanna give it a go ?

              – Eran Moshe
              Nov 25 '18 at 10:17













            • @EranMoshe Oh, that's interesting! Can you please update your question with the benchmarks? Thanks.

              – Kasrâmvd
              Nov 25 '18 at 10:20






            • 1





              After checking it again, they run roughly the same. I'll edit it though.

              – Eran Moshe
              Nov 25 '18 at 10:30



















            • Ali solution work faster, but it's more python and easy to understand, So I'll accept this one

              – Eran Moshe
              Nov 25 '18 at 10:12











            • @EranMoshe It's very likely that the other answer works a little bit slower though.

              – Kasrâmvd
              Nov 25 '18 at 10:16











            • I thought so too.. But timeit proved me wrong. I might misused it.. You wanna give it a go ?

              – Eran Moshe
              Nov 25 '18 at 10:17













            • @EranMoshe Oh, that's interesting! Can you please update your question with the benchmarks? Thanks.

              – Kasrâmvd
              Nov 25 '18 at 10:20






            • 1





              After checking it again, they run roughly the same. I'll edit it though.

              – Eran Moshe
              Nov 25 '18 at 10:30

















            Ali solution work faster, but it's more python and easy to understand, So I'll accept this one

            – Eran Moshe
            Nov 25 '18 at 10:12





            Ali solution work faster, but it's more python and easy to understand, So I'll accept this one

            – Eran Moshe
            Nov 25 '18 at 10:12













            @EranMoshe It's very likely that the other answer works a little bit slower though.

            – Kasrâmvd
            Nov 25 '18 at 10:16





            @EranMoshe It's very likely that the other answer works a little bit slower though.

            – Kasrâmvd
            Nov 25 '18 at 10:16













            I thought so too.. But timeit proved me wrong. I might misused it.. You wanna give it a go ?

            – Eran Moshe
            Nov 25 '18 at 10:17







            I thought so too.. But timeit proved me wrong. I might misused it.. You wanna give it a go ?

            – Eran Moshe
            Nov 25 '18 at 10:17















            @EranMoshe Oh, that's interesting! Can you please update your question with the benchmarks? Thanks.

            – Kasrâmvd
            Nov 25 '18 at 10:20





            @EranMoshe Oh, that's interesting! Can you please update your question with the benchmarks? Thanks.

            – Kasrâmvd
            Nov 25 '18 at 10:20




            1




            1





            After checking it again, they run roughly the same. I'll edit it though.

            – Eran Moshe
            Nov 25 '18 at 10:30





            After checking it again, they run roughly the same. I'll edit it though.

            – Eran Moshe
            Nov 25 '18 at 10:30













            1














            a short way to accomplish this would be the following, although I don't personally consider it very pythonic as the code is a little hard to read, and not terribly performant, but for small data wrangling this should do the trick:



            recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()
            df2 = pd.DataFrame.from_records(recs)
            # outputs:
            aa ab ac id
            0 1 1.0 NaN 1
            1 3 2.0 6.0 1
            2 1 2.0 1.0 2
            3 5 NaN NaN 2
            4 3 NaN 2.0 3




            How it works:




            1. The applied lambda creates a new dictionary by merging the contents of {id: x.id} to each dictionary in the list of dictionaries in x.json_col (where x is a row).



            2. This is then summed. Since summing a lists of list of elements unites them into a big list of elements, recs has the following form



              [{'id': 1, 'aa': 1, 'ab': 1},
              {'id': 1, 'aa': 3, 'ab': 2, 'ac': 6},
              {'id': 2, 'aa': 1, 'ab': 2, 'ac': 1},
              {'id': 2, 'aa': 5},
              {'id': 3, 'aa': 3, 'ac': 2}]


            3. A new data frame is then simply constructed from the records.







            share|improve this answer






























              1














              a short way to accomplish this would be the following, although I don't personally consider it very pythonic as the code is a little hard to read, and not terribly performant, but for small data wrangling this should do the trick:



              recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()
              df2 = pd.DataFrame.from_records(recs)
              # outputs:
              aa ab ac id
              0 1 1.0 NaN 1
              1 3 2.0 6.0 1
              2 1 2.0 1.0 2
              3 5 NaN NaN 2
              4 3 NaN 2.0 3




              How it works:




              1. The applied lambda creates a new dictionary by merging the contents of {id: x.id} to each dictionary in the list of dictionaries in x.json_col (where x is a row).



              2. This is then summed. Since summing a lists of list of elements unites them into a big list of elements, recs has the following form



                [{'id': 1, 'aa': 1, 'ab': 1},
                {'id': 1, 'aa': 3, 'ab': 2, 'ac': 6},
                {'id': 2, 'aa': 1, 'ab': 2, 'ac': 1},
                {'id': 2, 'aa': 5},
                {'id': 3, 'aa': 3, 'ac': 2}]


              3. A new data frame is then simply constructed from the records.







              share|improve this answer




























                1












                1








                1







                a short way to accomplish this would be the following, although I don't personally consider it very pythonic as the code is a little hard to read, and not terribly performant, but for small data wrangling this should do the trick:



                recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()
                df2 = pd.DataFrame.from_records(recs)
                # outputs:
                aa ab ac id
                0 1 1.0 NaN 1
                1 3 2.0 6.0 1
                2 1 2.0 1.0 2
                3 5 NaN NaN 2
                4 3 NaN 2.0 3




                How it works:




                1. The applied lambda creates a new dictionary by merging the contents of {id: x.id} to each dictionary in the list of dictionaries in x.json_col (where x is a row).



                2. This is then summed. Since summing a lists of list of elements unites them into a big list of elements, recs has the following form



                  [{'id': 1, 'aa': 1, 'ab': 1},
                  {'id': 1, 'aa': 3, 'ab': 2, 'ac': 6},
                  {'id': 2, 'aa': 1, 'ab': 2, 'ac': 1},
                  {'id': 2, 'aa': 5},
                  {'id': 3, 'aa': 3, 'ac': 2}]


                3. A new data frame is then simply constructed from the records.







                share|improve this answer















                a short way to accomplish this would be the following, although I don't personally consider it very pythonic as the code is a little hard to read, and not terribly performant, but for small data wrangling this should do the trick:



                recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()
                df2 = pd.DataFrame.from_records(recs)
                # outputs:
                aa ab ac id
                0 1 1.0 NaN 1
                1 3 2.0 6.0 1
                2 1 2.0 1.0 2
                3 5 NaN NaN 2
                4 3 NaN 2.0 3




                How it works:




                1. The applied lambda creates a new dictionary by merging the contents of {id: x.id} to each dictionary in the list of dictionaries in x.json_col (where x is a row).



                2. This is then summed. Since summing a lists of list of elements unites them into a big list of elements, recs has the following form



                  [{'id': 1, 'aa': 1, 'ab': 1},
                  {'id': 1, 'aa': 3, 'ab': 2, 'ac': 6},
                  {'id': 2, 'aa': 1, 'ab': 2, 'ac': 1},
                  {'id': 2, 'aa': 5},
                  {'id': 3, 'aa': 3, 'ac': 2}]


                3. A new data frame is then simply constructed from the records.








                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Nov 25 '18 at 9:57

























                answered Nov 25 '18 at 9:27









                Haleemur AliHaleemur Ali

                12.9k21741




                12.9k21741






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53466076%2fpandas-dataframe-turn-a-list-of-jsons-column-into-informative-row-per-id%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    這個網誌中的熱門文章

                    Hercules Kyvelos

                    Tangent Lines Diagram Along Smooth Curve

                    Yusuf al-Mu'taman ibn Hud