Pandas DataFrame turn a list of jsons column into informative row, per “id”

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

Consider the following DataFrame:

import pandas as pd



df = pd.DataFrame({'id': [1, 2, 3],

               'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],

                             [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],

                             [{'aa': 3, 'ac': 2}] ]})

df

Out[134]: 

   id                                           json_col

0   1  [{'aa': 1, 'ab': 1}, {'aa': 3, 'ab': 2, 'ac': 6}]

1   2           [{'aa': 1, 'ab': 2, 'ac': 1}, {'aa': 5}]

2   3                               [{'aa': 3, 'ac': 2}]

We can see that we have a list of jsons for each id.

I'd like, for each 'id' and for each corresponding json in its list, to have a 'row' in the DataFrame. So the following DataFrame will look like this:

   id  aa   ab   ac

0   1   1  1.0  NaN

1   1   3  2.0  6.0

2   2   1  2.0  1.0

3   2   5  NaN  NaN

4   3   3  NaN  2.0

We can see, id '1' had 2 corresponding jsons in it's list and therefor it gets 2 rows in the new DataFrame

Is there a pythonic way to do so using panda, numpy or json functionality?

Adding the run times of the solutions

setup = """

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],

               'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],

                             [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],

                             [{'aa': 3, 'ac': 2}] ]})

"""



s1 = """

df = pd.concat(

       [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(df['json_col'], 1)],

       sort=False

     )                             

"""



s2 = """

recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()

df2 = pd.DataFrame.from_records(recs)

"""



%timeit(s1, setup)

52.3 ns ± 2.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit(s2, setup)

50.6 ns ± 3.28 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

edited Nov 25 '18 at 10:32

asked Nov 25 '18 at 9:09

Eran Moshe

1,418723

add a comment |

Consider the following DataFrame:

import pandas as pd



df = pd.DataFrame({'id': [1, 2, 3],

               'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],

                             [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],

                             [{'aa': 3, 'ac': 2}] ]})

df

Out[134]: 

   id                                           json_col

0   1  [{'aa': 1, 'ab': 1}, {'aa': 3, 'ab': 2, 'ac': 6}]

1   2           [{'aa': 1, 'ab': 2, 'ac': 1}, {'aa': 5}]

2   3                               [{'aa': 3, 'ac': 2}]

We can see that we have a list of jsons for each id.

I'd like, for each 'id' and for each corresponding json in its list, to have a 'row' in the DataFrame. So the following DataFrame will look like this:

   id  aa   ab   ac

0   1   1  1.0  NaN

1   1   3  2.0  6.0

2   2   1  2.0  1.0

3   2   5  NaN  NaN

4   3   3  NaN  2.0

We can see, id '1' had 2 corresponding jsons in it's list and therefor it gets 2 rows in the new DataFrame

Is there a pythonic way to do so using panda, numpy or json functionality?

Adding the run times of the solutions

setup = """

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],

               'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],

                             [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],

                             [{'aa': 3, 'ac': 2}] ]})

"""



s1 = """

df = pd.concat(

       [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(df['json_col'], 1)],

       sort=False

     )                             

"""



s2 = """

recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()

df2 = pd.DataFrame.from_records(recs)

"""



%timeit(s1, setup)

52.3 ns ± 2.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit(s2, setup)

50.6 ns ± 3.28 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

edited Nov 25 '18 at 10:32

asked Nov 25 '18 at 9:09

Eran Moshe

1,418723

add a comment |

Consider the following DataFrame:

import pandas as pd



df = pd.DataFrame({'id': [1, 2, 3],

               'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],

                             [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],

                             [{'aa': 3, 'ac': 2}] ]})

df

Out[134]: 

   id                                           json_col

0   1  [{'aa': 1, 'ab': 1}, {'aa': 3, 'ab': 2, 'ac': 6}]

1   2           [{'aa': 1, 'ab': 2, 'ac': 1}, {'aa': 5}]

2   3                               [{'aa': 3, 'ac': 2}]

We can see that we have a list of jsons for each id.

I'd like, for each 'id' and for each corresponding json in its list, to have a 'row' in the DataFrame. So the following DataFrame will look like this:

   id  aa   ab   ac

0   1   1  1.0  NaN

1   1   3  2.0  6.0

2   2   1  2.0  1.0

3   2   5  NaN  NaN

4   3   3  NaN  2.0

We can see, id '1' had 2 corresponding jsons in it's list and therefor it gets 2 rows in the new DataFrame

Is there a pythonic way to do so using panda, numpy or json functionality?

Adding the run times of the solutions

setup = """

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],

               'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],

                             [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],

                             [{'aa': 3, 'ac': 2}] ]})

"""



s1 = """

df = pd.concat(

       [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(df['json_col'], 1)],

       sort=False

     )                             

"""



s2 = """

recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()

df2 = pd.DataFrame.from_records(recs)

"""



%timeit(s1, setup)

52.3 ns ± 2.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit(s2, setup)

50.6 ns ± 3.28 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

edited Nov 25 '18 at 10:32

asked Nov 25 '18 at 9:09

Eran Moshe

1,418723

Consider the following DataFrame:

import pandas as pd



df = pd.DataFrame({'id': [1, 2, 3],

               'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],

                             [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],

                             [{'aa': 3, 'ac': 2}] ]})

df

Out[134]: 

   id                                           json_col

0   1  [{'aa': 1, 'ab': 1}, {'aa': 3, 'ab': 2, 'ac': 6}]

1   2           [{'aa': 1, 'ab': 2, 'ac': 1}, {'aa': 5}]

2   3                               [{'aa': 3, 'ac': 2}]

We can see that we have a list of jsons for each id.

I'd like, for each 'id' and for each corresponding json in its list, to have a 'row' in the DataFrame. So the following DataFrame will look like this:

   id  aa   ab   ac

0   1   1  1.0  NaN

1   1   3  2.0  6.0

2   2   1  2.0  1.0

3   2   5  NaN  NaN

4   3   3  NaN  2.0

We can see, id '1' had 2 corresponding jsons in it's list and therefor it gets 2 rows in the new DataFrame

Is there a pythonic way to do so using panda, numpy or json functionality?

Adding the run times of the solutions

setup = """

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],

               'json_col': [ [{'aa' : 1, 'ab' : 1}, {'aa' : 3, 'ab' : 2, 'ac': 6}],

                             [{'aa' : 1, 'ab' : 2, 'ac': 1}, {'aa' : 5}],

                             [{'aa': 3, 'ac': 2}] ]})

"""



s1 = """

df = pd.concat(

       [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(df['json_col'], 1)],

       sort=False

     )                             

"""



s2 = """

recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()

df2 = pd.DataFrame.from_records(recs)

"""



%timeit(s1, setup)

52.3 ns ± 2.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit(s2, setup)

50.6 ns ± 3.28 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

python json pandas numpy

edited Nov 25 '18 at 10:32

asked Nov 25 '18 at 9:09

Eran Moshe

1,418723

edited Nov 25 '18 at 10:32

asked Nov 25 '18 at 9:09

Eran Moshe

1,418723

edited Nov 25 '18 at 10:32

asked Nov 25 '18 at 9:09

Eran Moshe

1,418723

asked Nov 25 '18 at 9:09

Eran Moshe

1,418723

asked Nov 25 '18 at 9:09

Eran Moshe

1,418723

add a comment |

2 Answers
2

active

oldest

votes

Here is one quick way by converting all the json_col's lists of dictionaries to DataFrame and concatenating them together plus some tweaks to create the id column:

In [51]: df = pd.concat(

           [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(json_col, 1)],

           sort=False

         )



In [52]: df.index.name = 'id'



In [53]: df.reset_index()

Out[53]: 

   id  aa   ab   ac

0   1   1  1.0  NaN

1   1   3  2.0  6.0

2   2   1  2.0  1.0

3   2   5  NaN  NaN

4   3   3  NaN  2.0

edited Nov 25 '18 at 10:08

answered Nov 25 '18 at 9:29

Kasrâmvd

80.1k1093131

Ali solution work faster, but it's more python and easy to understand, So I'll accept this one

– Eran Moshe
Nov 25 '18 at 10:12

@EranMoshe It's very likely that the other answer works a little bit slower though.

– Kasrâmvd
Nov 25 '18 at 10:16

I thought so too.. But timeit proved me wrong. I might misused it.. You wanna give it a go ?

– Eran Moshe
Nov 25 '18 at 10:17

@EranMoshe Oh, that's interesting! Can you please update your question with the benchmarks? Thanks.

– Kasrâmvd
Nov 25 '18 at 10:20

1

After checking it again, they run roughly the same. I'll edit it though.

– Eran Moshe
Nov 25 '18 at 10:30

add a comment |

a short way to accomplish this would be the following, although I don't personally consider it very pythonic as the code is a little hard to read, and not terribly performant, but for small data wrangling this should do the trick:

recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()

df2 = pd.DataFrame.from_records(recs)

# outputs:

   aa   ab   ac  id

0   1  1.0  NaN   1

1   3  2.0  6.0   1

2   1  2.0  1.0   2

3   5  NaN  NaN   2

4   3  NaN  2.0   3

How it works:

The applied lambda creates a new dictionary by merging the contents of {id: x.id} to each dictionary in the list of dictionaries in x.json_col (where x is a row).

This is then summed. Since summing a lists of list of elements unites them into a big list of elements, recs has the following form

[{'id': 1, 'aa': 1, 'ab': 1},

 {'id': 1, 'aa': 3, 'ab': 2, 'ac': 6},

 {'id': 2, 'aa': 1, 'ab': 2, 'ac': 1},

 {'id': 2, 'aa': 5},

 {'id': 3, 'aa': 3, 'ac': 2}]

A new data frame is then simply constructed from the records.

edited Nov 25 '18 at 9:57

answered Nov 25 '18 at 9:27

Haleemur Ali

12.9k21741

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53466076%2fpandas-dataframe-turn-a-list-of-jsons-column-into-informative-row-per-id%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Here is one quick way by converting all the json_col's lists of dictionaries to DataFrame and concatenating them together plus some tweaks to create the id column:

In [51]: df = pd.concat(

           [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(json_col, 1)],

           sort=False

         )



In [52]: df.index.name = 'id'



In [53]: df.reset_index()

Out[53]: 

   id  aa   ab   ac

0   1   1  1.0  NaN

1   1   3  2.0  6.0

2   2   1  2.0  1.0

3   2   5  NaN  NaN

4   3   3  NaN  2.0

edited Nov 25 '18 at 10:08

answered Nov 25 '18 at 9:29

Kasrâmvd

80.1k1093131

Ali solution work faster, but it's more python and easy to understand, So I'll accept this one

– Eran Moshe
Nov 25 '18 at 10:12

@EranMoshe It's very likely that the other answer works a little bit slower though.

– Kasrâmvd
Nov 25 '18 at 10:16

I thought so too.. But timeit proved me wrong. I might misused it.. You wanna give it a go ?

– Eran Moshe
Nov 25 '18 at 10:17

@EranMoshe Oh, that's interesting! Can you please update your question with the benchmarks? Thanks.

– Kasrâmvd
Nov 25 '18 at 10:20

1

After checking it again, they run roughly the same. I'll edit it though.

– Eran Moshe
Nov 25 '18 at 10:30

add a comment |

Here is one quick way by converting all the json_col's lists of dictionaries to DataFrame and concatenating them together plus some tweaks to create the id column:

In [51]: df = pd.concat(

           [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(json_col, 1)],

           sort=False

         )



In [52]: df.index.name = 'id'



In [53]: df.reset_index()

Out[53]: 

   id  aa   ab   ac

0   1   1  1.0  NaN

1   1   3  2.0  6.0

2   2   1  2.0  1.0

3   2   5  NaN  NaN

4   3   3  NaN  2.0

edited Nov 25 '18 at 10:08

answered Nov 25 '18 at 9:29

Kasrâmvd

80.1k1093131

Ali solution work faster, but it's more python and easy to understand, So I'll accept this one

– Eran Moshe
Nov 25 '18 at 10:12

@EranMoshe It's very likely that the other answer works a little bit slower though.

– Kasrâmvd
Nov 25 '18 at 10:16

I thought so too.. But timeit proved me wrong. I might misused it.. You wanna give it a go ?

– Eran Moshe
Nov 25 '18 at 10:17

@EranMoshe Oh, that's interesting! Can you please update your question with the benchmarks? Thanks.

– Kasrâmvd
Nov 25 '18 at 10:20

1

After checking it again, they run roughly the same. I'll edit it though.

– Eran Moshe
Nov 25 '18 at 10:30

add a comment |

Here is one quick way by converting all the json_col's lists of dictionaries to DataFrame and concatenating them together plus some tweaks to create the id column:

In [51]: df = pd.concat(

           [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(json_col, 1)],

           sort=False

         )



In [52]: df.index.name = 'id'



In [53]: df.reset_index()

Out[53]: 

   id  aa   ab   ac

0   1   1  1.0  NaN

1   1   3  2.0  6.0

2   2   1  2.0  1.0

3   2   5  NaN  NaN

4   3   3  NaN  2.0

edited Nov 25 '18 at 10:08

answered Nov 25 '18 at 9:29

Kasrâmvd

80.1k1093131

Here is one quick way by converting all the json_col's lists of dictionaries to DataFrame and concatenating them together plus some tweaks to create the id column:

In [51]: df = pd.concat(

           [pd.DataFrame(j, index=[i]*len(j)) for i, j in enumerate(json_col, 1)],

           sort=False

         )



In [52]: df.index.name = 'id'



In [53]: df.reset_index()

Out[53]: 

   id  aa   ab   ac

0   1   1  1.0  NaN

1   1   3  2.0  6.0

2   2   1  2.0  1.0

3   2   5  NaN  NaN

4   3   3  NaN  2.0

edited Nov 25 '18 at 10:08

answered Nov 25 '18 at 9:29

Kasrâmvd

80.1k1093131

edited Nov 25 '18 at 10:08

answered Nov 25 '18 at 9:29

Kasrâmvd

80.1k1093131

answered Nov 25 '18 at 9:29

Kasrâmvd

80.1k1093131

answered Nov 25 '18 at 9:29

Kasrâmvd

80.1k1093131

Ali solution work faster, but it's more python and easy to understand, So I'll accept this one

– Eran Moshe
Nov 25 '18 at 10:12

@EranMoshe It's very likely that the other answer works a little bit slower though.

– Kasrâmvd
Nov 25 '18 at 10:16

I thought so too.. But timeit proved me wrong. I might misused it.. You wanna give it a go ?

– Eran Moshe
Nov 25 '18 at 10:17

@EranMoshe Oh, that's interesting! Can you please update your question with the benchmarks? Thanks.

– Kasrâmvd
Nov 25 '18 at 10:20

1

After checking it again, they run roughly the same. I'll edit it though.

– Eran Moshe
Nov 25 '18 at 10:30

add a comment |

Ali solution work faster, but it's more python and easy to understand, So I'll accept this one

– Eran Moshe
Nov 25 '18 at 10:12

@EranMoshe It's very likely that the other answer works a little bit slower though.

– Kasrâmvd
Nov 25 '18 at 10:16

I thought so too.. But timeit proved me wrong. I might misused it.. You wanna give it a go ?

– Eran Moshe
Nov 25 '18 at 10:17

@EranMoshe Oh, that's interesting! Can you please update your question with the benchmarks? Thanks.

– Kasrâmvd
Nov 25 '18 at 10:20

1

After checking it again, they run roughly the same. I'll edit it though.

– Eran Moshe
Nov 25 '18 at 10:30

Ali solution work faster, but it's more python and easy to understand, So I'll accept this one

– Eran Moshe
Nov 25 '18 at 10:12

@EranMoshe It's very likely that the other answer works a little bit slower though.

– Kasrâmvd
Nov 25 '18 at 10:16

I thought so too.. But timeit proved me wrong. I might misused it.. You wanna give it a go ?

– Eran Moshe
Nov 25 '18 at 10:17

@EranMoshe Oh, that's interesting! Can you please update your question with the benchmarks? Thanks.

– Kasrâmvd
Nov 25 '18 at 10:20

After checking it again, they run roughly the same. I'll edit it though.

– Eran Moshe
Nov 25 '18 at 10:30

add a comment |

recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()

df2 = pd.DataFrame.from_records(recs)

# outputs:

   aa   ab   ac  id

0   1  1.0  NaN   1

1   3  2.0  6.0   1

2   1  2.0  1.0   2

3   5  NaN  NaN   2

4   3  NaN  2.0   3

How it works:

The applied lambda creates a new dictionary by merging the contents of {id: x.id} to each dictionary in the list of dictionaries in x.json_col (where x is a row).

This is then summed. Since summing a lists of list of elements unites them into a big list of elements, recs has the following form

[{'id': 1, 'aa': 1, 'ab': 1},

 {'id': 1, 'aa': 3, 'ab': 2, 'ac': 6},

 {'id': 2, 'aa': 1, 'ab': 2, 'ac': 1},

 {'id': 2, 'aa': 5},

 {'id': 3, 'aa': 3, 'ac': 2}]

A new data frame is then simply constructed from the records.

edited Nov 25 '18 at 9:57

answered Nov 25 '18 at 9:27

Haleemur Ali

12.9k21741

add a comment |

recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()

df2 = pd.DataFrame.from_records(recs)

# outputs:

   aa   ab   ac  id

0   1  1.0  NaN   1

1   3  2.0  6.0   1

2   1  2.0  1.0   2

3   5  NaN  NaN   2

4   3  NaN  2.0   3

How it works:

The applied lambda creates a new dictionary by merging the contents of {id: x.id} to each dictionary in the list of dictionaries in x.json_col (where x is a row).

This is then summed. Since summing a lists of list of elements unites them into a big list of elements, recs has the following form

[{'id': 1, 'aa': 1, 'ab': 1},

 {'id': 1, 'aa': 3, 'ab': 2, 'ac': 6},

 {'id': 2, 'aa': 1, 'ab': 2, 'ac': 1},

 {'id': 2, 'aa': 5},

 {'id': 3, 'aa': 3, 'ac': 2}]

A new data frame is then simply constructed from the records.

edited Nov 25 '18 at 9:57

answered Nov 25 '18 at 9:27

Haleemur Ali

12.9k21741

add a comment |

recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()

df2 = pd.DataFrame.from_records(recs)

# outputs:

   aa   ab   ac  id

0   1  1.0  NaN   1

1   3  2.0  6.0   1

2   1  2.0  1.0   2

3   5  NaN  NaN   2

4   3  NaN  2.0   3

How it works:

The applied lambda creates a new dictionary by merging the contents of {id: x.id} to each dictionary in the list of dictionaries in x.json_col (where x is a row).

This is then summed. Since summing a lists of list of elements unites them into a big list of elements, recs has the following form

[{'id': 1, 'aa': 1, 'ab': 1},

 {'id': 1, 'aa': 3, 'ab': 2, 'ac': 6},

 {'id': 2, 'aa': 1, 'ab': 2, 'ac': 1},

 {'id': 2, 'aa': 5},

 {'id': 3, 'aa': 3, 'ac': 2}]

A new data frame is then simply constructed from the records.

edited Nov 25 '18 at 9:57

answered Nov 25 '18 at 9:27

Haleemur Ali

12.9k21741

recs = df.apply(lambda x: [{**{'id': x.id}, **d} for d in x.json_col], axis=1).sum()

df2 = pd.DataFrame.from_records(recs)

# outputs:

   aa   ab   ac  id

0   1  1.0  NaN   1

1   3  2.0  6.0   1

2   1  2.0  1.0   2

3   5  NaN  NaN   2

4   3  NaN  2.0   3

How it works:

The applied lambda creates a new dictionary by merging the contents of {id: x.id} to each dictionary in the list of dictionaries in x.json_col (where x is a row).

This is then summed. Since summing a lists of list of elements unites them into a big list of elements, recs has the following form

[{'id': 1, 'aa': 1, 'ab': 1},

 {'id': 1, 'aa': 3, 'ab': 2, 'ac': 6},

 {'id': 2, 'aa': 1, 'ab': 2, 'ac': 1},

 {'id': 2, 'aa': 5},

 {'id': 3, 'aa': 3, 'ac': 2}]

A new data frame is then simply constructed from the records.

edited Nov 25 '18 at 9:57

answered Nov 25 '18 at 9:27

Haleemur Ali

12.9k21741

edited Nov 25 '18 at 9:57

answered Nov 25 '18 at 9:27

Haleemur Ali

12.9k21741

answered Nov 25 '18 at 9:27

Haleemur Ali

12.9k21741

answered Nov 25 '18 at 9:27

Haleemur Ali

12.9k21741

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Wsrtjtyk