Adding a grouped, aggregate nunique column to pandas dataframe
I want to add an aggregate, grouped, nunique column to my pandas dataframe but not aggregate the entire dataframe. I'm trying to do this in one line and avoid creating a new aggregated object and merging that, etc.
my df has track, type, and id. I want the number of unique ids for each track/type combination as a new column in the table (but not collapse track/type combos in the resulting df). Same number of rows, 1 more column.
something like this isn't working:
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()
nor is
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(nunique)
this last one works with some aggregating functions but not others. the following works (but is meaningless on my dataset):
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(sum)
in R this is easily done in data.table with
df[, n_unique_id := uniqueN(id), by = c('track', 'type')]
thanks!
python pandas dataframe aggregate pandas-groupby
add a comment |
I want to add an aggregate, grouped, nunique column to my pandas dataframe but not aggregate the entire dataframe. I'm trying to do this in one line and avoid creating a new aggregated object and merging that, etc.
my df has track, type, and id. I want the number of unique ids for each track/type combination as a new column in the table (but not collapse track/type combos in the resulting df). Same number of rows, 1 more column.
something like this isn't working:
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()
nor is
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(nunique)
this last one works with some aggregating functions but not others. the following works (but is meaningless on my dataset):
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(sum)
in R this is easily done in data.table with
df[, n_unique_id := uniqueN(id), by = c('track', 'type')]
thanks!
python pandas dataframe aggregate pandas-groupby
Can paste a copy of your source dataframe and what the expected output should look like?
– Scott Boston
May 1 '17 at 21:27
add a comment |
I want to add an aggregate, grouped, nunique column to my pandas dataframe but not aggregate the entire dataframe. I'm trying to do this in one line and avoid creating a new aggregated object and merging that, etc.
my df has track, type, and id. I want the number of unique ids for each track/type combination as a new column in the table (but not collapse track/type combos in the resulting df). Same number of rows, 1 more column.
something like this isn't working:
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()
nor is
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(nunique)
this last one works with some aggregating functions but not others. the following works (but is meaningless on my dataset):
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(sum)
in R this is easily done in data.table with
df[, n_unique_id := uniqueN(id), by = c('track', 'type')]
thanks!
python pandas dataframe aggregate pandas-groupby
I want to add an aggregate, grouped, nunique column to my pandas dataframe but not aggregate the entire dataframe. I'm trying to do this in one line and avoid creating a new aggregated object and merging that, etc.
my df has track, type, and id. I want the number of unique ids for each track/type combination as a new column in the table (but not collapse track/type combos in the resulting df). Same number of rows, 1 more column.
something like this isn't working:
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()
nor is
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(nunique)
this last one works with some aggregating functions but not others. the following works (but is meaningless on my dataset):
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(sum)
in R this is easily done in data.table with
df[, n_unique_id := uniqueN(id), by = c('track', 'type')]
thanks!
python pandas dataframe aggregate pandas-groupby
python pandas dataframe aggregate pandas-groupby
edited Nov 22 '18 at 11:52
jpp
102k2165115
102k2165115
asked May 1 '17 at 21:14
wbartswbarts
142
142
Can paste a copy of your source dataframe and what the expected output should look like?
– Scott Boston
May 1 '17 at 21:27
add a comment |
Can paste a copy of your source dataframe and what the expected output should look like?
– Scott Boston
May 1 '17 at 21:27
Can paste a copy of your source dataframe and what the expected output should look like?
– Scott Boston
May 1 '17 at 21:27
Can paste a copy of your source dataframe and what the expected output should look like?
– Scott Boston
May 1 '17 at 21:27
add a comment |
1 Answer
1
active
oldest
votes
df.groupby(['track', 'type'])['id'].transform(nunique)
Implies that there is a name nunique
in the name space that performs some function. transform
will take a function or a string that it knows a function for. nunique
is definitely one of those strings.
As pointed out by @root, often the method that pandas
will utilize to perform a transformation indicated by these strings are optimized and should generally be preferred to passing your own functions. This is True
even for passing numpy
functions in some cases.
For example transform('sum')
should be preferred over transform(sum)
.
Try this instead
df.groupby(['track', 'type'])['id'].transform('nunique')
demo
df = pd.DataFrame(dict(
track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))
print(df)
id track type
0 X 1 A
1 X 1 A
2 Y 1 A
3 Z 1 A
4 W 2 B
5 W 2 B
6 W 2 B
7 W 2 B
df.groupby(['track', 'type'])['id'].transform('nunique')
0 3
1 3
2 3
3 3
4 1
5 1
6 1
7 1
Name: id, dtype: int64
4
I was just about to comment with this. It might be useful to mention that the string aliases should usually be preferred if available, as they'll reference optimal functions, e.g..transform(sum)
uses pythonsum
, whereas.transfrom('sum')
would use numpy and be more computationally efficient.
– root
May 1 '17 at 21:31
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f43726631%2fadding-a-grouped-aggregate-nunique-column-to-pandas-dataframe%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
df.groupby(['track', 'type'])['id'].transform(nunique)
Implies that there is a name nunique
in the name space that performs some function. transform
will take a function or a string that it knows a function for. nunique
is definitely one of those strings.
As pointed out by @root, often the method that pandas
will utilize to perform a transformation indicated by these strings are optimized and should generally be preferred to passing your own functions. This is True
even for passing numpy
functions in some cases.
For example transform('sum')
should be preferred over transform(sum)
.
Try this instead
df.groupby(['track', 'type'])['id'].transform('nunique')
demo
df = pd.DataFrame(dict(
track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))
print(df)
id track type
0 X 1 A
1 X 1 A
2 Y 1 A
3 Z 1 A
4 W 2 B
5 W 2 B
6 W 2 B
7 W 2 B
df.groupby(['track', 'type'])['id'].transform('nunique')
0 3
1 3
2 3
3 3
4 1
5 1
6 1
7 1
Name: id, dtype: int64
4
I was just about to comment with this. It might be useful to mention that the string aliases should usually be preferred if available, as they'll reference optimal functions, e.g..transform(sum)
uses pythonsum
, whereas.transfrom('sum')
would use numpy and be more computationally efficient.
– root
May 1 '17 at 21:31
add a comment |
df.groupby(['track', 'type'])['id'].transform(nunique)
Implies that there is a name nunique
in the name space that performs some function. transform
will take a function or a string that it knows a function for. nunique
is definitely one of those strings.
As pointed out by @root, often the method that pandas
will utilize to perform a transformation indicated by these strings are optimized and should generally be preferred to passing your own functions. This is True
even for passing numpy
functions in some cases.
For example transform('sum')
should be preferred over transform(sum)
.
Try this instead
df.groupby(['track', 'type'])['id'].transform('nunique')
demo
df = pd.DataFrame(dict(
track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))
print(df)
id track type
0 X 1 A
1 X 1 A
2 Y 1 A
3 Z 1 A
4 W 2 B
5 W 2 B
6 W 2 B
7 W 2 B
df.groupby(['track', 'type'])['id'].transform('nunique')
0 3
1 3
2 3
3 3
4 1
5 1
6 1
7 1
Name: id, dtype: int64
4
I was just about to comment with this. It might be useful to mention that the string aliases should usually be preferred if available, as they'll reference optimal functions, e.g..transform(sum)
uses pythonsum
, whereas.transfrom('sum')
would use numpy and be more computationally efficient.
– root
May 1 '17 at 21:31
add a comment |
df.groupby(['track', 'type'])['id'].transform(nunique)
Implies that there is a name nunique
in the name space that performs some function. transform
will take a function or a string that it knows a function for. nunique
is definitely one of those strings.
As pointed out by @root, often the method that pandas
will utilize to perform a transformation indicated by these strings are optimized and should generally be preferred to passing your own functions. This is True
even for passing numpy
functions in some cases.
For example transform('sum')
should be preferred over transform(sum)
.
Try this instead
df.groupby(['track', 'type'])['id'].transform('nunique')
demo
df = pd.DataFrame(dict(
track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))
print(df)
id track type
0 X 1 A
1 X 1 A
2 Y 1 A
3 Z 1 A
4 W 2 B
5 W 2 B
6 W 2 B
7 W 2 B
df.groupby(['track', 'type'])['id'].transform('nunique')
0 3
1 3
2 3
3 3
4 1
5 1
6 1
7 1
Name: id, dtype: int64
df.groupby(['track', 'type'])['id'].transform(nunique)
Implies that there is a name nunique
in the name space that performs some function. transform
will take a function or a string that it knows a function for. nunique
is definitely one of those strings.
As pointed out by @root, often the method that pandas
will utilize to perform a transformation indicated by these strings are optimized and should generally be preferred to passing your own functions. This is True
even for passing numpy
functions in some cases.
For example transform('sum')
should be preferred over transform(sum)
.
Try this instead
df.groupby(['track', 'type'])['id'].transform('nunique')
demo
df = pd.DataFrame(dict(
track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))
print(df)
id track type
0 X 1 A
1 X 1 A
2 Y 1 A
3 Z 1 A
4 W 2 B
5 W 2 B
6 W 2 B
7 W 2 B
df.groupby(['track', 'type'])['id'].transform('nunique')
0 3
1 3
2 3
3 3
4 1
5 1
6 1
7 1
Name: id, dtype: int64
edited May 1 '17 at 21:31
answered May 1 '17 at 21:27
piRSquaredpiRSquared
158k23154298
158k23154298
4
I was just about to comment with this. It might be useful to mention that the string aliases should usually be preferred if available, as they'll reference optimal functions, e.g..transform(sum)
uses pythonsum
, whereas.transfrom('sum')
would use numpy and be more computationally efficient.
– root
May 1 '17 at 21:31
add a comment |
4
I was just about to comment with this. It might be useful to mention that the string aliases should usually be preferred if available, as they'll reference optimal functions, e.g..transform(sum)
uses pythonsum
, whereas.transfrom('sum')
would use numpy and be more computationally efficient.
– root
May 1 '17 at 21:31
4
4
I was just about to comment with this. It might be useful to mention that the string aliases should usually be preferred if available, as they'll reference optimal functions, e.g.
.transform(sum)
uses python sum
, whereas .transfrom('sum')
would use numpy and be more computationally efficient.– root
May 1 '17 at 21:31
I was just about to comment with this. It might be useful to mention that the string aliases should usually be preferred if available, as they'll reference optimal functions, e.g.
.transform(sum)
uses python sum
, whereas .transfrom('sum')
would use numpy and be more computationally efficient.– root
May 1 '17 at 21:31
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f43726631%2fadding-a-grouped-aggregate-nunique-column-to-pandas-dataframe%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Can paste a copy of your source dataframe and what the expected output should look like?
– Scott Boston
May 1 '17 at 21:27