Use issubset to compare set values between two pandas dataframe columns
I have a pandas DataFrame with two columns whose values are Python sets. For each row, I want to check whether the set in one column is a subset of the set in the other column. I thought the code below would work, but it seems you cannot call .issubset() on a Series of sets.
Ex:
import pandas as pd
data = [[['one','orange','green'],['one','orange']],[['milk','honey'],['Clarke', 'honey']]]
df = pd.DataFrame(data, columns=['Column_1','Column_2'])
# Fails: a pandas Series has no .issubset() method
Are_all_column_2_values_valid = df.loc[:, 'Column_2'].apply(set).issubset(df.loc[:, 'Column_1'])
desired_output = pd.Series([True, False])
All values in both sets will be strings.
Any help would be greatly appreciated!
python python-3.x pandas dataframe set
edited Nov 20 '18 at 0:16 by jpp
asked Nov 19 '18 at 23:36 by S M
2 Answers
First ensure you actually have series of sets:
df = df.apply(lambda x: x.apply(set))
Then use the <= operator, which is syntactic sugar for set.issubset:
print(df['Column_2'] <= df['Column_1'])
0 True
1 False
dtype: bool
edited Nov 20 '18 at 0:28
answered Nov 20 '18 at 0:15 by jpp
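As a minimal end-to-end sketch of the <= approach above, using the example data from the question (the variable name is_subset is only illustrative and not part of the original answer):

```python
import pandas as pd

data = [
    [['one', 'orange', 'green'], ['one', 'orange']],
    [['milk', 'honey'], ['Clarke', 'honey']],
]
df = pd.DataFrame(data, columns=['Column_1', 'Column_2'])

# Convert every cell from a list to a set, column by column.
df = df.apply(lambda col: col.apply(set))

# On object columns, <= is evaluated pairwise, so row i computes
# Column_2[i] <= Column_1[i], which is the subset test for sets.
is_subset = df['Column_2'] <= df['Column_1']
print(is_subset.tolist())
```

For Python sets, < would test for a proper subset instead, so it may be worth choosing between <= and < depending on whether equal sets should count.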
Interesting solution. I am trying to convert two columns of a larger pandas DataFrame to sets. I tried df['Column_1'] = df['Column_1'].apply(lambda x: x.apply(set)) but got the error "AttributeError: 'list' object has no attribute 'apply'". Do you know how to fix this?
– S M, Nov 20 '18 at 18:24
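The error in the comment above most likely arises because Series.apply already passes each cell (here a list) to the function, so the inner x.apply ends up being called on a list. A minimal sketch of one possible fix, assuming only some columns of a larger DataFrame hold lists (the Other column is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Column_1': [['one', 'orange', 'green'], ['milk', 'honey']],
    'Column_2': [['one', 'orange'], ['Clarke', 'honey']],
    'Other': [1, 2],  # hypothetical extra column that should stay untouched
})

# For a single column, apply set directly to each cell;
# no nested .apply is needed.
df['Column_1'] = df['Column_1'].apply(set)
df['Column_2'] = df['Column_2'].apply(set)

result = df['Column_2'] <= df['Column_1']
print(result.tolist())
```

The nested form df.apply(lambda x: x.apply(set)) is only needed on a whole DataFrame, where the outer apply hands over one column (a Series) at a time.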
You can use a list comprehension like this:
>>> [set(v).issubset(i) for v, i in zip(df.Column_2, df.Column_1)]
[True, False]
Or as a Series:
>>> pd.Series(set(v).issubset(i) for v, i in zip(df.Column_2, df.Column_1))
0 True
1 False
dtype: bool
edited Nov 20 '18 at 14:50
answered Nov 19 '18 at 23:46 by sacuL
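One caveat with the Series version above: it builds a fresh default index. If the DataFrame has a non-default index, passing index=df.index should keep the result aligned with the original rows. A sketch under that assumption, where rows_are_subsets is a hypothetical helper name and the string index labels are invented for illustration:

```python
import pandas as pd

def rows_are_subsets(df, sub_col, super_col):
    """Row by row, test whether df[sub_col] is a subset of df[super_col]."""
    flags = [
        set(sub).issubset(sup)  # issubset accepts any iterable, so lists work
        for sub, sup in zip(df[sub_col], df[super_col])
    ]
    # Reuse the original index so the result lines up with df.
    return pd.Series(flags, index=df.index)

df = pd.DataFrame(
    {
        'Column_1': [['one', 'orange', 'green'], ['milk', 'honey']],
        'Column_2': [['one', 'orange'], ['Clarke', 'honey']],
    },
    index=['a', 'b'],  # illustrative non-default index
)
result = rows_are_subsets(df, 'Column_2', 'Column_1')
print(result.tolist())
```

Because the cells are converted on the fly, this variant does not require the columns to be turned into sets beforehand.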