Compare multiple boolean columns in R



























A little crossword puzzle. As always, I think I'm missing something. I have a data frame like this:



id  creator  att1  att2   att3   att...  att500
a1  person1  TRUE  TRUE   FALSE  ...
a2  person2  TRUE  TRUE   TRUE   ...
a3  person1  TRUE  FALSE  FALSE  ...
a4  person1  TRUE  TRUE   FALSE  ...
a5  person2  TRUE  TRUE   FALSE  ...


And so on. I want to count the occurrences of the same attribute combination (about 500 boolean values) by different creators, and do this for each row, adding the count to the respective row. In the example above I therefore want count = 1 for the first row (a1), because in a5 a different person has made the very same attribute combination. Notice that a4 does not count, because it is the same combination but by the same person. Think of self-mixed cocktails and how often they are mixed by different persons independently of each other. Row a2 shall have a count of 0, and so shall a3 (no matching attribute combination); a4 has count = 1 because of a5. a5 has a count of 1 too. However, if other persons mix the same cocktail several times, this shall be counted. I don't want to simply remove duplicates.



My plan is hence to loop through the rows, exclude all cocktails by the same creator of the row, take the attribute combination and compare it with all the rows in the temporary dataset:



for (row in 1:nrow(data)){
  # for each row in data
  creator <- row$creator
  # get creator
  attr_tupel <- row[1, 3:500]
  # return the attribute combination of the row
  data[row]$count <- nrow(data[data$creator != creator & data[3:500] == attr_tupel])
  # into the column $count of the current row write the number of observations
  # that are not from the same creator and match the exact tuple of my ~500
  # attributes (equal cocktails by different persons)
}


Unfortunately I can't compare the tuple of the reference row with the other rows, as R reports:
‘==’ only defined for equally-sized data frames



And now I'm stuck. I could of course write out each column separately, but that would take ages. Do I need to cast that data frame into a list or vector or something else? (Vector and list don't work.) Is it at all possible to compare one row of values with many other rows for equality? I don't think duplicating the row would be the solution; besides, R usually just recycles the shorter operand when it runs out of entries to compare. Why not here?
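One way to compare a single row against all rows can be sketched in base R. The error above comes from `==` being applied to two data frames of different shapes, which it does not recycle; a matrix compared with a plain vector does get recycled. This is only a sketch, assuming the attribute columns are all logical and contain no NA. The toy data frame `df` and the names `att_cols`, `same` and `others` are made up for illustration.

```r
# Toy data mirroring rows a1..a5 from the example above (hypothetical names)
df <- data.frame(
  id      = c("a1", "a2", "a3", "a4", "a5"),
  creator = c("person1", "person2", "person1", "person1", "person2"),
  att1    = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  att2    = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  att3    = c(FALSE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)
att_cols <- c("att1", "att2", "att3")

m   <- as.matrix(df[att_cols])  # logical matrix, one row per cocktail
ref <- m[1, ]                   # attribute tuple of row a1 as a plain vector

# t(m) has one column per cocktail, so ref is recycled down each column;
# a cocktail matches when all of its attributes agree with ref
same <- unname(colSums(t(m) == ref) == ncol(m))

# matching rows made by somebody other than the creator of a1
others <- sum(same & df$creator != df$creator[1])
```

Here `same` flags a1, a4 and a5, and `others` keeps only a5, since a1 and a4 share a creator.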



I read several threads about comparing several columns with each other, but did not succeed in transferring the solutions to my problem. E.g. one wants to look up one value for the boolean value, whereas I have multiple TRUE values; another is the same; a third wants to convert to a c(), which I could do too and compare those, but that is kind of a hard way, isn't it?



At last (from that last link) I was now even thinking of converting the boolean values to numbers and adding an index column, so that we have



id  creator  att1  att2  att3  ...  index
a1  person1  1     2     0     ...  3
a2  person2  1     2     3     ...  6


and compare that index. That should work, but it feels like an ugly workaround. Also, thinking of data other than booleans, like several strings, in the long run I'd still like to be able to compare a tuple of columns against each other independent of their content.
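The index idea carries over to mixed column types if the key is a string rather than a number: pasting the columns of each row together gives one key per row, whatever the columns contain. A minimal sketch, assuming no value contains the separator and no column is named `sep` or `collapse`; the data frame `drinks` and its columns are invented for illustration.

```r
# Hypothetical data with character, logical and integer columns
drinks <- data.frame(
  creator = c("person1", "person2", "person1"),
  base    = c("gin", "gin", "rum"),
  ice     = c(TRUE, TRUE, TRUE),
  parts   = c(2L, 2L, 3L),
  stringsAsFactors = FALSE
)
key_cols <- c("base", "ice", "parts")

# paste() is vectorised over the columns, so this builds one key per row
key <- do.call(paste, c(drinks[key_cols], sep = "|"))

# rows with the same key get the same integer id
drinks$combo_id <- match(key, unique(key))
```

Two rows then hold the same tuple exactly when their `combo_id` values are equal, so a single `==` on that column replaces the column-by-column comparison.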



What am I missing? :)



Thanks for your help!



As asked for in the comments, here is a short script to create a similar data frame. Keep in mind, though, that there are way more columns to compare.



id <- 1:50
names <- paste("creator", rep(1:10, each = 5))
bools1 <- rnorm(n=50, mean = 5, sd = 3)
bools1 <- ifelse(bools1>5, TRUE, FALSE)
bools2 <- rnorm(n=50, mean = 5, sd = 3)
bools2 <- ifelse(bools2>5, TRUE, FALSE)
bools3 <- rnorm(n=50, mean = 5, sd = 3)
bools3 <- ifelse(bools3>5, TRUE, FALSE)
bools4 <- rnorm(n=50, mean = 5, sd = 3)
bools4 <- ifelse(bools4>5, TRUE, FALSE)
bools5 <- rnorm(n=50, mean = 5, sd = 3)
bools5 <- ifelse(bools5>5, TRUE, FALSE)

data <- data.frame(id, names, bools1, bools2, bools3, bools4, bools5)


































  • Hi akrun, thanks! You can just cut it off there. It's just there to show that a solution like nrow(data[data$att1 == row$att1 & data$att2 == row$att2 & data$att3 == row$att3]) would not be practical. The issue arises particularly from the number of different combinations across about 500 columns.

    – 3bbing
    Nov 20 '18 at 18:25








  • @akrun Above I added some code to create an exemplary data frame. Thanks!

    – 3bbing
    Nov 20 '18 at 18:47











  • Something like m1 <- combn(names(data)[-(1:2)], 2, FUN = function(x) rowSums(data[x])); colnames(m1) <- combn(names(data)[-(1:2)], 2, FUN = paste, collapse="_")

    – akrun
    Nov 20 '18 at 19:00


















r loops boolean comparison






edited Nov 20 '18 at 18:46







3bbing

















asked Nov 20 '18 at 18:16









3bbing













1 Answer
EDIT: Sorry, my first solution misread the question. Try this instead.

You can do this with data.table:



#Your set up data (with seed)
set.seed(123)
id <- 1:50
names <- paste("creator", rep(1:10, each = 5))
bools1 <- rnorm(n=50, mean = 5, sd = 3)
bools1 <- ifelse(bools1>5, TRUE, FALSE)
bools2 <- rnorm(n=50, mean = 5, sd = 3)
bools2 <- ifelse(bools2>5, TRUE, FALSE)
bools3 <- rnorm(n=50, mean = 5, sd = 3)
bools3 <- ifelse(bools3>5, TRUE, FALSE)
bools4 <- rnorm(n=50, mean = 5, sd = 3)
bools4 <- ifelse(bools4>5, TRUE, FALSE)
bools5 <- rnorm(n=50, mean = 5, sd = 3)
bools5 <- ifelse(bools5>5, TRUE, FALSE)

data <- data.frame(id, names, bools1, bools2, bools3, bools4, bools5)

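# Code to run

library(data.table)

# Convert the data.frame to a data.table by reference
setDT(data)
# Long format: one row per (id, names, attribute) combination
dt_m <- melt(data, id.vars = c("id","names"), variable.factor = TRUE)
# Collapse each row's attribute values into a single "drink" string
dt_m <- dt_m[,.(drink = paste0(value, collapse = "_")), by = .(id, names)]
# Count all rows per drink, then subtract the rows made by the same creator
dt_m[, times_made := .N, by = drink][, times_made_others := times_made - .N, by = .(drink, names)]
# Attach the drink key and the count to the original data
dt_out <- merge(data, dt_m[, .(id, drink, times_made_others)], by = "id")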


Essentially what you are doing is creating the "drinks" by collapsing the columns together, counting the number of times that drink was made by others, and then merging that back to your original data set.
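The same collapse-and-count idea can be sketched in base R without data.table, here on the five-row example from the question. This is an illustration under assumptions rather than the code above: `df`, `recipe`, `n_all` and `n_own` are made-up names, and the attribute columns are assumed to contain no NA.

```r
# Five-row example from the question (column names as used there)
df <- data.frame(
  id      = c("a1", "a2", "a3", "a4", "a5"),
  creator = c("person1", "person2", "person1", "person1", "person2"),
  att1    = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  att2    = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  att3    = c(FALSE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)
att_cols <- setdiff(names(df), c("id", "creator"))

# One key string per row, e.g. "TRUE_TRUE_FALSE"
df$recipe <- do.call(paste, c(df[att_cols], sep = "_"))

# Number of rows sharing the recipe, over all creators
n_all <- ave(seq_len(nrow(df)), df$recipe, FUN = length)
# Number of rows sharing the recipe AND the creator (includes the row itself)
n_own <- ave(seq_len(nrow(df)), df$recipe, df$creator, FUN = length)

# Rows with the same recipe made by somebody else
df$count <- n_all - n_own
```

This counts rows, not distinct creators, so a5 comes out as 2 because person1 made that combination twice (a1 and a4). The walkthrough in the question lists 1 for a5; if distinct other creators are what is wanted, the subtraction would need to work on unique creator and recipe pairs instead.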



dt_out
   id     names bools1 bools2 bools3 bools4 bools5                        drink times_made_others
1:  1 creator 1  FALSE   TRUE  FALSE   TRUE   TRUE   FALSE_TRUE_FALSE_TRUE_TRUE                 3
2:  2 creator 1  FALSE  FALSE   TRUE   TRUE   TRUE   FALSE_FALSE_TRUE_TRUE_TRUE                 1
3:  3 creator 1   TRUE  FALSE  FALSE   TRUE  FALSE  TRUE_FALSE_FALSE_TRUE_FALSE                 2
4:  4 creator 1   TRUE   TRUE  FALSE  FALSE   TRUE   TRUE_TRUE_FALSE_FALSE_TRUE                 0
5:  5 creator 1   TRUE  FALSE  FALSE  FALSE  FALSE TRUE_FALSE_FALSE_FALSE_FALSE                 3
6:  6 creator 2   TRUE   TRUE  FALSE  FALSE  FALSE  TRUE_TRUE_FALSE_FALSE_FALSE                 2
7:  7 creator 2   TRUE  FALSE  FALSE   TRUE  FALSE  TRUE_FALSE_FALSE_TRUE_FALSE                 2































  • Amazing. Thank you so much. I tried working it out with data.table functions and .N before too, but didn't manage for some reason. I never tried grouping with two grouping variables and somehow overwrote the old value in each new row. It is so lean and straightforward! I never used melt() before, will read into it. In general I need some time to digest your code tbh, but I adapted it to the large dataset, checked some cases and it looks flawless. Great idea collapsing the "recipe" that way btw! That will be very helpful not only here but along the road! Thanks again!

    – 3bbing
    Nov 20 '18 at 21:17











  • Sidenote: Now with the recipe (multiple columns) reduced to one column, it is also possible to easily loop through the rows and count. Just in case someone needs it: for (row in 1:nrow(data)){ data$count[row] <- nrow(data[data$recipe == data$recipe[row], ]) } If there is more information in the rows, you can easily adapt the subsetting this way. Again, thanks Chris!

    – 3bbing
    Dec 2 '18 at 11:31













Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53399121%2fcompare-multiple-boolean-columns-in-r%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














EDIT: Sorry - my first solution misread the question. Try this instead



You can run this using data table:



#Your set up data (with seed)
set.seed(123)
id <- 1:50
names <- paste("creator", rep(1:10, each = 5))
bools1 <- rnorm(n=50, mean = 5, sd = 3)
bools1 <- ifelse(bools1>5, TRUE, FALSE)
bools2 <- rnorm(n=50, mean = 5, sd = 3)
bools2 <- ifelse(bools2>5, TRUE, FALSE)
bools3 <- rnorm(n=50, mean = 5, sd = 3)
bools3 <- ifelse(bools3>5, TRUE, FALSE)
bools4 <- rnorm(n=50, mean = 5, sd = 3)
bools4 <- ifelse(bools4>5, TRUE, FALSE)
bools5 <- rnorm(n=50, mean = 5, sd = 3)
bools5 <- ifelse(bools5>5, TRUE, FALSE)

data <- data.frame(id, names, bools1, bools2, bools3, bools4, bools5)

# Code to run

library(data.table)

setDT(data)
dt_m <- melt(data, id.vars = c("id","names"), variable.factor = TRUE)
dt_m <- dt_m[,.(drink = paste0(value, collapse = "_")), by = .(id, names)]
dt_m[, times_made := .N, by = drink][, times_made_others := times_made - .N, by = .(drink, names)]
dt_out <- merge(data, dt_m[, .(id, drink, times_made_others)], by = "id")


Essentially what you are doing is creating the "drinks" by collapsing the columns together, counting the number of times that drink was made by others, and then merging that back to your original data set.



dt_out
id names bools1 bools2 bools3 bools4 bools5 drink times_made_others
1: 1 creator 1 FALSE TRUE FALSE TRUE TRUE FALSE_TRUE_FALSE_TRUE_TRUE 3
2: 2 creator 1 FALSE FALSE TRUE TRUE TRUE FALSE_FALSE_TRUE_TRUE_TRUE 1
3: 3 creator 1 TRUE FALSE FALSE TRUE FALSE TRUE_FALSE_FALSE_TRUE_FALSE 2
4: 4 creator 1 TRUE TRUE FALSE FALSE TRUE TRUE_TRUE_FALSE_FALSE_TRUE 0
5: 5 creator 1 TRUE FALSE FALSE FALSE FALSE TRUE_FALSE_FALSE_FALSE_FALSE 3
6: 6 creator 2 TRUE TRUE FALSE FALSE FALSE TRUE_TRUE_FALSE_FALSE_FALSE 2
7: 7 creator 2 TRUE FALSE FALSE TRUE FALSE TRUE_FALSE_FALSE_TRUE_FALSE 2





share|improve this answer


























  • amazing. Thank you so much. I have tried working it out with datatable functions and .N before too but didn't manage for some reason. Never tried grouping with two grouping variables and somehow overwrote the old value in each new row. Is so lean and straightforward! Never used melt() before, will read into it. In general need some time to digest your code tbh, but I adapted it to the large dataset, checked some cases and it looks flawless. Great idea collapsing the "recipe" that way btw! That will be very helpful not only here but along the road! Thanks again!

    – 3bbing
    Nov 20 '18 at 21:17











  • Sidenote: Now with the receipe / multiple columns reduced to one column it is also possible to easily loop through the rows and count. Just in case it's needed for someone: for (row in 1:nrow(data)){ data$count[row] <- nrow(data[data$recipe == data$recipe[row]) } If there is more information on the rows this way you can easily adapt the subsetting. Again Thanks Chris!

    – 3bbing
    Dec 2 '18 at 11:31


















1














EDIT: Sorry - my first solution misread the question. Try this instead



You can run this using data table:



#Your set up data (with seed)
set.seed(123)
id <- 1:50
names <- paste("creator", rep(1:10, each = 5))
bools1 <- rnorm(n=50, mean = 5, sd = 3)
bools1 <- ifelse(bools1>5, TRUE, FALSE)
bools2 <- rnorm(n=50, mean = 5, sd = 3)
bools2 <- ifelse(bools2>5, TRUE, FALSE)
bools3 <- rnorm(n=50, mean = 5, sd = 3)
bools3 <- ifelse(bools3>5, TRUE, FALSE)
bools4 <- rnorm(n=50, mean = 5, sd = 3)
bools4 <- ifelse(bools4>5, TRUE, FALSE)
bools5 <- rnorm(n=50, mean = 5, sd = 3)
bools5 <- ifelse(bools5>5, TRUE, FALSE)

data <- data.frame(id, names, bools1, bools2, bools3, bools4, bools5)

# Code to run

library(data.table)

setDT(data)
dt_m <- melt(data, id.vars = c("id","names"), variable.factor = TRUE)
dt_m <- dt_m[,.(drink = paste0(value, collapse = "_")), by = .(id, names)]
dt_m[, times_made := .N, by = drink][, times_made_others := times_made - .N, by = .(drink, names)]
dt_out <- merge(data, dt_m[, .(id, drink, times_made_others)], by = "id")


Essentially what you are doing is creating the "drinks" by collapsing the columns together, counting the number of times that drink was made by others, and then merging that back to your original data set.



dt_out
id names bools1 bools2 bools3 bools4 bools5 drink times_made_others
1: 1 creator 1 FALSE TRUE FALSE TRUE TRUE FALSE_TRUE_FALSE_TRUE_TRUE 3
2: 2 creator 1 FALSE FALSE TRUE TRUE TRUE FALSE_FALSE_TRUE_TRUE_TRUE 1
3: 3 creator 1 TRUE FALSE FALSE TRUE FALSE TRUE_FALSE_FALSE_TRUE_FALSE 2
4: 4 creator 1 TRUE TRUE FALSE FALSE TRUE TRUE_TRUE_FALSE_FALSE_TRUE 0
5: 5 creator 1 TRUE FALSE FALSE FALSE FALSE TRUE_FALSE_FALSE_FALSE_FALSE 3
6: 6 creator 2 TRUE TRUE FALSE FALSE FALSE TRUE_TRUE_FALSE_FALSE_FALSE 2
7: 7 creator 2 TRUE FALSE FALSE TRUE FALSE TRUE_FALSE_FALSE_TRUE_FALSE 2





share|improve this answer


























  • amazing. Thank you so much. I have tried working it out with datatable functions and .N before too but didn't manage for some reason. Never tried grouping with two grouping variables and somehow overwrote the old value in each new row. Is so lean and straightforward! Never used melt() before, will read into it. In general need some time to digest your code tbh, but I adapted it to the large dataset, checked some cases and it looks flawless. Great idea collapsing the "recipe" that way btw! That will be very helpful not only here but along the road! Thanks again!

    – 3bbing
    Nov 20 '18 at 21:17











  • Sidenote: Now with the receipe / multiple columns reduced to one column it is also possible to easily loop through the rows and count. Just in case it's needed for someone: for (row in 1:nrow(data)){ data$count[row] <- nrow(data[data$recipe == data$recipe[row]) } If there is more information on the rows this way you can easily adapt the subsetting. Again Thanks Chris!

    – 3bbing
    Dec 2 '18 at 11:31
















1












1








1







EDIT: Sorry - my first solution misread the question. Try this instead



You can run this using data table:



# Set-up data (seeded so the example is reproducible)
set.seed(123)
id <- 1:50
names <- paste("creator", rep(1:10, each = 5))
# Five random logical attribute columns
bools1 <- rnorm(n = 50, mean = 5, sd = 3) > 5
bools2 <- rnorm(n = 50, mean = 5, sd = 3) > 5
bools3 <- rnorm(n = 50, mean = 5, sd = 3) > 5
bools4 <- rnorm(n = 50, mean = 5, sd = 3) > 5
bools5 <- rnorm(n = 50, mean = 5, sd = 3) > 5

data <- data.frame(id, names, bools1, bools2, bools3, bools4, bools5)

# Code to run

library(data.table)

setDT(data)  # convert to a data.table by reference
# Long format: one row per (id, attribute)
dt_m <- melt(data, id.vars = c("id", "names"), variable.factor = TRUE)
# Collapse each row's attributes into a single "drink" key
dt_m <- dt_m[, .(drink = paste0(value, collapse = "_")), by = .(id, names)]
# Total rows per drink, minus the rows made by the same creator
dt_m[, times_made := .N, by = drink][, times_made_others := times_made - .N, by = .(drink, names)]
# Attach the key and the count to the original columns
dt_out <- merge(data, dt_m[, .(id, drink, times_made_others)], by = "id")


Essentially, this builds each "drink" by collapsing its attribute columns into a single string, counts how often that drink was made in total, subtracts the rows made by the same creator, and then merges the result back onto your original data set.



dt_out
   id     names bools1 bools2 bools3 bools4 bools5                        drink times_made_others
1:  1 creator 1  FALSE   TRUE  FALSE   TRUE   TRUE   FALSE_TRUE_FALSE_TRUE_TRUE                 3
2:  2 creator 1  FALSE  FALSE   TRUE   TRUE   TRUE   FALSE_FALSE_TRUE_TRUE_TRUE                 1
3:  3 creator 1   TRUE  FALSE  FALSE   TRUE  FALSE  TRUE_FALSE_FALSE_TRUE_FALSE                 2
4:  4 creator 1   TRUE   TRUE  FALSE  FALSE   TRUE   TRUE_TRUE_FALSE_FALSE_TRUE                 0
5:  5 creator 1   TRUE  FALSE  FALSE  FALSE  FALSE TRUE_FALSE_FALSE_FALSE_FALSE                 3
6:  6 creator 2   TRUE   TRUE  FALSE  FALSE  FALSE  TRUE_TRUE_FALSE_FALSE_FALSE                 2
7:  7 creator 2   TRUE  FALSE  FALSE   TRUE  FALSE  TRUE_FALSE_FALSE_TRUE_FALSE                 2
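
The same collapse-and-count idea can also be sketched in base R, here on the five-row example from the question. This is only an illustration under stated assumptions: the attribute columns are taken to be the ones whose names start with "att", and the helper names att_cols, n_recipe and n_recipe_creator are made up for this sketch.

```r
# Toy data from the question (three attribute columns instead of ~500).
df <- data.frame(
  id      = c("a1", "a2", "a3", "a4", "a5"),
  creator = c("person1", "person2", "person1", "person1", "person2"),
  att1    = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  att2    = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  att3    = c(FALSE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Assumption: every attribute column is named att<something>.
att_cols <- grep("^att", names(df), value = TRUE)

# Collapse each row's attributes into one key, e.g. "TRUE_TRUE_FALSE".
df$recipe <- do.call(paste, c(df[att_cols], sep = "_"))

# Number of rows sharing the recipe, and sharing recipe AND creator.
n_recipe         <- ave(seq_along(df$recipe), df$recipe, FUN = length)
n_recipe_creator <- ave(seq_along(df$recipe), df$recipe, df$creator, FUN = length)

# Rows with the same recipe made by somebody else.
df$count <- n_recipe - n_recipe_creator

print(df[, c("id", "creator", "recipe", "count")])
```

Because this counts rows rather than distinct creators, a5 gets a count of 2 here: person1 mixed that combination twice (a1 and a4).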













edited Nov 20 '18 at 20:32

answered Nov 20 '18 at 20:23

Chris


































