Compare multiple boolean columns in r

A little puzzle. As always, I think I'm missing something. I have a data frame like this:

id  creator  att1  att2   att3   att...  att500
a1  person1  TRUE  TRUE   FALSE  ...
a2  person2  TRUE  TRUE   TRUE   ...
a3  person1  TRUE  FALSE  FALSE  ...
a4  person1  TRUE  TRUE   FALSE  ...
a5  person2  TRUE  TRUE   FALSE  ...

And so on. For each row, I want to count how often the same attribute combination (about 500 boolean values) occurs in rows by different creators, and add that count to the respective row. In the example above, I want count = 1 for the first row (a1), because in a5 a different person has made the very same attribute combination. Note that a4 does not count: it is the same combination, but by the same person. Think of self-mixed cocktails and how often different people mix them independently of each other. Row a2 should have a count of 0, and so should a3 (no matching attribute combination); a4 has count = 1 because of a5. a5 has a count of 1 too. However, if other persons mix the same cocktail several times, this should be counted. I don't want to simply remove duplicates.

My plan is hence to loop through the rows, exclude all cocktails by the same creator of the row, take the attribute combination and compare it with all the rows in the temporary dataset:

for (row in 1:nrow(data)) {            # for each row in data
   creator <- data$creator[row]        # get creator
   attr_tupel <- data[row, 3:500]      # return the attribute combination of the row
   # into the column $count of the current row write the number of observations
   # that are not from the same creator and match the exact tupel of my ~500
   # attributes (equal cocktails by different persons)
   data$count[row] <- nrow(data[data$creator != creator & data[3:500] == attr_tupel, ])
}

Unfortunately, I can't compare the tuple of the reference row with the other rows, because I get:
‘==’ only defined for equally-sized data frames

And now I'm stuck. I could certainly write out each column separately, but that would take ages. Do I need to cast the data frame into a list or vector or something else? (Vector and list don't work.) Is it possible at all to compare one row of values with many other rows for equality? I don't think duplicating the row would be the solution; besides, R usually recycles the shorter operand when it runs out of entries to compare. Why not here?
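For what it's worth, a one-row-against-many comparison can be sketched in base R by converting the attribute columns to a matrix first, since `==` on a matrix recycles a plain vector where two data frames of different sizes raise the error above. This is a minimal sketch on a five-row toy version of the example table (three attribute columns instead of ~500); the names `att` and `same_combo` are illustrative and not from the original post.

```r
# Toy data mirroring the example table (assumption: only three attribute columns)
data <- data.frame(
  id      = c("a1", "a2", "a3", "a4", "a5"),
  creator = c("person1", "person2", "person1", "person1", "person2"),
  att1    = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  att2    = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  att3    = c(FALSE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Logical matrix with one row per cocktail
att <- as.matrix(data[, 3:ncol(data)])

data$count <- vapply(seq_len(nrow(data)), function(i) {
  # t(att) has one column per original row; the reference row att[i, ] is
  # recycled down each of those columns, so colSums() counts matching
  # attributes per original row
  same_combo <- colSums(t(att) == att[i, ]) == ncol(att)
  # keep only matching rows that come from a different creator
  sum(same_combo & data$creator != data$creator[i])
}, integer(1))

print(data[, c("id", "creator", "count")])
```

With this counting rule a5 gets 2 rather than 1, because a1 and a4 are both rows by a different creator. Counting distinct other creators instead (for example with `length(unique(...))` on the matching creators) would give 1 for a5.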

I read several threads about comparing several columns with each other, but did not succeed in transferring the solutions to my problem. For example: one wants to look up a single value for the boolean value, whereas I have multiple TRUE values; another is the same; a third wants to convert to a c(), which I could do too and then compare those, but that seems like the hard way, doesn't it?

Finally (from that last link), I was even thinking of converting the boolean values to numbers and adding them up into an index, so that we have

id creator att1 ... index
a1 person1 1 2 0 ... 3
a2 person2 1 2 3 ... 6

and then comparing that index. That should work, but it feels like an ugly workaround. Also, thinking of data other than booleans, such as several strings, in the long run I'd still like to be able to compare a tuple of columns against each other independent of their content.
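The index idea generalizes if the columns are pasted into a single string key instead of being summed, which also works for non-boolean columns. A rough sketch, assuming a hypothetical mixed-type table (the `shaken`, `base`, and `parts` columns are invented for illustration) and assuming the separator never occurs inside the values:

```r
# Hypothetical mixed-type recipe table (not from the original data)
recipes <- data.frame(
  creator = c("person1", "person2", "person1", "person2"),
  shaken  = c(TRUE, TRUE, FALSE, TRUE),
  base    = c("gin", "gin", "rum", "gin"),
  parts   = c(2, 2, 1, 3),
  stringsAsFactors = FALSE
)

key_cols <- c("shaken", "base", "parts")

# One string key per row; paste() converts every column type to character.
# Assumption: "|" does not appear inside any value, otherwise keys can collide.
recipes$key <- do.call(paste, c(recipes[key_cols], sep = "|"))

# Rows sharing the key, and rows sharing both key and creator
same_key  <- ave(seq_len(nrow(recipes)), recipes$key, FUN = length)
same_both <- ave(seq_len(nrow(recipes)), recipes$key, recipes$creator, FUN = length)

# Rows with the same key made by somebody else
recipes$count <- same_key - same_both

print(recipes)
```

Unlike a numeric sum, a pasted key keeps the position of each value, so two different combinations that happen to add up to the same number do not get mixed up.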

What am I missing? :)

Thanks for your help!

As asked for in the comments, here is a short script to create a similar data frame. Keep in mind, though, that there are far more columns to compare.

id <- 1:50
names <- paste("creator", rep(1:10, each = 5))
bools1 <- rnorm(n=50, mean = 5, sd = 3)
bools1 <- ifelse(bools1>5, TRUE, FALSE)
bools2 <- rnorm(n=50, mean = 5, sd = 3)
bools2 <- ifelse(bools2>5, TRUE, FALSE)
bools3 <- rnorm(n=50, mean = 5, sd = 3)
bools3 <- ifelse(bools3>5, TRUE, FALSE)
bools4 <- rnorm(n=50, mean = 5, sd = 3)
bools4 <- ifelse(bools4>5, TRUE, FALSE)
bools5 <- rnorm(n=50, mean = 5, sd = 3)
bools5 <- ifelse(bools5>5, TRUE, FALSE)

data <- data.frame(id, names, bools1, bools2, bools3, bools4, bools5)

edited Nov 20 '18 at 18:46

asked Nov 20 '18 at 18:16

3bbing

355

Hi akrun, thanks! You can just cut it off there. It's only there to point out that a solution like nrow(data[data$att1 == row$att1 & data$att2 == row$att2 & data$att3 == row$att3]) would not be practical. The issue arises mainly from the number of different combinations across about 500 columns.

– 3bbing
Nov 20 '18 at 18:25


@akrun I added some code above to create an example data frame. Thanks!

– 3bbing
Nov 20 '18 at 18:47

Something like m1 <- combn(names(data)[-(1:2)], 2, FUN = function(x) rowSums(data[x])); colnames(m1) <- combn(names(data)[-(1:2)], 2, FUN = paste, collapse="_")

– akrun
Nov 20 '18 at 19:00

r loops boolean comparison


1 Answer

EDIT: Sorry - my first solution misread the question. Try this instead

You can run this using data.table:

# Your set-up data (with a seed for reproducibility)
set.seed(123)
id <- 1:50
names <- paste("creator", rep(1:10, each = 5))
bools1 <- rnorm(n=50, mean = 5, sd = 3)
bools1 <- ifelse(bools1>5, TRUE, FALSE)
bools2 <- rnorm(n=50, mean = 5, sd = 3)
bools2 <- ifelse(bools2>5, TRUE, FALSE)
bools3 <- rnorm(n=50, mean = 5, sd = 3)
bools3 <- ifelse(bools3>5, TRUE, FALSE)
bools4 <- rnorm(n=50, mean = 5, sd = 3)
bools4 <- ifelse(bools4>5, TRUE, FALSE)
bools5 <- rnorm(n=50, mean = 5, sd = 3)
bools5 <- ifelse(bools5>5, TRUE, FALSE)

data <- data.frame(id, names, bools1, bools2, bools3, bools4, bools5)

# Code to run
library(data.table)

setDT(data)  # convert to a data.table by reference
# long format: one row per (id, names, attribute)
dt_m <- melt(data, id.vars = c("id","names"), variable.factor = TRUE)
# collapse each row's attribute values into a single "drink" string
dt_m <- dt_m[,.(drink = paste0(value, collapse = "_")), by = .(id, names)]
# times_made: rows per drink; times_made_others: minus the rows by the same creator
dt_m[, times_made := .N, by = drink][, times_made_others := times_made - .N, by = .(drink, names)]
# attach the drink string and the count to the original data
dt_out <- merge(data, dt_m[, .(id, drink, times_made_others)], by = "id")

Essentially what you are doing is creating the "drinks" by collapsing the columns together, counting the number of times that drink was made by others, and then merging that back to your original data set.

dt_out
    id      names bools1 bools2 bools3 bools4 bools5                        drink times_made_others
 1:  1  creator 1  FALSE   TRUE  FALSE   TRUE   TRUE   FALSE_TRUE_FALSE_TRUE_TRUE                 3
 2:  2  creator 1  FALSE  FALSE   TRUE   TRUE   TRUE   FALSE_FALSE_TRUE_TRUE_TRUE                 1
 3:  3  creator 1   TRUE  FALSE  FALSE   TRUE  FALSE  TRUE_FALSE_FALSE_TRUE_FALSE                 2
 4:  4  creator 1   TRUE   TRUE  FALSE  FALSE   TRUE   TRUE_TRUE_FALSE_FALSE_TRUE                 0
 5:  5  creator 1   TRUE  FALSE  FALSE  FALSE  FALSE TRUE_FALSE_FALSE_FALSE_FALSE                 3
 6:  6  creator 2   TRUE   TRUE  FALSE  FALSE  FALSE  TRUE_TRUE_FALSE_FALSE_FALSE                 2
 7:  7  creator 2   TRUE  FALSE  FALSE   TRUE  FALSE  TRUE_FALSE_FALSE_TRUE_FALSE                 2
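To check the logic by hand, the same three steps (collapse, count, subtract the creator's own rows) can be sketched on the five-row example from the question. This variant is a shortcut under stated assumptions rather than the code above: it pastes the columns directly via `.SDcols` instead of going through `melt()`, uses the column name `creator` from the question's table, and has only three attribute columns.

```r
library(data.table)

# Five-row toy table from the question (assumption: three attribute columns)
dt <- data.table(
  id      = c("a1", "a2", "a3", "a4", "a5"),
  creator = c("person1", "person2", "person1", "person1", "person2"),
  att1    = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  att2    = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  att3    = c(FALSE, TRUE, FALSE, FALSE, FALSE)
)
att_cols <- setdiff(names(dt), c("id", "creator"))

# Step 1: collapse the attribute columns into one "drink" string per row
dt[, drink := do.call(paste, c(.SD, sep = "_")), .SDcols = att_cols]

# Step 2: how often was each drink made overall?
dt[, times_made := .N, by = drink]

# Step 3: subtract the rows made by the same creator
dt[, times_made_others := times_made - .N, by = .(drink, creator)]

print(dt[, .(id, creator, drink, times_made_others)])
```

Here a1 and a4 each get 1 (because of a5), a2 and a3 get 0, and a5 gets 2, since both a1 and a4 are rows by another creator. If each other creator should count only once, `uniqueN(creator)` per drink minus 1 would be one way to get that instead.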

edited Nov 20 '18 at 20:32

answered Nov 20 '18 at 20:23

Chris

5,03611941

Amazing. Thank you so much. I had tried working it out with data.table functions and .N before too, but didn't manage for some reason. I never tried grouping by two variables and somehow overwrote the old value in each new row. It's so lean and straightforward! I've never used melt() before and will read up on it. In general I need some time to digest your code, to be honest, but I adapted it to the large dataset, checked some cases, and it looks flawless. Great idea collapsing the "recipe" that way, by the way! That will be very helpful not only here but down the road. Thanks again!

– 3bbing
Nov 20 '18 at 21:17

Sidenote: Now that the recipe (multiple columns) is reduced to one column, it is also possible to simply loop through the rows and count. Just in case someone needs it: for (row in 1:nrow(data)){ data$count[row] <- nrow(data[data$recipe == data$recipe[row], ]) } If there is more information in the rows, you can easily adapt the subsetting this way. Again, thanks Chris!

– 3bbing
Dec 2 '18 at 11:31
