Count unique characters present in a group of columns
I have the following dataset
dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
t1 = c('a','a','b','b','c'),p1 = c(0.98,1,0.5,0.9,1),
t2 = c('b',NA,'a','c',NA),p2 = c(0.02,NA,0.25,0.10,NA),
t3 = c(NA,NA,'c',NA,NA),p3 = c(NA,NA,0.25,NA,NA))
I am trying to count the number of unique characters present across multiple columns (t1,t2,t3) in a group encompassing multiple rows (variable = 'group'). Whether or not each character is counted depends on the associated p1,p2,p3 values being >= 0.05.
I have tried entering this code to count unique characters from all columns
b <- dat %>%
group_by(group) %>%
mutate(total = sum(n_distinct(t1[p1 >= 0.05], na.rm = TRUE),
n_distinct(t2[p2 >= 0.05], na.rm = TRUE),
n_distinct(t3[p3 >= 0.05], na.rm = TRUE)))
The results being that it counts the unique characters of t1, t2, and t3, and then sums that up resulting in the following dataset where 'total' is 6
dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
t1 = c('a','a','b','b','c'),p1 = c(0.98,1,0.5,0.9,1),
t2 = c('b',NA,'a','c',NA),p2 = c(0.02,NA,0.25,0.10,NA),
t3 = c(NA,NA,'c',NA,NA),p3 = c(NA,NA,0.25,NA,NA),
total = c(6,6,6,6,6))
However, what I am looking to do is count the total number of unique characters across all of the columns. In other words, if 'a' was seen in column t1, then don't count 'a' in the other columns (t2,t3). The final product would look like the following dataset where 'total' is 3 (representing a,b, and c being the unique characters present)
dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
t1 = c('a','a','b','b','c'),p1 = c(0.98,1,0.5,0.9,1),
t2 = c('b',NA,'a','c',NA),p2 = c(0.02,NA,0.25,0.10,NA),
t3 = c(NA,NA,'c',NA,NA),p3 = c(NA,NA,0.25,NA,NA),
total = c(3,3,3,3,3))
Thank you for your help
r dplyr
add a comment |
I have the following dataset
dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
t1 = c('a','a','b','b','c'),p1 = c(0.98,1,0.5,0.9,1),
t2 = c('b',NA,'a','c',NA),p2 = c(0.02,NA,0.25,0.10,NA),
t3 = c(NA,NA,'c',NA,NA),p3 = c(NA,NA,0.25,NA,NA))
I am trying to count the number of unique characters present across multiple columns (t1,t2,t3) in a group encompassing multiple rows (variable = 'group'). Whether or not each character is counted depends on the associated p1,p2,p3 values being >= 0.05.
I have tried entering this code to count unique characters from all columns
b <- dat %>%
group_by(group) %>%
mutate(total = sum(n_distinct(t1[p1 >= 0.05], na.rm = TRUE),
n_distinct(t2[p2 >= 0.05], na.rm = TRUE),
n_distinct(t3[p3 >= 0.05], na.rm = TRUE)))
The results being that it counts the unique characters of t1, t2, and t3, and then sums that up resulting in the following dataset where 'total' is 6
dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
t1 = c('a','a','b','b','c'),p1 = c(0.98,1,0.5,0.9,1),
t2 = c('b',NA,'a','c',NA),p2 = c(0.02,NA,0.25,0.10,NA),
t3 = c(NA,NA,'c',NA,NA),p3 = c(NA,NA,0.25,NA,NA),
total = c(6,6,6,6,6))
However, what I am looking to do is count the total number of unique characters across all of the columns. In other words, if 'a' was seen in column t1, then don't count 'a' in the other columns (t2,t3). The final product would look like the following dataset where 'total' is 3 (representing a,b, and c being the unique characters present)
dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
t1 = c('a','a','b','b','c'),p1 = c(0.98,1,0.5,0.9,1),
t2 = c('b',NA,'a','c',NA),p2 = c(0.02,NA,0.25,0.10,NA),
t3 = c(NA,NA,'c',NA,NA),p3 = c(NA,NA,0.25,NA,NA),
total = c(3,3,3,3,3))
Thank you for your help
r dplyr
add a comment |
I have the following dataset
dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
t1 = c('a','a','b','b','c'),p1 = c(0.98,1,0.5,0.9,1),
t2 = c('b',NA,'a','c',NA),p2 = c(0.02,NA,0.25,0.10,NA),
t3 = c(NA,NA,'c',NA,NA),p3 = c(NA,NA,0.25,NA,NA))
I am trying to count the number of unique characters present across multiple columns (t1,t2,t3) in a group encompassing multiple rows (variable = 'group'). Whether or not each character is counted depends on the associated p1,p2,p3 values being >= 0.05.
I have tried entering this code to count unique characters from all columns
b <- dat %>%
group_by(group) %>%
mutate(total = sum(n_distinct(t1[p1 >= 0.05], na.rm = TRUE),
n_distinct(t2[p2 >= 0.05], na.rm = TRUE),
n_distinct(t3[p3 >= 0.05], na.rm = TRUE)))
The results being that it counts the unique characters of t1, t2, and t3, and then sums that up resulting in the following dataset where 'total' is 6
dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
t1 = c('a','a','b','b','c'),p1 = c(0.98,1,0.5,0.9,1),
t2 = c('b',NA,'a','c',NA),p2 = c(0.02,NA,0.25,0.10,NA),
t3 = c(NA,NA,'c',NA,NA),p3 = c(NA,NA,0.25,NA,NA),
total = c(6,6,6,6,6))
However, what I am looking to do is count the total number of unique characters across all of the columns. In other words, if 'a' was seen in column t1, then don't count 'a' in the other columns (t2,t3). The final product would look like the following dataset where 'total' is 3 (representing a,b, and c being the unique characters present)
dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
t1 = c('a','a','b','b','c'),p1 = c(0.98,1,0.5,0.9,1),
t2 = c('b',NA,'a','c',NA),p2 = c(0.02,NA,0.25,0.10,NA),
t3 = c(NA,NA,'c',NA,NA),p3 = c(NA,NA,0.25,NA,NA),
total = c(3,3,3,3,3))
Thank you for your help
r dplyr
I have the following dataset
dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
t1 = c('a','a','b','b','c'),p1 = c(0.98,1,0.5,0.9,1),
t2 = c('b',NA,'a','c',NA),p2 = c(0.02,NA,0.25,0.10,NA),
t3 = c(NA,NA,'c',NA,NA),p3 = c(NA,NA,0.25,NA,NA))
I am trying to count the number of unique characters present across multiple columns (t1,t2,t3) in a group encompassing multiple rows (variable = 'group'). Whether or not each character is counted depends on the associated p1,p2,p3 values being >= 0.05.
I have tried entering this code to count unique characters from all columns
b <- dat %>%
group_by(group) %>%
mutate(total = sum(n_distinct(t1[p1 >= 0.05], na.rm = TRUE),
n_distinct(t2[p2 >= 0.05], na.rm = TRUE),
n_distinct(t3[p3 >= 0.05], na.rm = TRUE)))
The results being that it counts the unique characters of t1, t2, and t3, and then sums that up resulting in the following dataset where 'total' is 6
dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
t1 = c('a','a','b','b','c'),p1 = c(0.98,1,0.5,0.9,1),
t2 = c('b',NA,'a','c',NA),p2 = c(0.02,NA,0.25,0.10,NA),
t3 = c(NA,NA,'c',NA,NA),p3 = c(NA,NA,0.25,NA,NA),
total = c(6,6,6,6,6))
However, what I am looking to do is count the total number of unique characters across all of the columns. In other words, if 'a' was seen in column t1, then don't count 'a' in the other columns (t2,t3). The final product would look like the following dataset where 'total' is 3 (representing a,b, and c being the unique characters present)
dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
t1 = c('a','a','b','b','c'),p1 = c(0.98,1,0.5,0.9,1),
t2 = c('b',NA,'a','c',NA),p2 = c(0.02,NA,0.25,0.10,NA),
t3 = c(NA,NA,'c',NA,NA),p3 = c(NA,NA,0.25,NA,NA),
total = c(3,3,3,3,3))
Thank you for your help
r dplyr
r dplyr
asked Nov 13 '18 at 18:00
dminahandminahan
302
302
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
We can use map2
to loop through the each of the corresponding 't' and 'p' columns, get the subset of columns in the list
, filter
the rows based on the values of 'p' column, pull
the 't' column, get the number of distinct elements (n_distinct
) and create a new column with bind_cols
library(tidyverse)
map2_int(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
pull(.x) %>%
n_distinct) %>%
bind_cols(dat, total = .)
and if it is 'unique' across the columns, then instead of getting the n_distinct
per each column, unlist
the list
and then apply the n_distinct
, then create the new column in 'dat' with mutate
map2(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
pull(.x) ) %>%
unlist %>%
n_distinct %>%
mutate(dat, total = .)
# group id t1 p1 t2 p2 t3 p3 total
#1 1 1 a 0.98 b 0.02 <NA> NA 3
#2 1 2 a 1.00 <NA> NA <NA> NA 3
#3 1 3 b 0.50 a 0.25 c 0.25 3
#4 1 4 b 0.90 c 0.10 <NA> NA 3
#5 1 5 c 1.00 <NA> NA <NA> NA 3
Update
If we do a group_by
n_distinct
, the above can be changed by summarise
ing the n_distinct
of 'total' after grouping by 'group'
map2_df(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
group_by(group) %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
select(-.y) %>%
rename_at(.x, ~ 'total')) %>%
summarise(total = n_distinct(total)) %>%
left_join(dat)
1
the update solved my problem. Thank you!
– dminahan
Nov 14 '18 at 1:55
add a comment |
First lapply
though the p
columns with the function >= 0.05
, and then select rows where the result is true for all p
columns with Reduce('&'
. Once you have the rows, you can select the data for which you want to count uniques with .SD[rows, paste0('t', 1:3)]
, and unlist
it (coerce it to a single vector), which allows you to use UniqueN
directly to count the uniques.
library(data.table)
setDT(dat)
dat[, total := {rows <- Reduce('&', lapply(.(p1, p2, p3), '>=', 0.05))
uniqueN(unlist(.SD[rows, paste0('t', 1:3)]))}
, by = group]
# group id t1 p1 t2 p2 t3 p3 total
# 1: 1 1 a 0.98 b 0.02 <NA> NA 3
# 2: 1 2 a 1.00 <NA> NA <NA> NA 3
# 3: 1 3 b 0.50 a 0.25 c 0.25 3
# 4: 1 4 b 0.90 c 0.10 <NA> NA 3
# 5: 1 5 c 1.00 <NA> NA <NA> NA 3
add a comment |
This should work for any number of groups and any number of column pairs.
dat %>%
gather(key, value, -group, -id) %>%
separate(key, c('key1', 'key2'), 1) %>%
group_by(group, id, key2) %>%
spread(key1, value) %>%
filter(p >= 0.05) %>%
pull(t) %>%
n_distinct() %>%
mutate(dat, total = .)
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53286973%2fcount-unique-characters-present-in-a-group-of-columns%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
We can use map2
to loop through the each of the corresponding 't' and 'p' columns, get the subset of columns in the list
, filter
the rows based on the values of 'p' column, pull
the 't' column, get the number of distinct elements (n_distinct
) and create a new column with bind_cols
library(tidyverse)
map2_int(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
pull(.x) %>%
n_distinct) %>%
bind_cols(dat, total = .)
and if it is 'unique' across the columns, then instead of getting the n_distinct
per each column, unlist
the list
and then apply the n_distinct
, then create the new column in 'dat' with mutate
map2(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
pull(.x) ) %>%
unlist %>%
n_distinct %>%
mutate(dat, total = .)
# group id t1 p1 t2 p2 t3 p3 total
#1 1 1 a 0.98 b 0.02 <NA> NA 3
#2 1 2 a 1.00 <NA> NA <NA> NA 3
#3 1 3 b 0.50 a 0.25 c 0.25 3
#4 1 4 b 0.90 c 0.10 <NA> NA 3
#5 1 5 c 1.00 <NA> NA <NA> NA 3
Update
If we do a group_by
n_distinct
, the above can be changed by summarise
ing the n_distinct
of 'total' after grouping by 'group'
map2_df(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
group_by(group) %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
select(-.y) %>%
rename_at(.x, ~ 'total')) %>%
summarise(total = n_distinct(total)) %>%
left_join(dat)
1
the update solved my problem. Thank you!
– dminahan
Nov 14 '18 at 1:55
add a comment |
We can use map2
to loop through the each of the corresponding 't' and 'p' columns, get the subset of columns in the list
, filter
the rows based on the values of 'p' column, pull
the 't' column, get the number of distinct elements (n_distinct
) and create a new column with bind_cols
library(tidyverse)
map2_int(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
pull(.x) %>%
n_distinct) %>%
bind_cols(dat, total = .)
and if it is 'unique' across the columns, then instead of getting the n_distinct
per each column, unlist
the list
and then apply the n_distinct
, then create the new column in 'dat' with mutate
map2(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
pull(.x) ) %>%
unlist %>%
n_distinct %>%
mutate(dat, total = .)
# group id t1 p1 t2 p2 t3 p3 total
#1 1 1 a 0.98 b 0.02 <NA> NA 3
#2 1 2 a 1.00 <NA> NA <NA> NA 3
#3 1 3 b 0.50 a 0.25 c 0.25 3
#4 1 4 b 0.90 c 0.10 <NA> NA 3
#5 1 5 c 1.00 <NA> NA <NA> NA 3
Update
If we do a group_by
n_distinct
, the above can be changed by summarise
ing the n_distinct
of 'total' after grouping by 'group'
map2_df(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
group_by(group) %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
select(-.y) %>%
rename_at(.x, ~ 'total')) %>%
summarise(total = n_distinct(total)) %>%
left_join(dat)
1
the update solved my problem. Thank you!
– dminahan
Nov 14 '18 at 1:55
add a comment |
We can use map2
to loop through the each of the corresponding 't' and 'p' columns, get the subset of columns in the list
, filter
the rows based on the values of 'p' column, pull
the 't' column, get the number of distinct elements (n_distinct
) and create a new column with bind_cols
library(tidyverse)
map2_int(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
pull(.x) %>%
n_distinct) %>%
bind_cols(dat, total = .)
and if it is 'unique' across the columns, then instead of getting the n_distinct
per each column, unlist
the list
and then apply the n_distinct
, then create the new column in 'dat' with mutate
map2(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
pull(.x) ) %>%
unlist %>%
n_distinct %>%
mutate(dat, total = .)
# group id t1 p1 t2 p2 t3 p3 total
#1 1 1 a 0.98 b 0.02 <NA> NA 3
#2 1 2 a 1.00 <NA> NA <NA> NA 3
#3 1 3 b 0.50 a 0.25 c 0.25 3
#4 1 4 b 0.90 c 0.10 <NA> NA 3
#5 1 5 c 1.00 <NA> NA <NA> NA 3
Update
If we do a group_by
n_distinct
, the above can be changed by summarise
ing the n_distinct
of 'total' after grouping by 'group'
map2_df(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
group_by(group) %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
select(-.y) %>%
rename_at(.x, ~ 'total')) %>%
summarise(total = n_distinct(total)) %>%
left_join(dat)
We can use map2
to loop through the each of the corresponding 't' and 'p' columns, get the subset of columns in the list
, filter
the rows based on the values of 'p' column, pull
the 't' column, get the number of distinct elements (n_distinct
) and create a new column with bind_cols
library(tidyverse)
map2_int(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
pull(.x) %>%
n_distinct) %>%
bind_cols(dat, total = .)
and if it is 'unique' across the columns, then instead of getting the n_distinct
per each column, unlist
the list
and then apply the n_distinct
, then create the new column in 'dat' with mutate
map2(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
pull(.x) ) %>%
unlist %>%
n_distinct %>%
mutate(dat, total = .)
# group id t1 p1 t2 p2 t3 p3 total
#1 1 1 a 0.98 b 0.02 <NA> NA 3
#2 1 2 a 1.00 <NA> NA <NA> NA 3
#3 1 3 b 0.50 a 0.25 c 0.25 3
#4 1 4 b 0.90 c 0.10 <NA> NA 3
#5 1 5 c 1.00 <NA> NA <NA> NA 3
Update
If we do a group_by
n_distinct
, the above can be changed by summarise
ing the n_distinct
of 'total' after grouping by 'group'
map2_df(paste0('t', 1:3), paste0('p', 1:3), ~
dat %>%
group_by(group) %>%
select(.x, .y) %>%
filter_at(.y, all_vars(. >= 0.05)) %>%
select(-.y) %>%
rename_at(.x, ~ 'total')) %>%
summarise(total = n_distinct(total)) %>%
left_join(dat)
edited Nov 13 '18 at 18:34
answered Nov 13 '18 at 18:04
akrunakrun
400k13190265
400k13190265
1
the update solved my problem. Thank you!
– dminahan
Nov 14 '18 at 1:55
add a comment |
1
the update solved my problem. Thank you!
– dminahan
Nov 14 '18 at 1:55
1
1
the update solved my problem. Thank you!
– dminahan
Nov 14 '18 at 1:55
the update solved my problem. Thank you!
– dminahan
Nov 14 '18 at 1:55
add a comment |
First lapply
though the p
columns with the function >= 0.05
, and then select rows where the result is true for all p
columns with Reduce('&'
. Once you have the rows, you can select the data for which you want to count uniques with .SD[rows, paste0('t', 1:3)]
, and unlist
it (coerce it to a single vector), which allows you to use UniqueN
directly to count the uniques.
library(data.table)
setDT(dat)
dat[, total := {rows <- Reduce('&', lapply(.(p1, p2, p3), '>=', 0.05))
uniqueN(unlist(.SD[rows, paste0('t', 1:3)]))}
, by = group]
# group id t1 p1 t2 p2 t3 p3 total
# 1: 1 1 a 0.98 b 0.02 <NA> NA 3
# 2: 1 2 a 1.00 <NA> NA <NA> NA 3
# 3: 1 3 b 0.50 a 0.25 c 0.25 3
# 4: 1 4 b 0.90 c 0.10 <NA> NA 3
# 5: 1 5 c 1.00 <NA> NA <NA> NA 3
add a comment |
First lapply
though the p
columns with the function >= 0.05
, and then select rows where the result is true for all p
columns with Reduce('&'
. Once you have the rows, you can select the data for which you want to count uniques with .SD[rows, paste0('t', 1:3)]
, and unlist
it (coerce it to a single vector), which allows you to use UniqueN
directly to count the uniques.
library(data.table)
setDT(dat)
dat[, total := {rows <- Reduce('&', lapply(.(p1, p2, p3), '>=', 0.05))
uniqueN(unlist(.SD[rows, paste0('t', 1:3)]))}
, by = group]
# group id t1 p1 t2 p2 t3 p3 total
# 1: 1 1 a 0.98 b 0.02 <NA> NA 3
# 2: 1 2 a 1.00 <NA> NA <NA> NA 3
# 3: 1 3 b 0.50 a 0.25 c 0.25 3
# 4: 1 4 b 0.90 c 0.10 <NA> NA 3
# 5: 1 5 c 1.00 <NA> NA <NA> NA 3
add a comment |
First lapply
though the p
columns with the function >= 0.05
, and then select rows where the result is true for all p
columns with Reduce('&'
. Once you have the rows, you can select the data for which you want to count uniques with .SD[rows, paste0('t', 1:3)]
, and unlist
it (coerce it to a single vector), which allows you to use UniqueN
directly to count the uniques.
library(data.table)
setDT(dat)
dat[, total := {rows <- Reduce('&', lapply(.(p1, p2, p3), '>=', 0.05))
uniqueN(unlist(.SD[rows, paste0('t', 1:3)]))}
, by = group]
# group id t1 p1 t2 p2 t3 p3 total
# 1: 1 1 a 0.98 b 0.02 <NA> NA 3
# 2: 1 2 a 1.00 <NA> NA <NA> NA 3
# 3: 1 3 b 0.50 a 0.25 c 0.25 3
# 4: 1 4 b 0.90 c 0.10 <NA> NA 3
# 5: 1 5 c 1.00 <NA> NA <NA> NA 3
First lapply
though the p
columns with the function >= 0.05
, and then select rows where the result is true for all p
columns with Reduce('&'
. Once you have the rows, you can select the data for which you want to count uniques with .SD[rows, paste0('t', 1:3)]
, and unlist
it (coerce it to a single vector), which allows you to use UniqueN
directly to count the uniques.
library(data.table)
setDT(dat)
dat[, total := {rows <- Reduce('&', lapply(.(p1, p2, p3), '>=', 0.05))
uniqueN(unlist(.SD[rows, paste0('t', 1:3)]))}
, by = group]
# group id t1 p1 t2 p2 t3 p3 total
# 1: 1 1 a 0.98 b 0.02 <NA> NA 3
# 2: 1 2 a 1.00 <NA> NA <NA> NA 3
# 3: 1 3 b 0.50 a 0.25 c 0.25 3
# 4: 1 4 b 0.90 c 0.10 <NA> NA 3
# 5: 1 5 c 1.00 <NA> NA <NA> NA 3
edited Nov 13 '18 at 18:22
answered Nov 13 '18 at 18:07
IceCreamToucanIceCreamToucan
9,0621716
9,0621716
add a comment |
add a comment |
This should work for any number of groups and any number of column pairs.
dat %>%
gather(key, value, -group, -id) %>%
separate(key, c('key1', 'key2'), 1) %>%
group_by(group, id, key2) %>%
spread(key1, value) %>%
filter(p >= 0.05) %>%
pull(t) %>%
n_distinct() %>%
mutate(dat, total = .)
add a comment |
This should work for any number of groups and any number of column pairs.
dat %>%
gather(key, value, -group, -id) %>%
separate(key, c('key1', 'key2'), 1) %>%
group_by(group, id, key2) %>%
spread(key1, value) %>%
filter(p >= 0.05) %>%
pull(t) %>%
n_distinct() %>%
mutate(dat, total = .)
add a comment |
This should work for any number of groups and any number of column pairs.
dat %>%
gather(key, value, -group, -id) %>%
separate(key, c('key1', 'key2'), 1) %>%
group_by(group, id, key2) %>%
spread(key1, value) %>%
filter(p >= 0.05) %>%
pull(t) %>%
n_distinct() %>%
mutate(dat, total = .)
This should work for any number of groups and any number of column pairs.
dat %>%
gather(key, value, -group, -id) %>%
separate(key, c('key1', 'key2'), 1) %>%
group_by(group, id, key2) %>%
spread(key1, value) %>%
filter(p >= 0.05) %>%
pull(t) %>%
n_distinct() %>%
mutate(dat, total = .)
answered Nov 13 '18 at 19:02
dmcadmca
432414
432414
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53286973%2fcount-unique-characters-present-in-a-group-of-columns%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown