Count unique characters present in a group of columns

I have the following dataset:

    dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
                      t1 = c('a','a','b','b','c'), p1 = c(0.98,1,0.5,0.9,1),
                      t2 = c('b',NA,'a','c',NA),  p2 = c(0.02,NA,0.25,0.10,NA),
                      t3 = c(NA,NA,'c',NA,NA),    p3 = c(NA,NA,0.25,NA,NA))


I am trying to count the number of unique characters present across multiple columns (t1, t2, t3) within a group that spans multiple rows (variable 'group'). A character should only be counted if its associated p1/p2/p3 value is >= 0.05.



I tried the following code to count unique characters from all columns:

    library(dplyr)
    b <- dat %>%
      group_by(group) %>%
      mutate(total = sum(n_distinct(t1[p1 >= 0.05], na.rm = TRUE),
                         n_distinct(t2[p2 >= 0.05], na.rm = TRUE),
                         n_distinct(t3[p3 >= 0.05], na.rm = TRUE)))


The result is that it counts the unique characters of t1, t2, and t3 separately and then sums those counts, producing the following dataset where 'total' is 6:

    dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
                      t1 = c('a','a','b','b','c'), p1 = c(0.98,1,0.5,0.9,1),
                      t2 = c('b',NA,'a','c',NA),  p2 = c(0.02,NA,0.25,0.10,NA),
                      t3 = c(NA,NA,'c',NA,NA),    p3 = c(NA,NA,0.25,NA,NA),
                      total = c(6,6,6,6,6))


However, what I want is the total number of unique characters across all of the columns combined. In other words, if 'a' was already counted in column t1, it should not be counted again in t2 or t3. The final product would look like the following dataset, where 'total' is 3 (a, b, and c being the unique characters present):

    dat <- data.frame(group = c(1,1,1,1,1), id = c(1,2,3,4,5),
                      t1 = c('a','a','b','b','c'), p1 = c(0.98,1,0.5,0.9,1),
                      t2 = c('b',NA,'a','c',NA),  p2 = c(0.02,NA,0.25,0.10,NA),
                      t3 = c(NA,NA,'c',NA,NA),    p3 = c(NA,NA,0.25,NA,NA),
                      total = c(3,3,3,3,3))

Thank you for your help.
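To illustrate the difference between the two totals, here is a minimal plain-Python sketch of the same logic (hypothetical illustration only; my actual data is in R):

```python
# Contrast the two counts on the example data: summing per-column distinct
# values gives 6, pooling values across columns before counting gives 3.
rows = [
    # (t1, p1, t2, p2, t3, p3); None stands in for R's NA
    ('a', 0.98, 'b', 0.02, None, None),
    ('a', 1.00, None, None, None, None),
    ('b', 0.50, 'a', 0.25, 'c', 0.25),
    ('b', 0.90, 'c', 0.10, None, None),
    ('c', 1.00, None, None, None, None),
]

def kept(col_t, col_p):
    """Values of a t column whose paired p value is present and >= 0.05."""
    return [t for t, p in zip(col_t, col_p)
            if t is not None and p is not None and p >= 0.05]

t_cols = [[r[i] for r in rows] for i in (0, 2, 4)]
p_cols = [[r[i] for r in rows] for i in (1, 3, 5)]

# Per-column distinct counts, summed (counts 'a', 'b', 'c' more than once).
per_column = sum(len(set(kept(t, p))) for t, p in zip(t_cols, p_cols))
# Pooled distinct count (each character counted once across all columns).
pooled = len(set(v for t, p in zip(t_cols, p_cols) for v in kept(t, p)))

print(per_column, pooled)  # → 6 3
```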










      r dplyr






      asked Nov 13 '18 at 18:00









dminahan

3 Answers

We can use map2 to loop through each pair of corresponding 't' and 'p' columns, select that pair, filter the rows on the 'p' column, pull the 't' column, get the number of distinct elements (n_distinct), and create a new column with bind_cols:

    library(tidyverse)
    map2_int(paste0('t', 1:3), paste0('p', 1:3), ~
         dat %>%
           select(.x, .y) %>%
           filter_at(.y, all_vars(. >= 0.05)) %>%
           pull(.x) %>%
           n_distinct) %>%
      bind_cols(dat, total = .)


If the count should be unique across the columns, then instead of taking n_distinct per column, unlist the list and apply n_distinct once, then create the new column in 'dat' with mutate:

    map2(paste0('t', 1:3), paste0('p', 1:3), ~
         dat %>%
           select(.x, .y) %>%
           filter_at(.y, all_vars(. >= 0.05)) %>%
           pull(.x)) %>%
      unlist %>%
      n_distinct %>%
      mutate(dat, total = .)
    #   group id t1   p1   t2   p2   t3   p3 total
    # 1     1  1  a 0.98    b 0.02 <NA>   NA     3
    # 2     1  2  a 1.00 <NA>   NA <NA>   NA     3
    # 3     1  3  b 0.50    a 0.25    c 0.25     3
    # 4     1  4  b 0.90    c 0.10 <NA>   NA     3
    # 5     1  5  c 1.00 <NA>   NA <NA>   NA     3


          Update



If the distinct count is needed within each group, the above can be changed by summarising the n_distinct of 'total' after grouping by 'group':

    map2_df(paste0('t', 1:3), paste0('p', 1:3), ~
         dat %>%
           group_by(group) %>%
           select(.x, .y) %>%
           filter_at(.y, all_vars(. >= 0.05)) %>%
           select(-.y) %>%
           rename_at(.x, ~ 'total')) %>%
      summarise(total = n_distinct(total)) %>%
      left_join(dat)





answered Nov 13 '18 at 18:04 (edited Nov 13 '18 at 18:34) – akrun

Comment: "the update solved my problem. Thank you!" – dminahan, Nov 14 '18 at 1:55

First lapply through the 'p' columns with the function >= 0.05, and then select the rows where the result is TRUE for all 'p' columns with Reduce('&', ...). Once you have the rows, you can select the data for which you want to count uniques with .SD[rows, paste0('t', 1:3)], and unlist it (coerce it to a single vector), which allows you to use uniqueN directly to count the uniques.

    library(data.table)
    setDT(dat)

    dat[, total := {rows <- Reduce('&', lapply(.(p1, p2, p3), '>=', 0.05))
                    uniqueN(unlist(.SD[rows, paste0('t', 1:3)]))}
        , by = group]

    #    group id t1   p1   t2   p2   t3   p3 total
    # 1:     1  1  a 0.98    b 0.02 <NA>   NA     3
    # 2:     1  2  a 1.00 <NA>   NA <NA>   NA     3
    # 3:     1  3  b 0.50    a 0.25    c 0.25     3
    # 4:     1  4  b 0.90    c 0.10 <NA>   NA     3
    # 5:     1  5  c 1.00 <NA>   NA <NA>   NA     3
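As a cross-check, the row-filtering step above can be sketched in plain Python (hypothetical illustration; like the R expression, its net effect is that any row with a missing or sub-threshold p value is dropped before pooling the t columns):

```python
# Keep only rows where every p value is present and >= 0.05, then pool the
# t columns of the surviving rows and count distinct values.
rows = [
    {'t': ('a', 'b', None),  'p': (0.98, 0.02, None)},
    {'t': ('a', None, None), 'p': (1.00, None, None)},
    {'t': ('b', 'a', 'c'),   'p': (0.50, 0.25, 0.25)},
    {'t': ('b', 'c', None),  'p': (0.90, 0.10, None)},
    {'t': ('c', None, None), 'p': (1.00, None, None)},
]

kept_rows = [r for r in rows
             if all(p is not None and p >= 0.05 for p in r['p'])]
distinct = {t for r in kept_rows for t in r['t'] if t is not None}

print(len(distinct))  # → 3 (only the third row survives, and it holds a, b, c)
```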





answered Nov 13 '18 at 18:07 (edited Nov 13 '18 at 18:22) – IceCreamToucan

This should work for any number of groups and any number of column pairs:

    library(tidyverse)
    dat %>%
      gather(key, value, -group, -id) %>%
      separate(key, c('key1', 'key2'), 1) %>%
      group_by(group, id, key2) %>%
      spread(key1, value) %>%
      filter(p >= 0.05) %>%
      pull(t) %>%
      n_distinct() %>%
      mutate(dat, total = .)
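The reshape-then-filter idea can be sketched in plain Python as well (hypothetical illustration; dict records stand in for the data frame):

```python
# Melt each (t_i, p_i) pair into one long record, filter on p,
# and count distinct t values.
wide = [
    {'group': 1, 'id': 1, 't1': 'a', 'p1': 0.98, 't2': 'b',  'p2': 0.02, 't3': None, 'p3': None},
    {'group': 1, 'id': 2, 't1': 'a', 'p1': 1.00, 't2': None, 'p2': None, 't3': None, 'p3': None},
    {'group': 1, 'id': 3, 't1': 'b', 'p1': 0.50, 't2': 'a',  'p2': 0.25, 't3': 'c',  'p3': 0.25},
    {'group': 1, 'id': 4, 't1': 'b', 'p1': 0.90, 't2': 'c',  'p2': 0.10, 't3': None, 'p3': None},
    {'group': 1, 'id': 5, 't1': 'c', 'p1': 1.00, 't2': None, 'p2': None, 't3': None, 'p3': None},
]

# "gather" + "separate": one long record per (t_k, p_k) pair.
long = [
    {'group': r['group'], 'id': r['id'], 't': r[f't{k}'], 'p': r[f'p{k}']}
    for r in wide for k in (1, 2, 3)
]
# "filter" + "n_distinct": keep pairs with p >= 0.05, count distinct t.
total = len({rec['t'] for rec in long if rec['p'] is not None and rec['p'] >= 0.05})

print(total)  # → 3
```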





answered Nov 13 '18 at 19:02 – dmca