(R) Parse character vector and split into two separate columns













I have a data frame whose character columns hold values formatted as mean (sd), like so:



library(tibble)  # tribble() comes from the tibble package

table <- tribble(
~var1, ~var2,
#------------
"27.0 (3.1)", "171.4 (9.0)",
"27.0 (3.2)", "176.8 (7.2)",
"27.1 (3.0)", "165.0 (6.2)"
)


I would like to split each column into two columns, one for the mean and one for the sd. Something like:



table_split <- tribble(
~var1_mean, ~var1_sd, ~var2_mean, ~var2_sd,
#---------------------
27.0, 3.1, 171.4, 9.0,
27.0, 3.2, 176.8, 7.2,
27.1, 3.0, 165.0, 6.2

)


So far, I have tried tidyr::separate(table, var1, c("var1_mean", "var1_sd"), sep = " \\("), which only partially works because it does not remove the closing parenthesis.
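A base R sketch of why the attempt above leaves a stray ")" and of one way around it. It assumes every cell has the exact form "number (number)"; the sample data is built with data.frame() instead of tribble() only so the snippet runs without extra packages, and the regex pattern is illustrative rather than taken from any answer.

```r
# Sample data mirroring the question (data.frame() instead of tribble())
table <- data.frame(
  var1 = c("27.0 (3.1)", "27.0 (3.2)", "27.1 (3.0)"),
  var2 = c("171.4 (9.0)", "176.8 (7.2)", "165.0 (6.2)"),
  stringsAsFactors = FALSE
)

# Splitting on " (" alone leaves the closing parenthesis on the second piece
pieces <- strsplit(table$var1, " \\(")
sd_raw <- vapply(pieces, `[`, character(1), 2)
print(sd_raw)  # "3.1)" "3.2)" "3.0)"

# Illustrative pattern with two capture groups: the number before the
# parenthesis (\\1) and the number inside it (\\2); assumes "number (number)"
pattern <- "^\\s*([0-9.]+)\\s*\\(([0-9.]+)\\)\\s*$"
var1_mean <- as.numeric(sub(pattern, "\\1", table$var1))
var1_sd <- as.numeric(sub(pattern, "\\2", table$var1))
```

Capturing both numbers in one pattern avoids the second cleanup pass, at the cost of failing (returning the input unchanged, hence NA after as.numeric()) on cells that do not match the assumed format.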







r regex parsing tidyr
















asked Nov 19 '18 at 17:05









hlinee














  • table %>% separate(var1, c("var1_mean", "var1_sd"), sep = " \\(") %>% mutate(var1_sd = gsub(")", "", var1_sd, fixed = TRUE))? That is, just add a mutate() call that uses gsub() to remove the final ")".

    – Lyngbakr
    Nov 19 '18 at 17:08
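A minimal base R sketch of the same two-step idea (split on " (", then strip the leftover ")"), applied to every column so the names follow the var1_mean / var1_sd layout asked for. The helper split_mean_sd() is hypothetical, written only for this illustration, and it assumes each value contains exactly one " (".

```r
# Sample data mirroring the question
table <- data.frame(
  var1 = c("27.0 (3.1)", "27.0 (3.2)", "27.1 (3.0)"),
  var2 = c("171.4 (9.0)", "176.8 (7.2)", "165.0 (6.2)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper. Step 1: split on the literal " (".
# Step 2: remove the leftover ")" (fixed = TRUE matches it literally).
split_mean_sd <- function(x) {
  parts <- strsplit(x, " (", fixed = TRUE)
  mean_part <- vapply(parts, `[`, character(1), 1)
  sd_part <- gsub(")", "", vapply(parts, `[`, character(1), 2), fixed = TRUE)
  data.frame(mean = as.numeric(mean_part), sd = as.numeric(sd_part))
}

# Apply to every column and name the results <col>_mean / <col>_sd
out <- do.call(cbind, lapply(names(table), function(col) {
  res <- split_mean_sd(table[[col]])
  names(res) <- paste0(col, c("_mean", "_sd"))
  res
}))
print(out)
```

Looping over names(table) means the same code handles any number of "mean (sd)" columns without repeating the split for each one.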



































2 Answers















Use separate as shown below. Note that this requires tidyr 0.8.2 or later. Earlier versions did not support NA in the into argument.

library(dplyr)
library(tidyr)

table %>%
  separate(var1, into = c("mean1", "sd1", NA), sep = "[ ()]+") %>%
  separate(var2, into = c("mean2", "sd2", NA), sep = "[ ()]+")

giving:

# A tibble: 3 x 4
  mean1 sd1   mean2 sd2
  <chr> <chr> <chr> <chr>
1 27.0  3.1   171.4 9.0
2 27.0  3.2   176.8 7.2
3 27.1  3.0   165.0 6.2
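The columns produced above are all <chr>. tidyr's separate() has a convert argument; with convert = TRUE it runs type.convert() (with as.is = TRUE) on the new columns. The base R sketch below shows that conversion step on character columns shaped like the result above; it assumes a reasonably recent R, where type.convert() also accepts a whole data frame.

```r
# Character columns shaped like the separate() result above
chr_result <- data.frame(
  mean1 = c("27.0", "27.0", "27.1"),
  sd1   = c("3.1", "3.2", "3.0"),
  mean2 = c("171.4", "176.8", "165.0"),
  sd2   = c("9.0", "7.2", "6.2"),
  stringsAsFactors = FALSE
)

# type.convert() picks a type per column; as.is = TRUE keeps any
# non-numeric text as character instead of turning it into a factor
num_result <- type.convert(chr_result, as.is = TRUE)
str(num_result)
```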














edited Nov 21 '18 at 1:42

answered Nov 19 '18 at 17:12

G. Grothendieck













  • Thank you, this was exactly what I was looking for. If you don't mind, could you explain why sep = "[ ()]+"? I tried sep = "[ ()]" and it did not work. From my understanding, the regex means: match any one of space, open parenthesis, or close parenthesis, one or more times.

    – hlinee
    Nov 22 '18 at 14:51

  • There are two characters between the two numbers (a space and a left parenthesis). Either add the plus, as we did, so that space-left-paren is regarded as a single separator, or else they will be regarded as two separators, in which case we would have to use into = c("mean1", NA, "sd1", NA).

    – G. Grothendieck
    Nov 22 '18 at 15:49
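A small base R check of the comment above: gregexpr() lists the separator matches in one sample value with and without the "+". n separators split a string into n + 1 pieces, which is why into needs three names with "[ ()]+" and four with "[ ()]" (assuming, as the NA entries in into imply, that separate() keeps the empty piece after the trailing ")").

```r
x <- "27.0 (3.1)"

# With "+", the space and "(" are matched together as one separator
seps_plus <- regmatches(x, gregexpr("[ ()]+", x))[[1]]
# Without "+", each of the three characters is a separator on its own
seps_single <- regmatches(x, gregexpr("[ ()]", x))[[1]]

print(seps_plus)    # " (" ")"
print(seps_single)  # " " "(" ")"

# n separators -> n + 1 pieces (the last piece is empty after ")")
length(seps_plus) + 1    # 3 pieces: mean1, sd1, NA
length(seps_single) + 1  # 4 pieces: mean1, NA, sd1, NA
```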


































In base R you would do:

nms = paste0(c('mean','sd'), rep(seq_len(ncol(table)), each = 2)) # Create the new names: mean1, sd1, mean2, sd2, ...

read.table(text = gsub('[()]', '', do.call(paste, table)), col.names = nms)

  mean1 sd1 mean2 sd2
1  27.0 3.1 171.4 9.0
2  27.0 3.2 176.8 7.2
3  27.1 3.0 165.0 6.2
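To see what each step of the base R approach does, here is a sketch that keeps the intermediate values. It uses the question's data built with data.frame() so no packages are needed, and builds the names with seq_len(ncol(table)) so the same code also works for more than two columns.

```r
table <- data.frame(
  var1 = c("27.0 (3.1)", "27.0 (3.2)", "27.1 (3.0)"),
  var2 = c("171.4 (9.0)", "176.8 (7.2)", "165.0 (6.2)"),
  stringsAsFactors = FALSE
)

# 1. do.call(paste, table) pastes the columns of each row, separated by a space
rows <- do.call(paste, table)
# 2. Dropping "(" and ")" leaves whitespace-separated numbers
clean <- gsub("[()]", "", rows)
print(clean)  # "27.0 3.1 171.4 9.0" "27.0 3.2 176.8 7.2" "27.1 3.0 165.0 6.2"
# 3. read.table() parses those lines as numeric columns
nms <- paste0(c("mean", "sd"), rep(seq_len(ncol(table)), each = 2))
res <- read.table(text = clean, col.names = nms)
print(res)
```

Because read.table() does the parsing, the result columns are already numeric, unlike the separate() output.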
















answered Nov 19 '18 at 18:13

Onyambu





























