(R) Parse character vector and split into two separate columns

I have a dataframe with character columns of mean (sd) like so:

library(tibble)  # provides tribble()

table <- tribble(
  ~var1, ~var2,
  #------------
  "27.0 (3.1)", "171.4 (9.0)",
  "27.0 (3.2)", "176.8 (7.2)",
  "27.1 (3.0)", "165.0 (6.2)"
)

I would like to split each column into two columns, one for the mean and one for the sd. Something like:

table_split <- tribble(
  ~var1_mean, ~var1_sd, ~var2_mean, ~var2_sd,
  #---------------------
  27.0, 3.1, 171.4, 9.0,
  27.0, 3.2, 176.8, 7.2,
  27.1, 3.0, 165.0, 6.2
)

So far, I have tried tidyr::separate(table, var1, c("var1_mean", "var1_sd"), sep = " \\("), which only partially works, as it does not remove the closing parenthesis.

asked Nov 19 '18 at 17:05 by hlinee

table %>% separate(var1, c("var1_mean", "var1_sd"), sep = " \\(") %>% mutate(var1_sd = gsub(")", "", var1_sd))? That is, just add a mutate call using gsub to remove the final ).
– Lyngbakr, Nov 19 '18 at 17:08
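
A minimal sketch of the approach suggested in the comment above, assuming the tibble, tidyr and dplyr packages are installed. The name result and the fixed = TRUE argument are my additions, not part of the comment; fixed = TRUE makes gsub treat ")" as a literal character rather than as a regular expression.

```r
library(dplyr)
library(tidyr)
library(tibble)

table <- tribble(
  ~var1, ~var2,
  "27.0 (3.1)", "171.4 (9.0)",
  "27.0 (3.2)", "176.8 (7.2)",
  "27.1 (3.0)", "165.0 (6.2)"
)

# Split var1 on " (" (the backslashes escape the parenthesis in the regex),
# then strip the closing parenthesis that separate() leaves behind.
result <- table %>%
  separate(var1, c("var1_mean", "var1_sd"), sep = " \\(") %>%
  mutate(var1_sd = gsub(")", "", var1_sd, fixed = TRUE))

result
```

This handles only var1; var2 would need the same two steps, and the new columns are still character.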

r regex parsing tidyr

2 Answers

Use separate as shown below. Note that this requires tidyr 0.8.2 or later. Earlier versions did not support NA in the into argument.

library(dplyr)
library(tidyr)

table %>%
  separate(var1, into = c("mean1", "sd1", NA), sep = "[ ()]+") %>%
  separate(var2, into = c("mean2", "sd2", NA), sep = "[ ()]+")

giving:

# A tibble: 3 x 4
  mean1 sd1   mean2 sd2
  <chr> <chr> <chr> <chr>
1 27.0  3.1   171.4 9.0
2 27.0  3.2   176.8 7.2
3 27.1  3.0   165.0 6.2

answered Nov 19 '18 at 17:12 by G. Grothendieck (edited Nov 21 '18 at 1:42)
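
The output above still holds character columns, whereas the question asks for numeric values under names such as var1_mean. A hedged variation on the same idea, assuming tidyr 0.8.2 or later: separate() has a convert argument that runs type.convert() on the new columns. The column names and the name table_split are taken from the question, not from this answer.

```r
library(dplyr)
library(tidyr)
library(tibble)

table <- tribble(
  ~var1, ~var2,
  "27.0 (3.1)", "171.4 (9.0)",
  "27.0 (3.2)", "176.8 (7.2)",
  "27.1 (3.0)", "165.0 (6.2)"
)

# Same separator as in the answer; the trailing NA drops the empty piece
# after ")". convert = TRUE turns the resulting strings into numbers.
table_split <- table %>%
  separate(var1, into = c("var1_mean", "var1_sd", NA),
           sep = "[ ()]+", convert = TRUE) %>%
  separate(var2, into = c("var2_mean", "var2_sd", NA),
           sep = "[ ()]+", convert = TRUE)

table_split
```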

Thank you, this was exactly what I was looking for. If you don't mind, could you explain why sep = "[ ()]+" is needed? I tried sep = "[ ()]" and it did not work. From my understanding, the regex means: match any one of space, open parenthesis, or close parenthesis, one or more times.
– hlinee, Nov 22 '18 at 14:51

There are two characters between the two numbers: a space and a left parenthesis. With the plus, the space-left-paren pair is regarded as a single separator. Without it, they are regarded as two separators with an empty string between them, in which case we would have to use into = c("mean1", NA, "sd1", NA).
– G. Grothendieck, Nov 22 '18 at 15:49
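
The difference between the two separators can be seen on a single string with base R's strsplit(). This is only an illustration of the regex behaviour, not of how separate() is implemented; the names x, without_plus and with_plus are mine.

```r
x <- "27.0 (3.1)"

# Without "+": the space and "(" are matched as two one-character
# separators, so an empty string appears between them.
without_plus <- strsplit(x, "[ ()]")[[1]]
without_plus
# "27.0" ""     "3.1"

# With "+": the run " (" is consumed as one separator.
with_plus <- strsplit(x, "[ ()]+")[[1]]
with_plus
# "27.0" "3.1"
```

Unlike strsplit(), separate() also keeps the empty piece after the closing parenthesis, which is why into ends with NA in the answer.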

In base R you would do:

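# Create the new names: one mean/sd pair per original column
nms <- paste0(c('mean', 'sd'), rep(seq_len(ncol(table)), each = 2))

# Paste each row into one string, strip the parentheses, and parse the result
read.table(text = gsub('[()]', '', do.call(paste, table)), col.names = nms)

  mean1 sd1 mean2 sd2
1  27.0 3.1 171.4 9.0
2  27.0 3.2 176.8 7.2
3  27.1 3.0 165.0 6.2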

answered Nov 19 '18 at 18:13 by Onyambu
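
The base R one-liner is easier to follow when unrolled into its intermediate steps. The sketch below uses a plain data.frame so that it runs without any packages; the variable names rows, clean and result are mine, and the names are built with each = 2 (one mean/sd pair per column) so that the same code would also work for more than two columns.

```r
table <- data.frame(
  var1 = c("27.0 (3.1)", "27.0 (3.2)", "27.1 (3.0)"),
  var2 = c("171.4 (9.0)", "176.8 (7.2)", "165.0 (6.2)"),
  stringsAsFactors = FALSE
)

# Step 1: paste the columns together row by row, separated by a space.
rows <- do.call(paste, table)
rows[1]
# "27.0 (3.1) 171.4 (9.0)"

# Step 2: delete every parenthesis, leaving whitespace-separated numbers.
clean <- gsub("[()]", "", rows)
clean[1]
# "27.0 3.1 171.4 9.0"

# Step 3: build the names mean1, sd1, mean2, sd2.
nms <- paste0(c("mean", "sd"), rep(seq_len(ncol(table)), each = 2))

# Step 4: let read.table() parse the cleaned lines into numeric columns.
result <- read.table(text = clean, col.names = nms)
str(result)
```

Because read.table() does the parsing, the columns come back as numeric rather than character.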
