(R) Parse character vector and split into two separate columns
I have a dataframe with character columns of mean (sd) like so:
library(tibble)
table <- tribble(
~var1, ~var2,
#------------
"27.0 (3.1)", "171.4 (9.0)",
"27.0 (3.2)", "176.8 (7.2)",
"27.1 (3.0)", "165.0 (6.2)"
)
I would like to split each column into two columns, one for the mean and one for the sd. Something like:
table_split <- tribble(
~var1_mean, ~var1_sd, ~var2_mean, ~var2_sd,
#---------------------
27.0, 3.1, 171.4, 9.0,
27.0, 3.2, 176.8, 7.2,
27.1, 3.0, 165.0, 6.2
)
So far, I have tried tidyr::separate(table, var1, c("var1_mean", "var1_sd"), sep = " \\(")
which only partially works, as it does not remove the closing parenthesis.
r regex parsing tidyr
asked Nov 19 '18 at 17:05 by hlinee
table %>% separate(var1, c("var1_mean", "var1_sd"), sep = " \\(") %>% mutate(var1_sd = gsub(")", "", var1_sd))
? That is, just add a mutate call using gsub to remove the final ).
– Lyngbakr Nov 19 '18 at 17:08
2 Answers
Use separate as shown below. Note that this requires tidyr 0.8.2 or later. Earlier versions did not support NA in the into argument.
library(dplyr)
library(tidyr)
table %>%
separate(var1, into = c("mean1", "sd1", NA), sep = "[ ()]+") %>%
separate(var2, into = c("mean2", "sd2", NA), sep = "[ ()]+")
giving:
# A tibble: 3 x 4
  mean1 sd1   mean2 sd2
  <chr> <chr> <chr> <chr>
1 27.0  3.1   171.4 9.0
2 27.0  3.2   176.8 7.2
3 27.1  3.0   165.0 6.2
answered Nov 19 '18 at 17:12 by G. Grothendieck (edited Nov 21 '18 at 1:42)
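The separate calls in this answer leave every new column as character, whereas the table_split requested in the question holds numbers. One possible way to close that gap, shown here only as a sketch and not as part of the original answer, is the convert argument of separate, which asks it to run type conversion on the new columns. The var1_mean-style names are taken from the question.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Example data from the question
table <- tribble(
  ~var1,        ~var2,
  "27.0 (3.1)", "171.4 (9.0)",
  "27.0 (3.2)", "176.8 (7.2)",
  "27.1 (3.0)", "165.0 (6.2)"
)

# convert = TRUE asks separate() to convert the new columns to a suitable type;
# the trailing NA in `into` drops the empty piece left after the closing ")"
table_split <- table %>%
  separate(var1, into = c("var1_mean", "var1_sd", NA), sep = "[ ()]+", convert = TRUE) %>%
  separate(var2, into = c("var2_mean", "var2_sd", NA), sep = "[ ()]+", convert = TRUE)

str(table_split)
```

If automatic conversion is not wanted, the columns could instead be converted afterwards with as.numeric.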
Thank you, this was exactly what I was looking for. If you don't mind, could you explain why sep = "[ ()]+" is needed? I tried sep = "[ ()]" and it did not work. From my understanding, the regex means: match a space, an opening parenthesis, or a closing parenthesis, one or more times.
– hlinee Nov 22 '18 at 14:51
There are two characters between the two numbers (a space and a left parenthesis). Either add the plus, as we did, so that space-left-paren is regarded as a single separator, or else they will be regarded as two separators, in which case we would have to use into = c("mean1", NA, "sd1", NA).
– G. Grothendieck Nov 22 '18 at 15:49
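The effect of the plus can be seen on a single value with base R's strsplit. This is only an illustration of the regex behaviour, not what separate does internally. One difference to keep in mind: strsplit drops an empty piece at the end of the string, while the into vectors in the answer and in the comment above both need a trailing NA for the piece after the closing parenthesis.

```r
x <- "27.0 (3.1)"

# Without "+": the space and "(" are two separate separators,
# so an empty string appears between them.
pieces_single <- strsplit(x, "[ ()]")[[1]]
print(pieces_single)  # "27.0" ""     "3.1"

# With "+": the run " (" counts as one separator.
pieces_plus <- strsplit(x, "[ ()]+")[[1]]
print(pieces_plus)    # "27.0" "3.1"
```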
In base R you would do:
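# Create the new names: one mean/sd pair per original column
nms <- paste0(c("mean", "sd"), rep(seq_len(ncol(table)), each = 2))
# Paste each row into one string, drop the parentheses, and re-read it as a table
read.table(text = gsub("[()]", "", do.call(paste, table)), col.names = nms)
  mean1 sd1 mean2 sd2
1  27.0 3.1 171.4 9.0
2  27.0 3.2 176.8 7.2
3  27.1 3.0 165.0 6.2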
answered Nov 19 '18 at 18:13 by Onyambu
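A more explicit base R variant, sketched here as an alternative and not taken from either answer, pulls the two numbers out of each column with capture groups in sub. The helper name split_mean_sd is hypothetical, and the pattern assumes non-negative decimal values written exactly as "mean (sd)".

```r
# Example data from the question, as a plain data frame
table <- data.frame(
  var1 = c("27.0 (3.1)", "27.0 (3.2)", "27.1 (3.0)"),
  var2 = c("171.4 (9.0)", "176.8 (7.2)", "165.0 (6.2)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split every "mean (sd)" column into two numeric columns
split_mean_sd <- function(df) {
  # Group 1 captures the mean, group 2 the sd inside the parentheses
  pattern <- "^[[:space:]]*([0-9.]+)[[:space:]]*\\(([0-9.]+)\\)[[:space:]]*$"
  out <- list()
  for (nm in names(df)) {
    out[[paste0(nm, "_mean")]] <- as.numeric(sub(pattern, "\\1", df[[nm]]))
    out[[paste0(nm, "_sd")]]   <- as.numeric(sub(pattern, "\\2", df[[nm]]))
  }
  as.data.frame(out)
}

table_split <- split_mean_sd(table)
print(table_split)
```

Values that do not match the pattern are left unchanged by sub and therefore become NA (with a warning) in as.numeric, which makes malformed cells easy to spot.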