Splitting columns ending with “Process” into many columns with individual values in R

I have a dataframe like this

ID <- c("ID21","ID22","ID23","ID24")

STR_PL_CAN_EVOLVE_PROCESS <- c("CCP_A,CCP_B","CCQ_A,CCQ_B,CCQ_C","IOT_A,CCP_B","CCQ_B,IOT_B")

Average <- c(7.5,6.5,7.1,6.6)

STR_VD_CAN_MEASURE_PROCESS <- c("Length,Breadth","Breadth,Width","Height,Length,Width","Width,Length")

Passfail <- c("Pass","Pass","Fail","Fail")



df <- data.frame(ID,STR_PL_CAN_EVOLVE_PROCESS,Average,STR_VD_CAN_MEASURE_PROCESS,Passfail,stringsAsFactors=FALSE)

I am trying to separate the values in the columns ending with "process" into several columns using tidyverse and doing it this way.

library(tidyverse)

df1 <- df %>%

  separate(STR_PL_CAN_EVOLVE_PROCESS, 

           paste0("ST_PL_CA_EV_PR","_Path",

                  seq(1:10)),

           sep = ",") %>%

  separate(STR_VD_CAN_MEASURE_PROCESS, 

           paste0("ST_VD_CA_ME_PR","_Path",

                  seq(1:10)),

           sep = ",")

This works but I manually do a lot of things here (input the column names, new column names). Here are some things that I am trying to achieve

Automatically supply the names of the columns ending with "PROCESS" and separate those columns.

Extract the first 2 characters in the columns names (separated by underscores) as new column names. For example:
STR_PL_CAN_EVOLVE_PROCESS becomes ST_PL_CA_EV_PR

Remove columns that are only NA's

My desired output is

    ID ST_PL_CA_EV_PR_Path1 ST_PL_CA_EV_PR_Path2 ST_PL_CA_EV_PR_Path3 Average ST_VD_CA_ME_PR_Path1 ST_VD_CA_ME_PR_Path2 ST_VD_CA_ME_PR_Path3 Passfail

  ID21                CCP_A                CCP_B                 <NA>     7.5               Length              Breadth                 <NA>     Pass

  ID22                CCQ_A                CCQ_B                CCQ_C     6.5              Breadth                Width                 <NA>     Pass

  ID23                IOT_A                CCP_B                 <NA>     7.1               Height               Length                Width     Fail

  ID24                CCQ_B                IOT_B                 <NA>     6.6                Width               Length                 <NA>     Fail

My actual dataset has around 35 columns ending with "PROCESS". Can someone point me in the right direction?

asked Nov 13 '18 at 17:30

Sharath

92011021

Possible duplicate of Split data frame string column into multiple columns

– IceCreamToucan
Nov 13 '18 at 17:53

add a comment |

I have a dataframe like this

ID <- c("ID21","ID22","ID23","ID24")

STR_PL_CAN_EVOLVE_PROCESS <- c("CCP_A,CCP_B","CCQ_A,CCQ_B,CCQ_C","IOT_A,CCP_B","CCQ_B,IOT_B")

Average <- c(7.5,6.5,7.1,6.6)

STR_VD_CAN_MEASURE_PROCESS <- c("Length,Breadth","Breadth,Width","Height,Length,Width","Width,Length")

Passfail <- c("Pass","Pass","Fail","Fail")



df <- data.frame(ID,STR_PL_CAN_EVOLVE_PROCESS,Average,STR_VD_CAN_MEASURE_PROCESS,Passfail,stringsAsFactors=FALSE)

I am trying to separate the values in the columns ending with "process" into several columns using tidyverse and doing it this way.

library(tidyverse)

df1 <- df %>%

  separate(STR_PL_CAN_EVOLVE_PROCESS, 

           paste0("ST_PL_CA_EV_PR","_Path",

                  seq(1:10)),

           sep = ",") %>%

  separate(STR_VD_CAN_MEASURE_PROCESS, 

           paste0("ST_VD_CA_ME_PR","_Path",

                  seq(1:10)),

           sep = ",")

This works but I manually do a lot of things here (input the column names, new column names). Here are some things that I am trying to achieve

Automatically supply the names of the columns ending with "PROCESS" and separate those columns.

Extract the first 2 characters in the columns names (separated by underscores) as new column names. For example:
STR_PL_CAN_EVOLVE_PROCESS becomes ST_PL_CA_EV_PR

Remove columns that are only NA's

My desired output is

    ID ST_PL_CA_EV_PR_Path1 ST_PL_CA_EV_PR_Path2 ST_PL_CA_EV_PR_Path3 Average ST_VD_CA_ME_PR_Path1 ST_VD_CA_ME_PR_Path2 ST_VD_CA_ME_PR_Path3 Passfail

  ID21                CCP_A                CCP_B                 <NA>     7.5               Length              Breadth                 <NA>     Pass

  ID22                CCQ_A                CCQ_B                CCQ_C     6.5              Breadth                Width                 <NA>     Pass

  ID23                IOT_A                CCP_B                 <NA>     7.1               Height               Length                Width     Fail

  ID24                CCQ_B                IOT_B                 <NA>     6.6                Width               Length                 <NA>     Fail

My actual dataset has around 35 columns ending with "PROCESS". Can someone point me in the right direction?

asked Nov 13 '18 at 17:30

Sharath

92011021

Possible duplicate of Split data frame string column into multiple columns

– IceCreamToucan
Nov 13 '18 at 17:53

add a comment |

I have a dataframe like this

ID <- c("ID21","ID22","ID23","ID24")

STR_PL_CAN_EVOLVE_PROCESS <- c("CCP_A,CCP_B","CCQ_A,CCQ_B,CCQ_C","IOT_A,CCP_B","CCQ_B,IOT_B")

Average <- c(7.5,6.5,7.1,6.6)

STR_VD_CAN_MEASURE_PROCESS <- c("Length,Breadth","Breadth,Width","Height,Length,Width","Width,Length")

Passfail <- c("Pass","Pass","Fail","Fail")



df <- data.frame(ID,STR_PL_CAN_EVOLVE_PROCESS,Average,STR_VD_CAN_MEASURE_PROCESS,Passfail,stringsAsFactors=FALSE)

I am trying to separate the values in the columns ending with "process" into several columns using tidyverse and doing it this way.

library(tidyverse)

df1 <- df %>%

  separate(STR_PL_CAN_EVOLVE_PROCESS, 

           paste0("ST_PL_CA_EV_PR","_Path",

                  seq(1:10)),

           sep = ",") %>%

  separate(STR_VD_CAN_MEASURE_PROCESS, 

           paste0("ST_VD_CA_ME_PR","_Path",

                  seq(1:10)),

           sep = ",")

This works but I manually do a lot of things here (input the column names, new column names). Here are some things that I am trying to achieve

Automatically supply the names of the columns ending with "PROCESS" and separate those columns.

Extract the first 2 characters in the columns names (separated by underscores) as new column names. For example:
STR_PL_CAN_EVOLVE_PROCESS becomes ST_PL_CA_EV_PR

Remove columns that are only NA's

My desired output is

    ID ST_PL_CA_EV_PR_Path1 ST_PL_CA_EV_PR_Path2 ST_PL_CA_EV_PR_Path3 Average ST_VD_CA_ME_PR_Path1 ST_VD_CA_ME_PR_Path2 ST_VD_CA_ME_PR_Path3 Passfail

  ID21                CCP_A                CCP_B                 <NA>     7.5               Length              Breadth                 <NA>     Pass

  ID22                CCQ_A                CCQ_B                CCQ_C     6.5              Breadth                Width                 <NA>     Pass

  ID23                IOT_A                CCP_B                 <NA>     7.1               Height               Length                Width     Fail

  ID24                CCQ_B                IOT_B                 <NA>     6.6                Width               Length                 <NA>     Fail

My actual dataset has around 35 columns ending with "PROCESS". Can someone point me in the right direction?

asked Nov 13 '18 at 17:30

Sharath

92011021

I have a dataframe like this

ID <- c("ID21","ID22","ID23","ID24")

STR_PL_CAN_EVOLVE_PROCESS <- c("CCP_A,CCP_B","CCQ_A,CCQ_B,CCQ_C","IOT_A,CCP_B","CCQ_B,IOT_B")

Average <- c(7.5,6.5,7.1,6.6)

STR_VD_CAN_MEASURE_PROCESS <- c("Length,Breadth","Breadth,Width","Height,Length,Width","Width,Length")

Passfail <- c("Pass","Pass","Fail","Fail")



df <- data.frame(ID,STR_PL_CAN_EVOLVE_PROCESS,Average,STR_VD_CAN_MEASURE_PROCESS,Passfail,stringsAsFactors=FALSE)

I am trying to separate the values in the columns ending with "process" into several columns using tidyverse and doing it this way.

library(tidyverse)

df1 <- df %>%

  separate(STR_PL_CAN_EVOLVE_PROCESS, 

           paste0("ST_PL_CA_EV_PR","_Path",

                  seq(1:10)),

           sep = ",") %>%

  separate(STR_VD_CAN_MEASURE_PROCESS, 

           paste0("ST_VD_CA_ME_PR","_Path",

                  seq(1:10)),

           sep = ",")

This works but I manually do a lot of things here (input the column names, new column names). Here are some things that I am trying to achieve

Automatically supply the names of the columns ending with "PROCESS" and separate those columns.

Extract the first 2 characters in the columns names (separated by underscores) as new column names. For example:
STR_PL_CAN_EVOLVE_PROCESS becomes ST_PL_CA_EV_PR

Remove columns that are only NA's

My desired output is

    ID ST_PL_CA_EV_PR_Path1 ST_PL_CA_EV_PR_Path2 ST_PL_CA_EV_PR_Path3 Average ST_VD_CA_ME_PR_Path1 ST_VD_CA_ME_PR_Path2 ST_VD_CA_ME_PR_Path3 Passfail

  ID21                CCP_A                CCP_B                 <NA>     7.5               Length              Breadth                 <NA>     Pass

  ID22                CCQ_A                CCQ_B                CCQ_C     6.5              Breadth                Width                 <NA>     Pass

  ID23                IOT_A                CCP_B                 <NA>     7.1               Height               Length                Width     Fail

  ID24                CCQ_B                IOT_B                 <NA>     6.6                Width               Length                 <NA>     Fail

My actual dataset has around 35 columns ending with "PROCESS". Can someone point me in the right direction?

r dataframe dplyr data.table tidyverse

asked Nov 13 '18 at 17:30

Sharath

92011021

asked Nov 13 '18 at 17:30

Sharath

92011021

asked Nov 13 '18 at 17:30

Sharath

92011021

asked Nov 13 '18 at 17:30

Sharath

92011021

asked Nov 13 '18 at 17:30

Sharath

92011021

Possible duplicate of Split data frame string column into multiple columns

– IceCreamToucan
Nov 13 '18 at 17:53

add a comment |

Possible duplicate of Split data frame string column into multiple columns

– IceCreamToucan
Nov 13 '18 at 17:53

Possible duplicate of Split data frame string column into multiple columns

– IceCreamToucan
Nov 13 '18 at 17:53

add a comment |

2 Answers
2

active

oldest

votes

Here is an option with cSplit

library(splitstackshape)

library(dplyr)

df %>% 

   cSplit(c("STR_PL_CAN_EVOLVE_PROCESS", "STR_VD_CAN_MEASURE_PROCESS"), 

                     ',', drop = TRUE)

#ID Average Passfail STR_PL_CAN_EVOLVE_PROCESS_1 STR_PL_CAN_EVOLVE_PROCESS_2 STR_PL_CAN_EVOLVE_PROCESS_3

#1: ID21     7.5     Pass                       CCP_A                       CCP_B                        <NA>

#2: ID22     6.5     Pass                       CCQ_A                       CCQ_B                       CCQ_C

#3: ID23     7.1     Fail                       IOT_A                       CCP_B                        <NA>

#4: ID24     6.6     Fail                       CCQ_B                       IOT_B                        <NA>

#   STR_VD_CAN_MEASURE_PROCESS_1 STR_VD_CAN_MEASURE_PROCESS_2 STR_VD_CAN_MEASURE_PROCESS_3

#1:                       Length                      Breadth                         <NA>

#2:                      Breadth                        Width                         <NA>

#3:                       Height                       Length                        Width

#4:                        Width                       Length                         <NA>

answered Nov 13 '18 at 17:55

akrun

400k13190265

1

Fantastic. works perfectly

– Sharath
Nov 13 '18 at 18:53

add a comment |

A pure tidyverse version. Many steps, but I believe it makes what you are after.

How about this:

df %>% 



# Grabs all columns ending with process

gather(key,val, ends_with("PROCESS")) %>% 



# Separate all now previous column names by "_"

separate(key, paste0("Pat",1:5)) %>% 



# Mutate all columns starting with "Pat" to just two chars

mutate_at(vars(starts_with("Pat")), substr, 1, 2) %>% 



# Separat cell values on comma

separate(val, paste0("Path",1:3)) %>% 



# Gather all Path columns into a key and value pair

gather(Path, val,starts_with("Path")) %>% 



# Unite all columns starting with "Pat" into one

unite(key, starts_with("Pat")) %>% na.omit() %>% 



# Spread the data using "key" as columns, and "val" as vell values

spread(key,val)



    ID Average Passfail ST_PL_CA_EV_PR_Path1 ST_PL_CA_EV_PR_Path2 ST_PL_CA_EV_PR_Path3 ST_VD_CA_ME_PR_Path1

1 ID21     7.5     Pass                  CCP                    A                  CCP               Length

2 ID22     6.5     Pass                  CCQ                    A                  CCQ              Breadth

3 ID23     7.1     Fail                  IOT                    A                  CCP               Height

4 ID24     6.6     Fail                  CCQ                    B                  IOT                Width

  ST_VD_CA_ME_PR_Path2 ST_VD_CA_ME_PR_Path3

1              Breadth                 <NA>

2                Width                 <NA>

3               Length                Width

4               Length                 <NA>

edited Nov 13 '18 at 19:45

answered Nov 13 '18 at 17:58

Athanasia Mowinckel

907

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53286555%2fsplitting-columns-ending-with-process-into-many-columns-with-individual-values%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Here is an option with cSplit

library(splitstackshape)

library(dplyr)

df %>% 

   cSplit(c("STR_PL_CAN_EVOLVE_PROCESS", "STR_VD_CAN_MEASURE_PROCESS"), 

                     ',', drop = TRUE)

#ID Average Passfail STR_PL_CAN_EVOLVE_PROCESS_1 STR_PL_CAN_EVOLVE_PROCESS_2 STR_PL_CAN_EVOLVE_PROCESS_3

#1: ID21     7.5     Pass                       CCP_A                       CCP_B                        <NA>

#2: ID22     6.5     Pass                       CCQ_A                       CCQ_B                       CCQ_C

#3: ID23     7.1     Fail                       IOT_A                       CCP_B                        <NA>

#4: ID24     6.6     Fail                       CCQ_B                       IOT_B                        <NA>

#   STR_VD_CAN_MEASURE_PROCESS_1 STR_VD_CAN_MEASURE_PROCESS_2 STR_VD_CAN_MEASURE_PROCESS_3

#1:                       Length                      Breadth                         <NA>

#2:                      Breadth                        Width                         <NA>

#3:                       Height                       Length                        Width

#4:                        Width                       Length                         <NA>

answered Nov 13 '18 at 17:55

akrun

400k13190265

1

Fantastic. works perfectly

– Sharath
Nov 13 '18 at 18:53

add a comment |

Here is an option with cSplit

library(splitstackshape)

library(dplyr)

df %>% 

   cSplit(c("STR_PL_CAN_EVOLVE_PROCESS", "STR_VD_CAN_MEASURE_PROCESS"), 

                     ',', drop = TRUE)

#ID Average Passfail STR_PL_CAN_EVOLVE_PROCESS_1 STR_PL_CAN_EVOLVE_PROCESS_2 STR_PL_CAN_EVOLVE_PROCESS_3

#1: ID21     7.5     Pass                       CCP_A                       CCP_B                        <NA>

#2: ID22     6.5     Pass                       CCQ_A                       CCQ_B                       CCQ_C

#3: ID23     7.1     Fail                       IOT_A                       CCP_B                        <NA>

#4: ID24     6.6     Fail                       CCQ_B                       IOT_B                        <NA>

#   STR_VD_CAN_MEASURE_PROCESS_1 STR_VD_CAN_MEASURE_PROCESS_2 STR_VD_CAN_MEASURE_PROCESS_3

#1:                       Length                      Breadth                         <NA>

#2:                      Breadth                        Width                         <NA>

#3:                       Height                       Length                        Width

#4:                        Width                       Length                         <NA>

answered Nov 13 '18 at 17:55

akrun

400k13190265

1

Fantastic. works perfectly

– Sharath
Nov 13 '18 at 18:53

add a comment |

Here is an option with cSplit

library(splitstackshape)

library(dplyr)

df %>% 

   cSplit(c("STR_PL_CAN_EVOLVE_PROCESS", "STR_VD_CAN_MEASURE_PROCESS"), 

                     ',', drop = TRUE)

#ID Average Passfail STR_PL_CAN_EVOLVE_PROCESS_1 STR_PL_CAN_EVOLVE_PROCESS_2 STR_PL_CAN_EVOLVE_PROCESS_3

#1: ID21     7.5     Pass                       CCP_A                       CCP_B                        <NA>

#2: ID22     6.5     Pass                       CCQ_A                       CCQ_B                       CCQ_C

#3: ID23     7.1     Fail                       IOT_A                       CCP_B                        <NA>

#4: ID24     6.6     Fail                       CCQ_B                       IOT_B                        <NA>

#   STR_VD_CAN_MEASURE_PROCESS_1 STR_VD_CAN_MEASURE_PROCESS_2 STR_VD_CAN_MEASURE_PROCESS_3

#1:                       Length                      Breadth                         <NA>

#2:                      Breadth                        Width                         <NA>

#3:                       Height                       Length                        Width

#4:                        Width                       Length                         <NA>

answered Nov 13 '18 at 17:55

akrun

400k13190265

Here is an option with cSplit

library(splitstackshape)

library(dplyr)

df %>% 

   cSplit(c("STR_PL_CAN_EVOLVE_PROCESS", "STR_VD_CAN_MEASURE_PROCESS"), 

                     ',', drop = TRUE)

#ID Average Passfail STR_PL_CAN_EVOLVE_PROCESS_1 STR_PL_CAN_EVOLVE_PROCESS_2 STR_PL_CAN_EVOLVE_PROCESS_3

#1: ID21     7.5     Pass                       CCP_A                       CCP_B                        <NA>

#2: ID22     6.5     Pass                       CCQ_A                       CCQ_B                       CCQ_C

#3: ID23     7.1     Fail                       IOT_A                       CCP_B                        <NA>

#4: ID24     6.6     Fail                       CCQ_B                       IOT_B                        <NA>

#   STR_VD_CAN_MEASURE_PROCESS_1 STR_VD_CAN_MEASURE_PROCESS_2 STR_VD_CAN_MEASURE_PROCESS_3

#1:                       Length                      Breadth                         <NA>

#2:                      Breadth                        Width                         <NA>

#3:                       Height                       Length                        Width

#4:                        Width                       Length                         <NA>

answered Nov 13 '18 at 17:55

akrun

400k13190265

answered Nov 13 '18 at 17:55

akrun

400k13190265

answered Nov 13 '18 at 17:55

akrun

400k13190265

answered Nov 13 '18 at 17:55

akrun

400k13190265

1

Fantastic. works perfectly

– Sharath
Nov 13 '18 at 18:53

add a comment |

1

Fantastic. works perfectly

– Sharath
Nov 13 '18 at 18:53

Fantastic. works perfectly

– Sharath
Nov 13 '18 at 18:53

add a comment |

A pure tidyverse version. Many steps, but I believe it makes what you are after.

How about this:

df %>% 



# Grabs all columns ending with process

gather(key,val, ends_with("PROCESS")) %>% 



# Separate all now previous column names by "_"

separate(key, paste0("Pat",1:5)) %>% 



# Mutate all columns starting with "Pat" to just two chars

mutate_at(vars(starts_with("Pat")), substr, 1, 2) %>% 



# Separat cell values on comma

separate(val, paste0("Path",1:3)) %>% 



# Gather all Path columns into a key and value pair

gather(Path, val,starts_with("Path")) %>% 



# Unite all columns starting with "Pat" into one

unite(key, starts_with("Pat")) %>% na.omit() %>% 



# Spread the data using "key" as columns, and "val" as vell values

spread(key,val)



    ID Average Passfail ST_PL_CA_EV_PR_Path1 ST_PL_CA_EV_PR_Path2 ST_PL_CA_EV_PR_Path3 ST_VD_CA_ME_PR_Path1

1 ID21     7.5     Pass                  CCP                    A                  CCP               Length

2 ID22     6.5     Pass                  CCQ                    A                  CCQ              Breadth

3 ID23     7.1     Fail                  IOT                    A                  CCP               Height

4 ID24     6.6     Fail                  CCQ                    B                  IOT                Width

  ST_VD_CA_ME_PR_Path2 ST_VD_CA_ME_PR_Path3

1              Breadth                 <NA>

2                Width                 <NA>

3               Length                Width

4               Length                 <NA>

edited Nov 13 '18 at 19:45

answered Nov 13 '18 at 17:58

Athanasia Mowinckel

907

add a comment |

A pure tidyverse version. Many steps, but I believe it makes what you are after.

How about this:

df %>% 



# Grabs all columns ending with process

gather(key,val, ends_with("PROCESS")) %>% 



# Separate all now previous column names by "_"

separate(key, paste0("Pat",1:5)) %>% 



# Mutate all columns starting with "Pat" to just two chars

mutate_at(vars(starts_with("Pat")), substr, 1, 2) %>% 



# Separat cell values on comma

separate(val, paste0("Path",1:3)) %>% 



# Gather all Path columns into a key and value pair

gather(Path, val,starts_with("Path")) %>% 



# Unite all columns starting with "Pat" into one

unite(key, starts_with("Pat")) %>% na.omit() %>% 



# Spread the data using "key" as columns, and "val" as vell values

spread(key,val)



    ID Average Passfail ST_PL_CA_EV_PR_Path1 ST_PL_CA_EV_PR_Path2 ST_PL_CA_EV_PR_Path3 ST_VD_CA_ME_PR_Path1

1 ID21     7.5     Pass                  CCP                    A                  CCP               Length

2 ID22     6.5     Pass                  CCQ                    A                  CCQ              Breadth

3 ID23     7.1     Fail                  IOT                    A                  CCP               Height

4 ID24     6.6     Fail                  CCQ                    B                  IOT                Width

  ST_VD_CA_ME_PR_Path2 ST_VD_CA_ME_PR_Path3

1              Breadth                 <NA>

2                Width                 <NA>

3               Length                Width

4               Length                 <NA>

edited Nov 13 '18 at 19:45

answered Nov 13 '18 at 17:58

Athanasia Mowinckel

907

add a comment |

A pure tidyverse version. Many steps, but I believe it makes what you are after.

How about this:

df %>% 



# Grabs all columns ending with process

gather(key,val, ends_with("PROCESS")) %>% 



# Separate all now previous column names by "_"

separate(key, paste0("Pat",1:5)) %>% 



# Mutate all columns starting with "Pat" to just two chars

mutate_at(vars(starts_with("Pat")), substr, 1, 2) %>% 



# Separat cell values on comma

separate(val, paste0("Path",1:3)) %>% 



# Gather all Path columns into a key and value pair

gather(Path, val,starts_with("Path")) %>% 



# Unite all columns starting with "Pat" into one

unite(key, starts_with("Pat")) %>% na.omit() %>% 



# Spread the data using "key" as columns, and "val" as vell values

spread(key,val)



    ID Average Passfail ST_PL_CA_EV_PR_Path1 ST_PL_CA_EV_PR_Path2 ST_PL_CA_EV_PR_Path3 ST_VD_CA_ME_PR_Path1

1 ID21     7.5     Pass                  CCP                    A                  CCP               Length

2 ID22     6.5     Pass                  CCQ                    A                  CCQ              Breadth

3 ID23     7.1     Fail                  IOT                    A                  CCP               Height

4 ID24     6.6     Fail                  CCQ                    B                  IOT                Width

  ST_VD_CA_ME_PR_Path2 ST_VD_CA_ME_PR_Path3

1              Breadth                 <NA>

2                Width                 <NA>

3               Length                Width

4               Length                 <NA>

edited Nov 13 '18 at 19:45

answered Nov 13 '18 at 17:58

Athanasia Mowinckel

907

A pure tidyverse version. Many steps, but I believe it makes what you are after.

How about this:

df %>% 



# Grabs all columns ending with process

gather(key,val, ends_with("PROCESS")) %>% 



# Separate all now previous column names by "_"

separate(key, paste0("Pat",1:5)) %>% 



# Mutate all columns starting with "Pat" to just two chars

mutate_at(vars(starts_with("Pat")), substr, 1, 2) %>% 



# Separat cell values on comma

separate(val, paste0("Path",1:3)) %>% 



# Gather all Path columns into a key and value pair

gather(Path, val,starts_with("Path")) %>% 



# Unite all columns starting with "Pat" into one

unite(key, starts_with("Pat")) %>% na.omit() %>% 



# Spread the data using "key" as columns, and "val" as vell values

spread(key,val)



    ID Average Passfail ST_PL_CA_EV_PR_Path1 ST_PL_CA_EV_PR_Path2 ST_PL_CA_EV_PR_Path3 ST_VD_CA_ME_PR_Path1

1 ID21     7.5     Pass                  CCP                    A                  CCP               Length

2 ID22     6.5     Pass                  CCQ                    A                  CCQ              Breadth

3 ID23     7.1     Fail                  IOT                    A                  CCP               Height

4 ID24     6.6     Fail                  CCQ                    B                  IOT                Width

  ST_VD_CA_ME_PR_Path2 ST_VD_CA_ME_PR_Path3

1              Breadth                 <NA>

2                Width                 <NA>

3               Length                Width

4               Length                 <NA>

edited Nov 13 '18 at 19:45

answered Nov 13 '18 at 17:58

Athanasia Mowinckel

907

edited Nov 13 '18 at 19:45

answered Nov 13 '18 at 17:58

Athanasia Mowinckel

907

answered Nov 13 '18 at 17:58

Athanasia Mowinckel

907

answered Nov 13 '18 at 17:58

Athanasia Mowinckel

907

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Wsrtjtyk