Parse JSON with embedded lists into a semi-long data frame























I have a JSON file with several layers of nesting, and I am struggling to get it into a workable data frame. I created a toy example of mock data based on a real structure: here is the gist.



And here is my desired output. The output could be "longer" or have additional variables from the original JSON, but I'm showing the core ask.



[Image: desired output table]



This is the part of the JSON with the deepest level of nesting, which I want to get into the semi-long format shown above in white (a fully wide format would also be fine).



[Image: JSON excerpt showing the deepest level of nesting]



I've tried lots of things with this object:



myList <- jsonlite::fromJSON("example.json", flatten=TRUE)$results


These range from subsetting with [] and cbind() to attempts at unnesting the embedded lists. Nothing has come out quite right, and I'd benefit greatly from advice on the best approach.










      r json jsonlite
















      asked Nov 9 at 19:12









      Eric Green

























          1 Answer




















          accepted










          Does this get you any further along? (This is a gnarly structure):



          library(tidyverse)

          # Read the file; "example.json" is the file name used in the question
          x <- jsonlite::fromJSON("example.json")

          # Write the results out as gzipped ndjson, then read them back fully flattened
          jsonlite::stream_out(x$results, con = gzfile("ex-res.json.gz"))
          y <- ndjson::stream_in("ex-res.json.gz", "tbl")

          # Stack each group of flattened columns into key/value pairs
          gather(y, path, path_val, starts_with("path")) %>%
            gather(flow, flow_val, starts_with("flow")) %>%
            gather(name, name_val, starts_with("values.pdep")) %>%
            gather(intervention, interv_val, starts_with("values.inter")) %>%
            glimpse()
          ## Observations: 87,696
          ## Variables: 18
          ## $ contact.name <chr> "Person 1", "Person 2", "Person 1", "Person 2", "Person 1", "Person 2", "Person 1", "Person 2"...
          ## $ contact.uuid <chr> "k0dcjs", "rd3jfui", "k0dcjs", "rd3jfui", "k0dcjs", "rd3jfui", "k0dcjs", "rd3jfui", "k0dcjs", ...
          ## $ created_on <chr> "2016-02-08T07:00:15.093813Z", "2016-02-08T07:00:15.093813Z", "2016-02-08T07:00:15.093813Z", "...
          ## $ id <dbl> 1234, 1235, 1234, 1235, 1234, 1235, 1234, 1235, 1234, 1235, 1234, 1235, 1234, 1235, 1234, 1235...
          ## $ modified_on <chr> "2016-02-09T04:42:54.812323Z", "2016-02-08T08:09:51.545160Z", "2016-02-09T04:42:54.812323Z", "...
          ## $ responded <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE...
          ## $ start.uuid <chr> "dnxh4g", "kfj4dsi", "dnxh4g", "kfj4dsi", "dnxh4g", "kfj4dsi", "dnxh4g", "kfj4dsi", "dnxh4g", ...
          ## $ uuid <chr> "esn4dk", "qask9dj", "esn4dk", "qask9dj", "esn4dk", "qask9dj", "esn4dk", "qask9dj", "esn4dk", ...
          ## $ exit_type <chr> NA, "completed", NA, "completed", NA, "completed", NA, "completed", NA, "completed", NA, "comp...
          ## $ exited_on <chr> NA, "2016-02-08T08:09:51.544998Z", NA, "2016-02-08T08:09:51.544998Z", NA, "2016-02-08T08:09:51...
          ## $ path <chr> "path.0.node", "path.0.node", "path.0.time", "path.0.time", "path.1.node", "path.1.node", "pat...
          ## $ path_val <chr> "ecb4cb11-6cca-4791-a950-c448e9300846", "ecb4cb11-6cca-4791-a950-c448e9300846", "2016-02-08T07...
          ## $ flow <chr> "flow.name", "flow.name", "flow.name", "flow.name", "flow.name", "flow.name", "flow.name", "fl...
          ## $ flow_val <chr> "weeklyratings", "weeklyratings", "weeklyratings", "weeklyratings", "weeklyratings", "weeklyra...
          ## $ name <chr> "values.pdeps1.category", "values.pdeps1.category", "values.pdeps1.category", "values.pdeps1.c...
          ## $ name_val <chr> "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 -...
          ## $ intervention <chr> "values.intervention", "values.intervention", "values.intervention", "values.intervention", "v...
          ## $ interv_val <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
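To see where dotted column names such as values.pdeps1.category come from, here is a minimal sketch on made-up data. The two-record JSON string is hypothetical and far simpler than the gist, and the sketch uses jsonlite::flatten() instead of the ndjson round trip above, so names and types may differ in detail from what ndjson::stream_in() returns (for example, it does not produce indexed names like path.0.node).

```r
library(jsonlite)
library(tidyr)

# Hypothetical payload, loosely shaped like the question's data
txt <- '{"results": [
  {"id": 1234, "contact": {"name": "Person 1"},
   "values": {"pdeps1": {"category": "0 - 7", "value": "3"}}},
  {"id": 1235, "contact": {"name": "Person 2"},
   "values": {"pdeps1": {"category": "0 - 7", "value": "5"}}}
]}'

# Nested objects become nested data frames; flatten() turns them into
# ordinary columns whose names join each nesting level with a dot
res <- jsonlite::flatten(fromJSON(txt)$results)
names(res)

# Stack the values.* columns into one key column and one value column
long <- gather(res, name, name_val, starts_with("values.pdep"))
long
```

Each record contributes one row per gathered column, which is why the row count grows so quickly once several gather() calls are chained.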


          Full approach:



          gather(y, path, path_val, starts_with("path")) %>%
            gather(flow, flow_val, starts_with("flow")) %>%
            gather(name, name_val, starts_with("values.pdep")) %>%
            gather(intervention, interv_val, starts_with("values.inter")) %>%
            # fixed = TRUE matches the dots literally rather than as regex wildcards
            filter(grepl(".value", name, fixed = TRUE)) %>%
            filter(grepl("node", path)) %>%
            mutate(variable = gsub("values.", "", name, fixed = TRUE)) %>%
            mutate(variable = gsub(".value", "", variable, fixed = TRUE)) %>%
            # keep one row per contact.name / uuid / name combination
            distinct(contact.name, uuid, name, .keep_all = TRUE) %>%
            select(id, uuid, contact.uuid, variable, name_val, created_on, modified_on) %>%
            arrange(id, created_on)
          # optional wide format: append %>% spread(variable, name_val)
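The patterns ".value" and "values." in the pipeline above are meant literally. As a small base-R sketch of why that matters: in a regular expression an unescaped dot matches any character, whereas fixed = TRUE treats the whole pattern as a literal string. The column name values.revalued below is hypothetical and is there only to show where the two readings differ.

```r
# Two names in the style of the glimpse() output plus one hypothetical name
nms <- c("values.pdeps1.category", "values.pdeps1.value", "values.revalued")

# As a regex, "." matches any character, so "evalue" in the third name matches
regex_hits <- grepl(".value", nms)
regex_hits    # FALSE TRUE TRUE

# As a literal string, only names containing ".value" match
literal_hits <- grepl(".value", nms, fixed = TRUE)
literal_hits  # FALSE TRUE FALSE

# Stripping the prefix and suffix literally leaves the variable name
variable <- gsub("values.", "", nms[2], fixed = TRUE)
variable <- gsub(".value", "", variable, fixed = TRUE)
variable      # "pdeps1"
```

On the column names shown in the answer both readings give the same result; the literal form only guards against names that happen to contain "value" preceded by some other character.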




























          • Awesome, @hrbrmstr! I was able to take this and get it the rest of the way. Shall I make an edit to your answer to suggest the method?
            – Eric Green
            Nov 9 at 20:01










          • messy as a comment, but replace glimpse() with: filter(grepl(".value", name)) %>% filter(grepl("node", path)) %>% mutate(variable = gsub("values.", "", name)) %>% mutate(variable = gsub(".value", "", variable)) %>% distinct(contact.name, uuid, name, .keep_all = TRUE) %>% select(id, uuid, contact.uuid, variable, name_val, created_on, modified_on) %>% arrange(id, created_on) # optional wide %>% spread(variable, name_val)
            – Eric Green
            Nov 9 at 20:30






          • definitely go ahead and edit it. just glad it helped. i've had my share of gnarly json in the past.
            – hrbrmstr
            Nov 9 at 20:37



















          edited Nov 9 at 20:59









          Eric Green











          answered Nov 9 at 19:32









          hrbrmstr












