Reading Roleplaying games statblocks using R























I'd characterize myself as a moderately versed R user (who is also trying to branch out into Python). Sometimes I dabble in fun projects to expand my horizons, and I like RPGs. I'm trying to scrape some monster statblocks from d20pfsrd.com to use in our RPG games.



I hoped the page would be more structured, so I could use the rvest package for everything, but now it seems I need to use a lot of regex to do this.



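# Load the packages used below
library(rvest)    # read_html(), html_node(), html_text()
library(stringr)  # str_split(), str_extract()
library(tibble)   # tibble()

# Read monster
m = read_html("https://www.d20pfsrd.com/bestiary/unique-monsters/cr-10/teraktinus/")

# Split the title line into the monster name and its challenge rating
m %>%
  html_node(css = ".statblock") %>%
  html_node(".title") %>%
  html_text() %>%
  str_split("CR") -> title_cr

title_cr

[1] "Stone Giant Ranger 2 " " 10"

# tibble() replaces the deprecated data_frame()
monster = tibble(Monster = title_cr[[1]][[1]],
                 CR = title_cr[[1]][[2]])

> print(monster)
# A tibble: 1 x 2
Monster CR
<chr> <chr>
1 "Stone Giant Ranger 2 " " 10"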


Take a look at the sample statblock from the url.



> txt = m %>% 
+ html_node(css = ".statblock") %>% html_text()
> print(txt)
[1] "\nStone Giant Ranger 2 CR 10\nXP 9,600 Male Stone Giant Ranger 2 CE Large humanoid (giant)Init +2; Senses darkvision 60 ft., low-light vision; Perception +12 \n DEFENSE\n AC 29, touch 12, flat-footed 27 (+6 armor, +1 deflection, +2 Dex, +11 natural, -1 size)hp 151 (12d8+2d10+2 Favored Class +84)Fort +16, Ref +8, Will +7Special Defenses rock catching \n OFFENSE Speed 40 ft.Melee +1 dwarf bane heavy pick +20/+15/+10 (1d8+11/19-20/x4) and +1 light pick +20 (1d6+6/19-20/x4)Ranged rock +13/+8/+3 (2d8+15)Space 10 ft.; Reach 10 ft.Special Attacks favored enemy (dwarf +2); rock throwing 180 ft. \n STATISTICS \n Str 27, Dex 15, Con 19, Int 10, Wis 12, Cha 10Base Atk +11; CMB +22; CMD 34Feats Improved Critical (Heavy Pick), Improved Critical (Light Pick), Iron Will, Quick DrawB, Power Attack, Two-Weapon Fighting, Weapon Focus (Heavy Pick), Weapon Focus (Light Pick)Skills Climb +13, Perception +12, Stealth +9 (+17 in rocky terrain), Survival +12 (+13 when Tracking); Racial Modifiers +8 Stealth in rocky terrainLanguages Common, Dwarven, GiantSQ Wild Empathy +2Gear +2 Hide Shirt; +1 dwarf bane heavy pick, +1 light pick, ring of protection +1, war horn \n \n \n \n \n \n \n Section 15: Copyright Notice – Pathfinder 4: Fortress of the Stone Giants \n \n \n Pathfinder 4: Fortress of the Stone Giants. Copyright 2007, Paizo Publishing LLC. Author: Wolfgang Baur \n \n \n"


Am I right to assume that working with nodes to get text is out the window here? I moved along from using nodes to just searching for patterns I'm interested in, such as HP, AC and other relevant stats I could use.



So I started doing things like this:



# Armor class: the number that follows "AC"
monster$AC = txt %>%
  str_extract("AC [0-9]+") %>% str_extract("[0-9]+") %>%
  as.numeric()

# Initiative: keep the sign and allow more than one digit
monster$Init = txt %>%
  str_extract("Init [+-][0-9]+") %>% str_extract("[+-][0-9]+") %>%
  as.numeric()

# Hit points: the number that follows "hp"
monster$HP = txt %>%
  str_extract("hp [0-9]+") %>% str_extract("[0-9]+") %>%
  as.numeric()

print(monster)
# A tibble: 1 x 5
Monster CR AC Init HP
<chr> <chr> <dbl> <dbl> <dbl>
1 "Stone Giant Ranger 2 " " 10" 29 2 151


Are there any obviously better approaches here, or do I need to keep at this process if I want to read statblocks from this page into dataframes?



Thanks!










asked Nov 7 at 12:37 by ARO
























          1 Answer













With a little bit of XPath elbow grease, the underlying visual formatting can be used as structure. This is incomplete in that it doesn't parse the whole page, but it should provide enough examples to get the rest. If not, just comment about anything specific you're stuck on:



library(rvest)
library(stringi)
library(tidyverse)

pg <- read_html("https://www.d20pfsrd.com/bestiary/unique-monsters/cr-10/teraktinus/")

# Everything below is looked up relative to the statblock <div>
sb <- html_node(pg, "div.statblock")

# Monster name: the bare text of the title paragraph
html_node(sb, xpath=".//p[@class='title']/text()") %>% html_text(trim=TRUE)
## [1] "Stone Giant Ranger 2"

# Challenge rating: the <span> inside the title paragraph
html_node(sb, xpath=".//p[@class='title']/span") %>% html_text(trim=TRUE)
## [1] "CR 10"

# Bold labels such as "XP" act as anchors for the values around them
html_node(sb, xpath=".//p/b[contains(., 'XP')]") %>% html_text(trim=TRUE)
## [1] "XP 9,600"

html_node(sb, xpath=".//p/b[contains(., 'XP')]/following-sibling::text()") %>% html_text(trim=TRUE)
## [1] "Male"

# Links whose nearest preceding bold label is "XP"
html_nodes(sb, xpath=".//p/b[contains(., 'XP')]/following-sibling::a[preceding-sibling::b[1][contains(., 'XP')]]") %>%
  html_text()
## [1] "Stone Giant" "Ranger" "humanoid" "giant"

html_node(sb, xpath=".//p/b[contains(., 'Init')]/following-sibling::text()") %>% html_text(trim=TRUE)
## [1] "+2;"

# Same idea for "Senses": first the linked terms, then the text after them
html_nodes(sb, xpath=".//p/b[contains(., 'Senses')]/following-sibling::a[preceding-sibling::b[1][contains(., 'Senses')]]") %>%
  html_text()
## [1] "darkvision" "low-light vision" "Perception"

html_nodes(sb, xpath=".//p/b[contains(., 'Senses')]/
  following-sibling::a[preceding-sibling::b[1][contains(., 'Senses')]]") %>%
  html_nodes(xpath=".//following-sibling::text()") %>%
  html_text()
## [1] " 60 ft., " "; " " +12"

# DEFENSE block: pull out every label/number pair with one regex.
# Backslashes are doubled because the pattern is an R string.
html_nodes(sb, xpath = ".//p[@class='divider' and contains(., 'DEFENSE')]/following-sibling::p[2]") %>%
  html_text() %>%
  stri_match_all_regex(
    "(AC[[:space:]]+[[:digit:]]+)|([[:alpha:]\\-]+[[:space:]]+[[:digit:]]+)|([\\-+][[:digit:]]+[[:space:]]+[[:alpha:]\\-]+)|([[:alpha:]]+[[:space:]][\\-+][[:digit:]]+)",
    cg_missing = ""
  ) %>%
  .[[1]] %>%
  # as_tibble() replaces the deprecated as_data_frame(); the match matrix
  # has no column names, so ask for repaired ones
  as_tibble(.name_repair = "unique") %>%
  select(-1) %>%   # drop the full-match column, keep the capture groups
  unite(col = "defense", sep = "")
## # A tibble: 14 x 1
## defense
## <chr>
## 1 AC 29
## 2 touch 12
## 3 flat-footed 27
## 4 +6 armor
## 5 +1 deflection
## 6 +2 Dex
## 7 +11 natural
## 8 -1 size
## 9 hp 151
## 10 +2 Favored
## 11 Class +84
## 12 Fort +16
## 13 Ref +8
## 14 Will +7
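
The one-column result above still mixes labels and numbers in each string. A minimal sketch of one way to split them, assuming every entry holds exactly one signed or unsigned integer, as in the output shown. The `defense` vector is copied by hand from that output, and the names `num_pattern`, `label` and `value` are made up for illustration. It uses base R only, so it runs without extra packages:

```r
# Entries copied from the output above (a subset, for brevity)
defense <- c("AC 29", "touch 12", "flat-footed 27", "+6 armor",
             "-1 size", "hp 151", "Fort +16")

# A signed or unsigned integer; assumes each entry contains exactly one
num_pattern <- "[+-]?[0-9]+"

# regmatches() + regexpr() return the first match in each string
value <- as.numeric(regmatches(defense, regexpr(num_pattern, defense)))

# Whatever remains after removing the number is the label
label <- trimws(sub(num_pattern, "", defense))

parsed <- data.frame(label = label, value = value, stringsAsFactors = FALSE)
parsed
```

The hyphen in "flat-footed" is not mistaken for a minus sign because the pattern requires digits straight after the optional sign.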


You'll still need some string operations, but hopefully this helps a bit.
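
The label lookups in the answer all follow one pattern: find a bold label, then read what comes after it. That pattern can be wrapped in a small helper. This is a rough sketch, not part of the original answer: `value_after()` is a made-up name, and the HTML fragment is a hand-written stand-in for the real page, whose markup may differ.

```r
library(rvest)

# Hypothetical fragment imitating the statblock markup; the real page may differ
html <- '<div class="statblock">
  <p><b>Init</b> +2; <b>Senses</b> <a href="#">darkvision</a> 60 ft.</p>
  <p><b>hp</b> 151 (12d8+2d10+2 Favored Class +84)</p>
</div>'
sb <- html_node(read_html(html), "div.statblock")

# value_after() is an illustrative helper, not an rvest function: it returns
# the first text node that follows the <b> element whose text equals `label`,
# or NA when no such label exists. Labels containing a single quote would
# break the XPath string and are not handled here.
value_after <- function(node, label) {
  xp <- sprintf(".//p/b[normalize-space(.)='%s']/following-sibling::text()[1]", label)
  html_text(html_node(node, xpath = xp), trim = TRUE)
}

value_after(sb, "Init")
## [1] "+2;"

value_after(sb, "hp")
## [1] "151 (12d8+2d10+2 Favored Class +84)"
```

Matching with `normalize-space(.)` is stricter than the `contains()` test used in the answer. Which one fits better depends on how consistently the page writes its labels.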






answered Nov 7 at 14:00 by hrbrmstr






























                     
