Reading Roleplaying games statblocks using R

I'd characterize myself as a moderately experienced R user (who is also trying to branch out into Python). Sometimes I dabble in fun projects to expand my horizons, and I like RPGs. I'm trying to scrape some monster statblocks from d20pfsrd.com to use in our RPG games.

I hoped the page would be more structured, so I could use the rvest package for everything, but now it seems I need to use a lot of regex to do this.

library(tidyverse)  # %>%, stringr's str_split()/str_extract(), tibble()
library(rvest)      # read_html(), html_node(), html_text()

# Read monster
m = read_html("https://www.d20pfsrd.com/bestiary/unique-monsters/cr-10/teraktinus/")

# Split the title line into the monster name and its challenge rating
m %>%
  html_node(css = ".statblock") %>%
  html_node(".title") %>%
  html_text() %>%
  str_split("CR") -> title_cr

title_cr
[1] "Stone Giant Ranger 2 " " 10"

monster = tibble(Monster = title_cr[[1]][[1]],
                 CR = title_cr[[1]][[2]])

> print(monster)
# A tibble: 1 x 2
  Monster                 CR   
  <chr>                   <chr>
1 "Stone Giant Ranger 2 " " 10"

Here is the text of the sample statblock from that URL:

> txt = m %>%
+   html_node(css = ".statblock") %>% html_text()
> print(txt)
[1] "\nStone Giant Ranger 2 CR 10\nXP 9,600 Male Stone Giant Ranger 2 CE Large humanoid (giant)Init +2; Senses darkvision 60 ft., low-light vision; Perception +12 \n    DEFENSE\n AC 29, touch 12, flat-footed 27 (+6 armor, +1 deflection, +2 Dex, +11 natural, -1 size)hp 151 (12d8+2d10+2 Favored Class +84)Fort +16, Ref +8, Will +7Special Defenses rock catching  \n    OFFENSE Speed 40 ft.Melee +1 dwarf bane heavy pick +20/+15/+10 (1d8+11/19-20/x4) and +1 light pick +20 (1d6+6/19-20/x4)Ranged rock +13/+8/+3 (2d8+15)Space 10 ft.; Reach 10 ft.Special Attacks favored enemy (dwarf +2); rock throwing 180 ft.  \n     STATISTICS \n     Str 27, Dex 15, Con 19, Int 10, Wis 12, Cha 10Base Atk +11; CMB +22; CMD 34Feats Improved Critical (Heavy Pick), Improved Critical (Light Pick), Iron Will, Quick DrawB, Power Attack, Two-Weapon Fighting, Weapon Focus (Heavy Pick), Weapon Focus (Light Pick)Skills Climb +13, Perception +12, Stealth +9 (+17 in rocky terrain), Survival +12 (+13 when Tracking); Racial Modifiers +8 Stealth in rocky terrainLanguages Common, Dwarven, GiantSQ Wild Empathy +2Gear +2 Hide Shirt; +1 dwarf bane heavy pick, +1 light pick, ring of protection +1, war horn   \n     \n   \n  \n  \n   \n \n Section 15: Copyright Notice – Pathfinder 4: Fortress of the Stone Giants \n  \n    \n   Pathfinder 4: Fortress of the Stone Giants. Copyright 2007, Paizo Publishing LLC. Author: Wolfgang Baur  \n   \n  \n"

Am I right to assume that extracting the text node by node is off the table here? I moved on from selecting nodes to searching the text for the patterns I'm interested in, such as HP, AC, and other stats I could use.

So I started doing things like this:

monster$AC = txt %>%
  str_extract("AC [0-9]{2}") %>% str_extract("[0-9]{2}") %>%
  as.numeric()

monster$Init = txt %>%
  str_extract("Init [+-][0-9]") %>% str_extract("[0-9]+") %>%
  as.numeric()

monster$HP = txt %>%
  str_extract("hp [0-9]{1,4}") %>% str_extract("[0-9]{1,4}") %>%
  as.numeric()

print(monster)
# A tibble: 1 x 5
  Monster                 CR       AC  Init    HP
  <chr>                   <chr> <dbl> <dbl> <dbl>
1 "Stone Giant Ranger 2 " " 10"    29     2   151
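
The three extractions above share one shape: a label followed by a number. A minimal sketch of folding them into one helper, assuming the label and the number are always separated by whitespace. `extract_stat` is a hypothetical name, and the sample string is an abbreviated copy of the statblock text shown earlier. It uses base R so it runs without extra packages, and unlike the snippets above it keeps a leading sign and accepts any number of digits.

```r
# Hypothetical helper: return the first (optionally signed) integer that
# follows `label` in `txt`, or NA when the label is not found.
extract_stat <- function(txt, label) {
  # Capture group 1 holds the number, including a leading + or - if present.
  pattern <- paste0("\\b", label, "\\s+([+-]?[0-9]+)")
  m <- regmatches(txt, regexec(pattern, txt, perl = TRUE))[[1]]
  if (length(m) < 2) NA_integer_ else as.integer(m[2])
}

# Abbreviated sample copied from the statblock text (not fetched live).
txt <- paste0(
  "Init +2; Senses darkvision 60 ft., low-light vision; Perception +12 \n",
  "    DEFENSE\n AC 29, touch 12, flat-footed 27 (+6 armor, +1 deflection, ",
  "+2 Dex, +11 natural, -1 size)hp 151 (12d8+2d10+2 Favored Class +84)",
  "Fort +16, Ref +8, Will +7"
)

stats <- data.frame(
  AC   = extract_stat(txt, "AC"),
  Init = extract_stat(txt, "Init"),
  HP   = extract_stat(txt, "hp")
)
print(stats)
```

Labels that can appear more than once in a statblock (for example `Perception`, which shows up under both Senses and Skills) would only yield their first occurrence with this approach.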

Are there any obviously better approaches here, or do I need to keep at this process if I want to read statblocks from this page into dataframes?

Thanks!

asked Nov 7 at 12:37

ARO

1 Answer

With a little XPath elbow grease, the underlying visual formatting can be used as structure. This is incomplete in that it doesn't parse the whole page, but it should provide enough examples to get the rest; if not, just comment about anything specific you're stuck on:

library(rvest)
library(stringi)
library(tidyverse)

pg <- read_html("https://www.d20pfsrd.com/bestiary/unique-monsters/cr-10/teraktinus/")

sb <- html_node(pg, "div.statblock")

html_node(sb, xpath=".//p[@class='title']/text()") %>% html_text(trim=TRUE)
## [1] "Stone Giant Ranger 2"

html_node(sb, xpath=".//p[@class='title']/span") %>% html_text(trim=TRUE)
## [1] "CR 10"

html_node(sb, xpath=".//p/b[contains(., 'XP')]") %>% html_text(trim=TRUE)
## [1] "XP 9,600"

html_node(sb, xpath=".//p/b[contains(., 'XP')]/following-sibling::text()") %>% html_text(trim=TRUE)
## [1] "Male"

html_nodes(sb, xpath=".//p/b[contains(., 'XP')]/following-sibling::a[preceding-sibling::b[1][contains(., 'XP')]]") %>%
  html_text()
## [1] "Stone Giant" "Ranger"      "humanoid"    "giant"

html_node(sb, xpath=".//p/b[contains(., 'Init')]/following-sibling::text()") %>% html_text(trim=TRUE)
## [1] "+2;"

html_nodes(sb, xpath=".//p/b[contains(., 'Senses')]/following-sibling::a[preceding-sibling::b[1][contains(., 'Senses')]]") %>%
  html_text()
## [1] "darkvision"       "low-light vision" "Perception"

html_nodes(sb, xpath=".//p/b[contains(., 'Senses')]/
           following-sibling::a[preceding-sibling::b[1][contains(., 'Senses')]]") %>%
  html_nodes(xpath=".//following-sibling::text()") %>%
  html_text()
## [1] " 60 ft., " "; "        " +12"

# Four alternatives: "AC <n>", "<label> <n>", "<+/-n> <label>", "<label> <+/-n>"
html_nodes(sb, xpath = ".//p[@class='divider' and contains(., 'DEFENSE')]/following-sibling::p[2]") %>%
  html_text() %>%
  stri_match_all_regex(
    "(AC[[:space:]]+[[:digit:]]+)|([[:alpha:]\\-]+[[:space:]]+[[:digit:]]+)|([\\-+][[:digit:]]+[[:space:]]+[[:alpha:]\\-]+)|([[:alpha:]]+[[:space:]][\\-+][[:digit:]]+)",
    cg_missing = ""
  ) %>%
  .[[1]] %>%
  as_tibble(.name_repair = "unique") %>%  # the match matrix has no column names
  select(-1) %>%                          # drop the full-match column
  unite(col = "defense", sep = "")        # only one capture group is non-empty per row
## # A tibble: 14 x 1
##    defense       
##    <chr>         
##  1 AC 29         
##  2 touch 12      
##  3 flat-footed 27
##  4 +6 armor      
##  5 +1 deflection 
##  6 +2 Dex        
##  7 +11 natural   
##  8 -1 size       
##  9 hp 151        
## 10 +2 Favored    
## 11 Class +84     
## 12 Fort +16      
## 13 Ref +8        
## 14 Will +7

You'll still need some string ops but hopefully this helps a bit.
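
As one example of those remaining string ops, here is a minimal sketch that splits each entry of the defense column into a label and a signed integer. It uses base R only and hard-codes the 14 strings from the output above so that it runs without network access. `defense`, `num_pattern`, and `parsed` are illustrative names, not part of the original code.

```r
# The 14 entries printed above, hard-coded so the sketch runs offline.
defense <- c(
  "AC 29", "touch 12", "flat-footed 27", "+6 armor", "+1 deflection",
  "+2 Dex", "+11 natural", "-1 size", "hp 151", "+2 Favored",
  "Class +84", "Fort +16", "Ref +8", "Will +7"
)

# An optionally signed run of digits. The hyphen in "flat-footed" is not
# followed by a digit, so it is never mistaken for a minus sign.
num_pattern <- "[+-]?[0-9]+"

# Assumption: every entry contains exactly one number, so regmatches()
# returns one value per entry and the two vectors stay aligned.
value <- as.integer(regmatches(defense, regexpr(num_pattern, defense)))
label <- trimws(sub(num_pattern, "", defense))

parsed <- data.frame(label = label, value = value, stringsAsFactors = FALSE)

# Keep only the headline defensive stats.
headline <- c("AC", "touch", "flat-footed", "hp", "Fort", "Ref", "Will")
print(parsed[parsed$label %in% headline, ])
```

The `+2 Favored` and `Class +84` rows appear to be fragments of the hit-dice expression `(12d8+2d10+2 Favored Class +84)`, so they would need separate handling rather than being read as stats.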

answered Nov 7 at 14:00

hrbrmstr
