Reading Roleplaying games statblocks using R
I'd characterize myself as a moderately versed R user (who also tries to expand into Python). Sometimes I dabble in fun projects to expand my horizons, and I like RPGs. I'm trying to scrape some monster statblocks from the d20pfsrd.com page to use in our RPG games.
I hoped the page would be more structured, so I could use the rvest package for everything, but now it seems I need to use a lot of regex to do this.
library(rvest)
library(stringr)
library(tibble)

# Read monster
m = read_html("https://www.d20pfsrd.com/bestiary/unique-monsters/cr-10/teraktinus/")
m %>%
html_node(css = ".statblock") %>%
html_node(".title") %>%
html_text() %>%
str_split("CR") -> title_cr
title_cr
[1] "Stone Giant Ranger 2 " " 10"
monster = tibble(Monster = title_cr[[1]][[1]],
CR = title_cr[[1]][[2]])
> print(monster)
# A tibble: 1 x 2
Monster CR
<chr> <chr>
1 "Stone Giant Ranger 2 " " 10"
Take a look at the sample statblock from the url.
> txt = m %>%
+ html_node(css = ".statblock") %>% html_text()
> print(txt)
[1] "\nStone Giant Ranger 2 CR 10\nXP 9,600 Male Stone Giant Ranger 2 CE Large humanoid (giant)Init +2; Senses darkvision 60 ft., low-light vision; Perception +12 \n DEFENSE\n AC 29, touch 12, flat-footed 27 (+6 armor, +1 deflection, +2 Dex, +11 natural, -1 size)hp 151 (12d8+2d10+2 Favored Class +84)Fort +16, Ref +8, Will +7Special Defenses rock catching \n OFFENSE Speed 40 ft.Melee +1 dwarf bane heavy pick +20/+15/+10 (1d8+11/19-20/x4) and +1 light pick +20 (1d6+6/19-20/x4)Ranged rock +13/+8/+3 (2d8+15)Space 10 ft.; Reach 10 ft.Special Attacks favored enemy (dwarf +2); rock throwing 180 ft. \n STATISTICS \n Str 27, Dex 15, Con 19, Int 10, Wis 12, Cha 10Base Atk +11; CMB +22; CMD 34Feats Improved Critical (Heavy Pick), Improved Critical (Light Pick), Iron Will, Quick DrawB, Power Attack, Two-Weapon Fighting, Weapon Focus (Heavy Pick), Weapon Focus (Light Pick)Skills Climb +13, Perception +12, Stealth +9 (+17 in rocky terrain), Survival +12 (+13 when Tracking); Racial Modifiers +8 Stealth in rocky terrainLanguages Common, Dwarven, GiantSQ Wild Empathy +2Gear +2 Hide Shirt; +1 dwarf bane heavy pick, +1 light pick, ring of protection +1, war horn \n \n \n \n \n \n \n Section 15: Copyright Notice – Pathfinder 4: Fortress of the Stone Giants \n \n \n Pathfinder 4: Fortress of the Stone Giants. Copyright 2007, Paizo Publishing LLC. Author: Wolfgang Baur \n \n \n "
Am I right to assume that working with nodes to get text is out the window here? I moved along from using nodes to just searching for patterns I'm interested in, such as HP, AC and other relevant stats I could use.
So I started doing things like this:
# "[0-9]+" accepts any number of digits, not just two
monster$AC = txt %>%
str_extract("AC [0-9]+") %>% str_extract("[0-9]+") %>%
as.numeric()
# keep the sign so a negative initiative modifier is not read as positive
monster$Init = txt %>%
str_extract("Init [+-][0-9]+") %>% str_extract("[+-][0-9]+") %>%
as.numeric()
monster$HP = txt %>%
str_extract("hp [0-9]+") %>% str_extract("[0-9]+") %>%
as.numeric()
print(monster)
# A tibble: 1 x 5
Monster CR AC Init HP
<chr> <chr> <dbl> <dbl> <dbl>
1 "Stone Giant Ranger 2 " " 10" 29 2 151
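As a sketch of the same pattern-matching idea, the label and its number can be captured in one step with a capture group rather than two chained extractions. The helper name `extract_stat` and the shortened sample string are made up for illustration, and the code sticks to base R so it runs without extra packages; treat it as one possible shape, not a tested parser for every statblock.

```r
# Sketch: pull labelled numbers out of flattened statblock text.
# `extract_stat` is a hypothetical helper, not part of rvest or stringr.
extract_stat <- function(text, label) {
  # One capture group: an optional sign followed by digits, right after the label.
  pattern <- paste0("\\b", label, "\\s+([+-]?[0-9]+)")
  m <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
  # No match: return NA instead of failing.
  if (length(m) < 2) return(NA_real_)
  as.numeric(m[2])
}

# Shortened sample in the same shape as the html_text() output (assumed, not scraped here).
sample_txt <- "Init +2; Senses darkvision 60 ft. DEFENSE AC 29, touch 12, flat-footed 27 hp 151 (12d8+2d10) Fort +16, Ref +8, Will +7"

# Column name on the left, label as it appears in the text on the right.
labels <- c(Init = "Init", AC = "AC", HP = "hp", Fort = "Fort", Ref = "Ref", Will = "Will")
stats <- vapply(labels, function(l) extract_stat(sample_txt, l), numeric(1))
print(stats)
```

Because `stats` is a named numeric vector, it can be turned into a one-row data frame with `as.data.frame(as.list(stats))` and bound to the `Monster` and `CR` columns.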
Are there any obviously better approaches here, or do I need to keep at this process if I want to read statblocks from this page into dataframes?
Thanks!
asked Nov 7 at 12:37
ARO
1 Answer
With a little bit of XPath elbow grease, the underlying visual formatting can be used as structure (this is incomplete in that it doesn't parse the whole page but it should provide enough examples to get the rest…if not, just comment about anything specific you're stuck on):
library(rvest)
library(stringi)
library(tidyverse)
pg <- read_html("https://www.d20pfsrd.com/bestiary/unique-monsters/cr-10/teraktinus/")
sb <- html_node(pg, "div.statblock")
html_node(sb, xpath=".//p[@class='title']/text()") %>% html_text(trim=TRUE)
## [1] "Stone Giant Ranger 2"
html_node(sb, xpath=".//p[@class='title']/span") %>% html_text(trim=TRUE)
## [1] "CR 10"
html_node(sb, xpath=".//p/b[contains(., 'XP')]") %>% html_text(trim=TRUE)
## [1] "XP 9,600"
html_node(sb, xpath=".//p/b[contains(., 'XP')]/following-sibling::text()") %>% html_text(trim=TRUE)
## [1] "Male"
html_nodes(sb, xpath=".//p/b[contains(., 'XP')]/following-sibling::a[preceding-sibling::b[1][contains(., 'XP')]]") %>%
html_text()
## [1] "Stone Giant" "Ranger" "humanoid" "giant"
html_node(sb, xpath=".//p/b[contains(., 'Init')]/following-sibling::text()") %>% html_text(trim=TRUE)
## [1] "+2;"
html_nodes(sb, xpath=".//p/b[contains(., 'Senses')]/following-sibling::a[preceding-sibling::b[1][contains(., 'Senses')]]") %>%
html_text()
## [1] "darkvision" "low-light vision" "Perception"
html_nodes(sb, xpath=".//p/b[contains(., 'Senses')]/
following-sibling::a[preceding-sibling::b[1][contains(., 'Senses')]]") %>%
html_nodes(xpath=".//following-sibling::text()") %>%
html_text()
## [1] " 60 ft., " "; " " +12"
html_nodes(sb, xpath = ".//p[@class='divider' and contains(., 'DEFENSE')]/following-sibling::p[2]") %>%
html_text() %>%
stri_match_all_regex(
"(AC[[:space:]]+[[:digit:]]+)|([[:alpha:]\\-]+[[:space:]]+[[:digit:]]+)|([\\-+][[:digit:]]+[[:space:]]+[[:alpha:]\\-]+)|([[:alpha:]]+[[:space:]][\\-+][[:digit:]]+)",
cg_missing = ""
) %>%
.[[1]] %>%
as_data_frame() %>%
select(-1) %>%
unite(col = "defense", sep = "")
## # A tibble: 14 x 1
## defense
## <chr>
## 1 AC 29
## 2 touch 12
## 3 flat-footed 27
## 4 +6 armor
## 5 +1 deflection
## 6 +2 Dex
## 7 +11 natural
## 8 -1 size
## 9 hp 151
## 10 +2 Favored
## 11 Class +84
## 12 Fort +16
## 13 Ref +8
## 14 Will +7
You'll still need some string ops but hopefully this helps a bit.
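One way the remaining string ops could look, as a rough sketch: split each entry of the `defense` column into a label and a numeric value. The vector below is typed in by hand from the output above, and the names `label`, `value` and `defense_df` are only illustrative. Base R is used here so the snippet runs without extra packages.

```r
# Entries copied by hand from the `defense` column printed above.
defense <- c("AC 29", "touch 12", "flat-footed 27", "+6 armor", "+1 deflection",
             "+2 Dex", "+11 natural", "-1 size", "hp 151", "+2 Favored",
             "Class +84", "Fort +16", "Ref +8", "Will +7")

# The first (optionally signed) integer in each entry is the value...
value <- as.numeric(regmatches(defense, regexpr("[+-]?[0-9]+", defense)))
# ...and whatever is left after removing it is the label.
label <- trimws(sub("[+-]?[0-9]+", "", defense))

defense_df <- data.frame(label = label, value = value, stringsAsFactors = FALSE)
print(defense_df)
```

The "Favored" and "Class" rows are two halves of "+2 Favored Class +84" from the hp line, so they would need to be merged or dropped by hand.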
answered Nov 7 at 14:00
hrbrmstr