Asked by: user8229029 Asked: 9/25/2022 Last edited by: user8229029 Modified: 9/25/2022 Viewed: 90 times
How to parse xml lists and tables in R for BGG API
Question:
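I want to write an R Shiny app for my board game collection, and to do that I need to pull data from the BoardGameGeek API. It is an XML-based API and, apparently, figuring everything out is left to the user. Anyway, I am not a web programmer and am having some difficulty with a few things. Note that none of my code is the best way to do this. An example page is: https://boardgamegeek.com/xmlapi/boardgame/354242. It is a long page and I didn't want to copy all of it here, so please take a look at it if what I've copied isn't enough.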
<boardgameintegration objectid="353880" inbound="true">Moly Atrapa</boardgameintegration>
<poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="0">
<results numplayers="1">
<result value="Best" numvotes="0"/>
<result value="Recommended" numvotes="0"/>
<result value="Not Recommended" numvotes="0"/>
</results>
<results numplayers="2">
<result value="Best" numvotes="0"/>
<result value="Recommended" numvotes="0"/>
<result value="Not Recommended" numvotes="0"/>
</results>
<results numplayers="3">
<result value="Best" numvotes="0"/>
<result value="Recommended" numvotes="0"/>
<result value="Not Recommended" numvotes="0"/>
</results>
</poll>
<poll name="language_dependence" title="Language Dependence" totalvotes="0">
<results>
<result level="1" value="No necessary in-game text" numvotes="0"/>
<result level="2" value="Some necessary text - easily memorized or small crib sheet" numvotes="0"/>
<result level="3" value="Moderate in-game text - needs crib sheet or paste ups" numvotes="0"/>
<result level="4" value="Extensive use of text - massive conversion needed to be playable" numvotes="0"/>
<result level="5" value="Unplayable in another language" numvotes="0"/>
</results>
</poll>
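My main questions are: 1) how do I extract the "name", "title", and "totalvotes" attributes (is that what they are called? I couldn't find anything about them!), and 2) how do I do the same for the separate results for each "numplayers" value? So far my code looks like this, but it only lets me extract things like the year published.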
library(XML)
library(methods)
library(xml2)
library(rvest)
data <- read_xml("https://boardgamegeek.com/xmlapi/boardgame/35424")
xmlfile <- xmlParse(data)
xmltop = xmlRoot(xmlfile)
xmltop[['boardgame']][['yearpublished']][1]$text # gets the year published
Answers:
1 vote
IRTFM
9/25/2022
#1
I think you need xmlAttrs (so the term to search for is "attributes"):
# used a different entry:
data <- read_xml("https://boardgamegeek.com/xmlapi/boardgame/78985")
xmlfile <- xmlParse(data)
xmltop = xmlRoot(xmlfile)
xmlAttrs(xmltop[['boardgame']][['poll']])
                  name                              title totalvotes
"suggested_numplayers" "User Suggested Number of Players"        "0"
You can extract a single value from that attribute vector by name:
xmlAttrs(xmltop[['boardgame']][['poll']])['name']
name
"suggested_numplayers"
If you assign the output to a variable, you can see that it is a named character vector:
> attrs <- xmlAttrs(xmltop[['boardgame']][['poll']])
> str(attrs)
Named chr [1:3] "suggested_numplayers" ...
- attr(*, "names")= chr [1:3] "name" "title" "totalvotes"
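`xmltop[['boardgame']][['poll']]` only returns the first `<poll>`. To get the attributes of every poll at once, one option is to apply `xmlAttrs` to each node matched by an XPath query. This is a minimal sketch, assuming the XML package is installed; instead of calling the live API it parses a trimmed copy of the `<poll>` fragment from the question as a string, wrapped in a `<boardgame>` root element that I added so the fragment is a well-formed document.

```r
library(XML)

# Trimmed copy of the poll fragment from the question, wrapped in a
# <boardgame> root element (added here) so it parses as a standalone document.
txt <- '<boardgame>
  <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="0">
    <results numplayers="1">
      <result value="Best" numvotes="0"/>
    </results>
  </poll>
  <poll name="language_dependence" title="Language Dependence" totalvotes="0">
    <results>
      <result level="1" value="No necessary in-game text" numvotes="0"/>
    </results>
  </poll>
</boardgame>'

doc <- xmlParse(txt, asText = TRUE)

# xmlAttrs() returns one named character vector per <poll> node.
polls <- xpathApply(doc, "//poll", xmlAttrs)

# Stack the vectors as rows of a character matrix, then convert to a data frame.
poll_df <- as.data.frame(do.call(rbind, polls), stringsAsFactors = FALSE)
poll_df$totalvotes <- as.integer(poll_df$totalvotes)
poll_df
```

Note that `rbind` matches columns by position, so this assumes every `<poll>` carries the same attributes in the same order, as in the BGG output shown in the question.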
You may find this presentation helpful:
https://www.stat.berkeley.edu/~statcur/Workshop2/Presentations/XML.pdf
2 votes
Parfait
9/25/2022
#2
To retrieve these attributes, consider the XML package's undocumented xmlAttrsToDataFrame, which can be accessed with the triple-colon operator:
library(XML)
url <- "https://boardgamegeek.com/xmlapi/boardgame/354242"
doc <- xmlParse(readLines(url))
poll_df <- XML:::xmlAttrsToDataFrame(getNodeSet(doc, '//poll'))
poll_df
# name title totalvotes
# 1 suggested_numplayers User Suggested Number of Players 0
# 2 language_dependence Language Dependence 0
# 3 suggested_playerage User Suggested Player Age 0
results_dfs <- lapply(
getNodeSet(doc, '//poll[@name="suggested_numplayers"]/results'),
function(x) data.frame(
numplayers = xmlAttrs(x)["numplayers"],
XML:::xmlAttrsToDataFrame(xmlChildren(x)),
row.names = NULL
)
)
result_df <- do.call(rbind, results_dfs)
result_df
# numplayers value numvotes
# 1 1 Best 0
# 2 1 Recommended 0
# 3 1 Not Recommended 0
# 4 2 Best 0
# 5 2 Recommended 0
# 6 2 Not Recommended 0
# 7 3 Best 0
# 8 3 Recommended 0
# 9 3 Not Recommended 0
# 10 4 Best 0
# 11 4 Recommended 0
# 12 4 Not Recommended 0
# 13 5 Best 0
# 14 5 Recommended 0
# 15 5 Not Recommended 0
# 16 6 Best 0
# 17 6 Recommended 0
# 18 6 Not Recommended 0
# 19 6+ Best 0
# 20 6+ Recommended 0
# 21 6+ Not Recommended 0
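Since the question already loads xml2, here is a comparable sketch of the numplayers table written with xml2 instead of the unexported `XML:::xmlAttrsToDataFrame`. It loops over each `<results>` element of the `suggested_numplayers` poll, reads `numplayers` from that element, and reads `value` and `numvotes` from its `<result>` children. It assumes xml2 is installed and, rather than calling the API, parses a trimmed inline copy of the question's fragment (with a `<boardgame>` wrapper added so it is well-formed).

```r
library(xml2)

# Trimmed copy of the suggested_numplayers poll from the question,
# wrapped in a <boardgame> root element added for this example.
txt <- '<boardgame>
  <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="0">
    <results numplayers="1">
      <result value="Best" numvotes="0"/>
      <result value="Recommended" numvotes="0"/>
      <result value="Not Recommended" numvotes="0"/>
    </results>
    <results numplayers="2">
      <result value="Best" numvotes="0"/>
      <result value="Recommended" numvotes="0"/>
      <result value="Not Recommended" numvotes="0"/>
    </results>
  </poll>
</boardgame>'

doc <- read_xml(txt)

# One node per <results> element; numplayers is an attribute of that element.
results <- xml_find_all(doc, '//poll[@name="suggested_numplayers"]/results')

# For each <results>, build a small data frame from its <result> children;
# the single numplayers value is recycled across the rows.
result_df <- do.call(rbind, lapply(results, function(r) {
  res <- xml_find_all(r, "./result")
  data.frame(
    numplayers = xml_attr(r, "numplayers"),
    value      = xml_attr(res, "value"),
    numvotes   = as.integer(xml_attr(res, "numvotes")),
    stringsAsFactors = FALSE
  )
}))
result_df
```

For the `language_dependence` poll, where `level` sits on each `<result>` rather than on `<results>`, calling `xml_attr(res, "level")` on the `<result>` nodes alone would be enough.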
Comments
name
xmltop[['boardgame']][['name']]
title or