How to parse XML lists and tables in R for the BGG API

Asked by: user8229029 Asked: 9/25/2022 Last edited by: user8229029 Updated: 9/25/2022 Views: 90

Q:

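I want to write an R Shiny app for my board game collection, and I need to get data from the BoardGameGeek API to do this. It is an XML-based API, and evidently it leaves it up to the user to figure everything out. I'm not a web programmer and am having difficulty with certain aspects. Please note that none of my code is necessarily the best way to do this. An example page is: https://boardgamegeek.com/xmlapi/boardgame/354242. It is a long page and I didn't want to copy all of it here, so please take a look if I haven't copied enough.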

<boardgameintegration objectid="353880" inbound="true">Moly Atrapa</boardgameintegration>
<poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="0">
<results numplayers="1">
  <result value="Best" numvotes="0"/>
  <result value="Recommended" numvotes="0"/>
  <result value="Not Recommended" numvotes="0"/>
</results>
<results numplayers="2">
  <result value="Best" numvotes="0"/>
  <result value="Recommended" numvotes="0"/>
  <result value="Not Recommended" numvotes="0"/>
</results>
<results numplayers="3">
  <result value="Best" numvotes="0"/>
  <result value="Recommended" numvotes="0"/>
  <result value="Not Recommended" numvotes="0"/>
</results>

</poll>
  <poll name="language_dependence" title="Language Dependence" totalvotes="0">
    <results>
      <result level="1" value="No necessary in-game text" numvotes="0"/>
      <result level="2" value="Some necessary text - easily memorized or small crib sheet" numvotes="0"/>
      <result level="3" value="Moderate in-game text - needs crib sheet or paste ups" numvotes="0"/>
      <result level="4" value="Extensive use of text - massive conversion needed to be playable" numvotes="0"/>
      <result level="5" value="Unplayable in another language" numvotes="0"/>
    </results>
 </poll>

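My main questions are 1) how do I extract the "name", "title", and "totalvotes" attributes (that is what they are called; I couldn't find these things!), and 2) how do I do the same for the different results under each "numplayers"? My code so far looks like this, but it has only let me extract things like the year published.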

library(XML)
library(methods)
library(xml2)
library(rvest)

data <- read_xml("https://boardgamegeek.com/xmlapi/boardgame/354242")
xmlfile <- xmlParse(data)
xmltop = xmlRoot(xmlfile)
xmltop[['boardgame']][['yearpublished']][1]$text # gets the year published

Tags: r, xml, xml-parsing

Comments

0 votes IRTFM 9/25/2022
The name element is just: xmltop[['boardgame']][['name']]. I don't see any elements for title or total votes.

Answers:

1 vote IRTFM 9/25/2022 #1

I think you need xmlAttrs (so the term to search for is "attributes"):

# used a different entry:
data <- read_xml("https://boardgamegeek.com/xmlapi/boardgame/78985")
xmlfile <- xmlParse(data)
xmltop = xmlRoot(xmlfile)

xmlAttrs(xmltop[['boardgame']][['poll']])
                              name 
            "suggested_numplayers" 
                             title 
"User Suggested Number of Players" 
                        totalvotes 
                               "0" 

You can extract a single value from that attribute vector by name:

xmlAttrs(xmltop[['boardgame']][['poll']])['name']
                  name 
"suggested_numplayers" 

If you assign the output to an R object, you get a named character vector:

> attrs <- xmlAttrs(xmltop[['boardgame']][['poll']])
> str(attrs)
 Named chr [1:3] "suggested_numplayers" ...
 - attr(*, "names")= chr [1:3] "name" "title" "totalvotes"
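
Note that `[['poll']]` only reaches the first poll element. As a minimal sketch of extending the same idea to every poll, the example below queries all poll nodes with XPath and reads one attribute at a time with `xmlGetAttr`. The inline XML is a trimmed stand-in for the API response; the `<boardgames>`/`<boardgame>` wrapper is an assumption based on the `xmltop[['boardgame']]` access in the question, and the variable names are illustrative.

```r
library(XML)

# Trimmed sample modeled on the question's excerpt (wrapper elements are assumed),
# parsed from a string so that no network call is needed.
xml_text <- '<boardgames>
  <boardgame objectid="354242">
    <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="0">
      <results numplayers="1">
        <result value="Best" numvotes="0"/>
        <result value="Recommended" numvotes="0"/>
        <result value="Not Recommended" numvotes="0"/>
      </results>
    </poll>
    <poll name="language_dependence" title="Language Dependence" totalvotes="0">
      <results>
        <result level="1" value="No necessary in-game text" numvotes="0"/>
      </results>
    </poll>
  </boardgame>
</boardgames>'

doc <- xmlParse(xml_text, asText = TRUE)

# An XPath query returns every <poll> node, not just the first one.
polls <- getNodeSet(doc, "//poll")

# xmlGetAttr() returns the value of one named attribute of a node.
poll_names  <- sapply(polls, xmlGetAttr, "name")
poll_titles <- sapply(polls, xmlGetAttr, "title")
total_votes <- as.integer(sapply(polls, xmlGetAttr, "totalvotes"))

poll_summary <- data.frame(
  name = poll_names,
  title = poll_titles,
  totalvotes = total_votes,
  stringsAsFactors = FALSE
)
print(poll_summary)
```

Converting `totalvotes` with `as.integer` is a design choice here; attributes always come back as character strings, so leave the conversion out if you prefer to keep the raw values.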

You may find this presentation helpful:

https://www.stat.berkeley.edu/~statcur/Workshop2/Presentations/XML.pdf

2 votes Parfait 9/25/2022 #2

To retrieve those attributes, consider XML's undocumented xmlAttrsToDataFrame, which is accessible with the triple-colon operator:

library(XML) 

url <- "https://boardgamegeek.com/xmlapi/boardgame/354242"

doc <- xmlParse(readLines(url))

poll_df <- XML:::xmlAttrsToDataFrame(getNodeSet(doc, '//poll'))
poll_df
#                   name                            title totalvotes
# 1 suggested_numplayers User Suggested Number of Players          0
# 2  language_dependence              Language Dependence          0
# 3  suggested_playerage        User Suggested Player Age          0

results_dfs <- lapply(
  getNodeSet(doc, '//poll[@name="suggested_numplayers"]/results'),
  function(x) data.frame(
    numplayers = xmlAttrs(x)["numplayers"],
    XML:::xmlAttrsToDataFrame(xmlChildren(x)),
    row.names = NULL
  )
)

result_df <- do.call(rbind, results_dfs)
result_df
#    numplayers           value numvotes
# 1           1            Best        0
# 2           1     Recommended        0
# 3           1 Not Recommended        0
# 4           2            Best        0
# 5           2     Recommended        0
# 6           2 Not Recommended        0
# 7           3            Best        0
# 8           3     Recommended        0
# 9           3 Not Recommended        0
# 10          4            Best        0
# 11          4     Recommended        0
# 12          4 Not Recommended        0
# 13          5            Best        0
# 14          5     Recommended        0
# 15          5 Not Recommended        0
# 16          6            Best        0
# 17          6     Recommended        0
# 18          6 Not Recommended        0
# 19         6+            Best        0
# 20         6+     Recommended        0
# 21         6+ Not Recommended        0
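
Since `xmlAttrsToDataFrame` is undocumented, a version built only on exported functions may be preferable. The following is a rough sketch using the xml2 package (which the question already loads) rather than the method shown above. The inline XML is a trimmed stand-in for the API response with an assumed `<boardgames>`/`<boardgame>` wrapper, and all variable names are illustrative.

```r
library(xml2)

# Trimmed sample modeled on the question's excerpt (wrapper elements are assumed).
xml_text <- '<boardgames>
  <boardgame objectid="354242">
    <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="0">
      <results numplayers="1">
        <result value="Best" numvotes="0"/>
        <result value="Recommended" numvotes="0"/>
        <result value="Not Recommended" numvotes="0"/>
      </results>
      <results numplayers="2">
        <result value="Best" numvotes="0"/>
        <result value="Recommended" numvotes="0"/>
        <result value="Not Recommended" numvotes="0"/>
      </results>
    </poll>
  </boardgame>
</boardgames>'

doc <- read_xml(xml_text)

# One <results> node per player count.
results_nodes <- xml_find_all(doc, "//poll[@name='suggested_numplayers']/results")

# For each <results> node, pair its numplayers attribute with the
# value/numvotes attributes of its <result> children.
rows <- lapply(results_nodes, function(res) {
  votes <- xml_find_all(res, "./result")
  data.frame(
    numplayers = xml_attr(res, "numplayers"),
    value = xml_attr(votes, "value"),
    numvotes = as.integer(xml_attr(votes, "numvotes")),
    stringsAsFactors = FALSE
  )
})

result_df <- do.call(rbind, rows)
print(result_df)
```

The loop goes from each `<results>` node down to its children instead of calling `xml_parent()` on the full set of `<result>` nodes, because the parent lookup on a node set can collapse duplicate parents and lose the one-to-one pairing.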