Parsing XML data from the BGG XML API with R

Asked by: user8229029 Asked: 9/26/2022 Last edited by: user8229029 Updated: 9/27/2022 Views: 37

Question:

This question is the second part of this one: How to parse XML lists and tables in R for the BGG API

I want to build a data frame from this table:

<marketplacelistings>
  <listing>
    <listdate>Thu, 19 Jan 2006 22:08:15 +0000</listdate>
    <price currency="EUR">90.00</price>
    <condition>likenew</condition>
    <notes>Siedler von Catan / Settlers of Catan-Set (Basisspiel/basic game + Erweiterungen Die Seefahrer/ Städte und Ritter/ 5-6 Spieler / extensions The Seafarers/ Cities and Knights/ 5-6 players); 3 x gespielt (Neuwertig; lediglich alle Bestandteile in EINER der Originalboxen verstaut) / 3 times played (like new; only all items in ONE original box stored); Abgabe nur komplett / selling only all together; KEIN Festpreis (nur um überhaupt etwas einzugeben) – erwarte Angebot! / no fixed price (just to complete the entries)– make an offer; Versand weltweit zu Lasten Käufer / shipping worldwide, paid by buyer</notes>
    <link href="https://boardgamegeek.com/market/product/40605" title="marketlisting"/>
  </listing>
  <listing>
    <listdate>Mon, 29 Sep 2008 15:25:32 +0000</listdate>
    <price currency="USD">34.95</price>
    <condition>new</condition>
    <notes>Brand New Sealed Board Game. Released from MayFair Games. Price is in USD. If you wish to pay in CAD...then we will convert at market rate. Shipping is $10.95 USD. We also carry the 5-6 Player Expansion that goes with this for $24.95 USD. We have sold thousands of board games across Canada. Please feel free to buy with confidence.</notes>
    <link href="https://boardgamegeek.com/market/product/116347" title="marketlisting"/>
  </listing>

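This is where I'm stuck. This game has about 100 listings, and I want to turn them into a data frame. Where do I start? The code below doesn't work: it gives NULL as the result.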

listings_df <- do.call(rbind,lapply(
  getNodeSet(xmltop, '//marketplacelistings'),
  function(x) data.frame(
    XML:::xmlAttrsToDataFrame(xmlChildren(x)),
    row.names = NULL
  )))
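One way to see where a NULL can come from: in the XML package, an XPath that matches nothing gives an empty node set, lapply() over it returns list(), and do.call(rbind, list()) is simply NULL. So a NULL result suggests the query found no nodes in xmltop. Separately, even when the query matches, the <listing> elements carry no attributes of their own, so there is nothing for xmlAttrsToDataFrame() to collect from them; the only attributes sit on <price> (currency) and <link> (href, title). A minimal check, assuming the XML package is installed and using a one-listing excerpt of the snippet above (the "//listings" query is a deliberately non-matching example):

```r
library(XML)

# One-listing excerpt of the XML from the question
xml_text <- '<marketplacelistings>
  <listing>
    <listdate>Thu, 19 Jan 2006 22:08:15 +0000</listdate>
    <price currency="EUR">90.00</price>
    <condition>likenew</condition>
    <link href="https://boardgamegeek.com/market/product/40605" title="marketlisting"/>
  </listing>
</marketplacelistings>'
doc <- xmlParse(xml_text, asText = TRUE)

# An XPath that matches nothing (there is no <listings> element here):
# lapply() over the empty node set returns list(), and rbind() with no
# arguments returns NULL
no_match <- getNodeSet(doc, "//listings")
empty_result <- do.call(rbind, lapply(no_match, function(x) data.frame(a = 1)))

# <listing> itself has no attributes ...
listing_attrs <- xmlAttrs(getNodeSet(doc, "//listing")[[1]])

# ... they live on its <price> and <link> children instead
price_attrs <- xmlAttrs(getNodeSet(doc, "//listing/price")[[1]])
link_attrs  <- xmlAttrs(getNodeSet(doc, "//listing/link")[[1]])
```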

The full file for this question is here: https://boardgamegeek.com/xmlapi/boardgame/13&type=boardgame,boardgameexpansion,boardgameaccesory,rpgitem,rpgissue,videogame&versions=1&stats=1&videos=1&marketplace=1&comments=1

Edit: Here is my final solution. It may not be elegant, but it works.

library(XML)

# Build a data frame of marketplace listings from a parsed BGG document (xmltop)
marketplace_df_func <- function(xmltop) {

  marketplace_df <- data.frame(
    # Each column comes from its own XPath query over the whole document
    listdate  = xmlSApply(getNodeSet(xmltop, "//marketplacelistings//listing//listdate"), xmlValue),
    currency  = xmlSApply(getNodeSet(xmltop, "//marketplacelistings//listing//price[@currency]"), xmlAttrs),
    price     = xmlSApply(getNodeSet(xmltop, "//marketplacelistings//listing//price"), xmlValue),
    condition = xmlSApply(getNodeSet(xmltop, "//marketplacelistings//listing//condition"), xmlValue))

  # Keep the first 25 characters, dropping the trailing UTC offset (" +0000")
  marketplace_df$listdate <- substr(marketplace_df$listdate, 1, 25)

  return(marketplace_df)
}
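The function above runs four independent XPath queries, so its columns line up only if every <listing> contains exactly one listdate, one price (with a currency attribute) and one condition. If a listing lacked one of them, the vectors would differ in length and data.frame() would either stop with an error or, when the lengths happen to divide evenly, recycle values into the wrong rows. A sketch of a row-aligned alternative, assuming the XML package: it loops over the <listing> nodes and queries each one with a relative XPath ("./price" rather than "//price", since "//" would search the whole document again), filling NA where a piece is missing. get_field() and marketplace_df_aligned() are hypothetical names, and the test XML deliberately omits one <condition>.

```r
library(XML)

# Hypothetical helper: read one value from a single <listing> node.
# 'path' must be relative (e.g. "./price"); "//price" would search the whole
# document. Returns the element text, or the attribute named in 'attr',
# and NA_character_ when the element is absent.
get_field <- function(node, path, attr = NULL) {
  hits <- getNodeSet(node, path)
  if (length(hits) == 0) return(NA_character_)
  if (is.null(attr)) return(xmlValue(hits[[1]]))
  xmlGetAttr(hits[[1]], attr, default = NA_character_)
}

# Hypothetical variant of marketplace_df_func(): exactly one row per <listing>
marketplace_df_aligned <- function(xmltop) {
  listings <- getNodeSet(xmltop, "//marketplacelistings/listing")
  data.frame(
    listdate  = vapply(listings, get_field, character(1), path = "./listdate"),
    currency  = vapply(listings, get_field, character(1), path = "./price", attr = "currency"),
    price     = vapply(listings, get_field, character(1), path = "./price"),
    condition = vapply(listings, get_field, character(1), path = "./condition"),
    stringsAsFactors = FALSE
  )
}

# Usage: the second listing deliberately has no <condition>
test_doc <- xmlParse('<marketplacelistings>
  <listing>
    <listdate>Thu, 19 Jan 2006 22:08:15 +0000</listdate>
    <price currency="EUR">90.00</price>
    <condition>likenew</condition>
  </listing>
  <listing>
    <listdate>Mon, 29 Sep 2008 15:25:32 +0000</listdate>
    <price currency="USD">34.95</price>
  </listing>
</marketplacelistings>', asText = TRUE)
aligned_df <- marketplace_df_aligned(test_doc)
```

The substr() trimming from the original function can be applied to aligned_df$listdate in the same way.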
r xml xml-parsing

Comments

0 votes Parfait 9/26/2022
There is no plural listings in the XML you posted.
0 votes user8229029 9/26/2022
I added my final solution. I couldn't get your answer to work for me. Thanks, though!

Answer:

1 vote Parfait 9/26/2022 #1

Since this XML now holds more of its data in element text than in attributes, simply run the available xmlToDataFrame directly, without an lapply loop:

library(XML) 

url <- "..."
doc <- xmlParse(readLines(url))

listings_df <- xmlToDataFrame(doc, nodes = getNodeSet(doc, "//listing"))
str(listings_df)
# 'data.frame': 103 obs. of  5 variables:
#  $ listdate : chr  "Thu, 19 Jan 2006 22:08:15 +0000" "Mon, 29 Sep 2008 15:25:32 +0000" "Sat, 18 Jul 2009 20:42:03 +0000" "Fri, 04 Dec 2009 14:25:25 +0000" ...
#  $ price    : chr  "90.00" "34.95" "49.00" "40.00" ...
#  $ condition: chr  "likenew" "new" "verygood" "new" ...
#  $ notes    : chr  "Siedler von Catan / Settlers of Catan-Set (Basisspiel/basic game + Erweiterungen Die Seefahrer/ Städte und Rit"| __truncated__ "Brand New Sealed Board Game. Released from MayFair Games.  Price is in USD.  If you wish to pay in CAD...then w"| __truncated__ "inlcudes 5/6 player expansion" "" ...
#  $ link     : chr  "" "" "" "" ...

To also bind the underlying attributes, use the internal (:::) helper XML:::xmlAttrsToDataFrame:

listings_df <- data.frame(
    xmlToDataFrame(doc, nodes = getNodeSet(doc, "//listing")),
    XML:::xmlAttrsToDataFrame(getNodeSet(doc, "//listing/price")),
    XML:::xmlAttrsToDataFrame(getNodeSet(doc, "//listing/link")),
    row.names = NULL
)
str(listings_df)
# 'data.frame': 103 obs. of  8 variables:
#  $ listdate : chr  "Thu, 19 Jan 2006 22:08:15 +0000" "Mon, 29 Sep 2008 15:25:32 +0000" "Sat, 18 Jul 2009 20:42:03 +0000" "Fri, 04 Dec 2009 14:25:25 +0000" ...
#  $ price    : chr  "90.00" "34.95" "49.00" "40.00" ...
#  $ condition: chr  "likenew" "new" "verygood" "new" ...
#  $ notes    : chr  "Siedler von Catan / Settlers of Catan-Set (Basisspiel/basic game + Erweiterungen Die Seefahrer/ Städte und Rit"| __truncated__ "Brand New Sealed Board Game. Released from MayFair Games.  Price is in USD.  If you wish to pay in CAD...then w"| __truncated__ "inlcudes 5/6 player expansion" "" ...
#  $ link     : chr  "" "" "" "" ...
#  $ currency : chr  "EUR" "USD" "EUR" "EUR" ...
#  $ href     : chr  "https://boardgamegeek.com/market/product/40605" "https://boardgamegeek.com/market/product/116347" "https://boardgamegeek.com/market/product/158433" "https://boardgamegeek.com/market/product/181379" ...
#  $ title    : chr  "marketlisting" "marketlisting" "marketlisting" "marketlisting" ...
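Every column in the results above is character (chr). If the listings will be sorted or compared, one possible follow-up is to convert price to numeric and listdate to a POSIXct date-time. This is a sketch that assumes every listdate has the form "Thu, 19 Jan 2006 22:08:15 +0000"; because the %a and %b formats depend on the locale, it parses in the "C" locale and restores the previous setting afterwards. convert_listing_types() is a hypothetical helper, and prices stay in their own currency (EUR and USD values are not converted into one another).

```r
# Hypothetical helper: turn the character columns produced by
# xmlToDataFrame() into numeric prices and UTC date-times.
convert_listing_types <- function(listings_df) {
  # %a / %b are locale-dependent, so parse in the "C" locale and restore after
  old_locale <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old_locale), add = TRUE)
  Sys.setlocale("LC_TIME", "C")

  listings_df$price <- as.numeric(listings_df$price)
  # %z reads the "+0000" offset, so each value is a definite instant in UTC
  listings_df$listdate <- as.POSIXct(listings_df$listdate,
                                     format = "%a, %d %b %Y %H:%M:%S %z",
                                     tz = "UTC")
  listings_df
}

# Usage on two rows copied from the output above
example_df <- data.frame(
  listdate = c("Thu, 19 Jan 2006 22:08:15 +0000", "Mon, 29 Sep 2008 15:25:32 +0000"),
  price    = c("90.00", "34.95"),
  stringsAsFactors = FALSE
)
typed_df <- convert_listing_types(example_df)
```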

Comments

0 votes Parfait 9/27/2022
Strange! The solution works for me with the posted URL. I edited to show the output of the 103-obs data frame. Maybe your xmltop is not the parsed XML? Cheers!