Asked by: Michael  Asked: 10/15/2023  Last edited by: GuedesBFMichael  Updated: 10/15/2023  Views: 109
Extract multiple elements from a large list and put them in a dataframe in R
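Q:
This is similar to the question posted here.
I'm using the nhlapi package to scrape some box scores, which produces a large nested list.
Ultimately I want a data frame that contains every player from both the home and away teams, with all of their stats, just as it appears on the NHL website.
The code to get the boxscore data is: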
install.packages("nhlapi")
library(nhlapi)
boxscores<-nhl_games_boxscore(gameIds = 2023020001)
I then used the answer suggested in the similar question. I can get the player details with:
library(dplyr)  # for %>% and bind_rows()
away_players <- boxscores[[1]][["teams"]][["away"]][["players"]]
df_away_players <- lapply(1:length(away_players), function(i) {
  away_players[[i]][["person"]] %>%
    data.frame()
}) %>%
  bind_rows()
head(df_away_players)
id fullName link firstName lastName primaryNumber birthDate
1 8482062 Cole Smith /api/v1/people/8482062 Cole Smith 36 1995-10-28
2 8478508 Yakov Trenin /api/v1/people/8478508 Yakov Trenin 13 1997-01-13
3 8474679 Gustav Nyquist /api/v1/people/8474679 Gustav Nyquist 14 1989-09-01
4 8476925 Colton Sissons /api/v1/people/8476925 Colton Sissons 10 1993-11-05
5 8478438 Tommy Novak /api/v1/people/8478438 Tommy Novak 82 1997-04-28
6 8476887 Filip Forsberg /api/v1/people/8476887 Filip Forsberg 9 1994-08-13
I can also get the skaterStats with:
df_away_stats <- lapply(1:length(away_players), function(i) {
away_players[[i]][["stats"]] %>%
data.frame()
}) %>%
bind_rows()
head(df_away_stats)
skaterStats.timeOnIce skaterStats.assists skaterStats.goals skaterStats.shots
1 12:52 0 0 1
2 13:54 0 0 1
3 14:53 1 0 2
4 14:45 0 0 3
5 15:43 0 1 3
6 20:51 2 0 6
I tried to combine the two with:
df_combined <- c(df_away_players, df_away_stats)
This does produce a list, but I can't figure out how to get all of this information into a data frame.
str(df_combined)
List of 63
$ id : int [1:22] 8482062 8478508 8474679 8476925 8478438 8476887 8474600 8474568 8481239 8478851 ...
$ fullName : chr [1:22] "Cole Smith" "Yakov Trenin" "Gustav Nyquist" "Colton Sissons" ...
$ link : chr [1:22] "/api/v1/people/8482062" "/api/v1/people/8478508" "/api/v1/people/8474679" "/api/v1/people/8476925" ...
$ firstName : chr [1:22] "Cole" "Yakov" "Gustav" "Colton" ...
$ lastName : chr [1:22] "Smith" "Trenin" "Nyquist" "Sissons" ...
$ primaryNumber : chr [1:22] "36" "13" "14" "10" ...
$ birthDate : chr [1:22] "1995-10-28" "1997-01-13" "1989-09-01" "1993-11-05" ...
$ currentAge : int [1:22] 27 26 34 29 26 29 33 33 23 27 ...
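The str() output shows why c() does not give a data frame: a data.frame is internally a list of columns, so c() simply concatenates the columns of both objects into one plain list. The minimal base-R sketch below contrasts c() with cbind() (dplyr::bind_cols() behaves the same way), using two hypothetical two-row stand-ins built from the values printed above. Binding by position is only safe when both tables have the same rows in the same order. If some players have an empty stats element (we have not verified whether the API does this, e.g. for scratched players), bind_rows() would silently drop them from df_away_stats and the rows would no longer line up, so the sketch checks the row counts first.

```r
# Hypothetical two-row stand-ins for df_away_players and df_away_stats
df_players <- data.frame(id = c(8482062L, 8478508L),
                         fullName = c("Cole Smith", "Yakov Trenin"),
                         stringsAsFactors = FALSE)
df_stats <- data.frame(skaterStats.timeOnIce = c("12:52", "13:54"),
                       skaterStats.goals = c(0L, 0L),
                       stringsAsFactors = FALSE)

# c() treats each data.frame as a list of columns and concatenates them:
# the result is a plain list of 4 columns, no longer a data.frame
combined_list <- c(df_players, df_stats)
class(combined_list)   # "list"

# cbind() pastes the columns side by side, row by row, by position,
# so first make sure both tables describe the same number of players
stopifnot(nrow(df_players) == nrow(df_stats))
combined_df <- cbind(df_players, df_stats)
combined_df
```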
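It should look very similar to what is shown on the NHL website, at this URL: https://www.nhl.com/gamecenter/nsh-vs-tbl/2023/10/10/2023020001/boxscore
This is what it looks like: [screenshot of the NHL.com boxscore not included]
If possible, I would also like to add the date and be able to do this for multiple games (I believe the nhl_games_boxscore function accepts multiple gameIds), but I suspect I need some kind of loop?
Output of dput() for the first two players of away_players, as requested in the comments below:
dput(away_players[c(1, 2)])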
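No special function is needed for the multiple-games part: an ordinary for loop (or lapply()/purrr::map()) over the games, and over both "away" and "home", can tag each row with its game id and team side before stacking everything. The sketch below runs on a hypothetical, heavily trimmed mock of what we assume nhl_games_boxscore(gameIds = c(2023020001, 2023020002)) returns, namely a list with one element per game, each shaped like boxscores[[1]]. make_player(), player_row() and the team/gameId columns are illustrative names, not part of nhlapi.

```r
# Hypothetical, heavily trimmed mock: one list element per game
make_player <- function(id, name, goals) {
  list(person = list(id = id, fullName = name),
       stats  = list(skaterStats = list(goals = goals)))
}
boxscores <- list(
  list(teams = list(
    away = list(players = list(make_player(1L, "Away One", 0L))),
    home = list(players = list(make_player(2L, "Home One", 1L))))),
  list(teams = list(
    away = list(players = list(make_player(3L, "Away Two", 2L))),
    home = list(players = list(make_player(4L, "Home Two", 0L)))))
)
game_ids <- c(2023020001, 2023020002)

# One row per player; goals is NA when a player has no skaterStats
player_row <- function(p) {
  s <- p[["stats"]]
  goals <- if ("skaterStats" %in% names(s)) s[["skaterStats"]][["goals"]] else NA_integer_
  data.frame(id       = p[["person"]][["id"]],
             fullName = p[["person"]][["fullName"]],
             goals    = goals,
             stringsAsFactors = FALSE)
}

# The "loop": over games, then over both teams, tagging every row
rows <- list()
for (g in seq_along(boxscores)) {
  for (side in c("away", "home")) {
    players <- boxscores[[g]][["teams"]][[side]][["players"]]
    df <- do.call(rbind, lapply(players, player_row))
    df$team   <- side
    df$gameId <- game_ids[g]
    rows[[length(rows) + 1]] <- df
  }
}
all_players <- do.call(rbind, rows)
all_players
```

The mock carries no date, and we have not checked which field of the real boxscore (if any) holds one. If you obtain dates elsewhere, for example in a hypothetical data frame game_dates with columns gameId and date, merge(all_players, game_dates, by = "gameId") attaches them.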
away_players<- list(ID8482062 = list(person = list(id = 8482062L, fullName = "Cole Smith",
link = "/api/v1/people/8482062", firstName = "Cole", lastName = "Smith",
primaryNumber = "36", birthDate = "1995-10-28", currentAge = 27L,
birthCity = "Brainerd", birthStateProvince = "MN", birthCountry = "USA",
nationality = "USA", height = "6' 3\"", weight = 195L, active = TRUE,
alternateCaptain = FALSE, captain = FALSE, rookie = FALSE,
shootsCatches = "L", rosterStatus = "Y", currentTeam = list(
id = 18L, name = "Nashville Predators", link = "/api/v1/teams/18"),
primaryPosition = list(code = "L", name = "Left Wing", type = "Forward",
abbreviation = "LW")), jerseyNumber = "36", position = list(
code = "L", name = "Left Wing", type = "Forward", abbreviation = "LW"),
stats = list(skaterStats = list(timeOnIce = "12:52", assists = 0L,
goals = 0L, shots = 1L, hits = 2L, powerPlayGoals = 0L,
powerPlayAssists = 0L, penaltyMinutes = 0L, faceOffWins = 0L,
faceoffTaken = 0L, takeaways = 2L, giveaways = 1L, shortHandedGoals = 0L,
shortHandedAssists = 0L, blocked = 0L, plusMinus = 0L,
evenTimeOnIce = "7:52", powerPlayTimeOnIce = "0:00",
shortHandedTimeOnIce = "5:00"))), ID8478508 = list(person = list(
id = 8478508L, fullName = "Yakov Trenin", link = "/api/v1/people/8478508",
firstName = "Yakov", lastName = "Trenin", primaryNumber = "13",
birthDate = "1997-01-13", currentAge = 26L, birthCity = "Chelyabinsk",
birthCountry = "RUS", nationality = "RUS", height = "6' 2\"",
weight = 201L, active = TRUE, alternateCaptain = FALSE, captain = FALSE,
rookie = FALSE, shootsCatches = "L", rosterStatus = "Y",
currentTeam = list(id = 18L, name = "Nashville Predators",
link = "/api/v1/teams/18"), primaryPosition = list(code = "C",
name = "Center", type = "Forward", abbreviation = "C")),
jerseyNumber = "13", position = list(code = "C", name = "Center",
type = "Forward", abbreviation = "C"), stats = list(skaterStats = list(
timeOnIce = "13:54", assists = 0L, goals = 0L, shots = 1L,
hits = 3L, powerPlayGoals = 0L, powerPlayAssists = 0L,
penaltyMinutes = 0L, faceOffWins = 0L, faceoffTaken = 0L,
takeaways = 3L, giveaways = 0L, shortHandedGoals = 0L,
shortHandedAssists = 0L, blocked = 0L, plusMinus = 0L,
evenTimeOnIce = "10:51", powerPlayTimeOnIce = "0:00",
shortHandedTimeOnIce = "3:03"))))
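A:
This is tricky because it is a fairly deeply nested list, and its elements are of unequal size.
Since every branch ends in a single one-element "node", we can unlist each sub-list of away_players into one named vector per player.
Then we enframe the resulting named vectors into data.frames.
This creates a list of data.frames in long format (one name/value pair per row).
We can then pivot_wider each one into a tidy data.frame with only one row per player.
Finally, bind_rows creates the final data.frame; it fills any column a player lacks (for example birthStateProvince, which is missing for Yakov Trenin) with NA. The column names get a bit clumsy, but they can easily be modified with rename_with or janitor::clean_names.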
library(purrr)
library(dplyr)
library(tibble)
library(tidyr)  # pivot_wider() comes from tidyr

# Native pipe |> and \(x) lambdas need R >= 4.1
away_players |>
  map(unlist) |>
  map(enframe) |>
  map(\(x) pivot_wider(x,
                       names_from = name,
                       values_from = value)) |>
  bind_rows()
# A tibble: 2 × 51
person.id person.fullName person.link person.firstName person.lastName person.primaryNumber person.birthDate
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 8482062 Cole Smith /api/v1/people/8482… Cole Smith 36 1995-10-28
2 8478508 Yakov Trenin /api/v1/people/8478… Yakov Trenin 13 1997-01-13
# ℹ 44 more variables: person.currentAge <chr>, person.birthCity <chr>, person.birthStateProvince <chr>,
# person.birthCountry <chr>, person.nationality <chr>, person.height <chr>, person.weight <chr>,
# person.active <chr>, person.alternateCaptain <chr>, person.captain <chr>, person.rookie <chr>,
# person.shootsCatches <chr>, person.rosterStatus <chr>, person.currentTeam.id <chr>, person.currentTeam.name <chr>,
# person.currentTeam.link <chr>, person.primaryPosition.code <chr>, person.primaryPosition.name <chr>,
# person.primaryPosition.type <chr>, person.primaryPosition.abbreviation <chr>, jerseyNumber <chr>,
# position.code <chr>, position.name <chr>, position.type <chr>, position.abbreviation <chr>, …
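Every column in the output above is <chr> because unlist() has to coerce each player's mixed integer, logical and character leaves to a single type. Below is a minimal base-R sketch of the same unlist-then-widen idea, run on a hypothetical trimmed copy of the two players from the question (the second lacks birthStateProvince, as in the dput output). It fills missing fields with NA, restores column types with type.convert() (R >= 4.0 assumed), and strips the person. and stats.skaterStats. prefixes; dplyr::rename_with() or janitor::clean_names() are the tidyverse alternatives mentioned above.

```r
# Hypothetical trimmed copy of the away_players structure from the question
away_players <- list(
  ID8482062 = list(
    person = list(id = 8482062L, fullName = "Cole Smith", birthStateProvince = "MN",
                  currentTeam = list(id = 18L, name = "Nashville Predators")),
    jerseyNumber = "36",
    stats = list(skaterStats = list(timeOnIce = "12:52", goals = 0L, shots = 1L))),
  ID8478508 = list(
    person = list(id = 8478508L, fullName = "Yakov Trenin",
                  currentTeam = list(id = 18L, name = "Nashville Predators")),
    jerseyNumber = "13",
    stats = list(skaterStats = list(timeOnIce = "13:54", goals = 0L, shots = 1L)))
)

# 1. Flatten each player to a named character vector;
#    nested names are joined with "." (e.g. "person.currentTeam.id")
flat <- lapply(away_players, unlist)

# 2. Union of all field names, because some players lack some fields
all_cols <- unique(unlist(lapply(flat, names)))

# 3. One row per player; indexing by a missing name yields NA
mat <- do.call(rbind, lapply(flat, function(v) unname(v[all_cols])))
colnames(mat) <- all_cols
df <- as.data.frame(mat, stringsAsFactors = FALSE)
rownames(df) <- NULL

# 4. Restore integer/logical columns that unlist() turned into character
df <- type.convert(df, as.is = TRUE)

# 5. Shorten the prefixed column names
names(df) <- sub("^(person|stats\\.skaterStats)\\.", "", names(df))
str(df)
```

Note that type.convert() also turns jerseyNumber into an integer; convert it back with as.character() if you prefer to keep it as a label.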
Comments
boxscores <- nhl_games_boxscore(gameIds = c(2023020001, 2023020002)); away_players <- map(boxscores, \(x) x[[1]][["teams"]][["away"]][["players"]])
map(away_players, \(game) game |> map(unlist).....
away_players <- map(boxscores, \(x) x[[1]][["teams"]][["away"]][["players"]])
subscript out of bounds
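The subscript out of bounds error is most likely caused by the [[1]]: inside map() (or lapply()), x is already one game, the equivalent of boxscores[[1]], so x[[1]] descends one level too far before [["teams"]] is looked up. The minimal sketch below reproduces the error and the fix on a hypothetical mock; its meta element stands in for whatever precedes teams in the real object, which we have not checked. purrr::map(boxscores, \(x) x[["teams"]][["away"]][["players"]]) is the purrr form of the corrected line.

```r
# Hypothetical mock: two games, each shaped like boxscores[[1]] in the question
make_game <- function(player_ids) {
  list(meta  = "not the teams list",
       teams = list(away = list(players = as.list(player_ids))))
}
boxscores <- list(make_game(c(1L, 2L)), make_game(c(3L, 4L)))

# x is a single game here, so x[[1]] is the "meta" string,
# and looking up [["teams"]] inside a string fails
res <- tryCatch(
  lapply(boxscores, function(x) x[[1]][["teams"]][["away"]][["players"]]),
  error = function(e) conditionMessage(e)
)
print(res)  # [1] "subscript out of bounds"

# Index into each game directly instead
away_players <- lapply(boxscores, function(x) x[["teams"]][["away"]][["players"]])
length(away_players)  # one players list per game
```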
Comments
dplyr::bind_cols(df_away_players, df_away_stats)
dplyr::left_join(df_away_players, df_away_stats, by = id_column_name)
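For a join to work, both tables need a shared key, but df_away_stats as built in the question has no id column (only skaterStats.* columns). The minimal sketch below carries the player id into the stats table and then left-joins with base R merge(..., all.x = TRUE), so that players without stats are kept with NA. It uses a hypothetical two-player list in which the second player has an empty stats element. dplyr::left_join(df_players, df_stats, by = "id") would do the same.

```r
# Hypothetical two-player list; the second player has no stats
away_players <- list(
  ID1 = list(person = list(id = 1L, fullName = "Player One"),
             stats  = list(skaterStats = list(goals = 1L, shots = 3L))),
  ID2 = list(person = list(id = 2L, fullName = "Player Two"),
             stats  = list())
)

# Person details: one row per player
df_players <- do.call(rbind, lapply(away_players, function(p) {
  data.frame(id = p[["person"]][["id"]],
             fullName = p[["person"]][["fullName"]],
             stringsAsFactors = FALSE)
}))

# Stats: keep the player id as the join key; players without
# skaterStats return NULL, which rbind() ignores
df_stats <- do.call(rbind, lapply(away_players, function(p) {
  if (!"skaterStats" %in% names(p[["stats"]])) return(NULL)
  data.frame(id = p[["person"]][["id"]], p[["stats"]][["skaterStats"]])
}))

# Left join on id: every player is kept, missing stats become NA
df_combined <- merge(df_players, df_stats, by = "id", all.x = TRUE)
df_combined
```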
lapply(1:length(away_players), function(i) { away_players[[i]][["person"]] ...
purrr::map(away_players, purrr::pluck, "person")...
lapply(away_players, \(x) x[["person"]])
dput(away_players)