Extract multiple elements from a large list and put them in a dataframe in R

Asked by: Michael  Asked: 10/15/2023  Last edited by: GuedesBF, Michael  Modified: 10/15/2023  Viewed: 109 times

Question:

This is similar to the question posted here.

I'm using the nhlapi package to scrape some boxscores, which produces a large nested list.

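What I ultimately want is a data frame that contains every player from both the home and the away team, with all of their stats, just as it is shown on the NHL website.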

The code to get the boxscore data is:

install.packages("nhlapi")
library(nhlapi)

boxscores<-nhl_games_boxscore(gameIds = 2023020001)

Then I used the answer suggested in the similar question. I can get the player details with:

library(dplyr)  # for %>% and bind_rows()

away_players <- boxscores[[1]][["teams"]][["away"]][["players"]]

df_away_players <- lapply(1:length(away_players), function(i) {
  away_players[[i]][["person"]] %>% 
    data.frame()
}) %>% 
  bind_rows()

head(df_away_players)

       id       fullName                   link firstName lastName primaryNumber  birthDate
1 8482062     Cole Smith /api/v1/people/8482062      Cole    Smith            36 1995-10-28
2 8478508   Yakov Trenin /api/v1/people/8478508     Yakov   Trenin            13 1997-01-13
3 8474679 Gustav Nyquist /api/v1/people/8474679    Gustav  Nyquist            14 1989-09-01
4 8476925 Colton Sissons /api/v1/people/8476925    Colton  Sissons            10 1993-11-05
5 8478438    Tommy Novak /api/v1/people/8478438     Tommy    Novak            82 1997-04-28
6 8476887 Filip Forsberg /api/v1/people/8476887     Filip Forsberg             9 1994-08-13

I can also get the skaterStats with:

df_away_stats <- lapply(1:length(away_players), function(i) {
  away_players[[i]][["stats"]] %>% 
    data.frame()
}) %>% 
  bind_rows()

head(df_away_stats)
  skaterStats.timeOnIce skaterStats.assists skaterStats.goals skaterStats.shots
1                 12:52                   0                 0                 1
2                 13:54                   0                 0                 1
3                 14:53                   1                 0                 2
4                 14:45                   0                 0                 3
5                 15:43                   0                 1                 3
6                 20:51                   2                 0                 6

I tried to combine the two with this:

df_combined <- c(df_away_players, df_away_stats)

This does produce a list, but I can't figure out how to get all of this information into a data frame.
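The reason for the list is that c() treats two data frames as plain lists of columns and simply concatenates them, so the data.frame class is lost. A minimal base-R sketch, using two tiny hypothetical data frames as stand-ins for df_away_players and df_away_stats; cbind() plays the role of dplyr::bind_cols() here. Both pair rows purely by position, so this is only safe when both frames were built from the same away_players list in the same order.

```r
# Hypothetical stand-ins for df_away_players and df_away_stats (same row order)
players <- data.frame(id = c(8482062L, 8478508L),
                      fullName = c("Cole Smith", "Yakov Trenin"))
stats <- data.frame(skaterStats.timeOnIce = c("12:52", "13:54"),
                    skaterStats.shots = c(1L, 1L))

# c() concatenates the columns of both data frames into a plain list
combined_list <- c(players, stats)
class(combined_list)   # "list"

# cbind() keeps a data.frame, matching rows by position only
combined_df <- cbind(players, stats)
class(combined_df)     # "data.frame"
```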

str(df_combined)
List of 63
 $ id                                    : int [1:22] 8482062 8478508 8474679 8476925 8478438 8476887 8474600 8474568 8481239 8478851 ...
 $ fullName                              : chr [1:22] "Cole Smith" "Yakov Trenin" "Gustav Nyquist" "Colton Sissons" ...
 $ link                                  : chr [1:22] "/api/v1/people/8482062" "/api/v1/people/8478508" "/api/v1/people/8474679" "/api/v1/people/8476925" ...
 $ firstName                             : chr [1:22] "Cole" "Yakov" "Gustav" "Colton" ...
 $ lastName                              : chr [1:22] "Smith" "Trenin" "Nyquist" "Sissons" ...
 $ primaryNumber                         : chr [1:22] "36" "13" "14" "10" ...
 $ birthDate                             : chr [1:22] "1995-10-28" "1997-01-13" "1989-09-01" "1993-11-05" ...
 $ currentAge                            : int [1:22] 27 26 34 29 26 29 33 33 23 27 ...

It should look very similar to what is shown on the NHL website, at this URL: https://www.nhl.com/gamecenter/nsh-vs-tbl/2023/10/10/2023020001/boxscore

This is what it looks like:

[screenshot of the boxscore table on NHL.com]

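If possible, I would also like to add the date, and to do this for multiple games as well (I believe the nhl_games_boxscore function accepts multiple gameIds), but I suspect I need some kind of loop?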

Output of away_players for the first two players, as requested in the comments below:

dput(away_players[c(1, 2)])

away_players <- list(
  ID8482062 = list(
    person = list(
      id = 8482062L, fullName = "Cole Smith", link = "/api/v1/people/8482062",
      firstName = "Cole", lastName = "Smith", primaryNumber = "36",
      birthDate = "1995-10-28", currentAge = 27L, birthCity = "Brainerd",
      birthStateProvince = "MN", birthCountry = "USA", nationality = "USA",
      height = "6' 3\"", weight = 195L, active = TRUE,
      alternateCaptain = FALSE, captain = FALSE, rookie = FALSE,
      shootsCatches = "L", rosterStatus = "Y",
      currentTeam = list(id = 18L, name = "Nashville Predators", link = "/api/v1/teams/18"),
      primaryPosition = list(code = "L", name = "Left Wing", type = "Forward", abbreviation = "LW")
    ),
    jerseyNumber = "36",
    position = list(code = "L", name = "Left Wing", type = "Forward", abbreviation = "LW"),
    stats = list(
      skaterStats = list(
        timeOnIce = "12:52", assists = 0L, goals = 0L, shots = 1L, hits = 2L,
        powerPlayGoals = 0L, powerPlayAssists = 0L, penaltyMinutes = 0L,
        faceOffWins = 0L, faceoffTaken = 0L, takeaways = 2L, giveaways = 1L,
        shortHandedGoals = 0L, shortHandedAssists = 0L, blocked = 0L, plusMinus = 0L,
        evenTimeOnIce = "7:52", powerPlayTimeOnIce = "0:00", shortHandedTimeOnIce = "5:00"
      )
    )
  ),
  ID8478508 = list(
    person = list(
      id = 8478508L, fullName = "Yakov Trenin", link = "/api/v1/people/8478508",
      firstName = "Yakov", lastName = "Trenin", primaryNumber = "13",
      birthDate = "1997-01-13", currentAge = 26L, birthCity = "Chelyabinsk",
      birthCountry = "RUS", nationality = "RUS", height = "6' 2\"",
      weight = 201L, active = TRUE, alternateCaptain = FALSE, captain = FALSE,
      rookie = FALSE, shootsCatches = "L", rosterStatus = "Y",
      currentTeam = list(id = 18L, name = "Nashville Predators", link = "/api/v1/teams/18"),
      primaryPosition = list(code = "C", name = "Center", type = "Forward", abbreviation = "C")
    ),
    jerseyNumber = "13",
    position = list(code = "C", name = "Center", type = "Forward", abbreviation = "C"),
    stats = list(
      skaterStats = list(
        timeOnIce = "13:54", assists = 0L, goals = 0L, shots = 1L, hits = 3L,
        powerPlayGoals = 0L, powerPlayAssists = 0L, penaltyMinutes = 0L,
        faceOffWins = 0L, faceoffTaken = 0L, takeaways = 3L, giveaways = 0L,
        shortHandedGoals = 0L, shortHandedAssists = 0L, blocked = 0L, plusMinus = 0L,
        evenTimeOnIce = "10:51", powerPlayTimeOnIce = "0:00", shortHandedTimeOnIce = "3:03"
      )
    )
  )
)
r dplyr purrr

Comments

1 upvote  GuedesBF 10/15/2023
Are you looking for dplyr::bind_cols(df_away_players, df_away_stats)?
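1 upvote  GuedesBF 10/15/2023
Then you are probably looking for a join operation. For a join, you should have a proper ID variable in both data.frames; otherwise you may end up combining data from different players. A typical one is dplyr::left_join(df_away_players, df_away_stats, by = id_column_name). You could change the way you extract df_away_stats, since the id is lost along the way.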
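A minimal base-R sketch of that suggestion: carry person$id into the stats table while extracting it, then join on it. merge(..., all.x = TRUE) is used as the base equivalent of dplyr::left_join(). The two-player list is a hypothetical, heavily trimmed mock with the same nesting as away_players, and the sketch assumes every player has a skaterStats entry; players without one (for example goalies, whose stats may sit under a different key) would need separate handling.

```r
# Hypothetical trimmed mock with the same nesting as away_players
away_players <- list(
  ID8482062 = list(person = list(id = 8482062L, fullName = "Cole Smith"),
                   stats = list(skaterStats = list(timeOnIce = "12:52", shots = 1L))),
  ID8478508 = list(person = list(id = 8478508L, fullName = "Yakov Trenin"),
                   stats = list(skaterStats = list(timeOnIce = "13:54", shots = 1L)))
)

# Person details: one row per player
df_players <- do.call(rbind, lapply(away_players, function(p) as.data.frame(p$person)))

# Stats: keep the player id next to the stats so the tables can be joined safely
df_stats <- do.call(rbind, lapply(away_players, function(p) {
  data.frame(id = p$person$id, as.data.frame(p$stats$skaterStats))
}))

# Base-R equivalent of dplyr::left_join(df_players, df_stats, by = "id")
df_combined <- merge(df_players, df_stats, by = "id", all.x = TRUE)
```

Because the rows are matched on id rather than on position, the result stays correct even if one of the two tables gets reordered or filtered.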
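1 upvote  GuedesBF 10/15/2023
The code you use to extract the nested list probably works, but it also looks a bit convoluted. Wouldn't lapply(1:length(away_players), function(i) { away_players[[i]][["person"]] ... produce the same result as the simpler purrr::map(away_players, purrr::pluck("person"))...?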
1 upvote  GuedesBF 10/15/2023
Or maybe lapply(away_players, \(x) x[["person"]])?
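A quick check of that claim on a hypothetical two-player mock: iterating over the list elements directly gives the same content as looping over indices, and it also keeps the list names (the ID... keys). The \(x) lambda shorthand assumes R 4.1 or later.

```r
# Hypothetical trimmed mock of away_players
away_players <- list(
  ID8482062 = list(person = list(id = 8482062L, fullName = "Cole Smith")),
  ID8478508 = list(person = list(id = 8478508L, fullName = "Yakov Trenin"))
)

# Index-based version from the question
by_index <- lapply(1:length(away_players), function(i) away_players[[i]][["person"]])

# Element-based version from the comment: iterate over the list itself
by_element <- lapply(away_players, \(x) x[["person"]])

# Same content; only the element-based version keeps the names
identical(unname(by_element), by_index)   # TRUE
names(by_element)                          # "ID8482062" "ID8478508"
```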
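1 upvote  GuedesBF 10/15/2023
In any case, all of your information is first collected in the away_players object. Please share code so we can reproduce this object. You can do that by pasting the output of dput(away_players).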

Answer:

1 upvote  GuedesBF 10/15/2023 #1

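This is tricky because away_players is a fairly deeply nested list, and its elements have unequal sizes. Since every branch ends in a single one-element leaf ("node"), we can unlist each player's sublist into one named vector at the top level. Then enframe will create data.frames from the resulting named vectors, which gives a list of data.frames in long format. Then pivot_wider turns each one into a tidy data.frame with only one row per player. Finally, bind_rows creates the final data.frame. The column names get a bit clunky, but they can easily be modified with rename_with or janitor::clean_names.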

library(purrr)
library(dplyr)
library(tibble)
library(tidyr)  # pivot_wider() comes from tidyr, not dplyr

away_players |> 
    map(unlist) |> 
    map(enframe) |> 
    map(\(x) pivot_wider(x,
                         names_from = name,
                         values_from = value)) |> 
    bind_rows()

# A tibble: 2 × 51
  person.id person.fullName person.link          person.firstName person.lastName person.primaryNumber person.birthDate
  <chr>     <chr>           <chr>                <chr>            <chr>           <chr>                <chr>           
1 8482062   Cole Smith      /api/v1/people/8482… Cole             Smith           36                   1995-10-28      
2 8478508   Yakov Trenin    /api/v1/people/8478… Yakov            Trenin          13                   1997-01-13      
# ℹ 44 more variables: person.currentAge <chr>, person.birthCity <chr>, person.birthStateProvince <chr>,
#   person.birthCountry <chr>, person.nationality <chr>, person.height <chr>, person.weight <chr>,
#   person.active <chr>, person.alternateCaptain <chr>, person.captain <chr>, person.rookie <chr>,
#   person.shootsCatches <chr>, person.rosterStatus <chr>, person.currentTeam.id <chr>, person.currentTeam.name <chr>,
#   person.currentTeam.link <chr>, person.primaryPosition.code <chr>, person.primaryPosition.name <chr>,
#   person.primaryPosition.type <chr>, person.primaryPosition.abbreviation <chr>, jerseyNumber <chr>,
#   position.code <chr>, position.name <chr>, position.type <chr>, position.abbreviation <chr>, …
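Two side effects are visible in that output: unlist() coerces every leaf to character (all columns are <chr>), and the names carry the full path. A base-R sketch of the same flatten, widen and stack idea, followed by type conversion with type.convert() and a prefix cleanup, is below. The mock data is hypothetical and trimmed; the second player deliberately lacks birthStateProvince, as in the real data, to show that missing fields become NA. Only the "person." and "stats.skaterStats." prefixes are stripped, because cutting everything before the last dot would make names such as person.id and person.currentTeam.id collide.

```r
# Hypothetical trimmed mock; the second player lacks birthStateProvince
away_players <- list(
  ID8482062 = list(person = list(id = 8482062L, fullName = "Cole Smith", birthStateProvince = "MN"),
                   stats = list(skaterStats = list(timeOnIce = "12:52", shots = 1L))),
  ID8478508 = list(person = list(id = 8478508L, fullName = "Yakov Trenin"),
                   stats = list(skaterStats = list(timeOnIce = "13:54", shots = 1L)))
)

# 1. Flatten each player into a named character vector (like map(unlist))
flat <- lapply(away_players, unlist)

# 2. Union of all field names, so players with missing fields get NA
cols <- unique(unlist(lapply(flat, names)))

# 3. One row per player (the pivot_wider + bind_rows step)
rows <- lapply(flat, function(v) setNames(v[cols], cols))
wide <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# 4. unlist() made everything character; let R guess the column types again
wide <- type.convert(wide, as.is = TRUE)

# 5. Drop only the outer prefixes, keeping names like currentTeam.id unique
names(wide) <- sub("^(person|stats\\.skaterStats)\\.", "", names(wide))
```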

Comments

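0 upvotes  Michael 10/15/2023
Perfect! I can do the same for home_players and rbind them into a complete df. One more thing: do you know how to add the gameId? Maybe even the date? I know the nhl_games_boxscore function accepts multiple gameIds; how would I do this for multiple games? Is a loop needed?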
1 upvote  GuedesBF 10/15/2023
Not sure. Couldn't you create a list with a new top level for the gameid? Something like boxscores <- nhl_games_boxscore(gameIds = c(2023020001, 2023020002)); away_players <- map(boxscores, \(x) x[[1]][["teams"]][["away"]][["players"]]), and then loop over the list of games with map(away_players, \(game) game |> map(unlist)...?
0 upvotes  Michael 10/15/2023
I tried away_players <- map(boxscores, \(x) x[[1]][["teams"]][["away"]][["players"]]), but I get a "subscript out of bounds" error?
0 upvotes  Michael 10/16/2023
Any ideas on how to fix the "subscript out of bounds" error?
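One likely cause: in the single-game code, boxscores[[1]] selects the first game. Inside map() or lapply(), x is already one game (boxscores[[i]]), so the extra x[[1]] drills one level too deep, most likely into a component that has no "teams" element. A base-R sketch that drops the [[1]] and tags each row with its game is below. The nested mock is hypothetical, and the sketch assumes the games come back in the same order as the gameIds passed in. Whether the boxscore object also contains a game date is not shown in this thread, so the date is left out.

```r
# Hypothetical mock of a two-game result: one element per game,
# each holding teams -> away -> players
boxscores <- list(
  list(teams = list(away = list(players = list(
    ID8482062 = list(person = list(id = 8482062L, fullName = "Cole Smith")))))),
  list(teams = list(away = list(players = list(
    ID8478508 = list(person = list(id = 8478508L, fullName = "Yakov Trenin"))))))
)
game_ids <- c(2023020001, 2023020002)  # assumed to match the order of boxscores

# x is already ONE game here, so no x[[1]] before [["teams"]]
away_by_game <- lapply(boxscores, \(x) x[["teams"]][["away"]][["players"]])

# Build one data frame per game, tag it with its gameId, then stack all games
away_df <- do.call(rbind, Map(function(players, gid) {
  rows <- lapply(players, function(p) as.data.frame(p$person))
  cbind(gameId = gid, do.call(rbind, rows))
}, away_by_game, game_ids))
```

The same pattern works for the home team by replacing "away" with "home".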