Asked by: littleworth  Asked: 8/2/2017  Last edited by: littleworth  Updated: 4/13/2023  Views: 36496
How to convert list of list into a tibble (dataframe)
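Q:
I have the following list of lists. Each element contains two variables: pair and genes. The pair variable is always a vector of two strings, and the genes variable is a vector that can contain more than one value.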
lol <- list(structure(list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"), .Names = c("pair",
"genes")), structure(list(pair = c("BoneMarrow", "Umbilical"),
genes = "GNB2L1"), .Names = c("pair", "genes")), structure(list(
pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1"), .Names = c("pair",
"genes")))
lol
#> [[1]]
#> [[1]]$pair
#> [1] "BoneMarrow" "Pulmonary"
#>
#> [[1]]$genes
#> [1] "PRR11"
#>
#>
#> [[2]]
#> [[2]]$pair
#> [1] "BoneMarrow" "Umbilical"
#>
#> [[2]]$genes
#> [1] "GNB2L1"
#>
#>
#> [[3]]
#> [[3]]$pair
#> [1] "Pulmonary" "Umbilical"
#>
#> [[3]]$genes
#> [1] "ATP1B1"
How can I convert it into this data frame:
pair1 pair2 genes_vec
BoneMarrow Pulmonary PRR11
BoneMarrow Umbilical GNB2L1
Pulmonary Umbilical ATP1B1
Note that the genes variable is a vector, not a single string.
My best attempt is this, which does not give what I want:
> do.call(rbind, lapply(lol, data.frame, stringsAsFactors=FALSE))
pair genes
1 BoneMarrow PRR11
2 Pulmonary PRR11
3 BoneMarrow GNB2L1
4 Umbilical GNB2L1
5 Pulmonary ATP1B1
6 Umbilical ATP1B1
Update:
A new example in which genes holds a vector:
lol2 <- list(structure(list(pair = c("BoneMarrow", "Pulmonary"), genes = c("GNB2L1",
"PRR11")), .Names = c("pair", "genes")), structure(list(pair = c("BoneMarrow",
"Umbilical"), genes = "GNB2L1"), .Names = c("pair", "genes")),
structure(list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1"), .Names = c("pair",
"genes")))
lol2
#> [[1]]
#> [[1]]$pair
#> [1] "BoneMarrow" "Pulmonary"
#>
#> [[1]]$genes
#> [1] "GNB2L1" "PRR11"
#>
#>
#> [[2]]
#> [[2]]$pair
#> [1] "BoneMarrow" "Umbilical"
#>
#> [[2]]$genes
#> [1] "GNB2L1"
#>
#>
#> [[3]]
#> [[3]]$pair
#> [1] "Pulmonary" "Umbilical"
#>
#> [[3]]$genes
#> [1] "ATP1B1"
The expected output is:
pair1 pair2 genes_vec
BoneMarrow Pulmonary PRR11,GNB2L1
BoneMarrow Umbilical GNB2L1
Pulmonary Umbilical ATP1B1
A:
1 upvote
Prasanna Nandakumar
8/2/2017
#1
> lol1 <- data.frame(t(sapply(lol,c)))
> as.data.frame(t(apply(lol1, 1, unlist)))
pair1 pair2 genes
1 BoneMarrow Pulmonary PRR11
2 BoneMarrow Umbilical GNB2L1
3 Pulmonary Umbilical ATP1B1
Comments
0 upvotes
littleworth
8/2/2017
Thanks. Not exactly what I want. How can I further split pair into two columns?
0 upvotes
Prasanna Nandakumar
8/2/2017
@yaffle Updated the solution.
0 upvotes
littleworth
8/2/2017
Thanks, but your latest approach seems to fail when genes is a vector. See my update.
2 upvotes
Florian
8/2/2017
#2
Edit: updated to work with lol2, where genes is a vector.
Maybe something like this:
as.data.frame(do.call(rbind, lapply(lol2, function(x) {
  c(unlist(x[1]), genes = paste(unlist(x[2]), collapse = ","))
})), stringsAsFactors = FALSE)
       pair1     pair2        genes
1 BoneMarrow Pulmonary GNB2L1,PRR11
2 BoneMarrow Umbilical       GNB2L1
3  Pulmonary Umbilical       ATP1B1
Comments
0 upvotes
littleworth
8/2/2017
Thanks. Your approach seems to fail when genes is a vector. See my update.
0 upvotes
littleworth
8/2/2017
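Thanks. Is there a way to simplify your latest output into a plain data frame with just 3 variables (columns)? Right now str() shows that the data frame contains nested lists. To check further, you can try as.tibble(your_output).
0 upvotes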
Florian
8/2/2017
Yes, you can paste the genes column together. Updated again; hopefully this better reflects your expected output.
16 upvotes
cderv
8/2/2017
#3
With the tidyverse, you can use purrr to help you:
library(dplyr)
library(purrr)
tibble(
pair = map(lol, "pair"),
genes_vec = map_chr(lol, "genes")
) %>%
mutate(
pair1 = map_chr(pair, 1),
pair2 = map_chr(pair, 2)
) %>%
select(pair1, pair2, genes_vec)
#> # A tibble: 3 x 3
#> pair1 pair2 genes_vec
#> <chr> <chr> <chr>
#> 1 BoneMarrow Pulmonary PRR11
#> 2 BoneMarrow Umbilical GNB2L1
#> 3 Pulmonary Umbilical ATP1B1
For the second example, just replace map_chr(lol, "genes") with map(lol2, "genes"), since you want to keep a nested data frame with a list column.
tibble(
pair = map(lol2, "pair"),
genes_vec = map(lol2, "genes")
) %>%
mutate(
pair1 = map_chr(pair, 1),
pair2 = map_chr(pair, 2)
) %>%
select(pair1, pair2, genes_vec)
#> # A tibble: 3 x 3
#> pair1 pair2 genes_vec
#> <chr> <chr> <list>
#> 1 BoneMarrow Pulmonary <chr [2]>
#> 2 BoneMarrow Umbilical <chr [1]>
#> 3 Pulmonary Umbilical <chr [1]>
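If a plain character column is preferred over a list column (the question's expected output shows the genes joined by commas), one possible variation is to collapse each vector with paste(). This is only a sketch: the name flat is made up for illustration, and lol2 is rebuilt with plain list() calls rather than structure().

```r
library(dplyr)
library(purrr)

# Same content as lol2 in the question, written with plain list() calls
lol2 <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = c("GNB2L1", "PRR11")),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

flat <- tibble(
  pair1 = map_chr(lol2, ~ .x$pair[1]),
  pair2 = map_chr(lol2, ~ .x$pair[2]),
  # Collapse each genes vector into a single comma-separated string
  genes_vec = map_chr(lol2, ~ paste(.x$genes, collapse = ","))
)
flat
```

The genes are joined in the order they appear in the input, so the first row reads GNB2L1,PRR11 rather than the PRR11,GNB2L1 shown in the question.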
A more general approach is to use nested tibbles and unnest them as needed:
library(dplyr)
library(purrr)
library(tidyr)
tab1 <-lol %>%
transpose() %>%
as_tibble() %>%
mutate(pair = map(pair, ~as_tibble(t(.x)))) %>%
mutate(pair = map(pair, ~set_names(.x, c("pair1", "pair2"))))
tab1
#> # A tibble: 3 x 2
#> pair genes
#> <list> <list>
#> 1 <tibble [1 x 2]> <chr [1]>
#> 2 <tibble [1 x 2]> <chr [1]>
#> 3 <tibble [1 x 2]> <chr [1]>
For the second example the same code applies, with the list lol2 instead of lol:
tab2 <- lol2 %>%
transpose() %>%
as_tibble() %>%
mutate(pair = map(pair, ~as_tibble(t(.x)))) %>%
mutate(pair = map(pair, ~set_names(.x, c("pair1", "pair2"))))
tab2
#> # A tibble: 3 x 2
#> pair genes
#> <list> <list>
#> 1 <tibble [1 x 2]> <chr [2]>
#> 2 <tibble [1 x 2]> <chr [1]>
#> 3 <tibble [1 x 2]> <chr [1]>
Then you can unnest the columns you need:
tab1 %>%
unnest()
#> # A tibble: 3 x 3
#> genes pair1 pair2
#> <chr> <chr> <chr>
#> 1 PRR11 BoneMarrow Pulmonary
#> 2 GNB2L1 BoneMarrow Umbilical
#> 3 ATP1B1 Pulmonary Umbilical
tab2 %>%
unnest(pair)
#> # A tibble: 3 x 3
#> genes pair1 pair2
#> <list> <chr> <chr>
#> 1 <chr [2]> BoneMarrow Pulmonary
#> 2 <chr [1]> BoneMarrow Umbilical
#> 3 <chr [1]> Pulmonary Umbilical
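Unnesting the genes list column instead gives a long format with one row per gene. The sketch below assumes tidyr 1.0 or later, where unnest() expects the columns to be named through cols. The name long is made up for illustration, and the tibble is built directly rather than through transpose().

```r
library(dplyr)
library(purrr)
library(tidyr)

# Same content as lol2 in the question, written with plain list() calls
lol2 <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = c("GNB2L1", "PRR11")),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

long <- tibble(
  pair1 = map_chr(lol2, ~ .x$pair[1]),
  pair2 = map_chr(lol2, ~ .x$pair[2]),
  genes = map(lol2, "genes")   # list column, one character vector per row
) %>%
  unnest(cols = genes)         # one output row per gene
long
```

The first pair has two genes, so it takes up two rows and the result has four rows in total.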
2 upvotes
Onyambu
8/2/2017
#4
This should work:
data.frame(do.call(rbind,lol2))
pair genes
1 BoneMarrow, Pulmonary GNB2L1, PRR11
2 BoneMarrow, Umbilical GNB2L1
3 Pulmonary, Umbilical ATP1B1
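The genes are kept as a vector in the same way the pair is kept as a vector: you simply get both of them in one column each, instead of separate pair1 and pair2 columns.
3 upvotes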
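Both columns of that result are list columns, which may be what the asker meant by nested lists in the comments above. A base R sketch for checking this and then flattening into plain character columns follows. The names nested, pairs and flat are made up for illustration.

```r
# Same content as lol2 in the question, written with plain list() calls
lol2 <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = c("GNB2L1", "PRR11")),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

nested <- data.frame(do.call(rbind, lol2))
sapply(nested, class)                 # both columns are of class "list"

# Stack the length-2 pair vectors into a 3 x 2 character matrix
pairs <- do.call(rbind, nested$pair)

flat <- data.frame(
  pair1 = pairs[, 1],
  pair2 = pairs[, 2],
  # Join each genes vector into one comma-separated string
  genes_vec = vapply(nested$genes, paste, character(1), collapse = ","),
  stringsAsFactors = FALSE
)
str(flat)
```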
Matifou
12/5/2020
#5
For the first question, almost the same as the other answers, just slightly shorter/more compact:
library(tidyverse)
lol <- list(structure(list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"),
.Names = c("pair", "genes")),
structure(list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
.Names = c("pair", "genes")),
structure(list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1"), .Names = c("pair","genes")))
map_dfr(lol, ~as_tibble(.) %>%
mutate(row=paste0("pair", row_number()))%>%
spread(row, pair) %>%
select(pair1, pair2, genes))
#> # A tibble: 3 x 3
#> pair1 pair2 genes
#> <chr> <chr> <chr>
#> 1 BoneMarrow Pulmonary PRR11
#> 2 BoneMarrow Umbilical GNB2L1
#> 3 Pulmonary Umbilical ATP1B1
Created on 2020-12-04 by the reprex package (v0.3.0)
0 upvotes
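In current tidyr, spread() is superseded by pivot_wider(). A roughly equivalent sketch, assuming tidyr 1.0 or later, is shown below. The name wide is made up for illustration, and map() plus bind_rows() is used in place of map_dfr().

```r
library(dplyr)
library(purrr)
library(tidyr)

# Same content as lol above, written with plain list() calls
lol <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

wide <- lol %>%
  map(function(x) {
    # Two-row tibble per element; the length-1 genes value is recycled
    tibble(pair = x$pair, genes = x$genes) %>%
      mutate(row = paste0("pair", row_number())) %>%
      pivot_wider(names_from = row, values_from = pair)
  }) %>%
  bind_rows() %>%
  select(pair1, pair2, genes)
wide
```

As with the spread() version, this relies on genes having length 1, so it is meant for lol and not for lol2.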
petzi
4/13/2023
#6
Another tidyverse solution. For lol, the result is the same as in the OP. For lol2, the genes vector is split into the appropriate number of columns:
lol2 <- list(structure(list(pair = c("BoneMarrow", "Pulmonary"), genes = c("GNB2L1",
"PRR11")), .Names = c("pair", "genes")), structure(list(pair = c("BoneMarrow",
"Umbilical"), genes = "GNB2L1"), .Names = c("pair", "genes")),
structure(list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1"), .Names = c("pair",
"genes")))
lol2_result <- lol2 |>
purrr::transpose() |>
tibble::as_tibble() |>
tidyr::unnest_wider(col = c(pair, genes), names_sep = "_")
lol2_result
#> # A tibble: 3 × 4
#> pair_1 pair_2 genes_1 genes_2
#> <chr> <chr> <chr> <chr>
#> 1 BoneMarrow Pulmonary GNB2L1 PRR11
#> 2 BoneMarrow Umbilical GNB2L1 <NA>
#> 3 Pulmonary Umbilical ATP1B1 <NA>
Created on 2023-04-13 with reprex v2.0.2