Asked by: Mohamed Rahouma · Asked: 7/23/2023 · Last edited by: Darren Tsai, Mohamed Rahouma · Updated: 7/23/2023 · Views: 74
Convert the event-level dataset to the patient-level data in R
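Q:
I need to convert an event-level dataset to patient-level data, i.e. reshape a long dataset into a wider one, with deidnum as the key variable. In addition, I want to create a column for each resulting event and for its event time. If the same patient has the same event more than once, the earliest event time should be used.
Here is a similar example of the data and my code: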
df <- read.table(text = "deidnum,eventc,EVENTDT,MI_COMPLICATED
325107,MI,21,1
325107,New Rose Dyspnea Scale 2 or more,1468,NA
418351,New Rose Dyspnea Scale 2 or more,207,NA
839172,New Rose Dyspnea Scale 2 or more,1060,NA
839172,New Rose Dyspnea Scale 2 or more,1718,NA
1487422,MI,990,0
1487422,DEATH,1113,NA
1511165,MI,424,0
1511165,MI,608,1
1511165,New Rose Dyspnea Scale 2 or more,721,NA
", sep = ",", header = TRUE)
library(reshape2)
wide.df <- dcast(df, deidnum ~ eventc)
wide.df
Current output
  deidnum DEATH MI New Rose Dyspnea Scale 2 or more
1  325107     0  1                                1
2  418351     0  0                                1
3  839172     0  0                                2
4 1487422     1  1                                0
5 1511165     0  2                                1
Any suggestions would be greatly appreciated.
A:
2 upvotes
Darren Tsai
7/23/2023
#1
A tidyverse workflow:
library(tidyr)
library(dplyr)

df %>%
  slice_min(EVENTDT, by = c(deidnum, eventc)) %>%
  pivot_wider(id_cols = deidnum, names_from = eventc,
              values_from = c(eventc, EVENTDT),
              values_fn = list(eventc = length),
              values_fill = list(eventc = 0),
              unused_fn = first) %>%
  rename_with(~ sub("eventc_", "", .x), starts_with("eventc"))
# # A tibble: 5 × 8
# deidnum MI `New Rose Dyspnea Scale 2 or more` DEATH EVENTDT_MI `EVENTDT_New Rose Dyspnea Scale 2 or more` EVENTDT_DEATH MI_COMPLICATED
# <int> <int> <int> <int> <int> <int> <int> <int>
# 1 325107 1 1 0 21 1468 NA 1
# 2 418351 0 1 0 NA 207 NA NA
# 3 839172 0 1 0 NA 1060 NA NA
# 4 1487422 1 0 1 990 NA 1113 0
# 5 1511165 1 1 0 424 721 NA 0
Note: unused_fn = first groups by the id_cols column (deidnum) and then summarises the unused column (MI_COMPLICATED) with first(), assuming the data are already sorted by EVENTDT.
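To make the note above concrete, here is a minimal sketch of how unused_fn behaves. The toy data frame and its column names (id, event, time, flag) are made up for illustration and are not the OP's data; unused_fn needs a reasonably recent tidyr (1.2.0 or later, as far as I know).

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data: patient 1 has two "A" events;
# `flag` is neither an id, names, nor values column, so it is "unused"
toy <- data.frame(
  id    = c(1, 1, 2),
  event = c("A", "A", "B"),
  time  = c(9, 5, 3),
  flag  = c(0, 1, NA)
)

res <- toy %>%
  arrange(id, time) %>%                 # sort so that first() means "earliest"
  pivot_wider(id_cols = id,
              names_from = event,
              values_from = time,
              values_fn = min,           # earliest time per id/event pair
              unused_fn = first)         # first `flag` value within each id

res
```

Without the arrange() step, first() would simply take whichever row happens to come first for each id, which is why the sorting assumption matters.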
Comments
1 upvote
Mark
7/23/2023
Thanks for pulling me up on that! I completely missed it, Darren 😅
1 upvote
Mohamed Rahouma
7/23/2023
@DarrenTsai Thanks for all your efforts. Upvoted.
2 upvotes
jay.sf
7/23/2023
#2
merge it with a base reshape call.
reshape2::dcast(df, deidnum ~ eventc, value.var='MI_COMPLICATED', fun=length) |>
merge(reshape(df, idvar='deidnum', timevar='eventc', direction='wide')) |>
suppressWarnings() ## warns for more than one event which is acc. to OP fine
# deidnum DEATH MI New Rose Dyspnea Scale 2 or more EVENTDT.MI MI_COMPLICATED.MI
# 1 325107 0 1 1 21 1
# 2 418351 0 0 1 NA NA
# 3 839172 0 0 2 NA NA
# 4 1487422 1 1 0 990 0
# 5 1511165 0 2 1 424 0
# EVENTDT.New Rose Dyspnea Scale 2 or more MI_COMPLICATED.New Rose Dyspnea Scale 2 or more
# 1 1468 NA
# 2 207 NA
# 3 1060 NA
# 4 NA NA
# 5 721 NA
# EVENTDT.DEATH MI_COMPLICATED.DEATH
# 1 NA NA
# 2 NA NA
# 3 NA NA
# 4 1113 NA
# 5 NA NA
Comments
0 upvotes
Mohamed Rahouma
7/23/2023
Thank you very much for your reply, I appreciate it. I am not sure why it gave us these headers: MI_COMPLICATED.New Rose Dyspnea Scale 2 or more, EVENTDT.DEATH, MI_COMPLICATED.DEATH. Any input is appreciated.
0 upvotes
jay.sf
7/23/2023
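@MohamedRahouma Besides the id and time columns (i.e. deidnum and eventc), we also have EVENTDT and MI_COMPLICATED, so the headers (aka column names) need a prefix to avoid duplicates. The other answer seems to be missing a column.
0 upvotes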
Mohamed Rahouma
7/23/2023
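Thanks for your guidance. Would it not work to put deidnum as value.var instead of MI_COMPLICATED, just to avoid any confusion in the header names, since my dataset has many columns? Thanks, upvoted your answer.
0 upvotes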
jay.sf
7/23/2023
@MohamedRahouma The value variables are EVENTDT and MI_COMPLICATED; please choose from those. deidnum is your index variable.
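The naming discussed in these comments can be seen on a small example. This is only a sketch on made-up data (the columns id, event, time and flag are hypothetical): base reshape() spreads every column that is not the id or time variable, and names each result column as the original name plus the event label. Subsetting the columns beforehand is one way to keep the headers short when the dataset has many columns.

```r
# Hypothetical toy data with two value columns (time, flag)
toy <- data.frame(
  id    = c(1, 1, 2),
  event = c("A", "B", "A"),
  time  = c(5, 9, 3),
  flag  = c(1, NA, 0)
)

# Every non-id, non-time column is spread and suffixed with the event label
wide <- reshape(toy, idvar = "id", timevar = "event", direction = "wide")
names(wide)

# Keeping only the columns of interest first gives fewer, cleaner headers
wide_time <- reshape(toy[c("id", "event", "time")],
                     idvar = "id", timevar = "event", direction = "wide")
names(wide_time)
```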
1 upvote
TarJae
7/23/2023
#3
Here is a tidier approach:
library(dplyr)
library(tidyr)

df %>%
  select(deidnum, eventc) %>%
  summarise(n = n(), .by = c(deidnum, eventc)) %>%
  pivot_wider(names_from = eventc, values_from = n, names_prefix = "", values_fill = 0) %>%
  left_join(df %>%
              group_by(deidnum, eventc) %>%
              filter(EVENTDT == min(EVENTDT)) %>%
              ungroup() %>%
              pivot_wider(names_from = eventc,
                          values_from = c(EVENTDT, MI_COMPLICATED),
                          names_sep = "_") %>%
              arrange(deidnum) %>%
              select(1:5), by = "deidnum"
  )
deidnum MI `New Rose Dyspnea Scale 2 or more` DEATH EVENTDT_MI `EVENTDT_New Rose Dyspnea Scale 2 or more` EVENTDT_DEATH MI_COMPLICATED_MI
<int> <int> <int> <int> <int> <int> <int> <int>
1 325107 1 1 0 21 1468 NA 1
2 418351 0 1 0 NA 207 NA NA
3 839172 0 2 0 NA 1060 NA NA
4 1487422 1 0 1 990 NA 1113 0
5 1511165 2 1 0 424 721 NA 0
Comments
1 upvote
Mohamed Rahouma
7/25/2023
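Thank you for your valuable reply. I appreciate it. Upvoted.
Comments
839172 has 2 "New Rose" events and 1511165 has 2 "MI" events, so their event counts should be 2. Why are they 1 in the expected output?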
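If the goal is to keep the full event count while still reporting only the earliest time, one possible sketch is to summarise both per patient and event before widening. The data below are a cut-down, hypothetical copy of the example ("Rose" is a shortened label I made up for the long event name), and the column names n and first_time are my own choice, not something from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the example data, with a shortened event label
events <- data.frame(
  deidnum = c(839172, 839172, 1511165, 1511165, 1511165),
  eventc  = c("Rose", "Rose", "MI", "MI", "Rose"),
  EVENTDT = c(1060, 1718, 424, 608, 721)
)

per_patient <- events %>%
  group_by(deidnum, eventc) %>%
  summarise(n = n(),                      # how many times the event occurred
            first_time = min(EVENTDT),    # earliest event time
            .groups = "drop") %>%
  pivot_wider(names_from = eventc,
              values_from = c(n, first_time),
              values_fill = list(n = 0))  # missing events count as 0; times stay NA

per_patient
```

With this layout the count columns (n_MI, n_Rose) show 2 where an event happened twice, while the first_time_* columns hold only the earliest time.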