根据多个列中的条件创建具有相同前缀的多个列

Create multiple columns with same prefix based on conditions from multiple columns

提问人:Yanting Luo 提问时间:11/14/2023 最后编辑:Yanting Luo 更新时间:11/15/2023 访问量:49

问:

我有这样的数据,包括患者的出生和死亡日期。

ID      DOB       Birth_Year    DOD      Death_Year
1    2016-10-01     2016     2019-10-15     2019
2    2017-07-01     2019     2022-01-10     2022
3    2017-04-35     2017     2020-08-15     2020

我需要为 2015 年至 2020 年的每一年创建跟进时间。如果出生年份的随访时间是出生日期与出生年份最后日期之间的差值。同样,死亡年份的随访时间是死亡年份的第一天到死亡日期之间的时间。否则,随访时间为零或一年。我期待一个输出数据帧,其中包含Year_2015 2020 年的新列,如下所示。

ID      DOB       Birth_Year    DOD      Death_Year Year_2015 Year_2016 Year_2017 Year_2018 Year_2019 Year_2020
1    2016-10-01     2016     2019-06-30     2019        0        0.25       1         1        0.5        0
2    2017-07-01     2019     2022-01-10     2022        0         0        0.5        1         1         1
3    2017-04-15     2017     2020-08-15     2020        0         0        0.3        1         1        0.8

我尝试在 dplyr 包中使用 case_when,或使用 if else 语句循环函数。首先,我成功地创建了具有相同前缀的列,Year_2015要Year_2020,但未能编写基于多个条件遍历列的循环。我在循环中引用不同的列名时遇到了问题。或者也许有一种方法可以在 R 中使用 apply 函数。任何帮助都是值得赞赏的!

for (i in 2015:2020) {
  FoUp_Year <- paste0('Year_', i) 
  df[, Birth_Year_end] <- make_date(year = Birth_Year, month = 12, day = 31)
  df[, Death_Year_start] <- make_date(year = Death_Year, month = 1, day = 1)
  if (i<df[, Birth_Year] | i>df[, Death_Year]) { 
    df[,FoUp_Year] <- 0
  }
  else if(i==df[, Birth_Year] && i<df[, Death_Year]) {
    df[,FoUp_Year] <- df[, Birth_Year_end]-df[,DOB]
  }
  else if(i==df[, Death_Year] && i>df[, Birth_Year]) {
    df[,FoUp_Year] <- df[, DOD]-df[,Death_Year_start]
  }
  else if(i==df[, Birth_Year] && i==df[, Death_Year]) {
    df[,FoUp_Year] <- df[, DOD]-df[,DOB]
  }
  else if(i>df[, Birth_Year] && i<df[, Death_Year]) {
    df[,FoUp_Year] <- 1
  }
}
R 数据库 循环条件 语句 前缀

评论


答:

0赞 marcguery 11/15/2023 #1

该套餐将计算日期之间的时差。然后,日期之间的月数(或天数,如果您想要更精确)可以除以一年中的总月数(或天数,也要考虑闰年)。lubridate

library(lubridate) #To calculate time intervals between dates

## Convert to Date format
dob_dod$DOB <- as.Date(dob_dod$DOB)
dob_dod$DOD <- as.Date(dob_dod$DOD)
## Years to process
mindate <- min(dob_dod$DOB)
maxdate <- max(dob_dod$DOD)
all_years <- as.Date(paste0(c(year(mindate):year(maxdate)),"-01-01"))

## Fraction of the years between 'DOB' and 'DOD', precision at the month level
frac_years_month <- data.frame(t(apply(dob_dod[,c("DOB","DOD")],1,function(x){
  time_period <- rep(0,length.out = length(all_years))
  yob <- year(x[1])
  yod <- year(x[2])
  yob_rank <- which(year(all_years)==yob)
  yod_rank <- which(year(all_years)==yod)
  time_period[yob_rank] <- (12-(interval(all_years[yob_rank], x[1]) %/% months(1)))/12
  time_period[yod_rank] <- (interval(all_years[yod_rank], x[2]) %/% months(1))/12
  time_period[c((yob_rank+1):(yod_rank-1))] <- 1
  return(time_period)
})))
colnames(frac_years_month) <- paste0("Year_",year(all_years))

## Fraction of the years between 'DOB' and 'DOD', precision at the day level
frac_years_day <- data.frame(t(apply(dob_dod[,c("DOB","DOD")],1,function(x){
  time_period <- rep(0, length.out = length(all_years))
  yob <- year(x[1])
  yod <- year(x[2])
  yob_rank <- which(year(all_years)==yob)
  yod_rank <- which(year(all_years)==yod)
  yob_days <- ifelse(leap_year(yob), 366, 365)
  yod_days <- ifelse(leap_year(yod), 366, 365)
  time_period[yob_rank] <- (yob_days-(interval(all_years[yob_rank], x[1]) %/% days(1)))/yob_days
  time_period[yod_rank] <- (interval(all_years[yod_rank], x[2]) %/% days(1))/yod_days
  time_period[c((yob_rank+1):(yod_rank-1))] <- 1
  return(time_period)
})))
colnames(frac_years_day) <- paste0("Year_",year(all_years))

## Use frac_years_month or frac_years_day depending on the level of precision desired
dob_dod <- cbind(dob_dod,frac_years_month)

月级精度:

ID  DOB         Birth_Year  DOD         Death_Year  Year_2016  Year_2017  Year_2018  Year_2019  Year_2020  Year_2021  Year_2022
 1  2016-10-01        2016  2019-10-15        2019       0.25       1             1       0.75     0               0          0
 2  2017-07-01        2019  2022-01-10        2022       0          0.5           1       1        1               1          0
 3  2017-04-15        2017  2020-08-15        2020       0          0.75          1       1        0.5833          0          0

日级精度:

ID  DOB         Birth_Year  DOD         Death_Year  Year_2016  Year_2017  Year_2018  Year_2019  Year_2020  Year_2021  Year_2022
 1  2016-10-01        2016  2019-10-15        2019     0.2514     1               1     0.7863     0               0     0
 2  2017-07-01        2019  2022-01-10        2022     0          0.5041          1     1          1               1     0.0247
 3  2017-04-15        2017  2020-08-15        2020     0          0.7151          1     1          0.6202          0     0

原始数据

dob_dod <- read.table(textConnection("ID      DOB       Birth_Year    DOD      Death_Year
1    2016-10-01     2016     2019-10-15     2019
2    2017-07-01     2019     2022-01-10     2022
3    2017-04-15     2017     2020-08-15     2020"),
                   h = T)