Applying function within a loop to a list of dataframes

Asked by: Elizabeth G   Asked: 5/24/2023   Modified: 5/24/2023   Viewed: 14 times

Question:

I'm trying to run a loop that contains a function over a list of data frames. My data are structured as follows:

models <-

  var1   var2   var3
1 "temp" "pres" "dis"
2 "sal"  "temp" "gage"
3 "dis"  "temp" "gage"
4 "pres" "temp" "gage"
5 "sal"  "dis"  "gage"
6 "sal"  "temp" "pres"

data <-

y  "temp" "pres" "dis" "sal" "gage" "Delay"
5  15.5   0.3    1500  0.01  1      0
1  15.6   0.1    1700  0.03  3      0
3  15.7   0.0    1450  0.01  5      0
4  15.9   0.2    1560  0.02  4      0
5  15.5   0.3    1500  0.01  1      1
1  15.6   0.1    1700  0.03  3      1
3  15.7   0.0    1450  0.01  5      1
4  15.9   0.2    1560  0.02  4      1

The function selects the columns of "data" that match the strings in row i of the "models" data frame, then runs each model and returns summary statistics.

library(tidyverse)
runningmodels <- function(df){
  var1 <- models[[i,1]] #selects the variable
  var2 <- models[[i,2]]
  var3 <- models[[i,3]]
  cols <- c("y",var1, var2, var3, "Delay") #makes a list of the variable strings
  df <- data %>% select(all_of(cols)) #selects columns in data that match with cols
  df <- df %>% rename(
    y=1,
    x1=2,
    x2=3,
    x3=4) #renames the columns to a dummy variable so I can apply this across multiple models
  lm <- lm(y ~ x1*x2*x3, data=df) #running the model
  l <- as.list(summary(lm)) #getting summary data
  dat <- tibble(model=models[[i,4]], r2=l$r.squared, Delay=df[[1,5]]) #pasting what model has been run based on the models dataframe and pasting what delay was used
  return(dat)
}

This function (runningmodels) is run for every row of "models" with the following loop:

models_summary <- list()
for(i in 1:nrow(models)){
  tryCatch({
    models_summary[[i]] <- runningmodels(models)}, error=function(e){cat("Error")})
}
models_summary <- do.call("rbind",models_summary)
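
For comparison, a minimal base-R sketch of the function plus loop above, written so that the function depends only on its arguments: the data frame and the row index are passed in explicitly instead of being read from the global `data` and `i`. The helper name `run_model_row` is hypothetical, and the fake data from the end of the question is rebuilt here as plain data frames so the block runs without tidyverse.

```r
# Fake data from the question, as base data frames (character columns, not factors)
models <- data.frame(
  var1  = c("temp", "sal", "dis", "pres", "sal", "sal"),
  var2  = c("pres", "temp", "temp", "temp", "dis", "temp"),
  var3  = c("dis", "gage", "gage", "gage", "gage", "pres"),
  model = paste0("model-", 1:6),
  stringsAsFactors = FALSE
)
data <- data.frame(
  y     = rep(c(5, 1, 2, 4, 8, 6, 2, 3), times = 10),
  temp  = rep(c(15.5, 15.6, 15.7, 15.8, 15.9, 16.0, 16.1, 16.2), times = 10),
  pres  = rep(c(0.3, 0.1, 0.0, 0.2, 0.4, 0.5, 0.6, 0.7), times = 10),
  dis   = rep(c(1500, 1810, 1800, 1560, 1540, 1700, 1450, 1510), times = 10),
  sal   = rep(c(0.01, 0.03, 0.02, 0.04, 0.07, 0.06, 0.00, 0.08), times = 10),
  gage  = rep(c(1, 2, 5, 3, 4, 8, -3, -2), times = 10),
  Delay = rep(0:1, each = 40)
)

# Hypothetical helper: fit model i of `models` on the data frame `df` that is passed in
run_model_row <- function(df, models, i) {
  vars <- c(models$var1[i], models$var2[i], models$var3[i])  # predictor names for row i
  sub <- df[, c("y", vars)]                 # keep y plus the three predictors
  names(sub) <- c("y", "x1", "x2", "x3")    # generic names so one formula fits every model
  fit <- lm(y ~ x1 * x2 * x3, data = sub)
  data.frame(model = models$model[i],
             r2    = summary(fit)$r.squared,
             Delay = df$Delay[1])           # Delay of the rows that were passed in
}

# Every model on one data frame; no global `i` is involved
all_models <- do.call(rbind, lapply(seq_len(nrow(models)),
                                    function(i) run_model_row(data, models, i)))
```

Because `i` is now an argument, the same call works unchanged on any subset of `data`.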

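I can run the loop containing the runningmodels function on my dataset, but I want to do this separately for each Delay. It works when I manually split each delay into its own data frame (as below):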

data1 <- data %>% filter(Delay == 0) 
data2 <- data %>% filter(Delay == 1)

and then run the loop on each data frame (data1, data2). However, my actual dataset has 6 lists, so this gets tedious and I'd like to avoid it.

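I also tried incorporating lapply. After splitting the data, it runs model-1 for each Delay list, but it does not go through the whole loop over all the models: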

data_split <- split(data, data$Delay)
x <- lapply(data_split, runningmodels) ##this produces just the results of model-1 for each delay
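
`lapply()` calls the function once per element of `data_split`, so every model is covered only if the function it calls iterates over the rows of `models` itself. A minimal base-R sketch of that nesting, assuming a hypothetical helper `run_model_row(df, models, i)` that fits one model on the data frame it is given: the outer `lapply()` walks the Delay groups and the inner one walks the rows of `models`. The question's fake data is rebuilt as plain data frames.

```r
# Question's fake data as base data frames
models <- data.frame(var1 = c("temp", "sal", "dis", "pres", "sal", "sal"),
                     var2 = c("pres", "temp", "temp", "temp", "dis", "temp"),
                     var3 = c("dis", "gage", "gage", "gage", "gage", "pres"),
                     model = paste0("model-", 1:6), stringsAsFactors = FALSE)
data <- data.frame(y     = rep(c(5, 1, 2, 4, 8, 6, 2, 3), times = 10),
                   temp  = rep(c(15.5, 15.6, 15.7, 15.8, 15.9, 16.0, 16.1, 16.2), times = 10),
                   pres  = rep(c(0.3, 0.1, 0.0, 0.2, 0.4, 0.5, 0.6, 0.7), times = 10),
                   dis   = rep(c(1500, 1810, 1800, 1560, 1540, 1700, 1450, 1510), times = 10),
                   sal   = rep(c(0.01, 0.03, 0.02, 0.04, 0.07, 0.06, 0.00, 0.08), times = 10),
                   gage  = rep(c(1, 2, 5, 3, 4, 8, -3, -2), times = 10),
                   Delay = rep(0:1, each = 40))

# Hypothetical helper: fit model i on whichever data frame is passed in
run_model_row <- function(df, models, i) {
  sub <- df[, c("y", models$var1[i], models$var2[i], models$var3[i])]
  names(sub) <- c("y", "x1", "x2", "x3")
  fit <- lm(y ~ x1 * x2 * x3, data = sub)
  data.frame(model = models$model[i], r2 = summary(fit)$r.squared, Delay = df$Delay[1])
}

data_split <- split(data, data$Delay)   # one data frame per Delay value

# Outer lapply: Delay groups; inner lapply: every row of `models`
per_delay <- lapply(data_split, function(d) {
  do.call(rbind, lapply(seq_len(nrow(models)), function(i) run_model_row(d, models, i)))
})
models_all <- do.call(rbind, per_delay)  # 6 models x 2 delays = 12 rows
```

The same code handles any number of Delay levels, since `split()` creates one group per distinct value.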

The closest I've gotten is wrapping the loop in a second function, but that produces a list of NULL outputs.

modelruns <- function(x) {
  for(i in 1:nrow(models)) { 
    tryCatch({
      df <- runningmodels(models)
      models_summary[[i]] <- df
    }, error=function(e){cat("ERROR")})
  } # loop over every row of models
}
models_all <- lapply(data_split, modelruns)
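
Two general R behaviours are consistent with the `NULL`s, shown in a minimal, self-contained sketch that is unrelated to the models themselves (`f`, `g`, `h` and `results` are hypothetical names). A function returns the value of its last expression, and a `for` loop evaluates to `NULL`. In addition, `x[[i]] <- value` inside a function, where `x` lives in the global environment, modifies a local copy rather than the global object.

```r
# Like modelruns(): the for loop is the last expression, so the function returns NULL
f <- function() {
  out <- list()
  for (i in 1:3) out[[i]] <- i * 10
}

# Same loop, but the filled list is returned explicitly after it
g <- function() {
  out <- list()
  for (i in 1:3) out[[i]] <- i * 10
  out
}

# Assigning into a global list from inside a function only changes a local copy
results <- list()
h <- function() {
  for (i in 1:3) results[[i]] <- i * 10
  results
}

is.null(f())      # TRUE
length(g())       # 3
length(h())       # 3
length(results)   # 0: the global list is unchanged
```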

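I suspect I'm combining the loop, the function, and the list in the wrong order. I've tried other variations (a single function, a single loop, several of each, etc.) without success. Any suggestions?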

Fake data:

models <- tibble(var1 = c("temp","sal","dis","pres","sal","sal"),
                 var2 = c("pres","temp","temp","temp","dis","temp"),
                 var3 = c("dis","gage","gage","gage","gage","pres"),
                 model = c("model-1","model-2","model-3","model-4","model-5","model-6"))
data <- tibble(y = rep(c(5,1,2,4,8,6,2,3),times=10),
               temp = rep(c(15.5, 15.6, 15.7, 15.8, 15.9, 16.0, 16.1, 16.2),times=10),
               pres = rep(c(0.3,0.1,0.0,0.2,0.4,0.5,0.6,0.7),times=10),
               dis = rep(c(1500,1810,1800,1560,1540,1700,1450,1510),times=10),
               sal = rep(c(0.01,0.03,0.02,0.04,0.07,0.06,0.00,0.08),times=10),
               gage = rep(c(1,2,5,3,4,8,-3,-2),times=10),
               Delay = rep(c(0:1),each=40))
Tags: list, function, loops, nested, lapply

Answers: none yet