Applying function within a loop to a list of dataframes

Question:

I am trying to run a loop containing a function over a list of dataframes. My data are structured as follows:

models <-

  var1    var2    var3    model
1 "temp"  "pres"  "dis"   "model-1"
2 "sal"   "temp"  "gage"  "model-2"
3 "dis"   "temp"  "gage"  "model-3"
4 "pres"  "temp"  "gage"  "model-4"
5 "sal"   "dis"   "gage"  "model-5"
6 "sal"   "temp"  "pres"  "model-6"

data <-

y  temp  pres  dis   sal   gage  Delay
5  15.5  0.3   1500  0.01  1     0
1  15.6  0.1   1700  0.03  3     0
3  15.7  0.0   1450  0.01  5     0
4  15.9  0.2   1560  0.02  4     0
5  15.5  0.3   1500  0.01  1     1
1  15.6  0.1   1700  0.03  3     1
3  15.7  0.0   1450  0.01  5     1
4  15.9  0.2   1560  0.02  4     1

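The function selects the columns in "data" that match the strings in each row i of the dataframe "models", then runs each model and returns summary statistics.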

library(tidyverse)
runningmodels <- function(df){
  var1 <- models[[i,1]] #selects the variable
  var2 <- models[[i,2]]
  var3 <- models[[i,3]]
  cols <- c("y",var1, var2, var3, "Delay") #makes a list of the variable strings
  df <- data %>% select(cols) #selects columns in data that match with cols
  df <- df %>% rename(
    y=1,
    x1=2,
    x2=3,
    x3=4) #renames the columns to a dummy variable so I can apply this across multiple models
  lm <- lm(y ~ x1*x2*x3, data=df) #running the model
  l <- as.list(summary(lm)) #getting summary data
  dat <- tibble(model=models[[i,4]], r2=l$r.squared, Delay=df[[1,5]]) #pasting what model has been run based on the models dataframe and pasting what delay was used
  return(dat)
}

This function (runningmodels) is run for each row of "models" using the following loop:

models_summary <- list()
for(i in 1:nrow(models)){
  tryCatch({
    models_summary[[i]] <- runningmodels(models)}, error=function(e){cat("Error")})
}
models_summary <- do.call("rbind",models_summary)
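
One thing to watch in the loop above: the error handler only prints "Error", so a failing model is hard to diagnose. A minimal sketch of reporting which iteration failed and why, using a made-up failing step rather than the real models (the `results` list and the failure at iteration 2 are purely illustrative):

```r
results <- list()
for (i in 1:3) {
  results[[i]] <- tryCatch({
    if (i == 2) stop("model ", i, " could not be fitted")  # made-up failure
    i^2                                                     # stand-in for a model result
  }, error = function(e) {
    # conditionMessage() extracts the text of the error that was raised
    message("Iteration ", i, " failed: ", conditionMessage(e))
    NA_real_  # placeholder so the slot for this iteration is kept
  })
}
```

Returning a placeholder from the handler keeps the positions in `results` aligned with the loop index, which makes it easier to see which model failed.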

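I am able to run the loop containing the runningmodels function on my dataset, but I would like to do this separately for each Delay. I can do this successfully when I manually subset each delay into its own dataframe (as below):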

data1 <- data %>% filter(Delay == 0) 
data2 <- data %>% filter(Delay == 1)

and run the loop on each dataframe (data1, data2). However, my actual dataset has 6 lists, so doing this by hand is tedious and I would like to avoid it.

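I have also tried incorporating lapply, which runs model-1 for each delay list but does not iterate through the whole loop of all models after splitting the data: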

data_split <- split(data, data$Delay)
x <- lapply(data_split, runningmodels) ##this produces just the results of model-1 for each delay
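
A possible reason the lapply call behaves this way: lapply passes each element of data_split to the function as its argument, but runningmodels as written reads the global `data` and the global `i` rather than its argument `df`. A toy illustration of the difference (the small data frame is made up, and `uses_global` and `uses_argument` are hypothetical helpers, not part of the code above):

```r
# Made-up data: 4 rows per delay
data <- data.frame(y = c(5, 1, 3, 4, 5, 1, 3, 4),
                   Delay = rep(0:1, each = 4))
data_split <- split(data, data$Delay)

# Ignores its argument and looks up `data` in the global environment
uses_global <- function(df) nrow(data)

# Works on whatever data frame lapply/sapply hands over
uses_argument <- function(df) nrow(df)

rows_global   <- sapply(data_split, uses_global)    # 8 for both delays
rows_argument <- sapply(data_split, uses_argument)  # 4 for both delays
```

The same reasoning applies to `i`: a function that takes the row index as an explicit argument does not depend on whatever value a previous loop left behind.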

The closest I have gotten is creating a second function containing the loop, but this results in a list of NULL outputs.

modelruns <- function(x) {
  for(i in 1:nrow(models)) { 
    tryCatch({
      df <- runningmodels(models)
      models_summary[[i]] <- df
    }, error=function(e){cat("ERROR")})
  }#loop to repeat resampling n times
}
models_all <- lapply(data_split, modelruns)
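
The NULL output is consistent with how R evaluates functions: a function returns the value of its last expression, a for loop evaluates to NULL, and assigning to models_summary inside the function only changes a local copy. A small sketch with hypothetical names (`results`, `loop_only`, `loop_and_return`) that isolates this behaviour:

```r
results <- list()  # global list, playing the role of models_summary

loop_only <- function(x) {
  for (i in 1:3) {
    results[[i]] <- x * i  # modifies a local copy, not the global list
  }
}                          # last expression is the loop, so the value is NULL

loop_and_return <- function(x) {
  out <- vector("list", 3)  # local container
  for (i in 1:3) {
    out[[i]] <- x * i
  }
  out                       # explicitly return the filled list
}

null_result <- loop_only(2)
list_result <- loop_and_return(2)
```

Here `null_result` is NULL and the global `results` is still empty, while `list_result` holds the three values.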

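I suspect that I am ordering the loop + function + list incorrectly, and I have tried other iterations with a single function, a single loop, multiple of each, etc., but to no avail. Any suggestions?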

Fake data:

models <- tibble(var1 = c("temp","sal","dis","pres","sal","sal"),
                 var2 = c("pres","temp","temp","temp","dis","temp"),
                 var3 = c("dis","gage","gage","gage","gage","pres"),
                 model = c("model-1","model-2","model-3","model-4","model-5","model-6"))
data <- tibble(y = rep(c(5,1,2,4,8,6,2,3),times=10),
               temp = rep(c(15.5, 15.6, 15.7, 15.8, 15.9, 16.0, 16.1, 16.2),times=10),
               pres = rep(c(0.3,0.1,0.0,0.2,0.4,0.5,0.6,0.7),times=10),
               dis = rep(c(1500,1810,1800,1560,1540,1700,1450,1510),times=10),
               sal = rep(c(0.01,0.03,0.02,0.04,0.07,0.06,0.00,0.08),times=10),
               gage = rep(c(1,2,5,3,4,8,-3,-2),times=10),
               Delay = rep(c(0:1),each=40))
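
One way the pieces could fit together is a nested lapply: the outer call iterates over the delays, the inner call over the rows of models. The sketch below is written in base R and rebuilds the fake data as plain data frames so that it runs on its own; `run_model` and `run_all_models` are hypothetical names, and this is only one possible arrangement, not a tested replacement for the code above.

```r
models <- data.frame(
  var1  = c("temp", "sal", "dis", "pres", "sal", "sal"),
  var2  = c("pres", "temp", "temp", "temp", "dis", "temp"),
  var3  = c("dis", "gage", "gage", "gage", "gage", "pres"),
  model = paste0("model-", 1:6),
  stringsAsFactors = FALSE
)
data <- data.frame(
  y     = rep(c(5, 1, 2, 4, 8, 6, 2, 3), times = 10),
  temp  = rep(c(15.5, 15.6, 15.7, 15.8, 15.9, 16.0, 16.1, 16.2), times = 10),
  pres  = rep(c(0.3, 0.1, 0.0, 0.2, 0.4, 0.5, 0.6, 0.7), times = 10),
  dis   = rep(c(1500, 1810, 1800, 1560, 1540, 1700, 1450, 1510), times = 10),
  sal   = rep(c(0.01, 0.03, 0.02, 0.04, 0.07, 0.06, 0.00, 0.08), times = 10),
  gage  = rep(c(1, 2, 5, 3, 4, 8, -3, -2), times = 10),
  Delay = rep(0:1, each = 40)
)

# Fit the model described by row i of `models` on the data frame `dat`.
# Everything the function needs is passed in as an argument.
run_model <- function(dat, i, models) {
  vars <- unlist(models[i, c("var1", "var2", "var3")], use.names = FALSE)
  df <- dat[, c("y", vars, "Delay")]
  names(df)[1:4] <- c("y", "x1", "x2", "x3")  # generic names for the formula
  fit <- lm(y ~ x1 * x2 * x3, data = df)
  data.frame(model = models$model[i],
             r2    = summary(fit)$r.squared,
             Delay = df$Delay[1])
}

# Inner iteration: all models for one delay, stacked into one data frame.
run_all_models <- function(dat, models) {
  rows <- lapply(seq_len(nrow(models)), function(i) run_model(dat, i, models))
  do.call(rbind, rows)
}

# Outer iteration: one call per delay, then stack the per-delay results.
data_split <- split(data, data$Delay)
models_all <- do.call(rbind, lapply(data_split, run_all_models, models = models))
rownames(models_all) <- NULL
```

With this layout, `models_all` has one row per combination of delay and model, and adding more delays only changes what split() returns.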

Tags: list, function, loops, nested, lapply
