Asked by: Elizabeth G Asked: 5/24/2023 Updated: 5/24/2023 Views: 14
Applying function within a loop to a list of dataframes
Q:
I am trying to run a loop that contains a function over a list of data frames. My data are structured as follows:
models <-
var1 | var2 | var3 | model |
---|---|---|---|
"temp" | "pres" | "dis" | "model-1" |
"sal" | "temp" | "gage" | "model-2" |
"dis" | "temp" | "gage" | "model-3" |
"pres" | "temp" | "gage" | "model-4" |
"sal" | "dis" | "gage" | "model-5" |
"sal" | "temp" | "pres" | "model-6" |
data <-
y | temp | pres | dis | sal | gage | Delay |
---|---|---|---|---|---|---|
5 | 15.5 | 0.3 | 1500 | 0.01 | 1 | 0 |
1 | 15.6 | 0.1 | 1700 | 0.03 | 3 | 0 |
3 | 15.7 | 0.0 | 1450 | 0.01 | 5 | 0 |
4 | 15.9 | 0.2 | 1560 | 0.02 | 4 | 0 |
5 | 15.5 | 0.3 | 1500 | 0.01 | 1 | 1 |
1 | 15.6 | 0.1 | 1700 | 0.03 | 3 | 1 |
3 | 15.7 | 0.0 | 1450 | 0.01 | 5 | 1 |
4 | 15.9 | 0.2 | 1560 | 0.02 | 4 | 1 |
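The function selects the columns in "data" that match the strings in each row i of the data frame "models", then runs each model and returns summary statistics.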
library(tidyverse)
runningmodels <- function(df){
  var1 <- models[[i,1]] #selects the variable
  var2 <- models[[i,2]]
  var3 <- models[[i,3]]
  cols <- c("y",var1, var2, var3, "Delay") #makes a list of the variable strings
  df <- data %>% select(cols) #selects columns in data that match with cols
  df <- df %>% rename(
    y=1,
    x1=2,
    x2=3,
    x3=4) #renames the columns to a dummy variable so I can apply this across multiple models
  lm <- lm(y ~ x1*x2*x3, data=df) #running the model
  l <- as.list(summary(lm)) #getting summary data
  dat <- tibble(model=models[[i,4]], r2=l$r.squared, Delay=df[[1,5]]) #pasting what model has been run based on the models dataframe and pasting what delay was used
  return(dat)
}
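Note that runningmodels, as written, never uses its df argument: the line df <- data %>% select(cols) overwrites it with columns from the global data, and i is read from the global environment as well. Below is a minimal sketch of a variant that takes everything it needs as arguments. The name fit_one_model and its parameters are hypothetical (not part of the original code), it uses base R only, and it builds the formula from the variable names instead of renaming columns.

```r
# Fake data from the question, rebuilt as base-R data frames.
models <- data.frame(
  var1  = c("temp", "sal", "dis", "pres", "sal", "sal"),
  var2  = c("pres", "temp", "temp", "temp", "dis", "temp"),
  var3  = c("dis", "gage", "gage", "gage", "gage", "pres"),
  model = paste0("model-", 1:6),
  stringsAsFactors = FALSE
)
data <- data.frame(
  y     = rep(c(5, 1, 2, 4, 8, 6, 2, 3), times = 10),
  temp  = rep(c(15.5, 15.6, 15.7, 15.8, 15.9, 16.0, 16.1, 16.2), times = 10),
  pres  = rep(c(0.3, 0.1, 0.0, 0.2, 0.4, 0.5, 0.6, 0.7), times = 10),
  dis   = rep(c(1500, 1810, 1800, 1560, 1540, 1700, 1450, 1510), times = 10),
  sal   = rep(c(0.01, 0.03, 0.02, 0.04, 0.07, 0.06, 0.00, 0.08), times = 10),
  gage  = rep(c(1, 2, 5, 3, 4, 8, -3, -2), times = 10),
  Delay = rep(0:1, each = 40)
)

# Hypothetical helper: fit the model described by row `i` of `models_df`
# to the data frame `df` that is passed in (no globals are read).
fit_one_model <- function(df, models_df, i) {
  vars <- as.character(unlist(models_df[i, c("var1", "var2", "var3")]))
  # e.g. y ~ sal * temp * gage for row 2
  form <- as.formula(paste("y ~", paste(vars, collapse = " * ")))
  fit <- lm(form, data = df)
  data.frame(
    model = models_df$model[i],
    # the fake data repeat 8 unique rows, so summary() may warn about the fit
    r2    = suppressWarnings(summary(fit)$r.squared),
    Delay = df$Delay[1]
  )
}

# Usage: model-2 fitted to the Delay == 1 rows only.
res <- fit_one_model(data[data$Delay == 1, ], models, 2)
```

Because the data and the row index are both arguments, the same function can be called on any subset without depending on what i or data happen to be in the global environment.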
This function (runningmodels) is run for every row in "models" using the following loop:
models_summary <- list()
for(i in 1:nrow(models)){
  tryCatch({
    models_summary[[i]] <- runningmodels(models)}, error=function(e){cat("Error")})
}
models_summary <- do.call("rbind",models_summary)
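One thing to be aware of in the loop above is that cat("Error") discards the actual error message, so a failing model is hard to diagnose. The sketch below shows one possible pattern for reporting the message and still combining the successful results. risky_step is a hypothetical stand-in for the model-fitting call, not a function from the question.

```r
# Hypothetical stand-in for a model-fitting step; it fails for one input.
risky_step <- function(i) {
  if (i == 2) stop("no such column")
  data.frame(id = i, value = i^2)
}

results <- lapply(seq_len(3), function(i) {
  tryCatch(
    risky_step(i),
    error = function(e) {
      # Report which iteration failed and why, then return NULL for it.
      message("Iteration ", i, " failed: ", conditionMessage(e))
      NULL
    }
  )
})

# Drop the failed (NULL) entries before stacking the rest into one data frame.
summary_tbl <- do.call(rbind, Filter(Negate(is.null), results))
```

Using seq_len(n) rather than 1:n also avoids iterating over c(1, 0) when n is 0.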
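I am able to run the loop containing the runningmodels function on my dataset, but I would like to do this separately for each Delay. I can do this successfully when I manually parse each delay into its own data frame (as shown below):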
data1 <- data %>% filter(Delay == 0)
data2 <- data %>% filter(Delay == 1)
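and run the loop on each data frame (data1, data2). However, my actual dataset has 6 lists, so this is tedious and something I would like to avoid.
I have also tried incorporating lapply, which runs model-1 for each delay's list, but after the data are split it does not go through the whole loop over all the models:
data_split <- split(data, data$Delay)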
x <- lapply(data_split, runningmodels) ##this produces just the results of model-1 for each delay
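A likely explanation for getting the same model for every delay is scoping: lapply passes each piece to the function, but a function that ignores its argument and reads a global index will return the same thing each time. The toy sketch below illustrates this. All names in it are hypothetical, and the built-in mtcars dataset stands in for data_split.

```r
i <- 1                                   # global index left over from earlier code
labels <- c("model-1", "model-2", "model-3")

# Ignores its argument `df` and reads the globals `labels` and `i` instead.
ignores_argument <- function(df) {
  labels[i]
}

# Uses the data it is given and takes the index as an explicit argument.
uses_arguments <- function(df, i) {
  paste(labels[i], "on", nrow(df), "rows")
}

pieces <- split(mtcars, mtcars$cyl)      # three pieces: cyl 4, 6 and 8

# Every piece yields the same label, because `i` never changes.
same_every_time <- lapply(pieces, ignores_argument)

# Outer lapply over the pieces, inner lapply over the model index.
per_piece <- lapply(pieces, function(d) {
  lapply(seq_along(labels), function(j) uses_arguments(d, j))
})
```

Here per_piece is a list with one element per piece, each holding one result per label, which is the nesting the question is after (every model for every delay).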
The closest I have come is creating a second function that contains the loop, but this results in a list of NULL outputs.
modelruns <- function(x) {
  for(i in 1:nrow(models)) {
    tryCatch({
      df <- runningmodels(models)
      models_summary[[i]] <- df
    }, error=function(e){cat("ERROR")})
  }#loop to repeat resampling n times
}
models_all <- lapply(data_split, modelruns)
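The NULL results are consistent with how R functions return values: a function returns the value of its last expression, a for loop evaluates to NULL, and the assignment to models_summary inside modelruns only changes a local copy. A minimal sketch of one way to restructure this follows, assuming the fake data from the question. fit_one_model and run_all_models are hypothetical names, and the sketch uses base R rather than the tidyverse.

```r
# Fake data from the question, rebuilt as base-R data frames.
models <- data.frame(
  var1  = c("temp", "sal", "dis", "pres", "sal", "sal"),
  var2  = c("pres", "temp", "temp", "temp", "dis", "temp"),
  var3  = c("dis", "gage", "gage", "gage", "gage", "pres"),
  model = paste0("model-", 1:6),
  stringsAsFactors = FALSE
)
data <- data.frame(
  y     = rep(c(5, 1, 2, 4, 8, 6, 2, 3), times = 10),
  temp  = rep(c(15.5, 15.6, 15.7, 15.8, 15.9, 16.0, 16.1, 16.2), times = 10),
  pres  = rep(c(0.3, 0.1, 0.0, 0.2, 0.4, 0.5, 0.6, 0.7), times = 10),
  dis   = rep(c(1500, 1810, 1800, 1560, 1540, 1700, 1450, 1510), times = 10),
  sal   = rep(c(0.01, 0.03, 0.02, 0.04, 0.07, 0.06, 0.00, 0.08), times = 10),
  gage  = rep(c(1, 2, 5, 3, 4, 8, -3, -2), times = 10),
  Delay = rep(0:1, each = 40)
)

# Hypothetical helper: fit the model in row `i` of `models_df` to `df`.
fit_one_model <- function(df, models_df, i) {
  vars <- as.character(unlist(models_df[i, c("var1", "var2", "var3")]))
  fit <- lm(as.formula(paste("y ~", paste(vars, collapse = " * "))), data = df)
  data.frame(model = models_df$model[i],
             r2    = suppressWarnings(summary(fit)$r.squared),
             Delay = df$Delay[1])
}

# Hypothetical wrapper: run every model on one data frame and RETURN the result.
run_all_models <- function(df, models_df) {
  out <- vector("list", nrow(models_df))       # local list, filled in the loop
  for (i in seq_len(nrow(models_df))) {
    res <- tryCatch(
      fit_one_model(df, models_df, i),
      error = function(e) {
        message("Model ", i, " failed: ", conditionMessage(e))
        NULL
      }
    )
    if (!is.null(res)) out[[i]] <- res         # skip failures, keep the slot NULL
  }
  # Last expression is the return value: one row per successfully fitted model.
  do.call(rbind, Filter(Negate(is.null), out))
}

data_split <- split(data, data$Delay)
models_all <- do.call(rbind, lapply(data_split, run_all_models, models_df = models))
```

With the fake data this gives one row per model and delay. Splitting on Delay means a dataset with six delays needs no extra code.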
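I suspect I am ordering the loop + function + list incorrectly. I have tried other iterations (a single function, a single loop, multiples of each, and so on), but to no avail. Any suggestions?
Fake data: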
models <- tibble(var1 = c("temp","sal","dis","pres","sal","sal"),
var2 = c("pres","temp","temp","temp","dis","temp"),
var3 = c("dis","gage","gage","gage","gage","pres"),
model = c("model-1","model-2","model-3","model-4","model-5","model-6"))
data <- tibble(y = rep(c(5,1,2,4,8,6,2,3),times=10),
temp = rep(c(15.5, 15.6, 15.7, 15.8, 15.9, 16.0, 16.1, 16.2),times=10),
pres = rep(c(0.3,0.1,0.0,0.2,0.4,0.5,0.6,0.7),times=10),
dis = rep(c(1500,1810,1800,1560,1540,1700,1450,1510),times=10),
sal = rep(c(0.01,0.03,0.02,0.04,0.07,0.06,0.00,0.08),times=10),
gage = rep(c(1,2,5,3,4,8,-3,-2),times=10),
Delay = rep(c(0:1),each=40))
A: No answers yet