提问人:stats_noob 提问时间:11/23/2021 最后编辑:stats_noob 更新时间:11/23/2021 访问量:57
R:正确调用包含“i”的变量
R: Correctly calling a variable that contains `i`
问:
我正在使用 R 编程语言。我有以下代码,可以创建 100 个数据集(包含一个固定分量和一个随机分量):
a = rnorm(300,10,5)
b = rnorm(300,3,1)
c = rnorm(300,12,1)
e = "original"
d = data.frame(a,b,c,e)
results <- list()
for (i in 1:100){
a = rnorm(100,10,10)
b = rnorm(100,10,10)
c = rnorm(100,10,10)
e = "simulated"
d_i = data.frame(a,b,c,e)
data_i = rbind(d, d_i)
data_i$iteration = i
results[[i]] <- data_i
}
results_df <- do.call(rbind.data.frame, results)
目前,这 100 个数据集都放在同一个文件(“results_df”)中。现在,我想将“results_df”文件分解为以下 100 个数据集中的每一个(使用“迭代”列作为索引):
results_df$iteration = as.factor(results_df$iteration)
X<-split(results_df, results_df$iteration)
这个“X”文件似乎是一个“列表”,其中每个数据集都列出了 100 个数据集,如下所示:
我可以通过调用“索引”来访问这些文件中的每一个,例如i
> head(X$`1`)
a b c e iteration
1 2.141495 3.984072 12.73344 original 1
2 8.769269 4.267464 11.32566 original 1
3 5.413573 2.823608 12.06713 original 1
4 11.710470 3.710500 12.20715 original 1
5 14.423155 2.944178 10.56673 original 1
6 6.886629 2.843195 12.43074 original 1
> head(X$`2`)
a b c e iteration
401 2.141495 3.984072 12.73344 original 2
402 8.769269 4.267464 11.32566 original 2
403 5.413573 2.823608 12.06713 original 2
404 11.710470 3.710500 12.20715 original 2
405 14.423155 2.944178 10.56673 original 2
406 6.886629 2.843195 12.43074 original 2
> head(X$`98`)
a b c e iteration
38801 2.141495 3.984072 12.73344 original 98
38802 8.769269 4.267464 11.32566 original 98
38803 5.413573 2.823608 12.06713 original 98
38804 11.710470 3.710500 12.20715 original 98
38805 14.423155 2.944178 10.56673 original 98
38806 6.886629 2.843195 12.43074 original 98
我的问题:我现在想编写另一个函数,该函数对这 100 个数据集中的每一个执行线性回归,保存回归系数,并将它们放入单个文件中。我尝试为此编写代码:
results_1 <- list()
for (i in 1:100){
model_i <- lm(a ~ b +c, data = X$`i`)
coeff_i = model_i$coefficients
results_1[[i]] <- coeff_i
}
results_df_1 <- do.call(rbind.data.frame, results_1)
乍一看,这似乎有效 - 但这显示所有回归系数都相同。这是不可能的,因为回归模型在不同的数据集上运行了 100 次:
#for some reason, the column names have been corrupted
hist(results_df_1$c.14.5741211250235..14.5741211250235..14.5741211250235..14.5741211250235.., main = "first coeff")
hist(results_df_1$c..0.105285805666629...0.105285805666629...0.105285805666629.., main = "second coeff")
hist(results_df_1$c..0.236548691738492...0.236548691738492...0.236548691738492.., main = "third coeff")
有人可以帮我解决这个问题吗?当您在 R 中使用“split()”函数时,这是在将来的命令中“调用”“拆分组件”的正确方法吗?
model_i <- lm(a ~ b +c, data = X$`i`)
谢谢!
答:
0赞
stats_noob
11/23/2021
#1
我能够解决这个问题:
a = rnorm(300,10,5)
b = rnorm(300,3,1)
c = rnorm(300,12,1)
e = "original"
d = data.frame(a,b,c,e)
results <- list()
for (i in 1:100){
a = rnorm(100,10,10)
b = rnorm(100,10,10)
c = rnorm(100,10,10)
e = "simulated"
d_i = data.frame(a,b,c,e)
data_i = rbind(d, d_i)
data_i$iteration = i
results[[i]] <- data_i
}
results_df <- do.call(rbind.data.frame, results)
X<-split(results_df, results_df$iteration)
#####
results_1 <- list()
for (i in 1:100){
#here was the problem
model_i <- lm(a ~ b +c, data = X[[i]])
coeff_i = model_i$coefficients
results_1[[i]] <- model_i$coefficients
}
results_df_1 <- do.call(rbind.data.frame, results_1)
par(mfrow = c(1, 3))
hist(results_df_1[,1], main = "first coeff")
hist(results_df_1[,2], main = "second coeff")
hist(results_df_1[,3], main = "third coeff")
上一个:R:满足条件时停止循环
评论