提问人:flâneur 提问时间:10/5/2023 最后编辑:flâneur 更新时间:10/7/2023 访问量:85
plot_slopes无法在迭代函数中计算预测的回归模型值,但可以在函数外部预测每个值
plot_slopes cannot compute predicted regression model values in an iterative function, but can predict each one outside of the function
问:
这是一个很难回答的问题,因为我无法提供实际的复制数据(很抱歉,数据太大/敏感),但我很感激任何人的想法。我在这里使用示例数据来显示代码的结构,尽管此代码不会产生我在下面详述的错误。
我有一个复杂的函数,我执行多重回归以隔离交互效应,并使用精彩包中的函数将它们绘制在一起。我的代码结构如下。plot_slopes
marginaleffects
library(tidyverse)
library(marginaleffects)
outcome_var_list = c("mpg","cyl","hp")
interact_var_list = c("am","wt")
subset_var_list = c("3","4")
combos = expand.grid(outcome_var_list, subset_var_list)
model_ <- function(k,c){
mtcars = mtcars[mtcars$gear == c,]
outcome_var_list = outcome_var_list[outcome_var_list == k]
results = list()
for (r in interact_var_list) {
f = paste(k, "~", r, "*factor(vs)")
m = lm(f, subset(mtcars, carb %in% c(1,2,3,4)))
s = plot_slopes(m, variables = r, condition = "vs", draw = FALSE)
tmp = s[, c("estimate", "conf.low", "conf.high", "vs")]
tmp$outcome = k
tmp$regressor = r
results = c(results, list(tmp))
}
results = do.call("rbind", results)
plot1 = results %>%
mutate(min = min(conf.low), max = max(conf.high)) %>%
ggplot(aes(x=factor(vs),
y=estimate,
color = regressor,
ymin=conf.low,
ymax=conf.high)) +
geom_errorbar(position = position_dodge(0.4)) +
geom_point(position = position_dodge(0.4)) +
scale_x_discrete(expand = c(0,0)) +
theme_light() +
ggtitle(label = paste0("Model 1: ",k)) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y= "Interaction Coefficient", x = "X") +
theme(plot.title = element_textbox_simple(vjust=-1)) +
theme(plot.margin = margin(2,0,0,0, "cm")) +
theme(axis.text.x = element_text(size = 5))
ggsave(plot1,file=paste0("plot_",k,".png"),path=paste0(c))
}
safe_model <- safely(model_)
iterate <-mapply(safe_model,combos$Var1,combos$Var2)
当我在添加额外内容之前运行它时,一切都很好。subset_var_list
但是,当我添加子集 var 并切换到 mapply 时,我现在为整个数据子集生成模型,其中只有几秒可以生成要保存的图。我认为这是由于缺乏覆盖范围或其他原因,所以我删除了 ,这揭示了以下错误:outcome_var
safely
Error: Unable to compute predicted values with this model. You can try to supply a different dataset to the `newdata` argument.
If this does not work, you can file a report on the Github Issue Tracker:
https://github.com/vincentarelbundock/marginaleffects/issues
我再次认为这是由于缺乏覆盖率/变异性,因此我尝试手动运行每个单独的交互回归,而不是通过函数运行,并且每个图/回归都成功为其中一个子集生成,否则不会生成图。因此,这似乎不是模型/缺乏变化的问题。plot_slopes
我可能会误解错误消息中提出的论点是如何工作的,但似乎它具有执行任务的完整数据,这应该不会带来任何好处。回归/斜率模型可以单独/手动生成,而当放在这个函数中时,它们无法预测,这有什么原因吗?如何解决此问题并为每个子集创建所需的所有绘图?newdata
更新 2
我能够让它工作,但我仍然想知道问题是什么,希望是否有办法提高运行模型的速度。
我在 for 循环中过滤了模型,以删除结果变量的 NA。我这样做的理由是,我的数据集中的某些交互变量没有任何数据覆盖率,因此,当它们都是 NA 时,即使所有 NA 都在回归中被删除,它也会触发有关因子水平不足的消息,因此应该删除它们。在回归之前,我手动执行此操作,现在它可以工作了
我的新代码如下:
library(tidyverse)
library(marginaleffects)
outcome_var_list = c("mpg","cyl","hp")
interact_var_list = c("am","wt")
subset_var_list = c("3","4")
combos = expand.grid(outcome_var_list, subset_var_list)
model_ <- function(k,c){
mtcars = mtcars[mtcars$gear == c,]
outcome_var_list = outcome_var_list[outcome_var_list == k]
results = list()
for (r in interact_var_list) {
f = paste(k, "~", r, "*factor(vs)")
m = lm(f, mtcars %>% filter(is.na(k))
s = plot_slopes(m, variables = r, condition = "vs", draw = FALSE)
tmp = s[, c("estimate", "conf.low", "conf.high", "vs")]
tmp$outcome = k
tmp$regressor = r
results = c(results, list(tmp))
}
results = do.call("rbind", results)
plot1 = results %>%
mutate(min = min(conf.low), max = max(conf.high)) %>%
ggplot(aes(x=factor(vs),
y=estimate,
color = regressor,
ymin=conf.low,
ymax=conf.high)) +
geom_errorbar(position = position_dodge(0.4)) +
geom_point(position = position_dodge(0.4)) +
scale_x_discrete(expand = c(0,0)) +
theme_light() +
ggtitle(label = paste0("Model 1: ",k)) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y= "Interaction Coefficient", x = "X") +
theme(plot.title = element_textbox_simple(vjust=-1)) +
theme(plot.margin = margin(2,0,0,0, "cm")) +
theme(axis.text.x = element_text(size = 5))
ggsave(plot1,file=paste0("plot_",k,".png"),path=paste0(c))
}
safe_model <- safely(model_)
iterate <-mapply(safe_model,combos$Var1,combos$Var2)
这将运行模型并允许我成功生成它们。但是,当我尝试过滤循环外的数据帧以提高处理速度时,我仍然收到原始错误:results = list(...
Error: Unable to compute predicted values with this model. You can try to supply a different dataset to the `newdata`
argument. If this does not work, you can file a report on the Github Issue Tracker:
https://github.com/vincentarelbundock/marginaleffects/issues
这对我来说很奇怪,因为从理论上讲,如果在任一阶段过滤 df,结果应该是相同的。在for循环之外过滤时,是否有任何原因导致此错误?如果无法解决,在for循环中过滤时,有没有办法加快这个过程?
答:
无论如何,使用 tidyverse,您可以创建一个具有所需组合的网格 (),并使用一列来扩展此数据帧以保存模型,另一列用于保存绘图对象以供进一步使用(在此过程中将它们保存为副作用)。请注意,使用 rowwise
() 和 list()
将任意对象放入 DataFrame 单元格中:expand.grid
library(dplyr)
library(ggplot2)
## subset mtcars once only (unless you want different subsets
## per row):
d <- subset(mtcars, carb %in% 1:4)
result <-
expand.grid(outcome = c('mpg','cyl','hp'),
regressor = c('am', 'wt')
) |>
rowwise() |>
mutate(model = list(
lm(sprintf('%s ~ %s * factor(vs)', outcome, regressor), data = d)
),
slopes = list(slopes(model)),
plots = list({p <- ggplot(data = slopes) +
geom_point(aes(factor(vs), estimate, color = term))
## save the plot as you go:
ggsave(p, filename = sprintf('plot_%s_%s.png', outcome, regressor))
p ## return the plot object
})
)
输出:
## > result
## # A tibble: 6 x 5
## # Rowwise:
## outcome regressor model slopes plots
## <fct> <fct> <list> <list> <list>
## 1 mpg am <lm> <slopes [60 x 15]> <gg>
## 2 cyl am <lm> <slopes [60 x 15]> <gg>
## 3 hp am <lm> <slopes [60 x 15]> <gg>
## 4 mpg wt <lm> <slopes [60 x 15]> <gg>
## 5 cyl wt <lm> <slopes [60 x 15]> <gg>
根据需要拾取和重复使用对象,例如:
> result$model[[3]]
Call:
lm(formula = sprintf("%s ~ %s * factor(vs)", outcome, regressor),
data = d)
Coefficients:
(Intercept) am factor(vs)1 am:factor(vs)1
194.17 -50.42 -92.02 28.85
编辑
要计算每个齿轮的叶子,您可以稍微修改上面的代码({dplyr} 将是另一种方法):nest_by
result <-
expand.grid(gear = 3:5, ## add gear to factor combinations
outcome = c('mpg','cyl','hp'),
regressor = c('am', 'wt')
) |>
rowwise() |>
mutate(model = list(
lm(sprintf('%s ~ %s * factor(vs)', outcome, regressor),
## subset by gear:
data = subset(d, gear == gear)
)
),
slopes = list(slopes(model)),
plots = list({p <- ggplot(data = slopes) +
geom_point(aes(factor(vs), estimate, color = term))
## save the plot as you go:
ggsave(p, filename = sprintf('plot_%s_%s.png', outcome, regressor))
p ## return the plot object
})
)
输出:
## # A tibble: 18 x 6
## # Rowwise:
## gear outcome regressor model slopes plots
## <int> <fct> <fct> <list> <list> <list>
## 1 3 mpg am <lm> <slopes [60 x 15]> <gg>
## 2 4 mpg am <lm> <slopes [60 x 15]> <gg>
## 3 5 mpg am <lm> <slopes [60 x 15]> <gg>
## 4 3 cyl am <lm> <slopes [60 x 15]> <gg>
## ... 14 more rows
评论
crossing()
expand()
gear
评论
wt
outcome_var_list
interact_var_list
slopes()
plot_slopes()
slopes(model, newdata=datagrid(vs = seq(min(vs), max(vs), length.out = 100)))