增强似然比,用于比较使用 lm() 函数获得的模型

Enhanced likelihood ratio for comparison of models obtained with the lm() function

提问人:Rodrigo Saquicela 提问时间:2/15/2023 最后编辑:jay.sfRodrigo Saquicela 更新时间:2/15/2023 访问量:34

问:

我在 R 中寻找一个函数来比较使用 lm() 函数创建的模型,并具有以下统计数据:

  Models df    AIC    BIC logLik    Test L.ratio p.value
1 model1  6 161.65 170.44 -74.83                        
2 model2  8 159.39 171.11 -71.69 1 vs. 2    6.26  0.0437
3 model3  5 159.77 167.10 -74.88 2 vs. 3    6.38  0.0945
4 model4  5 162.26 169.59 -76.13 3 vs. 4    2.49       0

R 中是否有函数可以返回所示的比较表,特别是针对使用 lm() 创建的模型?我在 R 中使用以下代码获取了比较表:

dat <- read.csv2("https://raw.githubusercontent.com/saquicela/mineria/main/vapor_data.csv",sep=";",dec = ".", stringsAsFactors = TRUE)

model1 <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
model2 <- lm(y ~ poly(x1,2) + x2 + poly(x3,2) + x4, data = dat)
model3 <- lm(y ~ x2 + x3 + x4, data = dat)
model4 <- lm(y ~ x1 + x2 + x4 , data = dat)

aic <- AIC(model1,model2,model3,model4)
bic <- BIC(model1,model2,model3,model4)

row_names <- rownames(aic)

library(lmtest)
likelihood_tests <- lrtest(model1,model2,model3,model4)

test_names <- c("",paste(seq(1,length(row_names)-1,1),
                             "vs.",
                             seq(2,length(row_names),1)))

models_comparison <- data.frame("Models" = row_names,
                                "df" = aic$df,
                                "AIC" = round(aic$AIC,2),
                                "BIC" = round(bic$BIC,2),
                                "logLik" = round(likelihood_tests$LogLik,2),
                                "Test" = test_names,
                                "L.ratio" = round(likelihood_tests$Chisq,2),
                                "p-value" = round(likelihood_tests$`Pr(>Chisq)`,4))

models_comparison[is.na(models_comparison)] <- ""
models_comparison

非常感谢您的帮助。

R 函数 模型 比较 LM

评论

1赞 Rui Barradas 2/15/2023
anova(model1, model2, model3, model4)不完全是你要找的,但它能帮上忙吗?

答:

1赞 jay.sf 2/15/2023 #1

实际上,您刚刚编写了函数。我们可以使用代替和添加意义星星,这也是一个回报,这不是ed。formatCroundinvisibleround

my_LR <- \(..., quiet=FALSE) {
  aic <- AIC(...)
  bic <- BIC(...)
  row_names <- sapply(substitute(...()), deparse)
  likelihood_tests <- lmtest::lrtest(...)
  test_names <- c("", paste(seq(1, length(row_names)-1, 1),
                            "vs.",
                            seq(2, length(row_names), 1)))
  models_comparison <- out <- data.frame(Models=row_names,
                                         df=aic$df,
                                         AIC=aic$AIC,
                                         BIC=bic$BIC,
                                         logLik=likelihood_tests$LogLik,
                                         Test=test_names,
                                         L.ratio=likelihood_tests$Chisq,
                                         p.value=likelihood_tests$`Pr(>Chisq)`
  )
  symp <- symnum(out$p.value[-1], corr=FALSE,
                 cutpoints=c(0, .001, .01, .05, .1, 1),
                 symbols=c("***", "** ", "*  ", ".  ", "   "))
  out[, c(3:5, 7:8)] <- mapply(\(x, y) formatC(x, digits=y, format='f'),
                               out[, c(3:5, 7:8)], c(2, 2, 2, 2, 4))
  out[1, 7:8] <- ''
  if (!quiet) print(cbind(out, ' '=c(' ', symp)))
  return(invisible(models_comparison))
}

用法

res <- my_LR(model1, model2, model3, model4)
#   Models df    AIC    BIC logLik    Test L.ratio p.value    
# 1 model1  6 161.65 170.44 -74.83                            
# 2 model2  8 159.39 171.11 -71.69 1 vs. 2    6.26  0.0437 *  
# 3 model3  5 159.77 167.10 -74.88 2 vs. 3    6.38  0.0945 .  
# 4 model4  5 162.26 169.59 -76.13 3 vs. 4    2.49  0.0000 ***

res
#   Models df      AIC      BIC    logLik    Test  L.ratio    p.value
# 1 model1  6 161.6505 170.4449 -74.82524               NA         NA
# 2 model2  8 159.3886 171.1145 -71.69430 1 vs. 2 6.261884 0.04367663
# 3 model3  5 159.7684 167.0971 -74.88422 2 vs. 3 6.379839 0.09452374
# 4 model4  5 162.2623 169.5909 -76.13113 3 vs. 4 2.493821 0.00000000

如果我们不想打印任何东西,只需返回即可。quiet=FALSE