Adding more than one explanatory variable to a logistic regression (glm) model gives an error?

Asked by: Troy   Asked: 5/26/2017   Updated: 7/19/2018   Views: 6591

Q:

I am trying to fit the following model (a logistic regression, fitted with glm):

ad.glm.all <- glm(WinLoss ~ Score + Margin + Opposition + Venue + Disposals +
                    Marks + Goals + Behinds + Hitouts + Tackles + Rebound50s +
                    Inside50s + Clearances + Clangers + FreesFor +
                    ContendedPossessions + ContestedMarks + MarksInside50 +
                    OnePercenters + Bounces + GoalAssists,
                  data = ad.train, family = binomial)

Every time I try to run this code, I get the following messages, which R reports as warnings:

glm.fit: algorithm did not converge
glm.fit: fitted probabilities numerically 0 or 1 occurred
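For readers who want to see where the second message comes from, here is a minimal sketch on made-up data (the vectors x and y are hypothetical, not taken from ad.train). It assumes the usual situation behind this message: a predictor splits the two outcomes perfectly, so the fitted probabilities are pushed all the way to 0 and 1. Whether "algorithm did not converge" also appears can depend on the data, so the sketch only collects whatever warnings are raised.

```r
# Hypothetical toy data: y is 1 exactly when x > 0, so x separates
# the two classes perfectly.
x <- c(-5, -4, -3, -2, -1, 1, 2, 3, 4, 5)
y <- as.integer(x > 0)

# Collect the warnings raised during fitting instead of printing them.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)                 # warnings produced by glm.fit
print(round(fitted(fit), 6)) # fitted probabilities sit at 0 or 1
```

If a real data set behaves like this toy one, the messages are less a bug in the code than a sign that the outcome can be predicted perfectly from the predictors.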

When I look at the summary of this regression model, I get:

Call:
glm(formula = WinLoss ~ Score + Margin + Disposals + Marks + 
    Goals + Behinds + Hitouts + Tackles + Rebound50s + Inside50s + 
    Clearances + Clangers + FreesFor + ContendedPossessions + 
    ContestedMarks + MarksInside50 + OnePercenters + Bounces + 
    GoalAssists, family = binomial, data = ad.train)

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-2.980e-05  -2.100e-08   2.100e-08   2.100e-08   3.569e-05  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)          -8.578e+00  2.502e+06   0.000        1
Score                 4.194e+00  5.165e+04   0.000        1
Margin                2.187e+00  3.742e+03   0.001        1
Disposals             8.946e-02  3.549e+03   0.000        1
Marks                 1.427e-01  1.938e+03   0.000        1
Goals                -2.288e+01  3.082e+05   0.000        1
Behinds              -7.034e+00  5.482e+04   0.000        1
Hitouts               3.640e-02  5.167e+03   0.000        1
Tackles               8.939e-01  7.075e+03   0.000        1
Rebound50s           -2.064e-01  8.497e+03   0.000        1
Inside50s             5.645e-01  8.133e+03   0.000        1
Clearances           -1.930e-01  1.525e+04   0.000        1
Clangers             -2.040e-01  1.056e+04   0.000        1
FreesFor             -7.699e-01  1.762e+04   0.000        1
ContendedPossessions -5.752e-01  7.424e+03   0.000        1
ContestedMarks       -1.869e+00  1.069e+04   0.000        1
MarksInside50         6.742e-01  1.676e+04   0.000        1
OnePercenters         1.616e-01  6.888e+03   0.000        1
Bounces              -8.763e-01  7.669e+03   0.000        1
GoalAssists           7.570e-01  3.299e+04   0.000        1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1.2540e+02  on 91  degrees of freedom
Residual deviance: 7.1154e-09  on 72  degrees of freedom
AIC: 40

Number of Fisher Scoring iterations: 25
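As a worked check of the table above, the z value column is the estimate divided by its standard error, and Pr(>|z|) is the two-sided normal tail probability for that z. The sketch below redoes the arithmetic for the (Intercept) and Margin rows, using the numbers exactly as printed; the variable names are only illustrative.

```r
# Numbers copied from the (Intercept) and Margin rows of the summary.
estimate  <- c(Intercept = -8.578e+00, Margin = 2.187e+00)
std_error <- c(Intercept =  2.502e+06, Margin = 3.742e+03)

z_value <- estimate / std_error        # Wald z statistic
p_value <- 2 * pnorm(-abs(z_value))    # two-sided p-value

print(round(z_value, 3))
print(round(p_value, 3))
```

With standard errors thousands to millions of times larger than the estimates, every z is essentially 0 and every p-value rounds to 1, so the two columns follow directly from the enormous standard errors rather than being separate problems.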

Clearly something has gone badly wrong here, right? The p-values can't all be 1 while every z value is 0, can they?

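I googled it, and the best I could find was a suggestion that the error might be caused by having too many variables (which makes sense, given how many I have). So I started removing them one at a time, and I still got the error on every attempt until I was down to a single variable (x ~ y); only then did I stop getting any errors.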
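Rather than dropping variables by trial and error, one option is to check directly whether any single predictor splits wins from losses with no overlap. The sketch below is only an illustration on simulated data: the data frame toy, the helper separates(), and the assumption that Margin is the final points difference (so that it decides WinLoss on its own) are all hypothetical and not confirmed by the question.

```r
set.seed(42)
n <- 92  # 91 null degrees of freedom + 1, as in the summary

# Hypothetical stand-in for ad.train: Margin decides WinLoss,
# Disposals is unrelated noise.
toy <- data.frame(
  Margin    = sample(c(-60:-1, 1:60), n, replace = TRUE),
  Disposals = round(rnorm(n, mean = 370, sd = 30))
)
toy$WinLoss <- as.integer(toy$Margin > 0)

# TRUE if the values of x for the two outcomes do not overlap at all.
separates <- function(x, y) {
  max(x[y == 0]) < min(x[y == 1]) || max(x[y == 1]) < min(x[y == 0])
}

predictors <- c("Margin", "Disposals")
result <- sapply(predictors, function(v) separates(toy[[v]], toy$WinLoss))
print(result)
```

This only detects separation by one variable at a time. A combination of predictors can also separate the outcomes even when no single one does, so a clean result here would not rule the problem out.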

Can someone explain what this error might mean? Why are all my p-values 1 and all my z values 0?

Thanks in advance!

-Troy

A: No answers yet