提问人:carl o 提问时间:11/4/2023 更新时间:11/4/2023 访问量:13
使用自相关残差进行回归 - 使用 AR(1) 过程添加滞后 dep. 变量或模型残差?
Regression with autocorrelated residuals - Add lagged dep. variable or model residuals with AR(1) process?
问:
我正在尝试解决一个问题,我想知道如果我假设自变量将保持在当前水平,因变量将收敛到什么水平。
我的问题是残差是强自相关的,所以我不能使用“简单”y(t) = a+b*x(t)+eps(t) 方法,但我不确定如何以最佳方式解决这个问题......
我不是这个领域的专家,但可以想出两种可能的方法来解决这个问题: (1) 添加一个滞后因变量,这将使方程看起来像 y(t)=a+b x(t)+c y(t-1)+eps(t) (2) 让残差遵循 AR(1) 过程,这将使我的方程变为: y(t)=a+b x(t)+eps(t) 哪里 EPS(吨) = CEPS(t-1)+W(吨)
对我来说,问题是上述两种方法产生了截然不同的结果,这让我认为我在某个地方做错了什么......
输入数据:输入数据
哪个是正相关的:散点图
但残差之间有很强的相关性:ACF
在提供的代码中,如果我假设自变量将在 obs_no >= 60 内保持其当前水平,我将尝试找到因变量的收敛水平。
因变量的时间序列以及 obs_nbr >= 60 的收敛水平(使用):
- 简单的线性回归
- 具有滞后因变量的线性回归
- 线性回归,其中残差被建模为 AR(1) 过程 可以在下面找到:
我使用的 R 代码可以在下面看到。我将非常感谢任何关于选择哪种方法的指导。我认为我做了一些愚蠢的事情,因为结果因我使用的两种方法中的哪一种而大不相同。
library(forecast)
library(ggplot2)
# Creating the dataframe with the data that I am using... It seems as it is not possible to upload files so I have to do it this way.
obs_nbr <- 1:226
indep_variable <- c(0.0231,0.0226,0.0228,0.0206,0.0208,0.0197,0.0194,0.022,0.0209,0.0215,0.0228,0.0224,0.0201,0.0204,0.0192,0.0231,0.0251,0.0253,0.0256,0.0244,0.0222,0.0224,0.0221,0.0211,0.0237,0.0238,0.0229,0.0244,0.0248,0.0251,0.026,0.0242,0.0248,0.0233,0.0236,0.0228,0.0234,0.022,0.0245,0.0264,0.0267,0.0279,0.0275,0.0279,0.0278,0.0302,0.0358,0.0321,0.0246,0.0332,0.0308,0.0268,0.0329,0.03,0.0256,0.0293,0.0279,0.0265,0.0283,0.0258,0.0278,0.0283,0.0304,0.0289,0.0256,0.0243,0.0245,0.0267,0.0219,0.024,0.0203,0.0237,0.026,0.0298,0.0284,0.0271,0.026,0.026,0.0259,0.021,0.0186,0.0172,0.0144,0.0138,0.0136,0.0122,0.0136,0.0157,0.0132,0.0093,0.0106,0.0087,0.0103,0.0106,0.0093,0.0095,0.0102,0.012,0.0128,0.0131,0.0097,0.0154,0.0185,0.0196,0.0199,0.0193,0.0193,0.0218,0.0227,0.0191,0.0186,0.0183,0.0165,0.0148,0.0152,0.0145,0.0127,0.0165,0.0147,0.0137,0.0121,0.0104,0.011,0.0111,0.0119,0.0125,0.0139,0.0125,0.0147,0.0151,0.0137,0.014,0.0158,0.0153,0.0145,0.0113,0.0108,0.0114,0.0091,0.0095,0.0084,0.009,0.0101,0.0124,0.0131,0.0118,0.0094,0.0095,0.0089,0.0088,0.0104,0.0118,0.0102,0.0113,0.0111,0.0111,0.0093,0.0094,0.0105,0.0095,0.0092,0.0097,0.009,0.0097,0.0101,0.0115,0.0147,0.0143,0.0138,0.013,0.0134,0.0113,0.0118,0.0093,0.0104,0.0103,0.0058,0.0073,0.0085,0.0074,0.009,0.0054,0.004,0.0032,0.0022,0.0033,0.0028,0.0002,0.0014,0.0008,0.0016,-0.0007,0.0003,0.0011,0.0046,0.0056,0.0055,0.0046,0.0028,0.0028,0.0019,0.0017,0.0014,-0.001,0.0007,0.0034,0.0047,0.0061,0.0095,0.0144,0.0184,0.0166,0.0161,0.0232,0.0213,0.019,0.0193,0.0183,0.0182,0.0195,0.0195,0.0195,0.0182,0.0209,0.022,0.0256)
dep_variable <- c(0.0639,0.0655,0.0643,0.0656,0.0664,0.0648,0.066,0.0644,0.0665,0.0674,0.0697,0.0675,0.0681,0.0671,0.0676,0.0675,0.0673,0.0704,0.0711,0.0715,0.0705,0.0694,0.0679,0.0674,0.0666,0.0658,0.0678,0.0674,0.0656,0.0643,0.0663,0.0697,0.0694,0.0673,0.0665,0.0689,0.069,0.0727,0.0751,0.0753,0.0709,0.0709,0.0781,0.0778,0.0762,0.083,0.0904,0.0917,0.0842,0.0821,0.0884,0.0797,0.072,0.0695,0.0727,0.0694,0.0683,0.0675,0.0715,0.0695,0.0698,0.0747,0.0735,0.0706,0.0727,0.0806,0.0861,0.0821,0.087,0.0805,0.0787,0.0797,0.076,0.0761,0.0746,0.076,0.0758,0.0781,0.08,0.0828,0.0883,0.0948,0.0844,0.0852,0.0847,0.0813,0.0785,0.0769,0.0789,0.0846,0.0815,0.0798,0.079,0.0772,0.0785,0.0786,0.0785,0.075,0.075,0.0727,0.0716,0.0708,0.0724,0.0694,0.0722,0.0705,0.0677,0.0662,0.0647,0.0673,0.0647,0.0647,0.0651,0.0644,0.0637,0.0658,0.0638,0.0652,0.063,0.0616,0.0608,0.0611,0.0578,0.0593,0.059,0.059,0.0605,0.0597,0.0637,0.0654,0.0601,0.0602,0.0613,0.0633,0.0633,0.0595,0.0595,0.0593,0.0597,0.0581,0.0587,0.059,0.0607,0.059,0.0585,0.0579,0.056,0.0565,0.0567,0.0566,0.0569,0.0562,0.0567,0.056,0.0555,0.0545,0.0549,0.0558,0.0589,0.061,0.0619,0.0614,0.0617,0.0605,0.0593,0.0595,0.0644,0.0632,0.0691,0.0633,0.0615,0.0606,0.0589,0.0636,0.0595,0.0589,0.0601,0.0592,0.0578,0.056,0.0546,0.0552,0.0599,0.0613,0.0485,0.0462,0.0462,0.0456,0.0437,0.0464,0.049,0.0455,0.0441,0.0466,0.0466,0.0456,0.0461,0.0472,0.047,0.0477,0.0472,0.0497,0.047,0.0482,0.0466,0.05,0.0522,0.0513,0.0569,0.0576,0.0631,0.0572,0.0599,0.0657,0.0598,0.0565,0.0598,0.0555,0.0569,0.0552,0.0546,0.0551,0.0518,0.0506,0.0527,0.0558)
regression_data <- as.data.frame(cbind(obs_nbr, indep_variable, dep_variable))
colnames(regression_data) <- c("OBS_NBR", "INDEPENDENT_VARIABLE", "DEPENDENT_VARIABLE")
# Creating some graphs to see what kind of data issues we have...
ggplot(data = regression_data, aes(x=OBS_NBR)) +
geom_line(aes(y=INDEPENDENT_VARIABLE, color = "INDEPENDENT_VARIABLE")) +
geom_line(aes(y=DEPENDENT_VARIABLE, color = "DEPENDENT_VARIABLE")) +
ylab(" ") +
scale_color_manual(values= c("INDEPENDENT_VARIABLE" = "orange", "DEPENDENT_VARIABLE"="black"))
ggplot(data = regression_data, aes(x=INDEPENDENT_VARIABLE, y = DEPENDENT_VARIABLE)) +
geom_point()
lin_model <- lm(regression_data$DEPENDENT_VARIABLE ~ regression_data$INDEPENDENT_VARIABLE)
acf(lin_model$residuals)
# Adding the lagged dependent variable to the dataframe
regression_data$DEPENDENT_VARIABLE_LAGGED <- c(NA, regression_data$DEPENDENT_VARIABLE[seq_along(regression_data$DEPENDENT_VARIABLE) -1])
# Adding columns for storing output when using a simple linear model...
empty_vector <- rep(NA, nrow(regression_data))
regression_data$LM_INTERCEPT <- empty_vector
regression_data$LM_COEFF_INDEPENDENT_VARIABLE <- empty_vector
regression_data$LM_CONVERGENCE_LEVEL <- empty_vector
# Adding columns for storing output when using linear model with lagged ependent variable...
regression_data$LM_W_LAG_INTERCEPT <- empty_vector
regression_data$LM_W_LAG_COEFF_INDEPENDENT_VARIABLE <- empty_vector
regression_data$LM_W_LAG_COEFF_LAGGED_DEPENDENT_VARIABLE <- empty_vector
regression_data$LM_W_LAG_CONVERGENCE_LEVEL <- empty_vector
# Adding columns for storing output when using linear model with residuals following an AR(1) process...
regression_data$LM_AR1_RES_INTERCEPT <- empty_vector
regression_data$LM_AR1_RES_COEFF_INDEPENDENT_VARIABLE <- empty_vector
regression_data$LM_AR1_RES_COEFF_AR1 <- empty_vector
regression_data$LM_AR1_RES_CONVERGENCE_LEVEL <- empty_vector
for (i in 61: nrow(regression_data)){
###################################################################
# Simple linear model
lin_model <- lm(regression_data$DEPENDENT_VARIABLE[1:i] ~ regression_data$INDEPENDENT_VARIABLE[1:i])
regression_data$LM_INTERCEPT[i] <- lin_model$coefficients[1]
regression_data$LM_COEFF_INDEPENDENT_VARIABLE[i] <- lin_model$coefficients[2]
regression_data$LM_CONVERGENCE_LEVEL[i] <- lin_model$coefficients[1] + lin_model$coefficients[2] * regression_data$INDEPENDENT_VARIABLE[i]
###################################################################
# Linear model with lagged dependent variable
lin_model_w_lag <-lm(regression_data$DEPENDENT_VARIABLE[2:i] ~ cbind(regression_data$INDEPENDENT_VARIABLE[2:i], regression_data$DEPENDENT_VARIABLE_LAGGED[2:i]))
regression_data$LM_W_LAG_INTERCEPT[i] <- lin_model_w_lag$coefficients[1]
regression_data$LM_W_LAG_COEFF_INDEPENDENT_VARIABLE[i] <- lin_model_w_lag$coefficients[2]
regression_data$LM_W_LAG_COEFF_LAGGED_DEPENDENT_VARIABLE[i] <- lin_model_w_lag$coefficients[3]
# y(t) = a+b*x(t)+c*y(t-1) => y(t)-c*y(t-1) = a+b*x(t) =>
# => [It is assumed that x remains at the current level, when convergence is reached (at t=T) then y(T) = y(T-1)] =>
# => y(T)-c*y(T) = a+b*x(T) =>
# => y(T)*(1-c) = a+b*x(T) =>
# => y(T) = a/(1-c) + b/(1-c)*x(T)
regression_data$LM_W_LAG_CONVERGENCE_LEVEL[i] <-
lin_model_w_lag$coefficients[1]/(1-lin_model_w_lag$coefficients[3]) +
lin_model_w_lag$coefficients[2]/(1-lin_model_w_lag$coefficients[3]) * regression_data$INDEPENDENT_VARIABLE[i]
###################################################################
# Linear model with residuals following an AR(1) process
x_values <- regression_data$INDEPENDENT_VARIABLE[1:i]
y_values <- regression_data$DEPENDENT_VARIABLE[1:i]
arima_model <- arima(y_values, xreg= x_values, order = c(1,0,0))
regression_data$LM_AR1_RES_INTERCEPT[i] <- arima_model$coef["intercept"]
regression_data$LM_AR1_RES_COEFF_INDEPENDENT_VARIABLE[i] <- arima_model$coef["x_values"]
regression_data$regression_data$LM_AR1_RES_COEFF_AR1[i] <- arima_model$coef["ar1"]
regression_data$LM_AR1_RES_CONVERGENCE_LEVEL[i] <- arima_model$coef["intercept"] + arima_model$coef["x_values"] *regression_data$INDEPENDENT_VARIABLE[i]
}
plot_data <- as.data.frame(cbind(
regression_data$OBS_NBR,
regression_data$DEPENDENT_VARIABLE,
regression_data$LM_CONVERGENCE_LEVEL,
regression_data$LM_W_LAG_CONVERGENCE_LEVEL,
regression_data$LM_AR1_RES_CONVERGENCE_LEVEL
))
colnames(plot_data) <- c("Obs_nbr", "DependentVar", "Conv_Lin_Model", "Conv_Lin_Model_w_Lag", "Conv_Lin_Model_w_AR1Res")
ggplot(data = plot_data, aes(x=Obs_nbr)) +
geom_line(aes(y=DependentVar, color = "DependentVar")) +
geom_line(aes(y=Conv_Lin_Model, color = "Conv_Lin_Model")) +
geom_line(aes(y=Conv_Lin_Model_w_Lag, color = "Conv_Lin_Model_w_Lag")) +
geom_line(aes(y=Conv_Lin_Model_w_AR1Res, color = "Conv_Lin_Model_w_AR1Res")) +
ylab(" ") +
scale_color_manual(values= c("DependentVar"="black", "Conv_Lin_Model"="red", "Conv_Lin_Model_w_Lag"="green", "Conv_Lin_Model_w_AR1Res"="blue"))
答: 暂无答案
评论