提问人:terrabowing 提问时间:10/21/2023 更新时间:10/28/2023 访问量:84
截断的普通 MLE
Truncated Normal MLE
问:
我想对截断的法线进行最大似然估计。首先,我使用两个简单的方程生成数据 X_1 = X_0 + E_1 和 X_2 = X_0 + E_2 并根据x_1 > x_0截断它们。这里的 X_0 和 E_2 是正态分布的,我使用 rnorm 生成值,此外我使用 x_0 = mu + 2*sigma 作为截止点。之后,我使用“mledist”函数推导出x_1的估计值,并得到非常糟糕的结果。为了修复这个错误,我对 mu 做了一个 100 种不同估计的storage_mu,它们都不好。我在下面给出了我用于生成的代码。
m = 100
i = 1
storage_mu = rep(0,100)
while(i <= m){
#Parameters
N = 10000
mu = 10
sigma_0 = 1
sigma_e = 1
X_0 = rnorm(N, mean = mu, sd = sigma_0)
E_1 = rnorm(N, mean = 0, sd = sigma_e)
E_2 = rnorm(N, mean = 0, sd = sigma_e)
X_1 = X_0 + E_1
X_2 = X_0 + E_2
x_bi = cbind(X_1, X_2)
#Cut-Off and truncation
x_0_cut = mu + 2 * sqrt(sigma_0^2 + sigma_e^2)
x_bi_trunc = subset(x_bi, x_bi[,1] >= x_0_cut)
#Select n observations
#x_bi_sort = x_bi_trunc[order(x_bi_trunc[,1]),]
n = 2500
x_bi_final = head(x_bi_trunc, n)
#for faster working
x_1 = x_bi_final[,1]
x_2 = x_bi_final[,2]
x_1_bar = mean(x_1)
sd_1 = sqrt(var(x_1)*(n-1)/n)
#Berechnung von MLE für univariate trunc_norm
fit = mledist(x_1, "truncnorm", start = list(mean = x_1_bar, sd = sd_1))$estimate
mu_1_hat = fit[1]
sigma_1_hat = fit[2]
storage_mu[i] = mu_1_hat
i = i +1
}
storage_mu
在代码之后,这里是 mu 的 100 个估计器的输出。我的偏见约为 +3。
[1] 13.36466 13.32620 13.26061 13.35483 13.33904 13.32613 13.35538 13.41398
[9] 13.34077 13.31515 13.36057 13.36290 13.37306 13.37546 13.35423 13.35764
[17] 13.35481 13.31707 13.35890 13.34394 13.44947 13.35364 13.36820 13.35658
[25] 13.36899 13.33603 13.33170 13.40991 13.35209 13.35108 13.36597 13.35709
[33] 13.29799 13.36056 13.33210 13.34038 13.35774 13.41122 13.32358 13.37611
[41] 13.34839 13.33610 13.30683 13.31361 13.37794 13.36822 13.42950 13.39687
[49] 13.38384 13.32224 13.32566 13.37058 13.29208 13.40263 13.32874 13.38417
[57] 13.31360 13.34144 13.39018 13.34612 13.35063 13.35206 13.37924 13.43540
[65] 13.33795 13.35608 13.32996 13.35488 13.32289 13.38497 13.33944 13.36007
[73] 13.36286 13.34323 13.34397 13.33800 13.38267 13.32556 13.35907 13.32954
[81] 13.38601 13.32422 13.38291 13.32262 13.39936 13.34364 13.39338 13.34509
[89] 13.30802 13.39126 13.35310 13.29634 13.38961 13.33402 13.31962 13.38288
[97] 13.33798 13.31185 13.30590 13.32078
我试图通过使用“mledist”选项来获得更好的近似值,例如,我将 a 的值固定在我使用的截止点x_0,但随后值变得更糟。函数如下所示
mledist(x_1, "truncnorm", start = list(mean = x_1_bar, sd = sd_1), fix.arg = list(a=x_0_cut))$estimate
结果如下:
[1] 10.140061223 12.089945831 11.048595750 11.581995517 11.129860901
[6] 9.418521020 9.608019087 9.929831748 4.699461364 11.267935812
[11] 11.714949354 11.762986025 10.202113218 12.369248730 11.844003279
[16] 8.687867353 11.314833612 11.690386639 4.494877436 8.705778578
[21] 11.764867122 9.981540165 11.328531141 9.703258348 11.655422495
[26] 12.073462623 8.681263340 11.024820357 12.068915824 10.770174316
[31] 5.468735578 9.840865977 12.186883743 8.993979680 5.369682551
[36] 3.479905781 11.499858120 8.020687499 11.853293176 10.506828481
[41] 10.231765273 10.511779270 10.411593933 11.773604687 9.777336311
[46] 10.853945953 11.327343019 11.089845843 2.991276391 0.151239390
[51] 9.489660148 11.726098700 11.509123940 7.453213566 11.932400933
[56] 4.188084979 10.977560797 -0.007370121 10.904831509 6.100907337
[61] 11.727064157 10.029395630 5.662605692 9.946740799 12.359169774
[66] -2.686121867 10.387318263 11.170782824 9.334853648 9.839261227
[71] 7.703257871 8.971979528 8.753366471 12.486250630 11.729089349
[76] 7.663378529 -1.581499286 10.500557773 9.504805508 -15.956289261
[81] -13.784320458 9.248311024 11.267769705 9.608263889 8.128767660
[86] 11.186614967 10.300034199 10.912469218 10.975790552 10.304912099
[91] 8.654392575 7.547824160 11.879473071 11.379844872 8.978183752
[96] 11.564179091 10.771436935 11.132971477 9.125936329 -7.406405917
如果有人有时间和动力帮助我,将不胜感激!
问候 Terrabowing(大地弓形)
答:
1赞
jblood94
10/28/2023
#1
使用截断的普通 MLE:optim
tnorm.mle <- function(x, a = -Inf, b = Inf) {
# parameter estimation for truncated normal data with known extents
n <- length(x)
ab <- c(a, b)
init <- c(x[1], log(sd(x)))
if (is.finite(a)) {
if (is.finite(b)) {
# two-sided truncation
f <- function(params) {
mu <- params[1]
sigma <- exp(params[2])
n*log(diff(pnorm(ab, mu, sigma))) - sum(dnorm(x, mu, sigma, TRUE))
}
} else {
# left-truncated
f <- function(params) {
mu <- params[1]
sigma <- exp(params[2])
n*pnorm(a, mu, sigma, FALSE, TRUE) - sum(dnorm(x, mu, sigma, TRUE))
}
}
} else {
if (is.finite(b)) {
# right-truncated
f <- function(params) {
mu <- params[1]
sigma <- exp(params[2])
n*pnorm(b, mu, sigma, TRUE, TRUE) - sum(dnorm(x, mu, sigma, TRUE))
}
} else {
# non-truncated normal
return(c(mu = mean(x), sigma = sd(x)))
}
}
# solve for mu and sigma
params <- optim(init, f)$par
c(mu = params[1], sigma = exp(params[2]))
}
测试:
set.seed(1288276868)
a <- 4
b <- 10
mu <- 2
sigma <- 3
x <- qnorm(runif(1e6, pnorm(a, mu, sigma), pnorm(b, mu, sigma)), mu, sigma)
tnorm.mle(x, a, b)
#> mu sigma
#> 2.034619 2.987204
x <- qnorm(runif(1e6, pnorm(a, mu, sigma)), mu, sigma)
tnorm.mle(x, a)
#> mu sigma
#> 2.012977 2.998484
x <- qnorm(runif(1e6, 0, pnorm(b, mu, sigma)), mu, sigma)
tnorm.mle(x, b = b)
#> mu sigma
#> 1.999469 3.000672
评论
0赞
terrabowing
11/4/2023
非常感谢您提供此代码。我尝试使用它,但它也没有收敛到正确的值。但因此我知道数据的构造/截断方式是错误的原因。
0赞
jblood94
11/4/2023
您正在测试代码的哪些值和截断扩展?
0赞
terrabowing
11/6/2023
我在正态分布的随机数据(例如 X ~ N(10, 1))上测试代码。截断位于左侧,当截止点变低时,偏置也会变低。因此,当我切断更多数据时,偏差会变得更大。
0赞
terrabowing
11/7/2023
好的,非常感谢,我想我现在已经收集了足够多的想法来以某种方式解决我的问题。
评论