Truncated Normal MLE

Asked by: terrabowing Asked: 10/21/2023 Updated: 10/28/2023 Views: 84

Q:

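I want to do maximum likelihood estimation for a truncated normal distribution. First, I generate data from two simple equations, X_1 = X_0 + E_1 and X_2 = X_0 + E_2, and truncate them according to x_1 > x_0. Here X_0, E_1, and E_2 are normally distributed and I generate the values with rnorm. As the cut-off I use x_0 = mu + 2*sigma, where sigma is the standard deviation of X_1. After that, I use the "mledist" function to obtain estimates for x_1, and I get very bad results. To check this, I stored 100 different estimates of mu in storage_mu, and none of them are good. The code I used to generate the data is given below.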

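library(fitdistrplus)  # provides mledist
library(truncnorm)     # provides the dtruncnorm density used by mledist(..., "truncnorm")

m = 100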
i = 1
storage_mu = rep(0,100)


while(i <= m){
#Parameters
N = 10000
mu = 10
sigma_0 = 1
sigma_e = 1
  
X_0 = rnorm(N, mean = mu, sd = sigma_0)
E_1 = rnorm(N, mean = 0, sd = sigma_e)
E_2 = rnorm(N, mean = 0, sd = sigma_e)
X_1 = X_0 + E_1
X_2 = X_0 + E_2
x_bi = cbind(X_1, X_2)
  
  
#Cut-Off and truncation
x_0_cut = mu + 2 * sqrt(sigma_0^2 + sigma_e^2)
x_bi_trunc = subset(x_bi, x_bi[,1] >= x_0_cut)
  
#Select n observations
#x_bi_sort = x_bi_trunc[order(x_bi_trunc[,1]),] 
n = 2500
x_bi_final = head(x_bi_trunc, n)
  
#for faster working
x_1 = x_bi_final[,1]
x_2 = x_bi_final[,2]
  
x_1_bar = mean(x_1) 
sd_1 = sqrt(var(x_1)*(n-1)/n)

  

#Compute the MLE for the univariate truncated normal
fit = mledist(x_1, "truncnorm", start = list(mean = x_1_bar, sd = sd_1))$estimate
mu_1_hat = fit[1]
sigma_1_hat = fit[2]

storage_mu[i] = mu_1_hat

i = i +1
}
storage_mu

Following the code, here is the output of the 100 estimates of mu. My bias is about +3.

[1] 13.36466 13.32620 13.26061 13.35483 13.33904 13.32613 13.35538 13.41398
[9] 13.34077 13.31515 13.36057 13.36290 13.37306 13.37546 13.35423 13.35764
[17] 13.35481 13.31707 13.35890 13.34394 13.44947 13.35364 13.36820 13.35658
[25] 13.36899 13.33603 13.33170 13.40991 13.35209 13.35108 13.36597 13.35709
[33] 13.29799 13.36056 13.33210 13.34038 13.35774 13.41122 13.32358 13.37611
[41] 13.34839 13.33610 13.30683 13.31361 13.37794 13.36822 13.42950 13.39687
[49] 13.38384 13.32224 13.32566 13.37058 13.29208 13.40263 13.32874 13.38417
[57] 13.31360 13.34144 13.39018 13.34612 13.35063 13.35206 13.37924 13.43540
[65] 13.33795 13.35608 13.32996 13.35488 13.32289 13.38497 13.33944 13.36007
[73] 13.36286 13.34323 13.34397 13.33800 13.38267 13.32556 13.35907 13.32954
[81] 13.38601 13.32422 13.38291 13.32262 13.39936 13.34364 13.39338 13.34509
[89] 13.30802 13.39126 13.35310 13.29634 13.38961 13.33402 13.31962 13.38288
[97] 13.33798 13.31185 13.30590 13.32078
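
A bias of this size is what one would expect if the fit effectively ignores the truncation. As far as the truncnorm documentation goes, dtruncnorm defaults to a = -Inf and b = Inf, so when the bounds are not fixed the model reduces to an ordinary normal, whose MLE of the mean is simply the sample mean of the truncated data. The following is a minimal sketch, assuming the parameter values and cut-off from the code above (the names sigma_x1, alpha, and trunc_mean are introduced here for illustration), that computes the theoretical mean of X_1 given X_1 >= x_0_cut:

```r
# Parameter values taken from the question's code
mu <- 10
sigma_0 <- 1
sigma_e <- 1

# Standard deviation of X_1 = X_0 + E_1 and the resulting cut-off
sigma_x1 <- sqrt(sigma_0^2 + sigma_e^2)
x_0_cut <- mu + 2 * sigma_x1

# Mean of a left-truncated normal: mu + sigma * phi(alpha) / (1 - Phi(alpha)),
# where alpha is the standardized cut-off
alpha <- (x_0_cut - mu) / sigma_x1
trunc_mean <- mu + sigma_x1 * dnorm(alpha) / pnorm(alpha, lower.tail = FALSE)
trunc_mean
```

The result is roughly 13.36, in line with the estimates listed above.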

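I tried to get a better approximation by using the options of "mledist". For example, I fixed the value of a at the cut-off point x_0 that I used, but then the values got worse. The call looks like this: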

mledist(x_1, "truncnorm", start = list(mean = x_1_bar, sd = sd_1), fix.arg = list(a=x_0_cut))$estimate

The results are as follows:

[1]  10.140061223  12.089945831  11.048595750  11.581995517  11.129860901
[6]   9.418521020   9.608019087   9.929831748   4.699461364  11.267935812
[11]  11.714949354  11.762986025  10.202113218  12.369248730  11.844003279
[16]   8.687867353  11.314833612  11.690386639   4.494877436   8.705778578
[21]  11.764867122   9.981540165  11.328531141   9.703258348  11.655422495
[26]  12.073462623   8.681263340  11.024820357  12.068915824  10.770174316
[31]   5.468735578   9.840865977  12.186883743   8.993979680   5.369682551
[36]   3.479905781  11.499858120   8.020687499  11.853293176  10.506828481
[41]  10.231765273  10.511779270  10.411593933  11.773604687   9.777336311
[46]  10.853945953  11.327343019  11.089845843   2.991276391   0.151239390
[51]   9.489660148  11.726098700  11.509123940   7.453213566  11.932400933
[56]   4.188084979  10.977560797  -0.007370121  10.904831509   6.100907337
[61]  11.727064157  10.029395630   5.662605692   9.946740799  12.359169774
[66]  -2.686121867  10.387318263  11.170782824   9.334853648   9.839261227
[71]   7.703257871   8.971979528   8.753366471  12.486250630  11.729089349
[76]   7.663378529  -1.581499286  10.500557773   9.504805508 -15.956289261
[81] -13.784320458   9.248311024  11.267769705   9.608263889   8.128767660
[86]  11.186614967  10.300034199  10.912469218  10.975790552  10.304912099
[91]   8.654392575   7.547824160  11.879473071  11.379844872   8.978183752
[96]  11.564179091  10.771436935  11.132971477   9.125936329  -7.406405917
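
One plausible contributor to this spread (an observation added here, not something established in the thread) is how little data survives the cut. With the cut-off two standard deviations above the mean, only about 2.3% of the N = 10000 draws are retained, so head(x_bi_trunc, n) with n = 2500 returns far fewer than 2500 rows, and the n used in sd_1 does not match the actual sample size. A minimal check, assuming the values from the code above (p_keep and expected_rows are illustrative names):

```r
# Number of simulated draws per replication, as in the question
N <- 10000

# Probability that a normal variable exceeds its mean by two standard deviations
p_keep <- pnorm(2, lower.tail = FALSE)

# Expected number of rows that survive the truncation
expected_rows <- N * p_keep
expected_rows
```

With so few observations, all of them far out in the upper tail, many combinations of mu and sigma can fit the data almost equally well.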

If anyone has the time and motivation to help me, it would be much appreciated!

Regards, Terrabowing

r normal-distribution mle truncated

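Comments

0 votes jblood94 10/23/2023
Are you trying to estimate the truncation interval, or only the mean and variance (i.e., is the truncation interval known)?
0 votes terrabowing 10/27/2023
I only want to estimate the mean and variance - but as soon as I use fix.arg for the interval, the estimates of mu and sigma are worse than without it :/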

A:

1 vote jblood94 10/28/2023 #1

Truncated normal MLE using optim:

tnorm.mle <- function(x, a = -Inf, b = Inf) {
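  # MLE of mu and sigma for truncated normal data with known truncation bounds a and b.
  # Minimizes the negative log-likelihood with optim; sigma is optimized on the log
  # scale so that it stays positive.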
  
  n <- length(x)
  ab <- c(a, b)
  init <- c(x[1], log(sd(x)))
  
  if (is.finite(a)) {
    if (is.finite(b)) {
      # two-sided truncation
      f <- function(params) {
        mu <- params[1]
        sigma <- exp(params[2])
        n*log(diff(pnorm(ab, mu, sigma))) - sum(dnorm(x, mu, sigma, TRUE))
      }
    } else {
      # left-truncated
      f <- function(params) {
        mu <- params[1]
        sigma <- exp(params[2])
        n*pnorm(a, mu, sigma, FALSE, TRUE) - sum(dnorm(x, mu, sigma, TRUE))
      }
    }
  } else {
    if (is.finite(b)) {
      # right-truncated
      f <- function(params) {
        mu <- params[1]
        sigma <- exp(params[2])
        n*pnorm(b, mu, sigma, TRUE, TRUE) - sum(dnorm(x, mu, sigma, TRUE))
      }
    } else {
      # non-truncated normal
      return(c(mu = mean(x), sigma = sd(x)))
    }
  }
  
  # solve for mu and sigma
  params <- optim(init, f)$par
  c(mu = params[1], sigma = exp(params[2]))
}
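
For reference, the objective that the function above minimizes in the two-sided case can be written out as below. This is only a restatement of the code, with notation chosen here for illustration: \Phi and \phi denote the standard normal CDF and density, and \ell is the log-likelihood.

```latex
-\ell(\mu, \sigma)
  = n \log\!\left[ \Phi\!\left(\frac{b - \mu}{\sigma}\right)
                 - \Phi\!\left(\frac{a - \mu}{\sigma}\right) \right]
  - \sum_{i=1}^{n} \log\!\left[ \frac{1}{\sigma}\,
      \phi\!\left(\frac{x_i - \mu}{\sigma}\right) \right]
```

The left-truncated and right-truncated branches follow by letting b go to infinity or a go to minus infinity, which turns the bracketed difference into 1 - \Phi((a - \mu)/\sigma) or \Phi((b - \mu)/\sigma) respectively.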

Testing:

set.seed(1288276868)

a <- 4
b <- 10
mu <- 2
sigma <- 3
x <- qnorm(runif(1e6, pnorm(a, mu, sigma), pnorm(b, mu, sigma)), mu, sigma)
tnorm.mle(x, a, b)
#>       mu    sigma 
#> 2.034619 2.987204

x <- qnorm(runif(1e6, pnorm(a, mu, sigma)), mu, sigma)
tnorm.mle(x, a)
#>       mu    sigma 
#> 2.012977 2.998484

x <- qnorm(runif(1e6, 0, pnorm(b, mu, sigma)), mu, sigma)
tnorm.mle(x, b = b)
#>       mu    sigma 
#> 1.999469 3.000672
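
The same idea can be tried on data generated as in the question. The following is a self-contained sketch rather than a definitive implementation: it repeats only the left-truncated branch of the objective, uses a larger N than the question so that more observations survive the cut, and the seed and the names nll, fit, and est are arbitrary choices made here.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility

# Data-generating process from the question, with a larger N (an assumption
# made here so that the truncated sample is not tiny)
N <- 1e6
mu <- 10
sigma_0 <- 1
sigma_e <- 1
x_1 <- rnorm(N, mean = mu, sd = sigma_0) + rnorm(N, mean = 0, sd = sigma_e)

# Cut-off two standard deviations of X_1 above the mean, then truncate
sigma_x1 <- sqrt(sigma_0^2 + sigma_e^2)
a <- mu + 2 * sigma_x1
x <- x_1[x_1 >= a]

# Negative log-likelihood of a left-truncated normal with known bound a;
# sigma is parameterized on the log scale to keep it positive
nll <- function(params, x, a) {
  m <- params[1]
  s <- exp(params[2])
  length(x) * pnorm(a, m, s, lower.tail = FALSE, log.p = TRUE) -
    sum(dnorm(x, m, s, log = TRUE))
}

# Start from the moments of the truncated sample and minimize with optim
fit <- optim(c(mean(x), log(sd(x))), nll, x = x, a = a)
est <- c(mu = fit$par[1], sigma = exp(fit$par[2]))

length(x)  # number of observations that survived the truncation
est        # estimates of the parameters of X_1 (true values: 10 and sqrt(2))
```

Rerunning this with different seeds, or with the question's N = 10000, gives a feel for how variable the estimates are under this much truncation.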

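Comments

0 votes terrabowing 11/4/2023
Thank you very much for providing this code. I tried using it, but it does not converge to the correct values either. At least I now know that the way the data are constructed/truncated is the cause of the error.
0 votes jblood94 11/4/2023
What values and truncation extents are you testing the code with?
0 votes terrabowing 11/6/2023
I am testing the code on normally distributed random data (e.g., X ~ N(10, 1)). The truncation is on the left, and when the cut-off point gets lower, the bias gets lower as well. So the more data I cut off, the larger the bias becomes.
0 votes jblood94 11/6/2023
The MLE provided by tnorm.mle is not erroneous, only biased. See this Cross Validated Q&A for a good discussion.
0 votes terrabowing 11/7/2023
OK, thank you very much. I think I have now collected enough ideas to solve my problem somehow.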