何时使用浮点运算计算 ceiling-integer-log2 失败?

When does using floating point operations to calculate ceiling-integer-log2 fail?

提问人:JamesTheAwesomeDude 提问时间:9/20/2023 最后编辑:JamesTheAwesomeDude 更新时间:9/28/2023 访问量:68

问:

我很好奇区分这两个函数的第一个输入是什么:

from math import *

def ilog2_ceil_alt(i: int) -> int:
    # DO NOT USE THIS FUNCTION IT'S WRONG
    return ceil(log2(i))

def ilog2_ceil(i: int) -> int:
    # Correct
    if i <= 0:
        raise ValueError("math domain error")
    return (i-1).bit_length()

...

显然,第一个对于某些输入来说会失败,因为在管道中通过有限值填充(对数)无限大小的整数时会出现舍入/截断错误——但是,我尝试运行这个测试代码几分钟,它没有发现问题:double

...

def test(i):
    if ilog2_ceil(i) != ilog2_ceil_alt(i):
        return i

def main(start=1):
    import multiprocessing, itertools
    p = multiprocessing.Pool()
    it = p.imap_unordered(test, itertools.count(start), 100)
    return next(i for i in it if i is not None)

if __name__ == '__main__':
    i = main()
    print("Failing case:", i)

我尝试测试各种大值,例如 和 ,但它并没有失败。2**322**64 + 9999

“alt”函数失败的最小(正)整数是多少?

python 数学 浮点 舍入误差

评论

1赞 Mark Ransom 9/20/2023
几乎可以肯定是 2 的幂,+/-1。
1赞 chtz 9/20/2023
在我的机器上失败(它返回 49,但应该返回 50)ilog2_ceil_alt(2**49+1)
0赞 vinc17 9/20/2023
@EricPostpischil 否,因为 log2(2^n + 1) = n + log2(1 + 2^(−n)) ≈ n + 0.43·2^(−n),因此一旦 n 得到 p − log2(p),你就会遇到舍入问题。或者,当 C 库支持时,您应该使用向 +∞ 舍入。

答:

4赞 vinc17 9/20/2023 #1

第一个问题是,整数将首先转换为朝向 0 的浮点类型(这是错误的方向!),此处的精度为 53 位。例如,如果是 253 + 1,它将转换为 2 53,您将得到53 而不是 54ceil(log2(i))ii

但是,即使使用较小的值,也可能会出现另一个问题:要考虑的值是那些略大于 2 的幂的值,例如 2 n + k,小整数 k > 0,因为 log2 可能会四舍五入到整数 n(即使 log2 正确四舍五入),而此时您需要一个略大于n 的 FP 数。因此,这将给出 ,即 n 而不是 n+1。ceil(n)

现在,让我们考虑 k 的最坏情况,即 k = 1。让 p 表示精度(在您的示例中,p = 53 表示双精度,但让我们概括一下)。一个有 log2(2 n + 1) = n + log2(1 + 2−n) ≈n + 0.43·2−n如果 n 的表示正好需要 q 位,则 ulp 将为 2q−p。为了得到预期的结果,0.43·2−n需要大于1/2 ulp,即2−n−2 ⩾ 2 q−p−1,即n ⩽ p−q−1

由于 q = ⌈log2(n)⌉,因此条件为 n + ⌈log2(n)⌉ ⩽ p − 1。

但是由于 ⌈log2(n)⌉ 与 n 相比很小,因此 n 的最大值将是 p 的数量级,因此通过将 ⌈log2(n)⌉ 替换为 ⌈log2(p)⌉ 来得到近似条件,即 n ⩽ p − ⌈log2(p)⌉ − 1。

下面是一个使用 GNU MPFR 的 C 程序,用于查找每个精度 p 失败的第一个 n 值:

#include <stdio.h>
#include <stdlib.h>
#include <mpfr.h>

static void test (long p)
{
  mpfr_t x;

  mpfr_init2 (x, p);
  for (long n = 1; ; n++)
    {
      /* i = 2^n + 1 */
      mpfr_set_ui_2exp (x, 1, n, MPFR_RNDN);
      mpfr_add_ui (x, x, 1, MPFR_RNDN);
      mpfr_log2 (x, x, MPFR_RNDN);
      mpfr_ceil (x, x);
      long r = mpfr_get_si (x, MPFR_RNDN);
      if (r != n + 1)
        {
          printf ("p = %ld, fail for n = %ld\n", p, n);
          break;
        }
    }
  mpfr_clear (x);
}

int main (int argc, char **argv)
{
  long p, pmax;

  if (argc != 2)
    exit (1);
  pmax = strtol (argv[1], NULL, 0);
  for (p = 2; p <= pmax; p++)
    test (p);

  return 0;
}

使用参数 64,可以得到:

p = 2, fail for n = 2
p = 3, fail for n = 3
p = 4, fail for n = 4
p = 5, fail for n = 4
p = 6, fail for n = 5
p = 7, fail for n = 6
p = 8, fail for n = 7
p = 9, fail for n = 8
p = 10, fail for n = 8
p = 11, fail for n = 9
p = 12, fail for n = 10
p = 13, fail for n = 11
p = 14, fail for n = 12
p = 15, fail for n = 13
p = 16, fail for n = 14
p = 17, fail for n = 15
p = 18, fail for n = 16
p = 19, fail for n = 16
p = 20, fail for n = 17
p = 21, fail for n = 18
p = 22, fail for n = 19
p = 23, fail for n = 20
p = 24, fail for n = 21
p = 25, fail for n = 22
p = 26, fail for n = 23
p = 27, fail for n = 24
p = 28, fail for n = 25
p = 29, fail for n = 26
p = 30, fail for n = 27
p = 31, fail for n = 28
p = 32, fail for n = 29
p = 33, fail for n = 30
p = 34, fail for n = 31
p = 35, fail for n = 32
p = 36, fail for n = 32
p = 37, fail for n = 33
p = 38, fail for n = 34
p = 39, fail for n = 35
p = 40, fail for n = 36
p = 41, fail for n = 37
p = 42, fail for n = 38
p = 43, fail for n = 39
p = 44, fail for n = 40
p = 45, fail for n = 41
p = 46, fail for n = 42
p = 47, fail for n = 43
p = 48, fail for n = 44
p = 49, fail for n = 45
p = 50, fail for n = 46
p = 51, fail for n = 47
p = 52, fail for n = 48
p = 53, fail for n = 49
p = 54, fail for n = 50
p = 55, fail for n = 51
p = 56, fail for n = 52
p = 57, fail for n = 53
p = 58, fail for n = 54
p = 59, fail for n = 55
p = 60, fail for n = 56
p = 61, fail for n = 57
p = 62, fail for n = 58
p = 63, fail for n = 59
p = 64, fail for n = 60

编辑:因此,对于双精度 (p = 53),如果 log2 正确舍入(或至少具有良好的精度),则最小的失败整数为 249 + 1。

评论

0赞 JamesTheAwesomeDude 9/20/2023
感谢您包含示例代码!不过,我对 MPFR 的接口不是很熟悉——只是为了规范答案,当双精度用于此操作时,最低失败整数是多少?是吗?2**49 + 1
0赞 JamesTheAwesomeDude 9/20/2023
我问是因为谷歌建议双精度有一个 52 位尾数,但你发布的输出说和 ,在我的机器上成功了。p = 52, fail for n = 48i = 2^n + 1test(2**(n:=48) + 1)
2赞 vinc17 9/20/2023
@JamesTheAwesomeDude我刚刚编辑了我的答案:在双精度 (p = 53) 中,如果 log2 正确舍入(或至少具有良好的精度),则最小的失败整数确实是(如果 log2 的精度很差,这种情况可能会成功和/或者您可能有更小的整数失败;这取决于实现)。精度为 53,但仅存储 52 位,因为第一个位始终为 1(“隐式位”规则)。2**49 + 1