求满足浮点不等式的最小整数-解网

问：

我正在寻找一种快速算法，该算法可以找到最小的整数 N，该算法将满足以下不等式，其中、、和是数字（使用 IEEE-754 binary32 格式）：squpfloat

s > q + u * p / (N - 1)

其中 N 可以是任何由有符号 32 位整数表示的正整数。转换为后，所有算术都在中计算。(N - 1)floatfloat

其他限制包括：

0 < < 1.p
-1 ≤ ≤ 1.q
q < s.
0 < .u

我很难弄清楚如何以一种稳健的方式做到这一点，以正确处理浮点舍入错误和比较。这是我对一个不快的解决方案的糟糕尝试，甚至不健壮，因为我无法确定最小值：SOME_AMOUNT

int n = std::max(1.0f, floorf((u * p / (s - q)) - 1.0f));

// Floating point math might require to round up by some amount...
for (int i = 0; i < SOME_AMOUNT; ++i)
    if (!(q + (u * p / (n + 1)) < second))
        ++n;

你可以在上面看到我使用基本代数计算的公式。for 循环是我试图解释浮点舍入错误的粗略方法。我正在用蛮力检查它，如下所示：n

int nExact = 0;
bool found = false;
for (; nExact < SOME_BIG_NUMBER; ++nExact) {
    if (q + (u * p / (nExact + 1)) < second) {
        found = true;
        break;
    }
}
assert(found);
assert(n == nExact);

任何浮点大师在 C++ 中都有相当快的答案吗？

坦率地说，如果有人能给出上面“SOME_AMOUNT”上限的理论证明，我会相当高兴......

C++ 精度浮点转换不等式

在释放手指编写代码之前，先在纸上做一些基本的代数操作，以变成一边是不等式，另一边是其他一切。您必须考虑一些情况（例如，如果代数操作涉及除以某物，则注意该某物为零的情况），但您最终会得到一些简单的闭式公式来计算给定、、和的值。最多，几个和，绝对不需要循环。s > q + u * p / (N - 1)NNpqusif()else

4赞 Eric Postpischil 5/17/2020

您想要一个在用浮点算术计算时为真的解，还是在用实数算术计算时 s > q + u * p / （N - 1）为真的解？N 的域是可以以浮点格式表示的整数集还是整数集？p 和 q 有相同的符号吗？s > q吗？您对 s、q、u 和 p 了解多少？你知道他们的价值观有什么界限吗？对他们的域名有什么限制吗？它们来自哪里？s > q + u * p / (N - 1)

1赞 Eric Postpischil 5/17/2020

只是为了切除问题的一部分，给定，如果和具有不同的符号，那么解是 2，假设 1 由于除以零而被排除在外，因为然后是负数或零，并且是真的。因此，我们可以将问题简化为具有相同的符号。可以用代替，因为它们不参与表达式。所以我们有，其中 x 为正。s > ququ * q / (2-1)s > q + u * q / (2-1)upu * qxs > q + x / (N-1)

1赞 Eric Postpischil 5/17/2020

在浮点运算中，基本算术运算是弱单调的，其中相应的实数运算是单调的或弱单调的。这可能有助于建立检查候选项的边界。（显然，N在实算中可以很容易地找到，但是由于我们被要求在浮点算术中找到一个解，四舍五入问题可能会导致浮点解与N的实解不同。NN

2赞 Eric Postpischil 5/18/2020

需要考虑的一件事是，由于 N 是一个 32 位整数，并且表达式的计算公式是使用，必须将 N 转换为，这会导致舍入误差。考虑 q 至少为 1/2 秒的情况。然后计算为精确值（没有舍入误差），满足的最小值是高或低 1 个 ULP，具体取决于除法中的四舍五入。例如，我们可能会发现这是2147483392。在这种情况下，将是 2147483266，因为 then 是 2147483265，这是四舍五入到 2147483392 的最小整数。floatfloats-qfloatfloat ns > q + x/n(s-q)/xnNN-1

答：

0赞 Ankur Parihar 5/21/2020 #1

为了安全起见，我们可以首先得到一个更大的可能值（上限）和一个更小的可能值（下限），然后将其简化为我们的实际答案，这样它就会比只迭代数字更准确、更快。

通过解决我们得到的不平等，

N > u * p / (s - q) + 1

获取上限

因此，您将首先使用整数找到一个最大猜测的答案。我们将增加分子和整数铸分母

int UP = (int)(u * p + 1);    // Increase by one
int D = (int)(s - q);         // we don't increase this because it  would cause g to decrease, which we don't want

float g = UP / (float)D + 1;  // we again float cast D to avoid integer division
int R = (int)(g + 1);         // Now again increase g

/******** Or a more straight forward approach ********/
int R = (int)(((int)(u*p+1))/(s-q) + 1 + 1)

// Add rounding-off error here
if(R + 128 < 0) R = 2147483647;    // The case of overflow
else R += 128;

这是您的最大答案（上限）。

获得下限

和以前一样，但这次我们将增加分母和整数转换分子

int UP = (int)(u * p);         // will automatically decrease
int D = (int)(s - q + 1);      // we increase this because it would cause g to decrease, which we want

float g = UP / (float)D + 1;   // we again float cast D to avoid integer division
int L = (int)g;                // Integer cast, will automatically decrease
/******** Or a more straight forward approach ********/
int L = (int)(((int)(u*p))/(s-q+1) + 1)

// Subtract rounding-off error
if(L - 128 <= 1 ) L = 2;        // N cannot be below 2
else L -= 128;

这是您的最低答案（下限）。

注意：整数转换的原因是为了减少我们的样本空间。如果您觉得如此，可以省略它。

消除可能的数字并获得正确的数字

for (int i = L; i <= R; ++i){
    if ((s > q + u*p/(i-1))) break;   // answer would be i
}
N = i;    // least number which satisfies the condition

如果边界间隙（R-L）很大，则可以使用二叉搜索更快地完成此操作。至于差值为 2^n 的数范围，只需 n 步即可减少。

// we know that
// lower limit = L;
// upper limit = R;
// Declare u, p, q, s in global space or pass as parameters to biranySearch

int binarySearch(int l, int r)
{
    if(l==r) return l;

    if (r > l) {
        int mid = l + (r - l) / 2;

        bool b = (s > q + (p*u)/(mid-1));

        if (b==true){
            // we know that numbers >= mid will all satisfy
            // so our scope reduced to [l, mid]
            return binarySearch(l, mid);
        }
        // If mid doesn't satisfy
        // we know that our element is greater than mid
        return binarySearch(mid+1, r); 
    } 
} 

int main(void) 
{
    // calculate lower bound L and upper bound R here using above methods
    int N = binarySearch(L, R);
    // N might have rounding-off errors, so check for them
    // There might be fluctuation of 128 [-63 to 64] so we will manually check.
    // To be on safe side I will assume fluctuation of 256
    L = N-128 > 2 ? N-128 : 2;
    R = N+128 < 0 ? 2147483647 : N+128;
    for(int i=L; i<=R; ++i){
        if( s > q + u * p / ((float)i - 1)) {
            break;
        }
    }
    cout << i << endl;
}

它主要是一个概念，但它既快速又安全。唯一的问题是我还没有测试过它，但它应该可以工作！

@YesheTenley 感谢您指出这个四舍五入的东西，我发现了一个巨大的错误。四舍五入 4.3 将变成 4，但有意识地我希望它变成 5，所以我删除了四舍五入，而是添加了 1。现在好了！之前的错误是由于复制粘贴了两次相同的代码，我忘记编辑注释。

0赞 Eric Postpischil 5/22/2020

对于 s = 1， q = 0， u = 2^30 = 1073741824， p = 1，此代码给出的下限为 536870912，上限为 1073741824，但正确答案是 1073741890

0赞 Ankur Parihar 5/22/2020

对于我的代码给出的这些约束边界@EricPostpischil是 .正确答案是 2^30+2 => 1073741826它小于你的答案，位于边界内并满足不等式。请再检查一次！[2^29+1, 2^30+3] => [536870913, 1073741827]

0赞 Eric Postpischil 5/22/2020 #2

这是解决方案的开始。一些注意事项：

它是用 C 语言，而不是 C++。
它假定 IEEE-754 算术，四舍五入到最接近。
它不处理不等式要求 N 超出从 2 到的边界的情况。INT_MAX
我没有测试太多。

该代码首先使用浮点运算来估计不等式变化的边界的位置，忽略舍入误差。它测试不等式，看看它是否需要增加或减少候选值。然后，它遍历连续的整数值以查找边界。我的感觉是这需要几次迭代，但我还没有完全分析它。float

这会产生一个整数值的最小值，当用于代替分母时，该整数值满足不等式。然后，代码找到舍入到该的最小值，并且应该是满足不等式的最小值。floatN-1int NN-1floatNint

#include <math.h>
#include <stdio.h>
#include <stdlib.h>


//  Test the inequality.
static int Test(float s, float q, float u, float p, int N)
{
    return s > q + (float) (((float) (u * p)) / (N-1));
}


int main(void)
{
    float s = 1;
    float q = 0;
    float u = 0x1p30, p = 1;

    /*  Approximate the desired denominator (N-1) -- would be exact with real
        arithmetic but is subject to rounding errors.
    */
    float D = floorf(u*p/(s-q));

    //  Test which side of the boundary where the inequality changes we are on.
    if (Test(s, q, u, p, (int) D + 1))
    {
        //  We are above the boundary, decrement find the boundary.
        float NextD = D;
        do
        {
            D = NextD;
            //  Decrement D by the greater of 1 or 1 ULP.
            NextD = fminf(D-1, nexttowardf(D, 0));
        }
        while (Test(s, q, u, p, (int) NextD + 1));
    }
    else
        //  We are below the boundary, increment to find the boundary.
        do
            //  Increment D by the greater of 1 or 1 ULP.
            D = fmaxf(D+1, nexttowardf(D, INFINITY));
        while (!Test(s, q, u, p, (int) D + 1));

    //  Find the distance to the next lower float, as an integer.
    int distance = D - nexttowardf(D, 0);

    /*  Find the least integer that rounds to D.  If the distance to the next
        lower float is less than 1, then D is that integer.  Otherwise, we want
        either the midpoint between the D and the next lower float or one more
        than that, depending on whether the low bit of D in the float
        significand is even (midpoint will round to it, so use midpoint) or odd
        (midpoint will not round to it, so use one higher).

        (int) D - distance/2 is the midpoint.

        ((int) D / distance) & 1 scales D to bring the low bit of its
        significand to the one’s position and tests it, producing 0 if it is
        even and 1 if it is odd.
    */
    int I = distance == 0 ? (int) D
        : (int) D - distance/2 + (((int) D / distance) & 1);

    //  Set N to one more than that integer.
    int N = I+1;

    printf("N = %d.\n", N);

    if (Test(s, q, u, p, N-1) || !Test(s, q, u, p, N))
    {
        fprintf(stderr, "Error, solution is wrong.\n");
        exit(EXIT_FAILURE);
    }
}

上一个：R 与 Python 在浮点分辨率上的差异：是否可以将 Python 设置为像 R 一样？

下一个：在同一台机器上运行时，C 程序的浮点精度在过去两周内发生了变化

求满足浮点不等式的最小整数

Find smallest integer that satisfies floating point inequality equation

评论

评论