How should I do floating point comparison?

Asked by: Mike Bailey  Asked: 2/7/2011  Last edited by: phuclv, Mike Bailey  Updated: 9/7/2023  Views: 130255

Q:

I am currently writing some code where I have something along the lines of:

double a = SomeCalculation1();
double b = SomeCalculation2();

if (a < b)
    DoSomething2();
else if (a > b)
    DoSomething3();

Then in other places I may need to test for equality:

double a = SomeCalculation3();
double b = SomeCalculation4();

if (a == 0.0)
   DoSomethingUseful(1 / a);
if (b == 0.0)
   return 0; // or something else here

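In short, I have lots of floating point math going on and I need to do various comparisons for conditions. I can't convert it to integer math because such a thing is meaningless in this context.

I've read before that floating point comparisons can be unreliable, since you can have things like this going on: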

double a = 1.0 / 3.0;
double b = a + a + a;
if ((3 * a) != b)
    Console.WriteLine("Oh no!");
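
The unreliability described above is easy to reproduce. The following is a minimal Python sketch (Python is used here only for illustration, it is not the language of the snippets above). It uses the well-known `0.1 + 0.2` case and, as one possible tolerance-based check, `math.isclose` from the standard library with its default relative tolerance.

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation, so the rounded
# sum differs from the rounded literal 0.3 in the last bit.
total = 0.1 + 0.2

print(total == 0.3)              # exact comparison
print(abs(total - 0.3))          # size of the discrepancy
print(math.isclose(total, 0.3))  # tolerance-based comparison (default rel_tol=1e-09)
```

The exact comparison reports the two values as different, while the tolerance-based one treats them as equal.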

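In short, I'd like to know: how can I reliably compare floating point numbers (less than, greater than, equality)?

The number range I am using is roughly from 10E-14 to 10E6, so I do need to work with small numbers as well as large ones.

I've tagged this as language-agnostic because I'm interested in how I can accomplish this no matter what language I'm using.

language-agnostic comparison floating-point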

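Comments

0 votes toochin 2/7/2011
There is no way to do this reliably when using floating point numbers. There will always be numbers that are equal for the computer although in reality they are not (say 1E+100 and 1E+100+1), and you will also usually get calculation results that are unequal for the computer although in reality they are equal (see one of the comments on nelhage's answer). You will have to choose which of the two you want less.
0 votes toochin 2/7/2011
On the other hand, if you only deal with rational numbers, you might implement some rational number arithmetic based on integers; two numbers are then considered equal if one of them can be cancelled down to the other.
0 votes Mike Bailey 2/7/2011
Well, currently I'm working on a simulation. The place where I usually do these comparisons is related to variable time steps (for solving some ODEs). There are a few instances where I need to check whether the given time step for one object is equal to, less than, or greater than another object's time step.
0 votes phuclv 4/3/2016
What is the most effective way for float and double comparison?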

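Answers:

0 votes nelhage 2/7/2011 #1

The standard advice is to use some small "epsilon" value (chosen depending on your application, probably), and consider floats that are within epsilon of each other to be equal. For example, something like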

#define EPSILON 0.00000001

if ((a - b) < EPSILON && (b - a) < EPSILON) {
  printf("a and b are about equal\n");
}

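A more complete answer is complicated, because floating point error is extremely subtle and confusing to reason about. If you really care about equality in any precise sense, you are probably looking for a solution that doesn't involve floating point.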
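
To see where a fixed epsilon runs into trouble at the ends of the asker's range, here is a small Python sketch of the same absolute-difference test. The function name `about_equal` is made up for this illustration, and `math.nextafter` assumes Python 3.9 or later.

```python
import math

EPSILON = 1e-8  # same fixed tolerance as in the C snippet above

def about_equal(a: float, b: float, eps: float = EPSILON) -> bool:
    """Absolute-difference test: true when |a - b| < eps."""
    return abs(a - b) < eps

# Small magnitudes: the values differ by a factor of two, yet their
# difference (1e-14) is far below EPSILON, so they compare as equal.
small_a, small_b = 1e-14, 2e-14
print(about_equal(small_a, small_b))

# Large magnitudes: adjacent doubles near 1e22 are about two million apart,
# so even neighbouring representable values compare as different.
big_a = 1e22
big_b = math.nextafter(big_a, math.inf)
print(big_b - big_a, about_equal(big_a, big_b))
```

Both outcomes follow from the tolerance being absolute: it does not scale with the magnitude of the operands.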

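Comments

0 votes toochin 2/7/2011
What if he is working with very small floating point numbers, like 2.3E-15?
1 vote Mike Bailey 2/7/2011
I'm working with a range of roughly [10E-14, 10E6], not quite machine epsilon but very close to it.
2 votes toochin 2/7/2011
Working with small numbers is not a problem if you keep in mind that you have to work with relative errors. If you don't mind relatively large error tolerances, the above would be OK if you replaced the condition with something like `if ((a - b) < EPSILON/a && (b - a) < EPSILON/a)`.
2 votes toochin 2/7/2011
The code given above is also problematic when you deal with very large numbers `c`, because once your number is large enough, EPSILON will be smaller than the machine precision of `c`. For example, suppose `c = 1E+22; d=c/3; e=d+d+d;`. Then `e-c` may well be considerably greater than 1.
1 vote toochin 2/7/2011
For examples, try `double a = pow(8,20); double b = a/7; double c = b+b+b+b+b+b+b; std::cout<<std::scientific<<a-c;` (a and c are not equal according to pnt and nelhage) or `double a = pow(10,-14); double b = a/2; std::cout<<std::scientific<<a-b;` (a and b are equal according to pnt and nelhage).
-1 votes pnt 2/7/2011 #2

The best way to compare doubles for equality/inequality is to take the absolute value of their difference and compare it to a small enough (depending on your context) value.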

double eps = 0.000000001; //for instance

double a = someCalc1();
double b = someCalc2();

double diff = Math.abs(a - b);
if (diff < eps) {
    //equal
}
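-1 votes toochin 2/7/2011 #3

You need to take into account that the truncation error is a relative one. Two numbers are about equal if their difference is about as large as their ulp (unit in the last place).

However, if you do floating point calculations, the potential error grows with every operation (be especially careful with subtractions!), so your error tolerance needs to increase accordingly.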
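
A comparison in units of ulp could be sketched as follows in Python. This is only an illustration under stated assumptions: `ulp_equal` and its `max_ulps` parameter are hypothetical names, and `math.ulp` requires Python 3.9 or later.

```python
import math

def ulp_equal(a: float, b: float, max_ulps: int = 4) -> bool:
    """True when a and b differ by at most max_ulps units in the last place.

    max_ulps is a hypothetical knob: raise it when more rounding steps
    went into computing the operands.
    """
    if a == b:  # also covers two equal infinities
        return True
    if math.isinf(a) or math.isinf(b):
        return False  # an infinity is never "close" to anything else
    scale = max(abs(a), abs(b))
    # math.ulp(scale) is the spacing of doubles at the larger magnitude
    return abs(a - b) <= max_ulps * math.ulp(scale)

print(ulp_equal(0.1 + 0.2, 0.3))   # one ulp apart
print(ulp_equal(1.0, 1.0 + 1e-9))  # millions of ulps apart
```

Because the tolerance is expressed in ulps, it scales automatically with the magnitude of the operands, but it still has to be widened by hand as errors accumulate.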

0 votes Mike Bailey 2/7/2011 #4

I've tried writing an equality function with the above comments in mind. Here is what I came up with:

Edit: Changed from Math.Max(a, b) to Math.Max(Math.Abs(a), Math.Abs(b))

static bool fpEqual(double a, double b)
{
    double diff = Math.Abs(a - b);
    double epsilon = Math.Max(Math.Abs(a), Math.Abs(b)) * Double.Epsilon;
    return (diff < epsilon);
}

Thoughts? I still need to work out a greater-than and a less-than as well.
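
One point worth checking in the function above: in .NET, `Double.Epsilon` is the smallest positive subnormal double (about 4.9e-324), not the machine epsilon (about 2.2e-16). The Python sketch below mirrors the function with the scaling unit made explicit, so the two choices can be compared. The names `fp_equal`, `SMALLEST_SUBNORMAL` and `MACHINE_EPSILON` are made up for this illustration.

```python
import sys

SMALLEST_SUBNORMAL = 5e-324               # the value of .NET's Double.Epsilon
MACHINE_EPSILON = sys.float_info.epsilon  # 2**-52, about 2.22e-16

def fp_equal(a: float, b: float, unit: float) -> bool:
    """Relative test shaped like fpEqual above, with the scaling unit as a parameter."""
    diff = abs(a - b)
    tolerance = max(abs(a), abs(b)) * unit
    return diff < tolerance

a, b = 0.1 + 0.2, 0.3
print(fp_equal(a, b, SMALLEST_SUBNORMAL))    # tolerance underflows to 0.0
print(fp_equal(a, b, MACHINE_EPSILON))       # tolerance is slightly more than one ulp
print(fp_equal(0.0, 0.0, MACHINE_EPSILON))   # strict '<' rejects two exact zeros
```

With the subnormal constant the tolerance collapses to zero for ordinary magnitudes, so the test can never succeed for unequal inputs. The last line shows a separate edge case of the strict `<`.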

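Comments

0 votes toochin 2/7/2011
`epsilon` should be `Math.abs(Math.Max(a, b)) * Double.Epsilon;`, or it will always be smaller than `diff` for negative `a` and `b`. And I think your `epsilon` is too small; the function might not return anything different from the `==` operator. Greater than is `a < b && !fpEqual(a,b)`.
1 vote Michael Borgwardt 2/7/2011
Fails when both values are exactly zero, fails for Double.Epsilon and -Double.Epsilon, fails for infinities.
1 vote Mike Bailey 2/7/2011
The case of infinities is not a concern in my particular application, but it is duly noted.
83 votes Michael Borgwardt 2/7/2011 #5

Comparing for greater/smaller is not really a problem unless you are working right at the edge of the float/double precision limit.

For a "fuzzy equals" comparison, this (Java code, should be easy to adapt) is what I came up with for The Floating-Point Guide after a lot of work and taking into account lots of criticism: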

public static boolean nearlyEqual(float a, float b, float epsilon) {
    final float absA = Math.abs(a);
    final float absB = Math.abs(b);
    final float diff = Math.abs(a - b);

    if (a == b) { // shortcut, handles infinities
        return true;
    } else if (a == 0 || b == 0 || diff < Float.MIN_NORMAL) {
        // a or b is zero or both are extremely close to it
        // relative error is less meaningful here
        return diff < (epsilon * Float.MIN_NORMAL);
    } else { // use relative error
        return diff / (absA + absB) < epsilon;
    }
}

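It comes with a test suite. You should immediately dismiss any solution that doesn't, because it is virtually guaranteed to fail in some edge cases like having one value 0, two very small values on opposite sides of zero, or infinities.

An alternative (see the link above for more details) is to convert the bit patterns of the floats to integers and accept everything within a fixed integer distance.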
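
The bit-pattern alternative could look roughly like this in Python, using 64-bit doubles instead of Java floats. This is a sketch under assumptions, not the guide's code: the helper names `ordered_bits`, `ulp_distance` and `nearly_equal_ulps`, and the default of 4, are invented for the example.

```python
import math
import struct

def ordered_bits(x: float) -> int:
    """Map a double to an integer that sorts in the same order as the float."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # IEEE 754 is sign-magnitude: mirror negative values so that the
    # mapping is monotonic and -0.0 lands on the same integer as +0.0.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

def ulp_distance(a: float, b: float) -> int:
    """Number of representable steps between a and b (0 if identical)."""
    return abs(ordered_bits(a) - ordered_bits(b))

def nearly_equal_ulps(a: float, b: float, max_ulps: int = 4) -> bool:
    """Accept values whose bit patterns are within max_ulps of each other."""
    if math.isnan(a) or math.isnan(b):
        return False
    return ulp_distance(a, b) <= max_ulps

print(ulp_distance(0.1 + 0.2, 0.3))        # 1
print(ulp_distance(5e-324, -5e-324))       # 2: smallest subnormals on either side of zero
print(nearly_equal_ulps(1.0, 1.0 + 1e-9))  # False
```

One caveat of this mapping: infinity sits one step away from the largest finite double, so code that must treat infinities specially needs an extra check.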

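In any case, there probably isn't any solution that is perfect for all applications. Ideally, you would develop/adapt your own with a test suite covering your actual use cases.

Comments

1 vote Michael Borgwardt 2/7/2011
@toochin: it depends on how large a margin of error you want to allow, but it becomes most obviously a problem when you consider the denormalized numbers closest to zero, positive and negative. Apart from zero, these are closer together than any other two values, yet many naive implementations based on relative error will consider them too far apart.
2 votes Mark Dickinson 2/7/2011
Hmm. You have a test `else if (a * b == 0)`, but then your comment on the same line is `a or b or both are zero`. But aren't these two different things? For example, if `a == 1e-162` and `b == 2e-162` then the condition `a * b == 0` will be true.
1 vote Michael Borgwardt 2/8/2011
@toochin: mainly because the code is supposed to be easily portable to other languages that may not have that functionality (it was also only added to Java in 1.5).
1 vote 11/9/2011
If this function is used a lot (every frame of a video game, for example), I would rewrite it in assembly with epic optimizations.
1 vote Franz D. 3/5/2015
Great guide and great answer, especially considering the `abs(a-b)<eps` answers here. Two questions: (1) Wouldn't it be better to change all `<`s to `<=`s, thus allowing "zero-eps" comparisons, equivalent to exact comparisons? (2) Wouldn't it be better to use `diff < epsilon * (absA + absB);` instead of `diff / (absA + absB) < epsilon;` (last line)?
16 votes tech_loafer 10/13/2012 #6

I had the problem of comparing floating point numbers `A < B` and `A > B`. Here is what seems to work: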

if ((A - B < Epsilon) && (fabs(A - B) > Epsilon))
{
    printf("A is less than B");
}

if ((A - B > Epsilon) && (fabs(A - B) > Epsilon))
{
    printf("A is greater than B");
}

The fabs (absolute value) takes care of the case where they are essentially equal.
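
For comparison, here is a Python sketch of tolerance-aware ordering tests. It swaps the fixed `Epsilon` used above for a relative tolerance, which is an assumption of this sketch and not part of the answer. `definitely_less`, `definitely_greater` and `rel_tol` are hypothetical names.

```python
def definitely_less(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """True when a is below b by more than the tolerance band."""
    return (b - a) > rel_tol * max(abs(a), abs(b))

def definitely_greater(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """True when a is above b by more than the tolerance band."""
    return (a - b) > rel_tol * max(abs(a), abs(b))

x, y = 0.1 + 0.2, 0.3
print(x > y)                      # True with the raw operator: x is one ulp above y
print(definitely_greater(x, y))   # False: the gap is inside the tolerance band
print(definitely_less(1.0, 2.0))  # True
```

When neither function returns true, the two values are treated as equal within the tolerance.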

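Comments

4 votes fishinear 10/10/2019
There is no need to use `fabs` at all if you make the first test `if (A - B < -Epsilon)`.
12 votes nni6 9/27/2013 #7

We have to choose a tolerance level to compare floating point numbers. For example: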

final float TOLERANCE = 0.00001;
if (Math.abs(f1 - f2) < TOLERANCE)
    Console.WriteLine("Oh yes!");

One note. Your example is rather funny.

double a = 1.0 / 3.0;
double b = a + a + a;
if (a != b)
    Console.WriteLine("Oh no!");

Some maths here:

a = 1/3
b = 1/3 + 1/3 + 1/3 = 1.

1/3 != 1

Oh, yes..

Did you mean

if (b != 1)
    Console.WriteLine("Oh no!")
1 vote Dennis #8

Adapted to PHP from Michael Borgwardt's and bosonix's answers:

class Comparison
{
    const MIN_NORMAL = 1.17549435E-38;  //from Java Specs

    // from http://floating-point-gui.de/errors/comparison/
    public function nearlyEqual($a, $b, $epsilon = 0.000001)
    {
        $absA = abs($a);
        $absB = abs($b);
        $diff = abs($a - $b);

        if ($a == $b) {
            return true;
        } else {
            if ($a == 0 || $b == 0 || $diff < self::MIN_NORMAL) {
                return $diff < ($epsilon * self::MIN_NORMAL);
            } else {
                return $diff / ($absA + $absB) < $epsilon;
            }
        }
    }
}
4 votes Andy Poes 6/24/2015 #9

My idea for floating point comparison in Swift:

infix operator ~= {}

func ~= (a: Float, b: Float) -> Bool {
    return fabsf(a - b) < Float(FLT_EPSILON)
}

func ~= (a: CGFloat, b: CGFloat) -> Bool {
    return fabs(a - b) < CGFloat(FLT_EPSILON)
}

func ~= (a: Double, b: Double) -> Bool {
    return fabs(a - b) < Double(FLT_EPSILON)
}
108 votes P-Gn 9/1/2015 #10

TL;DR

  • Use the following function instead of the currently accepted solution to avoid some undesirable results in certain limit cases, while being potentially more efficient.
  • Know the expected imprecision you have on your numbers and feed them accordingly in the comparison function.
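
#include <algorithm>  // std::min, std::max
#include <cassert>    // assert
#include <cfloat>     // FLT_EPSILON, FLT_MIN
#include <cmath>      // std::abs
#include <limits>     // std::numeric_limits

bool nearly_equal(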
  float a, float b,
  float epsilon = 128 * FLT_EPSILON, float abs_th = FLT_MIN)
  // those defaults are arbitrary and could be removed
{
  assert(std::numeric_limits<float>::epsilon() <= epsilon);
  assert(epsilon < 1.f);

  if (a == b) return true;

  auto diff = std::abs(a-b);
  auto norm = std::min((std::abs(a) + std::abs(b)), std::numeric_limits<float>::max());
  // or even faster: std::min(std::abs(a + b), std::numeric_limits<float>::max());
  // keeping this commented out until I update figures below
  return diff < std::max(abs_th, epsilon * norm);
}

Graphics, please?

When comparing floating point numbers, there are two "modes".

The first one is the relative mode, where the difference between `x` and `y` is considered relative to their amplitude `|x| + |y|`. When plotted in 2D, it gives the following profile, where green means equality of `x` and `y`. (I took an `epsilon` of 0.5 for illustration purposes.)

[Figure: equality region in relative mode]

The relative mode is what is used for "normal" or "large enough" floating point values. (More on that later.)

The second one is an absolute mode, where we simply compare their difference to a fixed number. It gives the following profile (again with an `epsilon` of 0.5 and an `abs_th` of 1 for illustration).

[Figure: equality region in absolute mode]

This absolute mode of comparison is what is used for "tiny" floating point values.

Now the question is, how do we stitch together those two response patterns?

In Michael Borgwardt's answer, the switch is based on the value of `diff`, which should be below `abs_th` (`Float.MIN_NORMAL` in his answer). This switch zone is shown as hatched in the graph below.

[Figure: switch zone based on diff, shown hatched]

Because `abs_th * epsilon` is smaller than `abs_th`, the green patches do not stick together, which in turn gives the solution a bad property: we can find triplets of numbers such that `x < y1 < y2` and yet `x == y2` but `x != y1`.

[Figure: illustration of such a triplet]

Take this striking example:

x  = 4.9303807e-32
y1 = 4.930381e-32
y2 = 4.9309825e-32

We have `x < y1 < y2`, and in fact `y2 - x` is more than 2000 times larger than `y1 - x`. And yet with the current solution,

nearlyEqual(x, y1, 1e-4) == False
nearlyEqual(x, y2, 1e-4) == True
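
These two results can be reproduced with a small Python port of both functions. The port is a sketch under assumptions: it works on Python's 64-bit floats while keeping the float32 constants, and the names `nearly_equal_borgwardt` and `nearly_equal_proposed` are made up here to tell the two versions apart.

```python
FLT_MIN_NORMAL = 1.17549435e-38  # Float.MIN_NORMAL / FLT_MIN for float32
FLT_MAX = 3.4028234664e38        # largest finite float32

def nearly_equal_borgwardt(a: float, b: float, epsilon: float) -> bool:
    """Port of the Java nearlyEqual quoted in answer #5."""
    diff = abs(a - b)
    if a == b:
        return True
    if a == 0 or b == 0 or diff < FLT_MIN_NORMAL:
        # absolute mode, with the threshold scaled down by epsilon
        return diff < epsilon * FLT_MIN_NORMAL
    return diff / (abs(a) + abs(b)) < epsilon

def nearly_equal_proposed(a: float, b: float, epsilon: float,
                          abs_th: float = FLT_MIN_NORMAL) -> bool:
    """Port of the C++ nearly_equal at the top of this answer."""
    if a == b:
        return True
    diff = abs(a - b)
    norm = min(abs(a) + abs(b), FLT_MAX)
    return diff < max(abs_th, epsilon * norm)

x, y1, y2 = 4.9303807e-32, 4.930381e-32, 4.9309825e-32

print(nearly_equal_borgwardt(x, y1, 1e-4), nearly_equal_borgwardt(x, y2, 1e-4))
print(nearly_equal_proposed(x, y1, 1e-4), nearly_equal_proposed(x, y2, 1e-4))
```

The first line shows the inconsistency (the closer value is rejected, the farther one accepted), while the second version accepts both.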

By contrast, in the solution proposed above, the switch zone is based on the value of `|x| + |y|`, which is represented by the hatched square below. It ensures that both zones connect gracefully.

[Figure: switch zone based on |x| + |y|, shown as a hatched square]

Also, the code above does not have branching, which could be more efficient. Consider that operations such as `max` and `abs`, which a priori need branching, often have dedicated assembly instructions. For this reason, I think this approach is superior to another solution that would be to fix Michael's `nearlyEqual` by changing the switch from `diff < abs_th` to `diff < eps * abs_th`, which would then produce essentially the same response pattern.

Where to switch between relative and absolute comparison?

The switch between those modes is made around `abs_th`, which is taken as `FLT_MIN` in the accepted answer. This choice means that the representation of `float32` is what limits the precision of our floating point numbers.

This does not always make sense. For example, if the numbers you compare are the results of a subtraction, perhaps something in the range of `FLT_EPSILON` makes more sense. If they are square roots of subtracted numbers, the numerical imprecision could be even higher.

It is rather obvious when you consider comparing a floating point number with `0`. Here, any relative comparison will fail, because `|x - 0| / (|x| + 0) = 1`. So the comparison needs to switch to absolute mode when `x` is on the order of the imprecision of your computation -- and rarely is it as low as `FLT_MIN`.

This is the reason for the introduction of the `abs_th` parameter above.

Also, by not multiplying `abs_th` with `epsilon`, the interpretation of this parameter is simple and corresponds to the level of numerical precision that we expect on those numbers.
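
The comparison with zero can be tried out with Python's standard `math.isclose`, which also takes the larger of a relative and an absolute tolerance (it scales the relative part by `max(|a|, |b|)` rather than `|a| + |b|`, and uses `<=`). This is an illustration of the idea, not a port of the function above. The value `1e-12` is an arbitrary threshold assumed for the example.

```python
import math
import sys

x = (0.1 + 0.2) - 0.3  # mathematically 0, actually about 5.55e-17

# Pure relative test: the difference is as large as the only non-zero
# operand, so no relative tolerance below 1 can succeed.
print(math.isclose(x, 0.0, rel_tol=1e-9))

# An absolute threshold matched to the expected imprecision succeeds.
print(math.isclose(x, 0.0, rel_tol=1e-9, abs_tol=1e-12))

# A threshold as small as the smallest normal double is far too strict here.
print(math.isclose(x, 0.0, rel_tol=1e-9, abs_tol=sys.float_info.min))
```

Only the second call reports equality, which matches the point that the absolute threshold should reflect the imprecision of the computation and not the limits of the number format.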

Mathematical rumbling

(kept here mostly for my own pleasure)

More generally I assume that a well-behaved floating point comparison operator `=~` should have some basic properties.

The following are rather obvious:

  • self-equality: `a =~ a`
  • symmetry: `a =~ b` implies `b =~ a`
  • invariance by opposition: `a =~ b` implies `-a =~ -b`

(We don't have that `a =~ b` and `b =~ c` implies `a =~ c`; `=~` is not an equivalence relation.)

I would add the following properties that are more specific to floating point comparisons:

  • if `a < b < c`, then `a =~ c` implies `a =~ b` (closer values should also be equal)
  • if `a, b, m >= 0`, then `a =~ b` implies `a + m =~ b + m` (larger values with the same difference should also be equal)
  • if `0 <= λ < 1`, then `a =~ b` implies `λa =~ λb` (perhaps less obvious to argue for).

Those properties already give strong constraints on possible near-equality functions. The function proposed above verifies them. Perhaps one or several otherwise obvious properties are missing.

When one thinks of `=~` as a family of equality relations `=~[Ɛ,t]` parameterized by `Ɛ` and `abs_th`, one could also add

  • if `Ɛ1 < Ɛ2`, then `a =~[Ɛ1,t] b` implies `a =~[Ɛ2,t] b` (equality for a given tolerance implies equality at a higher tolerance)
  • if `t1 < t2`, then `a =~[Ɛ,t1] b` implies `a =~[Ɛ,t2] b` (equality for a given imprecision implies equality at a higher imprecision)

The proposed solution also verifies these.
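
The lack of transitivity noted above is easy to observe with any tolerance-based comparison. The Python sketch below uses the standard `math.isclose` as a stand-in for `=~`. The helper `close` and the tolerance values are assumptions chosen for the example.

```python
import math

def close(a: float, b: float) -> bool:
    """Stand-in for =~ with a relative tolerance and an absolute threshold."""
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)

a, b, c = 1.0, 1.0 + 0.8e-9, 1.0 + 1.6e-9

print(close(a, b), close(b, c))  # each neighbouring pair is within tolerance
print(close(a, c))               # the two end points are not
print(close(a, b) == close(b, a), close(a, b) == close(-a, -b))  # symmetry, opposition
```

So `a =~ b` and `b =~ c` hold while `a =~ c` does not, whereas symmetry and invariance by opposition hold for this sample.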

Comments

4 votes anneb 5/8/2020
C++ implementation question: can `(std::abs(a) + std::abs(b))` ever be greater than `std::numeric_limits<float>::max()`?
5 votes Paul Groke 12/10/2020
@anneb Yes, it can be +INF.
0 votes andypea 2/2/2021
The parameter names in your code appear to be reversed. The 'relth' parameter is being used as an absolute threshold, whilst the 'epsilon' parameter is being used as a relative threshold.
1 vote P-Gn 2/22/2021
@andypea Thanks. Actually it's "just" terrible naming -- I switched to a much more meaningful `abs_th`.
0 votes L4ZZA 12/1/2021
While the code can be translated, you can't compare it with the chosen solution since they are in two different languages, hence you can't say "choose this over that".
1 vote fishinear 8/20/2018 #11

You should ask yourself why you are comparing the numbers. If you know the purpose of the comparison then you should also know the required accuracy of your numbers. That is different in each situation and each application context. But in pretty much all practical cases there is a required absolute accuracy. It is only very seldom that a relative accuracy is applicable.

To give an example: if your goal is to draw a graph on the screen, then you likely want floating point values to compare equal if they map to the same pixel on the screen. If the size of your screen is 1000 pixels, and your numbers are in the 1e6 range, then you likely will want 100 to compare equal to 200.

Given the required absolute accuracy, then the algorithm becomes:

public static ComparisonResult compare(float a, float b, float accuracy) 
{
    if (isnan(a) || isnan(b))   // if NaN needs to be supported
        return UNORDERED;    
    if (a == b)                 // short-cut and takes care of infinities
        return EQUAL;           
    if (abs(a-b) < accuracy)    // comparison wrt. the accuracy
        return EQUAL;
    if (a < b)                  // larger / smaller
        return SMALLER;
    else
        return LARGER;
}
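
The screen example from this answer can be made concrete with a short Python sketch. Everything here is assumed for illustration: the function names, the value range of 1e6 and the 1000-pixel width are taken from the example in the text, not from any real API. Note that bucketing values into pixels is a slightly different test from `abs(a-b) < accuracy`, since two values just either side of a pixel boundary land in different buckets.

```python
def to_pixel(value: float, value_range: float = 1e6, screen_pixels: int = 1000) -> int:
    """Map a value in [0, value_range) to a pixel column."""
    return int(value * screen_pixels / value_range)

def same_pixel(a: float, b: float) -> bool:
    """Values are 'equal' for drawing purposes when they hit the same pixel."""
    return to_pixel(a) == to_pixel(b)

print(same_pixel(100.0, 200.0))   # both land in pixel 0
print(same_pixel(100.0, 2500.0))  # pixel 0 versus pixel 2
```

Here the required accuracy comes directly from the purpose of the comparison, one pixel being worth 1000 units.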
0 votes NewSites 3/23/2021 #12

I came up with a simple approach to adjusting the size of epsilon to the size of the numbers being compared. So, instead of using:

iif(abs(a - b) < 1e-6, "equal", "not")

if `a` and `b` can be large, I changed that to:

iif(abs(a - b) < (10 ^ -abs(7 - log(a))), "equal", "not")

I suppose that doesn't satisfy all the theoretical issues discussed in the other answers, but it has the advantage of being one line of code, so it can be used in an Excel formula or an Access query without needing a VBA function.

I did a search to see if others have used this method and I didn't find anything. I tested it in my application and it seems to be working well. So it seems to be a method that is adequate for contexts that don't require the complexity of the other answers. But I wonder if it has a problem I haven't thought of since no one else seems to be using it.

If there's a reason the test with the log is not valid for simple comparisons of numbers of various sizes, please say why in a comment.
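
To see how this tolerance behaves across magnitudes, the formula can be tabulated in Python. The sketch assumes that `log` means the base-10 logarithm, as the powers of ten suggest. If the host application's `log` is the natural logarithm, the numbers come out differently. `log_tolerance` is a name invented for this illustration.

```python
import math

def log_tolerance(a: float) -> float:
    """Tolerance from the formula above, assuming log is base 10."""
    return 10 ** -abs(7 - math.log10(a))

for a in (1.0, 1e6, 1e7, 1e14):
    print(a, log_tolerance(a))

# The abs() makes the tolerance shrink again once a exceeds 1e7,
# and the logarithm is undefined for zero or negative inputs.
try:
    log_tolerance(0.0)
except ValueError as exc:
    print("a = 0:", exc)
```

Under that assumption the tolerance grows from about 1e-7 at `a = 1` to 1 at `a = 1e7` and then falls back to about 1e-7 at `a = 1e14`. The formula also only looks at `a`, never at `b`.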

0 votes mdk 9/7/2023 #13

so what do you think of this solution?

#include <stdbool.h>  /* bool */

#define TRUE 1
#define FALSE 0

bool float_compare (float a, float b) ;

bool float_compare (float a, float b) 
{
    if ( a > b)
        return FALSE;

    if ( b > a )
        return FALSE;
        
    return TRUE;
}

int main()
{
    bool res = FALSE;
    float X = 0.00001;
    float Y = 0.00001;
    
    res = float_compare (X, Y) ;
    
    if (res) {
        /* Do what you need to, if X and Y are equal */
        
    } else {
        /* Do what you need to, if X and Y are NOT equal */
        
    }

    return 0;
}

This logic should work for any of the data types like double / float / int / etc.

Am I missing something?

Comments

0 votes syockit 11/7/2023
That function is no different than doing `a == b`, so yes, it's missing a lot.