Asked by: Mike Bailey | Asked: 2/7/2011 | Last edited by: phuclv, Mike Bailey | Updated: 9/7/2023 | Views: 130255
How should I do floating point comparison?
Q:
I'm currently writing some code where I have something along the lines of:
double a = SomeCalculation1();
double b = SomeCalculation2();
if (a < b)
    DoSomething2();
else if (a > b)
    DoSomething3();
Then, in other places, I may need to test for equality:
double a = SomeCalculation3();
double b = SomeCalculation4();
if (a == 0.0)
    DoSomethingUseful(1 / a);
if (b == 0.0)
    return 0; // or something else here
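In short, I have lots of floating-point math going on and I need to do various comparisons for conditions. I can't convert it to integer math because such a thing is meaningless in this context.
I've read before that floating-point comparisons can be unreliable, since you can have things like this going on: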
double a = 1.0 / 3.0;
double b = a + a + a;
if ((3 * a) != b)
    Console.WriteLine("Oh no!");
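In short, I'd like to know: how can I reliably compare floating-point numbers (less than, greater than, equality)?
The number range I am using is roughly from 10E-14 to 10E6, so I do need to work with small numbers as well as large ones.
I've tagged this as language-agnostic because I'm interested in how I can accomplish this no matter what language I'm using.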
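A:
The standard advice is to use some small "epsilon" value (probably chosen depending on your application) and to consider floats that are within epsilon of each other to be equal. For example, something like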
#define EPSILON 0.00000001
if ((a - b) < EPSILON && (b - a) < EPSILON) {
    printf("a and b are about equal\n");
}
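A more complete answer is complicated, because floating-point error is extremely subtle and confusing to reason about. If you really care about equality in any precise sense, you are probably looking for a solution that doesn't involve floating point.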
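To see why a single fixed epsilon is only a starting point, here is a minimal Python sketch that applies the same absolute test across the range mentioned in the question. The function name about_equal and the sample values are illustrative assumptions, not part of the answer.

```python
import math

EPSILON = 1e-8  # same fixed tolerance as in the C snippet above


def about_equal(a: float, b: float, eps: float = EPSILON) -> bool:
    """Absolute comparison: true when |a - b| < eps."""
    return (a - b) < eps and (b - a) < eps


# Near 1.0 the fixed tolerance absorbs a single rounding error.
print(about_equal(0.1 + 0.2, 0.3))    # True

# At the small end of the range, values that differ by a factor of two
# are still reported as equal, because both lie far below eps.
print(about_equal(1e-14, 2e-14))      # True

# At the large end, two values only 8 ULPs apart are reported as different,
# because one ULP at 1e7 is already about 1.9e-9.
big = 1e7
print(about_equal(big, big + 8 * math.ulp(big)))  # False
```

math.ulp needs Python 3.9 or later. A tolerance that scales with the magnitude of the operands avoids both of the last two outcomes.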
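The best way to compare doubles for equality or inequality is to take the absolute value of their difference and compare it with a value that is small enough for your context.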
double eps = 0.000000001; //for instance
double a = someCalc1();
double b = someCalc2();
double diff = Math.abs(a - b);
if (diff < eps) {
    //equal
}
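You need to take into account that truncation error is a relative error. Two numbers are about equal if their difference is about as large as their ulp (unit in the last place).
However, if you do floating-point calculations, the potential error grows with every operation (be especially careful with subtractions!), so your error tolerance needs to increase accordingly.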
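Both points can be sketched in a few lines of Python. This is an illustration under assumptions, not the answerer's code: within_ulps is a hypothetical helper, and math.ulp requires Python 3.9 or later.

```python
import math


def within_ulps(a: float, b: float, max_ulps: int) -> bool:
    """True when |a - b| is at most max_ulps units in the last place,
    measured at the larger of the two magnitudes."""
    scale = max(abs(a), abs(b))
    return abs(a - b) <= max_ulps * math.ulp(scale)


# A single rounding error: the two values are adjacent doubles.
print(within_ulps(0.1 + 0.2, 0.3, 1))    # True

# A thousand additions accumulate error, so the tolerance has to grow.
# (An explicit loop is used because the built-in sum() may compensate
# for rounding error in recent Python versions.)
total = 0.0
for _ in range(1000):
    total += 0.1
print(total == 100.0)                    # False
print(within_ulps(total, 100.0, 1000))   # True
```

The bound of 1000 ULPs is deliberately generous; how much tolerance a computation needs depends on how many operations it performs and of what kind.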
I tried writing an equality function with the above comments in mind. Here's what I came up with:
Edit: changed from Math.Max(a, b) to Math.Max(Math.Abs(a), Math.Abs(b))
static bool fpEqual(double a, double b)
{
    double diff = Math.Abs(a - b);
    double epsilon = Math.Max(Math.Abs(a), Math.Abs(b)) * Double.Epsilon;
    return (diff < epsilon);
}
Thoughts? I still need to work out a greater-than and a less-than as well.
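Comments
epsilon should be Math.abs(Math.Max(a, b)) * Double.Epsilon;, otherwise it is always smaller than diff for negative a and b. And I think your epsilon is too small; the function might not return anything different from the == operator. Less than is a < b && !fpEqual(a,b).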
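The concern that the scaling constant is too small can be checked with a small Python port of fpEqual. In C#, Double.Epsilon is the smallest positive subnormal double (about 4.9e-324), not the machine epsilon. The port below takes the constant as a parameter named unit, which is an addition for illustration only.

```python
import math
import sys

SMALLEST_SUBNORMAL = math.ulp(0.0)        # 5e-324, the analogue of C#'s Double.Epsilon
MACHINE_EPSILON = sys.float_info.epsilon  # 2**-52, gap between 1.0 and the next double


def fp_equal(a: float, b: float, unit: float) -> bool:
    """Port of fpEqual with the scaling constant passed in as `unit`."""
    diff = abs(a - b)
    epsilon = max(abs(a), abs(b)) * unit
    return diff < epsilon


a, b = 0.1 + 0.2, 0.3   # adjacent doubles, one rounding error apart

print(fp_equal(a, b, SMALLEST_SUBNORMAL))  # False: epsilon underflows to 0.0
print(fp_equal(a, b, MACHINE_EPSILON))     # True
```

With the subnormal constant, the computed epsilon is below one ULP for any normal operand, so the function can only return true where the difference is smaller than any nonzero difference the operands can have.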
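Comparing for greater/smaller is not really a problem unless you are working right at the edge of the float/double precision limit.
For a "fuzzy equals" comparison, this (Java code, should be easy to adapt) is what I came up with for The Floating-Point Guide after a lot of work and taking into account lots of criticism: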
public static boolean nearlyEqual(float a, float b, float epsilon) {
    final float absA = Math.abs(a);
    final float absB = Math.abs(b);
    final float diff = Math.abs(a - b);
    if (a == b) { // shortcut, handles infinities
        return true;
    } else if (a == 0 || b == 0 || diff < Float.MIN_NORMAL) {
        // a or b is zero or both are extremely close to it
        // relative error is less meaningful here
        return diff < (epsilon * Float.MIN_NORMAL);
    } else { // use relative error
        return diff / (absA + absB) < epsilon;
    }
}
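It comes with a test suite. You should immediately dismiss any solution that doesn't, because it is virtually guaranteed to fail in some edge cases, such as one value being 0, two very small values on opposite sides of zero, or infinities.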
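As a rough idea of what such edge-case tests look like, here is a Python port of the function with a handful of checks. This is a sketch, not the guide's actual test suite: Python floats are doubles, so sys.float_info.min stands in for Float.MIN_NORMAL, and the chosen values and default epsilon are assumptions.

```python
import math
import sys

MIN_NORMAL = sys.float_info.min  # smallest positive normal double


def nearly_equal(a: float, b: float, epsilon: float = 1e-5) -> bool:
    """Python port of the Java nearlyEqual above, using doubles."""
    abs_a, abs_b, diff = abs(a), abs(b), abs(a - b)
    if a == b:  # shortcut, handles infinities
        return True
    if a == 0 or b == 0 or diff < MIN_NORMAL:
        # a or b is zero, or both are extremely close to it
        return diff < epsilon * MIN_NORMAL
    return diff / (abs_a + abs_b) < epsilon  # relative error


inf = math.inf
checks = {
    "zero vs zero": nearly_equal(0.0, -0.0),
    "zero vs subnormal": nearly_equal(0.0, 1e-320),
    "zero vs small normal": nearly_equal(0.0, 1e-10),
    "tiny, opposite signs": nearly_equal(1e-320, -1e-320),
    "small, opposite signs": nearly_equal(1e-300, -1e-300),
    "inf vs inf": nearly_equal(inf, inf),
    "inf vs -inf": nearly_equal(inf, -inf),
    "inf vs max": nearly_equal(inf, sys.float_info.max),
}
for name, result in checks.items():
    print(f"{name}: {result}")
```

Each entry targets one of the edge cases listed above; a real suite would also cover ordinary values on both sides of the tolerance.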
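Another method (see the link above for more details) is to convert the bit patterns of the floats to integers and accept everything within a fixed integer distance.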
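A minimal sketch of the bit-pattern approach in Python, assuming IEEE 754 doubles. The helper names and the default of 4 steps are hypothetical, and NaN and infinities get no special handling here.

```python
import struct


def ordered_bits(x: float) -> int:
    """Map a double to an integer whose ordering matches the float ordering."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Non-negative floats already sort correctly as signed integers.
    # Negative floats are stored as sign + magnitude, so mirror them below zero.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable steps between a and b."""
    return abs(ordered_bits(a) - ordered_bits(b))


def nearly_equal_ulps(a: float, b: float, max_ulps: int = 4) -> bool:
    """Accept everything within a fixed integer distance."""
    return ulp_distance(a, b) <= max_ulps


print(ulp_distance(0.1 + 0.2, 0.3))      # 1
print(ulp_distance(0.0, -0.0))           # 0
print(ulp_distance(-5e-324, 5e-324))     # 2
print(nearly_equal_ulps(1.0, 1.0 + 1e-9))  # False
```

The integer distance is relative by construction, because the spacing between adjacent doubles grows with their magnitude.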
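In any case, there probably isn't any solution that is perfect for all applications. Ideally, you would develop or adapt your own, with a test suite covering your actual use cases.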
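I had the problem of comparing floating-point numbers, A < B and A > B, and this is what seems to work:
if ((A - B < Epsilon) && (fabs(A - B) > Epsilon))
{
    printf("A is less than B");
}
if ((A - B > Epsilon) && (fabs(A - B) > Epsilon))
{
    printf("A is greater than B");
}
fabs (the absolute value) takes care of the case where they are essentially equal.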
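For a non-negative Epsilon, each pair of conditions reduces to a single comparison. The Python sketch below shows the reduced form next to a port of the original test; the function names and sample values are illustrative assumptions.

```python
def original_less(a: float, b: float, eps: float) -> bool:
    """The two-part test from the snippet above."""
    return (a - b < eps) and (abs(a - b) > eps)


def definitely_less(a: float, b: float, eps: float) -> bool:
    """a is smaller than b by more than the tolerance."""
    return a - b < -eps


def definitely_greater(a: float, b: float, eps: float) -> bool:
    """a is larger than b by more than the tolerance."""
    return a - b > eps


eps = 1e-9
samples = [(1.0, 1.1), (1.1, 1.0), (1.0, 1.0 + 1e-12), (0.0, 0.0), (-2.0, 3.0)]
for a, b in samples:
    # Both formulations agree on every sample.
    print(a, b, original_less(a, b, eps), definitely_less(a, b, eps))
```

Anything that is neither definitely less nor definitely greater counts as equal within the tolerance.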
We have to choose a tolerance level to compare float numbers. For example,
final float TOLERANCE = 0.00001f;
if (Math.abs(f1 - f2) < TOLERANCE)
    System.out.println("Oh yes!");
One note: your example is rather funny.
double a = 1.0 / 3.0;
double b = a + a + a;
if (a != b)
Console.WriteLine("Oh no!");
Some maths here:
a = 1/3
b = 1/3 + 1/3 + 1/3 = 1.
1/3 != 1
Oh, yes..
Did you mean
if (b != 1)
Console.WriteLine("Oh no!")
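Whether a test like this actually fires depends on how the individual roundings fall. A quick check in Python, assuming IEEE 754 doubles (which is what Python's float uses on common platforms); the 0.1 + 0.2 case is added here as a contrast and is not from the question:

```python
a = 1.0 / 3.0
b = a + a + a

# The exact sum 3a lies halfway between two doubles and rounds to 1.0.
print(b != 1.0)          # False
print(3 * a != b)        # False

# A different set of operands does expose a rounding difference.
print(0.1 + 0.2 != 0.3)  # True
```

So exact comparison can succeed or fail depending on the operands, which is exactly why it is hard to rely on.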
Adapted to PHP from Michael Borgwardt's and bosonix's answers:
class Comparison
{
    const MIN_NORMAL = 1.17549435E-38; //from Java Specs
    // from http://floating-point-gui.de/errors/comparison/
    public function nearlyEqual($a, $b, $epsilon = 0.000001)
    {
        $absA = abs($a);
        $absB = abs($b);
        $diff = abs($a - $b);
        if ($a == $b) {
            return true;
        } else {
            if ($a == 0 || $b == 0 || $diff < self::MIN_NORMAL) {
                return $diff < ($epsilon * self::MIN_NORMAL);
            } else {
                return $diff / ($absA + $absB) < $epsilon;
            }
        }
    }
}
My take on floating-point comparison in Swift:
infix operator ~= {}
func ~= (a: Float, b: Float) -> Bool {
    return fabsf(a - b) < Float(FLT_EPSILON)
}
func ~= (a: CGFloat, b: CGFloat) -> Bool {
    return fabs(a - b) < CGFloat(FLT_EPSILON)
}
func ~= (a: Double, b: Double) -> Bool {
    return fabs(a - b) < Double(FLT_EPSILON)
}
TL;DR
- Use the following function instead of the currently accepted solution to avoid some undesirable results in certain limit cases, while being potentially more efficient.
- Know the expected imprecision you have on your numbers and feed them accordingly in the comparison function.
#include <algorithm>  // std::min, std::max
#include <cassert>
#include <cfloat>     // FLT_EPSILON, FLT_MIN
#include <cmath>      // std::abs
#include <limits>

bool nearly_equal(
    float a, float b,
    float epsilon = 128 * FLT_EPSILON, float abs_th = FLT_MIN)
    // those defaults are arbitrary and could be removed
{
    assert(std::numeric_limits<float>::epsilon() <= epsilon);
    assert(epsilon < 1.f);
    if (a == b) return true;
    auto diff = std::abs(a-b);
    auto norm = std::min((std::abs(a) + std::abs(b)), std::numeric_limits<float>::max());
    // or even faster: std::min(std::abs(a + b), std::numeric_limits<float>::max());
    // keeping this commented out until I update figures below
    return diff < std::max(abs_th, epsilon * norm);
}
Graphics, please?
When comparing floating point numbers, there are two "modes".
The first one is the relative mode, where the difference between x and y is considered relative to their amplitude |x| + |y|. When plotted in 2D, it gives the following profile, where green means equality of x and y. (I took an epsilon of 0.5 for illustration purposes.)
The relative mode is what is used for "normal" or "large enough" floating-point values. (More on that later.)
The second one is an absolute mode, where we simply compare their difference to a fixed number. It gives the following profile (again with an epsilon of 0.5 and an abs_th of 1 for illustration).
This absolute mode of comparison is what is used for "tiny" floating-point values.
Now the question is how to stitch those two response patterns together.
In Michael Borgwardt's answer, the switch is based on the value of diff, which should be below abs_th (Float.MIN_NORMAL in his answer). This switch zone is shown as hatched in the graph below.
Because abs_th * epsilon is smaller than abs_th, the green patches do not stick together, which in turn gives the solution a bad property: we can find triplets of numbers such that x < y_1 < y_2 and yet x == y2 but x != y1.
Take this striking example:
x = 4.9303807e-32
y1 = 4.930381e-32
y2 = 4.9309825e-32
We have x < y1 < y2, and in fact y2 - x is more than 2000 times larger than y1 - x. And yet with the current solution,
nearlyEqual(x, y1, 1e-4) == False
nearlyEqual(x, y2, 1e-4) == True
By contrast, in the solution proposed above, the switch zone is based on the value of |x| + |y|, which is represented by the hatched square below. It ensures that both zones connect gracefully.
Also, the code above does not have branching, which could be more efficient. Consider that operations such as max and abs, which a priori need branching, often have dedicated assembly instructions. For this reason, I think this approach is superior to another solution, which would be to fix Michael's nearlyEqual by changing the switch from diff < abs_th to diff < eps * abs_th, and which would then produce essentially the same response pattern.
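The triplet can be reproduced with straightforward Python ports of both functions. This is a sketch under assumptions: Python floats are doubles, so the float32 constant is hard-coded and the arithmetic is done in double precision, which does not change the outcome for these values. The function names are made up for the comparison.

```python
import sys

FLT_MIN_NORMAL = 1.17549435e-38  # Float.MIN_NORMAL / FLT_MIN for float32


def nearly_equal_switch_on_diff(a: float, b: float, epsilon: float) -> bool:
    """Port of the Java nearlyEqual: switches modes on the value of diff."""
    diff = abs(a - b)
    if a == b:
        return True
    if a == 0 or b == 0 or diff < FLT_MIN_NORMAL:
        return diff < epsilon * FLT_MIN_NORMAL
    return diff / (abs(a) + abs(b)) < epsilon


def nearly_equal_switch_on_norm(a: float, b: float, epsilon: float,
                                abs_th: float = FLT_MIN_NORMAL) -> bool:
    """Port of the C++ nearly_equal: switches modes on |a| + |b|."""
    if a == b:
        return True
    diff = abs(a - b)
    norm = min(abs(a) + abs(b), sys.float_info.max)
    return diff < max(abs_th, epsilon * norm)


x, y1, y2 = 4.9303807e-32, 4.930381e-32, 4.9309825e-32

print(nearly_equal_switch_on_diff(x, y1, 1e-4))  # False
print(nearly_equal_switch_on_diff(x, y2, 1e-4))  # True
print(nearly_equal_switch_on_norm(x, y1, 1e-4))  # True
print(nearly_equal_switch_on_norm(x, y2, 1e-4))  # True
```

With the switch on diff, the closer value y1 is rejected while the farther value y2 is accepted; with the switch on |x| + |y|, both are accepted.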
Where to switch between relative and absolute comparison?
The switch between those modes is made around abs_th, which is taken as FLT_MIN in the accepted answer. This choice means that the representation of float32 is what limits the precision of our floating-point numbers.
This does not always make sense. For example, if the numbers you compare are the results of a subtraction, perhaps something in the range of FLT_EPSILON makes more sense. If they are square roots of subtracted numbers, the numerical imprecision could be even higher.
It is rather obvious when you consider comparing a floating-point number with 0. Here, any relative comparison will fail, because |x - 0| / (|x| + 0) = 1. So the comparison needs to switch to absolute mode when x is on the order of the imprecision of your computation, and rarely is it as low as FLT_MIN.
This is the reason for the introduction of the abs_th parameter above.
Also, by not multiplying abs_th with epsilon, the interpretation of this parameter is simple and corresponds to the level of numerical precision that we expect on those numbers.
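A small Python illustration of the comparison with zero, using doubles. The value sin(pi) and the threshold 1e-12 are assumptions picked for the example; nearly_equal is a port of the function proposed above.

```python
import math
import sys


def relative_only(a: float, b: float, epsilon: float) -> bool:
    """Pure relative comparison: |a - b| / (|a| + |b|) < epsilon."""
    if a == b:
        return True
    return abs(a - b) / (abs(a) + abs(b)) < epsilon


def nearly_equal(a: float, b: float, epsilon: float, abs_th: float) -> bool:
    """Relative comparison with an absolute floor of abs_th."""
    if a == b:
        return True
    diff = abs(a - b)
    norm = min(abs(a) + abs(b), sys.float_info.max)
    return diff < max(abs_th, epsilon * norm)


x = math.sin(math.pi)  # mathematically 0, numerically about 1.22e-16

# The relative error against zero is 1 no matter how small x is.
print(abs(x - 0.0) / (abs(x) + 0.0))                           # 1.0
print(relative_only(x, 0.0, 1e-6))                             # False
# A threshold as low as the smallest normal number does not help either.
print(nearly_equal(x, 0.0, 1e-6, abs_th=sys.float_info.min))   # False
# A threshold matched to the imprecision of the computation does.
print(nearly_equal(x, 0.0, 1e-6, abs_th=1e-12))                # True
```

Here abs_th reads directly as "differences below this size are noise", which is the interpretation described above.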
Mathematical rambling
(kept here mostly for my own pleasure)
More generally I assume that a well-behaved floating point comparison operator =~ should have some basic properties.
The following are rather obvious:
- self-equality: a =~ a
- symmetry: a =~ b implies b =~ a
- invariance by opposition: a =~ b implies -a =~ -b
(We don't have a =~ b and b =~ c implies a =~ c; =~ is not an equivalence relationship.)
I would add the following properties that are more specific to floating point comparisons:
- if a < b < c, then a =~ c implies a =~ b (closer values should also be equal)
- if a, b, m >= 0, then a =~ b implies a + m =~ b + m (larger values with the same difference should also be equal)
- if 0 <= λ < 1, then a =~ b implies λa =~ λb (perhaps less obvious to argue for).
Those properties already give strong constraints on possible near-equality functions. The function proposed above verifies them. Perhaps one or several otherwise obvious properties are missing.
When one thinks of =~ as a family of equality relationships =~[Ɛ,t] parameterized by Ɛ and abs_th, one could also add:
- if Ɛ1 < Ɛ2, then a =~[Ɛ1,t] b implies a =~[Ɛ2,t] b (equality for a given tolerance implies equality at a higher tolerance)
- if t1 < t2, then a =~[Ɛ,t1] b implies a =~[Ɛ,t2] b (equality for a given imprecision implies equality at a higher imprecision)
The proposed solution also verifies these.
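Some of these properties can be spot-checked numerically. The sketch below ports the proposed function to Python doubles and tests self-equality, symmetry, invariance by opposition and the two monotonicity properties on random pairs. It is a sanity check on sampled values, not a proof, and the sampling scheme and parameter values are assumptions.

```python
import random
import sys


def nearly_equal(a: float, b: float, epsilon: float = 1e-5,
                 abs_th: float = sys.float_info.min) -> bool:
    """Port of the proposed nearly_equal, using doubles."""
    if a == b:
        return True
    diff = abs(a - b)
    norm = min(abs(a) + abs(b), sys.float_info.max)
    return diff < max(abs_th, epsilon * norm)


def count_violations(pairs) -> int:
    """Count pairs that break one of the checked properties."""
    bad = 0
    for a, b in pairs:
        eq = nearly_equal(a, b)
        if not nearly_equal(a, a):                          # self-equality
            bad += 1
        if eq != nearly_equal(b, a):                        # symmetry
            bad += 1
        if eq != nearly_equal(-a, -b):                      # invariance by opposition
            bad += 1
        if nearly_equal(a, b, 1e-6) and not eq:             # larger epsilon
            bad += 1
        if eq and not nearly_equal(a, b, abs_th=1e-3):      # larger abs_th
            bad += 1
    return bad


random.seed(0)
pairs = []
for _ in range(2000):
    a = random.uniform(-1e3, 1e3) * 10.0 ** random.randint(-20, 20)
    # Perturb a so that some pairs fall inside the tolerance and some outside.
    b = a * (1.0 + random.uniform(-4e-5, 4e-5))
    pairs.append((a, b))

print(count_violations(pairs))  # 0
```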
You should ask yourself why you are comparing the numbers. If you know the purpose of the comparison then you should also know the required accuracy of your numbers. That is different in each situation and each application context. But in pretty much all practical cases there is a required absolute accuracy. It is only very seldom that a relative accuracy is applicable.
To give an example: if your goal is to draw a graph on the screen, then you likely want floating point values to compare equal if they map to the same pixel on the screen. If the size of your screen is 1000 pixels, and your numbers are in the 1e6 range, then you likely will want 100 to compare equal to 200.
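The screen example can be written down directly. In this Python sketch the names to_pixel and same_on_screen, and the assumption that values start at 0, are illustrative choices.

```python
def to_pixel(value: float, value_range: float = 1e6, screen_width: int = 1000) -> int:
    """Map a value in [0, value_range) to a pixel column in [0, screen_width)."""
    return int(value / value_range * screen_width)


def same_on_screen(a: float, b: float) -> bool:
    """Two values compare equal when they land on the same pixel."""
    return to_pixel(a) == to_pixel(b)


print(to_pixel(100.0), to_pixel(200.0))   # 0 0
print(same_on_screen(100.0, 200.0))       # True
print(same_on_screen(100.0, 1500.0))      # False
```

One pixel covers 1000 units here, so that is the absolute accuracy the application requires. Bucketing by pixel is slightly different from testing abs(a - b) against the accuracy: two values just on either side of a pixel boundary compare unequal even though they are close.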
Given the required absolute accuracy, then the algorithm becomes:
public static ComparisonResult compare(float a, float b, float accuracy)
{
    if (isnan(a) || isnan(b))   // if NaN needs to be supported
        return UNORDERED;
    if (a == b)                 // short-cut and takes care of infinities
        return EQUAL;
    if (abs(a-b) < accuracy)    // comparison wrt. the accuracy
        return EQUAL;
    if (a < b)                  // larger / smaller
        return SMALLER;
    else
        return LARGER;
}
I came up with a simple approach to adjusting the size of epsilon to the size of the numbers being compared. So, instead of using:
iif(abs(a - b) < 1e-6, "equal", "not")
if a and b can be large, I changed that to:
iif(abs(a - b) < (10 ^ -abs(7 - log(a))), "equal", "not")
I suppose that doesn't satisfy all the theoretical issues discussed in the other answers, but it has the advantage of being one line of code, so it can be used in an Excel formula or an Access query without needing a VBA function.
I did a search to see if others have used this method and I didn't find anything. I tested it in my application and it seems to be working well. So it seems to be a method that is adequate for contexts that don't require the complexity of the other answers. But I wonder if it has a problem I haven't thought of since no one else seems to be using it.
If there's a reason the test with the log is not valid for simple comparisons of numbers of various sizes, please say why in a comment.
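One way to examine the formula is to tabulate the tolerance it produces at different magnitudes. The Python sketch below assumes that log means the base-10 logarithm and that a is positive; if the host application uses the natural logarithm, the numbers differ. The function name scaled_tolerance is made up for this illustration.

```python
import math


def scaled_tolerance(a: float) -> float:
    """The tolerance 10 ^ -abs(7 - log(a)), assuming base-10 log and a > 0."""
    return 10.0 ** -abs(7.0 - math.log10(a))


for a in (1.0, 1e3, 1e6, 1e7, 1e8, 1e10):
    print(f"a = {a:g}  tolerance = {scaled_tolerance(a):g}")
```

Under that assumption the tolerance grows with a up to a = 1e7 and then shrinks again, because abs() mirrors the exponent around 7. The logarithm is also undefined for a <= 0, so zero and negative inputs would need separate handling.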
So, what do you think of this solution?
#include <stdbool.h>
#define TRUE 1
#define FALSE 0
bool float_compare (float a, float b)
{
    if ( a > b)
        return FALSE;
    if ( b > a )
        return FALSE;
    return TRUE;
}
int main()
{
    bool res = FALSE;
    float X = 0.00001;
    float Y = 0.00001;
    res = float_compare (X, Y) ;
    if (res) {
        /* Do what you need to, if X and Y are equal */
    } else {
        /* Do what you need to, if X and Y are NOT equal */
    }
    return 0;
}
This logic should work for any of the data types like double / float / int / etc.
Am I missing something?