Quick way to figure out 32-bit floating point representation and associated loss

Q:

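I'm working with real numbers in a given range (latitude and longitude) that come from a high-precision system, which gives me many decimal places (typically around 15, with of course at most three digits to the left of the decimal point). To what extent those decimals represent actual knowledge I don't know, but I'd like to keep all of them.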

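The question is: for any given number, how can I quickly tell whether a 32-bit floating point representation will lose anything in the decimal places, and if so, how much? Is there an online tool that does this, or a quick calculation I could do in an Excel sheet or something similar?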

floating-point precision

#include <float.h>
#include <math.h>

#define DBL_FLT_DIG_DIFF (DBL_MANT_DIG - FLT_MANT_DIG)
#define DBL_FLT_DIG_DIFF_MOD (1ull << DBL_FLT_DIG_DIFF)

// Return the loss, as a fraction of a float ULP, when x is converted to float.
// Assumes x is finite and within the normal range of float.
double double_to_float_loss(double x) {
  if (x < 0) {
    return double_to_float_loss(-x);
  }
  // Scale so the full double significand becomes an integer.
  const double scale = ldexp(1.0, DBL_MANT_DIG);
  // frexp() breaks a floating-point number into a normalized
  // fraction in [0.5, 1) and an integer exponent.
  int expo;
  long long ifraction = (long long) (frexp(x, &expo) * scale);
  // Low-order bits that do not fit in the float significand.
  long long loss = ifraction % DBL_FLT_DIG_DIFF_MOD;
  double ulp_fraction = (double) loss / DBL_FLT_DIG_DIFF_MOD;
  // Conversion from float64 to float32 typically rounds to nearest,
  // so the error is at most 0.5 ULP.
  return ulp_fraction > 0.5 ? 1.0 - ulp_fraction : ulp_fraction;
}
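
As a cross-check on the bit-level approach, the absolute error can also be measured directly by converting the value to float and back. This is only a minimal sketch: the helper name float_roundtrip_error is hypothetical and not part of the answer, and it assumes x is finite and within float's range.

```c
#include <math.h>

// Hypothetical helper (not from the original answer): absolute error
// introduced by storing x in a 32-bit float, measured by a round trip.
double float_roundtrip_error(double x) {
  float f = (float) x;          // narrowing conversion, normally round-to-nearest
  return fabs((double) f - x);  // widen back and compare with the original
}
```

A result of 0.0 means the value survives the conversion unchanged; any other result is the size of the change in the same units as x.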

Loss in decimal places:

For a given float64 this is simple: just count the number of decimal digits in loss from above.

...
long long loss = ifraction % DBL_FLT_DIG_DIFF_MOD;
int decimal_loss = 0;
while (loss) {
  loss /= 10;
  decimal_loss++;
}

Worst case: log10(DBL_FLT_DIG_DIFF_MOD), which is about 8.7 when the difference is 29 bits, so up to 9 decimal digits.
