C++ 基本浮点解析代码与 std：：atof 结果略有不同-解网

问：

#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <vector>
#include <iomanip>
#include <cmath>
using namespace std;

#define LIKELY(expr) (__builtin_expect(!!(expr), 1))
#define UNLIKELY(expr) (__builtin_expect(!!(expr), 0))

bool abscmp(double a, double b)
{
  if (isnan(a) && isnan(b)) return true;
  if (isnan(a) ^ isnan(b)) return false;
  return a == b;
}

template <typename T>
inline __attribute__((always_inline)) T ParseFloat(const char *a) {
  static constexpr T multers[] = {
    0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001, 0.0000001, 0.00000001, 0.000000001, 0.0000000001, 0.00000000001,
    0.000000000001, 0.0000000000001, 0.00000000000001, 0.000000000000001, 0.0000000000000001, 0.00000000000000001
  };
  static_assert(std::is_floating_point_v<T>);
  int i = (a[0] == '-') | (a[0] == '+');
  T res = 0.0;
  int sign = 1 - 2 * (a[0] == '-');
  if (UNLIKELY(!a[0])) return NAN;

  while (a[i] && a[i] != '.') {
    if (UNLIKELY(a[i] < '0' || a[i] > '9')) {
      return NAN;
    }
    res = res * static_cast<T>(10.0) + a[i] - '0';
    i++;
  }

  if (LIKELY(a[i] != '\0')) {
    i++;
    int j = i;
    //T mult = 0.1;
    while (a[i]) {
      if (UNLIKELY(a[i] < '0' || a[i] > '9')) {
        return NAN;
      }
      res = res + (a[i] - '0') * multers[i - j];
      // res = res + (a[i] - '0') * mult;
      // mult *= 0.1;
      i++;
    }
  }

  return res * sign;
}

int main()
{
  string inputs[] = {
    "31.0911863667",
    "30.9500",
    "225.1293333333",
    "16.4850",
    "29.0507297346",
    "147.9440517474",
    "28.8500",
    "213.4600",
    "212.9105553333",
    "199.1553333333",
    "19.5884123000",
    "3092458.37500000000"
  };

  int n = sizeof(inputs) / sizeof(inputs[0]);
  for (int i = 0; i < n; i++) {
    float res1 = std::atof(inputs[i].c_str());
    float res2 = ParseFloat<double>(inputs[i].c_str());
    if (!abscmp(res1, res2)) {
      cout << std::fixed << std::setprecision(20) << "CompareConvert " << res1 << " " << res2 << " " << std::string(inputs[i]) << std::endl;
    } else {
      cout << std::fixed << std::setprecision(20) << "Correct " << res1 << std::endl;
    }
  }
}

我正在编写一个简单的解析器（具有完全有效性检查），因为它太慢了（在我的测试输入中平均快 3.2 倍 - 解析 CSV 文件的 GB）。公式非常简单，.但它给出的结果略有不同。std::atofParseFastres = res * 10 + (a[i] - '0');

我知道这是因为 IEEE-754 浮点的局限性。但是有没有便宜的方法可以给出完全相同的结果？我需要它们完全相同，因为它正在与使用 sha256sum 而不是ParseFaststd::atoffabs(a - b) < epsilon

运行命令：gcc 10.2.0g++ -o main main.cpp -O3 -std=c++17

编辑：解释为什么最后一个输入是错误的：小数部分正好介于 2 个可能的值和之间。但是使用这种解析方法，在数字处，中间结果将是四舍五入。除此之外，仍然会给.0.3750.50.25.33092458.0 + 0.3 == 3092458.250.0753092458.25

C++ 解析优化点 IEEE-754

与您的问题无关，但既然您用 C++ 编写函数，为什么不直接用作参数呢？另外，为什么不使用？如果要使用索引进行循环，请使用整数类型。哦，你的按位运算符“技巧”实际上只不过是技巧。如果你避免使用它们，并使用易于理解（和维护）的普通运算符和表达式，编译器将有更好的机会很好地优化它们。这样的伎俩通常被误导了。std::stringa[i] < '0' || a[i] > '9'!std::isdigit(a[i])unsigned

0赞 Huy Le 8/2/2023

它插入到另一个模块中，该模块需要作为输入。对于，是的，也许这是一个好主意，我会尝试一下。对于 vs ，我只是尽可能使用，因为我没有看到任何性能差异const char*!std::isdigit(a[i])intunsignedint

1赞 paddy 8/2/2023

std::atof返回的类型为。您正在用于内部计算。你可能想先比较一下，看看你有多接近。乍一看，我会说这可能是有问题的。正如您可能知道的那样，在二进制中不能清晰地表示，因此这不仅是一个直接可能的问题，而且当您这样做时也会变得复杂doublefloatParseFloat<double>mult0.1mult *= 0.1

4赞 molbdnilo 8/2/2023

0.1 没有精确表示为二进制浮点数;乘以它将不可避免地引入错误。如果有一种更快的方法可以获得与相同的结果，包括错误检查，您不认为已经这样做了吗？atofatof

1赞 user207421 8/2/2023

在它产生相同的输出之前，它的速度是完全无关紧要的。你可以在零时间内得到错误的答案，但这不是目标。我建议你在这里向风车倾斜。已经工作了 50 年，它的性能是有原因的：特别是它必须满足的所有特殊情况。你在这里最大的问题之一是，在你执行任何代码之前，朋友就已经不准确了。atof()0.1

答：

1赞 cpplearner 8/2/2023 #1

我建议使用 fast_float 库（https://github.com/fastfloat/fast_float）。据称它比（glibc 只是一个简单的包装器）快 4 到 10 倍。strtodatodstrtod

  // ...
  if (LIKELY(a[i] != '\0')) {
    T fraction = 0; // fractional part as an integer
    T power = 1;    // turns to 1, 10, 100, 1000, ... each loop iteration
    i++;
    while (a[i]) {
      if (UNLIKELY(a[i] < '0' || a[i] > '9')) {
        return NAN;
      }
      // note: additional logic is required to make sure that trailing
      //       zeros in the fraction cannot decrease the precision
      power *= T{10};
      fraction = fraction * T{10} + (a[i] - '0');
      i++;
    }

    // note: 'power' can also be obtained from a look-up table like in
    //       your original code.
    //       Benchmark to make sure that it's actually faster to use a table.
    res += fraction / power; // perform one division in the end, e.g. 375 / 100
  }
  
  return std::copysign(res, sign); // note: prefer copysign over multiplication
}

查看编译器资源管理器上的实时示例

即使有这些更改，@cpplearner建议使用第三方库也可能更好。标准库函数类似也可能不提供最佳性能，并且性能会因标准库而异。std::strtofstd::to_chars

虽然您的解决方案在某些平台上可能更快，但在浮点乘法和除法成本更高的平台上，它的性能可能会差得多。浮点数很难。

此代码不正确。对于 binary64，当超过 22 位数字时会出现舍入错误，并且会出现舍入错误，从而导致最终总和不正确。（如果选择得当，即使 17 位有效数字足以区分 binary64 数字，也可能存在前导零（非有效数字）和/或用户可能为接近舍入过渡的数字提供许多数字。powerfraction/power

0赞 Jan Schultke 8/3/2023

@EricPostpischil是的，完全正确地实现这一点是相当困难的，尤其是在包含非规范化数字等时。此代码至少在简单情况下有效，因此它是对 OP 原始代码的改进。这种事情最好由第三方库来完成。

上一个：IEEE754单精度浮点数的字节表示形式不一致

下一个：从 0 到 N-1 的最大整数范围是多少，可以由 [0， 1] 范围内的 IEEE 754 双精度浮点值表示？

C++ 基本浮点解析代码与 std：：atof 结果略有不同

C++ basic float parsing code slightly different result from std::atof

评论

评论

评论