C++ Read binary file into uint8 array to return decimal int gives wrong result

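Q:

I am trying to parse a binary file and extract different data structures from it. One of them can be a uint8 or an int8 (or a uint16, int16 ... up to 64 bits).

To keep the approach as general as possible, I read the data from a given file pointer and store it in a uint8 array (a buffer).

In my test I assume that a file content of 40 (hexadecimal) should result in the integer 64. That is why my test method asserts this value. Unfortunately, the content of the uint8 array always results in the decimal int 52. I don't know why, and I have tried various other ways of reading a specific number of bytes and assigning them to an integer variable. Is this a matter of endianness or something else?

Thanks in advance to anyone who can help :)

My read_int method: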

int read_int(FILE * file,int n,bool is_signed) {  // throw() removed: the body does throw
  assert(n>0);
  uint8_t n_chars[n];  // variable-length array: a compiler extension, not standard C++
  int result = 0;  // must start at zero; reading an uninitialized int is undefined behavior
  for (int i = 0; i < n; i++)
  {
    if(fread(&n_chars[i],sizeof(n_chars[i]),1,file)!=1){
        std::cerr<< "fread() failed!\n";
        throw new ReadOpFailed();
    }
    result*=256;  // a byte has 256 possible values, so the factor is 256, not 255
    result+=n_chars[i];  // the first byte read ends up as the most significant one
  }
    std::cout<< "int read: "<<result<<"\n";
    return result;

//-------------Some ideas that didn't work out either------------------
    // std::stringstream ss;
    // ss << std::hex << static_cast<int>(static_cast<unsigned char>(n_chars)); // Convert byte to hexadecimal string
    // int result;
    // ss >> result; // Parse the hexadecimal string to integer
    // std::cout << "result" << result<<"\n";
}

A small test that fails... The endianness detection part reports little endian (I don't know whether that is part of the problem).

struct TestContext{
    FILE * create_test_file_hex(char * input_hex,const char * rel_file_path = "test.gguf") {
        std::ofstream MyFile(rel_file_path, std::ios::binary);

        // Write to the file
        MyFile << input_hex;

        // Close the file
        MyFile.close();

        
        // std::fstream outfile (rel_file_path,std::ios::trunc);
        // char str[20] = 
        // outfile.write(str, 20);
        // outfile.close();

        FILE *file = fopen(rel_file_path,"rb");
        try{
            assert(file != nullptr);
        }catch (int e){
            std::cout << "file couldn't be opened due to exception n° "<<std::to_string(e)<<"\n";
            ADD_FAILURE(); 
        }
        std::remove(rel_file_path); //remove file whilst open, to be able to use it, but delete it after the last pointer was deleted.
    return file;
    }
};

TEST(test_tool_functions, test_read_int){
    int n = 1;
    // little endian if true
    if(*(char *)&n == 1) {std::cout<<"Little Endian Detected!!!\n";}
    else{std::cout<<"Big Endian Detected!!!\n";}
    std::string file_hex_content = "400A0E00080000016";
    
    uint64_t should;
    std::istringstream("40") >> std::hex >> should;
    ASSERT_EQ(should,64);
    
    uint64_t result = read_int(TestContext().create_test_file_hex(file_hex_content.data()),1,false);
    ASSERT_EQ(result,should);
}

c++ file parsing binary fread

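@AndrejPodzimek, thanks. I want to read a hexadecimal value and convert it to a decimal value stored in a uint8 variable. (Thanks for your advice. The file handling is done in another module.) Do you know how to make the function return 64 as a uint8 if the file contains the hexadecimal value 40? (See also the test I wrote for this case.)

0 votes Andrej Podzimek 11/10/2023

I added an answer. Basically, you intend to write a single byte with the value 64 into the file, which you could do with, e.g., stream << '@'. What you actually write is stream << "40", which is de facto stream << '4' << '0', i.e. two bytes: the first has the value 52 ('4') and the second equals 48 ('0'). Here 64 == '@' is the byte value you want, and 52 == '4' and 48 == '0' are where the 52 comes from. Try putting '@' as the first character of file_hex_content.

0 votes Andrej Podzimek 11/10/2023

Sorry about the 54 in my first comment above. I meant 52, but I can't edit it any more (and I don't want to delete it, to preserve the context).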

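A:

0 votes Andrej Podzimek 11/10/2023 #1

The root cause of the problem is that file_hex_content consists of ASCII character bytes (forming a human-readable hexadecimal string representation of a number) rather than the bytes that make up the binary representation of an integer. So it does not begin with the single byte 0x40 (64). Instead there is a byte '4' (ASCII byte value 52), followed by another byte '0' (ASCII value 48). The single byte 64 (0x40) corresponds to the ASCII character '@', not to the two characters '4' and '0'.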
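If the test should keep describing its input as a hexadecimal string, one option is to convert that string into raw bytes before writing the file. The following is only a minimal sketch: hex_to_bytes is a hypothetical helper that does not appear in the original post, and it assumes an even number of hex digits (the 17-digit string in the test would have to be padded or trimmed first).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): turns a hex string such
// as "400A" into the raw bytes {0x40, 0x0A}. Assumes valid hex digits only.
std::vector<std::uint8_t> hex_to_bytes(const std::string& hex) {
  assert(hex.size() % 2 == 0);  // two hex digits encode one byte
  std::vector<std::uint8_t> bytes;
  bytes.reserve(hex.size() / 2);
  for (std::size_t i = 0; i < hex.size(); i += 2) {
    // std::stoul with base 16 parses the two-digit substring as a number.
    const unsigned long value = std::stoul(hex.substr(i, 2), nullptr, 16);
    bytes.push_back(static_cast<std::uint8_t>(value));
  }
  return bytes;
}
```

The resulting vector could then be written with something like MyFile.write(reinterpret_cast<const char*>(bytes.data()), bytes.size()) instead of MyFile << input_hex, so that the file really starts with the single byte 0x40.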

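Below is a small serialization example. As long as serialization and deserialization happen on the same architecture and portability is not a concern, endianness is not an issue either.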

#include <cstdint>
#include <ios>
#include <iostream>
#include <sstream>

int main() {
  std::stringstream encoded;

  const uint64_t source{0xabcd1234deadbeefULL};
  encoded.write(reinterpret_cast<const char*>(&source), sizeof(source));

  uint64_t target;
  encoded.read(reinterpret_cast<char*>(&target), sizeof(target));

  std::cout << "source == target: " << std::hex << source << " == " << target
            << "\nserialized bytes:";
  for (const uint8_t byte : encoded.str())
    std::cout << ' ' << static_cast<uint32_t>(byte);
  std::cout << std::endl;
}

The output of the program above, when executed on my little-endian machine, looks like this:

source == target: abcd1234deadbeef == abcd1234deadbeef
serialized bytes: ef be ad de 34 12 cd ab

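As expected, the serialized string starts with the lowest-order byte (0xef) and ends with the highest-order byte (0xab). On a big-endian platform, the second line would list the bytes from highest to lowest order, i.e. ab cd 12 34 de ad be ef.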
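When the file has to be readable on machines of either byte order, the value can be assembled from individual bytes with shifts instead of copying memory. This is a rough sketch under the assumption that the file stores integers least significant byte first; decode_le and decode_le_signed are hypothetical names, not part of the question or of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: assembles n bytes (1 to 8), stored least significant
// byte first, into an unsigned value. The result does not depend on the
// byte order of the machine running the code.
std::uint64_t decode_le(const std::uint8_t* bytes, std::size_t n) {
  assert(n >= 1 && n <= 8);
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < n; ++i) {
    // Byte i contributes bits 8*i .. 8*i+7.
    value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
  }
  return value;
}

// Hypothetical helper: same as above, but interprets the n bytes as a
// two's complement signed integer by sign-extending to 64 bits.
std::int64_t decode_le_signed(const std::uint8_t* bytes, std::size_t n) {
  std::uint64_t value = decode_le(bytes, n);
  const bool negative = (value >> (8 * n - 1)) & 1u;  // top bit of the n-byte value
  if (negative && n < 8) {
    value |= ~std::uint64_t{0} << (8 * n);  // fill the upper bits with ones
  }
  // Modular conversion; guaranteed since C++20, and what common compilers
  // did before that as well.
  return static_cast<std::int64_t>(value);
}
```

Fed with the eight serialized bytes shown above, decode_le yields 0xabcd1234deadbeef again, and a single byte 0x40 decodes to 64. For a format that stores the most significant byte first, the loop would shift the accumulated value left by 8 and add each new byte instead.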
