Java NIO ByteBuffer truncates Unicode characters when the buffer reaches its bound

Question:

I am writing a Java function that reads a file and converts its contents to a String:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public static String ReadFromFile(String fileLocation) {
    StringBuilder result = new StringBuilder();
    RandomAccessFile randomAccessFile = null;
    FileChannel fileChannel = null;
    try {
        randomAccessFile = new RandomAccessFile(fileLocation, "r");
        fileChannel = randomAccessFile.getChannel();
        ByteBuffer byteBuffer = ByteBuffer.allocate(10); // read at most 10 bytes per iteration
        CharBuffer charBuffer = null;
        int bytesRead = fileChannel.read(byteBuffer);
        while (bytesRead != -1) {
            byteBuffer.flip();
            // decode only the bytes read in this iteration
            charBuffer = StandardCharsets.UTF_8.decode(byteBuffer);
            result.append(charBuffer.toString());
            byteBuffer.clear();
            bytesRead = fileChannel.read(byteBuffer);
        }
    } catch (IOException ignored) {
    } finally {
        try {
            if (fileChannel != null)
                fileChannel.close();
            if (randomAccessFile != null)
                randomAccessFile.close();
        } catch (IOException ignored) {
        }
    }
    return result.toString();
}

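As you can see from the code above, I deliberately set "ByteBuffer.allocate" to only 10 bytes to make the problem easier to see. Now I want to read a file named "test.txt" that contains Chinese Unicode characters, like this: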

乐正绫我爱你乐正绫我爱你

Here is my test code:

System.out.println(ReadFromFile("test.txt"));

Expected console output

乐正绫我爱你乐正绫我爱你

Actual console output

乐正绫���爱你��正绫我爱你

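Possible cause
The ByteBuffer holds only 10 bytes, so the file is decoded in 10-byte chunks. Each of these Chinese characters takes 3 bytes in UTF-8, so whenever a chunk boundary falls inside a character, the pieces of that character are decoded separately and come out as replacement characters (�).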

Attempted fix
Increasing the ByteBuffer allocation to 20 bytes, I got the following result:

乐正绫我爱你��正绫我爱你
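To check that both outputs come from decoding each chunk on its own, the question's loop can be replayed over an in-memory byte array. This is a sketch: the names ChunkDecodeDemo and decodeChunkwise are made up for illustration, and it assumes test.txt holds exactly these 12 characters in UTF-8 (36 bytes, 3 per character) with no BOM and no trailing newline.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class ChunkDecodeDemo {

    // Hypothetical helper: replays the question's read loop over a byte array
    // instead of a file. Each chunk of at most chunkSize bytes is decoded independently.
    static String decodeChunkwise(byte[] data, int chunkSize) {
        StringBuilder result = new StringBuilder();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int length = Math.min(chunkSize, data.length - offset);
            ByteBuffer chunk = ByteBuffer.wrap(data, offset, length);
            // Bytes of a character split across two chunks are replaced with U+FFFD here.
            CharBuffer chars = StandardCharsets.UTF_8.decode(chunk);
            result.append(chars);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // The 12 characters from test.txt, written as escapes so the source file encoding does not matter.
        String text = "\u4e50\u6b63\u7eeb\u6211\u7231\u4f60\u4e50\u6b63\u7eeb\u6211\u7231\u4f60";
        byte[] data = text.getBytes(StandardCharsets.UTF_8); // 36 bytes, 3 per character
        System.out.println(decodeChunkwise(data, 10));
        System.out.println(decodeChunkwise(data, 20));
    }
}
```

With 10-byte chunks, the boundaries at bytes 10 and 20 fall inside the 4th and 7th characters; with 20-byte chunks, only the boundary at byte 20 does, inside the 7th character. That matches the two outputs above: a different fixed size only moves the break points.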

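Not a robust solution
Allocate the ByteBuffer with a very large size, such as 102400 bytes. This only helps when the whole file fits into a single buffer, which is impractical for very large text files.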
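The buffer does not have to be large if the bytes of an incomplete character are carried over to the next read. A minimal sketch of that idea, assuming UTF-8 input: decode with a CharsetDecoder instead of Charset.decode, pass endOfInput=false while more data may follow, and call ByteBuffer.compact() so that leftover bytes stay in the buffer. The class name StreamingUtf8Reader and the bufferSize parameter are illustrative; malformed bytes are replaced with U+FFFD as in the original code, and the buffer must hold at least 4 bytes (the longest UTF-8 sequence).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class StreamingUtf8Reader {

    // Reads a UTF-8 file through a fixed-size ByteBuffer of bufferSize bytes.
    // Bytes of a character cut off at the end of one read stay in the buffer
    // (via compact()) and are decoded together with the next read.
    static String readFromFile(Path path, int bufferSize) throws IOException {
        if (bufferSize < 4) {
            // A UTF-8 sequence is at most 4 bytes; a smaller buffer could fill up with a leftover and stall.
            throw new IllegalArgumentException("bufferSize must be at least 4");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)      // same actions Charset.decode uses
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer bytes = ByteBuffer.allocate(bufferSize);
        CharBuffer chars = CharBuffer.allocate(bufferSize);
        StringBuilder result = new StringBuilder();
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            boolean endOfInput = false;
            while (!endOfInput) {
                endOfInput = channel.read(bytes) == -1;
                bytes.flip();
                CoderResult cr;
                do {
                    // With endOfInput == false, an incomplete trailing sequence is left unconsumed in `bytes`.
                    cr = decoder.decode(bytes, chars, endOfInput);
                    drain(chars, result);
                } while (cr.isOverflow()); // chars was full: empty it and keep decoding
                bytes.compact(); // move any leftover bytes to the front before the next read
            }
            CoderResult flushResult;
            do {
                flushResult = decoder.flush(chars);
                drain(chars, result);
            } while (flushResult.isOverflow());
        }
        return result.toString();
    }

    // Moves decoded chars from the CharBuffer into the StringBuilder and empties the buffer.
    private static void drain(CharBuffer chars, StringBuilder result) {
        chars.flip();
        result.append(chars);
        chars.clear();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StreamingUtf8Reader test.txt
        System.out.println(readFromFile(Paths.get(args[0]), 10));
    }
}
```

Because leftover bytes are carried over, the decoded text no longer depends on bufferSize; only the number of read calls does.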

Question:
How can I solve this problem?
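If there is no need to work with ByteBuffer directly, another option is to let a Reader do the decoding. Files.newBufferedReader returns a BufferedReader that decodes the file with the given charset and keeps incomplete byte sequences between its internal reads, so the size of the char buffer used below does not matter. This is a sketch; the class name ReaderBasedRead is illustrative. Unlike Charset.decode, this reader reports malformed input by throwing MalformedInputException instead of substituting U+FFFD.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReaderBasedRead {

    // Reads a UTF-8 file through a Reader, which converts bytes to chars itself and
    // carries incomplete byte sequences over between its internal reads.
    // Malformed input raises MalformedInputException (an IOException) instead of becoming U+FFFD.
    static String readFromFile(Path path) throws IOException {
        StringBuilder result = new StringBuilder();
        char[] buffer = new char[10]; // deliberately small, mirroring the question's 10-byte buffer
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                result.append(buffer, 0, n);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ReaderBasedRead test.txt
        System.out.println(readFromFile(Paths.get(args[0])));
    }
}
```

If the whole file ends up in one String anyway, Files.readString(path) (Java 11+) reads and decodes it as UTF-8 in a single call.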

java unicode io nio filereader
