Why is DOMDocument converting both HTML quote entities to actual quotes?

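Q:

I have been at this for half a day, so it is time to ask for help.

What I want is for DOMDocument to leave existing entities and UTF-8 characters alone. I am starting to think this is not possible with DOMDocument alone.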

$html =
'<!doctype html>
<html lang="en">
    <head>
        <meta charset="utf-8">
    </head>
    <body>
        <p>&#39; &quot; & &lt; © 庭</p>
    </body>
</html>';

Then I run:

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_NOERROR);

echo $dom->saveHTML();

and get this entity output:

input: &#39; &quot; & &lt; © 庭
output: ' " &amp; &lt; &copy; &#24237;

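Why is DOMDocument converting &#39; and &quot; to actual quotes? The only entity it left untouched is &lt;.

I am fairly sure the copyright symbol is being converted because DOMDocument does not treat the input HTML as UTF-8, but I am completely baffled as to why it turns the quotes back into non-entities.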
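One way to probe the quote behaviour is to inspect the parsed text node instead of the serialized output. The sketch below assumes that character references are decoded while the tree is built, so the node stores plain characters and keeps no record of how each one was spelled in the source. The shortened fragment and the variable names are illustrative only.

```php
<?php
// Illustrative fragment: only the quote and less-than entities from the question.
$html = '<!doctype html><html><head><meta charset="utf-8"></head>'
      . '<body><p>&#39; &quot; &lt;</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_NOERROR);

// The text node holds decoded characters, not the original entity spelling.
$p = $dom->getElementsByTagName('p')->item(0);
$text = $p->textContent;

var_dump($text);                    // the decoded characters
var_dump(strpos($text, '&quot;'));  // false: the entity spelling is gone
```

If that assumption holds, the serializer only has to escape characters that would otherwise break the markup in text content, which would be consistent with &lt; and &amp; appearing in the output while the quotes come back as literal characters.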

I thought the mb_convert_encoding trick would fix the UTF-8 problem, but it did not.
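For context, the mb_convert_encoding trick usually means turning every non-ASCII character into an entity before parsing. The sketch below assumes that is the variant meant here. It needs the mbstring extension, and the 'HTML-ENTITIES' target is deprecated as of PHP 8.2, so it may print a deprecation notice on newer versions.

```php
<?php
// Illustrative input containing two non-ASCII characters.
$html = '<p>© 庭</p>';

// Replace non-ASCII characters with entities so the parser sees pure ASCII.
$ascii = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');

$dom = new DOMDocument();
$dom->loadHTML($ascii, LIBXML_NOERROR);

echo $dom->saveHTML();
```

This only sidesteps encoding detection on input. It does not control whether the serializer writes entities or literal characters on output, which would be consistent with the trick not helping here.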

Prepending an XML encoding declaration did not do the trick either: $dom->loadHTML('<?xml encoding="utf-8" ?>'.$html);

php domdocument
