I'm trying to parse a Hebrew word in PHP. It looks ok as a string, but when I try to split it out into characters it won't display correctly

Question:

Here is my simplified test code:

<!DOCTYPE html>
    <?php
        //uncommenting the next line results in the whole page displaying in "chinese -simplified"
        //header("content-type: text/html; charset=UTF-16");
        header('Content-language: he');
    ?>
<html>
<head>
    <meta http-equiv=Content-Type content="text/html; charset=UTF-16">
    <meta http-equiv="content-language" content="he-il">
</head>
<body>
<?php
        // in Production, we are grabbing the hebrew word from the database
        //$sql = "SELECT masoretic FROM codex WHERE id = 20"; // just grabs a word from the database
                                                            // it is stored using UTF16_general_ci on mySQL
        // in this test we can mock the exact same data that was copy and pasted in
        // the results were the same with the data from the db
            $masoretic = "בָּרָ֣א";

            echo $masoretic . '<br>'; // displays correctly in HEBREW = בָּרָ֣א
            // now loop through the word and process each letter
            $length = strlen($masoretic);
            // even though there are only 3 real letters, the diacritic marks count as characters, so we should get at least 7 loops
            for ($x = 0; $x <= $length; $x++) {
                $letter = substr($masoretic,0,1); // process this letter
                $masoretic = substr($masoretic, 1); // the rest of the word
                $name = '';
                $recognized = false;
                switch($letter){
                    case 'ר':
                        $recognized = true;
                        $name = 'Raysh';
                        break;
                    case 'א':
                        $recognized = true;
                        $name = 'Aleph';
                        break;
                    default:
                        $recognized = false;
                        break;
                }
                if($recognized){
                    echo ('found a ' . $name);
                    echo $letter; // for now just display it
                }else{
                        echo 'unrecognized letter:';
                        print_r($letter);
                        echo '<br>';
                }                       
            }           
    ?>
</body>

The page displays the following:

בָּרָ֣א
unrecognized letter:�
unrecognized letter:�
unrecognized letter:�
unrecognized letter:�
unrecognized letter:�
unrecognized letter:�
unrecognized letter:�
unrecognized letter:�
unrecognized letter:�
unrecognized letter:�
unrecognized letter:�
unrecognized letter:�
unrecognized letter:�
unrecognized letter:�
unrecognized letter:
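The fifteen lines above are consistent with the word being split byte by byte. `strlen()` and `substr()` work on bytes, not characters. Assuming the PHP file is saved as UTF-8 and the literal holds the seven code points listed in the answer's output, each code point takes two bytes, so `strlen()` returns 14 and the `<=` in the loop condition adds a fifteenth pass on an empty string (the last, blank line). Each single byte is an incomplete UTF-8 sequence, which the browser renders as �. A minimal sketch (assumes the mbstring extension is available; the word is written with `\u{...}` escapes so its bytes are unambiguous):

```php
<?php
// The word from the question, spelled out as explicit code points
// (same sequence as the answer's output) so the bytes are unambiguous.
$word = "\u{05D1}\u{05B8}\u{05BC}\u{05E8}\u{05B8}\u{05A3}\u{05D0}";

// strlen() counts bytes: each of these code points is 2 bytes in UTF-8.
echo strlen($word), "\n";                            // 14
// mb_strlen() counts code points when given the encoding.
echo mb_strlen($word, 'UTF-8'), "\n";                // 7

// substr() cuts at a byte offset, so one "letter" is half a character...
echo bin2hex(substr($word, 0, 1)), "\n";             // d7
// ...while mb_substr() returns a whole code point (U+05D1 = d7 91).
echo bin2hex(mb_substr($word, 0, 1, 'UTF-8')), "\n"; // d791
```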

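I find it strange that the complete Hebrew word displays correctly but none of the individual letters do. I assumed something funky was going on with UTF-16, so I added the headers, but in some cases that actually made things worse (see the inline comments).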
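The "Chinese (Simplified)" page from the inline comment is roughly what you would expect if a browser decodes UTF-8 bytes as UTF-16: every two bytes of ASCII markup become one 16-bit code unit, and most of those land in the CJK ideograph ranges. The WHATWG Encoding Standard maps the label `UTF-16` to UTF-16LE, so the sketch below assumes little-endian. It also assumes the `header()` call takes effect at all; since `<!DOCTYPE html>` is sent before it, that only works when output buffering is enabled.

```php
<?php
// A fragment of ASCII markup, as PHP actually sends it (one byte per char).
$markup = '<html>';

// Decode those bytes as UTF-16LE, roughly what a browser does when told
// charset=UTF-16 without a byte-order mark: each pair of bytes becomes one
// 16-bit code unit ("<h" -> U+683C, "tm" -> U+6D74, "l>" -> U+3E6C).
$misread = mb_convert_encoding($markup, 'UTF-8', 'UTF-16LE');

foreach (mb_str_split($misread, 1, 'UTF-8') as $ch) {
    // All three code points fall in CJK ideograph blocks.
    printf("U+%04X %s\n", mb_ord($ch, 'UTF-8'), $ch);
}
```

A common way out is to keep the page itself in UTF-8 (`charset=UTF-8` in both the header and the `<meta>` tag) and convert any UTF-16 data to UTF-8 before echoing it.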

Tags: PHP, string parsing, UTF-16, Hebrew

Answer:

<?php
// Input is in UTF-8 and converted to UTF-16, since everything on SO is UTF-8
$in_8  = 'בָּרָ֣א';
$in_16 = mb_convert_encoding($in_8, 'UTF-16', 'UTF-8');

// Split on characters (code points) rather than bytes; mb_str_split() needs PHP 7.4+
foreach (mb_str_split($in_16, 1, 'UTF-16') as $glyph_16) {
    // Convert back to UTF-8 for example display
    $glyph_8 = mb_convert_encoding($glyph_16, 'UTF-8', 'UTF-16');
    printf("%s %s\n", bin2hex($glyph_16), $glyph_8);
}

You should be able to leave the conversions out of your own code; they are only there for the benefit of people like me who don't use UTF-16.
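With the conversions left out, the question's loop might look something like this minimal sketch. It assumes the strings are already UTF-8, PHP 7.4+ (for `mb_str_split()`) and the mbstring extension; `letterName()` is a hypothetical helper standing in for the question's `switch`. The letters are written as `\u{...}` escapes so the comparison does not depend on how the source file is saved.

```php
<?php
// Hypothetical helper mirroring the question's switch: returns the letter's
// name, or null when the code point is not recognized.
function letterName(string $letter): ?string
{
    switch ($letter) {
        case "\u{05E8}": return 'Raysh'; // ר
        case "\u{05D0}": return 'Aleph'; // א
        default:         return null;
    }
}

$masoretic = "\u{05D1}\u{05B8}\u{05BC}\u{05E8}\u{05B8}\u{05A3}\u{05D0}";

$found = [];
// Split into code points, not bytes, so each $letter is a whole character.
foreach (mb_str_split($masoretic, 1, 'UTF-8') as $letter) {
    $name = letterName($letter);
    if ($name !== null) {
        $found[] = $name;
        echo 'found a ', $name, ' ', $letter, "<br>\n";
    } else {
        // Print the code point instead of the raw (possibly invisible) mark.
        printf("unrecognized letter: U+%04X<br>\n", mb_ord($letter, 'UTF-8'));
    }
}
```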

Output:

05d1 ב
05b8 ָ
05bc ּ
05e8 ר
05b8 ָ
05a3 ֣
05d0 א
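The output has seven entries because the points (qamats U+05B8, dagesh U+05BC) and the accent (munah U+05A3) are separate combining characters. To get the three "real letters" the question's comment mentions, each with its marks attached, one option is PCRE's `\X`, which matches an extended grapheme cluster. This sketch assumes UTF-8 input (hence the `u` modifier) and the mbstring extension.

```php
<?php
$word = "\u{05D1}\u{05B8}\u{05BC}\u{05E8}\u{05B8}\u{05A3}\u{05D0}";

// \X matches one extended grapheme cluster: a base letter followed by any
// combining marks (points, accents) attached to it. The u modifier makes
// PCRE treat the subject as UTF-8.
preg_match_all('/\X/u', $word, $matches);
$clusters = $matches[0];

foreach ($clusters as $cluster) {
    // The first code point of each cluster is the base letter.
    $base  = mb_substr($cluster, 0, 1, 'UTF-8');
    $marks = mb_strlen($cluster, 'UTF-8') - 1;
    printf("U+%04X with %d mark(s)\n", mb_ord($base, 'UTF-8'), $marks);
}
```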
