How to handle values greater than 255 as returned by toupper() in recent macOS with UTF-8 locales

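Q:

The problem the code below tries to solve is how to efficiently detect that a UTF-8 based locale might be in use, so that the ctype properties of code points above 127 are not queried, since we are dealing with plain (not wide) characters.

At least in macOS 14, toupper() misbehaves when a UTF-8 based locale is in use. The following program shows 2 problematic code points that, although valid as unsigned char arguments, make toupper() return values that cannot fit in that type: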

#include <langinfo.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#ifdef WORKAROUND
static int is_utf8_locale(void)
{
        const char *charmap = nl_langinfo(CODESET);

        /* this shouldn't happen */
        if (!charmap)
                return 0;

        if (!strncmp(charmap, "UTF-8", 5))
                return 1;

        /*
         * nl_langinfo should never return an empty string, unless the "item" used is invalid, and it
         * should return the C/POSIX CODESET if the locale is missing one, but ...
         */
        if (!*charmap) {
                unsigned char buf[MB_CUR_MAX + 1];
                return (wctomb((char *)&buf, 0xf8ff) == 3) &&
                        (buf[0] == 0xef && buf[1] == 0xa3 && buf[2] == 0xbf);
        }

        return 0;
}
#endif

int main(int argc, char *argv[])
{
        const char *locale = (argc > 1) ? argv[1] : "fr_FR";
        int i, f = 0;

        if (!setlocale(LC_CTYPE, locale))
                return 127;

#ifdef WORKAROUND
        if (is_utf8_locale())
                return 0;
#endif

        for (i = 128; i < 256; i++) {
                int u, l;
                unsigned char c = i;

                l = tolower(c);
                u = toupper(c);

                if (l > 255) {
                        int t = l % 256;
                        f++;
                        printf("tolower(%d) %c -> %d (%#x) %c\n", c, c, l, t, t);
                }
                if (u > 255) {
                        int t = u % 256;
                        f++;
                        printf("toupper(%d) %c -> %d (%#x) %c\n", c, c, u, t, t);
                }
        }
        return f;
}

The output with the default fr_FR locale (after some minor formatting) shows:

toupper(181) µ -> 924 (0x9c) <9c>
toupper(255) ÿ -> 376 (0x78) x

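AFAIK, this change in the behavior of toupper() is fairly recent; at least it does not happen in 10.15. While toupper() is known to "sometimes" try to be more helpful (as a BSD extension), I cannot reproduce the problem on any of the recent BSD systems I have tried, and they all document that behavior as deprecated and recommend the wide character interfaces instead.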
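For reference, the wide character route that those manual pages recommend can look roughly like the following. This is only a sketch under stated assumptions: mb_toupper is a hypothetical helper name (it does not come from the question), the string is decoded according to the current LC_CTYPE locale, and stateful encodings that need a final shift sequence are not handled.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Uppercase the multibyte string src into dst (capacity dstsize bytes,
 * terminator included) using the current LC_CTYPE locale.
 * Returns 0 on success, -1 on an invalid sequence or if dst is too small.
 * mb_toupper is a hypothetical name used only for this illustration.
 */
int mb_toupper(char *dst, size_t dstsize, const char *src)
{
    mbstate_t in, out;
    size_t len = strlen(src), used = 0;

    memset(&in, 0, sizeof in);
    memset(&out, 0, sizeof out);

    while (len > 0) {
        wchar_t wc;
        char buf[MB_LEN_MAX];
        size_t n = mbrtowc(&wc, src, len, &in);

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;          /* invalid or truncated sequence */
        if (n == 0)
            break;              /* defensive: len never covers the NUL */
        src += n;
        len -= n;

        /* map the decoded character, then encode it again */
        n = wcrtomb(buf, (wchar_t)towupper((wint_t)wc), &out);
        if (n == (size_t)-1 || used + n >= dstsize)
            return -1;          /* unencodable, or no room left */
        memcpy(dst + used, buf, n);
        used += n;
    }
    if (used >= dstsize)
        return -1;              /* no room for the terminator */
    dst[used] = '\0';
    return 0;
}
```

Because every character is decoded before it is mapped, a result such as 924 for µ is a legitimate wide character here and is simply encoded again as a multibyte sequence.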

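"-DWORKAROUND" works, but IMHO it is too ugly, it would be problematic in threaded environments, and it is especially tricky because of the way macOS defines its locales.

All locales without an explicit .${CHARMAP} (which show the problematic nl_langinfo() response described in the comment) seem to be affected, as well as the ones that include .UTF-8 and the .utf8 spelling sometimes used on other systems (although in those cases nl_langinfo() always returns the correct value).

The workaround obviously needs additional code for non-POSIX systems.

The affected application does not support exotic multibyte encodings other than UTF-8, but the detection based on wctomb() is fragile and might not work outside Apple systems, so advice or test results would also be appreciated.

C macOS UTF-8 locale

int main() { printf("%d\n", toupper(181)); } looks like a smaller, better MCVE. 181 is compart.com/en/unicode/U+00B5, and its uppercase character is compart.com/en/unicode/U+039C, so does that make toupper return a wide character? I think this is a bug, because toupper should return a character for which isupper returns true, but what does isupper((unsigned char)toupper((unsigned char)181)) return?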

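A:

2 votes KamilCuk 10/23/2023 #1

How to handle values greater than 255 as returned by toupper() in recent macOS with UTF-8 locales

It is not possible to handle values greater than 255 with toupper in any locale or on any system (on sane systems with normal 8-bit chars).

7.4 Character handling <ctype.h>

The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.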

[...]

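7.4.2.2 The toupper function

Synopsis
#include <ctype.h>
int toupper(int c);
Description
The toupper function converts a lowercase letter to a corresponding uppercase letter.

Returns
If the argument is a character for which islower is true and there are one or more corresponding characters, as specified by the current locale, for which isupper is true, the toupper function returns one of the corresponding characters (always the same one for any given locale); otherwise, the argument is returned unchanged.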

This means that toupper(181) either

returns a character for which isupper returns nonzero
or returns 181
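The two permitted outcomes can be turned into a small check over every unsigned char value. The following is a minimal sketch, assuming 8-bit chars; count_toupper_violations is a hypothetical helper name, and the check covers only the result (range and isupper), not whether the argument was lowercase to begin with.

```c
#include <ctype.h>
#include <limits.h>

/*
 * Count the values in 0..UCHAR_MAX for which toupper() breaks the
 * contract quoted above in the current locale: the result must be
 * either the argument itself, or a value that fits in unsigned char
 * and for which isupper() is nonzero.
 * count_toupper_violations is a hypothetical name for this sketch.
 */
int count_toupper_violations(void)
{
    int violations = 0;

    for (int c = 0; c <= UCHAR_MAX; c++) {
        int u = toupper(c);

        if (u == c)
            continue;       /* returned unchanged: always allowed */
        /* isupper() is only called when u is a valid argument */
        if (u < 0 || u > UCHAR_MAX || !isupper(u))
            violations++;
    }
    return violations;
}
```

Call it after setlocale(LC_CTYPE, ...); a conforming implementation should yield 0 in every locale.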

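If toupper returns a value greater than UCHAR_MAX for an argument in the defined range, this is a bug that may indeed cause further problems in many programs, for example: https://github.com/PCRE2Project/pcre2/pull/313

The problem is present on my system (macOS 13.4, Homebrew clang version 16.0.6, Target: x86_64-apple-darwin22.5.0), as the test program below shows: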

#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    const char *locale = (argc > 1) ? argv[1] : "fr_FR";
    int f = 0;

    if (!setlocale(LC_CTYPE, locale))
        return 127;

    printf("Testing locale %s:\n", locale);

    for (int c = 128; c < 256; c++) {
        int l = tolower(c);
        int u = toupper(c);

        if (l > 255) {
            char cbuf[MB_CUR_MAX + 1];  /* + 1 for the terminating '\0' */
            char lbuf[MB_CUR_MAX + 1];
            cbuf[wctomb(cbuf, c)] = '\0';
            lbuf[wctomb(lbuf, l)] = '\0';
            printf("%d: %s  isupper(%d): %d  tolower(%d): %d, %s\n",
                   c, cbuf, c, isupper(c), c, l, lbuf);
            f++;
        }
        if (u > 255) {
            char cbuf[MB_CUR_MAX + 1];  /* + 1 for the terminating '\0' */
            char ubuf[MB_CUR_MAX + 1];
            cbuf[wctomb(cbuf, c)] = '\0';
            ubuf[wctomb(ubuf, u)] = '\0';
            printf("%d: %s  islower(%d): %d  toupper(%d): %d, %s\n",
                   c, cbuf, c, islower(c), c, u, ubuf);
            f++;
        }
    }
    if (f) {
        printf("%d errors!\n", f);
    }
    return f;
}

Output on my system:

Testing locale fr_FR:
181: µ  islower(181): 1  toupper(181): 924, Μ
255: ÿ  islower(255): 1  toupper(255): 376, Ÿ
2 errors!

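Looking at the source code of the Apple LibC, it seems they try to use the same tables for the <ctype.h> macros and for the wide character versions towupper and towlower, and only reject argument values greater than UCHAR_MAX. It so happens that the uppercase versions of µ and ÿ in Unicode are greater than UCHAR_MAX; they should be returned by towupper for this locale, but should be ignored by toupper. This is a bug in the implementation.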

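@n.m.couldbeanAI: the problem is more subtle: 192 is the uppercase letter À in ISO8859-1 if that locale is selected, and tolower(192) should return 224 (à, the corresponding lowercase character), but toupper(181) must not return a value greater than 255. The problem is that UTF-8 is not a single-byte character set (it is not even a character set, but an encoding of Unicode), so byte values in the 128-255 range returned by getchar() do not correspond to characters; the isxxx() functions should therefore return 0 for them, and tolower() and toupper() should return them unchanged.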
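To make the point about bytes versus characters concrete, here is a minimal sketch of a UTF-8 encoder; utf8_encode is a hypothetical helper written for this illustration, not a library function. It shows that µ (U+00B5) is the two-byte sequence C2 B5 in UTF-8, so a lone byte 181 read from a UTF-8 stream is a continuation byte rather than the character µ.

```c
#include <stddef.h>

/*
 * Encode the Unicode code point cp as UTF-8 into out.
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate or lies beyond U+10FFFF.
 * utf8_encode is a hypothetical name used only for this sketch.
 */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* surrogates are not valid code points */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The same arithmetic gives CE 9C for U+039C (924) and EF A3 BF for U+F8FF, the three bytes that the wctomb() probe in the question compares against.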

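1 vote n. m. could be an AI 10/24/2023

@CarloArenas "we shouldn't be calling them with values that are not valid characters in that locale" is not at all what the standard says. The standard says that values which do not correspond to uppercase/lowercase characters shall be returned as is by tolower/toupper. This is by design. It is not easy (more precisely, it is impossible) for a user program to find out which bytes correspond to characters in the current locale. For tolower/toupper, by contrast, it is trivial.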

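1 vote n. m. could be an AI 10/24/2023 #3

The problem only exists on systems where toupper and/or its friends are broken (not conforming to the C standard). So there is no need to detect UTF-8 locales; what needs to be detected is a broken toupper.

If toupper returns a value >255 for any input, then it (and the whole ctype family) is broken (AFAICT not just for that input, but at least for all inputs >127). So if such a value is detected, stop right there and reinitialize the entire upper half of the table.

You can even start by checking toupper(181). If the value is greater than 255, ctype is broken. This approach detects only this particular bug, but you cannot hope to detect every possible bug in libc. Detect and work around the ones you know about.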
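One way to read this advice as code is sketched below. It assumes the application keeps its own 256-entry case mapping table and that char is 8 bits wide; toupper_is_broken and build_upper_table are hypothetical names, and resetting the upper half to the identity mapping is just one possible meaning of "reinitialize".

```c
#include <ctype.h>
#include <limits.h>

/* Quick probe for the specific bug discussed here (hypothetical helper). */
int toupper_is_broken(void)
{
    return toupper(181) > UCHAR_MAX;
}

/*
 * Fill table[c] with the uppercase mapping of every byte value in the
 * current locale. As soon as toupper() returns a value that does not fit
 * in unsigned char, stop and distrust every mapping above 127.
 * build_upper_table is a hypothetical name used only for this sketch.
 */
void build_upper_table(unsigned char table[UCHAR_MAX + 1])
{
    int c, broken = 0;

    /* start from the identity mapping so every entry is defined */
    for (c = 0; c <= UCHAR_MAX; c++)
        table[c] = (unsigned char)c;

    for (c = 0; c <= UCHAR_MAX; c++) {
        int u = toupper(c);

        if (u < 0 || u > UCHAR_MAX) {
            broken = 1;         /* broken ctype: stop right there */
            break;
        }
        table[c] = (unsigned char)u;
    }

    if (broken) {
        /* reinitialize the whole upper half, not just the bad entry */
        for (c = 128; c <= UCHAR_MAX; c++)
            table[c] = (unsigned char)c;
    }
}
```

The table has to be rebuilt after every setlocale() call that changes LC_CTYPE, since the probe result depends on the active locale.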

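0 votes Carlo Arenas 10/24/2023 #4

Using a debugging tool derived from the code proposed by @chqrlie, I verified that the bogus responses also occur in macOS 13 and are probably due to a bug in the implementation of the function. The best option seems to be a narrow workaround that ignores the bogus values and brings the function somewhat closer to the standard, which has already been implemented.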
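A narrow workaround of that kind could look like the sketch below. It is an illustration under assumptions, not the code that was actually merged: safe_toupper and safe_tolower are hypothetical wrapper names, and "ignoring" a bogus value is taken to mean returning the argument unchanged.

```c
#include <ctype.h>
#include <limits.h>

/*
 * Wrapper around toupper() that ignores results which do not fit in
 * unsigned char and returns the argument unchanged instead, as the
 * standard requires when there is no suitable uppercase character.
 * safe_toupper is a hypothetical name used only for this sketch.
 */
int safe_toupper(int c)
{
    int u = toupper(c);

    return (u >= 0 && u <= UCHAR_MAX) ? u : c;
}

/* Same idea for tolower() (hypothetical helper). */
int safe_tolower(int c)
{
    int l = tolower(c);

    return (l >= 0 && l <= UCHAR_MAX) ? l : c;
}
```

Unlike resetting the whole upper half of a table, this touches only the values that are demonstrably wrong, so correct mappings in single-byte locales such as ISO8859-1 are kept.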

The code:

#include <langinfo.h>
#include <ctype.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    const char *locale = (argc > 1) ? argv[1] : "fr_FR";
    int verbose = (argc > 2);
    int f = 0, c;

    if (!setlocale(LC_CTYPE, locale)) {
        printf("usage: %s [<locale>] [-v]\n", argv[0]);
        printf("\tlocale defaults to fr_FR\n");
        printf("\t-v prints all characters with mappings\n");
        return 127;
    }

    printf("Testing locale %s with charmap \"%s\":\n", locale, nl_langinfo(CODESET));

    for (c = 128; c < 256; c++) {
        char cbuf[MB_CUR_MAX + 1];  /* + 1 for the terminating '\0' */
        char buf[MB_CUR_MAX + 1];
        int bug = 0;
        int l = tolower(c);
        int u = toupper(c);

        cbuf[wctomb(cbuf, c)] = '\0';
        if (l > 255) {
            buf[wctomb(buf, l)] = '\0';
            printf("%x: %s  isupper(%d): %d  tolower(%d): %d, %s (tolower bug)\n",
                   c, cbuf, c, isupper(c), c, l, buf);
            bug++;
        }
        if (u > 255) {
            buf[wctomb(buf, u)] = '\0';
            printf("%x: %s  islower(%d): %d  toupper(%d): %d, %s (toupper bug)\n",
                   c, cbuf, c, islower(c), c, u, buf);
            bug++;
        }
        if (verbose && !bug) {
            if (l != c) {
                buf[wctomb(buf, l)] = '\0';
                printf("%x: %s  isupper(%d): %d  tolower(%d): %d, %s\n",
                       c, cbuf, c, isupper(c), c, l, buf);
            }
            if (u != c) {
                buf[wctomb(buf, u)] = '\0';
                printf("%x: %s  islower(%d): %d  toupper(%d): %d, %s\n",
                       c, cbuf, c, islower(c), c, u, buf);
            }
        }
        if (bug) {
            f += bug;
            bug = 0;
        }
    }

    if (f)
        printf("%d errors!\n", f);

    return f;
}

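On macOS 14, the output for the 3 French locales is as follows (the output for the default fr_FR is the same as for fr_FR.UTF-8, except for the "unspecified" charmap described in the original code):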

% ./x fr_FR.UTF-8 -v                                     
Testing locale fr_FR.UTF-8 with charmap "UTF-8":
b5: µ  islower(181): 1  toupper(181): 924, Μ (toupper bug)
c0: À  isupper(192): 1  tolower(192): 224, à
c1: Á  isupper(193): 1  tolower(193): 225, á
c2: Â  isupper(194): 1  tolower(194): 226, â
c3: Ã  isupper(195): 1  tolower(195): 227, ã
c4: Ä  isupper(196): 1  tolower(196): 228, ä
c5: Å  isupper(197): 1  tolower(197): 229, å
c6: Æ  isupper(198): 1  tolower(198): 230, æ
c7: Ç  isupper(199): 1  tolower(199): 231, ç
c8: È  isupper(200): 1  tolower(200): 232, è
c9: É  isupper(201): 1  tolower(201): 233, é
ca: Ê  isupper(202): 1  tolower(202): 234, ê
cb: Ë  isupper(203): 1  tolower(203): 235, ë
cc: Ì  isupper(204): 1  tolower(204): 236, ì
cd: Í  isupper(205): 1  tolower(205): 237, í
ce: Î  isupper(206): 1  tolower(206): 238, î
cf: Ï  isupper(207): 1  tolower(207): 239, ï
d0: Ð  isupper(208): 1  tolower(208): 240, ð
d1: Ñ  isupper(209): 1  tolower(209): 241, ñ
d2: Ò  isupper(210): 1  tolower(210): 242, ò
d3: Ó  isupper(211): 1  tolower(211): 243, ó
d4: Ô  isupper(212): 1  tolower(212): 244, ô
d5: Õ  isupper(213): 1  tolower(213): 245, õ
d6: Ö  isupper(214): 1  tolower(214): 246, ö
d8: Ø  isupper(216): 1  tolower(216): 248, ø
d9: Ù  isupper(217): 1  tolower(217): 249, ù
da: Ú  isupper(218): 1  tolower(218): 250, ú
db: Û  isupper(219): 1  tolower(219): 251, û
dc: Ü  isupper(220): 1  tolower(220): 252, ü
dd: Ý  isupper(221): 1  tolower(221): 253, ý
de: Þ  isupper(222): 1  tolower(222): 254, þ
e0: à  islower(224): 1  toupper(224): 192, À
e1: á  islower(225): 1  toupper(225): 193, Á
e2: â  islower(226): 1  toupper(226): 194, Â
e3: ã  islower(227): 1  toupper(227): 195, Ã
e4: ä  islower(228): 1  toupper(228): 196, Ä
e5: å  islower(229): 1  toupper(229): 197, Å
e6: æ  islower(230): 1  toupper(230): 198, Æ
e7: ç  islower(231): 1  toupper(231): 199, Ç
e8: è  islower(232): 1  toupper(232): 200, È
e9: é  islower(233): 1  toupper(233): 201, É
ea: ê  islower(234): 1  toupper(234): 202, Ê
eb: ë  islower(235): 1  toupper(235): 203, Ë
ec: ì  islower(236): 1  toupper(236): 204, Ì
ed: í  islower(237): 1  toupper(237): 205, Í
ee: î  islower(238): 1  toupper(238): 206, Î
ef: ï  islower(239): 1  toupper(239): 207, Ï
f0: ð  islower(240): 1  toupper(240): 208, Ð
f1: ñ  islower(241): 1  toupper(241): 209, Ñ
f2: ò  islower(242): 1  toupper(242): 210, Ò
f3: ó  islower(243): 1  toupper(243): 211, Ó
f4: ô  islower(244): 1  toupper(244): 212, Ô
f5: õ  islower(245): 1  toupper(245): 213, Õ
f6: ö  islower(246): 1  toupper(246): 214, Ö
f8: ø  islower(248): 1  toupper(248): 216, Ø
f9: ù  islower(249): 1  toupper(249): 217, Ù
fa: ú  islower(250): 1  toupper(250): 218, Ú
fb: û  islower(251): 1  toupper(251): 219, Û
fc: ü  islower(252): 1  toupper(252): 220, Ü
fd: ý  islower(253): 1  toupper(253): 221, Ý
fe: þ  islower(254): 1  toupper(254): 222, Þ
ff: ÿ  islower(255): 1  toupper(255): 376, Ÿ (toupper bug)
2 errors!
% ./x fr_FR.ISO8859-1 -v | iconv -f LATIN1 -t UTF-8      
Testing locale fr_FR.ISO8859-1 with charmap "ISO8859-1":
c0: À  isupper(192): 1  tolower(192): 224, à
c1: Á  isupper(193): 1  tolower(193): 225, á
c2: Â  isupper(194): 1  tolower(194): 226, â
c3: Ã  isupper(195): 1  tolower(195): 227, ã
c4: Ä  isupper(196): 1  tolower(196): 228, ä
c5: Å  isupper(197): 1  tolower(197): 229, å
c6: Æ  isupper(198): 1  tolower(198): 230, æ
c7: Ç  isupper(199): 1  tolower(199): 231, ç
c8: È  isupper(200): 1  tolower(200): 232, è
c9: É  isupper(201): 1  tolower(201): 233, é
ca: Ê  isupper(202): 1  tolower(202): 234, ê
cb: Ë  isupper(203): 1  tolower(203): 235, ë
cc: Ì  isupper(204): 1  tolower(204): 236, ì
cd: Í  isupper(205): 1  tolower(205): 237, í
ce: Î  isupper(206): 1  tolower(206): 238, î
cf: Ï  isupper(207): 1  tolower(207): 239, ï
d0: Ð  isupper(208): 1  tolower(208): 240, ð
d1: Ñ  isupper(209): 1  tolower(209): 241, ñ
d2: Ò  isupper(210): 1  tolower(210): 242, ò
d3: Ó  isupper(211): 1  tolower(211): 243, ó
d4: Ô  isupper(212): 1  tolower(212): 244, ô
d5: Õ  isupper(213): 1  tolower(213): 245, õ
d6: Ö  isupper(214): 1  tolower(214): 246, ö
d8: Ø  isupper(216): 1  tolower(216): 248, ø
d9: Ù  isupper(217): 1  tolower(217): 249, ù
da: Ú  isupper(218): 1  tolower(218): 250, ú
db: Û  isupper(219): 1  tolower(219): 251, û
dc: Ü  isupper(220): 1  tolower(220): 252, ü
dd: Ý  isupper(221): 1  tolower(221): 253, ý
de: Þ  isupper(222): 1  tolower(222): 254, þ
e0: à  islower(224): 1  toupper(224): 192, À
e1: á  islower(225): 1  toupper(225): 193, Á
e2: â  islower(226): 1  toupper(226): 194, Â
e3: ã  islower(227): 1  toupper(227): 195, Ã
e4: ä  islower(228): 1  toupper(228): 196, Ä
e5: å  islower(229): 1  toupper(229): 197, Å
e6: æ  islower(230): 1  toupper(230): 198, Æ
e7: ç  islower(231): 1  toupper(231): 199, Ç
e8: è  islower(232): 1  toupper(232): 200, È
e9: é  islower(233): 1  toupper(233): 201, É
ea: ê  islower(234): 1  toupper(234): 202, Ê
eb: ë  islower(235): 1  toupper(235): 203, Ë
ec: ì  islower(236): 1  toupper(236): 204, Ì
ed: í  islower(237): 1  toupper(237): 205, Í
ee: î  islower(238): 1  toupper(238): 206, Î
ef: ï  islower(239): 1  toupper(239): 207, Ï
f0: ð  islower(240): 1  toupper(240): 208, Ð
f1: ñ  islower(241): 1  toupper(241): 209, Ñ
f2: ò  islower(242): 1  toupper(242): 210, Ò
f3: ó  islower(243): 1  toupper(243): 211, Ó
f4: ô  islower(244): 1  toupper(244): 212, Ô
f5: õ  islower(245): 1  toupper(245): 213, Õ
f6: ö  islower(246): 1  toupper(246): 214, Ö
f8: ø  islower(248): 1  toupper(248): 216, Ø
f9: ù  islower(249): 1  toupper(249): 217, Ù
fa: ú  islower(250): 1  toupper(250): 218, Ú
fb: û  islower(251): 1  toupper(251): 219, Û
fc: ü  islower(252): 1  toupper(252): 220, Ü
fd: ý  islower(253): 1  toupper(253): 221, Ý
fe: þ  islower(254): 1  toupper(254): 222, Þ
% ./x fr_FR.ISO8859-15 -v | iconv -f ISO8859-15 -t UTF-8 
Testing locale fr_FR.ISO8859-15 with charmap "ISO8859-15":
a6: Š  isupper(166): 1  tolower(166): 168, š
a8: š  islower(168): 1  toupper(168): 166, Š
b4: Ž  isupper(180): 1  tolower(180): 184, ž
b8: ž  islower(184): 1  toupper(184): 180, Ž
bc: Œ  isupper(188): 1  tolower(188): 189, œ
bd: œ  islower(189): 1  toupper(189): 188, Œ
be: Ÿ  isupper(190): 1  tolower(190): 255, ÿ
c0: À  isupper(192): 1  tolower(192): 224, à
c1: Á  isupper(193): 1  tolower(193): 225, á
c2: Â  isupper(194): 1  tolower(194): 226, â
c3: Ã  isupper(195): 1  tolower(195): 227, ã
c4: Ä  isupper(196): 1  tolower(196): 228, ä
c5: Å  isupper(197): 1  tolower(197): 229, å
c6: Æ  isupper(198): 1  tolower(198): 230, æ
c7: Ç  isupper(199): 1  tolower(199): 231, ç
c8: È  isupper(200): 1  tolower(200): 232, è
c9: É  isupper(201): 1  tolower(201): 233, é
ca: Ê  isupper(202): 1  tolower(202): 234, ê
cb: Ë  isupper(203): 1  tolower(203): 235, ë
cc: Ì  isupper(204): 1  tolower(204): 236, ì
cd: Í  isupper(205): 1  tolower(205): 237, í
ce: Î  isupper(206): 1  tolower(206): 238, î
cf: Ï  isupper(207): 1  tolower(207): 239, ï
d0: Ð  isupper(208): 1  tolower(208): 240, ð
d1: Ñ  isupper(209): 1  tolower(209): 241, ñ
d2: Ò  isupper(210): 1  tolower(210): 242, ò
d3: Ó  isupper(211): 1  tolower(211): 243, ó
d4: Ô  isupper(212): 1  tolower(212): 244, ô
d5: Õ  isupper(213): 1  tolower(213): 245, õ
d6: Ö  isupper(214): 1  tolower(214): 246, ö
d8: Ø  isupper(216): 1  tolower(216): 248, ø
d9: Ù  isupper(217): 1  tolower(217): 249, ù
da: Ú  isupper(218): 1  tolower(218): 250, ú
db: Û  isupper(219): 1  tolower(219): 251, û
dc: Ü  isupper(220): 1  tolower(220): 252, ü
dd: Ý  isupper(221): 1  tolower(221): 253, ý
de: Þ  isupper(222): 1  tolower(222): 254, þ
e0: à  islower(224): 1  toupper(224): 192, À
e1: á  islower(225): 1  toupper(225): 193, Á
e2: â  islower(226): 1  toupper(226): 194, Â
e3: ã  islower(227): 1  toupper(227): 195, Ã
e4: ä  islower(228): 1  toupper(228): 196, Ä
e5: å  islower(229): 1  toupper(229): 197, Å
e6: æ  islower(230): 1  toupper(230): 198, Æ
e7: ç  islower(231): 1  toupper(231): 199, Ç
e8: è  islower(232): 1  toupper(232): 200, È
e9: é  islower(233): 1  toupper(233): 201, É
ea: ê  islower(234): 1  toupper(234): 202, Ê
eb: ë  islower(235): 1  toupper(235): 203, Ë
ec: ì  islower(236): 1  toupper(236): 204, Ì
ed: í  islower(237): 1  toupper(237): 205, Í
ee: î  islower(238): 1  toupper(238): 206, Î
ef: ï  islower(239): 1  toupper(239): 207, Ï
f0: ð  islower(240): 1  toupper(240): 208, Ð
f1: ñ  islower(241): 1  toupper(241): 209, Ñ
f2: ò  islower(242): 1  toupper(242): 210, Ò
f3: ó  islower(243): 1  toupper(243): 211, Ó
f4: ô  islower(244): 1  toupper(244): 212, Ô
f5: õ  islower(245): 1  toupper(245): 213, Õ
f6: ö  islower(246): 1  toupper(246): 214, Ö
f8: ø  islower(248): 1  toupper(248): 216, Ø
f9: ù  islower(249): 1  toupper(249): 217, Ù
fa: ú  islower(250): 1  toupper(250): 218, Ú
fb: û  islower(251): 1  toupper(251): 219, Û
fc: ü  islower(252): 1  toupper(252): 220, Ü
fd: ý  islower(253): 1  toupper(253): 221, Ý
fe: þ  islower(254): 1  toupper(254): 222, Þ
ff: ÿ  islower(255): 1  toupper(255): 190, Ÿ

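Sigh. What is wctomb doing in a program that does not input, output, or process any wide characters? Why do you associate 254 with þ and 255 with ÿ? These associations only hold in encodings this program does not use, such as UCS-4. What does UCS-4 have to do with what you are doing, which AFAICT is processing UTF-8 strings? Why not check which characters have these codes in EBCDIC, or ISO-8859-5, or any other encoding that comes to mind?

1 vote n. m. could be an AI 10/24/2023

The good news (or the sad part, depending on how you look at it) is that you will get away with this, because (AFAICT) there happens to be no pair of valid UTF-8 strings (other than ASCII strings, of course) that this flawed tolower/toupper implementation maps to each other. So someone doing case-insensitive regex matching is unlikely to be affected, as long as both the regex and the subject string are valid UTF-8. This is pure coincidence, though. And of course this implementation will happily map a valid UTF-8 string to an invalid one.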
