Converting CString to UTF-8 with CT2CA is corrupting some characters

Asked by: Andrew Truckle Asked: 6/7/2023 Last edited by: Andrew Truckle Updated: 6/7/2023 Views: 100

Q:

I am trying to write this value to a UTF-8 encoded XML text file:

Estudante - Leitura da Bíblia

I am using a lambda function:

auto AddLabel = [](tinyxml2::XMLDocument& rDoc, tinyxml2::XMLElement* pLabels, LPCSTR szElement, CString strValue, int iIndex = -1)
{
    tinyxml2::XMLElement* pLabel = rDoc.NewElement(szElement);
    if (pLabels != nullptr && pLabel != nullptr)
    {
        pLabel->SetText(CT2CA(strValue));
        if (iIndex != -1)
        {
            pLabel->SetAttribute("Index", iIndex);
        }
        pLabels->InsertEndChild(pLabel);
    }
};

But when I open the XML in Notepad++, I end up with a corrupted character in place of the í.

This is how I save the XML file:

bool CMeetingScheduleAssistantApp::SaveToXML(CString strFileXML, tinyxml2::XMLDocument& rDocXML)
{
    FILE    *fStream = nullptr;
    CString strError, strErrorCode;
    bool    bDisplayError = false;
    int     iErrorNo = -1;

    using namespace tinyxml2;

    // Does the file already exist?
    if (PathFileExists(strFileXML))
    {
        // It does, so try to delete it
        if (!::DeleteFile(strFileXML))
        {
            // Unable to delete!
            AfxMessageBox(theApp.GetLastErrorAsString(), MB_OK | MB_ICONINFORMATION);
            return false;
        }
    }

    // Now try to create a FILE buffer (allows UNICODE filenames)
    const auto eResult = _tfopen_s(&fStream, strFileXML, _T("w"));
    if (eResult != 0 || fStream == nullptr) // Error
    {
        bDisplayError = true;
        _tcserror_s(strErrorCode.GetBufferSetLength(_MAX_PATH), _MAX_PATH, errno);
        strErrorCode.ReleaseBuffer();
    }
    else // Success
    {
        // Now try to save the XML file
        const XMLError eXML = rDocXML.SaveFile(fStream);
        const int fileCloseResult = fclose(fStream);
        if (eXML != XMLError::XML_SUCCESS)
        {
            // Error saving
            bDisplayError = true;
            strErrorCode = rDocXML.ErrorName();
            iErrorNo = rDocXML.ErrorLineNum();
        }

        if (!bDisplayError)
        {
            if (fileCloseResult != 0)
            {
                // There was a problem closing the stream. We should tell the user
                bDisplayError = true;
                _tcserror_s(strErrorCode.GetBufferSetLength(_MAX_PATH), _MAX_PATH, errno);
                strErrorCode.ReleaseBuffer();
            }
        }
    }

    if (bDisplayError)
    {
        if (iErrorNo == -1)
            iErrorNo = errno;

        strError.Format(IDS_TPL_ERROR_SAVE_XML, strFileXML, strErrorCode, iErrorNo);
        AfxMessageBox(strError, MB_OK | MB_ICONINFORMATION);

        return false;
    }

    return true;
}

Why is this happening? I can confirm that strValue has the correct content as a CString. I can confirm this:

if (strValue.Left(7) == L"Leitura")
{
    AfxMessageBox(strValue);
    pLabel->SetText(CT2CA(strValue));
    AfxMessageBox(CA2CT(pLabel->GetText(), CP_UTF8));
}

When I make the CA2CT call, the result has the corrupted í character. So how can I resolve this? tinyxml2 requires a UTF-8 string:

void    XMLElement::SetText( const char* inText )
{
    if ( FirstChild() && FirstChild()->ToText() )
        FirstChild()->SetValue( inText );
    else {
        XMLText*    theText = GetDocument()->NewText( inText );
        InsertFirstChild( theText );
    }
}

Depending on the language, I have the same problem with various characters in the strings. Another affected language is Polish.

How can I write the string as UTF-8 without corrupting it?

visual-c++ utf-8 visual-studio-2022 tinyxml2 atlcom

A:

1 upvote Andrew Truckle 6/7/2023 #1

The mistake was simple:

auto AddLabel = [](tinyxml2::XMLDocument& rDoc, tinyxml2::XMLElement* pLabels, LPCSTR szElement, CString strValue, int iIndex = -1)
{
    tinyxml2::XMLElement* pLabel = rDoc.NewElement(szElement);
    if (pLabels != nullptr && pLabel != nullptr)
    {
        // Pass CP_UTF8 so the text is converted to UTF-8 rather than the ANSI code page
        pLabel->SetText(CT2CA(strValue, CP_UTF8));
        if (iIndex != -1)
        {
            pLabel->SetAttribute("Index", iIndex);
        }
        pLabels->InsertEndChild(pLabel);
    }
};

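I was missing the CP_UTF8 parameter for the CT2CA macro. Without it, CT2CA converts to the current ANSI code page rather than UTF-8.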
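To see why the missing code page matters, here is a small portable sketch of what the two conversions do at the byte level. It is not ATL code and not how CT2CA is implemented: EncodeUtf8, EncodeSingleByte and ToHex are hypothetical helpers written for illustration, and EncodeSingleByte assumes a Western European ANSI code page such as Windows-1252, where í is the single byte 0xED (Latin-1 is used as a stand-in).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode UTF-16 text as UTF-8 (roughly what a
// CP_UTF8 conversion produces for well-formed input).
std::string EncodeUtf8(const std::u16string& text)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        std::uint32_t cp = text[i];

        // Combine a high/low surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < text.size())
        {
            const std::uint32_t low = text[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        assert(cp <= 0x10FFFF);

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);                          // 1 byte (ASCII)
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));            // 2 bytes
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));           // 3 bytes
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));           // 4 bytes
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: Latin-1 stand-in for a single-byte ANSI conversion.
// Anything the code page cannot represent becomes '?'.
std::string EncodeSingleByte(const std::u16string& text)
{
    std::string out;
    for (char16_t ch : text)
        out += (ch < 0x100) ? static_cast<char>(ch) : '?';
    return out;
}

// Hypothetical helper: render bytes as space-separated uppercase hex.
std::string ToHex(const std::string& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes)
    {
        if (!out.empty())
            out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

With these helpers, ToHex(EncodeUtf8(u"B\u00EDblia")) gives 42 C3 AD 62 6C 69 61, while ToHex(EncodeSingleByte(u"B\u00EDblia")) gives 42 ED 62 6C 69 61. A lone ED byte followed by ASCII is not valid UTF-8, which would explain the corrupted í in a file declared as UTF-8. Characters outside the code page, such as the Polish ł (U+0142), cannot be represented by a single-byte conversion at all, whereas UTF-8 encodes ł as C5 82.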