Convert non UTF-8 ASCII literals in otherwise UTF-8 text to their respective character

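Question:

I have UTF-8 encoded text that has been corrupted and contains some "cp1252" ASCII literals. I am trying to isolate the literals and convert them one by one, but the following code does not work and I don't understand why...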

import re

text = "This text contains some ASCII literal codes like \x9a and \x9e."

# Find all ASCII literal codes in the text
codes = re.findall(r'\\x[0-9a-fA-F]{2}', text)

# Replace each ASCII literal code with its decoded character
for code in codes:
    char = bytes(code, 'ascii').decode('cp1252')
    text = text.replace(code, char)

print(text)
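
For reference, two things in the snippet above keep it from changing the text. First, in a normal (non-raw) Python string literal, \x9a is already a single code point (U+009A), so a regex that looks for a literal backslash followed by "x" finds nothing. Second, even when the text really contains the four characters backslash, "x", "9", "a", bytes(code, 'ascii') just encodes those four characters, and decoding them as cp1252 returns the same four characters. The following is a minimal sketch of one possible fix, not a definitive answer: it assumes the stray values are single cp1252 bytes, and the helper names (_decode_byte, fix_escaped_literals, fix_c1_controls) are hypothetical names introduced only for illustration.

```python
import re


def _decode_byte(value: int, fallback: str) -> str:
    """Decode one byte value as cp1252; return `fallback` if cp1252 has no mapping."""
    try:
        return bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        # A few bytes (for example 0x81 and 0x8d) are unassigned in cp1252.
        return fallback


def fix_escaped_literals(text: str) -> str:
    """Case 1: the text contains a literal backslash, 'x', and two hex digits."""
    return re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda m: _decode_byte(int(m.group(1), 16), m.group(0)),
        text,
    )


def fix_c1_controls(text: str) -> str:
    """Case 2: the text contains real code points in the range U+0080..U+009F."""
    return re.sub(
        r"[\x80-\x9f]",
        lambda m: _decode_byte(ord(m.group(0)), m.group(0)),
        text,
    )


# Raw string: the escapes are stored as four separate characters each.
print(fix_escaped_literals(r"codes like \x9a and \x9e."))  # codes like š and ž.

# Normal string: Python has already turned each escape into one code point.
print(fix_c1_controls("codes like \x9a and \x9e."))  # codes like š and ž.
```

Which of the two functions applies depends on how the text was produced, which the question does not state. Note that fix_escaped_literals rewrites every two-digit hex escape it finds, including ones below 0x80, so it may need a narrower pattern if the text legitimately contains such sequences.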

python-3.x utf-8 cp1252
