Is there a way to convert ANSI (Windows-only) encoded files to UTF-8 using Python?

Q:

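I'm asking a new question because every answer I could find seems to use code that only runs on Windows.
Here's the situation...
Every month I receive new work files that need to be converted from ANSI encoding to UTF-8. There are enough of them to justify automating the job, so I turned to a Python script. Until recently I was on Windows and everything worked fine. After switching to a Mac, I realized that ANSI is a Windows-only encoding type, and now my script no longer works.
Question: is there a way to convert an ANSI-encoded CSV to UTF-8 while working on a Mac?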

Here is the code that worked on my Windows machine.

import sys
import os

if len(sys.argv) != 2:
  print(f"Converts the contents of a folder to UTF-8 from ANSI.")
  print(f"USAGE: \n\
    python ANSI_to_UTF8.py <Relative_Folder_Name> \n\
    If targeting a nested folder, make sure to use an escaped \\. ie: parent\\\\child")
  sys.exit()

from_encoding = "ANSI"
to_encoding = "UTF-8"
list_of_files = []
current_dir = os.getcwd()
folder = sys.argv[1]
suffix = "_utf8"
target_folder = folder + "_utf8"


try:
  os.mkdir(target_folder)
except FileExistsError:
  print("Target folder already exists.")
except:
  print("Error making directory!")

for root, dirs, files in os.walk(folder):
    for file in files:
        list_of_files.append(os.path.join(root,file))


for file in list_of_files:
  print(f"Converting {file}")

  original_path = file

  filename = file.split("\\")[-1].split(".")[0]
  extension = file.split("\\")[-1].split(".")[1]
  folder = "\\".join(original_path.split("\\")[0:-1])
  new_filename = filename + "." + extension
  new_path = os.path.join(target_folder, new_filename)

  f= open(original_path, 'r', encoding=from_encoding)
  content= f.read()
  f.close()
  f= open(new_path, 'w', encoding=to_encoding)
  f.write(content)
  f.close()

print(f"Finished converting {len(list_of_files)} files to {target_folder}")

No matter which approach I take, my Mac doesn't seem to recognize the ANSI encoding type. Any help would be greatly appreciated. Thanks.

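Edit 1: Regarding the question "Convert from ANSI to UTF-8"
That question has two answers, and neither worked for me. With the first answer, I get a UTF-8 decoding error.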

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 25101: invalid continuation byte

With the second answer, I believe the root cause is that I'm on a Mac, and this OS doesn't understand the mbcs encoding.

LookupError: unknown encoding: mbcs
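
Both errors can be reproduced in isolation. The sketch below is only an illustration: the sample bytes and the helper names `try_decode` and `codec_available` are made up for this example and are not part of the original script. It shows that byte 0xE9 is not valid on its own in UTF-8 but decodes cleanly as cp1252, and it checks which codec names the running Python knows about.

```python
import codecs

# Byte 0xE9 is "é" in Windows-1252. In UTF-8 it opens a multi-byte sequence,
# so the plain space that follows it is an invalid continuation byte.
raw = b"caf\xe9 au lait"  # hypothetical sample data


def try_decode(data, encoding):
    """Return the decoded text, or the exception class name on failure."""
    try:
        return data.decode(encoding)
    except (UnicodeDecodeError, LookupError) as err:
        return type(err).__name__


def codec_available(name):
    """Report whether this Python installation knows the codec name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


as_utf8 = try_decode(raw, "utf-8")      # fails: the bytes are not valid UTF-8
as_cp1252 = try_decode(raw, "cp1252")   # succeeds on any operating system

print(as_utf8)
print(as_cp1252)
print("cp1252 available:", codec_available("cp1252"))
print("mbcs available:", codec_available("mbcs"))  # expected to be False outside Windows
```

On a Mac the last line should report that mbcs is unavailable, which matches the LookupError above.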

python macos encoding utf-8 ansi

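A:

I found the answer to this question.
After changing the codec from ANSI to cp1252, my Mac could find the codec I was looking for, and that solved the problem. In Python, "ANSI" is only an alias for the Windows-only mbcs codec, which stands for whatever the system's active code page is (typically cp1252 on Western European and US English systems). Naming cp1252 explicitly works on every platform. The other issue I ran into afterwards was that file paths on a Mac are a little different: they use forward slashes instead of backslashes.
With a few further modifications to the script, I came up with a working version.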

import sys
import os

if len(sys.argv) != 2:
  print("Converts the contents of a folder to UTF-8 from ANSI (cp1252).")
  print("USAGE: \n\
    python ANSI_to_UTF8.py <Relative_Folder_Name> \n\
    If targeting a nested folder, separate the names with /. ie: parent/child")
  sys.exit()

from_encoding = "cp1252"
to_encoding = "UTF-8"
list_of_files = []
current_dir = os.getcwd()
folder = sys.argv[1]
suffix = "_utf8"
target_folder = folder + "_utf8"


try:
  os.mkdir(target_folder)
except FileExistsError:
  print("Target folder already exists.")
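except OSError as err:
  # Stop here: without the target folder every later write would fail.
  print(f"Error making directory: {err}")
  sys.exit(1)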

for root, dirs, files in os.walk(folder):
    for file in files:
        list_of_files.append(os.path.join(root,file))


for file in list_of_files:
  print(f"Converting {file}")

  original_path = file

  # os.path uses the separator of the current OS, so no manual "/" or "\\" splitting.
  # Keeping the whole base name also handles names with no dot or several dots.
  new_filename = os.path.basename(original_path)
  new_path = os.path.join(target_folder, new_filename)

  # Read with the source encoding, then write the same text back out as UTF-8.
  with open(original_path, 'r', encoding=from_encoding) as f:
    content = f.read()
  with open(new_path, 'w', encoding=to_encoding) as f:
    f.write(content)

print(f"Finished converting {len(list_of_files)} files to {target_folder}")

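The changes are small, but this version lets the Mac recognize the encoding and handle the file paths correctly.
Thanks again to everyone who helped!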
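
For anyone who wants the same script to run unchanged on Windows and macOS, here is a minimal sketch of the conversion using pathlib, so that no path separator is hard-coded. The function name `convert_tree` and its parameters are hypothetical, and it assumes every file really is cp1252. Unlike the script above, it mirrors subfolders into the target instead of flattening them, so that files with the same name in different subfolders do not overwrite each other.

```python
from pathlib import Path


def convert_tree(source, target, from_encoding="cp1252", to_encoding="utf-8"):
    """Re-encode every file under `source` into `target`, keeping subfolders."""
    source, target = Path(source), Path(target)
    converted = []
    # sorted() collects the file list up front, before anything is written.
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        # Mirror the relative location so same-named files do not collide.
        new_path = target / path.relative_to(source)
        new_path.parent.mkdir(parents=True, exist_ok=True)
        # newline="" disables newline translation, so CRLF line endings
        # in the CSV files are written back exactly as they were read.
        with open(path, "r", encoding=from_encoding, newline="") as src:
            content = src.read()
        with open(new_path, "w", encoding=to_encoding, newline="") as dst:
            dst.write(content)
        converted.append(new_path)
    return converted
```

Called as `convert_tree("reports", "reports_utf8")` (folder names chosen only for illustration), it returns the list of files it wrote. If a file contains a byte that cp1252 does not define, the read raises UnicodeDecodeError rather than silently producing wrong text.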
