Asked by: Hellena Crainicu  Asked: 3/6/2023  Updated: 3/8/2023  Views: 141
Python: Convert multiple .docx files from ANSI to UTF-8 in a particular folder
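Q:
I am not a very good programmer, but I want to write Python code that converts multiple .docx files in a particular folder from ANSI to UTF-8.
I would start from this, but I don't know how to select the files from a folder. Maybe someone can help me a little.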
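from unidecode import unidecode
# docx_paragraph is not defined in this snippet; it is assumed to be a python-docx Paragraph
python2_text = docx_paragraph.text
# Decode only when the text is a Python 2 byte string; otherwise it is already Unicode
unicode_text = python2_text.decode("utf-8", "replace") if isinstance(python2_text, str) else python2_text
unidecode(unicode_text)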
A:
1 upvote
Andreas
3/6/2023
#1
import os
import zipfile

import chardet

# Set the folder path where the .docx files are located
folder_path = os.getcwd()

# Loop through all files in the folder
for filename in os.listdir(folder_path):
    # Skip other file types and files produced by an earlier run
    if not filename.endswith(".docx") or filename.endswith("_utf8.docx"):
        continue
    file_path = os.path.join(folder_path, filename)
    try:
        # A .docx file is a ZIP archive; read the contents of the document.xml part
        with zipfile.ZipFile(file_path) as docx_file:
            xml_content = docx_file.read("word/document.xml")
    except Exception as e:
        print(f"Error opening {file_path}: {e}")
        continue

    # Detect the current encoding of the document XML (chardet may return None)
    detected_encoding = chardet.detect(xml_content)["encoding"]
    print(f"{file_path} is encoded in {detected_encoding}")
    if detected_encoding is None:
        print(f"Could not detect the encoding of {file_path}, skipping")
        continue

    # ASCII is a subset of UTF-8, so only other encodings need converting
    if detected_encoding.lower() not in ("utf-8", "ascii"):
        new_filename = os.path.splitext(filename)[0] + "_utf8.docx"
        new_file_path = os.path.join(folder_path, new_filename)
        with zipfile.ZipFile(file_path) as src, zipfile.ZipFile(new_file_path, "w") as dst:
            # Copy every part of the archive so the result is still a complete .docx
            for item in src.infolist():
                data = src.read(item.filename)
                if item.filename == "word/document.xml":
                    # Re-encode only the main document part as UTF-8
                    data = xml_content.decode(detected_encoding).encode("utf-8")
                dst.writestr(item, data)
        print(f"Converted {file_path} from {detected_encoding} to UTF-8 and saved as {new_file_path}")
    else:
        print(f"{file_path} is already in UTF-8 format")
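Since a .docx file is a ZIP archive of XML parts, the encoding can also be read from the XML declaration of word/document.xml instead of being guessed statistically. The following is only a minimal sketch under that assumption: the helper names declared_xml_encoding and report_folder are hypothetical, only the standard library is used, and the check works only for ASCII-compatible encodings (a UTF-16 part would not match and would be reported as None).

```python
import re
import zipfile
from pathlib import Path

# Matches the encoding attribute of an XML declaration such as
# <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
_DECL_RE = re.compile(rb"^<\?xml[^>]*?encoding=[\"']([A-Za-z0-9._-]+)[\"']")


def declared_xml_encoding(xml_bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    # Skip a UTF-8 byte order mark if one is present
    if xml_bytes.startswith(b"\xef\xbb\xbf"):
        xml_bytes = xml_bytes[3:]
    match = _DECL_RE.match(xml_bytes)
    return match.group(1).decode("ascii") if match else None


def report_folder(folder):
    """Map each .docx file name in folder to its declared document.xml encoding."""
    result = {}
    for path in sorted(Path(folder).glob("*.docx")):
        try:
            with zipfile.ZipFile(path) as archive:
                xml_bytes = archive.read("word/document.xml")
        except (zipfile.BadZipFile, KeyError) as exc:
            # Not a ZIP archive, or the archive has no word/document.xml part
            print(f"Skipping {path.name}: {exc}")
            continue
        result[path.name] = declared_xml_encoding(xml_bytes)
    return result
```

Calling report_folder(".") on the folder with the documents returns a dictionary of file names and declared encodings, which can be compared with what chardet reports.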
Comments
0 upvotes
Hellena Crainicu
3/6/2023
AttributeError: 'Settings' object has no attribute 'original_encoding'
See the screenshot: snipboard.io/iN3fx6.jpg
1 upvote
Andreas
3/7/2023
Which Python are you using? The question is about 3.x, but maybe you meant Python 2?
0 upvotes
Just Me
3/7/2023
I get the same error. I use Python version 3.1.0 and PyScripter version 4.2.1.0.
1 upvote
Andreas
3/7/2023
You are right. It is an old version, and it is working. So can you give me some files to test with? I only have UTF-8 ones right now.
1 upvote
Andreas
3/7/2023
These files show up as UTF-8.
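When only UTF-8 files are at hand, a stand-in file in another encoding can be generated to exercise the non-UTF-8 branch of a converter. This is only a rough sketch: the helper name make_test_archive and the sample text are hypothetical, and the result is a ZIP archive containing a single word/document.xml part encoded as UTF-16, not a complete document that Word would open.

```python
import zipfile


def make_test_archive(path, text="Ünïcödé sample", encoding="utf-16"):
    """Write a ZIP file whose word/document.xml part uses the given encoding."""
    xml = (
        f'<?xml version="1.0" encoding="{encoding}"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f"<w:body><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:body></w:document>"
    )
    with zipfile.ZipFile(path, "w") as archive:
        # Encode the XML text in the requested (non-UTF-8) encoding
        archive.writestr("word/document.xml", xml.encode(encoding))
    return path
```

For example, make_test_archive("sample_utf16.docx") writes such a file into the current folder, where a folder-scanning script can pick it up.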