Asked by: archana · Asked: 11/17/2023 · Last edited by: Cowarchana · Updated: 11/17/2023 · Views: 61
Why is there an error when writing a new file in Python?
Q:
I am trying to scrape a web page and write the data to a .txt file. It throws an error.
UnicodeEncodeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_10484/2570620311.py in <module>
5
6 with open(f'{title}.txt', 'w')as file:
----> 7 file.write(transcript)
8
9 #print(title)
~\anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):
UnicodeEncodeError: 'charmap' codec can't encode character '\ufb02' in position 32153: character maps to <undefined>
Code:
import requests
from bs4 import BeautifulSoup
website="https://subslikescript.com/movie/Titanic-120338"
result=requests.get(website)
content=result.text
soup=BeautifulSoup(content,'lxml')
print(soup.prettify())
box = soup.find('article',class_='main-article')
title=box.find('h1').get_text()
transcript=box.find('div',class_='full-script').get_text(strip=True,separator=' ')
with open(f'{title}.txt', 'w') as file:
    file.write(transcript)
A:
-1 votes
Sai Prakash Reddy
11/17/2023
#1
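The error you are seeing, UnicodeEncodeError: 'charmap' codec can't encode character, indicates an encoding problem when writing to the text file. It usually happens when you try to write a Unicode character that your system's default encoding (cp1252 here, according to the traceback) does not support.
You can try opening the file with an explicit encoding when writing. UTF-8 is a common encoding for handling Unicode characters. Here is how to modify the code: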
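To make the cause concrete, here is a minimal sketch that tries to encode the character from the traceback ('\ufb02', the "fl" ligature) with both codecs. The helper name `can_encode` is hypothetical and only for illustration; it does not come from the question.

```python
# The character reported in the traceback: U+FB02, LATIN SMALL LIGATURE FL.
ligature = "\ufb02"


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 has no mapping for U+FB02, which is what the traceback reports.
print(can_encode(ligature, "cp1252"))

# UTF-8 can represent this character (as a three-byte sequence).
print(can_encode(ligature, "utf-8"))
```

The same check can be run on any scraped string before writing it, to see whether a given codec would reject it.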
import requests
from bs4 import BeautifulSoup
website = "https://subslikescript.com/movie/Titanic-120338"
result = requests.get(website)
content = result.text
soup = BeautifulSoup(content, 'lxml')
print(soup.prettify())
box = soup.find('article', class_='main-article')
title = box.find('h1').get_text()
transcript = box.find('div', class_='full-script').get_text(strip=True, separator=' ')
with open(f'{title}.txt', 'w', encoding='utf-8') as file:
    file.write(transcript)
By specifying encoding='utf-8' when opening the file, you should be able to handle Unicode characters without running into encoding errors.
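As a rough way to check this without scraping anything, the sketch below writes a short string containing the same ligature to a temporary file with encoding='utf-8' and reads it back. The function name `write_and_read_back`, the file name `transcript_demo.txt`, and the sample text are all assumptions made for illustration. It also prints the locale's preferred encoding, which is typically what open() falls back to when no encoding is given.

```python
import locale
import tempfile
from pathlib import Path


def write_and_read_back(text, directory):
    """Write `text` as UTF-8 into `directory` and return what is read back."""
    path = Path(directory) / "transcript_demo.txt"  # hypothetical file name
    with open(path, "w", encoding="utf-8") as file:
        file.write(text)
    # Read with the same explicit encoding so the bytes decode correctly.
    with open(path, "r", encoding="utf-8") as file:
        return file.read()


# Shows the locale's preferred encoding, e.g. cp1252 on many Windows setups.
print(locale.getpreferredencoding(False))

# Made-up sample text containing U+FB02, the character from the traceback.
sample = "The ship \ufb02oated"
with tempfile.TemporaryDirectory() as tmp:
    round_trip = write_and_read_back(sample, tmp)

# True when the text survived the write and read unchanged.
print(round_trip == sample)
```

Passing the encoding explicitly on both the write and the read keeps the result independent of the machine's locale settings.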
I hope this helps. Happy learning.
Comments
0 votes
archana
11/20/2023
Thank you for your valuable feedback.
Comments
with open(YOUR_FILE, "w", encoding="utf-8") as file: