Why do I get this error when I import data through pandas?

Asked by: SH_IQ  Asked: 10/27/2023  Updated: 10/27/2023  Views: 56

Q:

As a beginner, I am trying to import a data file named Facebook_Ads_2.csv with pandas in a Jupyter notebook; the output should look like the following.

[image: expected output]

But when I import it with the following lines of Python code:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

T = pd.read_csv('Facebook_Ads_2.csv')

I get the following error:

UnicodeDecodeError                        Traceback (most recent call last)
Cell In[4], line 1
----> 1 T = pd.read_csv('Facebook_Ads_2.csv')

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\io\parsers\readers.py:912, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
    899 kwds_defaults = _refine_defaults_read(
    900     dialect,
    901     delimiter,
   (...)
    908     dtype_backend=dtype_backend,
    909 )
    910 kwds.update(kwds_defaults)
--> 912 return _read(filepath_or_buffer, kwds)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\io\parsers\readers.py:577, in _read(filepath_or_buffer, kwds)
    574 _validate_names(kwds.get("names", None))
    576 # Create the parser.
--> 577 parser = TextFileReader(filepath_or_buffer, **kwds)
    579 if chunksize or iterator:
    580     return parser

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\io\parsers\readers.py:1407, in TextFileReader.__init__(self, f, engine, **kwds)
   1404     self.options["has_index_names"] = kwds["has_index_names"]
   1406 self.handles: IOHandles | None = None
-> 1407 self._engine = self._make_engine(f, self.engine)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\io\parsers\readers.py:1679, in TextFileReader._make_engine(self, f, engine)
   1676     raise ValueError(msg)
   1678 try:
-> 1679     return mapping[engine](f, **self.options)
   1680 except Exception:
   1681     if self.handles is not None:

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\io\parsers\c_parser_wrapper.py:93, in CParserWrapper.__init__(self, src, **kwds)
     90 if kwds["dtype_backend"] == "pyarrow":
     91     # Fail here loudly instead of in cython after reading
     92     import_optional_dependency("pyarrow")
---> 93 self._reader = parsers.TextReader(src, **kwds)
     95 self.unnamed_cols = self._reader.unnamed_cols
     97 # error: Cannot determine type of 'names'

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\_libs\parsers.pyx:548, in pandas._libs.parsers.TextReader.__cinit__()

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\_libs\parsers.pyx:637, in pandas._libs.parsers.TextReader._get_header()

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\_libs\parsers.pyx:848, in pandas._libs.parsers.TextReader._tokenize_rows()

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\_libs\parsers.pyx:859, in pandas._libs.parsers.TextReader._check_tokenize_status()

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\_libs\parsers.pyx:2017, in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 4001: invalid continuation byte

Any help, please?

python pandas read-csv

Comments

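1 upvote kabanus 10/27/2023
Does this answer your question? UnicodeDecodeError, invalid continuation byte
1 upvote Tim Roberts 10/27/2023
pandas assumes your file is encoded as UTF-8. Your file is not UTF-8. You can override the encoding in the read_csv call.
1 upvote John Gordon 10/27/2023
How exactly was the file Facebook_Ads_2.csv created?
0 upvotes SH_IQ 10/27/2023
@JohnGordon I am taking a course; this file was provided by the course, and the instructor did not resolve this for us. I am trying to solve it myself.
1 upvote Tim Roberts 10/27/2023
Since this is on Windows, you could try pd.read_csv('Facebook_Ads_2.csv', encoding='cp1252').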
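
To make the encoding suggestion in the comments concrete, here is a minimal sketch that tries a few candidate encodings and reports the first one that decodes the whole file. The helper name first_working_encoding and the candidate list are assumptions for illustration, not something from the thread. Note that a clean decode does not prove the encoding is the right one: latin-1 accepts every byte, so it is only a last resort.

```python
from pathlib import Path


def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper: the candidate list is a guess for a file
    produced on Windows, not a general detection method.
    """
    raw = Path(path).read_bytes()
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes, try the next
        return name
    raise ValueError(f"none of {candidates} could decode {path}")


# Possible usage with the file from the question:
#   enc = first_working_encoding('Facebook_Ads_2.csv')
#   T = pd.read_csv('Facebook_Ads_2.csv', encoding=enc)
```

If the result is cp1252 or latin-1, it is worth checking a few non-ASCII values in the loaded data by eye to confirm they look sensible.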

A:

0 upvotes Niqua 10/27/2023 #1

I think this may be happening because of an unexpected symbol in one of the rows.
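
One way to look at the suspicious symbol is to read the raw bytes near the offset named in the error message (byte 0xc5 at position 4001). The sketch below is only an illustration; bytes_around is a hypothetical helper, and the reported position may be relative to an internal read buffer rather than the start of the file, so treat it as a hint.

```python
def bytes_around(path, position, width=20):
    """Return the raw bytes from `position - width` to `position + width`.

    Hypothetical helper for inspecting a file in binary mode, so no
    decoding (and therefore no UnicodeDecodeError) is involved.
    """
    start = max(position - width, 0)
    end = position + width + 1
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


# Possible usage with the file and offset from the question:
#   print(bytes_around('Facebook_Ads_2.csv', 4001))
```

Printing the result shows non-ASCII bytes as escapes such as \xc5, which makes it easier to guess which encoding the file really uses.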

Since pandas version 1.3.0, there is a handler for these kinds of errors.

You can try passing the encoding_errors parameter to read_csv():


T = pd.read_csv('Facebook_Ads_2.csv', encoding_errors='ignore')

See what it returns, or you can try other ways of handling encoding errors.
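
As a rough illustration of those other options: to my knowledge, encoding_errors takes the names of Python's standard codec error handlers, so their effect can be previewed on a plain bytes object without pandas. The sample bytes below are made up to reproduce the same kind of failure (a 0xc5 byte not followed by a valid continuation byte).

```python
# Hypothetical sample: 0xc5 followed by a space is invalid UTF-8.
raw = b"L\xc5 x"

# Decode the same bytes with three standard error handlers.
decoded = {
    handler: raw.decode("utf-8", errors=handler)
    for handler in ("ignore", "replace", "backslashreplace")
}

for handler, text in decoded.items():
    print(f"{handler:>16}: {text!r}")
```

Keep in mind that 'ignore' silently drops the offending bytes, so the loaded data may be missing characters; 'replace' and 'backslashreplace' at least leave a visible marker where the problem was.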

If your pandas version is below 1.3.0, let me know, and we can try to come up with a different solution to your problem.
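
For pandas older than 1.3.0, one possible workaround (a sketch, not the only option) is to open the file yourself with Python's built-in open(), choose the error handling there, and pass the open handle to read_csv. The function name read_csv_lenient is an assumption for illustration.

```python
import pandas as pd


def read_csv_lenient(path, encoding="utf-8", errors="replace"):
    """Read a CSV, letting Python's open() deal with undecodable bytes.

    Hypothetical helper: undecodable bytes become U+FFFD when
    errors='replace', so the affected cells stay visible in the result.
    """
    # newline="" leaves line-ending handling to the CSV parser.
    with open(path, encoding=encoding, errors=errors, newline="") as f:
        return pd.read_csv(f)


# Possible usage with the file from the question:
#   T = read_csv_lenient('Facebook_Ads_2.csv')
```

This only hides the symptom; if the file is really in another encoding such as cp1252, passing that encoding explicitly gives cleaner data.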