Asked by: forestbat  Asked: 9/17/2023  Last edited by: forestbat  Updated: 9/17/2023  Views: 46
How to solve "MemoryError" when download dataset by kaggle?
Question:
I want to download a dataset from Kaggle, but when I run the code on my local machine it crashes. Here is my code:
api = kaggle.KaggleApi(json_str)
api.authenticate()
api.datasets_download(owner_slug='headwater', dataset_slug='Camels')
Here is the crash report:
test_dload_archive.py:8:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\venv\lib\site-packages\kaggle\api\kaggle_api.py:1494: in datasets_download
(data) = self.datasets_download_with_http_info(owner_slug, dataset_slug, **kwargs) # noqa: E501
..\venv\lib\site-packages\kaggle\api\kaggle_api.py:1563: in datasets_download_with_http_info
return self.api_client.call_api(
..\venv\lib\site-packages\kaggle\api_client.py:329: in call_api
return self.__call_api(resource_path, method,
..\venv\lib\site-packages\kaggle\api_client.py:161: in __call_api
response_data = self.request(
..\venv\lib\site-packages\kaggle\api_client.py:351: in request
return self.rest_client.GET(url,
..\venv\lib\site-packages\kaggle\rest.py:247: in GET
return self.request("GET", url,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <kaggle.rest.RESTClientObject object at 0x000001B1FAE01D80>
method = 'GET'
url = 'https://www.kaggle.com/api/v1/datasets/download/headwater/Camels'
query_params = []
headers = {'Accept': 'file', 'User-Agent': 'Swagger-Codegen/1/python'}
body = None, post_params = {}, _preload_content = True, _request_timeout = None
……
if six.PY3:
> r.data = r.data.decode('utf8')
E MemoryError
..\venv\lib\site-packages\kaggle\rest.py:235: MemoryError
I think this is caused by the memory cost of decompressing a large file, but how can I fix it?
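The traceback shows `_preload_content = True`, so the whole response body is read into memory and then decoded into a second, full-size string in one go. A general way to avoid holding a large download in memory is to copy the binary body to disk in fixed-size chunks without ever decoding it. The sketch below shows that pattern only; it is not the kaggle library's code, `io.BytesIO` stands in for an HTTP response body, and `save_stream`, the chunk size, and the file name are hypothetical.

```python
import io
import os
import shutil
import tempfile

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time (an arbitrary, illustrative value)


def save_stream(stream, path, chunk_size=CHUNK_SIZE):
    """Copy a binary stream to disk in fixed-size chunks (hypothetical helper).

    Only one chunk is held in memory at a time, and the bytes are never
    decoded as text, so a binary archive passes through unchanged.
    """
    with open(path, "wb") as out:
        shutil.copyfileobj(stream, out, chunk_size)
    return os.path.getsize(path)


# Usage: io.BytesIO stands in for an HTTP response body (made-up data).
payload = b"PK\x03\x04" + bytes(range(256)) * 10
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "Camels.zip")  # hypothetical file name
    written = save_stream(io.BytesIO(payload), target, chunk_size=64)
    with open(target, "rb") as f:
        round_trip_ok = f.read() == payload
print(written, round_trip_ok)  # 2564 True
```

The design choice here is simply to treat the body as opaque bytes: no `.decode(...)` call means neither the decode copy nor a `UnicodeDecodeError` can occur.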
Update: on Linux, the crash looks like this:
if six.PY3:
> r.data = r.data.decode('utf8')
E UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 14: invalid continuation byte
Answer:
1 vote
Codist
9/17/2023
#1
Note this line in rest.py:
r.data = r.data.decode('utf8')
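This is very naive, and for this particular dataset it is simply wrong: the download endpoint returns a binary archive (a zip file), not UTF-8 text.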
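A minimal sketch of why that line fails: it builds a small zip archive in memory as a stand-in for the dataset download and then decodes it exactly as `rest.py` does. The member name and contents are made up for illustration, and `make_zip_bytes` and `naive_decode` are hypothetical helpers, not part of the kaggle package.

```python
import io
import zipfile


def make_zip_bytes() -> bytes:
    """Build a small zip archive in memory as a stand-in for a dataset download."""
    buf = io.BytesIO()
    # ZIP_STORED keeps the payload bytes verbatim inside the archive.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("example.bin", bytes(range(256)))  # hypothetical member
    return buf.getvalue()


def naive_decode(data: bytes) -> str:
    """Mimic the line in kaggle/rest.py: r.data = r.data.decode('utf8')."""
    return data.decode("utf8")


archive = make_zip_bytes()
try:
    naive_decode(archive)
    outcome = "decoded"
except UnicodeDecodeError as exc:
    outcome = f"UnicodeDecodeError: {exc.reason}"
print(archive[:4], outcome)
```

The archive starts with the zip signature `PK\x03\x04`, and somewhere after it come bytes that are not valid UTF-8, which is the same kind of error shown in the Linux traceback.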
You can decode this dataset with cp037 instead, but to do that you need to edit rest.py accordingly.
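Why cp037 "works": it is IBM EBCDIC code page 037, a single-byte codec that maps each of the 256 byte values to a distinct character, so decoding never raises, and encoding the result with cp037 again gives back the original bytes. The sketch below checks this on every possible byte value; `decode_with` is a hypothetical helper. It assumes that whatever later writes `r.data` to disk re-encodes it with the same codec, which the traceback does not show; otherwise the saved archive would be corrupted.

```python
PAYLOAD = bytes(range(256))  # every possible byte value, as in a binary archive


def decode_with(data: bytes, encoding: str):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


utf8_text = decode_with(PAYLOAD, "utf8")    # fails: 0x80 is not a valid UTF-8 start byte
cp037_text = decode_with(PAYLOAD, "cp037")  # succeeds: one character per byte
restored = cp037_text.encode("cp037")       # lossless round trip back to bytes
print(utf8_text is None, len(cp037_text), restored == PAYLOAD)  # True 256 True
```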