How to re-infer datatypes on an existing polars dataframe?

Question:

I have the following problem:

I have a CSV file with wrong values (strings instead of integers) in some rows. To work around this, I read it into polars and filter the dataframe.

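To be able to read it at all, I have to set infer_schema_length = 0, otherwise the read fails. However, this reads every column as a string. How can I re-infer the datatypes/schema of the corrected dataframe? I'd like to avoid setting each column's type individually, because there are many columns.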

Unfortunately, I can't edit the CSV itself.

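import polars as pl

# dataset_path is the path to the CSV file
# infer_schema_length=0 reads every column as a string
ids_df = pl.read_csv(dataset_path, infer_schema_length=0)

# drop the rows where the header text appears in the data
filtered_df = ids_df.filter(~(pl.col("Label") == "Label"))

filtered_df.dtypes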

[Utf8,
 Utf8,
 Utf8,
 Utf8,
 Utf8,
 Utf8,
 Utf8,
 Utf8,
 Utf8,
 Utf8,
 ...

Thanks for your help.

python csv schema python-polars

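Answer:

from io import BytesIO

import polars as pl

dataset_path = "./test_data.csv"

# Read every column as a string so the bad rows don't break parsing
ids_df = pl.read_csv(dataset_path, infer_schema_length=0)
print("ids_df", ids_df)

# Remove the rows that repeat the header
filtered_df = ids_df.filter(~(pl.col("Label") == "Label"))
print("filtered_df", filtered_df)

# Save the cleaned data to memory as an in-memory byte stream
bytes_io = BytesIO()
filtered_df.write_csv(bytes_io)
bytes_io.seek(0)  # rewind to the start before reading the stream back

# Read from the stream with infer_schema_length != 0 so dtypes are inferred again
new_df = pl.read_csv(bytes_io)
print("new_df", new_df)
bytes_io.close()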
