Asked by: Aaron | Asked: 11/2/2023 | Modified: 11/2/2023 | Viewed: 73
Error reading CSV files larger than 5 GB in 'R' and DuckDB
Q:
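I am loading datasets of more than 5 GB each into DuckDB and need a little help. I run R from the VS Code editor. After a few minutes, R stops and shows a message asking me to reopen the window. I am left with an empty example.wal file, and the DuckDB database file is only 12 kB. The dataset has 3 columns with a header.
Thanks for your help.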
# Add libraries
library(duckdb)
library(dplyr)
library(DBI)
# persist the database to disk as "Example"; without a path it defaults to in-memory
con <- DBI::dbConnect(duckdb::duckdb(), "Example")
duckdb::duckdb_read_csv(
conn = con, name = "Example_csv", files = "data/more/Example-2022.csv",
header = TRUE, delim = ",", na.strings = "NA"
)
DBI::dbListTables(con)
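The read.table warnings below show that duckdb_read_csv() first reads part of the file with R's own CSV reader to guess column types. An alternative sketch lets DuckDB's native read_csv_auto() parse the file directly through SQL, and shuts the database down explicitly, as warning 7 recommends. The helper name load_csv_native and the CREATE OR REPLACE choice are illustrative assumptions, not part of the question, and this approach still expects a UTF-8 (or plain ASCII) file.

```r
library(DBI)
library(duckdb)

# Hypothetical helper: load a CSV with DuckDB's own reader and return the row count.
load_csv_native <- function(csv_path, db_path, table = "Example_csv") {
  con <- dbConnect(duckdb::duckdb(), dbdir = db_path)
  # Shut the database down on exit instead of leaving it to the garbage collector
  on.exit(dbDisconnect(con, shutdown = TRUE), add = TRUE)

  # Quote the table name and file path safely for SQL
  tbl <- as.character(dbQuoteIdentifier(con, table))
  src <- as.character(dbQuoteString(con, csv_path))

  # DuckDB streams the file itself; R never holds the data in memory
  dbExecute(con, paste0(
    "CREATE OR REPLACE TABLE ", tbl,
    " AS SELECT * FROM read_csv_auto(", src, ", header = true)"
  ))
  dbGetQuery(con, paste0("SELECT count(*) AS n FROM ", tbl))$n
}

# Usage with the paths from the question:
# load_csv_native("data/more/Example-2022.csv", "Example")
```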
When I use a smaller dataset, I get the following error message:
Error: rapi_execute: Failed to run query
Error: Invalid Input Error: Error in file "example.csv", on line 3: expected 1 values per row, but got more. ( file=example.csv
delimiter=','
quote='"'
escape='"' (default)
header=1
sample_size=20480
ignore_errors=0
all_varchar=0)
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 2 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 3 appears to contain embedded nulls
4: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 4 appears to contain embedded nulls
5: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 5 appears to contain embedded nulls
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input
7: Database is garbage-collected, use dbDisconnect(con, shutdown=TRUE) or duckdb::duckdb_shutdown(drv) to avoid this.
> Error: Invalid Input Error: Error in file "example.csv", on line 3: expected 1 values per row, but got more.
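The "embedded nul(s)" warnings suggest, though they do not prove, that the file is not plain UTF-8. A UTF-16 file has a NUL byte after every ASCII character, and R's read.table truncates each string at the first NUL, so the sampled header can collapse to a single column. One plausible reading is that this is why DuckDB then reports "expected 1 values per row, but got more". A minimal base-R check is to inspect the first bytes of the file; the function name sniff_encoding is hypothetical, and it only looks for byte order marks and NUL bytes.

```r
# Hypothetical helper: report signs of UTF-16 encoding in the first bytes of a file.
sniff_encoding <- function(path, n = 64L) {
  bytes <- readBin(path, what = "raw", n = n)
  # TRUE if the file starts with the given byte signature
  starts_with <- function(sig) {
    length(bytes) >= length(sig) && all(bytes[seq_along(sig)] == sig)
  }
  list(
    utf16le_bom = starts_with(as.raw(c(0xFF, 0xFE))),
    utf16be_bom = starts_with(as.raw(c(0xFE, 0xFF))),
    utf8_bom    = starts_with(as.raw(c(0xEF, 0xBB, 0xBF))),
    nul_bytes   = sum(bytes == as.raw(0x00)),  # many NULs hint at UTF-16
    head        = bytes
  )
}

# Usage with the path from the question:
# str(sniff_encoding("data/more/Example-2022.csv"))
```

A plain UTF-8 CSV should show no NUL bytes; a UTF-16LE export typically starts with FF FE and has roughly one NUL per character.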
Some rows from the dataset:
DateTime,Beta,Alpha
01/02/2022 22:03:13.151,0.83987,0.84129
01/02/2022 22:05:03.942,0.83959,0.84143
01/02/2022 22:05:09.121,0.83982,0.84124
01/02/2022 22:05:09.286,0.83978,0.8412
Answers: none yet
Comments
dput(readr::read_lines_raw("example.csv", n_max = 5))
duckdb::duckdb_read_csv(conn = duck, name = "Example_csv", files = "newfile.csv", header = TRUE, delim = ",", na.strings = "NA")
read.csv("newfile.csv", nrows=10)
Also mentioned in the comments: arrow, read.csv, nrows=10, duckdb_read_csv
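One comment loads a file named "newfile.csv", which suggests converting the original file before importing it. If the byte check shows UTF-16, one way to do that in base R without holding 5 GB in memory is to re-encode the file in chunks of lines. This is a sketch assuming UTF-16LE input (adjust the encoding argument if the check says otherwise); the function name convert_utf16_to_utf8 and the chunk size are illustrative. R's connection documentation states that a leading byte order mark is dropped when reading with the "UTF-16LE" encoding.

```r
# Hypothetical helper: stream-convert a UTF-16 text file to UTF-8, chunk by chunk.
convert_utf16_to_utf8 <- function(in_path, out_path,
                                  encoding = "UTF-16LE", chunk_lines = 100000L) {
  src <- file(in_path, open = "r", encoding = encoding)  # re-encodes while reading
  on.exit(close(src), add = TRUE)
  dst <- file(out_path, open = "w", encoding = "UTF-8")  # re-encodes while writing
  on.exit(close(dst), add = TRUE)

  total <- 0  # numeric, so very large line counts cannot overflow an integer
  repeat {
    lines <- readLines(src, n = chunk_lines, warn = FALSE)
    if (length(lines) == 0L) break
    writeLines(lines, dst)
    total <- total + length(lines)
  }
  total  # number of lines written
}

# Usage, writing the file name used in the comments:
# convert_utf16_to_utf8("data/more/Example-2022.csv", "newfile.csv")
```

The converted file can then be passed to duckdb_read_csv() or to DuckDB's own CSV reader.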