提问人:Aaron 提问时间:11/4/2023 最后编辑:Aaron 更新时间:11/11/2023 访问量:139
将 csv 文件导入 duckdb 并在屏幕上显示输出
Import csv files into duckdb and show the output on the screen
问:
这个问题已经问过很多次了,但我一直无法找到满意的解决方案。我使用 Linux 和 R。该软件包显示一些警告消息。执行此脚本时,我收到以下警告消息:dylr
source("/home/code-server/Workspace/test/new.R", encoding = "UTF-8")
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
warning messages from top-level task callback 'vsc.workspace'
Warning message:
Database is garbage-collected, use dbDisconnect(con, shutdown=TRUE) or duckdb::duckdb_shutdown(drv) to avoid this.
源代码:
# Add libraries
library(dplyr)
library(DBI)
library(duckdb)
path <- "data/"
setwd(path)
# write to disk as "Example", other defaults to in memory
con <- DBI::dbConnect(duckdb::duckdb(), "test-2022")
duckdb::duckdb_read_csv(
conn = con, name = "Example_csv", files = "1a.csv",
header = TRUE, delim = ","
)
这里有几行代码来显示表格。不幸的是,这些代码行不起作用。
con <- dbConnect(duckdb::duckdb(), dbdir = "test-2022", read_only = FALSE)
dbListTables(con)
dbDisconnect(con, shutdown = TRUE)
dbWriteTable(con, "test", test)
res = dbGetQuery(con, "SELECT * FROM "test-2022")
print(res)
答:
首先,我假设您的“不起作用”是由于以下错误:
res = dbGetQuery(con, "SELECT * FROM "test-2022")
# Error: unexpected symbol in "res = dbGetQuery(con, "SELECT * FROM "test"
(如果没有,您将需要提供您看到的实际错误文本。
双引号字符串中不能有未转义的双引号,这将是一个解析错误。解决方案通常是对一个(外/内)使用单位,对另一个使用双倍,或者反斜杠转义内引号(当内引号与外引号相同时)。
虽然 R 倾向于对字符串互换使用单引号和双引号,但 SQL 则不然,它需要将带引号的标识符(例如,表/列名)括在双引号中,因此您可以反斜杠转义它们,或者只使用单引号作为外引号。
尝试以下其中一种(或两种?
# double inside single
res <- dbGetQuery(con, 'SELECT * FROM "test-2022" ')
# escaped-double inside double
res <- dbGetQuery(con, "SELECT * FROM \"test-2022\" ")
# R's new-ish raw-strings
res <- dbGetQuery(con, r"(SELECT * FROM "test-2022" )")
(和结束单/双引号之间的尾随空格纯粹是为了在此界面中清楚地分隔它们,不是必需的。它分别与 、 和 一样容易地工作。22"
22"'
22\""
22")"
注意:如果你以前没有听说过 R 的“原始字符串”,这里有一个参考:https://r4ds.hadley.nz/strings.html#sec-raw-strings,以及 R 新闻中“R 4.0.0 中的更改>>用户可见的重大更改”下的原始公告:
有一种新的语法用于指定原始字符常量,类似于 C++ 中使用的语法:任何字符序列都不包含该序列。这样可以更轻松地编写包含反斜杠或单引号和双引号的字符串。有关详细信息,请参见。
r"(...)"
...
)"
?Quotes
果然,在?报价
,我们可以看到
Raw character constants are also available using a syntax similar
to the one used in C++: ‘r"(...)"’ with ‘...’ any character
sequence, except that it must not contain the closing sequence
‘)"’. The delimiter pairs ‘[]’ and ‘{}’ can also be used, and ‘R’
can be used in place of ‘r’. For additional flexibility, a number
of dashes can be placed between the opening quote and the opening
delimiter, as long as the same number of dashes appear between the
closing delimiter and the closing quote.
评论
dbGetQuery
dbWriteTable(con, "test", yourdataframe)
关于您的问题:
- 警告
Warning message: Database is garbage-collected, use dbDisconnect(con, shutdown=TRUE) or duckdb::duckdb_shutdown(drv) to avoid this.
发生这种情况是因为您源的文件没有断开与 的连接。如警告所示,您在该文件的末尾添加。source("/home/code-server/Workspace/test/new.R", encoding = "UTF-8")
con
dbDisconnect(con, shutdown = TRUE)
由于您不这样做,因此稍后会对连接进行垃圾回收,并生成此警告。您可以通过在新会话中运行以下代码来重现此内容:
con <- dbConnect(duckdb::duckdb())
# call garbage collector
gc()
您可以在此处找到有关垃圾回收的更多信息 (https://adv-r.hadley.nz/names-values.html#gc)
- 关于不起作用的代码
con <- dbConnect(duckdb::duckdb(), dbdir = "test-2022", read_only = FALSE)
dbListTables(con)
dbDisconnect(con, shutdown = TRUE)
dbWriteTable(con, "test", test)
res = dbGetQuery(con, "SELECT * FROM "test-2022")
print(res)
这里至少有错误:
关闭连接(使用 ),但随后尝试将表写入同一连接(使用 )。这将不起作用,因为连接已断开。
con
dbDisconnect
test
dbWriteTable
如上所述,由于您使用嵌套的双引号,因此未正确构造查询。此外,当您在上面设置数据库文件时,您正在尝试从具有数据库文件名称的表中导入数据,而不是从中导入数据
dbGetQuery
dbConnect
test
至少,我认为您正在这样做,因为您没有分享有关错误消息的那么多细节。
这是我认为代码应该是什么样子的:
con <- dbConnect(duckdb::duckdb(), dbdir = "test-2022", read_only = FALSE)
dbListTables(con)
dbWriteTable(con, "test", test)
res = dbGetQuery(con, "SELECT * FROM test") # or any other table you want results from
print(res)
dbDisconnect(con, shutdown = TRUE)
我希望这对:)有所帮助
评论
library(dplyr, exclude=c('filter', 'lag', 'intersect', 'setdiff', 'setequal', 'union'))
R
library
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘dbWriteTable’ for signature ‘"duckdb_connection", "character", "character"’
test
dbWriteTable(con, "test", test)
con
"duckdb_connection"
"test"
"character"
test
"character"
"data.frame"