使用 dplyr 访问 sql 表/查询时计算行数-解网

问：

计算使用 dplyr 访问 sql 表的行数的有效方法是什么。MWE在下面使用SQLite，但我使用PostgreSQL并遇到同样的问题。基本上 dim（）不是很一致。我用过

dim()

这适用于数据库中的模式（第一种情况），但当我从同一模式的 SQL 查询创建 tbl 时（第二种情况）不是很一致。我的行数是数百万，或者即使只有 1000 行，我也看到了这一点。我得了 NA 或 ??.有什么遗漏吗？

#MWE
test_db <- src_sqlite("test_db.sqlite3", create = T)
library(nycflights13)
flights_sqlite <- copy_to(test_db, flights, temporary = FALSE, indexes = list(
c("year", "month", "day"), "carrier", "tailnum"))

flights_postgres <- tbl(test_db, "flights")

第一种情况（来自直接架构的表）

flights_postgres

 > flights_postgres
 Source: postgres 9.3.5 []
 From: flights [336,776 x 16]

   year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight    origin dest air_time distance hour minute
  1  2013     1   1      517         2      830        11      UA  N14228   1545    EWR  IAH      227     1400    5     17
  2  2013     1   1      533         4      850        20      UA  N24211   1714    LGA  IAH      227     1416    5     33

#using dim()
> dim(flights_postgres)
[1] 336776     16

以上工作并获取行数的计数。第二种情况（来自 SQL 查询的表）

 ## use the flights schema above but can also be used to create other variables (like lag, lead)   in run time
 flight_postgres_2 <- tbl(test_db, sql("SELECT * FROM flights"))

  >flight_postgres_2
 Source: postgres 9.3.5 []
 From: <derived table> [?? x 16]

  year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight     origin dest air_time distance hour minute
   1  2013     1   1      517         2      830        11      UA  N14228   1545    EWR  IAH      227     1400    5     17
   2  2013     1   1      533         4      850        20      UA  N24211   1714    LGA  IAH      227     1416    5     33

> 
> dim(flight_postgres_2)
[1] NA 16

正如你所看到的，它要么打印为？？或 NA。所以不是很有帮助。

我通过使用 collect（）或然后使用 as.data.frame（）将输出转换为数据帧来检查维度来解决这个问题。但是，这两种方法可能不是理想的解决方案，因为大量行可能需要时间。

mysql r postgresql sqlite dplyr

> flight_postgres_2 %>% summarize(n()) %>% explain()
<SQL>
SELECT "n()"
FROM (SELECT COUNT() AS "n()"
FROM (SELECT * FROM flights) AS "zzz11") AS "zzz13"


<PLAN>
  selectid order from                                                         detail
1        0     0    0 SCAN TABLE flights USING COVERING INDEX flights_year_month_day

上一个：当存在许多列时，使用 readr：：read_csv（）导入数据时覆盖列类型

下一个：R dplyr：：使用字符串变量重命名和选择

使用 dplyr 访问 sql 表/查询时计算行数

Count number of rows when using dplyr to access sql table/query

评论