Provide data.table's environment for `glue()` in `:=`

Asked by: Vasily A · Asked: 11/12/2023 · Updated: 11/15/2023 · Views: 121

Q:

I am trying to figure out whether there is a good way to use `glue()` inside data.table's `j`:

library(data.table)
library(glue)
data(iris)
dt.iris <- data.table(iris)


dt.iris[, myText := glue('The species is {Species} with sepal length of {Sepal.Length}')] 
# Error in eval(parse(text = text, keep.source = FALSE), envir) : 
#   object 'Species' not found

I can make it work if I specify `.envir = .SD`:

dt.iris[, myText := glue('The species is {Species} with sepal length of {Sepal.Length}', .envir = .SD)]
# works OK

But I wonder whether there is some way to do this without adding that argument every time. Maybe something like this:

glue1 <- function(...) glue(..., .envir = ???) 
data.table r-glue

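Comments

1 upvote r2evans 11/12/2023
I think I understand why you want something like `gluedt` to cut down on `envir=.SD`, so this is speculation, but I think it may not be worth it: (1) how columns are determined for `.SD` is deliberate behavior internal to `data.table`, and working around it is not always as simple as one might think; (2) since you are already loading `glue`, there is no savings in package "bloat" (not that it is bloated); (3) if/when you come up with a method that currently manages to trick its way to `.SD`, and it eventually breaks, the error may not be entirely clear, which makes debugging that much harder.
1 upvote r2evans 11/12/2023
From that, I suggest that the most declarative/transparent, most direct, and perhaps more future-proof approach, `glue::glue_data(.SD, "...")` (as in @HieuNguyen's answer), is the best one. I don't think the extra typing of `.SD,` (five characters) is really that onerous.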

Answers:

1 upvote Allan Cameron 11/12/2023 #1

You could do

gluedt <- function(...) glue::glue(..., .envir = parent.frame(3)$x)

Testing, we have:

library(data.table)

data(iris)
dt.iris <- data.table(iris)
       
dt.iris[, myText := gluedt('The species is {Species} with sepal length of {Sepal.Length}')]

dt.iris
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#>   1:          5.1         3.5          1.4         0.2    setosa
#>   2:          4.9         3.0          1.4         0.2    setosa
#>   3:          4.7         3.2          1.3         0.2    setosa
#>   4:          4.6         3.1          1.5         0.2    setosa
#>   5:          5.0         3.6          1.4         0.2    setosa
#>  ---                                                            
#> 146:          6.7         3.0          5.2         2.3 virginica
#> 147:          6.3         2.5          5.0         1.9 virginica
#> 148:          6.5         3.0          5.2         2.0 virginica
#> 149:          6.2         3.4          5.4         2.3 virginica
#> 150:          5.9         3.0          5.1         1.8 virginica
#>                                                 myText
#>   1:    The species is setosa with sepal length of 5.1
#>   2:    The species is setosa with sepal length of 4.9
#>   3:    The species is setosa with sepal length of 4.7
#>   4:    The species is setosa with sepal length of 4.6
#>   5:      The species is setosa with sepal length of 5
#>  ---                                                  
#> 146: The species is virginica with sepal length of 6.7
#> 147: The species is virginica with sepal length of 6.3
#> 148: The species is virginica with sepal length of 6.5
#> 149: The species is virginica with sepal length of 6.2
#> 150: The species is virginica with sepal length of 5.9

Created on 2023-11-11 with reprex v2.0.2

Comments

0 upvotes Vasily A 11/12/2023
Unfortunately, this does not work if I have anything in the `i` part: `dt.iris[Sepal.Width>4, myText := gluedt('The species is {Species} with sepal length of {Sepal.Length}')]`

3 upvotes jay.sf 11/12/2023 #2

Why not simply use `sprintf`,

> library(data.table)
> dt.iris[, myText := sprintf('The species is %s with sepal length of %.2g', 
+                               Species, Sepal.Length)]

or `paste`, although it is much slower.

> dt.iris[, myText := paste('The species is', Species, 'with sepal length of', Sepal.Length)] 
> dt.iris
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.1         3.5          1.4         0.2    setosa
  2:          4.9         3.0          1.4         0.2    setosa
  3:          4.7         3.2          1.3         0.2    setosa
  4:          4.6         3.1          1.5         0.2    setosa
  5:          5.0         3.6          1.4         0.2    setosa
 ---                                                            
146:          6.7         3.0          5.2         2.3 virginica
147:          6.3         2.5          5.0         1.9 virginica
148:          6.5         3.0          5.2         2.0 virginica
149:          6.2         3.4          5.4         2.3 virginica
150:          5.9         3.0          5.1         1.8 virginica
                                                myText
  1:    The species is setosa with sepal length of 5.1
  2:    The species is setosa with sepal length of 4.9
  3:    The species is setosa with sepal length of 4.7
  4:    The species is setosa with sepal length of 4.6
  5:      The species is setosa with sepal length of 5
 ---                                                  
146: The species is virginica with sepal length of 6.7
147: The species is virginica with sepal length of 6.3
148: The species is virginica with sepal length of 6.5
149: The species is virginica with sepal length of 6.2
150: The species is virginica with sepal length of 5.9

Benchmark

library(data.table)
dt.iris <- as.data.table(iris)
dt.iris.l <- dt.iris[sample.int(nrow(dt.iris), 1e6, replace=TRUE), ]
gluedt <- function(...) glue::glue(..., .envir = parent.frame(3)$x)
microbenchmark::microbenchmark(
  sprintf=dt.iris.l[, myText := sprintf('The species is %s with sepal length of %.2g', 
                              Species, Sepal.Length)],
  paste=dt.iris.l[, myText := paste('The species is', Species, 'with sepal length of', Sepal.Length)] ,
  gluedt=dt.iris.l[, myText := gluedt('The species is {Species} with sepal length of {Sepal.Length}')],
  times=3L,
  check='identical'
)

$ Rscript --vanilla foo.R
Unit: milliseconds
    expr      min        lq      mean    median        uq       max neval cld
 sprintf  748.210  755.7418  758.8391  763.2735  764.1537  765.0338     3 a  
   paste 1545.685 1547.1562 1549.3632 1548.6278 1551.2025 1553.7771     3  b 
  gluedt 1426.333 1437.6870 1443.4343 1449.0413 1451.9851 1454.9289     3   c

Data:

> dt.iris <- as.data.table(iris)
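
As a side note on the format string in the `sprintf` call: `%.2g` asks for at most two significant digits, which is why 5.0 is printed as 5. The following is a minimal base-R sketch comparing it with `%.1f`, which always keeps one decimal place. The vector `x` is made up for illustration and is not taken from the post.

```r
# Hypothetical values chosen for illustration only
x <- c(5.0, 5.1, 12.34)

# %.2g: at most two significant digits, trailing zeros dropped
two_sig <- sprintf('%.2g', x)

# %.1f: always exactly one digit after the decimal point
one_dec <- sprintf('%.1f', x)

print(two_sig)
print(one_dec)
```

Which specifier fits depends on the range of the data; for the iris sepal lengths, two significant digits happen to be enough.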

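Comments

0 upvotes Vasily A 11/12/2023
Yes, `sprintf` is a possible workaround, but in my case I need to build strings containing a large number of variables, so the `glue` format gives better readability.
0 upvotes jay.sf 11/12/2023
@VasilyA Then use `paste`, see the update.
0 upvotes Vasily A 11/12/2023
I am just trying to avoid those numerous quote-comma-comma-quote sequences, again to make it easier to read. Thanks for the suggestion!
1 upvote jay.sf 11/12/2023
@VasilyA Whether curly braces, or clearly separating text and variables with syntax highlighting, gives better readability is probably a matter of taste, and also of the amount of typing. However, `sprintf` is much faster, see the benchmark.
2 upvotes Hieu Nguyen 11/12/2023
`glue` may give better readability, but `sprintf` definitely gives you the flexibility to apply different kinds of formatting to variables (e.g., whether you want "5.0" or "5"). Despite my answer, I still think `sprintf` is the best option.

1 upvote Hieu Nguyen 11/12/2023 #3

My approach is to simply use `glue_data`: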

dt.iris[Sepal.Width > 4, myText := glue_data(.SD, "The species is {Species} with sepal length of {Sepal.Length}")]

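I think this is due to the way `glue` treats everything in `"The species is {Species} with sepal length of {Sepal.Length}"` as one string, rather than keeping strings and variables separate as usual (as `paste` or `sprintf` do), which `data.table` needs in order to work properly.
Another approach is to use metaprogramming: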

gluedt <- function(...) substitute(glue(..., .envir = .SD))
dt.iris[Sepal.Width > 4, myText := eval(gluedt("The species is {Species} with sepal length of {Sepal.Length}"))]
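
The point about `glue` hiding the column names inside a single string can be illustrated without data.table at all. The sketch below uses base R's `all.vars()` to list the symbols that appear in an unevaluated expression. That data.table inspects `j` in a broadly similar way to decide which columns to make available is an assumption here, not something confirmed in this thread; the names `j_paste`, `j_glue`, and `j_sd` are made up for the example.

```r
# Unevaluated j expressions, as they might be written inside dt[, ...]
j_paste <- quote(paste('The species is', Species, 'with sepal length of', Sepal.Length))
j_glue  <- quote(glue('The species is {Species} with sepal length of {Sepal.Length}'))
j_sd    <- quote(glue_data(.SD, 'The species is {Species} with sepal length of {Sepal.Length}'))

# Symbols visible in each expression (function names are excluded by default)
vars_paste <- all.vars(j_paste)  # the two column names appear as symbols
vars_glue  <- all.vars(j_glue)   # the column names are buried in the string
vars_sd    <- all.vars(j_sd)     # only .SD is visible

print(vars_paste)
print(vars_glue)
print(vars_sd)
```

Under that assumption, writing `.SD` explicitly in `j` is what tells data.table to build the subset that `glue_data` then reads from.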

1 upvote G. Grothendieck 11/12/2023 #4

Use `transform.data.table` (from data.table), which gives an all-data.table solution, or `mutate` (from dplyr) or `fmutate` (from collapse), in place of `[.data.table`. If we supply data.table input, we still get a data.table result.

dt.iris |>
  transform(myText = glue('The species is {Species} with sepal length of {Sepal.Length}'))

library(dplyr)
dt.iris |>
  mutate(myText = glue('The species is {Species} with sepal length of {Sepal.Length}'))

library(collapse)
dt.iris |>
  fmutate(myText = glue('The species is {Species} with sepal length of {Sepal.Length}'))

0 upvotes langtang 11/15/2023 #5

You can also use `set`:

set(
  dt.iris,
  j="myText",
  value=glue("The species is {dt.iris$Species} with sepal length of {dt.iris$Sepal.Length}")
)