What are the differences between R's native pipe `|>` and the magrittr pipe `%>%`?

Question:

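In R 4.1 (May 2021) a native pipe operator `|>` was introduced that is "more streamlined" than previous implementations. I already noticed one difference between the native `|>` and the magrittr pipe `%>%`, namely that `2 %>% sqrt` works but `2 |> sqrt` does not and has to be written as `2 |> sqrt()`. Are there more differences and pitfalls to be aware of when using the native pipe operator?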


library(magrittr)  # needed for %>%; microbenchmark is called via ::

microbenchmark::microbenchmark(
  sqrt(1), 
  2 |> sqrt(), 
  3 %>% sqrt()
)

# Unit: nanoseconds
#          expr  min     lq    mean median   uq   max neval
#       sqrt(1)  117  126.5  141.66  132.0  139   246   100
#       sqrt(2)  118  129.0  156.16  134.0  145  1792   100
#  3 %>% sqrt() 2695 2762.5 2945.26 2811.5 2855 13736   100

You can see that the expression `2 |> sqrt()` passed to `microbenchmark` is parsed as `sqrt(2)`. This can also be seen with `quote()`:

quote(2 |> sqrt())
# sqrt(2)
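Because the rewrite happens in the parser, a pipe chain and the hand-nested call produce the very same call object, so no `|>` function is ever called at run time. A small base-R check (the `mtcars` chain is just an illustration, not taken from the benchmark above):

```r
# The native pipe is resolved by the parser: the piped and nested forms
# are the same call object.
piped  <- quote(mtcars |> subset(cyl == 4) |> nrow())
nested <- quote(nrow(subset(mtcars, cyl == 4)))

identical(piped, nested)
#[1] TRUE
deparse(piped)
#[1] "nrow(subset(mtcars, cyl == 4))"
```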

Answer (43 votes): Dirk is no longer here, 6/3/2021

The base R pipe `|>` added in R 4.1.0 "just" does functional composition. That is, we can see that its use really is the same as a function call:

> 1:5 |> sum()             # simple use of |>
[1] 15
> deparse(substitute( 1:5 |> sum() ))
[1] "sum(1:5)"
>

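This has some consequences:

- it makes it a little faster
- it makes it simpler and more robust
- it makes it more restrictive: `sum()` here needs the parentheses to be a proper call
- it limits the use of the "implicit" data argument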
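The "more restrictive" point can be checked without evaluating anything: a bare function name on the right-hand side is rejected by the parser itself. A minimal sketch that uses `parse(text = ...)` so the error can be caught:

```r
# A bare function name after |> is a parse error, not a runtime error.
bad <- tryCatch(parse(text = "1:5 |> sum"), error = function(e) e)
inherits(bad, "error")
#[1] TRUE

# With parentheses the same text parses, and it is exactly sum(1:5).
good <- parse(text = "1:5 |> sum()")[[1]]
good
#sum(1:5)
eval(good)
#[1] 15
```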

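This leads to the possible use of `=>`, which is currently "available but not active" (you need to set the environment variable `_R_USE_PIPEBIND_` for it, and it may change for R 4.2.0).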

(This was first provided as an answer to a question that duplicated this one, and I just copied it over as suggested.)

Edit: As follow-up questions about "what is `=>`" came up, here is a quick follow-up. Note that this operator is subject to change.

> Sys.setenv("_R_USE_PIPEBIND_"=TRUE)
> mtcars |> subset(cyl == 4) |> d => lm(mpg ~ disp, data = d)

Call:
lm(formula = mpg ~ disp, data = subset(mtcars, cyl == 4))

Coefficients:
(Intercept)         disp  
     40.872       -0.135  

> deparse(substitute(mtcars |> subset(cyl==4) |> d => lm(mpg ~ disp, data = d)))
[1] "lm(mpg ~ disp, data = subset(mtcars, cyl == 4))"
>

Particularly nice here is `deparse(substitute(...))`.

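@GitHunter0 If I had to guess (I'm not R-core), it's because these operators (`|>`, `=>`, etc.) rewrite the syntax so that `longcalc() |> quux(x = _)` becomes `quux(x = longcalc())`, and they don't want `longcalc() |> quux(x = _, y = _)` to turn into `quux(x = longcalc(), y = longcalc())`, which doubles the computation (the second call is redundant and doubles the run time). That's just a guess, though.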
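The single-placeholder rule can be observed directly. In this sketch `longcalc()` and `quux()` are hypothetical stand-ins for the names in the comment above; a counter records how often the left-hand side is evaluated, and a second `_` is refused at parse time (R >= 4.2.0).

```r
calls <- 0
longcalc <- function() { calls <<- calls + 1; 42 }  # hypothetical "expensive" call
quux <- function(x, y = 0) x + y                    # hypothetical consumer

longcalc() |> quux(x = _)   # rewritten to quux(x = longcalc())
#[1] 42
calls
#[1] 1

# Two placeholders would need the LHS twice, so the parser rejects them.
two <- tryCatch(parse(text = "longcalc() |> quux(x = _, y = _)"),
                error = function(e) e)
inherits(two, "error")
#[1] TRUE
```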

Answer (103 votes): GKi, 5/2/2022

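| Topic | magrittr 2.0.3 | Base 4.3.0 |
|---|---|---|
| Operator | `%>%` `%<>%` `%$%` `%!>%` `%T>%` | `\|>` (since 4.1.0) |
| Function call | `1:3 %>% sum()` | `1:3 \|> sum()` |
| | `1:3 %>% sum` | Needs brackets / parentheses |
| | ``1:3 %>% `+`(4)`` | Some functions are not supported |
| Insert on first empty place | `mtcars %>% lm(formula = mpg ~ disp)` | `mtcars \|> lm(formula = mpg ~ disp)` |
| Placeholder | `.` | `_` (since 4.2.0) |
| | `mtcars %>% lm(mpg ~ disp, data = . )` | `mtcars \|> lm(mpg ~ disp, data = _ )` |
| | `mtcars %>% lm(mpg ~ disp, . )` | Needs named argument |
| | `1:3 %>% setNames(., .)` | Can only appear once |
| | `1:3 %>% {sum(sqrt(.))}` | Nested calls are not allowed |
| Extraction call | `mtcars %>% .$cyl`, `mtcars %>% {.$cyl[[3]]}` or `mtcars %$% cyl[[3]]` | `mtcars \|> _$cyl` (since 4.3.0), `mtcars \|> _$cyl[[3]]` |
| Environment | `%>%` has an additional function environment; use: `"x" %!>% assign(1)` | `"x" \|> assign(1)` |
| Create function | `top6 <- . %>% sort() %>% tail()` | Not possible |
| Speed | Slower because of the overhead of the function call | Faster because of the syntax transformation |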

Many differences and limitations disappear when `|>` is used in combination with an (anonymous) function:
1 |> (\(.) .)()
-3:3 |> (\(.) sum(2*abs(.) - 3*.^2))()
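A sketch of why the anonymous-function wrapper helps: inside `\(v) ...` the piped value is an ordinary argument, so it can be used several times, positionally and inside nested calls, none of which the bare `_` placeholder allows. The vector `x` here is made up for illustration.

```r
x <- c(4, 9, 16)

# v is used twice and inside a nested call: fine inside a lambda.
x |> (\(v) sum(sqrt(v)) / length(v))()
#[1] 3
```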

See also: "How to pipe purely in base R ('base pipe')?" and "What are the differences and use cases of the five magrittr pipes %>%, %<>%, %$%, %!>% and %T>%?"

Parentheses are needed

library(magrittr)

1:3 |> sum
#Error: The pipe operator requires a function call as RHS

1:3 |> sum()
#[1] 6

1:3 |> approxfun(1:3, 4:6)()
#[1] 4 5 6

1:3 %>% sum
#[1] 6

1:3 %>% sum()
#[1] 6

1:3 %>% approxfun(1:3, 4:6)  #But in this case empty parentheses are needed
#Error in if (is.na(method)) stop("invalid interpolation method") :
1:3 %>% approxfun(1:3, 4:6)()
#[1] 4 5 6

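Some functions are not supported, but some can still be called by placing them in parentheses, calling them via `::`, using the placeholder, calling them inside a function, or binding the function to another name.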

1:3 |> `+`(4)
#Error: function '+' not supported in RHS call of a pipe

1:3 |> (`+`)(4)
#[1] 5 6 7

1:3 |> base::`+`(4)
#[1] 5 6 7

1:3 |>  `+`(4, e2 = _)
#[1] 5 6 7

1 |> (`+`)(2) |> (`*`)(3) #(1 + 2) * 3  or `*`(`+`(1, 2), 3) and NOT 1 + 2 * 3
#[1] 9

1:3 |> (\(.) . + 4)()
#[1] 5 6 7

fun <- `+`
1:3 |> fun(4)
#[1] 5 6 7

1:3 %>% `+`(4)
#[1] 5 6 7
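The rejection of syntactically special functions such as `+` and `*` is again a parse-time check, while the workarounds turn the call head into something the parser accepts (a parenthesized expression or a `::` call). A small base-R sketch that only parses each variant and records whether it was rejected:

```r
# Each string is parsed only; nothing is evaluated.
rhs <- c("`+`(4)", "`*`(3)", "(`+`)(4)", "base::`+`(4)")
rejected <- vapply(rhs, function(r) {
  res <- tryCatch(parse(text = paste("1:3 |>", r)), error = function(e) e)
  inherits(res, "error")
}, logical(1))
rejected   # TRUE for the two bare operators, FALSE for the two workarounds
```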

The placeholder needs a named argument

2 |> setdiff(1:3, _)
#Error: pipe placeholder can only be used as a named argument

2 |> setdiff(1:3, y = _)
#[1] 1 3

2 |> (\(.) setdiff(1:3, .))()
#[1] 1 3

2 %>% setdiff(1:3, .)
#[1] 1 3

2 %>% setdiff(1:3, y = .)
#[1] 1 3

Also, for variadic functions with a `...` (dot-dot-dot) argument, the placeholder `_` must be used as a named argument.

"b" |>  paste("a", _, "c")
#Error: pipe placeholder can only be used as a named argument

"b" |>  paste("a", . = _, "c")
#[1] "a b c"

"b" |>  (\(.) paste("a", ., "c"))()
#[1] "a b c"
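What happens to the named placeholder inside `...` can be made visible with a throwaway variadic function (`show_dots()` is hypothetical, written only for this illustration). The piped value simply arrives as a `...` element named `.`, and `paste()` ignores element names, which is why `. = _` is harmless there. Picking the name of a real formal instead changes the meaning of the call:

```r
# Hypothetical helper: report the names of whatever ends up in `...`.
show_dots <- function(...) names(list(...))

"b" |> show_dots("a", . = _, "c")   # becomes show_dots("a", . = "b", "c")

# `.` is just an ignored name inside paste()'s `...`.
"b" |> paste("a", . = _, "c")
#[1] "a b c"

# Using a formal's name instead: "b" becomes the separator.
"b" |> paste("a", sep = _, "c")
#[1] "abc"
```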

The placeholder can only appear once

1:3 |> setNames(nm = _)
#1 2 3 
#1 2 3 

1:3 |> setNames(object = _, nm = _)
#Error in setNames(object = "_", nm = "_") : 
#  pipe placeholder may only appear once

1:3 |> (\(.) setNames(., .))()
#1 2 3 
#1 2 3 

1:3 |> list() |> setNames(".") |> with(setNames(., .))
#1 2 3 
#1 2 3 

1:3 |> list(. = _) |> with(setNames(., .))
#1 2 3
#1 2 3

1:3 %>% setNames(object = ., nm = .)
#1 2 3
#1 2 3

1:3 %>% setNames(., .)
#1 2 3 
#1 2 3

Nested calls are not allowed

1:3 |> sum(sqrt(x=_))
#Error in sum(1:3, sqrt(x = "_")) : invalid use of pipe placeholder

1:3 |> (\(.) sum(sqrt(.)))()
#[1] 4.146264

1:3 %>% {sum(sqrt(.))}
#[1] 4.146264

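Extraction call
Experimental feature since 4.3.0: the placeholder `_` can now also be used in the rhs of a forward pipe `|>` expression as the first argument in an extraction call, such as `_$coef`. More generally, it can be used as the head of an extraction chain, such as `_$coef[[2]]`.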

mtcars |> _$cyl
mtcars |> _[["cyl"]]
mtcars |> _[,"cyl"]
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars |> _$cyl[[4]]
#[1] 6

mtcars %>% .$cyl
mtcars %>% .[["cyl"]]
mtcars %>% .[,"cyl"]
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

#mtcars %>% .$cyl[4] #gives mtcars[[4]]
mtcars %>% .$cyl %>% .[4]
#[1] 6
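Since the extraction form is also a parser rewrite, a `_$...` chain can be compared with the explicit form. A sketch using a fitted model, the use case hinted at by `_$coef` above (requires R >= 4.3.0):

```r
fit <- lm(mpg ~ disp, data = mtcars)

# The placeholder is replaced by the LHS: this is fit$coefficients[[2]].
identical(quote(fit |> _$coefficients[[2]]),
          quote(fit$coefficients[[2]]))
#[1] TRUE

# Pipe a freshly fitted model straight into an extraction chain.
mtcars |> lm(mpg ~ disp, data = _) |> _$coefficients[["disp"]]
```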

No additional environment

assign("x", 1)
x
#[1] 1

"x" |> assign(2)
x
#[1] 2

"x" |> (\(x) assign(x, 3))()
x
#[1] 2

1:3 |> assign("x", value=_)
x
#[1] 1 2 3

"x" %>% assign(4)
x
#[1] 1 2 3

4 %>% assign("x", .)
x
#[1] 1 2 3

"x" %!>% assign(4) # Use the eager pipe instead
x
#[1] 4

5 %!>% assign("x", .)
x
#[1] 5
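Because `|>` creates no function frame of its own, a call like `assign()` runs in whatever environment contains the pipe. A sketch inside a function (`f()` and `y` are illustrative names):

```r
f <- function() {
  "y" |> assign(10)   # rewritten to assign("y", 10): assigns in f's own frame
  y                   # so y is found here as a local variable
}
f()
#[1] 10
```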

Creating functions

top6 <- . %>% sort() %>% tail()
top6(c(1:10,10:1))
#[1]  8  8  9  9 10 10
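The native pipe has no counterpart to magrittr's `. %>% ...` functional sequence, but a lambda gives the same reusable function in base R. A minimal sketch; `compose()` is a hypothetical helper defined here, not a base R function:

```r
# Base-R equivalent of top6 <- . %>% sort() %>% tail()
top6 <- \(x) x |> sort() |> tail()
top6(c(1:10, 10:1))
#[1]  8  8  9  9 10 10

# Hypothetical helper: apply functions left to right, like a pipe chain.
compose <- function(...) {
  fs <- list(...)
  function(x) Reduce(function(acc, f) f(acc), fs, x)
}
top6b <- compose(sort, tail)
top6b(c(1:10, 10:1))
#[1]  8  8  9  9 10 10
```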

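Other possibilities:
A different pipe operator and a different placeholder can be realized with the Bizarro pipe `->.;`, which is not really a pipe (see its disadvantages) and which overwrites `.`.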

1:3 ->.; sum(.)
#[1] 6

mtcars ->.; .$cyl
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars ->.; .$cyl[4]
#[1] 6

1:3 ->.; setNames(., .)
#1 2 3 
#1 2 3 

1:3 ->.; sum(sqrt(x=.))
#[1] 4.146264

"x" ->.; assign(., 5)
x
#[1] 5

6 ->.; assign("x", .)
x
#[1] 6

1:3 ->.; . + 4
#[1] 5 6 7

1 ->.; (`+`)(., 2) ->.; (`*`)(., 3)
#[1] 9

1 ->.; .+2 ->.; .*3
#[1] 9

`->.;` and `|>` evaluate differently.

x <- data.frame(a=0)
f1 <- \(x) {message("IN 1"); x$b <- 1; message("OUT 1"); x}
f2 <- \(x) {message("IN 2"); x$c <- 2; message("OUT 2"); x}

x ->.; f1(.) ->.; f2(.)
#IN 1
#OUT 1
#IN 2
#OUT 2
#  a b c
#1 0 1 2

x |> f1() |> f2()
#IN 2
#IN 1
#OUT 1
#OUT 2
#  a b c
#1 0 1 2

f2(f1(x))
#IN 2
#IN 1
#OUT 1
#OUT 2
#  a b c
#1 0 1 2
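The order above follows from lazy evaluation: `x |> f1() |> f2()` is literally `f2(f1(x))`, so `f2()` is entered first and `f1(x)` only runs when `f2()` touches its argument. The same rule means an argument that is never used is never evaluated, which the eager Bizarro pipe cannot mimic. `ignore_arg()` is a made-up function for illustration:

```r
ignore_arg <- function(x) "ignored"   # hypothetical: never touches x

stop("boom") |> ignore_arg()          # ignore_arg(stop("boom")): the promise is never forced
#[1] "ignored"

# The Bizarro pipe assigns to `.` first, so the error is raised immediately.
res <- tryCatch({stop("boom") ->.; ignore_arg(.)},
                error = function(e) conditionMessage(e))
res
#[1] "boom"
```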

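Or define a custom pipe operator that sets `.` to the value of lhs in a new environment and evaluates rhs in it. But then values in the calling environment cannot be created or changed.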

`:=` <- \(lhs, rhs) eval(substitute(rhs), list(. = lhs))

mtcars := .$cyl[4]
#[1] 6

1:3 := setNames(., .)
#1 2 3 
#1 2 3 

1:3 := sum(sqrt(x=.))
#[1] 4.146264

"x" := assign(., 6)
x
#Error: object 'x' not found

1 := .+2 := .*3
#[1] 9

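So another attempt is to assign lhs to the placeholder `.` in the calling environment and evaluate rhs in the calling environment. But here `.` will be removed from the calling environment if it already exists there.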

`?` <- \(lhs, rhs) {
  on.exit(if(exists(".", parent.frame())) rm(., envir = parent.frame()))
  assign(".", lhs, envir=parent.frame())
  eval.parent(substitute(rhs))
}

mtcars ? .$cyl[4]
#[1] 6

1:3 ? setNames(., .)
#1 2 3 
#1 2 3 

1:3 ? sum(sqrt(x=.))
#[1] 4.146264

"x" ? assign(., 6)
x
#[1] 6

1 ? .+2 ? .*3
#[1] 9

Another possibility is to substitute every `.` with lhs, so that `.` no longer exists as a name during evaluation.

`%|>%` <- \(lhs, rhs)
  eval.parent(eval(call('substitute', substitute(rhs), list(. = lhs))))

mtcars %|>% .$cyl[4]
#[1] 6

1:3 %|>% setNames(., .)
#1 2 3 
#1 2 3

1:3 %|>% sum(sqrt(x=.))
#[1] 4.146264

"x" %|>% assign(., 6)
x
#[1] 6

1 %|>% .+2 %|>% .*3
#[1] 7
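To see what the substitution-based operator actually evaluates, its `call('substitute', ...)` step can be run on its own. The lhs value is spliced into rhs as a value, so it is computed once even when `.` appears several times:

```r
rhs <- quote(setNames(., .))
lhs <- 1:3

# Build the call that %|>% would evaluate: every `.` is replaced by the value.
built <- eval(call("substitute", rhs, list(. = lhs)))
built        # a call whose arguments are the integer vector itself, not a name
eval(built)
```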

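The name of the operator affects its precedence: see "Same function but using name %>% causes different result compared to when using name :=".
For more advanced options, see: "Writing own/custom pipe operators".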
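The precedence difference can be inspected with `quote()`, since parsing does not require the operators to be defined: user-defined `%op%` operators bind tighter than `+` and `*`, whereas `:=` is parsed like an assignment (lowest precedence, right-associative). That is why `1 %|>% .+2 %|>% .*3` gives 7 while `1 := .+2 := .*3` gives 9.

```r
e1 <- quote(1 %|>% .+2 %|>% .*3)
e2 <- quote(1 := .+2 := .*3)

# %|>% groups first: the tree is (1 %|>% .) + ((2 %|>% .) * 3)
as.character(e1[[1]])
#[1] "+"
e1[[2]]
#1 %|>% .

# := sits at the top and groups from the right: 1 := (.+2 := .*3)
as.character(e2[[1]])
#[1] ":="
```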

Speed

library(magrittr)

`:=` <- \(lhs, rhs) eval(substitute(rhs), list(. = lhs))

`?` <- \(lhs, rhs) {
  on.exit(if(exists(".", parent.frame())) rm(., envir = parent.frame()))
  assign(".", lhs, envir=parent.frame())
  eval.parent(substitute(rhs))
}

`%|>%` <- \(lhs, rhs)
  eval.parent(eval(call('substitute', substitute(rhs), list(. = lhs))))


x <- 42
bench::mark(min_time = 0.2, max_iterations = 1e8
, x
, identity(x)
, "|>" = x |> identity()
, "|> _" = x |> identity(x=_)
, "->.;" = {x ->.; identity(.)}
, "|> f()" = x |> (\(y) identity(y))()
, "%>%" = x %>% identity
, ":=" = x := identity(.)
, "list." = x |> list() |> setNames(".") |> with(identity(.))
, "%|>%" = x %|>% identity(.)
, "?" = x ? identity(.)
)

Result

   expression       min   median `itr/sec` mem_alloc `gc/sec`   n_itr  n_gc
   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>   <int> <dbl>
 1 x            31.08ns   48.2ns 19741120.        0B     7.46 2646587     1
 2 identity(x) 491.04ns 553.09ns  1750116.        0B    27.0   323575     5
 3 |>          497.91ns 548.08ns  1758553.        0B    27.3   322408     5
 4 |> _        506.87ns 568.92ns  1720374.        0B    26.9   320003     5
 5 ->.;        725.03ns 786.04ns  1238488.        0B    21.2   233864     4
 6 |> f()      972.07ns   1.03µs   929926.        0B    37.8   172288     7
 7 %>%           2.76µs   3.05µs   315448.        0B    37.2    59361     7
 8 :=            3.02µs   3.35µs   288025.        0B    37.0    54561     7
 9 list.         5.19µs   5.89µs   166721.        0B    36.8    31752     7
10 %|>%          6.01µs   6.86µs   143294.        0B    37.0    27076     7
11 ?             30.9µs  32.79µs    30074.        0B    31.3     5768     6
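If `bench` is not installed, a rough base-R comparison with `system.time()` lets you compare the syntax rewrite with a function-based pipe. This is only a sketch: `%pipe%` is a hypothetical, minimal stand-in (not magrittr's implementation), and absolute timings depend on the machine.

```r
`%pipe%` <- function(lhs, rhs) rhs(lhs)   # hypothetical function-based pipe

x <- 42
n <- 1e5
t_native <- system.time(for (i in seq_len(n)) x |> identity())[["elapsed"]]
t_func   <- system.time(for (i in seq_len(n)) x %pipe% identity)[["elapsed"]]
c(native = t_native, function_based = t_func)   # elapsed seconds
```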
