Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?

Asked by: Ignacio · Asked: 9/30/2014 · Last edited by: Henrik · Updated: 11/7/2020 · Views: 11338

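Q:

Note: The title of this question has been edited to make it the canonical question for problems that arise when plyr functions mask their dplyr counterparts. The rest of the question remains unchanged.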


Suppose I have the following data:

dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

With the old plyr, I can create a small table summarizing my data using the following code:

require(plyr)
ddply(dfx, .(group, sex), summarize,
      mean = round(mean(age), 2),
      sd = round(sd(age), 2))

The output looks like this:

  group sex  mean    sd
1     A   F 49.68  5.68
2     A   M 32.21  6.27
3     B   F 31.87  9.80
4     B   M 37.54  9.73
5     C   F 40.61 15.21
6     C   M 36.33 11.33

I am trying to move my code to dplyr and the %>% operator. My code takes the data frame, groups it by group and sex, and then summarises it. That is:

dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))

But my output is:

  mean   sd
1 35.56 9.92

What am I doing wrong?

dplyr plyr r-faq


A:

25 upvotes Carlos Cinelli 9/30/2014 #1

The problem here is that you loaded dplyr first and then plyr, so plyr's summarise function masks dplyr's summarise function. When that happens, you get the following warning:

library(plyr)
    Loading required package: plyr
------------------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------------------

Attaching package: ‘plyr’

The following objects are masked from ‘package:dplyr’:

    arrange, desc, failwith, id, mutate, summarise, summarize
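
If the warning scrolled past unnoticed, the masking can still be checked after the fact. The following is a minimal sketch, assuming both plyr and dplyr are installed; the variable names `resolved` and `hits` are only illustrative.

```r
# Sketch: find out which package a bare `summarise` resolves to.
# Assumes plyr and dplyr are both installed.
library(dplyr)
library(plyr)  # attached last, so it is searched first

# Package whose function the bare name currently points to
resolved <- environmentName(environment(summarise))
print(resolved)

# All attached packages that define `summarise`, in lookup order
hits <- find("summarise")
print(hits)
```

With this load order, `resolved` should name plyr, and plyr should come first in `hits`, which is exactly the situation the warning describes.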

So, to make your code work, either detach plyr with detach(package:plyr), or restart R and load plyr first and then dplyr (or load only dplyr):

library(dplyr)
dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
Source: local data frame [6 x 4]
Groups: group

  group sex  mean    sd
1     A   F 41.51  8.24
2     A   M 32.23 11.85
3     B   F 38.79 11.93
4     B   M 31.00  7.92
5     C   F 24.97  7.46
6     C   M 36.17  9.11
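
The detach route can be sketched as follows for a session where plyr was attached after dplyr. This assumes both packages are installed and that nothing else in the session still needs plyr to be attached; `resolved` is an illustrative name.

```r
# Sketch: recover from the "plyr after dplyr" load order without restarting R.
library(dplyr)
library(plyr)  # reproduces the problematic order

# Detach plyr only if it is actually on the search path
if ("package:plyr" %in% search()) {
  detach("package:plyr")
}

# The bare name should now resolve to dplyr again
resolved <- environmentName(environment(summarise))
print(resolved)
```

Detaching only removes plyr from the search path, so plyr functions remain reachable through explicit `plyr::` calls.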

Alternatively, you can call dplyr's summarise explicitly in your code, so that the right function is called no matter how you load the packages:

dfx %>% group_by(group, sex) %>% 
  dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
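
The same approach applies to mutate, which plyr masks as well. Below is a minimal sketch, assuming both packages are installed; the seed and the `age_centered` column are illustrative and not part of the original question.

```r
# Sketch: namespace-qualified mutate keeps the grouping even when plyr masks it.
library(dplyr)
library(plyr)  # deliberately loaded after dplyr

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
dfx <- data.frame(
  group = c(rep("A", 8), rep("B", 15), rep("C", 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

# Centre age within each group; dplyr::mutate respects group_by()
res <- dfx %>%
  dplyr::group_by(group) %>%
  dplyr::mutate(age_centered = age - mean(age)) %>%
  dplyr::ungroup()

# Per-group means of the centred column should be (numerically) zero
group_means <- tapply(res$age_centered, res$group, mean)
print(group_means)
```

A bare `mutate` in this session would go to plyr, which ignores the grouping and subtracts the overall mean instead.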

Comments

13 upvotes hadley 9/30/2014
I don't understand why so few people notice this warning :/
2 upvotes Gregor Thomas 3/4/2016
@hadley fortunes::fortune(9)
4 upvotes A5C1D2H2I1M1N2O1R2T1 9/30/2014 #2

Your code is calling plyr::summarise rather than dplyr::summarise because of the order in which you loaded "plyr" and "dplyr".

Demo:

library(dplyr) ## I'm guessing this is the order you loaded
library(plyr)
dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
#    mean   sd
# 1 36.88 9.76
dfx %>% group_by(group, sex) %>% 
  dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
# Source: local data frame [6 x 4]
# Groups: group
# 
#   group sex  mean    sd
# 1     A   F 32.17  6.30
# 2     A   M 30.98  7.37
# 3     B   F 38.20  7.67
# 4     B   M 33.12 12.24
# 5     C   F 43.91 10.31
# 6     C   M 47.53  8.25
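
Because the data are sampled at random, the numbers in the demo differ on every run. A reproducible variant that compares only the shape of the two results might look like the sketch below. The seed and the names `masked` and `explicit` are illustrative, and both packages are assumed to be installed.

```r
# Sketch: compare the masked and the explicit call on the same data.
library(dplyr)
library(plyr)  # loaded second, so its summarise masks dplyr's

set.seed(42)  # arbitrary seed, only for reproducibility
dfx <- data.frame(
  group = c(rep("A", 8), rep("B", 15), rep("C", 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

# Bare summarise resolves to plyr here and ignores the grouping
masked <- dfx %>%
  group_by(group, sex) %>%
  summarise(mean = round(mean(age), 2))

# Explicit dplyr::summarise returns one row per group/sex combination
explicit <- dfx %>%
  group_by(group, sex) %>%
  dplyr::summarise(mean = round(mean(age), 2))

print(nrow(masked))
print(nrow(explicit))
```

A single row in `masked` against one row per group/sex combination in `explicit` is the signature of the masking problem.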