Asked by: Sam  Asked: 3/22/2020  Updated: 3/23/2020  Views: 2088
ggplot2 warning "removed x rows containing missing values" when drop = FALSE
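Q:
I am using ggplot2 to create a side-by-side bar chart. My code produces the correct plot with scale_x_discrete(drop = T). However, I have a level with a count of 0 that I want to include on the x axis. When I set scale_x_discrete(drop = F), I get the following warning, and another category that has non-zero values is displayed as zero on the plot:
Removed x rows containing missing values (geom_bar).
Here is a reproduction of my data: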
library("tidyverse")
df <- data.frame(
location = c(rep("in", 231), rep("out", 83)),
status = c(rep("normal", 73), rep("mild", 42), rep("moderate", 20), rep("fever", 4),
rep("normal", 70), rep("mild", 41), rep("moderate", 62), rep("fever", 2)))
df$status <- factor(df$status, levels = c("normal", "mild", "moderate", "severe", "fever"))
df %>%
ggplot(aes(x = status,
y = ..count../tapply(..count.., ..x.., sum)[..x..],
fill = location)) +
geom_bar(position = "dodge") +
scale_y_continuous(labels = scales::percent) +
scale_x_discrete(drop=F) +
NULL
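I have been researching this but really cannot work out what is wrong.
A:
I can't explain why the non-zero value is not plotted. Here is a solution that uses dplyr's group_by function: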
#calculate totals and then calculate the %
df %>% group_by(status, location) %>% summarise(value=n()) %>%
group_by(status) %>% mutate(result=value/sum(value)) %>%
ggplot(aes(x = status,
y = result,
fill = location)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = scales::percent) +
scale_x_discrete(drop=F)
Note that this now uses geom_col instead of geom_bar.
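As a cross-check on the proportions that the dplyr pipeline above computes, here is a minimal base R sketch. It is not part of the original answer; the names counts and shares are chosen here for illustration. It tabulates status by location and divides each row by its total, which is the same per-status share that gets plotted.

```r
# Rebuild the data from the question (base R only, no tidyverse needed).
df <- data.frame(
  location = c(rep("in", 231), rep("out", 83)),
  status = c(rep("normal", 73), rep("mild", 42), rep("moderate", 20), rep("fever", 4),
             rep("normal", 70), rep("mild", 41), rep("moderate", 62), rep("fever", 2)))
df$status <- factor(df$status, levels = c("normal", "mild", "moderate", "severe", "fever"))

# Cross-tabulate; table() keeps the empty "severe" level as a row of zeros.
counts <- table(df$status, df$location)

# Row-wise proportions: each non-empty status sums to 1 across locations.
# The all-zero "severe" row yields NaN (0 / 0).
shares <- prop.table(counts, margin = 1)
print(round(shares, 4))
```

The NaN row for severe is worth noting: a level with no observations has no defined share, so any approach has to decide how to handle that level explicitly.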
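Your code does not work because the missing category is still not present in either ..count.. or ..x.., even with drop = FALSE. This can be seen by plotting ..count.. and ..x.. in turn.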
library("tidyverse")
df <- data.frame(
location = c(rep("in", 231), rep("out", 83)),
status = c(rep("normal", 73), rep("mild", 42), rep("moderate", 20), rep("fever", 4),
rep("normal", 70), rep("mild", 41), rep("moderate", 62), rep("fever", 2)))
df$status <- factor(df$status, levels = c("normal", "mild", "moderate", "severe", "fever"))
Plot ..count..
df %>%
ggplot(aes(x = status,
y = ..count..,
fill = location)) +
geom_bar(position = "dodge") +
scale_x_discrete(drop=F)
The missing category is not present, and from the fact that only one value is shown for normal we can infer that ..count.. is the vector
..count.. <- c(143, 64, 19, 20, 62, 4, 2)
Plot ..x..
df %>%
ggplot(aes(x = status,
y = ..x..,
fill = location)) +
geom_bar(position = "dodge") +
scale_x_discrete(drop=F)
As with ..count.., the missing category is not present, and ..x.. is the vector
..x.. <- c(1, 2, 2, 3, 3, 5, 5)
Why the code doesn't work
As a first step I compute tapply(..count.., ..x.., sum), which gives a vector of length 4 (the total counts of the non-missing status categories):
tapply(..count.., ..x.., sum)
#> 1 2 3 5
#> 143 83 82 6
Now extract the elements via [..x..]:
tapply(..count.., ..x.., sum)[..x..]
#> 1 2 2 3 3 <NA> <NA>
#> 143 83 83 82 82 NA NA
or, for the full expression:
..count.. / tapply(..count.., ..x.., sum)[..x..]
#> 1 2 2 3 3 <NA> <NA>
#> 1.0000 0.7711 0.2289 0.2439 0.7561 NA NA
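Your code therefore produces two missing values, both for the last category, which explains the warning Removed 2 rows containing missing values (geom_bar). The reason is that, with ..x.. <- c(1, 2, 2, 3, 3, 5, 5), we try to extract the 5th element twice from the length-4 vector tapply(..count.., ..x.., sum), and each out-of-range lookup returns NA.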
With drop=TRUE everything works, because in that case ..x.. <- c(1, 2, 2, 3, 3, 4, 4) while ..count.. stays the same.
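The two cases can be compared side by side outside of ggplot2. The sketch below is illustrative only: count, x_keep and x_drop are ordinary vectors that stand in for the internal ..count.. and ..x.. values quoted in this answer, and those names are assumptions made here.

```r
# Stand-ins for the values ggplot2 computes internally (taken from the answer).
count  <- c(143, 64, 19, 20, 62, 4, 2)
x_keep <- c(1, 2, 2, 3, 3, 5, 5)   # drop = FALSE: "fever" stays at position 5
x_drop <- c(1, 2, 2, 3, 3, 4, 4)   # drop = TRUE: "fever" moves to position 4

totals_keep <- tapply(count, x_keep, sum)  # length 4, names "1" "2" "3" "5"
totals_drop <- tapply(count, x_drop, sum)  # length 4, names "1" "2" "3" "4"

# Positional lookup: index 5 is out of range for a length-4 vector, so NA comes back.
share_keep <- count / totals_keep[x_keep]
# Here positions and names coincide, so every lookup succeeds.
share_drop <- count / totals_drop[x_drop]

print(round(as.vector(share_keep), 4))
print(round(as.vector(share_drop), 4))
```

Only the drop = FALSE variant ends in two NA values, matching the two rows reported in the warning.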
Solution
This problem can be solved by converting ..x.. to a character vector. In that case we extract the elements by name:
library("tidyverse")
df <- data.frame(
location = c(rep("in", 231), rep("out", 83)),
status = c(rep("normal", 73), rep("mild", 42), rep("moderate", 20), rep("fever", 4),
rep("normal", 70), rep("mild", 41), rep("moderate", 62), rep("fever", 2)))
df$status <- factor(df$status, levels = c("normal", "mild", "moderate", "severe", "fever"))
# Convert ..x.. to character
df %>%
ggplot(aes(x = status,
y = ..count.. / tapply(..count.., ..x.., sum)[as.character(..x..)],
fill = location)) +
geom_bar(position = "dodge") +
scale_x_discrete(drop=F)
Created on 2020-03-23 by the reprex package (v0.3.0)
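To see why the character conversion helps, the lookup can be reproduced with plain vectors. This is a sketch under the same assumptions as the answer's own illustration: count and x are stand-ins for ..count.. and ..x.., and the variable names are chosen here for illustration.

```r
# Stand-ins for the internal values when drop = FALSE.
count <- c(143, 64, 19, 20, 62, 4, 2)
x     <- c(1, 2, 2, 3, 3, 5, 5)

# tapply() names each total after its group, so the names are "1" "2" "3" "5".
totals <- tapply(count, x, sum)

# Name lookup: as.character(5) is "5", which matches the name of the 4th element,
# so the lookup no longer depends on the element's position.
by_name <- totals[as.character(x)]
share <- count / by_name

print(round(as.vector(share), 4))
```

Because the totals are matched by name rather than by position, a gap in the positions (here the unused position 4) no longer produces NA.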