Align annotation_custom values with ggplot gridlines

Asked by: Nate · Asked: 12/8/2022 · Last edited by: stefan, Nate · Updated: 12/8/2022 · Views: 126

Question:

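Not sure whether this can be done any other way than by trial and error, but I'm trying to align the y-axis labels, the horizontal gridlines, and the corresponding custom annotation values (sample sizes) on the right side of the plot, so that the whole plot can be read straight across. This is easier with fewer groups, but getting the spacing between the sample sizes and the font size right so that everything lines up nicely is challenging and takes hours. I'm just wondering whether there is a faster/easier way to do this.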

My current approach:

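Run this with the "by" value set to 1 divided by the number of groups plotted on the y-axis. Example: N = 25, so 1/25 = 0.04 -> this is the distance between consecutive sample-size values.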

format(round(seq(-0.996, 0, by = 0.04), 3), scientific = FALSE)

Copy the result...

[1] "-0.996" "-0.956" "-0.916" "-0.876" "-0.836" "-0.796" "-0.756" "-0.716" "-0.676" "-0.636"
[11] "-0.596" "-0.556" "-0.516" "-0.476" "-0.436" "-0.396" "-0.356" "-0.316" "-0.276" "-0.236"
[21] "-0.196" "-0.156" "-0.116" "-0.076" "-0.036"

and paste it here:

# ...annotation_custom(grid::textGrob(pivot_df$n, x = 1.035, y = c(0.996, 0.956, 0.916, 0.876, 
# 0.836, 0.796, 0.756, 0.716, 0.676, 0.636, 0.596, 0.556, 0.516, 0.476, 0.436, 0.396, 0.356, 
# 0.316, 0.276, 0.236, 0.196, 0.156, 0.116, 0.076, 0.036),...

in this plot. Then I eyeball the result, and if it doesn't line up well, I redo everything...

ggplot(data=subset(df, !is.na(sal)), 
       aes(y = reorder(species, -sal, FUN = median), x = sal)) + 
  geom_boxplot(outlier.shape = 1, outlier.size = 1, orientation = "y") + 
  coord_cartesian(clip = "off") + 
  annotation_custom(grid::textGrob(pivot_df$n, 
                                   x = 1.035,
                                   y = c(0.996, 0.956, 0.916, 0.876, 0.836, 0.796, 0.756, 0.716, 0.676,
                                         0.636, 0.596, 0.556, 0.516, 0.476, 0.436, 0.396, 0.356, 0.316,
                                         0.276, 0.236, 0.196, 0.156, 0.116, 0.076, 0.036),
                                   gp = grid::gpar(cex = 0.3))) +
  annotation_custom(grid::textGrob(expression(bold(underline("N"))),
                                   x = 1.035, 
                                   y = 1.02,
                                   gp = grid::gpar(cex = 0.5))) + 
  ylab("") + 
  xlab("") + 
  theme(axis.text.y   = element_text(size=7, face="italic"),
        axis.text.x   = element_text(size=7),
        axis.title.x  = element_text(size=9,face="bold"),
        axis.line = element_line(colour = "black"),
        panel.background = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_rect(colour = "black", fill=NA, size=1), 
        panel.grid.major = element_line(colour = "#E0E0E0"),
        plot.title = element_text(hjust = 0.5)) + 
  theme(plot.margin = margin(21, 40, 20, 20))

[Image: the resulting plot]

This is what it should look like when it works well, but getting there is really tedious, and with 90+ groups on the y-axis it takes a very long time. Is there a better way?
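The copy-and-paste step described above can at least be skipped by keeping the sequence in a variable and passing it straight to textGrob(). This is a minimal sketch that assumes the same layout as the question (a top value of 0.996 and a spacing of 1/N); n_groups, spacing, and y_pos are hypothetical names. It still relies on the 1/N spacing, so it removes the copying but not necessarily the eyeballing.

```r
# Number of groups on the y axis (25 in the sample data; could be nrow(pivot_df))
n_groups <- 25

# Spacing between labels, as in the question: 1 / N
spacing <- 1 / n_groups

# Label positions from the top downwards; 0.996 is the question's starting value
y_pos <- seq(from = 0.996, by = -spacing, length.out = n_groups)
```

The vector can then be used as y = y_pos inside grid::textGrob(pivot_df$n, x = 1.035, y = y_pos, ...).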

Sample data:

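library(dplyr)
library(ggplot2)

set.seed(123)  # make the random sample data reproducible

# 25 species (A-Y), recycled over 5000 rows, i.e. 200 rows per species
df <- data.frame(species = LETTERS[seq(from = 1, to = 25)],
                 sal = rnorm(n = 5000, mean = 27, sd = 8),
                 num = sample(x = 1:10, size = 5000, replace = TRUE))

# Sample size and median per species, sorted by ascending median
pivot_df <- df %>% 
  group_by(species) %>% 
  summarize(n = n(), median_sal = median(sal, na.rm = TRUE)) %>%
  arrange(median_sal)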
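Part of the trial and error may come from the 1/N spacing itself. A sketch under a stated assumption: ggplot2's documented default expansion for discrete scales is 0.6 units on each side (see ?expansion), so with N groups the y range runs from 0.4 to N + 0.6, and because annotation_custom() covers the whole panel by default, the k-th level would sit at (k - 0.4) / (N + 0.2) in npc rather than at multiples of 1/N. discrete_npc() is a hypothetical helper, not part of ggplot2, and the formula no longer applies if expand or the coordinate limits are changed.

```r
# Hypothetical helper (not part of ggplot2): npc position of each level of a
# discrete axis with n_levels levels, assuming the default discrete expansion
# of `add` = 0.6 data units on each side and no extra coordinate limits.
discrete_npc <- function(n_levels, add = 0.6) {
  k <- seq_len(n_levels)               # level k is drawn at data value k
  lower <- 1 - add                     # expanded lower limit of the axis
  width <- (n_levels + add) - lower    # expanded axis range: N - 1 + 2 * add
  (k - lower) / width                  # rescale to 0..1 (npc)
}

pos <- discrete_npc(25)
```

Since pivot_df is sorted by ascending median, i.e. from the top of the plot downwards, the positions would be reversed when used: grid::textGrob(pivot_df$n, x = 1.035, y = rev(discrete_npc(nrow(pivot_df))), gp = grid::gpar(cex = 0.3)). This also assumes there are no tied medians, so that the row order of pivot_df matches the factor levels.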
r ggplot2 annotations gridlines

Answer:

1 vote · stefan · 12/8/2022 · #1

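One option to get the positions right with less fiddling is to use a continuous y scale, which allows a duplicated axis that can be used to add your labels instead of using annotation_custom.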

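To this end, you first have to reorder species. Then convert it to numeric and map the numbers on the y aesthetic. Next, use the breaks and labels arguments of scale_y_continuous to add the labels for the primary and the secondary axis. Also note the use of axis.ticks.length (here axis.ticks.length.y.right) to shift the secondary axis labels to the right.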

library(tidyverse)

df$species <- reorder(df$species, -df$sal, FUN = median)
df$species_num <- as.numeric(df$species)
breaks_x_left <- sort(unique(df$species_num))
labels_x_left <- levels(df$species)

pivot_df <- df %>%
  group_by(species_num) %>%
  summarize(n = n(), median_sal = median(sal, na.rm = T)) %>%
  arrange(median_sal)

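# Sample sizes in break order (species_num = 1, 2, ...), since the secondary
# axis labels are laid out along the breaks
labels_x_right <- pivot_df |> arrange(species_num) |> select(species_num, n) |> tibble::deframe()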

ggplot(
  data = subset(df, !is.na(sal)),
  aes(y = species_num, x = sal, group = species)
) +
  geom_boxplot(outlier.shape = 1, outlier.size = 1, orientation = "y") +
  scale_y_continuous(
    breaks = breaks_x_left,
    labels = labels_x_left,
    expand = c(0, .6),
    sec.axis = dup_axis(labels = labels_x_right)
  ) +
  coord_cartesian(clip = "off") +
  annotation_custom(
    grid::textGrob(
      expression(bold(underline("N"))),
      x = unit(1, "npc") + unit(20, "pt"),
      y = unit(1, "npc") + unit(4, "pt"),
      hjust = 0,
      vjust = 0,
      gp = grid::gpar(cex = 0.5)
    )
  ) +
  ylab("") +
  xlab("") +
  theme(
    axis.text.y = element_text(size = 7, face = "italic", hjust = 0),
    axis.ticks.y.right = element_blank(),
    axis.ticks.length.y.right = unit(15, "pt"),
    axis.text.y.right = element_text(hjust = 0),
    axis.text.x = element_text(size = 7),
    axis.title.x = element_text(size = 9, face = "bold"),
    axis.line = element_line(colour = "black"),
    panel.background = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill = NA, size = 1),
    panel.grid.major = element_line(colour = "#E0E0E0"),
    plot.title = element_text(hjust = 0.5)
  ) +
  theme(plot.margin = margin(21, 20, 20, 20))

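Because the sample data has exactly 200 rows per species, an ordering mistake in the right-hand labels would not be visible. With a continuous scale the secondary-axis labels are laid out along the breaks 1, 2, ..., N, so, to be safe whether or not a given ggplot2 version matches a named labels vector to the breaks, the counts should be supplied in species_num order rather than in order of median. The base-R sketch below uses made-up, deliberately unbalanced data to show the two orders; toy, n_by_level, medians, and n_by_median are hypothetical names.

```r
set.seed(42)

# Made-up, unbalanced data: three species with 5, 10 and 20 observations
toy <- data.frame(
  species = rep(c("a", "b", "c"), times = c(5, 10, 20)),
  sal     = rnorm(35, mean = 27, sd = 8)
)

# Same steps as in the answer: order levels by descending median, then use
# the level index as the numeric y position
toy$species <- reorder(toy$species, -toy$sal, FUN = median)
toy$species_num <- as.numeric(toy$species)

# Sample sizes in break order: element k is the count for y = k
n_by_level <- tabulate(toy$species_num, nbins = nlevels(toy$species))

# Sample sizes in order of ascending median (the order of pivot_df):
# this is the reverse of the level order and would mislabel the right axis
medians <- tapply(toy$sal, toy$species_num, median)
n_by_median <- n_by_level[order(medians)]
```

In the answer's code, sorting pivot_df with arrange(species_num) before tibble::deframe() gives the break order.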

Comments:

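0 votes · Nate · 12/8/2022
Great, thanks a lot! Quick question, I think this has to do with my R version: the "|>" symbol, is that "%in%" in older versions? Do they do the same thing?
1 vote · stefan · 12/8/2022
Oh. Yes. The native pipe "|>" was introduced in R 4.1 and can usually be replaced by the magrittr pipe %>% (but note that this does not always hold the other way round).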
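A small illustration of this comment, assuming R 4.1 or later (for |>) and the magrittr package (loaded by dplyr and the tidyverse): for a plain chain of calls both pipes give the same result. Going the other way is where they can differ; for example, magrittr's "." placeholder can be used more freely than the native pipe's "_" placeholder, which was added in R 4.2 and must be passed as a named argument. %in%, mentioned in the question, is not a pipe at all but a membership test.

```r
library(magrittr)  # provides %>%

x <- c(4, 9, 16)

# Native pipe (R >= 4.1) and magrittr pipe on the same chain of calls
with_native   <- x |> sqrt() |> sum()
with_magrittr <- x %>% sqrt() %>% sum()

# magrittr's `.` placeholder can be passed as an unnamed argument
via_dot <- x %>% paste("value", .)

# %in% tests membership and returns one logical per element on the left
is_member <- c(4, 5) %in% x
```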