Making a grouped stacked barchart over different dates

Asked by: Nuller  Asked: 11/16/2023  Modified: 11/16/2023  Viewed: 28 times

Question:

Background: I want to create a stacked bar chart in R like the following: [image: hand-drawn sketch of the desired chart]

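Data: The data I have consists of bonds with different coupon rates (KuponRente). Within these bonds there are three debtor categories: "Small, Medium and Large Loans". On a given date, each of these bonds has a nominal value, which is the sum over the debtor categories within the bond. What I want to plot is how the distribution of the coupon bonds within each of these loan-size categories changes over time (as shown in the sketch).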
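To make the data layout concrete, here is a minimal sketch in R, assuming the same long format as the `df` shown below (one row per date, bond and debtor category). The bond codes `BOND_A`/`BOND_B` and all amounts are hypothetical toy values, not taken from the question; the code only sums the debtor categories to get each bond's nominal value on a date.

```r
library(dplyr)

# Hypothetical toy data (not the question's data) in the same long format:
# one row per date x bond x debtor category
bonds <- data.frame(
  Dato            = "04-23",
  Fondskode       = c("BOND_A", "BOND_A", "BOND_A", "BOND_B", "BOND_B"),
  KuponRente      = c(1, 1, 1, 5, 5),
  DistributionNew = c("Small Loans", "Medium Loans", "Large Loans",
                      "Small Loans", "Large Loans"),
  TotalSumNominel = c(10, 20, 30, 5, 15)
)

# A bond's nominal value on a given date is the sum over its debtor categories
bond_totals <- bonds %>%
  group_by(Dato, Fondskode, KuponRente) %>%
  summarise(Nominel = sum(TotalSumNominel), .groups = "drop")

print(bond_totals)
```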

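Reproducible code: First, I can only provide a snippet of the data; the real data frame is too large to post here. However, I have tried to show what I did myself and where it fails.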

library(ggplot2)
library(dplyr)

df = structure(list(Dato = c("04-23", "04-23", "04-23", "04-23", "04-23", 
                             "04-23", "04-23", "04-23", "04-23", "04-23", "04-23", "04-23", 
                             "04-23", "04-23", "04-23", "04-23", "04-23", "04-23", "07-23", 
                             "07-23", "07-23", "07-23", "07-23", "07-23", "07-23", "07-23", 
                             "07-23", "07-23", "07-23", "07-23", "07-23", "07-23", "07-23", 
                             "07-23", "07-23", "07-23", "10-23", "10-23", "10-23", "10-23", 
                             "10-23", "10-23", "10-23", "10-23", "10-23", "10-23", "10-23", 
                             "10-23", "10-23", "10-23", "10-23", "10-23", "10-23", "10-23"
), Fondskode = c("DK0002027036", "DK0002027036", "DK0004616950", 
                 "DK0004616950", "DK0004616950", "DK0007200976", "DK0009410425", 
                 "DK0009410425", "DK0009410425", "DK0009524431", "DK0009524431", 
                 "DK0009524431", "DK0009529315", "DK0009529315", "DK0009529315", 
                 "DK0009539116", "DK0009539116", "DK0009539116", "DK0002027036", 
                 "DK0002027036", "DK0004616950", "DK0004616950", "DK0004616950", 
                 "DK0007200976", "DK0009410425", "DK0009410425", "DK0009410425", 
                 "DK0009524431", "DK0009524431", "DK0009524431", "DK0009529315", 
                 "DK0009529315", "DK0009529315", "DK0009539116", "DK0009539116", 
                 "DK0009539116", "DK0002027036", "DK0002027036", "DK0004616950", 
                 "DK0004616950", "DK0004616950", "DK0007200976", "DK0009410425", 
                 "DK0009410425", "DK0009410425", "DK0009524431", "DK0009524431", 
                 "DK0009524431", "DK0009529315", "DK0009529315", "DK0009529315", 
                 "DK0009539116", "DK0009539116", "DK0009539116"),
KuponRente = c(3, 3, -0.5, -0.5, -0.5, 4, 6, 6, 6, 1, 1, 1, 1, 1, 1, 5, 5, 5,
               3, 3, -0.5, -0.5, -0.5, 4, 6, 6, 6, 1, 1, 1, 1, 1, 1, 5, 5, 5,
               3, 3, -0.5, -0.5, -0.5, 4, 6, 6, 6, 1, 1, 1, 1, 1, 1, 5, 5, 5),
DistributionNew = c("Medium Loans", "Small Loans", "Large Loans", 
                    "Medium Loans", "Small Loans", "Small Loans", "Large Loans", 
                    "Medium Loans", "Small Loans", "Large Loans", "Medium Loans", 
                    "Small Loans", "Large Loans", "Medium Loans", "Small Loans", 
                    "Large Loans", "Medium Loans", "Small Loans", "Medium Loans", 
                    "Small Loans", "Large Loans", "Medium Loans", "Small Loans", 
                    "Small Loans", "Large Loans", "Medium Loans", "Small Loans", 
                    "Large Loans", "Medium Loans", "Small Loans", "Large Loans", 
                    "Medium Loans", "Small Loans", "Large Loans", "Medium Loans", 
                    "Small Loans", "Medium Loans", "Small Loans", "Large Loans", 
                    "Medium Loans", "Small Loans", "Small Loans", "Large Loans", 
                    "Medium Loans", "Small Loans", "Large Loans", "Medium Loans", 
                    "Small Loans", "Large Loans", "Medium Loans", "Small Loans", 
                    "Large Loans", "Medium Loans", "Small Loans"),
TotalSumNominel = c(14871078, 230519644, 124229666, 105950539, 191869576,
                    34229, 166832711, 415771109, 75355256, 10450320996,
                    30789310193, 4276173735, 4277192442, 2200535956,
                    1145694826, 727112293, 32524077976, 6350025138,
                    14544895, 221211378, 133635759, 97870396, 188590028,
                    26114, 166287052, 402943367, 78061232, 10294401373,
                    28473616784, 4071460911, 4182694064, 2000153569,
                    1107766445, 1199628400, 46958195327, 9033691732,
                    13221795, 212722212, 110535953, 88797232, 185525718,
                    26114, 165736626, 388110231, 75004980, 10188496496,
                    26486190527, 3893875368, 4094368265, 1832338499,
                    1083867006, 1506440905, 54391483677, 10106559492)),
class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
row.names = c(NA, -54L),
groups = structure(list(
  Dato = c("04-23", "04-23", "04-23", "04-23", "04-23", "04-23",
           "04-23", "07-23", "07-23", "07-23", "07-23", "07-23", "07-23",
           "07-23", "10-23", "10-23", "10-23", "10-23", "10-23", "10-23",
           "10-23"),
  Fondskode = c("DK0002027036", "DK0004616950", "DK0007200976",
                "DK0009410425", "DK0009524431", "DK0009529315", "DK0009539116",
                "DK0002027036", "DK0004616950", "DK0007200976", "DK0009410425",
                "DK0009524431", "DK0009529315", "DK0009539116", "DK0002027036",
                "DK0004616950", "DK0007200976", "DK0009410425", "DK0009524431",
                "DK0009529315", "DK0009539116"),
  KuponRente = c(3, -0.5, 4, 6, 1, 1, 5, 3, -0.5, 4, 6, 1, 1, 5,
                 3, -0.5, 4, 6, 1, 1, 5),
  .rows = structure(list(1:2, 3:5, 6L, 7:9, 10:12, 13:15, 16:18,
                         19:20, 21:23, 24L, 25:27, 28:30, 31:33, 34:36,
                         37:38, 39:41, 42L, 43:45, 46:48, 49:51, 52:54),
                    ptype = integer(0),
                    class = c("vctrs_list_of", "vctrs_vctr", "list"))),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -21L), .drop = TRUE))
df_summarized <- df %>%
  group_by(Dato, DistributionNew) %>%
  mutate(
    TotalSumNominelPerDate = sum(TotalSumNominel, na.rm = TRUE),
    PercentageOfTotal = (TotalSumNominel / TotalSumNominelPerDate) * 100
  ) 




ggplot(df_summarized, aes(x = DistributionNew, y = PercentageOfTotal, fill = factor(KuponRente))) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9), aes(group = interaction(DistributionNew, Dato))) +
  scale_fill_brewer(palette = "Set1", name = "KuponRente") +
  theme_bw() +
  labs(title = "Stacked Bar Chart by DistributionNew for each Date",
       x = "DistributionNew",
       y = "Percentage of Total",
       fill = "KuponRente") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  scale_x_discrete(expand = expansion(add = c(0.5, 0.5))) 

The reproducible code produces the following plot: [image: plot produced by the code above]

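I really don't get it. I can't seem to stack the bars so that the percentages of the different "KuponRente" values fill each stacked bar up to 100%. I must be doing something wrong, but I can't figure out what.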

Below I have also attached the plot I generated with the full dataset (not just the snippet provided above): [image: plot produced from the full dataset]

In short, my question: how do I get ggplot to stack my data correctly, as in my sketch?

r ggplot2 dplyr

Comments

0 votes · Jon Spring · 11/16/2023
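Grouped stacked charts are a bit tricky in ggplot2 because you are asking for two different position adjustments: 1) horizontal dodging between the Dato groups, and 2) stacking between the KuponRente values. Potential duplicate: stackoverflow.com/questions/61191011/.... A simple solution might be to put DistributionNew in facets, use Dato for x, and then use position_stack instead of dodge.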
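A minimal sketch of the layout suggested in this comment (Dato on x, DistributionNew in facets, KuponRente stacked), on hypothetical toy data rather than the question's `df`. As a variant of the comment's `position_stack`, it uses `position = "fill"`, which rescales every stack to a total height of 1, so the percentages do not have to be precomputed. The `ggplot_build()` call at the end is only there to check that each bar tops out at 100%.

```r
library(ggplot2)

# Hypothetical toy data (not the question's data): raw nominal amounts per
# date, loan-size category and coupon rate
toy <- data.frame(
  Dato            = rep(c("04-23", "07-23"), each = 4),
  DistributionNew = rep(c("Small Loans", "Small Loans",
                          "Large Loans", "Large Loans"), times = 2),
  KuponRente      = rep(c(1, 5), times = 4),
  TotalSumNominel = c(30, 70, 50, 50, 20, 80, 60, 40)
)

# Dato on x, one facet per loan-size category, coupon rates stacked.
# position = "fill" normalises each stack to 1 (shown as 100%).
p <- ggplot(toy, aes(x = Dato, y = TotalSumNominel, fill = factor(KuponRente))) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::label_percent()) +
  facet_wrap(~DistributionNew) +
  labs(y = "Share of nominal", fill = "KuponRente")

# Top of every stack (one per date and facet) should be 1
built <- ggplot_build(p)$data[[1]]
bar_id <- paste(as.numeric(built$x), built$PANEL)
tops <- tapply(built$ymax, bar_id, max)
print(tops)
```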

Answer:

1 vote · stefan · 11/16/2023 · #1

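The issue is that you can't have bars that are both stacked and dodged. In your code the bars are dodged by Dato, but within each dodged slot the bars for the different KuponRente values are drawn on top of each other (overplotted) rather than stacked.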
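To see the overplotting described above, here is a small sketch on hypothetical toy data that reuses the question's `position_dodge()` plus `group = interaction(DistributionNew, Dato)` pattern. Inspecting the built layer data with `ggplot_build()` shows that both coupon rates of a date land in the same dodged slot and both start at y = 0, so the bars overlap instead of being stacked.

```r
library(ggplot2)

# Hypothetical toy data: one loan-size category, two dates, two coupon rates
toy <- data.frame(
  Dato            = rep(c("04-23", "07-23"), each = 2),
  DistributionNew = "Small Loans",
  KuponRente      = rep(c(1, 5), times = 2),
  Pct             = c(30, 70, 20, 80)
)

# Same pattern as in the question: the explicit group ignores KuponRente,
# so the dodge separates dates only
p <- ggplot(toy, aes(x = DistributionNew, y = Pct, fill = factor(KuponRente))) +
  geom_col(position = position_dodge(width = 0.9),
           aes(group = interaction(DistributionNew, Dato)))

built <- ggplot_build(p)$data[[1]]
xmin <- as.numeric(built$xmin)

# One x slot per dodge group (date), and every bar starts at 0
print(data.frame(group = built$group, xmin = xmin,
                 ymin = built$ymin, ymax = built$ymax))
```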

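One option to get close to the desired result is to use facets, i.e. make a stacked bar chart with Dato on the x axis and facet it by DistributionNew: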

library(ggplot2)
library(dplyr)

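df_summarized <- df %>%
  group_by(Dato, DistributionNew, KuponRente) %>%
  # One row per coupon rate; "drop_last" keeps the grouping by Dato and
  # DistributionNew, so the percentages below are computed per bar
  summarise(
    TotalSumNominel = sum(TotalSumNominel, na.rm = TRUE),
    .groups = "drop_last"
  ) %>%
  mutate(PercentageOfTotal = TotalSumNominel / sum(TotalSumNominel) * 100)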

ggplot(df_summarized, aes(
  x = Dato, y = PercentageOfTotal,
  fill = factor(KuponRente)
)) +
  geom_col() +
  scale_fill_brewer(palette = "Set1", name = "KuponRente") +
  scale_x_discrete(expand = expansion(add = c(0.5, 0.5)), position = "top") +
  scale_y_continuous(expand = c(0, 0)) +
  facet_wrap(~DistributionNew, strip.position = "bottom") +
  theme_bw() +
  theme(
    axis.text.x.top = element_text(angle = 90, vjust = 0.5),
    strip.placement = "outside",
    strip.background.x = element_blank()
  ) +
  labs(
    title = "Stacked Bar Chart by DistributionNew for each Date",
    x = "DistributionNew",
    y = "Percentage of Total",
    fill = "KuponRente"
  )

[image: the resulting faceted stacked bar chart]
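A quick sanity check of the aggregation step in the answer, on hypothetical toy data in which two bonds share coupon 1 (as DK0009524431 and DK0009529315 share KuponRente = 1 in the question's data). Writing `.groups = "drop_last"` explicitly spells out what `summarise()` does by default here: the result stays grouped by Dato and DistributionNew, so `sum(TotalSumNominel)` in the following `mutate()` is taken per bar and every bar sums to 100.

```r
library(dplyr)

# Hypothetical toy data: BOND_A and BOND_B both pay coupon 1
toy <- data.frame(
  Dato            = "04-23",
  Fondskode       = c("BOND_A", "BOND_B", "BOND_C", "BOND_A", "BOND_C"),
  KuponRente      = c(1, 1, 5, 1, 5),
  DistributionNew = c("Small Loans", "Small Loans", "Small Loans",
                      "Large Loans", "Large Loans"),
  TotalSumNominel = c(10, 20, 70, 40, 60)
)

toy_summarized <- toy %>%
  group_by(Dato, DistributionNew, KuponRente) %>%
  # Collapse bonds with the same coupon; stay grouped by Dato x DistributionNew
  summarise(TotalSumNominel = sum(TotalSumNominel, na.rm = TRUE),
            .groups = "drop_last") %>%
  mutate(PercentageOfTotal = TotalSumNominel / sum(TotalSumNominel) * 100)

# Each bar (date x loan-size category) should add up to 100
bar_totals <- toy_summarized %>%
  summarise(Total = sum(PercentageOfTotal), .groups = "drop")
print(bar_totals)
```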