Asked by: twistedgiraff3 Asked: 11/17/2023 Last edited by: twistedgiraff3 Updated: 11/17/2023 Views: 56
How do I create a line graph with 2 lines where x is year and y is proportion of total rather than raw count?
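Q:
I'm working with data on 2 prescription drugs: a generic and a name brand. They are the same drug, except one is cheaper. My data is broken down by several variables, including quarter, year, number of prescriptions in that time period, and drug name (as a character variable). My current code is below.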
alldrugs %>%
  filter(product_name == "DIMETHYL F" | product_name == "TECFIDERA ") %>%
  mutate(yr_q = yq(paste(year, quarter)), number_of_prescriptions = as.numeric(x = number_of_prescriptions)) %>%
  group_by(product_name, yr_q) %>%
  summarize(Prescription.count = sum(number_of_prescriptions)) %>%
  ggplot(aes(x = yr_q, y = Prescription.count)) +
  geom_line(aes(colour = product_name), size = 1.4) +
  xlab("2020-2022") +
  ylab("Number of Prescriptions") +
  labs(colour = "Generic vs Name Brand") +
  theme_bw()
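The problem is that I need the y-axis to show not the number of prescriptions but the proportion of the total in that time period. I.e., for the Tecfidera line it should represent (# of Tecfidera prescriptions in that time period) / (total # of Tecfidera + dimethyl f prescriptions in that time period).
Bonus points if you can help me do something similar, but creating a table that breaks the proportion (i.e. tecfidera / total tec + dimethyl) down by state and year. (I have a state variable.)
I was thinking maybe something of the following type, but I don't think it solves the problem. mutate(dummy.product_name = case_when(product_name == 'tecfidera' ~ 1, product_name == "dimethyl f" ~ 0))
Or maybe a line graph is the wrong choice. I'm playing around with: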
ggplot(alldrugs, aes(x = year, y = number_of_prescriptions, fill = product_name)) +
  geom_bar(stat = "identity", width = .5, position = "dodge") +
  facet_grid(~year)
But that's really ugly, although I think it has potential.
Thanks so much for any help!
Here is the dput of a subset of what I believe is 25 observations:
structure(list(state = c("AK", "CA", "OR", "MN", "NY", "UT", "AK", "AK", "CA", "NJ", "AK", "NY", "AK", "AK", "AK", "AK", "AK", "AK", "SC", "NC", "NM", "AK", "AK", "AK", "CA"), year = c("2020", "2020", "2020", "2020", "2021", "2021", "2021", "2021", "2022", "2022", "2022", "2022", "2021", "2020", "2021", "2021", "2020", "2020", "2021", "2021", "2021", "2021", "2022", "2022", "2022" ), quarter = c("1", "2", "3", "4", "1", "2", "3", "4", "1", "2", "3", "4", "1", "4", "1", "2", "3", "4", "1", "2", "3", "4", "1", "2", "3"), product_name = c("GILENYA ", "GILENYA ", "GILENYA ", "GILENYA ", "GILENYA ", "GILENYA ", "GILENYA ", "GILENYA ", "GILENYA ", "GILENYA ", "GILENYA ", "GILENYA ", "DIMETHYL F", "DIMETHYL F", "DIMETHYL F", "DIMETHYL F", "DIMETHYL F", "DIMETHYL F", "DIMETHYL F", "DIMETHYL F", "DIMETHYL F", "DIMETHYL F", "DIMETHYL F", "DIMETHYL F", "DIMETHYL F"), number_of_prescriptions = c("10", "1", "4", "6", "1", "7", "2", "9", "3", "7", "6", "4", "3", "2", "4", "9", "8", "2", "1", "10", "6", "9", "10", "3", "8")), row.names = c(NA, 25L), class = "data.frame")
https://docs.google.com/spreadsheets/d/1vNfqdm5gPXL21PXjkPe512ZFcBzU-qUI7baQACRO7ag/edit#gid=0
The link above also points to a spreadsheet of the data.
A:
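library(dplyr)    # needs dplyr >= 1.1.0 for the .by argument
library(ggplot2)

alldrugs %>%
  # keep only the two products being compared, as in the question
  filter(product_name == "DIMETHYL F" | product_name == "TECFIDERA ") %>%
  # total prescriptions per product and quarter (stored in column n)
  count(product_name, yr_q = lubridate::yq(paste(year, quarter)),
        wt = as.numeric(number_of_prescriptions)) %>%
  # each product's share of that quarter's total
  mutate(share = n / sum(n), .by = yr_q) %>%
  ggplot(aes(x = yr_q, y = share)) +
  geom_line(aes(colour = product_name), linewidth = 1.4) +  # linewidth replaces size in ggplot2 >= 3.4.0
  xlab("2020-2022") +
  ylab("Share of Prescriptions") +
  labs(colour = "Generic vs Name Brand") +
  theme_bw()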
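For the bonus table by state and year, the same count-then-share idea could probably be reused with a different grouping. What follows is only a minimal sketch, assuming dplyr >= 1.1.0 (for `.by`) and tidyr are installed. The small `alldrugs` data frame is invented for illustration and only mimics the question's column names, and `share_by_state_year` is a hypothetical name.

```r
library(dplyr)
library(tidyr)

# Toy data with the question's column names; the values are made up for illustration.
alldrugs <- data.frame(
  state = c("AK", "AK", "AK", "CA", "CA", "CA"),
  year = c("2020", "2020", "2021", "2020", "2020", "2021"),
  quarter = c("1", "2", "1", "1", "1", "3"),
  product_name = c("TECFIDERA ", "DIMETHYL F", "TECFIDERA ",
                   "TECFIDERA ", "DIMETHYL F", "DIMETHYL F"),
  number_of_prescriptions = c("6", "2", "5", "3", "9", "4")
)

share_by_state_year <- alldrugs %>%
  # drop the trailing padding so the product names make clean column names
  mutate(product_name = trimws(product_name)) %>%
  filter(product_name %in% c("DIMETHYL F", "TECFIDERA")) %>%
  # total prescriptions per state, year and product (stored in column n)
  count(state, year, product_name, wt = as.numeric(number_of_prescriptions)) %>%
  # each product's share of the state-year total
  mutate(share = n / sum(n), .by = c(state, year)) %>%
  select(-n) %>%
  # one row per state-year, one column per product; a missing product gets share 0
  pivot_wider(names_from = product_name, values_from = share, values_fill = 0)

share_by_state_year
```

The TECFIDERA column is then the tecfidera / (tecfidera + dimethyl f) proportion for each state and year. Quarters are summed within a year here. If a per-quarter table is wanted instead, quarter could be added to both the `count()` call and the `.by` grouping.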
Comments
dput(alldrugs)
dput(head(alldrugs, 20))
yq
dint