R 中频率表中的四舍五入-解网

问：

我想知道是否有精通 R/RMarkdown 的人能够指导我解决问题。我正在寻找一个频率表，到目前为止，我一直在使用 arsenal 包的 tableby，因为它可以轻松方便地集成到 RMarkdown docx/html 中。然而，我被要求提供四舍五入的频率（最接近的 5 或 10），并一直在试图找到方法来做到这一点，但没有取得多大成功。

我生成了一个虚假的简单数据集，因为出于保密原因我无法共享我的数据，这就是我做普通表格的方式。

set.seed(1234)

library(dplyr)
library(arsenal)

x1 <- c(rep("Man",40),rep("Woman",60)) %>% as.factor()
x2 <- sample(c("Sick","Healthy"),100,replace=TRUE) %>% as.factor()

df <- data.frame(x1,x2)

Control_notrounded <- tableby.control(digits=0,digits.pct=2,cat.stats=c("countpct","Nmiss2"))

table <- tableby(x1~x2,control=Control_notrounded,data=df)
print(summary(table))

但是，即使使用传统的舍入函数通过传递 digits=-1 来四舍五入到最接近的 10，这似乎也不是该函数的有效方法，因为我收到一条警告，指示数字必须为 >=0。

Control_rounded <- tableby.control(digits=-1,digits.pct=2,cat.stats=c("countpct","Nmiss2"))
table2 <- tableby(x1~x2,control=Control_rounded,data=df)
print(summary(table2))

有什么办法可以做到吗？否则，会有人有一个替代包，允许创建具有四舍五入值的相对简单的频率表吗？

r-markdown 频率分布

set.seed(1234)
library(dplyr)
library(gtsummary)
library(stringr)

x1 <- c(rep("Man",40),rep("Woman",60)) %>% as.factor()
x2 <- sample(c("Sick","Healthy"),100,replace=TRUE) %>% as.factor()
df <- data.frame(x1,x2)

round_5_gtsummary <- function(table) {
  round_5 <- function(x) { round(x/5)*5 }
  round_5_get_summary <- function(x, N, decimals = 1) {
    x <- stringr::str_remove(x, " \\([<]*[0-9]*[,]*[0-9]*[.]*[0-9]*%\\)$")
    x <- as.numeric(stringr::str_remove(x, ","))
    if (x > N-5) {
      N <- round_5(N)
      return(paste0(">", N-5, "(>", round((N-5)/N*100, decimals), "%)"))
    } else if (x >= 5) {
      return(paste0(round_5(x), " (", round(round_5(x)/round_5(N)*100,decimals),"%)"))
    } else {
      return(paste0("<", 5," (<", round(5/round_5(N)*100,decimals),"%)"))
    }
  }
  body <- table$table_body
  stats_column_indices <- which(grepl("^stat_", colnames(body)))
  Ns <- table$table_styling$header$modify_stat_n[c(stats_column_indices)]
  table$table_styling$header$label[c(stats_column_indices)] <- paste0("**", table$table_styling$header$modify_stat_level[c(stats_column_indices)], "**", ", N = ", round_5(Ns))
  for (column_no in stats_column_indices) {
    column <- pull(body, column_no)
    cat_indices <- (body$var_type == "categorical" | body$var_type == "dichotomous" | body$label == "Unknown") & !is.na(body$stat_1)
    N <- table$table_styling$header$modify_stat_n[column_no]
    column[cat_indices] <- sapply(column[cat_indices], round_5_get_summary, N = N)
    table$table_body[column_no] <- column
  }
  return(table)
}

df %>% tbl_summary(by = "x1") %>% 
  add_overall(last = TRUE) %>% 
  round_5_gtsummary()  %>%
  add_p()

结果：

更新：带权重

在这里，我使用 WeightIt 来制作一些权重：

library(WeightIt)
df$w <- weightit(x1~x2, data = df, estimand = "ATT", focal = "Man")$weights

使用 survey 创建 svydesign 对象。然后应用于：tbl_svysummary()

library(survey)
df %>% survey::svydesign(~1, data = ., weights = ~w) %>%
  tbl_svysummary(by = "x1", include=c(x2)) %>%
  add_overall(last = TRUE) %>%
  round_5_gtsummary() %>%
  add_p()

更新2：基于DANIEL SJÖBERG的回应

若要使用内置参数分别舍入计数和百分比，可以执行以下操作：tbl_summary(digits=)

library(gtsummary)
library(dplyr)
set.seed(1234)

round_5 <- function(vec) {
  fun <- function(x) {
    if (x < 1) { return(round(x*100/5)*5)
    } else { return(round(x/5)*5) }
  }
  vec <- purrr::map_vec(vec, .f = fun)
}

df <- data.frame(
  x1 = c(rep("Man", 40), rep("Woman", 60)) %>% as.factor(),
  x2 = sample(c("Sick", "Healthy"), 100, replace = TRUE) %>% as.factor()
)

df %>% 
  tbl_summary(
    by = "x1",
    digits = all_categorical() ~ round_5
  ) %>% 
  add_overall(last = TRUE) %>% 
  add_p()

结果：

请注意，此版本不会在四舍五入计数后重新计算百分比;相反，它只是将两者分开四舍五入。

R 中频率表中的四舍五入

Rounding in frequency tables in R

评论

评论