How to force 0-count summary cells and add totals to columns and rows (R tidyverse)

Question:

OK, here is a sample of my data:

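GRADE_LVL	COURSE_NAME	COURSE_CODE	STUDENT_GENDER	ETHNIC_DESC	OUTCOME
12	Physics	03165	Male	White	Pass
12	Physics	03165	Female	White	Pass
12	Physics	03165	Nonbinary	Black or African American	Pass
9	Algebra I	02052	Female	Multiracial	Pass
10	Algebra I	02052	Female	White	Fail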

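I need to report on 3 genders (Male, Female, Nonbinary) and 7 ethnicities (Hispanic or Latino, American Indian or Alaska Native, Asian, Native Hawaiian/Other Pacific Islander, Black or African American, White, and Multiracial).

I am trying to write a function in R that generates a demographic fact table for a set of parameters passed to it. I want the function's output to be a tibble like the following: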

High School Students Passing Algebra I

	Hispanic	Asian	Black	White	Multiracial	Total
Male	0	7	2	13	4	26
Female	1	3	1	12	3	20
Nonbinary	0	0	0	1	0	1
Total	1	10	3	26	7	47

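Note: these are dummy values only and are unrelated to the dummy values in the sample data above. I shortened the column names to save screen space.

Here is the code I have so far: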

data <- dbGetQuery('secrets')
highSchool = c('09', '10', '11', '12')
passingOnly <- quo(OUTCOME == 'Pass')
algebra1 <- quo(COURSE_CODE == '02052')
ethnicCategories <- factor(c(
                              'Hispanic or Latino', 
                              'American Indian or Alaska Native',
                              'Asian',
                              'Native Hawaiian/Other Pacific Islander',
                              'Black or African American',
                              'White',
                              'Multiracial'
                            ))
genderCategories <- factor(c('Female', 'Male', 'Nonbinary'))

demographicBreakout <- function(filterConditions, gradeLevels) {
      data %>%
      filter( {{ filterConditions }} ) %>%
      filter(GRADE_LVL %in% gradeLevels) %>%
      select(STUDENT_GENDER, ETHNIC_DESC) %>%
      group_by(STUDENT_GENDER, ETHNIC_DESC) %>%
      summarise(COUNT = n()) %>%
      pivot_wider(
        names_from = ETHNIC_DESC, 
        values_from = COUNT, 
        values_fill = 0
      ) %>%
      rename_at("STUDENT_GENDER", ~"Gender")
}

report <- demographicBreakout(
          filterConditions = !!quo(!!algebra1 & !!passingOnly),
          gradeLevels = highSchool
      )

This code produces a tibble like the following:

Gender	Hispanic	Asian	Black	White	Multiracial
Female	2	15	2	26	9
Male	12	23	1	43	11

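So far this looks good, but I need every demographic category to appear in the table even when its count is 0. I tried adding the following snippet to my demographicBreakout function, between the summarise() and pivot_wider() statements: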

      complete(
        ETHNIC_DESC = ethnicCategories,
        STUDENT_GENDER = genderCategories, 
        fill = list(COUNT = 0)
      ) %>%

Adding this code results in the following error:

Error in `reframe()`:
ℹ In argument: `complete(data = pick(everything()), ..., fill = fill, explicit = explicit)`.
ℹ In group 1: `STUDENT_GENDER = Female`.
Caused by error in `dplyr::full_join()`:
! Join columns in `y` must be present in the data.
✖ Problem with `STUDENT_GENDER`.

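I cannot resolve this error. Besides this complete() statement, I also need to do something similar for the rows so that the Nonbinary counts are shown. On top of that, I still need to add row and column totals.

Any help getting past my current hurdle would be greatly appreciated.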

r tidyverse summarize
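On the complete() error itself: the message ("In group 1: STUDENT_GENDER = Female") is consistent with complete() being called on a data frame that is still grouped. summarise() drops only the last grouping variable, so the result stays grouped by STUDENT_GENDER, and complete() then runs once per group, where the grouping column is not available to expand. A minimal sketch of one way around this, assuming a small hypothetical toy tibble in place of the real query result and shortened category vectors:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the real query result
toy <- tibble(
  STUDENT_GENDER = c("Male", "Female", "Female"),
  ETHNIC_DESC    = c("White", "White", "Asian")
)

# Shortened category lists, for illustration only
genders     <- c("Female", "Male", "Nonbinary")
ethnicities <- c("Asian", "White", "Multiracial")

completed <- toy |>
  group_by(STUDENT_GENDER, ETHNIC_DESC) |>
  # .groups = "drop" returns an ungrouped result, so complete()
  # sees STUDENT_GENDER as an ordinary column
  summarise(COUNT = n(), .groups = "drop") |>
  complete(
    STUDENT_GENDER = genders,
    ETHNIC_DESC    = ethnicities,
    fill = list(COUNT = 0L)
  )
```

Calling ungroup() before complete() should have the same effect. Plain character vectors are used for the categories here so that they match the character columns being completed.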

ethnicCategories <- c('Hispanic or Latino', 
                    'American Indian or Alaska Native',
                    'Asian',
                    'Native Hawaiian/Other Pacific Islander',
                    'Black or African American',
                    'White',
                    'Multiracial')
genderCategories <- c('Female', 'Male', 'Nonbinary')

demographicBreakout <- function(filterConditions, gradeLevels) {
  data |>
    filter({{filterConditions}} & GRADE_LVL %in% gradeLevels) |>
    mutate(Gender = factor(STUDENT_GENDER, levels = genderCategories),
            e = factor(ETHNIC_DESC, levels = ethnicCategories)) |>
    count(Gender, e, .drop = FALSE) |>
    pivot_wider(names_from = e, values_from = n)
}

Example:

demographicBreakout(
          filterConditions = !!quo(COURSE_CODE == '02052'),
          gradeLevels = c('09', '10', '11', '12')
      ) |> print(width = Inf)

Output:

# A tibble: 3 × 8
  Gender    `Hispanic or Latino` `American Indian or Alaska Native` Asian
  <fct>                    <int>                              <int> <int>
1 Female                       0                                  0     0
2 Male                         0                                  0     0
3 Nonbinary                    0                                  0     0
  `Native Hawaiian/Other Pacific Islander` `Black or African American` White
                                     <int>                       <int> <int>
1                                        0                           0     1
2                                        0                           0     0
3                                        0                           0     0
  Multiracial
        <int>
1           0
2           0
3           0
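The function above fills in the zero counts but does not add the row and column totals the question asks for. One possible way to add them with dplyr alone is sketched below; the toy tibble, the shortened category vectors, and the names wide and with_totals are all hypothetical, chosen only to keep the example self-contained.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the real query result
toy <- tibble(
  STUDENT_GENDER = c("Male", "Female", "Female", "Male"),
  ETHNIC_DESC    = c("White", "White", "Asian", "White")
)
genders     <- c("Female", "Male", "Nonbinary")
ethnicities <- c("Asian", "White", "Multiracial")

# Same factor + count(.drop = FALSE) approach as the answer
wide <- toy |>
  mutate(Gender = factor(STUDENT_GENDER, levels = genders),
         e = factor(ETHNIC_DESC, levels = ethnicities)) |>
  count(Gender, e, .drop = FALSE) |>
  pivot_wider(names_from = e, values_from = n)

# Row totals: sum across every numeric (count) column.
# Gender becomes character so a "Total" label can be appended later.
with_totals <- wide |>
  mutate(Gender = as.character(Gender),
         Total = rowSums(across(where(is.numeric))))

# Column totals: sum each numeric column and append as a final row
with_totals <- bind_rows(
  with_totals,
  with_totals |>
    summarise(across(where(is.numeric), sum)) |>
    mutate(Gender = "Total")
)
```

The janitor package's adorn_totals() is another option if an extra dependency is acceptable.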

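Thanks! This worked! After reading the documentation for the functions you used, am I correct in understanding that pivot_wider() automatically returns only the columns mentioned in the names_from and values_from arguments, eliminating the need for my select() statement? I also don't understand where the n value in the values_from = n argument of your pivot_wider() call comes from.

0 upvotes Mark 11/14/2023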

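Re: "pivot_wider() automatically returns only the columns mentioned in the names_from and values_from arguments, eliminating the need for my select() statement" <- not quite! pivot_wider() generally keeps all of the same data. You can use the id_cols argument to make it drop the columns you aren't using.

1 upvote Mark 11/14/2023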
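A small sketch of the behaviour described in that comment, on a hypothetical toy tibble. It suggests that the select() step became unnecessary because count() returns only the columns it counts by plus n, not because of anything pivot_wider() does; values_fn = sum is an assumption added here so that rows collapsed by id_cols are added together.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data
toy <- tibble(
  GRADE_LVL      = c("09", "10", "10"),
  STUDENT_GENDER = c("Female", "Female", "Male"),
  ETHNIC_DESC    = c("White", "Asian", "White")
)

# count() returns only the counted-by columns plus n
counted <- toy |> count(STUDENT_GENDER, ETHNIC_DESC)

# By default pivot_wider() keeps every remaining column as an identifier,
# so GRADE_LVL survives here
by_default <- toy |>
  count(GRADE_LVL, STUDENT_GENDER, ETHNIC_DESC) |>
  pivot_wider(names_from = ETHNIC_DESC, values_from = n, values_fill = 0)

# id_cols limits the identifiers; GRADE_LVL is dropped and
# values_fn = sum combines counts that now share a row
restricted <- toy |>
  count(GRADE_LVL, STUDENT_GENDER, ETHNIC_DESC) |>
  pivot_wider(id_cols = STUDENT_GENDER,
              names_from = ETHNIC_DESC, values_from = n,
              values_fn = sum, values_fill = 0)
```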

Re: where does n come from? By default, count() names the count column n. You can change the name if you want, as in stefan's code, but it isn't required.

0 upvotes Mark 11/14/2023
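For illustration, a brief sketch of the default column name and of the name argument of count(), using a hypothetical one-column tibble:

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(STUDENT_GENDER = c("Female", "Female", "Male"))

# Default: the count column is called n
default_name <- toy |> count(STUDENT_GENDER)

# name = "COUNT" renames it, matching the column name used in the question
custom_name <- toy |> count(STUDENT_GENDER, name = "COUNT")
```

With the second form, the later pivot_wider() call would need values_from = COUNT instead of values_from = n.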

See: dplyr.tidyverse.org/reference/count.html#ref-examples

1 upvote Mark 11/14/2023
