Asked by: sdS  Asked: 10/25/2023  Last edited by: jpsmithsdS  Updated: 10/25/2023  Views: 44
Dealing with nuanced 'Select All That Apply' question in R
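Q:
I'm trying to manipulate a column from a "Select All That Apply" question, so the length of each entry varies by row/respondent. All response options except one ("Keeping Track of Time/Order") are followed by parentheses containing a unique identifier for that option (see the code below). To illustrate, I have two questions: one about the strengths of a certain learning tool and one about its challenges.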
df <- data.frame(
  ID = 1:6,
  response_strength = c(
    "Language (L) Attention (A)",
    "Movement Control (MC)",
    "Language (L) Getting Along with Others (G) Attention (A) Memory (M)",
    "Memory (M) Complex Thinking (C) Spatial Thinking (S)",
    "Memory (M) Spatial Thinking (S)",
    "Language (L) Attention (A)"
  ),
  response_challenge = c(
    "Movement Control (MC)",
    "Language (L) Attention (A)",
    "Complex Thinking (C)",
    "Attention (A)",
    "Getting Along with Others (G) Keeping Track of Time/Order",
    "Keeping Track of Time/Order Movement Control (MC)"
  )
)
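My goal is to convert to long format and produce an output table showing the percentage of respondents who selected each response option, like the one below. (Please note: the following code was created for illustration only, so the counts and percentages are not accurate.)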
df2 <- data.frame(
  survey_question = rep(c("response_strength", "response_challenge"), each = 7),
  response = rep(
    c(
      "Movement Control (MC)", "Language (L)", "Attention (A)",
      "Getting Along with Others (G)", "Complex Thinking (C)",
      "Spatial Thinking (S)", "Keeping Track of Time/Order"
    ),
    times = 2
  ),
  n = c(1, 2, 4, 5, 3, 1, 2, 1, 2, 4, 5, 3, 1, 2),
  percent = c(.33, .67, 1.0, .33, .67, 1.0, .33, .67, 1.0, .33, .67, 1.0, .33, .67)
)
Output:
      survey_question                      response n percent
1   response_strength         Movement Control (MC) 1    0.33
2   response_strength                  Language (L) 2    0.67
3   response_strength                 Attention (A) 4    1.00
4   response_strength Getting Along with Others (G) 5    0.33
5   response_strength          Complex Thinking (C) 3    0.67
6   response_strength          Spatial Thinking (S) 1    1.00
7   response_strength   Keeping Track of Time/Order 2    0.33
8  response_challenge         Movement Control (MC) 1    0.67
9  response_challenge                  Language (L) 2    1.00
10 response_challenge                 Attention (A) 4    0.33
11 response_challenge Getting Along with Others (G) 5    0.67
12 response_challenge          Complex Thinking (C) 3    1.00
13 response_challenge          Spatial Thinking (S) 1    0.33
14 response_challenge   Keeping Track of Time/Order 2    0.67
I'm just stuck on the best way forward. Any help is appreciated!
A:
1 upvote
Mark
10/25/2023
#1
Here's how I would do it, using pivot_longer(), a lookbehind, and mutate():
# load package
library(tidyverse)

df |>
  # first, make the response_strength and response_challenge columns longer,
  # creating a "question" and a "response" column;
  # str_remove() drops the "response_" prefix, which serves no purpose
  pivot_longer(
    -ID,
    names_to = "question",
    values_to = "response",
    names_transform = \(x) str_remove(x, "response_")
  ) |>
  # next, split the values on a space, looking behind for a closing bracket or the word "Order"
  # since this isn't your real data, you may have to edit this to make it work with the real data
  mutate(response = str_split(response, "(?<=\\)|Order) ")) |>
  # turn each response into its own row
  unnest_longer(response) |>
  # create the n and percent columns (computed within each ID and question)
  mutate(n = n(), percent = n / sum(n), .by = c(ID, question))
Output:
# A tibble: 23 × 5
      ID question  response                          n percent
   <int> <chr>     <chr>                         <int>   <dbl>
 1     1 strength  Language (L)                      2    0.5
 2     1 strength  Attention (A)                     2    0.5
 3     1 challenge Movement Control (MC)             1    1
 4     2 strength  Movement Control (MC)             1    1
 5     2 challenge Language (L)                      2    0.5
 6     2 challenge Attention (A)                     2    0.5
 7     3 strength  Language (L)                      4    0.25
 8     3 strength  Getting Along with Others (G)     4    0.25
 9     3 strength  Attention (A)                     4    0.25
10     3 strength  Memory (M)                        4    0.25
# ℹ 13 more rows
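The n and percent columns above are computed within each respondent and question, whereas the question asks for the share of respondents who chose each option. One possible way to get that summary is sketched below. It reuses the same split, then counts rows per question and option. It assumes that every row of df is one respondent, that no respondent lists the same option twice, and that the denominator should be all respondents. The names n_respondents and summary_tbl are made up for this illustration.

```r
library(tidyverse)

# Example data from the question (with the missing quote restored)
df <- data.frame(
  ID = 1:6,
  response_strength = c(
    "Language (L) Attention (A)",
    "Movement Control (MC)",
    "Language (L) Getting Along with Others (G) Attention (A) Memory (M)",
    "Memory (M) Complex Thinking (C) Spatial Thinking (S)",
    "Memory (M) Spatial Thinking (S)",
    "Language (L) Attention (A)"
  ),
  response_challenge = c(
    "Movement Control (MC)",
    "Language (L) Attention (A)",
    "Complex Thinking (C)",
    "Attention (A)",
    "Getting Along with Others (G) Keeping Track of Time/Order",
    "Keeping Track of Time/Order Movement Control (MC)"
  )
)

# Assumption: the denominator is the total number of respondents
n_respondents <- n_distinct(df$ID)

summary_tbl <- df |>
  pivot_longer(-ID, names_to = "survey_question", values_to = "response") |>
  # same lookbehind split as in the answer
  mutate(response = str_split(response, "(?<=\\)|Order) ")) |>
  unnest_longer(response) |>
  # one row per question and option, with the number of respondents who chose it
  count(survey_question, response, name = "n") |>
  # share of all respondents who chose the option
  mutate(percent = n / n_respondents)

print(summary_tbl)
```

Options that nobody selected for a given question do not appear in this table. If they should be listed with a count of zero, tidyr::complete() could be applied after count().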
Note:
For more information on lookbehinds, you can read about them here.
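As a small illustration of what the lookbehind does, the sketch below splits a single string. The input x is a made-up combination of options from the question, chosen so that both branches of the lookbehind are exercised. The pattern matches a space only when the text immediately before it is ")" or "Order", so spaces inside an option label are left alone.

```r
library(stringr)

# Hypothetical input combining three options from the question
x <- "Getting Along with Others (G) Keeping Track of Time/Order Movement Control (MC)"

# (?<=...) is a lookbehind: it checks what precedes the space without consuming it,
# so the ")" and the word "Order" stay attached to their option labels
parts <- str_split(x, "(?<=\\)|Order) ")[[1]]

print(parts)
```

If the real data contains other options without a parenthesized identifier, their final words would need to be added as further alternatives inside the lookbehind.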