Dealing with nuanced 'Select All That Apply' question in R

Asked by: sdS  Asked: 10/25/2023  Last edited by: jpsmithsdS  Updated: 10/25/2023  Views: 44

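Q:

I am trying to manipulate a column from a 'select all that apply' question, so the length of each entry varies by row/respondent. Every response option except one ("Keeping Track of Time/Order") is followed by parentheses containing a unique identifier for that option (see the code below). To illustrate, I have two questions, one about the strengths and one about the challenges of a certain learning tool.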

df <- data.frame(
  ID = 1:6,
  response_strength = c(
    "Language (L) Attention (A)",
    "Movement Control (MC)",
    "Language (L) Getting Along with Others (G) Attention (A) Memory (M)",
    "Memory (M) Complex Thinking (C) Spatial Thinking (S)",
    "Memory (M) Spatial Thinking (S)",
    "Language (L) Attention (A)"
  ),
  response_challenge = c(
    "Movement Control (MC)",
    "Language (L) Attention (A)",
    "Complex Thinking (C)",
    "Attention (A)",
    "Getting Along with Others (G) Keeping Track of Time/Order",
    "Keeping Track of Time/Order Movement Control (MC)"
  )
)

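My goal is to convert to long format and produce an output table showing the percentage of respondents who selected each response option, like the one below. (Please note: the following code was created for illustration only, so the counts and percentages are not accurate.)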

# illustrative values only; n and percent are placeholders
df2 <- data.frame(
  survey_question = rep(c("response_strength", "response_challenge"), each = 7),
  response = rep(
    c("Movement Control (MC)", "Language (L)", "Attention (A)",
      "Getting Along with Others (G)", "Complex Thinking (C)",
      "Spatial Thinking (S)", "Keeping Track of Time/Order"),
    times = 2
  ),
  n = rep(c(1, 2, 4, 5, 3, 1, 2), times = 2),
  percent = c(.33, .67, 1.0, .33, .67, 1.0, .33, .67, 1.0, .33, .67, 1.0, .33, .67)
)

Output

      survey_question                      response n percent
1   response_strength         Movement Control (MC) 1    0.33
2   response_strength                  Language (L) 2    0.67
3   response_strength                 Attention (A) 4    1.00
4   response_strength Getting Along with Others (G) 5    0.33
5   response_strength          Complex Thinking (C) 3    0.67
6   response_strength          Spatial Thinking (S) 1    1.00
7   response_strength   Keeping Track of Time/Order 2    0.33
8  response_challenge         Movement Control (MC) 1    0.67
9  response_challenge                  Language (L) 2    1.00
10 response_challenge                 Attention (A) 4    0.33
11 response_challenge Getting Along with Others (G) 5    0.67
12 response_challenge          Complex Thinking (C) 3    1.00
13 response_challenge          Spatial Thinking (S) 1    0.33
14 response_challenge   Keeping Track of Time/Order 2    0.67

I'm just stuck on the best way forward. Any help is appreciated!

r data-manipulation stringr gsub

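Comments

0 upvotes jpsmith 10/25/2023
Where in your code are you getting stuck? If you edit the question to include your code, we can help you troubleshoot. Good luck!
0 upvotes sdS 10/25/2023
It's not that I'm stuck; I just don't know where to begin. The code blocks are only there to help readers visualize the data I'm working with and what I'd like the output to look like!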

A:

1 upvote Mark 10/25/2023 #1

Here's how I'd do it, using pivot_longer(), a lookbehind, and mutate():

# load package
library(tidyverse)

# first, we make the response_strength and response_challenge columns longer, making a "question" and "response" column
# str_remove removes the "response_" bit at the beginning which serves no purpose
df |> pivot_longer(-ID, names_to = "question", values_to = "response", names_transform = \(x) str_remove(x, "response_")) |> 

  # next we split the values, looking behind for a closing bracket or the word "Order"
  # Since this isn't your real data, you may have to edit this to make it work with the real data
  mutate(response = str_split(response, "(?<=\\)|Order) ")) |>

  # turn each response into its own row
  unnest_longer(response) |>

  # create the n and percent columns; note these are computed within each ID and question
  mutate(n = n(), percent = n / sum(n), .by = c(ID, question))

Output:

# A tibble: 23 × 5
      ID question  response                          n percent
   <int> <chr>     <chr>                         <int>   <dbl>
 1     1 strength  Language (L)                      2    0.5 
 2     1 strength  Attention (A)                     2    0.5 
 3     1 challenge Movement Control (MC)             1    1   
 4     2 strength  Movement Control (MC)             1    1   
 5     2 challenge Language (L)                      2    0.5 
 6     2 challenge Attention (A)                     2    0.5 
 7     3 strength  Language (L)                      4    0.25
 8     3 strength  Getting Along with Others (G)     4    0.25
 9     3 strength  Attention (A)                     4    0.25
10     3 strength  Memory (M)                        4    0.25
# ℹ 13 more rows
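
In the output above, n and percent are computed within each respondent and question, whereas the question asks for the share of respondents who chose each option. A minimal sketch of that summary follows, reusing the same split. The names n_respondents and summary_tbl are illustrative, and the denominator assumes every respondent answered both questions; adjust it if your real data has skipped questions.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Example data from the question (with the missing quote restored)
df <- data.frame(
  ID = 1:6,
  response_strength = c(
    "Language (L) Attention (A)",
    "Movement Control (MC)",
    "Language (L) Getting Along with Others (G) Attention (A) Memory (M)",
    "Memory (M) Complex Thinking (C) Spatial Thinking (S)",
    "Memory (M) Spatial Thinking (S)",
    "Language (L) Attention (A)"
  ),
  response_challenge = c(
    "Movement Control (MC)",
    "Language (L) Attention (A)",
    "Complex Thinking (C)",
    "Attention (A)",
    "Getting Along with Others (G) Keeping Track of Time/Order",
    "Keeping Track of Time/Order Movement Control (MC)"
  )
)

# Assumption: every respondent answered both questions,
# so the denominator is simply the number of distinct IDs
n_respondents <- n_distinct(df$ID)

summary_tbl <- df |>
  pivot_longer(-ID, names_to = "survey_question", values_to = "response") |>
  # split after a closing bracket or the word "Order"
  mutate(response = str_split(response, "(?<=\\)|Order) ")) |>
  unnest_longer(response) |>
  # guard against a respondent listing the same option twice
  distinct(ID, survey_question, response) |>
  # one row per question and option, with the number of respondents choosing it
  count(survey_question, response, name = "n") |>
  mutate(percent = n / n_respondents)

summary_tbl
```

Options that nobody selected for a given question will not appear in this table; if you need them as zero rows, tidyr's complete() is one way to add them.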

Notes:

For more on lookbehinds, you can read about them here.
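
As a small illustration of what the lookbehind does in this answer, the sketch below splits a single made-up string (x is a hypothetical example built from the question's options). The pattern matches a space only when it is preceded by ")" or by the word "Order", so the delimiter that gets removed is just the space and the bracket stays attached to its option.

```r
library(stringr)

# Hypothetical single entry combining the bracket-less option with two others
x <- "Keeping Track of Time/Order Movement Control (MC) Language (L)"

# Split on a space that follows ")" or "Order"; the lookbehind itself consumes nothing
parts <- str_split(x, "(?<=\\)|Order) ")[[1]]
parts
# "Keeping Track of Time/Order" "Movement Control (MC)" "Language (L)"
```

Spaces inside an option, such as the one in "Movement Control", are left alone because they are not preceded by either marker.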