Dealing with nuanced 'Select All That Apply' question in R

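Q:

I'm trying to manipulate a column from a "Select All That Apply" question, so the length of each entry varies by row/respondent. Every response option except one ("Keeping Track of Time/Order") is followed by parentheses containing a unique identifier for that option (see the code below). To illustrate, I have two questions: one about the strengths of a certain learning tool and one about its challenges.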

df <- data.frame(
  ID = 1:6,
  response_strength = c(
    "Language (L) Attention (A)",
    "Movement Control (MC)",
    "Language (L) Getting Along with Others (G) Attention (A) Memory (M)",
    "Memory (M) Complex Thinking (C) Spatial Thinking (S)",
    "Memory (M) Spatial Thinking (S)",
    "Language (L) Attention (A)"
  ),
  response_challenge = c(
    "Movement Control (MC)",
    "Language (L) Attention (A)",
    "Complex Thinking (C)",
    "Attention (A)",
    "Getting Along with Others (G) Keeping Track of Time/Order",
    "Keeping Track of Time/Order Movement Control (MC)"
  )
)

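My goal is to convert to long format and produce an output table showing the percentage of respondents who selected each response option, like the one below. (Note: the code below was created for illustration only, so the counts and percentages are not accurate.)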

df2 <- data.frame(
  survey_question = c(
    "response_strength", "response_strength", "response_strength", "response_strength",
    "response_strength", "response_strength", "response_strength",
    "response_challenge", "response_challenge", "response_challenge", "response_challenge",
    "response_challenge", "response_challenge", "response_challenge"
  ),
  response = c(
    "Movement Control (MC)", "Language (L)", "Attention (A)", "Getting Along with Others (G)",
    "Complex Thinking (C)", "Spatial Thinking (S)", "Keeping Track of Time/Order",
    "Movement Control (MC)", "Language (L)", "Attention (A)", "Getting Along with Others (G)",
    "Complex Thinking (C)", "Spatial Thinking (S)", "Keeping Track of Time/Order"
  ),
  n = c(1, 2, 4, 5, 3, 1, 2, 1, 2, 4, 5, 3, 1, 2),
  percent = c(.33, .67, 1.0, .33, .67, 1.0, .33, .67, 1.0, .33, .67, 1.0, .33, .67)
)

Output:

      survey_question                      response n percent
1   response_strength         Movement Control (MC) 1    0.33
2   response_strength                  Language (L) 2    0.67
3   response_strength                 Attention (A) 4    1.00
4   response_strength Getting Along with Others (G) 5    0.33
5   response_strength          Complex Thinking (C) 3    0.67
6   response_strength          Spatial Thinking (S) 1    1.00
7   response_strength   Keeping Track of Time/Order 2    0.33
8  response_challenge         Movement Control (MC) 1    0.67
9  response_challenge                  Language (L) 2    1.00
10 response_challenge                 Attention (A) 4    0.33
11 response_challenge Getting Along with Others (G) 5    0.67
12 response_challenge          Complex Thinking (C) 3    1.00
13 response_challenge          Spatial Thinking (S) 1    0.33
14 response_challenge   Keeping Track of Time/Order 2    0.67

I'm just stuck on the best path forward. Any help is appreciated!

r data-manipulation stringr gsub

# load package
library(tidyverse)

# first, we make the response_strength and response_challenge columns longer, making a "question" and "response" column
# str_remove removes the "response_" bit at the beginning which serves no purpose
df |> pivot_longer(-ID, names_to = "question", values_to = "response", names_transform = \(x) str_remove(x, "response_")) |> 

  # next we split the values, looking behind for a closing bracket, or the word Order
  # Since this isn't your real data, you may have to edit this pattern to make it work with the real data
  mutate(response = str_split(response, "(?<=\\)|Order) ")) |>

  # turn each response into its own row
  unnest_longer(response) |>

  # create the n column and percent column
  mutate(n = n(), percent = n / sum(n), .by = c(ID, question))

Output:

# A tibble: 23 × 5
      ID question  response                          n percent
   <int> <chr>     <chr>                         <int>   <dbl>
 1     1 strength  Language (L)                      2    0.5 
 2     1 strength  Attention (A)                     2    0.5 
 3     1 challenge Movement Control (MC)             1    1   
 4     2 strength  Movement Control (MC)             1    1   
 5     2 challenge Language (L)                      2    0.5 
 6     2 challenge Attention (A)                     2    0.5 
 7     3 strength  Language (L)                      4    0.25
 8     3 strength  Getting Along with Others (G)     4    0.25
 9     3 strength  Attention (A)                     4    0.25
10     3 strength  Memory (M)                        4    0.25
# ℹ 13 more rows
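
The output above reports n and percent within each respondent and question. If the aim is instead the table sketched in the question (one row per question and response option, with the share of respondents who picked it), a possible variation is to count per option after unnesting. This is only a sketch under assumptions: it reuses the question's example data and the same split pattern, it assumes every respondent answered both questions so the denominator is the number of distinct IDs, and the names n_respondents and summary_tbl are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Example data copied from the question (with the missing quote restored)
df <- data.frame(
  ID = 1:6,
  response_strength = c(
    "Language (L) Attention (A)",
    "Movement Control (MC)",
    "Language (L) Getting Along with Others (G) Attention (A) Memory (M)",
    "Memory (M) Complex Thinking (C) Spatial Thinking (S)",
    "Memory (M) Spatial Thinking (S)",
    "Language (L) Attention (A)"
  ),
  response_challenge = c(
    "Movement Control (MC)",
    "Language (L) Attention (A)",
    "Complex Thinking (C)",
    "Attention (A)",
    "Getting Along with Others (G) Keeping Track of Time/Order",
    "Keeping Track of Time/Order Movement Control (MC)"
  )
)

# Assumption: every respondent saw both questions, so the denominator
# is simply the number of distinct respondents
n_respondents <- n_distinct(df$ID)

summary_tbl <- df |>
  # one row per respondent and question
  pivot_longer(-ID, names_to = "survey_question", values_to = "response") |>
  # split after a closing parenthesis or the word "Order"
  mutate(response = str_split(response, "(?<=\\)|Order) ")) |>
  # one row per selected option
  unnest_longer(response) |>
  # number of respondents who picked each option, per question
  count(survey_question, response, name = "n") |>
  # share of all respondents who picked that option
  mutate(percent = n / n_respondents)

print(summary_tbl)
```

With this data, "Memory (M)" under response_strength was selected by 3 of the 6 respondents, so its percent is 0.5. Options nobody selected for a question do not appear at all; tidyr::complete() could add them with n = 0 if they are needed.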

Notes:

For more information about lookbehinds, you can read about them here.
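
To make the lookbehind concrete, here is a small sketch comparing a plain split on spaces with the lookbehind pattern used in the answer. The example string is assembled from option labels in the question; the variable names x, naive and pieces are illustrative only.

```r
library(stringr)

# One made-up entry combining three option labels from the question
x <- "Getting Along with Others (G) Keeping Track of Time/Order Movement Control (MC)"

# Splitting on every space tears the labels apart word by word
naive <- str_split(x, " ")[[1]]

# The lookbehind (?<=...) matches a space only when it is directly preceded
# by ")" or by "Order"; the preceding text itself is not consumed
pieces <- str_split(x, "(?<=\\)|Order) ")[[1]]

print(length(naive))
print(pieces)
```

Here pieces holds the three intact labels, while naive holds twelve single words. One caveat: the pattern also splits after any "Order " that appears in the middle of a label, so it would likely need adjusting for real data with other option names.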
