How to Keep Group Associated when Separating Free Text Fields in R

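Question:

I am trying to split a free-text field into individual words/phrases while keeping each one associated with its group, so that I can stratify by group in a figure later.

Here is my original code. I am trying to add a "Year" variable so that I can stratify the different research interests by the students' year of study. I would like a total n for each word, as well as an n for each year.

A sample of my dataset: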

Please.list.your.research.interests	Year
Vaccines, TB, HIV	1st year
TB, Chronic Diseases	2nd year

library(tidyverse)
library(tidytext)

data_research_words <- unlist(strsplit(data_research$Please.list.your.research.interests, ", "))

text_df <- tibble(line=1:97, data_research_words)

text_count <- text_df %>% 
  count(data_research_words, sort=TRUE)

Tags: r, text, tidyverse, tidytext

Answer:

library(tidyverse)

# split on commas, to create a separate row for each list element;
# the other columns (such as Year) are repeated on every new row
df <- df |>
  separate_longer_delim("Please.list.your.research.interests", ", ")

# then get the count for each research interest
df |> count(Please.list.your.research.interests)

# ...and the same, but separated also by years
df |> count(Year, Please.list.your.research.interests)

Output:

  Please.list.your.research.interests n
1                    Chronic Diseases 1
2                                 HIV 1
3                                  TB 2
4                            Vaccines 1

      Year Please.list.your.research.interests n
1 1st year                                 HIV 1
2 1st year                                  TB 1
3 1st year                            Vaccines 1
4 2nd year                    Chronic Diseases 1
5 2nd year                                  TB 1
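
The question also asks for a total n per interest alongside the per-year n, and mentions stratifying a figure by year. A minimal sketch of one way to get both in a single table follows. It assumes a toy tibble that mirrors the two sample rows from the question (not the real dataset), and the names `interest`, `total_n` and `interest_counts` are made up for illustration. It needs tidyr 1.3.0 or later for `separate_longer_delim()`.

```r
library(tidyverse)

# Toy data mirroring the sample in the question (an assumption, not the real dataset)
data_research <- tibble(
  Please.list.your.research.interests = c("Vaccines, TB, HIV", "TB, Chronic Diseases"),
  Year = c("1st year", "2nd year")
)

interest_counts <- data_research |>
  # one row per interest; Year is carried along on every new row
  separate_longer_delim(Please.list.your.research.interests, delim = ",") |>
  # trim stray spaces so "TB" and " TB" are counted as the same interest
  mutate(interest = str_trim(Please.list.your.research.interests), .keep = "unused") |>
  # n = number of mentions of each interest within each year
  count(interest, Year) |>
  # total_n = number of mentions across all years, repeated on each row
  group_by(interest) |>
  mutate(total_n = sum(n)) |>
  ungroup()

interest_counts

# Stacked bars: bar length is the total per interest, fill shows the split by year
p <- ggplot(interest_counts, aes(x = reorder(interest, total_n), y = n, fill = Year)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Count")
```

With the sample rows, TB gets n = 1 in each year and total_n = 2. Splitting on a bare comma and then trimming is a design choice that tolerates entries typed with or without a space after the comma; it does not merge differences in capitalisation or spelling.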
