Asked by: kartik trivedi Asked: 11/6/2023 Last edited by: Markkartik trivedi Updated: 11/7/2023 Views: 46
Displaying top 5 keywords for every category using group_by function
Q:
I am trying to find the top 5 keywords in the reviews for each product category. This is the code I have:
# Group by category and count keyword frequencies
keyword_counts <- filtered_data %>%
  group_by(category, keyword) %>%
  summarise(n = n()) %>%
  arrange(desc(n))

# Find the top 5 keywords in each category
top_keywords_by_category <- keyword_counts %>%
  group_by(category) %>%
  top_n(5, wt = n) %>%
  ungroup() # Ungroup the data

# Print the table
print(top_keywords_by_category)
which gives this output:
category keyword n
<chr> <chr> <int>
1 Computers&Accessories|Accessories&Peripherals|Cables&Accessori… product 354
2 Computers&Accessories|Accessories&Peripherals|Cables&Accessori… cable 277
3 Computers&Accessories|Accessories&Peripherals|Cables&Accessori… chargi… 200
4 Computers&Accessories|Accessories&Peripherals|Cables&Accessori… quality 179
5 Computers&Accessories|Accessories&Peripherals|Cables&Accessori… nice 147
6 Electronics|WearableTechnology|SmartWatches watch 129
7 Electronics|Mobiles&Accessories|Smartphones&BasicMobiles|Smart… phone 127
8 Electronics|HomeTheater,TV&Video|Televisions|SmartTelevisions tv 117
9 Electronics|WearableTechnology|SmartWatches product 102
10 Electronics|HomeTheater,TV&Video|Televisions|SmartTelevisions product 80
whereas the result I want is:
Category Computers&Accessories
Keyword n
1 Product 354
2 Cable 277
3 Chargi... 200
4 Quality 179
5 Nice 147
A:
0 votes
r2evans
11/6/2023
#1
While this data is uninteresting, it should show you how to use tidyr::separate_rows.
quux <- structure(list(category = c("Computers&Accessories|Accessories&Peripherals|Cables&Accessori…", "Computers&Accessories|Accessories&Peripherals|Cables&Accessori…", "Computers&Accessories|Accessories&Peripherals|Cables&Accessori…", "Computers&Accessories|Accessories&Peripherals|Cables&Accessori…", "Computers&Accessories|Accessories&Peripherals|Cables&Accessori…", "Electronics|WearableTechnology|SmartWatches", "Electronics|Mobiles&Accessories|Smartphones&BasicMobiles|Smart…", "Electronics|HomeTheater,TV&Video|Televisions|SmartTelevisions", "Electronics|WearableTechnology|SmartWatches", "Electronics|HomeTheater,TV&Video|Televisions|SmartTelevisions"),
keyword = c("product", "cable", "chargi…", "quality", "nice", "watch", "phone", "tv", "product", "product"),
n = c(354L, 277L, 200L, 179L, 147L, 129L, 127L, 117L, 102L, 80L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
library(dplyr)
quux %>%
  tidyr::separate_rows(category, sep = "\\|") %>%
  count(category, keyword) %>%
  arrange(desc(n))
# # A tibble: 32 × 3
# category keyword n
# <chr> <chr> <int>
# 1 Electronics product 2
# 2 Accessories&Peripherals cable 1
# 3 Accessories&Peripherals chargi… 1
# 4 Accessories&Peripherals nice 1
# 5 Accessories&Peripherals product 1
# 6 Accessories&Peripherals quality 1
# 7 Cables&Accessori… cable 1
# 8 Cables&Accessori… chargi… 1
# 9 Cables&Accessori… nice 1
# 10 Cables&Accessori… product 1
# # ℹ 22 more rows
# # ℹ Use `print(n = ...)` to see more rows
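Note that count(category, keyword) counts rows, so the frequencies already stored in n are not carried over. If the goal is to add up those existing frequencies instead, count() accepts a wt argument. A minimal sketch, assuming summing is what is wanted; the toy data frame and the output column name total are hypothetical and only stand in for the real table:

```r
library(dplyr)
library(tidyr)

# Hypothetical pre-counted data: one row per (category path, keyword)
toy <- data.frame(
  category = c("A|B", "A|C", "A|B"),
  keyword  = c("product", "product", "cable"),
  n        = c(10L, 4L, 7L)
)

weighted <- toy %>%
  # One row per path segment, as in the pipeline above
  separate_rows(category, sep = "\\|") %>%
  # Sum the existing frequencies rather than counting rows
  count(category, keyword, wt = n, name = "total", sort = TRUE)

print(weighted)
```

Here "product" under category "A" ends up with the sum of both of its original rows rather than a row count of 2.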
From here, you can filter to the top 5 and pivot:
quux %>%
  tidyr::separate_rows(category, sep = "\\|") %>%
  count(category, keyword) %>%
  slice_max(n = 5, order_by = n, with_ties = FALSE) %>%
  tidyr::pivot_wider(names_from = category, values_from = n, values_fill = list(n = 0))
# # A tibble: 4 × 3
# keyword Electronics `Accessories&Peripherals`
# <chr> <int> <int>
# 1 product 2 1
# 2 cable 0 1
# 3 chargi… 0 1
# 4 nice 0 1
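The slice_max() call above runs on ungrouped data, so it keeps the five largest rows overall rather than five per category. If one small table per category is the goal, as in the question's desired output, grouping before slicing may be closer. A minimal sketch under that assumption; the data frame, category names, and keywords below are made up for illustration:

```r
library(dplyr)

# Hypothetical keyword counts for two made-up categories
toy_counts <- data.frame(
  category = rep(c("CatA", "CatB"), each = 6),
  keyword  = c("product", "cable", "fast", "quality", "nice", "good",
               "watch", "product", "battery", "strap", "screen", "good"),
  n        = c(354L, 277L, 200L, 179L, 147L, 90L,
               129L, 102L, 60L, 40L, 30L, 20L)
)

top5 <- toy_counts %>%
  group_by(category) %>%                                   # slice within each category
  slice_max(order_by = n, n = 5, with_ties = FALSE) %>%    # keep the 5 largest n per group
  ungroup()

# Split into one keyword/n table per category
top5_list <- split(top5[c("keyword", "n")], top5$category)
print(top5_list)
```

Each element of top5_list is then a separate keyword/n table named after its category, which can be printed or exported one at a time.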
Comments
filtered_data
dput(head(filtered_data, 25))
dput(.)