Asked by: bvowe  Asked: 7/29/2022  Last edited by: Zheyuan Li, bvowe  Updated: 7/29/2022  Views: 246
dplyr: reshape wide to long [duplicate]
Q:
HAVE = data.frame("STUDENT"=c(1, 2, 3),
"CLASS"=c('A', 'B', 'C'),
"SCORE1"=c(50, 79, 61),
"SCORE2"=c(74, 100, 70),
"SCORE3"=c(78, 65, 87),
"TEST1"=c(80, 96, 93),
"TEST2"=c(59, 57, 89),
"TEST3"=c(63, 53, 92))
WANT = data.frame("STUDENT"=c(1, 1, 1, 2, 2, 2, 3, 3, 3),
"CLASS"=c('A','A','A','B','B','B','C','C','C'),
"SEMESTER"=c(1, 2, 3, 1, 2, 3, 1, 2, 3),
"SCORE"=c(50, 74, 78, 79, 100, 65, 61, 70, 87),
"TEST"=c(80, 59, 63, 96, 57, 53, 93, 89, 92))
Attempt:
WANT = tidyr::pivot_longer(HAVE, cols = -c("STUDENT", "CLASS"), names_to = c('SEMESTER', '.value'),
names_prefix = c("SCORE", "TEST"))
A:
4 votes
akrun
7/29/2022
#1
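We need either names_sep or names_pattern to locate the split point in the column names. Here, each column name should be split between a non-digit (\\D) and a digit (\\d). We use a regex lookaround with names_sep for that (or use names_pattern to capture the characters, i.e. names_pattern = "^(\\D+)(\\d+)$").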
library(tidyr)
pivot_longer(HAVE, cols = -c(STUDENT, CLASS),
names_to = c(".value", "SEMESTER"), names_sep = "(?<=\\D)(?=\\d)")
-output
# A tibble: 9 × 5
STUDENT CLASS SEMESTER SCORE TEST
<dbl> <chr> <chr> <dbl> <dbl>
1 1 A 1 50 80
2 1 A 2 74 59
3 1 A 3 78 63
4 2 B 1 79 96
5 2 B 2 100 57
6 2 B 3 65 53
7 3 C 1 61 93
8 3 C 2 70 89
9 3 C 3 87 92
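The names_pattern alternative mentioned above is only given as a fragment, so here is a minimal sketch of the full call. The names_transform step is an addition that is not part of the original answer: it assumes SEMESTER should be numeric, as in WANT, rather than the character column shown in the output above.

```r
library(tidyr)

HAVE <- data.frame(
  STUDENT = c(1, 2, 3),
  CLASS   = c("A", "B", "C"),
  SCORE1  = c(50, 79, 61),
  SCORE2  = c(74, 100, 70),
  SCORE3  = c(78, 65, 87),
  TEST1   = c(80, 96, 93),
  TEST2   = c(59, 57, 89),
  TEST3   = c(63, 53, 92)
)

# Group 1 (non-digits) supplies the output column name via ".value";
# group 2 (trailing digits) is stored in SEMESTER.
long <- pivot_longer(
  HAVE,
  cols = -c(STUDENT, CLASS),
  names_to = c(".value", "SEMESTER"),
  names_pattern = "^(\\D+)(\\d+)$",
  # Assumption (not in the original answer): make SEMESTER numeric like WANT.
  names_transform = list(SEMESTER = as.numeric)
)

# Convert the tibble to a plain data.frame for comparison with WANT.
long <- as.data.frame(long)
print(long)
```

With ".value" listed first in names_to, the SCORE and TEST prefixes become columns and the digit becomes the row key, which is the reverse of the order used in the question's attempt.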
3 votes
TarJae
7/29/2022
#2
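Here is one with names_pattern and the regex (\\w+)(\\d):
\\w+ ... one or more (+) word characters. A word character is a letter, a digit, or an underscore. This set of characters can also be written as the regex character class [a-zA-Z0-9_].
\\d ... one digit.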
tidyr::pivot_longer(HAVE,
cols = -c(STUDENT, CLASS),
names_to = c('.value', 'Semester'),
names_pattern = '(\\w+)(\\d)')
STUDENT CLASS Semester SCORE TEST
<dbl> <chr> <chr> <dbl> <dbl>
1 1 A 1 50 80
2 1 A 2 74 59
3 1 A 3 78 63
4 2 B 1 79 96
5 2 B 2 100 57
6 2 B 3 65 53
7 3 C 1 61 93
8 3 C 2 70 89
9 3 C 3 87 92
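To see what the two capture groups pick up, the pattern can be tried on the column names directly with base R. This is only an illustrative sketch: SCORE10 is a hypothetical column name that is not in HAVE, included to show how a greedy \\w+ followed by a single \\d behaves on a two-digit suffix compared with the anchored pattern from the first answer.

```r
# SCORE10 is hypothetical; HAVE only has single-digit suffixes.
cols <- c("SCORE1", "TEST3", "SCORE10")

# Pattern from this answer: greedy word characters, then exactly one digit.
prefix_w <- sub("(\\w+)(\\d)", "\\1", cols, perl = TRUE)
suffix_w <- sub("(\\w+)(\\d)", "\\2", cols, perl = TRUE)

# Anchored pattern from the first answer: non-digits, then all trailing digits.
prefix_D <- sub("^(\\D+)(\\d+)$", "\\1", cols, perl = TRUE)
suffix_D <- sub("^(\\D+)(\\d+)$", "\\2", cols, perl = TRUE)

print(data.frame(cols, prefix_w, suffix_w, prefix_D, suffix_D))
```

For the single-digit names in HAVE both patterns split identically. For the hypothetical SCORE10, \\w+ also consumes the first digit, giving "SCORE1" and "0", whereas the anchored pattern gives "SCORE" and "10".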