Create a data frame with pairwise comparisons of 2 variables where order doesn't matter in R

Asked by: M. Beausoleil  Asked: 8/5/2023  Last edited by: ThomasIsCoding, M. Beausoleil  Updated: 8/5/2023  Views: 29

Q:

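I have a set of variables and want all pairwise comparisons between them, but without the rows where a variable is compared with itself (e.g., "A" == "A"), and keeping only one of the comparisons that differ only in order, so either "A" vs "B" or "B" vs "A".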

I have this code that does it in R:

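sp.all.var = c(LETTERS[1:10])
length(sp.all.var)^2  # number of ordered pairs, self-comparisons included

# All ordered pairs (Var1, Var2)
df.pairwise = expand.grid(sp.all.var, sp.all.var)
nrow(df.pairwise)

# Drop self-comparisons such as "A" vs "A"
df.pairwise.sub1 = df.pairwise[df.pairwise$Var1 != df.pairwise$Var2, ]

# Build an order-independent key, e.g. "A-B" for both A/B and B/A
df.pairwise.sub1$compare = apply(df.pairwise.sub1, 1, function(x) paste(sort(x), collapse = "-"))
nrow(df.pairwise.sub1)

# Keep only the first row for each key
df.pairwise.sub2 = df.pairwise.sub1[!duplicated(df.pairwise.sub1$compare), ]
nrow(df.pairwise.sub2)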

Is there a simpler way to do this? Is there a built-in function or a package that does it?

Tags: r, compare, combinations



A:

2 upvotes  jblood94  8/5/2023  #1

You probably want combn:

combn(LETTERS[1:10], 2, paste, collapse = "-")
#>  [1] "A-B" "A-C" "A-D" "A-E" "A-F" "A-G" "A-H" "A-I" "A-J" "B-C" "B-D" "B-E"
#> [13] "B-F" "B-G" "B-H" "B-I" "B-J" "C-D" "C-E" "C-F" "C-G" "C-H" "C-I" "C-J"
#> [25] "D-E" "D-F" "D-G" "D-H" "D-I" "D-J" "E-F" "E-G" "E-H" "E-I" "E-J" "F-G"
#> [37] "F-H" "F-I" "F-J" "G-H" "G-I" "G-J" "H-I" "H-J" "I-J"
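
To make the link to the question's row counts explicit, here is a minimal sketch (the names x, pairs, n_pairs and labels are illustrative, not from the answer). It assumes the default behaviour of combn, which returns one combination per column when no function is supplied, so the number of unordered pairs is choose(n, 2) rather than n^2.

```r
x <- LETTERS[1:10]

# Without a FUN argument, combn() returns a matrix with one pair per column.
pairs <- combn(x, 2)
dim(pairs)  # 2 rows, one column per unordered pair

# Unordered pairs without self-comparisons: choose(n, 2)
n_pairs <- choose(length(x), 2)

# Pasting each column by hand gives the same labels as passing paste to combn().
labels <- apply(pairs, 2, paste, collapse = "-")
head(labels, 3)
```

For 10 letters this is 45 pairs, against the 100 rows that expand.grid produces before filtering.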

Or as a data.frame:

as.data.frame(t(combn(LETTERS[1:10], 2, \(x) c(x, paste(x, collapse = "-")))))
#>    V1 V2  V3
#> 1   A  B A-B
#> 2   A  C A-C
#> 3   A  D A-D
#> 4   A  E A-E
#> 5   A  F A-F
#> 6   A  G A-G
#> 7   A  H A-H
#> 8   A  I A-I
#> 9   A  J A-J
#> 10  B  C B-C
#> 11  B  D B-D
#> 12  B  E B-E
#> 13  B  F B-F
#> 14  B  G B-G
#> 15  B  H B-H
#> 16  B  I B-I
#> 17  B  J B-J
#> 18  C  D C-D
#> 19  C  E C-E
#> 20  C  F C-F
#> 21  C  G C-G
#> 22  C  H C-H
#> 23  C  I C-I
#> 24  C  J C-J
#> 25  D  E D-E
#> 26  D  F D-F
#> 27  D  G D-G
#> 28  D  H D-H
#> 29  D  I D-I
#> 30  D  J D-J
#> 31  E  F E-F
#> 32  E  G E-G
#> 33  E  H E-H
#> 34  E  I E-I
#> 35  E  J E-J
#> 36  F  G F-G
#> 37  F  H F-H
#> 38  F  I F-I
#> 39  F  J F-J
#> 40  G  H G-H
#> 41  G  I G-I
#> 42  G  J G-J
#> 43  H  I H-I
#> 44  H  J H-J
#> 45  I  J I-J
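
If the Var1/Var2/compare column names from the question are wanted instead of V1/V2/V3, one possible variant is sketched below. It is not part of the answer: it builds the data frame from the rows of the combn matrix, and stringsAsFactors = FALSE is set explicitly so the columns stay character on older R versions as well.

```r
x <- LETTERS[1:10]
pairs <- combn(x, 2)  # 2 x 45 character matrix, one pair per column

df.pairwise <- data.frame(
  Var1 = pairs[1, ],
  Var2 = pairs[2, ],
  compare = paste(pairs[1, ], pairs[2, ], sep = "-"),
  stringsAsFactors = FALSE
)

nrow(df.pairwise)
head(df.pairwise, 3)
```

Because combn takes elements in the order they appear in x, Var1 always precedes Var2 in x here, so no sorting or duplicated() step is needed.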

1 upvote  ThomasIsCoding  8/5/2023  #2

You can also use rep + sequence:

x <- LETTERS[1:10]
paste0(
    rep(x, (length(x) - 1):0), "-",
    x[sequence((length(x) - 1):0, from = 2:length(x))]
)

which gives

 [1] "A-B" "A-C" "A-D" "A-E" "A-F" "A-G" "A-H" "A-I" "A-J" "B-C" "B-D" "B-E"
[13] "B-F" "B-G" "B-H" "B-I" "B-J" "C-D" "C-E" "C-F" "C-G" "C-H" "C-I" "C-J"
[25] "D-E" "D-F" "D-G" "D-H" "D-I" "D-J" "E-F" "E-G" "E-H" "E-I" "E-J" "F-G"
[37] "F-H" "F-I" "F-J" "G-H" "G-I" "G-J" "H-I" "H-J" "I-J"
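
To see why the rep + sequence construction works, it may help to look at the intermediate pieces on a shorter vector. The sketch below uses four letters; the names counts, left and right_idx are illustrative. Two assumptions: the from argument of sequence requires R 4.0.0 or later, and from is given the same length as counts here (seq_len(n) + 1L rather than 2:n) so the example does not rely on argument recycling.

```r
x <- LETTERS[1:4]
n <- length(x)

# Number of partners per element: A pairs with 3 later letters, D with none.
counts <- (n - 1):0

# Left side: each element repeated once per partner.
left <- rep(x, counts)
left       # "A" "A" "A" "B" "B" "C"

# Right side: for element i, the indices i + 1, ..., n.
# sequence() concatenates runs of length counts[i] starting at from[i].
right_idx <- sequence(counts, from = seq_len(n) + 1L)
right_idx  # 2 3 4 3 4 4

result <- paste0(left, "-", x[right_idx])
result     # "A-B" "A-C" "A-D" "B-C" "B-D" "C-D"
```

The pairs come out in the same order as combn(x, 2, paste, collapse = "-").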