Asked by: atawfik14 Asked: 11/3/2023 Last edited by: jblood94, atawfik14 Updated: 11/7/2023 Views: 58
Finding repeated bulk of elements in R vectors
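Q:
I am wondering whether there is a simple way to find all elements (one or more) that are repeated across several vectors (two or more).
For example, given the following character vectors: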
France <- c("John", "Michael" , "Bob", "Oliver")
Italy <- c("Hugo", "Oliver", "Soren", "Jackson", "John")
Spain <- c("Joseph", "David", "Bob", "John", "Leo", "Michael")
Austria <- c("Oliver", "Giovanni", "Luka")
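I would like to be able to say:
"John and Oliver travelled to France and Italy", and "John, Michael and Bob travelled to France and Spain", and "Oliver travelled to France, Italy and Austria".
I was thinking of resolving this with a binary matrix, with names as rows and countries as columns, and then iterating over it, but I suspect that is computationally expensive. I am looking for an efficient way to do this with existing functions.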
A:
1 upvote
MrFlick
11/3/2023
#1
You can do this with some help from the igraph library. First, reshape the data into long format:
France <- c("John", "Michael" , "Bob", "Oliver")
Italy <- c("Hugo", "Oliver", "Soren", "Jackson", "John")
Spain <- c("Joseph", "David", "Bob", "John", "Leo", "Michael")
Austria <- c("Oliver", "Giovanni", "Luka")
pairs <- stack(tibble::lst(France, Italy, Spain, Austria))
head(pairs)
#    values    ind
# 1    John France
# 2 Michael France
# 3     Bob France
# 4  Oliver France
# 5    Hugo  Italy
# 6  Oliver  Italy
Then let the graph do the work:
library(igraph)
adjm <- pairs |>
  graph_from_data_frame(directed = FALSE) |>
  as_adjacency_matrix()
adjm[rownames(adjm) %in% pairs$values, colnames(adjm) %in% pairs$ind]
#          France Italy Spain Austria
# John          1     1     1       0
# Michael       1     0     1       0
# Bob           1     0     1       0
# Oliver        1     1     0       1
# Hugo          0     1     0       0
# Soren         0     1     0       0
# Jackson       0     1     0       0
# Joseph        0     0     1       0
# David         0     0     1       0
# Leo           0     0     1       0
# Giovanni      0     0     0       1
# Luka          0     0     0       1
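Statements of the form "Oliver travelled to France, Italy and Austria" can also be read off the long format directly by grouping countries per person. The following is a minimal base R sketch, not part of the answer above; the names `trips`, `long`, `by_person` and `repeat_travellers` are chosen here for illustration.

```r
# Assumed input: the four vectors from the question, gathered in a named list.
trips <- list(
  France  = c("John", "Michael", "Bob", "Oliver"),
  Italy   = c("Hugo", "Oliver", "Soren", "Jackson", "John"),
  Spain   = c("Joseph", "David", "Bob", "John", "Leo", "Michael"),
  Austria = c("Oliver", "Giovanni", "Luka")
)

# Long format: one row per (name, country) pair, as in the answer.
long <- stack(trips)

# Invert the mapping: for each person, the countries they appear in.
by_person <- split(as.character(long$ind), long$values)

# Keep only people who appear in more than one country.
repeat_travellers <- Filter(function(x) length(x) > 1, by_person)

# One sentence per repeat traveller.
for (p in names(repeat_travellers)) {
  cat(p, "travelled to", toString(repeat_travellers[[p]]), "\n")
}
```

This only covers the per-person view; grouping several people who share the same set of countries needs one of the approaches in the other answers.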
1 upvote
Onyambu
11/3/2023
#2
my_list <- list(France = France, Italy = Italy, Spain = Spain, Austria = Austria)
table(stack(my_list))
          ind
values     France Italy Spain Austria
  Bob           1     0     1       0
  David         0     0     1       0
  Giovanni      0     0     0       1
  Hugo          0     1     0       0
  Jackson       0     1     0       0
  John          1     1     1       0
  Joseph        0     0     1       0
  Leo           0     0     1       0
  Luka          0     0     0       1
  Michael       1     0     1       0
  Oliver        1     1     0       1
  Soren         0     1     0       0
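Once the data are in this 0/1 table, the overlap between every pair of countries can be obtained with a single matrix product rather than an explicit loop. The sketch below is an illustrative extension, not part of the answer; `trips`, `m`, `shared` and `france_italy` are names assumed here.

```r
# Assumed input: the four vectors from the question, gathered in a named list.
trips <- list(
  France  = c("John", "Michael", "Bob", "Oliver"),
  Italy   = c("Hugo", "Oliver", "Soren", "Jackson", "John"),
  Spain   = c("Joseph", "David", "Bob", "John", "Leo", "Michael"),
  Austria = c("Oliver", "Giovanni", "Luka")
)

# Person-by-country 0/1 table, converted to a plain matrix.
m <- unclass(table(stack(trips)))

# crossprod(m) is t(m) %*% m: entry [i, j] counts the names shared by
# countries i and j; the diagonal holds the number of names per country.
shared <- crossprod(m)
print(shared)

# For one specific pair, intersect() returns the shared names themselves.
france_italy <- intersect(trips$France, trips$Italy)
print(france_italy)
```

The count matrix shows which pairs are worth inspecting; `intersect()` (or `Reduce(intersect, ...)` for more than two vectors) then gives the actual names.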
1 upvote
jblood94
11/3/2023
#3
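For statements such as "John and Oliver travelled to France and Italy", you are looking for the set of maximal bicliques.
library(igraph)
trips <- list(France = France, Italy = Italy, Spain = Spain, Austria = Austria)
# Complemented person-by-country table: 1 marks a country the person did NOT visit
m <- 1L - table(stack(trips))
# Complementing the bipartite graph of non-visits links each person to the countries visited
# and makes all people (and all countries) mutually adjacent, so maximal cliques are maximal bicliques
g <- complementer(graph_from_incidence_matrix(m))
cl <- max_cliques(g)
# Vertex ids above nrow(m) are countries: keep cliques with at least two countries and at least one person
keep <- vapply(cl, \(x) (s <- sum(x > nrow(m))) > 1L && s < length(x), FALSE)
lapply(cl[keep], sort)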
#> [[1]]
#> + 4/16 vertices, named, from ffcf6aa:
#> [1] Oliver France Italy Austria
#>
#> [[2]]
#> + 4/16 vertices, named, from ffcf6aa:
#> [1] John Oliver France Italy
#>
#> [[3]]
#> + 4/16 vertices, named, from ffcf6aa:
#> [1] John France Italy Spain
#>
#> [[4]]
#> + 5/16 vertices, named, from ffcf6aa:
#> [1] Bob John Michael France Spain
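For a handful of vectors, the result can be cross-checked in base R by intersecting every combination of two or more countries. This is a brute-force sketch under the assumption that the number of countries is small (the number of combinations grows as 2^n); it is not the biclique method above, and unlike that method it also reports non-maximal groups such as a single person shared by two countries. The names `combos` and `common` are illustrative.

```r
# Assumed input: the four vectors from the question, gathered in a named list.
trips <- list(
  France  = c("John", "Michael", "Bob", "Oliver"),
  Italy   = c("Hugo", "Oliver", "Soren", "Jackson", "John"),
  Spain   = c("Joseph", "David", "Bob", "John", "Leo", "Michael"),
  Austria = c("Oliver", "Giovanni", "Luka")
)

# Every combination of two or more countries.
combos <- unlist(
  lapply(2:length(trips), function(k) combn(names(trips), k, simplify = FALSE)),
  recursive = FALSE
)

# Names common to all countries in each combination.
common <- lapply(combos, function(cs) Reduce(intersect, trips[cs]))
names(common) <- vapply(combos, paste, character(1), collapse = " + ")

# Drop combinations that share no name at all.
common <- Filter(length, common)

# One sentence per remaining combination.
for (k in names(common)) {
  cat(toString(common[[k]]), "travelled to", k, "\n")
}
```

Each maximal biclique found by igraph should appear among these entries, which makes this a convenient sanity check on small inputs.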