提问人:Randy Lai 提问时间:3/22/2014 最后编辑:BraiamRandy Lai 更新时间:11/7/2022 访问量:28148
如何在R中生成对象的排列或组合?
How to generate permutations or combinations of object in R?
问:
如何从对象生成对象序列?我正在寻找一种方法来进行排列或组合,有/没有替换,使用不同和非不同的项目(又名多集)。r
n
这与十二种方式有关。“独特”的解决方案可以以十二倍的方式包括在内,而“非独特”则不包括在内。
答:
编辑:我已经更新了答案以使用更有效的软件包安排
开始使用安排
Arrangements包含一些用于排列和组合的高效生成器和迭代器。已经证明,它的性能优于大多数现有的同类包装。可以在此处找到一些基准。arrangements
以下是上述问题的答案
# 1) combinations: without replacement: distinct items
combinations(5, 2)
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 1 5
[5,] 2 3
[6,] 2 4
[7,] 2 5
[8,] 3 4
[9,] 3 5
[10,] 4 5
# 2) combinations: with replacement: distinct items
combinations(5, 2, replace=TRUE)
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 1 4
[5,] 1 5
[6,] 2 2
[7,] 2 3
[8,] 2 4
[9,] 2 5
[10,] 3 3
[11,] 3 4
[12,] 3 5
[13,] 4 4
[14,] 4 5
[15,] 5 5
# 3) combinations: without replacement: non distinct items
combinations(x = c("a", "b", "c"), freq = c(2, 1, 1), k = 2)
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "c"
# 4) combinations: with replacement: non distinct items
combinations(x = c("a", "b", "c"), k = 2, replace = TRUE) # as `freq` does not matter
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "b"
[5,] "b" "c"
[6,] "c" "c"
# 5) permutations: without replacement: distinct items
permutations(5, 2)
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 1 5
[5,] 2 1
[6,] 2 3
[7,] 2 4
[8,] 2 5
[9,] 3 1
[10,] 3 2
[11,] 3 4
[12,] 3 5
[13,] 4 1
[14,] 4 2
[15,] 4 3
[16,] 4 5
[17,] 5 1
[18,] 5 2
[19,] 5 3
[20,] 5 4
# 6) permutations: with replacement: distinct items
permutations(5, 2, replace = TRUE)
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 1 4
[5,] 1 5
[6,] 2 1
[7,] 2 2
[8,] 2 3
[9,] 2 4
[10,] 2 5
[11,] 3 1
[12,] 3 2
[13,] 3 3
[14,] 3 4
[15,] 3 5
[16,] 4 1
[17,] 4 2
[18,] 4 3
[19,] 4 4
[20,] 4 5
[21,] 5 1
[22,] 5 2
[23,] 5 3
[24,] 5 4
[25,] 5 5
# 7) permutations: without replacement: non distinct items
permutations(x = c("a", "b", "c"), freq = c(2, 1, 1), k = 2)
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "a"
[5,] "b" "c"
[6,] "c" "a"
[7,] "c" "b"
# 8) permutations: with replacement: non distinct items
permutations(x = c("a", "b", "c"), k = 2, replace = TRUE) # as `freq` doesn't matter
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "a"
[5,] "b" "b"
[6,] "b" "c"
[7,] "c" "a"
[8,] "c" "b"
[9,] "c" "c"
与其他套餐比较
与现有软件包相比,使用几乎没有什么优势。arrangements
整体框架:您不必为不同的方法使用不同的包。
这是非常有效的。有关一些基准,请参阅 https://randy3k.github.io/arrangements/articles/benchmark.html。
它内存效率高,能够生成所有 13 个!1 到 13 的排列,由于矩阵大小的限制,现有包将无法这样做。迭代器的方法允许用户逐个获取排列。
getnext()
生成的排列是按字典顺序排列的,某些用户可能需要。
评论
<package>::<function>
R* 中的组合数学片段
下面,我们研究具有生成组合和排列功能的软件包。如果我遗漏了任何包裹,请原谅我并发表评论,或者更好的是,编辑这篇文章。
分析大纲:
- 介绍
- 设置
- 组合
- 排列
- 多组
- 总结
- 记忆
在开始之前,我们注意到,一次选择 m 次替换不同与非色调项目的组合/排列是等效的。之所以如此,是因为当我们有替代品时,它不是特定的。因此,无论特定元素最初出现多少次,输出都将具有该元素的实例重复 1 到 m 次。
1. 引言
包:
gtools
combinat
multicool
partitions
RcppAlgos
arrangements
utils
我没有包括或因为它们并不是真正要解决这些类型的问题。我也没有包括更新,因为某些情况使我的计算机崩溃。permute
permutations
gRbase
|—————————————概览 —————————————-|
|_________________| gtools | combinat | multicool | partitions |
| comb rep | Yes | | | |
| comb NO rep | Yes | Yes | | |
| perm rep | Yes | | | |
| perm NO rep | Yes | Yes | Yes | Yes |
| perm multiset | | | Yes | Yes |
| comb multiset | | | | |
| accepts factors | | Yes | | |
| m at a time | Yes | Yes/No | | |
| general vector | Yes | Yes | Yes | |
| iterable | | | Yes | |
| parallelizable | | | | |
| multi-threaded | | | | |
| big integer | | | | |
|_________________| arrangements | RcppAlgos | utils |
| comb rep | Yes | Yes | |
| comb NO rep | Yes | Yes | Yes |
| perm rep | Yes | Yes | |
| perm NO rep | Yes | Yes | |
| perm multiset | Yes | Yes | |
| comb multiset | Yes | Yes | |
| accepts factors | Yes | Yes | Yes |
| m at a time | Yes | Yes | Yes |
| general vector | Yes | Yes | Yes |
| iterable | Yes | Yes | |
| parallelizable | Yes | Yes | |
| big integer | Yes | Yes | |
| multi-threaded | | Yes | |
任务 和 是指“一次 m 个”生成结果并重新排列“一般向量”的能力,而不是 。在实践中,我们通常关注的是找到一般向量的重排,因此在可能的情况下,下面的所有检查都将反映这一点。m at a time
general vector
1:n
2. 设置
所有基准测试均在 3 种不同的设置上运行。
- 2022 款 Macbook Air Apple M2 24 GB
- 2020 Macbook Pro i7 16 GB
- 2022 Windows Surface i5 16 GB
library(microbenchmark)
## print up to 4 digits to keep microbenchmark output tidy
options(digits = 4)
options(width = 90)
numThreads <- min(as.integer(RcppAlgos::stdThreadMax() / 2), 6)
numThreads
#> [1] 4
pkgs <- c("gtools", "combinat", "multicool", "partitions",
"RcppAlgos", "arrangements", "utils", "microbenchmark")
sapply(pkgs, packageVersion, simplify = FALSE)
#> $gtools
#> [1] '3.9.3'
#>
#> $combinat
#> [1] '0.0.8'
#>
#> $multicool
#> [1] '0.1.12'
#>
#> $partitions
#> [1] '1.10.7'
#>
#> $RcppAlgos
#> [1] '2.6.0'
#>
#> $arrangements
#> [1] '1.1.9'
#>
#> $utils
#> [1] '4.2.1'
#>
#> $microbenchmark
#> [1] '1.4.7'
列出的结果是从设置 #1(即 Macbook Air M2)中获得的。Macbook Pro 上的结果类似,但在 Windows 设置中,多线程效果较差。在某些情况下,在 Windows 安装程序上,串行执行速度更快。我们将使用该模式调用所有函数,因此不需要调用。package::function
library
3. 组合
首先,我们检查一次没有替换选择m的组合。
RcppAlgos
combinat
gtools
arrangements
utils
如何:
set.seed(13)
tVec1 <- sort(sample(100, 20))
m <- 10
t1 <- RcppAlgos::comboGeneral(tVec1, m) ## returns matrix with m columns
t3 <- combinat::combn(tVec1, m) ## returns matrix with m rows
t4 <- gtools::combinations(20, m, tVec1) ## returns matrix with m columns
identical(t(t3), t4) ## must transpose to compare
#> [1] TRUE
t5 <- arrangements::combinations(tVec1, m)
identical(t1, t5)
#> [1] TRUE
t6 <- utils::combn(tVec1, m) ## returns matrix with m rows
identical(t(t6), t4) ## must transpose to compare
#> [1] TRUE
基准:
microbenchmark(
cbRcppAlgosPar = RcppAlgos::comboGeneral(tVec1, m,
nThreads = numThreads),
cbRcppAlgosSer = RcppAlgos::comboGeneral(tVec1, m),
cbGtools = gtools::combinations(20, m, tVec1),
cbCombinat = combinat::combn(tVec1, m),
cbUtils = utils::combn(tVec1, m),
cbArrangements = arrangements::combinations(tVec1, m),
unit = "relative"
)
#> Warning in microbenchmark(cbRcppAlgosPar = RcppAlgos::comboGeneral(tVec1, : less accurate
#> nanosecond times to avoid potential integer overflows
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgosPar 1.000 1.000 1.000 1.000 1.000 1.0000 100
#> cbRcppAlgosSer 2.712 2.599 1.280 2.497 2.477 0.2666 100
#> cbGtools 739.325 686.803 271.623 679.894 661.560 11.7178 100
#> cbCombinat 173.836 166.265 76.735 176.052 191.945 4.9075 100
#> cbUtils 179.895 170.345 71.639 169.661 178.385 3.8900 100
#> cbArrangements 2.717 2.995 1.975 2.835 2.935 0.8195 100
现在,我们一次检查具有替换选择 m 的组合。
RcppAlgos
gtools
arrangements
如何:
set.seed(97)
tVec2 <- sort(rnorm(12))
m <- 9
t1 <- RcppAlgos::comboGeneral(tVec2, m, repetition = TRUE)
t3 <- gtools::combinations(12, m, tVec2, repeats.allowed = TRUE)
identical(t1, t3)
#> [1] TRUE
t4 <- arrangements::combinations(tVec2, m, replace = TRUE)
identical(t1, t4)
#> [1] TRUE
基准:
microbenchmark(
cbRcppAlgosPar = RcppAlgos::comboGeneral(tVec2, m, TRUE,
nThreads = numThreads),
cbRcppAlgosSer = RcppAlgos::comboGeneral(tVec2, m, TRUE),
cbGtools = gtools::combinations(12, m, tVec2, repeats.allowed = TRUE),
cbArrangements = arrangements::combinations(tVec2, m, replace = TRUE),
unit = "relative"
)
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgosPar 1.000 1.000 1.0000 1.000 1.0000 1.0000 100
#> cbRcppAlgosSer 1.987 2.382 0.9173 2.347 1.2776 0.9733 100
#> cbGtools 670.126 517.561 103.8726 523.135 177.0940 12.0440 100
#> cbArrangements 2.300 2.582 0.8294 2.542 0.9212 1.0089 100
4. 排列
首先,我们检查一次没有替换选择m的排列。
RcppAlgos
gtools
arrangements
如何:
tVec3 <- as.integer(c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29))
t1 <- RcppAlgos::permuteGeneral(tVec3, 6)
t3 <- gtools::permutations(10, 6, tVec3)
identical(t1, t3)
#> [1] TRUE
t4 <- arrangements::permutations(tVec3, 6)
identical(t1, t4)
#> [1] TRUE
基准:
microbenchmark(
cbRcppAlgosPar = RcppAlgos::permuteGeneral(tVec3, 6,
nThreads = numThreads),
cbRcppAlgosSer = RcppAlgos::permuteGeneral(tVec3, 6),
cbGtools = gtools::permutations(10, 6, tVec3),
cbArrangements = arrangements::permutations(tVec3, 6),
unit = "relative"
)
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgosPar 1.000 1.000 1.000 1.000 1.000 1.0000 100
#> cbRcppAlgosSer 1.204 1.553 1.522 1.523 1.509 0.5722 100
#> cbGtools 357.078 308.978 288.396 301.611 292.862 64.8564 100
#> cbArrangements 2.356 2.361 2.183 2.292 2.224 0.4605 100
接下来,我们检查不用一般向量替换的排列(返回所有排列)。
RcppAlgos
gtools
combinat
multicool
arrangements
如何:
tVec3Prime <- tVec3[1:9]
## For RcppAlgos, arrangements, & gtools (see above)
t4 <- combinat::permn(tVec3Prime) ## returns a list of vectors
## convert to a matrix
t4 <- do.call(rbind, t4)
t5 <- multicool::allPerm(multicool::initMC(tVec3Prime)) ## returns a matrix with n columns
all.equal(t4[do.call(order,as.data.frame(t4)),],
t5[do.call(order,as.data.frame(t5)),])
#> [1] TRUE
基准:
microbenchmark(
cbRcppAlgosPar = RcppAlgos::permuteGeneral(tVec3Prime,
nThreads = numThreads),
cbRcppAlgosSer = RcppAlgos::permuteGeneral(tVec3Prime),
cbGtools = gtools::permutations(9, 9, tVec3Prime),
cbCombinat = combinat::permn(tVec3Prime),
cbMulticool = multicool::allPerm(multicool::initMC(tVec3Prime)),
cbArrangements = arrangements::permutations(tVec3Prime),
times = 25,
unit = "relative"
)
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgosPar 1.000 1.000 1.000 1.000 1.000 1.0 25
#> cbRcppAlgosSer 1.555 2.187 2.616 2.190 2.274 10.3 25
#> cbGtools 1913.125 1850.589 1893.918 1877.707 1915.601 2124.5 25
#> cbCombinat 508.360 512.182 562.042 532.123 595.722 715.3 25
#> cbMulticool 103.061 103.694 128.480 118.169 127.118 208.3 25
#> cbArrangements 3.216 3.583 13.566 3.544 3.561 139.2 25
现在,我们检查排列,而不替换(返回所有排列)。1:n
RcppAlgos
gtools
combinat
multicool
partitions
arrangements
如何:
t1 <- partitions::perms(9) ## returns an object of type 'partition' with n rows
identical(t(as.matrix(t1)), RcppAlgos::permuteGeneral(9))
#> [1] TRUE
基准:
microbenchmark(
cbRcppAlgosPar = RcppAlgos::permuteGeneral(9, nThreads = numThreads),
cbRcppAlgosSer = RcppAlgos::permuteGeneral(9),
cbGtools = gtools::permutations(9, 9),
cbCombinat = combinat::permn(9),
cbMulticool = multicool::allPerm(multicool::initMC(1:9)),
cbPartitions = partitions::perms(9),
cbArrangements = arrangements::permutations(9),
times = 25,
unit = "relative"
)
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgosPar 1.000 1.000 1.000 1.000 1.000 1.000 25
#> cbRcppAlgosSer 1.583 2.218 2.587 2.261 2.591 1.814 25
#> cbGtools 2010.303 1855.443 1266.853 1898.458 1903.055 217.422 25
#> cbCombinat 511.196 515.197 392.252 533.737 652.125 86.305 25
#> cbMulticool 108.152 103.188 80.469 108.227 120.804 23.504 25
#> cbPartitions 6.139 6.018 7.167 5.993 6.403 9.446 25
#> cbArrangements 4.089 3.797 3.135 3.791 3.760 1.858 25
最后,我们用替换来检查排列。
RcppAlgos
gtools
arrangements
如何:
t1 <- RcppAlgos::permuteGeneral(tVec3, 5, repetition = TRUE)
t3 <- gtools::permutations(10, 5, tVec3, repeats.allowed = TRUE)
t4 <- arrangements::permutations(x = tVec3, k = 5, replace = TRUE)
identical(t1, t3)
#> [1] TRUE
identical(t1, t4)
#> [1] TRUE
考虑到到目前为止的结果,下一个基准有点令人惊讶。
microbenchmark(
cbRcppAlgosPar = RcppAlgos::permuteGeneral(
tVec3, 5, TRUE, nThreads = numThreads
),
cbRcppAlgosSer = RcppAlgos::permuteGeneral(tVec3, 5, TRUE),
cbGtools = gtools::permutations(10, 5, tVec3, repeats.allowed = TRUE),
cbArrangements = arrangements::permutations(tVec3, 5, replace = TRUE),
unit = "relative"
)
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgosPar 1.000 1.000 1.000 1.000 1.000 1.000 100
#> cbRcppAlgosSer 1.307 1.669 1.465 1.561 1.513 1.015 100
#> cbGtools 6.364 6.188 5.448 5.762 5.434 1.625 100
#> cbArrangements 2.584 2.442 1.824 2.265 2.135 0.117 100
这不是错别字...... 几乎和其他编译的函数一样快。我鼓励读者去看看它的源代码,因为它是最优雅的编程展示之一(或其他)。gtools::permutations
gtools::permutations
R
5. 多组
首先,我们检查多集的组合。
RcppAlgos
arrangements
要查找多集的组合/排列,请使用参数指定源向量 的每个元素 , 重复的次数。RcppAlgos
freqs
v
set.seed(496)
myFreqs <- sample(1:5, 12, replace = TRUE)
## This is how many times each element will be repeated
tVec4 <- as.integer(c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233))
t1 <- RcppAlgos::comboGeneral(tVec4, 12, freqs = myFreqs)
t3 <- arrangements::combinations(tVec4, 12, freq = myFreqs)
identical(t1, t3)
#> [1] TRUE
基准:
microbenchmark(
cbRcppAlgosPar = RcppAlgos::comboGeneral(
tVec4, 12, freqs = myFreqs, nThreads = numThreads
),
cbRcppAlgosSer = RcppAlgos::comboGeneral(tVec4, 12, freqs = myFreqs),
cbArrangements = arrangements::combinations(tVec4, 12, freq = myFreqs),
unit = "relative"
)
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgosPar 1.000 1.000 1.000 1.000 1.000 1.0000 100
#> cbRcppAlgosSer 3.197 3.012 2.003 2.831 2.681 0.1658 100
#> cbArrangements 9.391 7.830 4.901 7.252 6.731 0.3140 100
对于一次选择 m 的多集排列,我们有:
RcppAlgos
arrangements
如何:
set.seed(123)
tVec5 <- sort(runif(5))
t1 <- RcppAlgos::permuteGeneral(tVec5, 8, freqs = rep(4, 5))
t3 <- arrangements::permutations(tVec5, 8, freq = rep(4, 5))
identical(t1, t3)
#> [1] TRUE
基准:
microbenchmark(
cbRcppAlgosPar = RcppAlgos::permuteGeneral(
tVec5, 8, freqs = rep(4, 5), nThreads = numThreads
),
cbRcppAlgosSer = RcppAlgos::permuteGeneral(tVec5, 8, freqs = rep(4, 5)),
cbArrangements = arrangements::permutations(tVec5, 8, freq = rep(4, 5)),
unit = "relative"
)
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgosPar 1.000 1.000 1.000 1.000 1.000 1.000 100
#> cbRcppAlgosSer 3.336 3.326 2.990 3.330 3.517 2.106 100
#> cbArrangements 3.751 3.746 3.346 3.757 3.840 2.305 100
对于返回所有排列的多集排列,我们有:
RcppAlgos
multicool
partitions
arrangements
如何:
tVec6 <- (1:5)^3
## For multicool, you must have the elements explicitly repeated
tVec6Prime <- rep(tVec6, times = rep(2, 5))
## for comparison
t1 <- RcppAlgos::permuteGeneral(tVec6, freqs = rep(2, 5))
t2 <- partitions::multiset(tVec6Prime)
t3 <- multicool::allPerm(multicool::initMC(tVec6Prime))
t4 <- arrangements::permutations(tVec6, freq = rep(2, 5))
## the package partitions, returns class of integer
## whereas RcppAlgos preserves class of tVec6 (i.e. numeric)
all.equal(t1, t(as.matrix(t2)))
#> [1] TRUE
identical(t1[do.call(order,as.data.frame(t1)),],
t3[do.call(order,as.data.frame(t3)),])
#> [1] TRUE
identical(t1, t4)
#> [1] TRUE
基准:
microbenchmark(
cbRcppAlgosPar = RcppAlgos::permuteGeneral(
tVec6, freqs = rep(2, 5), nThreads = numThreads
),
cbRcppAlgosSer = RcppAlgos::permuteGeneral(tVec6, freqs = rep(2, 5)),
cbMulticool = multicool::allPerm(multicool::initMC(tVec6Prime)),
cbPartitions = partitions::multiset(tVec6Prime),
cbArrangements = arrangements::permutations(tVec6, freq = rep(2, 5)),
unit = "relative"
)
#> Unit: relative
#> expr min lq mean median uq max neval
#> cbRcppAlgosPar 1.000 1.000 1.000 1.000 1.000 1.0000 100
#> cbRcppAlgosSer 2.485 2.141 2.289 2.584 2.511 1.0250 100
#> cbMulticool 80.195 66.237 45.540 64.971 66.057 3.5655 100
#> cbPartitions 5.731 4.786 3.925 4.719 4.953 0.4558 100
#> cbArrangements 2.999 2.907 3.270 2.966 2.906 3.1214 100
6. 总结
两者都是用于重新排列向量元素的成熟包。有了更多选项(请参阅上面的概述),有了 ,您可以重新排列 .有了 ,就可以重新排列多组。尽管出于这个问题的目的而受到限制,但它是一个强大的功能,具有用于处理整数分区的高效功能。gtools
combinat
gtools
combinat
factors
multicool
partitions
arrangements
- 输出按字典顺序排列。
- 允许用户通过参数(“row : row-major”、“colmnn : column-major” 和 “list : list”)指定格式。
layout
- 提供方便的方法,例如在使用迭代器时 &。
collect
getnext
- 允许通过 生成多个组合/排列。注意 (via)和(via)也能够做到这一点。
2^31 - 1
getnext
RcppAlgos
nextItem
multicool
nextPerm
- GMP 支持允许探索载体的组合/排列,并具有许多结果。
观察:
icomb <- arrangements::icombinations(1000, 7)
icomb$getnext()
#> [1] 1 2 3 4 5 6 7
icomb$getnext(d = 5)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1 2 3 4 5 6 8
#> [2,] 1 2 3 4 5 6 9
#> [3,] 1 2 3 4 5 6 10
#> [4,] 1 2 3 4 5 6 11
#> [5,] 1 2 3 4 5 6 12
当您只想要几个组合/排列时,此功能非常好。使用传统方法,您必须生成所有组合/排列,然后生成子集。这将使前面的例子变得不可能,因为结果不止于此(即 = 194280608456793000)。10^17
ncombinations(1000, 7, bigz = TRUE)
此功能以及对 中的生成器的改进使其在内存方面非常有效。arrangements
RcppAlgos
- 输出按字典顺序排列。
- 有一些方便的约束功能,我们不会在这里讨论,因为它们与这个问题无关。我只想指出,利用这些功能可以解决的问题类型是创建此包的动机(分区、子集和等)。
- GMP 支持允许探索载体的组合/排列,并具有许多结果。
- Produce results in parallel using the or arguments.
Parallel
nThreads
- Similar to , there is a argument for applying a function to each result (See also ).
combn
FUN
FUN.VALUE
- Provides flexible and merory efficient iterators that allow for bidirectional iteration as well as random access.
nextItem
||: Retrieve the next lexicographical result(s)nextNIter
nextRemaining
prevItem
||: Retrieve the previous lexicographical result(s)prevNIter
prevRemaining
front
||: Random access methodsback
[[
- Allows for easy generation of more than results from any starting place.
2^31 - 1
Observe:
iter <- RcppAlgos::comboIter(1000, 7)
# first combinations
iter@nextIter()
#> [1] 1 2 3 4 5 6 7
# next 5 combinations
iter@nextNIter(5)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1 2 3 4 5 6 8
#> [2,] 1 2 3 4 5 6 9
#> [3,] 1 2 3 4 5 6 10
#> [4,] 1 2 3 4 5 6 11
#> [5,] 1 2 3 4 5 6 12
# from the current state, the previous combination
iter@prevIter()
#> [1] 1 2 3 4 5 6 11
# the last combination
iter@back()
#> [1] 994 995 996 997 998 999 1000
# the 5th combination
iter[[5]]
#> [1] 1 2 3 4 5 6 11
# you can even pass a vector of indices
iter[[c(1, 3, 5)]]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1 2 3 4 5 6 7
#> [2,] 1 2 3 4 5 6 9
#> [3,] 1 2 3 4 5 6 11
# start iterating from any index
iter[[gmp::pow.bigz(2, 31)]]
#> [1] 1 2 3 17 138 928 954
# get useful info about the current state
iter@summary()
#> $description
#> [1] "Combinations of 1000 choose 7"
#>
#> $currentIndex
#> Big Integer ('bigz') :
#> [1] 2147483648
#>
#> $totalResults
#> Big Integer ('bigz') :
#> [1] 194280608456793000
#>
#> $totalRemaining
#> Big Integer ('bigz') :
#> [1] 194280606309309352
## get next ieteration
iter@nextIter()
#> [1] 1 2 3 17 138 928 955
In case you were wondering how each package scales, I will leave you with this final example that measures how fast and the packages can generate over 100 million results. Note, is left out here as it will throw the error: . We also leave out as it takes quite some time to execute. Curiously, the differences in memory usage between and is quite bizarre given that they are only marginally different (see under the “Authors” section).RcppAlgos
arrangements
gtools::combinations
evaluation nested too deeply...
combn
utils::combn
combinat::combn
?utils::combn
Observe:
set.seed(2187)
tVec7 <- sort(sample(10^7, 10^3))
## 166,167,000 Combinations
system.time(RcppAlgos::comboGeneral(tVec7, 3))
#> user system elapsed
#> 0.386 0.105 0.490
system.time(arrangements::combinations(x = tVec7, k = 3))
#> user system elapsed
#> 0.439 0.105 0.545
## 124,251,000 Permuations
system.time(RcppAlgos::permuteGeneral(tVec7[1:500], 3))
#> user system elapsed
#> 0.141 0.076 0.218
system.time(arrangements::permutations(x = tVec7[1:500], k = 3))
#> user system elapsed
#> 0.386 0.077 0.463
7. Memory
When executing as well as , the memory will jump almost 2 Gbs before calling . This seems about right as ). However, when executing , the memory behavior was eratic (e.g. sometimes it would use all 16 Gb of memory and other times it would only spike a couple of Gbs). When I tested this on the Windows set-up, it would often crash.comboGeneral
arrangements::combinations
gc
#rows * #nols * bytesPerCell / 2^30 bytes = choose(1000,3) * 3 * 4 / 2^30 bytes = (166167000 * 3 * 4)/2^30 = 1.857 Gbs
combn
We can confirm this using along with . Observe:Rprof
summaryRporf
Rprof("RcppAlgos.out", memory.profiling = TRUE)
t1 <- RcppAlgos::comboGeneral(tVec7, 3)
Rprof(NULL)
head(summaryRprof("RcppAlgos.out", memory = "both")$by.total, n = 1)
#> total.time total.pct mem.total self.time self.pct
#> "RcppAlgos::comboGeneral" 0.42 100 1902 0.42 100
Rprof("arrangements.out", memory.profiling = TRUE)
t3 <- arrangements::combinations(tVec7, 3)
Rprof(NULL)
head(summaryRprof("arrangements.out", memory = "both")$by.total, n = 1)
#> total.time total.pct mem.total self.time self.pct
#> "arrangements::combinations" 0.5 100 1902 0.5 100
With & , registers just over .RcppAlgos
arrangements
mem.total
1900 Mb
And here is the memory profile on a smaller vector.
tVec7Prime <- tVec7[1:300]
Rprof("combinat.out", memory.profiling = TRUE)
t3 <- combinat::combn(tVec7Prime, 3)
Rprof(NULL)
head(summaryRprof("combinat.out", memory = "both")$by.total, n = 1)
#> total.time total.pct mem.total self.time self.pct
#> "combinat::combn" 2.1 100 1055 1.98 94.29
Rprof("utils.out", memory.profiling = TRUE)
t4 <- utils::combn(tVec7Prime, 3)
Rprof(NULL)
head(summaryRprof("utils.out", memory = "both")$by.total, n = 1)
#> total.time total.pct mem.total self.time self.pct
#> "utils::combn" 1.6 100 2059 1.6 100
Rprof("gtools.out", memory.profiling = TRUE)
t5 <- gtools::combinations(300, 3, tVec7Prime)
Rprof(NULL)
head(summaryRprof("gtools.out", memory = "both")$by.total, n = 1)
#> total.time total.pct mem.total self.time self.pct
#> "rbind" 1.62 100 6659 1.46 90.12
Interestingly, and use different amounts of memory and take different amounts of time to execute. This does not hold up with smaller vectors:utils::combn
combinat::combn
microbenchmark(combinat::combn(2:13, 6), utils::combn(2:13, 6))
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> combinat::combn(2:13, 6) 313.4 326.7 329.4 328.1 330.4 370.6 100
#> utils::combn(2:13, 6) 378.3 393.1 397.0 395.2 399.2 451.2 100
And with the total memory used is a little over 3x as much as . It should be noted that for these 3 packages, I obtained different results every-time I ran them (e.g. for sometimes I would get 9000 Mb and then I would get 13000 Mb).gtools
utils
combinat::combn
Still, none can match OR . Both only use 51 Mb when ran on the example above.RcppAlgos
arrangements
benchmark script: https://github.com/jwood000/RcppAlgos/blob/main/scripts/SO_Comb_Perm_in_R.R
*:向 Miklós Bóna 的 A Walk through Combinatorics 致敬
评论
permuteGeneral()
expand.grid(1:10,1:100,1:5)
mylist = list(list(c(1,2,3,3,4),c(10,20,30,30,40,40,40,55)),list(c(2,4,6,6),1:10,1:50))
sapply(mylist,expand.grid)
permuteGeneral()
expand.grid()
combn
expand.grid
我将从教程中复制文本,以防该链接中的信息丢失。
对于 BIBD,在 k 个观测值的 b 块中有 v 个重复 r 次的处理。第五个参数 lambda 记录了设计中每对处理发生的模块数。
我们首先在会话中加载 crossdes 包:
require(crossdes)
函数 find.BIB 用于生成具有特定数量的处理、块(设计的行)和每个块的元素(设计的列)的块设计。
考虑一个示例,其中有五个处理,分为四个块,每个块有三个元素。我们可以通过以下方式创建块设计:
> find. BIB(5, 4, 3)
[,1] [,2] [,3]
[1,] 1 3 4
[2,] 2 4 5
[3,] 2 3 5
[4,] 1 2 5
此设计不是 BIBD,因为处理在设计中重复的次数并不相同,我们可以使用 isGYD 函数进行检查。对于此示例:
> isGYD(find. BIB(5, 4, 3))
[1] The design is neither balanced w.r.t. rows nor w.r.t. columns.
这证实了我们从设计中看到的内容。
本教程继续提供七种处理、七个块和三个元素,但我不会包括这些元素。
评论
上一个:R:如何处理没有日期的时间?
评论