如何在 R 中执行嵌套重采样 - 多样性指数不确定性-解网

问：

我目前正在尝试应用 Herrmann 等人（2022）中描述的重采样方法，其中包含外部和内部重采样循环：我的数据集由 100 行 x 60 列组成。行对应于尝试 ID（每行一次尝试），列对应于物种（每列一个物种）。每个单元格表示每个物种 i 中每个尝试的个体计数 K = NiK

NiK 可能随重采样而变化，但给定尝试中所有混杂物种的个体总数（称为 NK，对应于行和）需要恒定。

让我们生成一个类似的数据帧：

df <- matrix(sample(0:15, 6000, replace = T), ncol = 60)
df

这是我用于外部循环的代码：

#Outer loop
#Resampling among the 100 tries 1000 times
seed(1234)
outer = matrix(replicate(1000, sample(1:nrow(df), nrow(df), replace = T)), ncol = 100)

现在添加内部循环，我被卡住了。内循环在每次尝试中对给定物种的个体数量进行重采样，而外循环在内循环生成的新数据集中重新采样尝试：

i = 物种数

r = 外部重采样 ID

inner <- NULL
complete <- NULL
for (r in 1:1000) {
  for (i in 1:ncol(df)) {
    inner = cbind(inner,sample(df[,i], nrow(df), replace = T))
  }
  complete = rbind(complete, inner[outer[r,],])
  inner <- NULL
}
complete

循环工作正常，可以通过内部和外部循环生成 100 次尝试的 1000 次重采样，但我目前不知道如何将每行限制为总和 N K，这可以针对给定的尝试 K 计算：

rowSums(df[K,])

有谁知道一种函数或方法，可以允许在使用内循环在每列中重新采样时保持 NK 常数？

提前感谢您的帮助，并希望我的英语和我的解释可以理解。

r 嵌套约束采样重采样

set.seed(1234)
K <- 100L # "tries"
N <- 60L # species
R <- 1e3L # resamples
df <- matrix(sample(0:15, N*K, replace = 1), K, N)
rs <- rowSums(df)

outer <- mapply( # outer sampling
  \(x) rowSums(
    # inner sampling
    mapply(\(i) rmultinom(1, rs[i], df[i,]), sample(nrow(df), K, 1))
  ),
  1:R
)

dim(outer)
#> [1]   60 1000

现在的每一列都对应一个外部重采样。outer

0赞 CaptainKrabs 8/29/2023 #2

感谢 jblood94 的快速回答！也许我误解了你的答案，但我的目标是用常量行和对列进行重采样，而不是用常量行和对行进行重采样。

我终于找到了解决问题的方法。我正在用错误的数据帧解决问题。

事实上，我的原始数据集看起来更像是：

ID <- sort(rep(1:100,times=c(sample(1:10, 100, replace = T))))
df <- data.frame(ID, Species.name= sample(letters[1:26], length(ID), replace = T))
df

with ID = 尝试 ID 并 Species.name 物种的识别。

使用我获得了一个类似的数据帧，如我原来的帖子中所述：acast()

 library(reshape2)
 acast(df, ID~Species.name)

适用于计算多样性指数。

尽管如此，使用 df，在保持每次尝试的个体数的同时执行嵌套采样更容易，因为个体等于一行！

工作代码：

 library(reshape2)
   library(dplyr)
#Get species complete list
sptot <- unique(df$Species.name)
length(sptot)

#Get try ID
nID <- unique((df$ID))
length(nID)

#Extract number of individuals per try to keep it constant
groupsize = df %>%
  group_by(ID)%>%
  summarise(n = n())
groupsize

#Test suitability of the inner resampling
outer = data.frame(ID = as.integer(sample(nID, length(nID), replace = T)), Ntry = 
seq(1,length(nID),1))
outerdata = left_join(outer, df, by = "ID", relationship = "many-to- 
many")

df2 = outerdata %>%
  group_by(Ntry, ID)%>%
  summarise(n = n())%>%
  left_join(groupsize, by = "ID")%>%
  mutate(diff = n.x - n.y)
mean(df2$diff) #Same number of row (individuals) per try

df3 = outerdata %>%  
  left_join(groupsize, by = "ID") %>%
  group_by(Ntry) %>%
  mutate(Sp = sample(Species.name, replace = T)) %>%
  filter(length(Sp) == n) %>%
  ungroup()
df3

check = df3 %>%
  group_by(ID, Ntry)%>%
  summarise(nfacor = length(unique(Species.name)),
            nfacsamp = length(unique(Sp)))

check$V = ifelse(check$nfacsamp <= check$nfacor, "OK", "FALSE")
check #no more unique species in resampling than in observed


#Create the loop
inner <- NULL
outer <- NULL
complete <- NULL
for (i in 1:1000) {
  #Outer resampling
  outer = data.frame(ID = as.integer(sample(nID, length(nID), replace = T), Ntry = seq(1,length(nID),1))
  outerdata = left_join(outer, df, by = "ID", relationship = "many-to-many")
  
  
  #Inner resampling
  inner = outerdata %>%  
    left_join(groupsize, by = "ID") %>%
    group_by(Ntry) %>%
    mutate(Sp = sample(Species.name, replace = T)) %>%
    filter(length(Sp) == n) %>%
    ungroup()
  inner = inner[,c("Ntry", "Sp")]
  
  #Adding missing column (i.e missing species since it is possible that 
  #some species are not resampled in a given iteration)
  sampcol <- colnames(acast(inner, Ntry ~ Sp, fun.aggregate = length))
  misscol = sub(" ", "\\.", setdiff(as.character(sptot), sampcol))
  complete = data.frame(Ntry= rownames(acast(inner, Ntry ~ Sp, fun.aggregate = length)), acast(inner, Ntry ~ Sp, fun.aggregate = length), bloc = i)
  complete[,misscol] <- 0
}
complete

然后，我们得到一个 1000 x 100 行的数据帧，其列数对应于物种，非常适合计算多样性指数。令人讨厌的是错误消息，但它不会影响结果。acast()

根据 Herrmann 等人（2022），我们现在需要：1）对每个区块的列求和以获得 1000 行，每个集合物种计数在重新采样尝试中，2）然后计算每个多样性指数的 1000 个值，并得到 Efron 95% CI。

希望这个线程能帮助某人进行类似的分析！

上一个：predict（）函数生成的值与 GLMER 中的手动计算不同

下一个：嵌套后无法访问因变量以拟合 LMER 模型

如何在 R 中执行嵌套重采样 - 多样性指数不确定性

How to perform nested resampling in R - Diversity index uncertainty

评论