Significance tests across more than two levels of a categorical variable in R

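Question:

I am trying to determine whether the frequencies of a categorical variable with 8 levels differ significantly between two groups. In this case, both groups were asked for their favorite color and given 8 choices. I would like to know whether the frequency with which people in Group 1 chose a color differs significantly from the frequency with which people in Group 2 chose the same color.

For example, 64.2% of Grp 1 chose orange, compared with 53% of Grp 2. Is that difference significant? Here is a frequency table produced with tabpct():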

tabpct(all_data$Colors, all_data$Group, graph = F)

Column percent 
                         all_data$Group
all_data$Colors         Grp 1   %     Grp 2   %
           Red          3    (1.3)    2    (1.0)
           Blue         19   (8.4)    10   (5.0)
           Yellow       1    (0.4)    2    (1.0)
           Green        4    (1.8)    5    (2.5)
           Purple       1    (0.4)    2    (1.0)
           Orange       145  (64.2)   106  (53.0)
           Pink         1    (0.4)    1    (0.5)
           Brown        52   (23.0)   72   (36.0)
           Total        226  (100)    200  (100)

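I'm sure there is a simpler way, but I can't seem to figure it out. Any help would be much appreciated!

I tried fitting an ANOVA model and running a TukeyHSD test on it, but even though the data contain no NA, NaN, Inf, or 0 values, I still get an error: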

ColorComp <- aov(Color ~ Group, data = all_data)
TukeyHSD(ColorComp)

> Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
> NA/NaN/Inf in 'y'
> In addition: Warning message:
> In storage.mode(v) <- "double" : NAs introduced by coercion

I also tried a regression and got the same error.
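The error above is consistent with the response being a character column: aov() fits a linear model, which needs a numeric response, so the color labels would be coerced to NA. For two categorical variables, the usual starting point is a contingency table of counts instead. The sketch below is illustrative only. It assumes all_data has one row per respondent with columns Colors and Group, and it rebuilds a stand-in for that data frame from the counts in the tabpct() output, since the real data are not shown.

```r
# Counts copied from the tabpct() output in the question
colors <- c("Red", "Blue", "Yellow", "Green", "Purple", "Orange", "Pink", "Brown")
grp1 <- c(3, 19, 1, 4, 1, 145, 1, 52)
grp2 <- c(2, 10, 2, 5, 2, 106, 1, 72)

# Stand-in for the real all_data (assumption): one row per respondent
all_data <- data.frame(
  Colors = c(rep(colors, grp1), rep(colors, grp2)),
  Group  = rep(c("Grp 1", "Grp 2"), c(sum(grp1), sum(grp2))),
  stringsAsFactors = FALSE
)

# A character response cannot be used as the numeric outcome of aov() or lm()
print(is.numeric(all_data$Colors))

# Cross-tabulate the two categorical variables into an 8 x 2 table of counts
tab <- table(all_data$Colors, all_data$Group)
print(tab)
```

A table like tab can be passed directly to count-based tests such as fisher.test() or chisq.test().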

Tags: R, comparison, categorical data, ANOVA, frequency analysis

> read.table(text=txt, head=TRUE)
  Colors Grp1     X. Grp2   X..1
1    Red    3  (1.3)    2  (1.0)
2   Blue   19  (8.4)   10  (5.0)
3 Yellow    1  (0.4)    2  (1.0)
4  Green    4  (1.8)    5  (2.5)
5 Purple    1  (0.4)    2  (1.0)
6 Orange  145 (64.2)  106 (53.0)
7   Pink    1  (0.4)    1  (0.5)
8  Brown   52 (23.0)   72 (36.0)
> dat <-read.table(text=txt, head=TRUE)
> fisher.test(dat[c(2,4)])

    Fisher's Exact Test for Count Data

data:  dat[c(2, 4)]
p-value = 0.06452
alternative hypothesis: two.sided
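The txt object used above is not shown in the answer. A minimal sketch of one way to reproduce it, assuming txt simply holds the question's table as whitespace-separated text (the exact string is a guess; only the printed data frame comes from the answer):

```r
# Hypothetical reconstruction of `txt`: the question's table as plain text.
# The "%" headers are turned into X. and X..1 by read.table().
txt <- "Colors Grp1 % Grp2 %
Red 3 (1.3) 2 (1.0)
Blue 19 (8.4) 10 (5.0)
Yellow 1 (0.4) 2 (1.0)
Green 4 (1.8) 5 (2.5)
Purple 1 (0.4) 2 (1.0)
Orange 145 (64.2) 106 (53.0)
Pink 1 (0.4) 1 (0.5)
Brown 52 (23.0) 72 (36.0)"

dat <- read.table(text = txt, header = TRUE)

# Keep only the two count columns (2 and 4) and label rows by color
counts <- as.matrix(dat[c(2, 4)])
rownames(counts) <- dat$Colors

# Fisher's exact test on the 8 x 2 table of counts
res <- fisher.test(counts)
print(res)
```

Selecting columns 2 and 4 matters here: the percentage columns are read as text and are not part of the test.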

A chi-squared test can also be run, but its validity is questionable.

chisq.test(dat[c(2,4)])

    Pearson's Chi-squared test

data:  dat[c(2, 4)]
X-squared = 11.512, df = 7, p-value = 0.1178

Warning message:
In chisq.test(dat[c(2, 4)]) : Chi-squared approximation may be incorrect
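The warning is raised because some expected cell counts are small, which is where the chi-squared approximation becomes unreliable. A short sketch for inspecting them, using the same counts as above (the object names mat, expected, and n_small are just illustrative):

```r
# Observed counts from the question, one row per color
mat <- matrix(
  c(3, 2, 19, 10, 1, 2, 4, 5, 1, 2, 145, 106, 1, 1, 52, 72),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Red", "Blue", "Yellow", "Green", "Purple", "Orange", "Pink", "Brown"),
    c("Grp1", "Grp2")
  )
)

# Expected counts under independence: row total * column total / grand total.
# suppressWarnings() hides the approximation warning we are investigating.
expected <- suppressWarnings(chisq.test(mat))$expected
print(round(expected, 2))

# Count the cells that fall below the common rule-of-thumb threshold of 5
n_small <- sum(expected < 5)
cat(n_small, "of", length(expected), "expected counts are below 5\n")
```

With these counts, 10 of the 16 expected values are below 5 (the Red, Yellow, Green, Purple, and Pink rows), which is why an exact or simulation-based p-value is the safer choice here.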

0 upvotes DaveArmstrong 11/8/2023 #2

Here is the result of using simulate.p.value in chisq.test():

mat <- matrix(c(  3,   2,
                 19,  10,
                  1,   2,
                  4,   5,
                  1,   2,
                145, 106,
                  1,   1,
                 52,  72), ncol=2, byrow=TRUE)
colnames(mat) <- c("Grp1", "Grp2")
rownames(mat) <- c("Red", "Blue", "Yellow", "Green", "Purple", "Orange", "Pink", "Brown")
mat
#>        Grp1 Grp2
#> Red       3    2
#> Blue     19   10
#> Yellow    1    2
#> Green     4    5
#> Purple    1    2
#> Orange  145  106
#> Pink      1    1
#> Brown    52   72

chisq.test(mat, simulate.p.value=TRUE, B=10000)
#> 
#>  Pearson's Chi-squared test with simulated p-value (based on 10000
#>  replicates)
#> 
#> data:  mat
#> X-squared = 11.512, df = NA, p-value = 0.09839

Created on 2023-11-07 with reprex v2.0.2
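Because the simulated p-value comes from Monte Carlo resampling, it will vary slightly from run to run. A minimal sketch of making it reproducible by fixing the random seed first; the seed value 123 is arbitrary and not part of the answer above:

```r
# Same counts as in the answer
mat <- matrix(
  c(3, 2, 19, 10, 1, 2, 4, 5, 1, 2, 145, 106, 1, 1, 52, 72),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Red", "Blue", "Yellow", "Green", "Purple", "Orange", "Pink", "Brown"),
    c("Grp1", "Grp2")
  )
)

# Fix the seed (arbitrary value) so the Monte Carlo draws are repeatable
set.seed(123)
res1 <- chisq.test(mat, simulate.p.value = TRUE, B = 10000)

# Resetting the same seed reproduces exactly the same simulated p-value
set.seed(123)
res2 <- chisq.test(mat, simulate.p.value = TRUE, B = 10000)

print(res1$p.value)
print(identical(res1$p.value, res2$p.value))
```

The test statistic itself does not depend on the seed; only the simulated p-value does.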
