How to reorder cutree results for a confusion matrix?

Asked by: Christiano dos Santos. Asked: 10/24/2023. Last edited by: Christiano dos Santos. Updated: 10/24/2023. Views: 29

Q:

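I am running a hierarchical cluster analysis (HCA) on infrared spectral data from 300 samples, so my data frame is 300x3606. The variables are selected with: new_hca_data <- new_hca_data_all[,6:3606]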

hc.new <- hclust(dist(new_hca_data_region_sc, method = "euclidean"), method = 'ward.D')
clusterCut <- cutree(hc.new, k =5) 
dput(clusterCut)
c(a1 = 1L, a2 = 1L, a3 = 1L, a4 = 1L, a5 = 1L, a6 = 1L, a7 = 1L, 
a8 = 1L, a9 = 1L, a10 = 1L, a11 = 1L, a12 = 1L, a13 = 1L, a14 = 1L, 
a15 = 1L, a16 = 1L, a17 = 1L, a18 = 1L, a19 = 1L, a20 = 1L, a21 = 1L, 
a22 = 1L, a23 = 1L, a24 = 1L, a25 = 1L, a26 = 1L, a27 = 1L, a28 = 1L, 
a29 = 1L, a30 = 1L, a31 = 1L, a32 = 1L, a33 = 1L, a34 = 1L, a35 = 1L, 
a36 = 1L, a37 = 1L, a38 = 1L, a39 = 1L, a40 = 1L, a41 = 1L, a42 = 1L, 
a43 = 1L, a44 = 1L, a45 = 1L, a46 = 1L, a47 = 1L, a48 = 1L, a49 = 1L, 
a50 = 1L, a51 = 1L, a52 = 1L, a53 = 1L, a54 = 1L, a55 = 1L, a56 = 1L, 
a57 = 1L, a58 = 1L, a59 = 1L, a60 = 1L, a61 = 1L, a62 = 1L, a63 = 1L, 
a64 = 1L, b1 = 1L, b2 = 2L, b3 = 2L, b4 = 2L, b5 = 2L, b6 = 2L, 
b7 = 2L, b8 = 2L, b9 = 2L, b10 = 2L, b11 = 2L, b12 = 2L, b13 = 2L, 
b14 = 1L, b15 = 2L, b16 = 2L, b17 = 2L, b18 = 2L, b19 = 2L, b20 = 2L, 
b21 = 2L, b22 = 2L, b23 = 2L, b24 = 2L, b25 = 2L, b26 = 2L, b27 = 2L, 
b28 = 2L, b29 = 2L, b30 = 2L, b31 = 2L, b32 = 2L, b33 = 2L, b34 = 2L, 
b35 = 2L, b36 = 2L, b37 = 2L, b38 = 2L, b39 = 2L, b40 = 1L, b41 = 2L, 
b42 = 2L, b43 = 2L, b44 = 2L, b45 = 2L, b46 = 2L, b47 = 2L, b48 = 2L, 
b49 = 2L, b50 = 2L, b51 = 2L, b52 = 2L, c1 = 3L, c2 = 4L, c3 = 1L, 
c4 = 3L, c5 = 3L, c6 = 3L, c7 = 3L, c8 = 3L, c9 = 3L, c10 = 3L, 
c11 = 3L, c12 = 3L, c13 = 3L, c14 = 3L, c15 = 3L, c16 = 3L, c17 = 4L, 
c18 = 1L, c19 = 3L, c20 = 3L, c21 = 3L, c22 = 3L, c23 = 3L, c24 = 3L, 
c25 = 3L, c26 = 3L, c27 = 3L, c28 = 3L, c29 = 3L, c30 = 3L, c31 = 3L, 
c32 = 4L, c33 = 1L, c34 = 3L, c35 = 3L, c36 = 3L, c37 = 3L, c38 = 3L, 
c39 = 3L, c40 = 3L, c41 = 3L, c42 = 3L, c43 = 3L, c44 = 3L, c45 = 3L, 
c46 = 3L, c47 = 4L, c48 = 1L, c49 = 3L, c50 = 3L, c51 = 3L, c52 = 3L, 
c53 = 3L, c54 = 3L, c55 = 3L, c56 = 3L, c57 = 3L, c58 = 3L, c59 = 3L, 
c60 = 3L, k1 = 5L, k2 = 5L, k3 = 5L, k4 = 5L, k5 = 5L, k6 = 5L, 
k7 = 5L, k8 = 5L, k9 = 5L, k10 = 5L, k11 = 5L, k12 = 5L, k13 = 5L, 
k14 = 1L, k15 = 5L, k16 = 5L, k17 = 5L, k18 = 5L, k19 = 5L, k20 = 5L, 
k21 = 5L, k22 = 5L, k23 = 5L, k24 = 5L, k25 = 5L, k26 = 5L, k27 = 5L, 
k28 = 5L, k29 = 5L, k30 = 1L, k31 = 5L, k32 = 5L, k33 = 5L, k34 = 5L, 
k35 = 5L, k36 = 5L, k37 = 5L, k38 = 5L, k39 = 5L, k40 = 5L, k41 = 5L, 
k42 = 5L, k43 = 5L, k44 = 5L, k45 = 5L, k46 = 1L, k47 = 5L, k48 = 5L, 
k49 = 5L, k50 = 5L, k51 = 5L, k52 = 5L, k53 = 5L, k54 = 5L, k55 = 5L, 
k56 = 5L, k57 = 5L, k58 = 5L, k59 = 5L, k60 = 5L, k61 = 5L, k62 = 1L, 
k63 = 5L, k64 = 5L, f1 = 4L, f2 = 4L, f3 = 4L, f4 = 4L, f5 = 4L, 
f6 = 4L, f7 = 4L, f8 = 4L, f9 = 4L, f10 = 4L, f11 = 4L, f12 = 4L, 
f13 = 4L, f14 = 4L, f15 = 4L, f16 = 4L, f17 = 4L, f18 = 4L, f19 = 4L, 
f20 = 4L, f21 = 4L, f22 = 4L, f23 = 4L, f24 = 4L, f25 = 4L, f26 = 4L, 
f27 = 4L, f28 = 4L, f29 = 4L, f30 = 4L, f31 = 4L, f32 = 4L, f33 = 4L, 
f34 = 4L, f35 = 4L, f36 = 4L, f37 = 4L, f38 = 4L, f39 = 4L, f40 = 4L, 
f41 = 4L, f42 = 4L, f43 = 4L, f44 = 4L, f45 = 4L, f46 = 4L, f47 = 4L, 
f48 = 4L, f49 = 4L, f50 = 4L, f51 = 4L, f52 = 4L, f53 = 4L, f54 = 4L, 
f55 = 4L, f56 = 4L, f57 = 4L, f58 = 4L, f59 = 4L, f60 = 4L)

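I do have a column with the known labels, which has 5 classes, and I want to build a confusion matrix (CM) to check the accuracy. This is how I did it: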

nome_clusters <- c("ANF", "BZD", "CAN", "FEN", "CAT")
tabela_ordenada <- table(clusterCut, new_hca_data_all$grupo)
tabela_ordenada  # CAT & FEN columns are in wrong order
clusterCut ANF BZD CAN CAT FEN
         1  64   3   4   4   0
         2   0  49   0   0   0
         3   0   0  52   0   0
         4   0   0   4   0  60
         5   0   0   0  60   0
ordem_colunas <- nome_clusters
tabela_ordenada <- tabela_ordenada[, ordem_colunas]
print(tabela_ordenada)
      
clusterCut ANF BZD CAN FEN CAT
         1  64   3   4   0   4
         2   0  49   0   0   0
         3   0   0  52   0   0
         4   0   0   4  60   0
         5   0   0   0   0  60
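
The CAT and FEN columns come out "in the wrong order" because table() orders its columns by factor level, and a character column is converted to a factor with sorted levels by default. A minimal sketch, using made-up vectors rather than the real data, of fixing the column order up front by declaring the levels explicitly:

```r
# Made-up stand-ins for new_hca_data_all$grupo and clusterCut (illustration only).
grupo   <- c("FEN", "ANF", "CAT", "BZD", "CAN", "FEN")
cluster <- c(4L, 1L, 5L, 2L, 3L, 4L)

nome_clusters <- c("ANF", "BZD", "CAN", "FEN", "CAT")

# Default behaviour: columns follow the sorted labels.
cols_default <- colnames(table(cluster, grupo))

# Declaring the levels fixes the column order without subsetting afterwards.
grupo_f <- factor(grupo, levels = nome_clusters)
cols_fixed <- colnames(table(cluster, grupo_f))

print(cols_default)
print(cols_fixed)
```

This only controls the column order; it still relies on knowing which cluster number goes with which class.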

Now I can run the CM analysis.

hca_gr
[1] "anf" "bzd" "can" "cat" "fen"
matriz_confusao <- as.table(matriz_confusao)
colnames(matriz_confusao) <- hca_gr
rownames(matriz_confusao) <- hca_gr
# Compute the evaluation metrics (confusionMatrix() comes from the caret package)
avaliacao <- confusionMatrix(matriz_confusao)
# Show results
print(avaliacao)
Confusion Matrix and Statistics

    anf bzd can cat fen
anf  64   3   4   0   4
bzd   0  49   0   0   0
can   0   0  52   0   0
cat   0   0   4  60   0
fen   0   0   0   0  60

Overall Statistics
                                          
               Accuracy : 0.95            
                 95% CI : (0.9189, 0.9717)
    No Information Rate : 0.2133          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.9374          
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: anf Class: bzd Class: can Class: cat Class: fen
Sensitivity              1.0000     0.9423     0.8667     1.0000     0.9375
Specificity              0.9534     1.0000     1.0000     0.9833     1.0000
Pos Pred Value           0.8533     1.0000     1.0000     0.9375     1.0000
Neg Pred Value           1.0000     0.9880     0.9677     1.0000     0.9833
Prevalence               0.2133     0.1733     0.2000     0.2000     0.2133
Detection Rate           0.2133     0.1633     0.1733     0.2000     0.2000
Detection Prevalence     0.2500     0.1633     0.1733     0.2133     0.2000
Balanced Accuracy        0.9767     0.9712     0.9333     0.9917     0.9688
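
As a cross-check on the printout, the headline numbers can be reproduced from the counts with base R alone. The sketch below copies the counts from the reordered table (rows are clusters, columns are known classes) and keeps that table's column labels.

```r
# Counts copied from the reordered table in the question.
cm <- matrix(
  c(64,  3,  4,  0,  4,
     0, 49,  0,  0,  0,
     0,  0, 52,  0,  0,
     0,  0,  4, 60,  0,
     0,  0,  0,  0, 60),
  nrow = 5, byrow = TRUE,
  dimnames = list(cluster = as.character(1:5),
                  class   = c("ANF", "BZD", "CAN", "FEN", "CAT"))
)

accuracy       <- sum(diag(cm)) / sum(cm)   # 285 / 300 = 0.95
sensitivity    <- diag(cm) / colSums(cm)    # per known class (column)
pos_pred_value <- diag(cm) / rowSums(cm)    # per cluster (row)

print(accuracy)
print(round(sensitivity, 4))
print(round(pos_pred_value, 4))
```

The diagonal is only meaningful when row i and column i refer to the same class. Also, colnames<- and rownames<- relabel by position: if matriz_confusao was built from tabela_ordenada, its fourth and fifth columns hold the FEN and CAT counts, so the name vector has to follow that same order.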

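The problem is: cutree numbers the clusters in clusterCut in some way, and I have to figure out which column each number corresponds to and then reorder the columns before building the confusion matrix; otherwise the results are wrong. In this case it is easy to figure out, but I have other results where it is not so easy. The clusterCut vector has names, following this convention: ai (i=1-60) = anf, b = bzd, c = can, f = fen, k = cat. new_hca_data_all$sigla is the column in my df that these names come from.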
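
On where the numbers come from: stats::cutree appears to number clusters in the order in which they first show up among the observations, so the first observation is always in cluster 1. That matches the dput above, where a1 is in cluster 1, b2 opens cluster 2, c1 opens 3, c2 opens 4 and k1 opens 5. A small sketch on made-up data:

```r
# Toy data (made up for illustration): three well-separated groups,
# deliberately stored in the order "high", "low", "mid".
x <- matrix(c(10.0, 10.2, 0.0, 0.1, 5.0, 5.1), ncol = 1,
            dimnames = list(c("h1", "h2", "l1", "l2", "m1", "m2"), "value"))

hc <- hclust(dist(x), method = "ward.D")
cl <- cutree(hc, k = 3)
print(cl)

# Each cluster number equals the rank of that cluster's first appearance.
first_seen <- match(cl, unique(cl))
print(all(cl == first_seen))
```

So the numbers depend on row order and on which samples happen to land in which cluster, not on the class labels, which is why the correspondence has to be derived from the data.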

My question is: how can I use the information I have to write code that fixes this order, without guessing it the way I did?
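
One possible approach, sketched under assumptions rather than offered as the definitive fix, is to label each cluster with the known class that occurs most often in it, then tabulate those labels against the known classes using the same factor levels on both sides. The helper name label_clusters and the small clusterCut vector below are made up for illustration; the prefix-to-class mapping follows the naming convention stated in the question.

```r
# Hypothetical helper: label each cluster with its most frequent known class.
label_clusters <- function(cluster, truth) {
  truth  <- factor(truth)
  tab    <- table(cluster, truth)
  winner <- levels(truth)[apply(tab, 1, which.max)]  # one class per cluster
  names(winner) <- rownames(tab)
  if (anyDuplicated(winner)) {
    warning("two or more clusters share the same majority class")
  }
  factor(unname(winner[as.character(cluster)]), levels = levels(truth))
}

# Small made-up stand-in for clusterCut, using the question's naming convention.
clusterCut <- c(a1 = 1L, a2 = 1L, a3 = 1L, b1 = 1L, b2 = 2L, b3 = 2L,
                c1 = 3L, c2 = 3L, k1 = 4L, k2 = 4L, f1 = 5L, f2 = 5L)

# Known class recovered from the name prefix (a = ANF, b = BZD, c = CAN, f = FEN, k = CAT).
prefix_to_class <- c(a = "ANF", b = "BZD", c = "CAN", f = "FEN", k = "CAT")
prefix <- sub("[0-9]+$", "", names(clusterCut))
truth  <- factor(unname(prefix_to_class[prefix]),
                 levels = c("ANF", "BZD", "CAN", "FEN", "CAT"))

predicted <- label_clusters(clusterCut, truth)

# Rows and columns now share the same levels, so the diagonal lines up.
cm <- table(predicted = predicted, truth = truth)
print(cm)
# caret::confusionMatrix(cm) could then be applied, assuming caret is installed.
```

With the objects from the question, the call would presumably be label_clusters(clusterCut, new_hca_data_all$grupo), assuming the rows of the data frame are in the same order as clusterCut. Majority voting can assign the same class to two clusters (the helper warns when that happens); a one-to-one assignment, for example with clue::solve_LSAP on the contingency table, would be an alternative in that case.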

r analysis hierarchical-clustering


A: No answers yet