Optimal Strategy: Setting Classification Threshold in the Model vs. During Prediction

Q:

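I am working on a binary classification task and would like to understand the best practice for incorporating a class threshold into my model. Specifically, I want the model not only to make binary predictions but also to provide a measure of its confidence in each prediction. I am considering two approaches:

Threshold in the model: incorporate the class threshold directly into the model architecture, so that the binary prediction depends on whether the probability is above or below the threshold.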

import torch.nn as nn

class CustomBinaryClassifier(nn.Module):
    def __init__(self, in_features, threshold=0.5):
        super().__init__()
        self.linear = nn.Linear(in_features, 1)
        self.sigmoid = nn.Sigmoid()
        self.threshold = threshold

    def forward(self, x):
        logits = self.linear(x)
        probabilities = self.sigmoid(logits)
        # Apply threshold for class assignment
        classes = (probabilities >= self.threshold).float()
        return classes

# Create an instance of the model with a specified threshold
# (X is assumed to be a 2-D feature tensor of shape [n_samples, n_features])
threshold = 0.5
model = CustomBinaryClassifier(in_features=X.shape[1], threshold=threshold)

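Threshold as a post-processing step: compute class probabilities from the model's raw logits, then apply the threshold as a post-processing step to determine the binary class assignment.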

import torch

# Assuming you have a trained model 'model' that returns raw logits,
# and input data 'X_test'
model.eval()
with torch.no_grad():
    logits = model(X_test)
probs = torch.sigmoid(logits)

threshold = 0.5
predicted_classes = (probs >= threshold).float()

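What are the pros and cons of each approach, and when should I use one over the other? Also, if I use a threshold inside the model, does it affect the model's learning process and its confidence in its predictions?
One more thing: if I include the confidence computation (i.e. the probability) inside the model (as in approach 1, via the sigmoid), does that change how I compute the loss during training? Does it make the model better at estimating the confidence of its predictions and at improving its accuracy?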

I am looking for practical advice on how and when to set class thresholds in a binary classification model.

machine-learning deep-learning pytorch classification prediction

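The threshold is usually set after training, based on the metric you care about. It lets you trade off the model's precision against its recall. Imagine sweeping the threshold from 0 to 1. At 0, every example is predicted positive, so precision is poor (lots of false positives) but recall is high (no false negatives). At 1, the opposite holds.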
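To make the trade-off concrete, here is a small pure-Python sketch. The scores and labels are made up for illustration, and `precision_recall` is a hypothetical helper, not part of any library.

```python
def precision_recall(probs, labels, threshold):
    """Return (precision, recall) when predicting positive for probs >= threshold."""
    preds = [p >= threshold for p in probs]
    tp = sum(1 for pred, y in zip(preds, labels) if pred and y == 1)
    fp = sum(1 for pred, y in zip(preds, labels) if pred and y == 0)
    fn = sum(1 for pred, y in zip(preds, labels) if not pred and y == 1)
    # Precision is undefined when nothing is predicted positive; report None.
    precision = tp / (tp + fp) if (tp + fp) else None
    # Recall is undefined when there are no positive labels; report None.
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


# Made-up predicted probabilities and true labels, for illustration only.
probs = [0.05, 0.20, 0.35, 0.60, 0.80, 0.95]
labels = [0, 0, 1, 0, 1, 1]

low = precision_recall(probs, labels, 0.0)   # everything predicted positive
mid = precision_recall(probs, labels, 0.5)   # the usual default
high = precision_recall(probs, labels, 0.9)  # only very confident positives

for name, (p, r) in [("0.0", low), ("0.5", mid), ("0.9", high)]:
    print(f"threshold={name}: precision={p}, recall={r}")
```

On this toy data the low threshold catches every positive at the cost of false positives, while the high threshold makes no false positives but misses most of the positives.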

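You should first decide which metric matters most for your application (precision, recall, F1, etc.). Then, after training the model, sweep a range of thresholds on the validation set and see which value gives the best metric. You can also look at things like the ROC curve.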
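The sweep could look roughly like the following, assuming F1 is the metric you picked and that you already have validation probabilities (for example, the sigmoid of the model's logits) as plain Python lists. `f1_at` and `best_threshold` are hypothetical helper names, and the validation data is invented for the example.

```python
def f1_at(probs, labels, threshold):
    """F1 score when predicting positive for probs >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = p >= threshold
        if pred and y == 1:
            tp += 1
        elif pred and y == 0:
            fp += 1
        elif not pred and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    # Treat F1 as 0 when there are no positives and no positive predictions.
    return 2 * tp / denom if denom else 0.0


def best_threshold(probs, labels, metric=f1_at, steps=101):
    """Try evenly spaced thresholds in [0, 1]; return the first one with the best metric."""
    candidates = [i / (steps - 1) for i in range(steps)]
    return max(candidates, key=lambda t: metric(probs, labels, t))


# Invented validation-set probabilities and labels, for illustration only.
val_probs = [0.10, 0.30, 0.40, 0.45, 0.70, 0.90]
val_labels = [0, 1, 0, 1, 1, 1]

threshold = best_threshold(val_probs, val_labels)
print(f"chosen threshold={threshold:.2f}, "
      f"validation F1={f1_at(val_probs, val_labels, threshold):.3f}")
```

Swapping in a different metric function changes which threshold wins, which is why the metric should be chosen first. The chosen value is then reused unchanged at prediction time.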
