Classification for unbalanced dataset

Asked by: Ray  Asked: 11/15/2023  Updated: 11/15/2023  Views: 15

Q:

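I have a dataset with 3 classes, collected from 40 people. Some people have data for all 3 classes, while others have data for only 2 classes or 1 class. I am trying to do classification with leave-one-person-out cross-validation, but it is not giving me good results. How can I approach this kind of classification problem?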

I tried leave-one-person-out cross-validation, but it did not work.

Python  dataset  classification  cross-validation

Comments


A:

0 votes  Rohit Patel  #1
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import numpy as np

X = np.array([[...], [...], ...])  # Replace with your features
y = np.array([0, 1, 2, 0, 1, 1, ...])  # Replace with your labels

classifier = SVC(kernel='linear', C=1)
pipeline = make_pipeline(StandardScaler(), classifier)

stratified_kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
accuracy_scores = []
for train_index, test_index in stratified_kfold.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit the scaler and the classifier on the training fold only
    pipeline.fit(X_train, y_train)

    # Score the held-out fold
    predictions = pipeline.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    accuracy_scores.append(accuracy)

print("Accuracy scores for each fold:", accuracy_scores)

print("Mean Accuracy:", np.mean(accuracy_scores))
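
Note that StratifiedKFold splits by sample, so rows from the same person can land in both the training and the test fold, which is not what the question describes. If the goal is to hold out one person at a time, a group-aware splitter may be closer to it. Below is a minimal sketch, assuming you have a `groups` array holding a person ID per row; that array, the helper name `leave_one_person_out`, and the synthetic data are placeholders of mine, not something from the question. It uses `class_weight="balanced"` as one possible way to handle the imbalance, and it pools predictions across folds before scoring, since a held-out person with only 1 class makes per-fold scores hard to interpret.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def leave_one_person_out(X, y, groups):
    """Hold out one person per fold; return pooled true labels and predictions."""
    model = make_pipeline(
        StandardScaler(),
        # class_weight="balanced" reweights classes inversely to their frequency
        SVC(kernel="linear", C=1, class_weight="balanced"),
    )
    y_true, y_pred = [], []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        model.fit(X[train_idx], y[train_idx])
        y_true.extend(y[test_idx])
        y_pred.extend(model.predict(X[test_idx]))
    return np.array(y_true), np.array(y_pred)


# Synthetic stand-in data (hypothetical): 40 people, 12 samples each, 3 classes
rng = np.random.default_rng(0)
n_people, n_per_person = 40, 12
groups = np.repeat(np.arange(n_people), n_per_person)
y = rng.integers(0, 3, size=groups.size)
for person in range(n_people):
    mask = groups == person
    if person % 4 == 0:
        y[mask] = person % 3                        # this person has 1 class only
    elif person % 4 == 1:
        y[mask] = np.where(y[mask] == 2, 0, y[mask])  # this person has 2 classes
X = rng.normal(size=(groups.size, 5)) + y[:, None]  # class-dependent mean shift

y_true, y_pred = leave_one_person_out(X, y, groups)
score = balanced_accuracy_score(y_true, y_pred)
print("Balanced accuracy (pooled over held-out people):", score)
```

Replace the synthetic `X`, `y`, and `groups` with your own arrays. Balanced accuracy is used here instead of plain accuracy because plain accuracy can look good when the model mostly predicts the majority class.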