GridSearchCV machine learning

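Question:

I am using GridSearch to find the relatively best hyperparameters for this decision tree (and K-Fold CV to evaluate the model's performance). Please look at the "The best results" line in the code and in the output.

Why does it not give me any information about the criterion (that is, whether entropy or Gini was used)?

When I run tests with other code I wrote, the search works, but the information it gives seems incorrect. For example, according to GridSearch, entropy suits this model better, whereas in my manual tests Gini gave better accuracy and recall. (Entropy was better for precision, but the result should be based on accuracy, as specified in the code.) In addition, for the maximum depth it suggests a value of 7, whereas in practice 9 or more gives better results.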

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn import tree
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, classification_report
from sklearn.model_selection import GridSearchCV, KFold, StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the data; the header row is replaced by the names below
column_names = ['file_path', '50', '100', '250', '500', '1000', 'r50', 'r100', 'r250', 'r500', 'r1000', 'rfile', 'class2']
df = pd.read_csv("C:/Folder/deftxt - copy.csv", sep=';', header=0, names=column_names)

# Features: everything except the label and the file path
x = df.drop(['class2', 'file_path'], axis=1)
df['class2'] = df['class2'].astype(int)
y = df['class2'].values

# 70/30 train/test split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, shuffle=True, random_state=100)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

# Baseline tree with default hyperparameters
model = DecisionTreeClassifier(random_state=100)
model.fit(x_train, y_train)
model.get_params()

# 10-fold cross-validated accuracy of the baseline tree
k_fold_acc = cross_val_score(model, x_train, y_train, cv=10)
k_fold_mean = k_fold_acc.mean()
for i in k_fold_acc:
    print(i)
print("accuracy K Fold CV:" + str(k_fold_mean))

# Hyperparameter grid: 2 * 8 * 4 = 64 candidates
param_dist = {
    "criterion": ["gini", "entropy"],
    "max_depth": [1, 2, 3, 4, 5, 6, 7, None],
    "min_samples_split": [2, 3, 4, 5],
}
grid = GridSearchCV(model, param_grid=param_dist, cv=10, n_jobs=-1, scoring='accuracy', verbose=1)
grid.fit(x_train, y_train)

print("The best results:" + str(grid.best_estimator_))

# Feature and class names (not used further in this snippet)
fn = ['50', '100', '250', '500', '1000', '-50', '-100', '-250', '-500', '-1000', 'total']
cn = ['ClassA', 'ClassB']

# Evaluate the refitted best estimator on the held-out test set
grid_predictions = grid.predict(x_test)
print(classification_report(y_test, grid_predictions))

Output:

(1369, 11) (587, 11) (1369,) (587,)
0.9927007299270073
0.9927007299270073
0.9781021897810219
0.9927007299270073
0.9927007299270073
0.9854014598540146
0.9854014598540146
0.9927007299270073
0.9781021897810219
0.9779411764705882
accuracy K Fold CV:0.9868452125375698
Fitting 10 folds for each of 64 candidates, totalling 640 fits
The best results:DecisionTreeClassifier(max_depth=7, random_state=100)
              precision    recall  f1-score   support

           0       0.98      0.97      0.97       174
           1       0.99      0.99      0.99       413

    accuracy                           0.98       587
   macro avg       0.98      0.98      0.98       587
weighted avg       0.98      0.98      0.98       587

Process finished with exit code 0
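
A note on the "The best results" line: by default, the printed form of a scikit-learn estimator lists only the parameters that differ from their default values. Since "gini" is the default criterion and 2 is the default min_samples_split, the line DecisionTreeClassifier(max_depth=7, random_state=100) is consistent with the search having selected criterion="gini" and min_samples_split=2. The grid above also contains no max_depth of 9, so that value could not have been chosen. The following is a minimal sketch of how the selected values and the per-candidate scores might be inspected. It assumes synthetic data from make_classification as a stand-in for the CSV, and a smaller grid that adds max_depth=9 purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: the original CSV is not available here).
X, y = make_classification(n_samples=300, n_features=11, random_state=100)

# Illustrative grid (assumption): smaller than the original, with 9 added.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, 7, 9, None],
}
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=100),
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)

# best_params_ lists every searched parameter, including values equal to defaults.
print("Best parameters:", grid.best_params_)
print("Best mean CV accuracy:", grid.best_score_)

# get_params() returns the full configuration of the refitted best estimator.
best_config = grid.best_estimator_.get_params()
print("criterion:", best_config["criterion"], "max_depth:", best_config["max_depth"])

# Mean cross-validated accuracy of each candidate, to compare gini and entropy.
results = grid.cv_results_
for params, mean_score in zip(results["params"], results["mean_test_score"]):
    print(params, round(mean_score, 4))
```

The search ranks candidates by mean accuracy over the cross-validation folds of the training data, so its ranking may differ from a manual comparison made on a single test split, especially when the candidates score within a fraction of a percent of each other.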

python machine-learning scikit-learn decision-tree gridsearchcv
