XGBClassifier gives different results on similar environments of different machines

Question:

I trained an XGBoost classifier using grid search with the following parameters:

from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

# Hyperparameter grid: 2 * 3 * 3 * 3 = 54 combinations
params = {
    'max_depth': [5, 6],
    'min_child_weight': [1, 2, 3],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0]
}

xgb = XGBClassifier(device="cuda", learning_rate=0.02, n_estimators=1000, objective='binary:logistic', verbosity=0, tree_method="gpu_hist")

# folds (the number of CV splits) is not shown in the post
skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=1001)

grid_search = GridSearchCV(estimator=xgb, param_grid=params, scoring='roc_auc', n_jobs=-1, cv=skf.split(x_train, y_train), verbose=100, return_train_score=True)

grid_search.fit(x_train, y_train)

I then saved the best model like this:

import joblib
joblib.dump(grid_search.best_estimator_, 'xgboost_grid_search.joblib')

When I load the model again, predict_proba gives different results. This is how I load the model to get predictions:

import joblib
model = joblib.load("xgboost_grid_search.joblib")
model.predict_proba(x_test)

Here x_train and x_test contain numeric features; y_train and y_test are class labels (0 or 1).
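To make "different results" concrete, it can help to measure how far apart the two sets of probabilities are. The following is a minimal sketch, not part of the original post: the helper names `max_abs_diff` and `same_within`, the tolerance `1e-6`, and the sample numbers are all assumptions chosen for illustration. It works on plain nested lists, so the output of `model.predict_proba(x_test)` could be passed in after calling `.tolist()` on it.

```python
def max_abs_diff(probs_a, probs_b):
    """Largest element-wise absolute difference between two equally shaped 2-D lists."""
    if len(probs_a) != len(probs_b):
        raise ValueError("row counts differ")
    worst = 0.0
    for row_a, row_b in zip(probs_a, probs_b):
        if len(row_a) != len(row_b):
            raise ValueError("column counts differ")
        for a, b in zip(row_a, row_b):
            worst = max(worst, abs(a - b))
    return worst


def same_within(probs_a, probs_b, atol=1e-6):
    """True when every pair of probabilities agrees to within atol (an assumed tolerance)."""
    return max_abs_diff(probs_a, probs_b) <= atol


# Made-up probabilities standing in for predict_proba output on two machines.
machine_a = [[0.91, 0.09], [0.20, 0.80]]
machine_b = [[0.9100001, 0.0899999], [0.20, 0.80]]

print(max_abs_diff(machine_a, machine_b))
print(same_within(machine_a, machine_b))
```

A difference near floating-point precision points toward numerical noise, while a large one suggests that the model or the inputs are not actually the same on both machines.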

After reading quite a few blogs, articles, and Stack Overflow answers, I made sure the following hold in both environments:

 1. Same Python version: 3.11.5
 2. Same joblib and xgboost pip versions: 1.2.0 and 2.0.0 respectively
 3. Same feature order in x_test as in x_train and model.feature_names_in_

However, I should point out that the two environments run different operating systems: Mac M1 and Ubuntu (I am not sure whether this matters).
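The checks listed above (Python version, package versions, operating system) can also be captured by a script and compared between the two machines instead of by eye. This is a rough sketch using only the standard library; the function names `environment_fingerprint` and `diff_fingerprints` and the default package list are assumptions made for this example, not something from the original post.

```python
import json
import platform
from importlib import metadata


def environment_fingerprint(packages=("xgboost", "joblib", "scikit-learn", "numpy")):
    """Collect interpreter, OS, CPU architecture, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "system": platform.system(),    # e.g. "Darwin" or "Linux"
        "machine": platform.machine(),  # e.g. "arm64" or "x86_64"
        "packages": versions,
    }


def diff_fingerprints(a, b):
    """Return {key: (value_in_a, value_in_b)} for every top-level key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Run this on each machine and compare the printed JSON.
    print(json.dumps(environment_fingerprint(), indent=2))
```

Running it on both machines and passing the two dictionaries to `diff_fingerprints` lists whatever differs, which here would at least include the operating system and CPU architecture.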

Any help is appreciated; please let me know if I am doing something wrong.

Thanks in advance!

machine-learning scikit-learn xgboost joblib

Answer:

params = {
    'max_depth': [5, 6],
    'min_child_weight': [1, 2, 3],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0]
}

xgb = XGBClassifier(device="cuda", learning_rate=0.02,
                    n_estimators=1000,
                    objective='binary:logistic',
                    verbosity=0,
                    tree_method="gpu_hist",
                    random_state=1001)  # HERE: seed the classifier as well

skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=1001)

grid_search = GridSearchCV(estimator=xgb, param_grid=params, scoring='roc_auc', n_jobs=-1, cv=skf.split(x_train, y_train), verbose=100, return_train_score=True)

grid_search.fit(x_train, y_train)
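The change above passes a fixed `random_state` to the classifier, which is the seed behind the random row and column sampling controlled by `subsample` and `colsample_bytree`. As a generic illustration of why a fixed seed makes such sampling repeatable, here is a small standard-library sketch. It is not XGBoost's own sampler, and the helper name `sample_rows` is made up for this example.

```python
import random


def sample_rows(n_rows, subsample, seed=None):
    """Pick a random subset of row indices, similar in spirit to a per-tree row subsample.

    Illustration only: this is not how XGBoost samples internally.
    """
    rng = random.Random(seed)  # seed=None seeds from OS entropy or the clock
    k = round(n_rows * subsample)
    return sorted(rng.sample(range(n_rows), k))


# Same seed: the chosen rows are identical on every call.
print(sample_rows(1000, 0.6, seed=1001) == sample_rows(1000, 0.6, seed=1001))

# No seed: the chosen rows will almost certainly differ between calls.
print(sample_rows(1000, 0.6) == sample_rows(1000, 0.6))
```

Note that a seed only affects training. It does not change what an already saved model predicts when it is loaded elsewhere.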
