Backward Difference Encoder

Asked by Gabriel Bueno Guimaraes on 9/21/2023 · Last edited by desertnaut, Gabriel Bueno Guimaraes · Updated 9/22/2023 · Views: 28

Question:

I'm trying to apply a Backward Difference Encoder to some columns and then train a logistic regression model.
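For context, backward difference coding (the scheme implemented by category_encoders' `BackwardDifferenceEncoder`) replaces a categorical column with k levels by k-1 contrast columns, and each regression coefficient then compares one level's mean with the previous level's. The sketch below builds the standard contrast matrix with NumPy; the function name `backward_difference_contrasts` and the group means are hypothetical, and the exact columns category_encoders emits (for example its intercept column) may differ.

```python
import numpy as np


def backward_difference_contrasts(k: int) -> np.ndarray:
    """Return the k x (k-1) backward difference contrast matrix for k ordered levels."""
    if k < 2:
        raise ValueError("need at least two levels")
    m = np.empty((k, k - 1))
    for j in range(k - 1):
        for i in range(k):
            # Levels up to and including j get -(k-1-j)/k; later levels get (j+1)/k
            m[i, j] = -(k - 1 - j) / k if i <= j else (j + 1) / k
    return m


# Hypothetical mean outcome for each of 4 ordered levels
means = np.array([0.10, 0.25, 0.20, 0.40])
contrasts = backward_difference_contrasts(4)

# Design matrix with an intercept column plus the contrast columns, one row per level
design = np.column_stack([np.ones(4), contrasts])
coef = np.linalg.solve(design, means)

# coef[0] is the mean of the level means; coef[1:] are the level-to-level differences
print(coef[0], coef[1:])
```

Because every contrast column sums to zero and consecutive rows differ by a unit vector, solving for the coefficients recovers the average of the level means as the intercept and the successive differences as the remaining coefficients.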

```python
import category_encoders as ce
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler


def train_model_v0(X, y, model, cat_enc_method):
    # Applying Ordinal Encoding to the dependent variable 'churn'
    target_encoder = OrdinalEncoder()
    y = target_encoder.fit_transform(y.values.reshape(-1, 1)).flatten()

    # Defining the steps of the data processing pipeline
    steps = [
        ('Change_Columns', ChangeColumns()),  # Custom step to change columns (definition not shown)
        ('Categorical_Encoder', cat_enc_method(cols=['state', 'area_code', 'international_plan', 'voice_mail_plan'])),
        ('scaler', StandardScaler()),  # Standardization (zero mean, unit variance)
        ('model', model)  # The machine learning model to be trained
    ]

    # Defining the hyperparameters to be tested in a grid search.
    # np.logspace(0, 1, 10, base=0.001) gives 10 values of C from 1 down to 0.001.
    grid_features = [
        {
            'model__penalty': ['l2', None],  # L2 regularization or none
            'model__C': np.logspace(0, 1, 10, base=0.001),  # Regularization parameter C
            'model__solver': ['lbfgs', 'newton-cg', 'sag']  # Optimization algorithms
        },
        {
            'model__penalty': ['l1', 'l2'],  # L1 or L2 regularization
            'model__C': np.logspace(0, 1, 10, base=0.001),  # Regularization parameter C
            'model__solver': ['liblinear']  # Solver that supports L1 regularization
        },
        {
            'model__penalty': ['l1', 'l2', None, 'elasticnet'],  # L1, L2, none, or elasticnet regularization
            'model__C': np.logspace(0, 1, 10, base=0.001),  # Regularization parameter C
            'model__solver': ['saga']  # Solver that supports elasticnet regularization
        }
    ]

    # Creating a pipeline that includes all data processing steps and the model
    pipe_model = Pipeline(steps=steps)

    # Performing a grid search (GridSearchCV) to find the best hyperparameters
    pipe_v1 = GridSearchCV(pipe_model,
                           param_grid=grid_features,
                           scoring='roc_auc',  # Evaluation metric (area under the ROC curve)
                           cv=5)  # 5-fold cross-validation

    # Fitting the model to the training data and returning the fitted search
    pipe_v1.fit(X, y)
    return pipe_v1


# Calling the function
search = train_model_v0(X, y, LogisticRegression(max_iter=10000),
                        ce.backward_difference.BackwardDifferenceEncoder)
```
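The traceback ends where scikit-learn warns that some fits failed, so at least some candidates in this grid raise during `fit`. One likely source (an assumption, since the per-fit errors are not shown) is the third dict: in scikit-learn versions of that time, `LogisticRegression(penalty='elasticnet')` raises a `ValueError` unless `l1_ratio` is a number between 0 and 1, and this grid never sets `model__l1_ratio`. A minimal sketch of a grid that gives elasticnet its own dict with `l1_ratio` set; the `l1_ratio` values are illustrative, not tuned.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same C values as the question: 0.001 ** linspace(0, 1, 10), i.e. from 1 down to 0.001
C_VALUES = np.logspace(0, 1, 10, base=0.001)

grid_features = [
    {
        "model__penalty": ["l2", None],
        "model__C": C_VALUES,
        "model__solver": ["lbfgs", "newton-cg", "sag"],
    },
    {
        "model__penalty": ["l1", "l2"],
        "model__C": C_VALUES,
        "model__solver": ["liblinear"],
    },
    {
        # saga with penalties that do not use l1_ratio
        "model__penalty": ["l1", "l2", None],
        "model__C": C_VALUES,
        "model__solver": ["saga"],
    },
    {
        # elasticnet in its own dict so that l1_ratio is always set for it
        "model__penalty": ["elasticnet"],
        "model__C": C_VALUES,
        "model__solver": ["saga"],
        "model__l1_ratio": [0.25, 0.5, 0.75],  # illustrative values
    },
]

# Expand the grid to inspect the candidates GridSearchCV would try
candidates = list(ParameterGrid(grid_features))
print(len(candidates))
```

Passing this list as `param_grid` should leave the rest of the pipeline unchanged; keeping `l1_ratio` out of the other dicts also avoids scikit-learn's warning that `l1_ratio` is ignored for non-elasticnet penalties.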

I'm getting an error and I can't figure out why it happens.

Here is the traceback:

```
    868     results = self._format_results(
    869         all_candidate_params, n_splits, all_out, all_more_results
    870     )
    872     return results
--> 874 self._run_search(evaluate_candidates)
    876 # multimetric is determined here because in the case of a callable
    877 # self.scoring the return type is only known after calling
    878 first_test_score = all_out[0]["test_scores"]

...
    376         f"Below are more details about the failures:\n{fit_errors_summary}"
    377     )
--> 378     warnings.warn(some_fits_failed_message, FitFailedWarning)

TypeError: issubclass() arg 2 must be a class, a tuple of classes, or a union
```

Here is a snippet of the data I'm using: [image of the data snippet]

Tags: python, machine-learning, scikit-learn, categorical-data

Comments

0 votes · Ben Reiniger · 9/21/2023
Set `error_score="raise"` in the grid search and report the full error traceback that results.
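The effect of this suggestion can be seen in isolation. The sketch below uses a hypothetical toy estimator, `FlakyClassifier` (not part of scikit-learn), whose `fit` raises for one parameter value. With the default `error_score=np.nan`, the failing candidate only scores NaN and a `FitFailedWarning` is emitted (the step where the question's traceback ends); with `error_score="raise"`, the original exception from the failing fit propagates with its own traceback.

```python
import warnings

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.exceptions import FitFailedWarning
from sklearn.model_selection import GridSearchCV


class FlakyClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy estimator: fit fails when mode == 'broken'."""

    def __init__(self, mode="ok"):
        self.mode = mode

    def fit(self, X, y):
        if self.mode == "broken":
            raise ValueError("mode='broken' is not supported")
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        # Always predict the first class; good enough for a demonstration
        return np.full(len(X), self.classes_[0])


def run_search(error_score):
    X = np.arange(20, dtype=float).reshape(-1, 1)
    y = np.array([0, 1] * 10)
    search = GridSearchCV(FlakyClassifier(), {"mode": ["ok", "broken"]},
                          cv=5, error_score=error_score)
    return search.fit(X, y)


# Default behaviour: failed fits score NaN and only a FitFailedWarning is raised
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lenient = run_search(np.nan)
print(lenient.best_params_)

# error_score="raise": the failing fit's own exception reaches the caller
underlying_error = None
try:
    run_search("raise")
except ValueError as exc:
    underlying_error = str(exc)
print("underlying error:", underlying_error)
```

Applied to the question's `GridSearchCV(..., error_score="raise")`, the first failing pipeline fit should surface its real exception instead of the `TypeError` raised while emitting the warning.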

Answers: none yet