Asked by: victoris_93 Asked: 10/26/2023 Updated: 10/26/2023 Views: 21
Permutation feature importance on features transformed within a pipeline (sklearn)
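Q:
A similar question has been asked before. I need to compute feature importance for the preprocessed features using sklearn.inspection.permutation_importance. The preprocessing is implemented inside a pipeline. The code is as follows: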
import numpy as np
import pandas as pd
import os
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder

###
# data load (omitted): defines all_features, diagnosis and the column lists
# conn_cols, grad_cols, centroid_disp_cols, cortex_disp_cols
###

X_train, X_test, y_train, y_test = train_test_split(
    all_features, diagnosis, test_size=0.25, random_state=42
)

# Each feature group is standardized, reduced with PCA, then standardized again
pca_conn = Pipeline(
    steps=[("group_whiten", StandardScaler()),
           ("pca_conn", PCA(n_components=100)),
           ("pca_whiten", StandardScaler())]
)
pca_grad = Pipeline(
    steps=[("group_whiten", StandardScaler()),
           ("pca_grad", PCA(n_components=100)),
           ("pca_whiten", StandardScaler())]
)
pca_centroid_disp_pca = Pipeline(
    steps=[("group_whiten", StandardScaler()),
           ("pca_grad", PCA(n_components=10)),
           ("pca_whiten", StandardScaler())]
)
pca_cortex_disp_pca = Pipeline(
    steps=[("group_whiten", StandardScaler()),
           ("pca_grad", PCA(n_components=100)),
           ("pca_whiten", StandardScaler())]
)

# Categorical columns are one-hot encoded; scalar covariates are standardized
cat_encoder = Pipeline(
    steps=[("cat_encoder", OneHotEncoder(handle_unknown="ignore"))]
)
whiten = Pipeline(
    steps=[("whiten", StandardScaler())]
)

preprocessor = ColumnTransformer(
    transformers=[
        ("pca_conn", pca_conn, conn_cols),
        ("pca_grad", pca_grad, grad_cols),
        ("pca_centroid_disp_pca", pca_centroid_disp_pca, centroid_disp_cols),
        ("pca_cortex_disp_pca", pca_cortex_disp_pca, cortex_disp_cols),
        ("encode_dataset", cat_encoder, ["dataset"]),
        ("encode_sex", cat_encoder, ["sex"]),
        ("whiten_fd", whiten, ["mean_fd"]),
        ("whiten_age", whiten, ["age"]),
    ]
)

lr = LogisticRegression(random_state=42, max_iter=10000)
clf = Pipeline([("preprocessor", preprocessor),
                ("lr", lr)])

# Fit on the training split only, then score on the held-out split
trained_logreg = clf.fit(X_train, y_train)
trained_logreg.score(X_test, y_test)

# Passing the full pipeline means the raw input columns are the ones permuted
perm_acc = permutation_importance(
    trained_logreg, X_test, y_test, n_repeats=100, random_state=42, n_jobs=-1
)
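By default, this appears to compute permutation importance for the original (raw) features. Has anyone tried implementing permutation importance for the transformed features? Any tips? I don't think running the preprocessing separately is an option (data leakage).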
A: No answers yet