I am trying to fill missing numerical and categorical values and convert categorical values with OneHotEncoder using ColumnTransformer, but it's not working

Asked by: Bhumit   Date asked: 10/16/2023   Last edited by: desertnaut, Bhumit   Updated: 10/16/2023   Views: 37

Question:

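I am trying to fill the missing values in a DataFrame with imputers and then one-hot encode the categorical values. However, when I apply any algorithm to the transformed values, it throws the error shown below the code. If I perform the same tasks separately, without ColumnTransformer, it works fine. What am I doing wrong?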

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Define the columns for different imputation strategies
mean_col = ['MinTemp', 'MaxTemp', 'Rainfall', 'WindGustSpeed', 'WindSpeed9am', 'WindSpeed3pm', 'Humidity9am', 'Humidity3pm', 'Pressure9am', 'Pressure3pm', 'Temp9am', 'Temp3pm']
median_col = ['Cloud9am', 'Cloud3pm']
mf_col = ['WindGustDir', 'WindDir9am', 'WindDir3pm', 'RainToday']
ohm_cols = ['Location','WindGustDir','WindDir9am','WindDir3pm','RainToday']

# Create a ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ('trnf1', SimpleImputer(strategy='mean'), mean_col),
        ('trnf2', SimpleImputer(strategy='median'), median_col),
        ('trnf3', SimpleImputer(strategy='most_frequent'), mf_col),
        ('trnf4', OneHotEncoder(drop='first', sparse=False), ohm_cols)
    ],
    remainder='drop'  # Drop columns not specified in transformers
)

# Fit and transform the training data
x_train_transformed = preprocessor.fit_transform(x_train)

# Transform the testing data
x_test_transformed = preprocessor.transform(x_test)

Error:

ValueError: could not convert string to float: 'SSW'

The following code performs the same tasks step by step, and it works fine:

im = SimpleImputer(strategy='mean')
x_train[mean_col] = im.fit_transform(x_train[mean_col])
x_test[mean_col] = im.transform(x_test[mean_col])

im_median = SimpleImputer(strategy='median')
x_train[median_col] = im_median.fit_transform(x_train[median_col])
x_test[median_col] = im_median.transform(x_test[median_col])

im_mf = SimpleImputer(strategy='most_frequent')
x_train[mf_col] = im_mf.fit_transform(x_train[mf_col])
x_test[mf_col] = im_mf.transform(x_test[mf_col])

ohm = OneHotEncoder(drop='first', sparse=False)
x_train_transformed = ohm.fit_transform(x_train1[ohm_cols])
x_test_transformed = ohm.transform(x_test1[ohm_cols])

Tags: python-3.x pandas machine-learning scikit-learn

Comments

1 upvote   desertnaut   10/16/2023
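Where exactly, and how, are x_train and x_test defined? Please update your post to include the full error traceback, and see how to create a minimal reproducible example.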
0 upvotes   Ben Reiniger   10/16/2023
Does this answer your question? Apply multiple preprocessing steps to a column in sklearn pipeline

Answers: none yet