ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). K-means clustering

Asked by: sunhem Asked: 7/31/2023 Last edited by: sunhem Updated: 7/31/2023 Views: 15

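Q:

I am working on a k-means clustering project. I get an error when I apply OneHotEncoder to the categorical columns and StandardScaler to the numeric columns:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

The data is clean: no null values, no missing values, no very large values, and outliers have been removed.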
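
Whether that holds for the frame that is actually passed to the model, and not only for the raw data, can be checked directly. The following is a minimal sketch, assuming the data is a pandas DataFrame; `report_non_finite` and the toy frame `demo` are hypothetical names used only for illustration:

```python
import numpy as np
import pandas as pd


def report_non_finite(df: pd.DataFrame) -> pd.Series:
    """Count NaN and +/-inf entries per numeric column (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    # np.isfinite is False for NaN, inf and -inf
    bad = ~np.isfinite(numeric.to_numpy(dtype="float64"))
    return pd.Series(bad.sum(axis=0), index=numeric.columns)


# Toy frame standing in for the real data
demo = pd.DataFrame({"age": [25.0, np.nan, 40.0], "price": [9.5, np.inf, 3.0]})
counts = report_non_finite(demo)
print(counts)
```

Running the same check on the final encoded frame, right before fitting, would show which columns, if any, contain NaN or infinite values.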

How can I fix this?

My code is as follows:

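import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Columns to be one-hot encoded
columns_to_onehot = ['gender', 'category', 'payment_method']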

# Columns to be scaled
columns_to_scale = ['age', 'quantity', 'price', 'total_amount']

# One-hot encoding; drop='first' avoids multicollinearity.
# On scikit-learn >= 1.2, `sparse` is named `sparse_output`.
encoder = OneHotEncoder(drop='first', sparse=False)
one_hot_encoded_columns = encoder.fit_transform(subset_df1[columns_to_onehot])

# Get the encoded column names
# (`get_feature_names` was removed in scikit-learn 1.2; use `get_feature_names_out` there)
column_names = encoder.get_feature_names(input_features=columns_to_onehot)

df_encoded = pd.concat([subset_df1.drop(columns_to_onehot, axis=1),
                       pd.DataFrame(one_hot_encoded_columns, columns=column_names)],
                       axis=1)

# Standard scaling
scaler = StandardScaler()
df_encoded[columns_to_scale] = scaler.fit_transform(df_encoded[columns_to_scale])

# Find the optimal K with the elbow method and the silhouette score
Sum_of_squared_distances = []
silhouette_avg = []

K = range(1, 10)
for k in K:
    model = KMeans(n_clusters=k, random_state=0)
    model.fit(df_encoded)
    Sum_of_squared_distances.append(model.inertia_)

    # The silhouette score is only defined for two or more clusters
    if k > 1:
        silhouette_avg.append(silhouette_score(df_encoded, model.labels_, metric='euclidean'))
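
One possible cause, assuming `subset_df1` no longer has a default 0..n-1 index after the outliers were removed: `pd.concat(..., axis=1)` aligns rows by index label, while a `pd.DataFrame` built from the encoder output gets a fresh 0..n-1 index. Rows whose labels do not match on both sides are then filled with NaN, even if the source data had none. The sketch below reproduces this on toy data; the frames and column values are made up for illustration:

```python
import pandas as pd

# Toy stand-in for subset_df1: rows were dropped, so the index is 0, 2, 3
left = pd.DataFrame({"age": [25, 40, 31]}, index=[0, 2, 3])

# Encoder output wrapped without an index gets a fresh 0, 1, 2 index
right_default = pd.DataFrame({"gender_Male": [1.0, 0.0, 1.0]})
misaligned = pd.concat([left, right_default], axis=1)

# Reusing the original index keeps the rows lined up
right_aligned = pd.DataFrame({"gender_Male": [1.0, 0.0, 1.0]}, index=left.index)
aligned = pd.concat([left, right_aligned], axis=1)

print(misaligned.isna().sum().sum())  # prints 2
print(aligned.isna().sum().sum())     # prints 0
```

If that assumption holds for the real data, passing `index=subset_df1.index` to the `pd.DataFrame(...)` call, or calling `subset_df1.reset_index(drop=True)` before encoding, would likely avoid the introduced NaN values.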


nan valueerror infinity

A: No answers yet