Asked by: sunhem  Asked on: 7/31/2023  Last edited by: sunhem  Updated: 7/31/2023  Views: 15
ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). K-means clustering
Question:
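I'm working on a k-means clustering project. After I one-hot encode the categorical values with OneHotEncoder and apply StandardScaler to the numerical values, I get this error: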
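For reference, the preprocessing described above (OneHotEncoder on the categorical columns, StandardScaler on the numeric ones) can also be expressed as a single scikit-learn ColumnTransformer. This is a swapped-in alternative, not the asker's method; the toy data below is hypothetical and only reuses the column names from the question. Because the transformer returns one NumPy array assembled row by row, no pandas index alignment is involved.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data using the column names from the question
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "category": ["Books", "Toys", "Toys", "Books"],
    "payment_method": ["Cash", "Card", "Card", "Cash"],
    "age": [25, 40, 33, 51],
    "quantity": [1, 3, 2, 5],
    "price": [10.0, 25.0, 7.5, 12.0],
    "total_amount": [10.0, 75.0, 15.0, 60.0],
})

columns_to_onehot = ["gender", "category", "payment_method"]
columns_to_scale = ["age", "quantity", "price", "total_amount"]

preprocess = ColumnTransformer(
    transformers=[
        # drop="first" keeps k - 1 indicator columns per categorical feature
        ("onehot", OneHotEncoder(drop="first"), columns_to_onehot),
        ("scale", StandardScaler(), columns_to_scale),
    ],
    sparse_threshold=0.0,  # always return a dense NumPy array
)

X = preprocess.fit_transform(df)
print(X.shape)               # (4, 7): 3 one-hot columns + 4 scaled columns
print(np.isfinite(X).all())  # True
```

The resulting `X` can be passed straight to `KMeans.fit`.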
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
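The data is clean: there are no null values, no excessively large values and no missing values, and outliers have been removed.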
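Since outliers were removed, it may be worth checking two things on the frame that is actually passed to KMeans (`df_encoded`), not only on the cleaned input: whether any numeric value is NaN or infinite, and whether the index is still 0..n-1. Filtering rows keeps the surviving rows' original labels, and `pd.concat(..., axis=1)` aligns rows by those labels. A minimal sketch; the helper names `count_non_finite` and `has_default_index` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def count_non_finite(df: pd.DataFrame) -> pd.Series:
    """Count NaN, +inf and -inf values in each numeric column (hypothetical helper)."""
    numeric = df.select_dtypes(include=[np.number])
    return (~np.isfinite(numeric)).sum()


def has_default_index(df: pd.DataFrame) -> bool:
    """Return True if the index is exactly 0, 1, ..., len(df) - 1 (hypothetical helper)."""
    return bool(df.index.equals(pd.RangeIndex(len(df))))


# Hypothetical toy data; the row with age 95 plays the role of an outlier.
raw = pd.DataFrame({"age": [25, 95, 33, 40], "price": [10.0, 25.0, 7.5, 12.0]})
clean = raw[raw["age"] < 90]  # boolean filtering keeps the original labels

print(int(count_non_finite(clean).sum()))               # 0: the values are clean
print(clean.index.tolist())                             # [0, 2, 3]: the index has a gap
print(has_default_index(clean))                         # False
print(has_default_index(clean.reset_index(drop=True)))  # True
```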
How can I fix this?
My code is below:
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# subset_df1 is the cleaned DataFrame (its construction is not shown in the question)

# Columns to be one-hot encoded
columns_to_onehot = ['gender', 'category', 'payment_method']
# Columns to be scaled
columns_to_scale = ['age', 'quantity', 'price', 'total_amount']

# One-hot encoding
# 'drop' is set to 'first' to avoid multicollinearity.
# Note: scikit-learn 1.2 renamed `sparse` to `sparse_output` and removed
# `get_feature_names` in favour of `get_feature_names_out`.
encoder = OneHotEncoder(drop='first', sparse=False)
#encoder = LabelEncoder()
one_hot_encoded_columns = encoder.fit_transform(subset_df1[columns_to_onehot])
# Getting the column names
column_names = encoder.get_feature_names(input_features=columns_to_onehot)
df_encoded = pd.concat([subset_df1.drop(columns_to_onehot, axis=1),
                        pd.DataFrame(one_hot_encoded_columns, columns=column_names)],
                       axis=1)

# Standard scaling
scaler = StandardScaler()
df_encoded[columns_to_scale] = scaler.fit_transform(df_encoded[columns_to_scale])

# Finding the optimal K with the elbow method and the silhouette score
Sum_of_squared_distances = []
silhouette_avg = []
K = range(1, 10)
for k in K:
    model = KMeans(n_clusters=k, random_state=0)
    model.fit(df_encoded)
    Sum_of_squared_distances.append(model.inertia_)
    # The silhouette score is only defined for 2 or more clusters
    if k > 1:
        silhouette_avg.append(silhouette_score(df_encoded, model.labels_, metric='euclidean'))
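One plausible cause, offered as an assumption because the question does not say which line raises the error: removing outliers by filtering leaves gaps in `subset_df1`'s index, while `pd.DataFrame(one_hot_encoded_columns, columns=column_names)` gets a fresh 0..n-1 index. `pd.concat(..., axis=1)` aligns rows by index label, so labels present on only one side produce NaN rows, which KMeans then rejects. A minimal reproduction with hypothetical toy data; the `gender_M` column name is written by hand here rather than taken from the encoder.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy frame whose index has a gap, as after removing outliers
subset_df1 = pd.DataFrame({"gender": ["F", "M", "F"], "age": [25, 33, 40]},
                          index=[0, 2, 3])

# .toarray() turns the encoder's default sparse output into a dense array
encoded = OneHotEncoder(drop="first").fit_transform(subset_df1[["gender"]]).toarray()

# Buggy: the new DataFrame has index 0, 1, 2, so concat(axis=1) aligns
# labels {0, 2, 3} with {0, 1, 2} and fills the mismatches with NaN.
buggy = pd.concat([subset_df1.drop(columns=["gender"]),
                   pd.DataFrame(encoded, columns=["gender_M"])], axis=1)
print(buggy.shape)                    # (4, 2): one extra row appeared
print(int(buggy.isna().sum().sum()))  # 2

try:
    KMeans(n_clusters=2, n_init=10, random_state=0).fit(buggy)
    raised = False
except ValueError:  # scikit-learn refuses input containing NaN
    raised = True
print(raised)  # True

# Fix: give the encoded DataFrame the same index as the source rows.
fixed = pd.concat([subset_df1.drop(columns=["gender"]),
                   pd.DataFrame(encoded, columns=["gender_M"], index=subset_df1.index)],
                  axis=1)
print(fixed.shape)                    # (3, 2)
print(int(fixed.isna().sum().sum()))  # 0
KMeans(n_clusters=2, n_init=10, random_state=0).fit(fixed)  # fits without error
```

In the question's code, the equivalent change would be passing `index=subset_df1.index` to `pd.DataFrame(one_hot_encoded_columns, columns=column_names)`, or calling `subset_df1 = subset_df1.reset_index(drop=True)` before encoding. Comparing `len(df_encoded)` with `len(subset_df1)` is a quick way to confirm whether this is what is happening.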
Answers: none yet