KNN algorithm throws ValueError: Unknown label type: 'continuous'

Asked by: Akshay Basutkar | Asked: 9/27/2023 | Last edited by: desertnaut, Akshay Basutkar | Updated: 9/27/2023 | Views: 33

Question:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import MinMaxScaler

path = "/content/cirrhosis.csv"
data = pd.read_csv(path)

data = data.loc[0:311]
data.head()

for col in data.columns:
  if data[col].dtype == 'int64' or data[col].dtype == 'float64':
    data[col].fillna(data[col].mean(), inplace=True)

  elif data[col].dtype == 'object':
    data[col].fillna(data[col].mode()[0], inplace=True)  # mode() returns a Series; use its first value

label_encoder = LabelEncoder()
for column in data.columns:
    if data[column].dtype == 'object':
        data[column] = label_encoder.fit_transform(data[column])
print(data)

scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)
data = pd.DataFrame(scaled_data, columns=data.columns)

inputs = data.drop(['ID', 'Stage'],axis=1)
output = data.drop(['ID', 'N_Days', 'Status', 'Drug', 'Age', 'Sex', 'Ascites', 'Hepatomegaly', 'Spiders', 'Edema', 'Bilirubin', 'Cholesterol', 'Albumin', 'Copper', 'Alk_Phos', 'SGOT', 'Tryglicerides', 'Platelets', 'Prothrombin'], axis=1)
print(inputs)
print(output)

x_train, x_test, y_train, y_test = train_test_split(inputs, output, train_size=0.8)

model =  KNeighborsClassifier(n_neighbors=31)
model.fit(x_train,y_train)
y_pred = model.predict(x_test)

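I am trying to improve the accuracy of my KNN model, so I tried applying feature scaling. But when I apply feature scaling and then train the model with model.fit(), it throws a ValueError. Without feature scaling the algorithm works; with feature scaling it throws the ValueError below.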

/usr/local/lib/python3.10/dist-packages/sklearn/neighbors/_classification.py:215: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  return self._fit(X, y)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-73-f656e2af91bb> in <cell line: 2>()
      1 model =  KNeighborsClassifier(n_neighbors=31)
----> 2 model.fit(x_train,y_train)
      3 y_pred = model.predict(x_test)
      4 print(y_pred)
      5 print(y_test)

2 frames
/usr/local/lib/python3.10/dist-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
    216         "multilabel-sequences",
    217     ]:
--> 218         raise ValueError("Unknown label type: %r" % y_type)
    219 
    220 

ValueError: Unknown label type: 'continuous'
Tags: python pandas dataframe scikit-learn knn

Comments

0 votes | OCa | 9/27/2023
Your post may be missing a clear question.

Answer:

1 vote | Ugur Yigit | 9/27/2023 | #1

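Can you check whether your response variable is continuous? You are doing a classification task, so continuous values in y_train or y_test will cause this error. Scaling the entire dataset is probably what caused it: MinMaxScaler was applied to every column, including the target, so the target became continuous.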
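One way to run that check is scikit-learn's `type_of_target`, the helper behind the "Unknown label type" message. The sketch below is illustrative only: the `stage` array is a made-up stand-in for the Stage column, not the actual cirrhosis data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils.multiclass import type_of_target

# Hypothetical integer class labels standing in for the Stage column.
stage = np.array([1, 2, 3, 4, 3, 2])

# Before scaling: integer labels are recognised as discrete classes.
type_before = type_of_target(stage)

# MinMaxScaler expects a 2-D array, hence the reshape; ravel() restores 1-D.
scaled_stage = MinMaxScaler().fit_transform(stage.reshape(-1, 1)).ravel()

# After scaling: the labels are non-integer floats such as 0.333...
type_after = type_of_target(scaled_stage)

print(type_before)  # multiclass
print(type_after)   # continuous
```

Running `type_of_target(y_train)` on your own data should tell you whether the target was affected in the same way.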

Your response variable should be categorical, for example 0/1 or Yes/No.
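A minimal sketch of one possible fix, assuming the target is the Stage column as in the question: scale only the feature columns and leave the labels untouched. The DataFrame here is synthetic (random values, only two feature columns) and `n_neighbors=5` is an arbitrary choice for the small example, so treat it as an outline rather than the asker's actual correction. Fitting the scaler on the training split only is an extra precaution I have added, not something the question did.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the encoded cirrhosis data (values are made up).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "ID": np.arange(1, 101),
    "Bilirubin": rng.uniform(0.3, 28.0, size=100),
    "Albumin": rng.uniform(2.0, 4.6, size=100),
    "Stage": rng.integers(1, 5, size=100),  # class labels 1..4
})

# Features: everything except the identifier and the target.
inputs = data.drop(["ID", "Stage"], axis=1)
# Target: a 1-D integer Series, which also avoids the DataConversionWarning.
output = data["Stage"].astype(int)

x_train, x_test, y_train, y_test = train_test_split(
    inputs, output, train_size=0.8, random_state=0
)

# Scale the features only; fit on the training split, reuse on the test split.
scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(x_train_scaled, y_train)
y_pred = model.predict(x_test_scaled)
print(y_pred[:10])
```

Because the labels never pass through the scaler, they stay as integers and the classifier accepts them.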

Comments

0 votes | Akshay Basutkar | 9/29/2023
Yes, it worked. I checked the output and found that it was also in continuous form, and I corrected it.