Steps to Reduce RMSE Score in Surprise KNN Predictions
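I'm trying to build a recommender system using the Surprise library and the k-Nearest Neighbors (KNN) algorithm. The main problem I'm running into is a very high RMSE (root mean squared error) score, currently RMSE: 3179.9423.
The data I'm working with is an imputed user-item matrix, where the ratings are derived from customer interactions using the following formula: IR_iu = 100 * bought + 50 * added to favorites + 15 * interacted with item
In this formula, IR_iu is the imputed rating of user u for item i. The interactions are weighted: a purchase (bought) scores highest, adding to favorites gets a medium score, and a general interaction with the item scores lowest.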
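As a rough illustration of the formula above, the sketch below computes the imputed rating for a single user-item pair. The function name `imputed_rating` and its parameter names are hypothetical, and it assumes each interaction type is available as a count (or a 0/1 flag).

```python
def imputed_rating(bought: int, added_to_favorites: int, interacted: int) -> int:
    """IR_iu = 100 * bought + 50 * added_to_favorites + 15 * interacted."""
    return 100 * bought + 50 * added_to_favorites + 15 * interacted


# One purchase, one favorite, three general interactions with the item
example = imputed_rating(bought=1, added_to_favorites=1, interacted=3)
print(example)  # 195
```

Scores built this way have no upper bound, which is why they are rescaled before being handed to a Reader with a fixed rating_scale such as (0, 10).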
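import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from surprise import Dataset, Reader

# Raw string so the backslashes in the Windows path are not read as escapes
df = pd.read_excel(r"D:\SELECT\CustomerRatings.xlsx")
# Treat missing interactions as a rating of 0
df.replace(np.nan, 0, inplace=True)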
# Separate the first column (user IDs) from the rest of the data
user_ids = df.iloc[:, 0]
data_without_user_ids = df.iloc[:, 1:]
# Initialize the MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 10))
# Normalize each row of the data without user IDs
normalized_data = pd.DataFrame(scaler.fit_transform(data_without_user_ids.T).T, columns=data_without_user_ids.columns)
# Combine the user IDs and the normalized data into a new DataFrame
normalized_df = pd.concat([user_ids, normalized_data], axis=1)
# 'normalized_df' now contains the user IDs and the scaled data
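# Melt the DataFrame to long format: one (UserID, item, rating) row per cell.
# drop=True keeps reset_index() from adding an 'index' column, which melt
# would otherwise treat as an item whose ratings are the row numbers.
user_item_matrix = normalized_df.reset_index(drop=True)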
melted_data = pd.melt(user_item_matrix, id_vars=['UserID'], var_name='item', value_name='rating')
reader = Reader(rating_scale=(0, 10))
data = Dataset.load_from_df(melted_data, reader)
from surprise import KNNBasic, accuracy
from surprise.model_selection import train_test_split

# Split the dataset into training and testing sets
trainset, testset = train_test_split(data, test_size=0.3)

sim_options = {
    "name": "cosine",
    "user_based": False,  # compute similarities between items
}
algo = KNNBasic(sim_options=sim_options)
# Fit on the training set before predicting on the test set
algo.fit(trainset)
predictions = algo.test(testset)
# Report the RMSE on the held-out test set
accuracy.rmse(predictions)
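
For reference when interpreting the score, here is a minimal sketch of how RMSE is computed from (true rating, estimated rating) pairs. It follows the standard definition rather than Surprise's internal code, and the helper name `rmse` and the sample values are made up for illustration. Assuming estimates are clipped to the rating scale (which, as far as I know, is Surprise's default), an error can never exceed 10 when the true ratings are also within 0 to 10, so an RMSE in the thousands suggests that some true ratings in the test set lie far outside that range.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# All ratings inside the 0-10 scale: the error stays small
in_scale = [(8.0, 6.0), (2.0, 3.0), (10.0, 9.0)]
print(rmse(in_scale))

# A single true rating far outside the scale dominates the result
with_outlier = in_scale + [(5000.0, 10.0)]
print(rmse(with_outlier))
```

Checking the minimum and maximum of the rating column in the long-format data before calling Dataset.load_from_df is a quick way to see whether this is what is happening.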
