Steps to Reduce RMSE Score in Surprise KNN Predictions

Question:

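I am trying to build a recommender system with the Surprise library and the k-Nearest Neighbors (KNN) algorithm. The main problem I am facing is a very high RMSE (root mean squared error) score, currently RMSE: 3179.9423.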
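For readers less familiar with the metric, here is a minimal sketch of how RMSE is computed, using made-up numbers rather than the data from the question. It also illustrates a general point: when both the true and the predicted ratings lie in a 0 to 10 range, no single error can exceed 10, so the RMSE cannot exceed 10 either. A value in the thousands therefore suggests that at least some true ratings sit far outside the declared scale. The function name `rmse` and all values below are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical case 1: true and predicted ratings all within 0..10
on_scale = rmse([0.0, 5.0, 10.0], [1.0, 5.0, 8.0])

# Hypothetical case 2: one true rating far outside the 0..10 scale,
# while the prediction stays inside it
off_scale = rmse([0.0, 5.0, 3000.0], [1.0, 5.0, 10.0])

print(f"on-scale RMSE:  {on_scale:.4f}")
print(f"off-scale RMSE: {off_scale:.4f}")
```

A single out-of-range value is enough to dominate the score, because errors are squared before they are averaged.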

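The data I am using is an imputed user-item matrix in which the ratings are derived from customer interactions with the following formula: IR_iu = 100 * purchase + 50 * added to favorites + 15 * interacted with item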

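In this formula, IR_iu is the imputed rating of user u for item i. The interactions are weighted: a purchase scores highest, adding the item to favorites scores in the middle, and a general interaction with the item scores lowest.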
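The weighting can be written out as a short function. This is only a sketch: the question does not say whether the three inputs are 0/1 flags or counts, so the version below assumes non-negative counts, and the names `implied_rating`, `purchases`, `favorites` and `interactions` are hypothetical.

```python
# Weights taken from the formula in the question
PURCHASE_WEIGHT = 100
FAVORITE_WEIGHT = 50
INTERACTION_WEIGHT = 15


def implied_rating(purchases, favorites, interactions):
    """Imputed rating IR_iu for one (user, item) pair.

    Assumption: each argument is a non-negative count (or a 0/1 flag).
    """
    return (
        PURCHASE_WEIGHT * purchases
        + FAVORITE_WEIGHT * favorites
        + INTERACTION_WEIGHT * interactions
    )


# Hypothetical user who bought the item once, favorited it once,
# and interacted with it twice: 100 + 50 + 2 * 15
example = implied_rating(purchases=1, favorites=1, interactions=2)
print(example)
```

Raw values produced this way are far larger than a 0 to 10 scale, which is why some rescaling step is needed before the ratings are handed to a `Reader` with `rating_scale=(0, 10)`.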

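I am hoping to find a more effective way to lower the RMSE score and improve prediction accuracy, one that takes into account the particular characteristics of an imputed user-item matrix built from customer interactions. I am also open to alternative algorithms that might suit my problem better. My experience in this area is limited: this is my first attempt at building a recommender system, and I have no experienced mentor to guide me, so I have been proceeding by trial and error.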

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from surprise import Dataset, KNNBasic, Reader, accuracy
from surprise.model_selection import train_test_split

# Load the ratings matrix; the raw string keeps the backslashes in the Windows path literal
df = pd.read_excel(r"D:\SELECT\CustomerRatings.xlsx")
# Treat missing interactions as a rating of 0
df.replace(np.nan, 0, inplace=True)

# Separate the first column (user IDs) from the rest of the data
user_ids = df.iloc[:, 0]
data_without_user_ids = df.iloc[:, 1:]

# Initialize the MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 10))

# Normalize each row of the data without user IDs
# (MinMaxScaler scales columns, so transpose, scale, and transpose back)
normalized_data = pd.DataFrame(
    scaler.fit_transform(data_without_user_ids.T).T,
    columns=data_without_user_ids.columns,
)

# Combine the user IDs and the normalized data into a new DataFrame
normalized_df = pd.concat([user_ids, normalized_data], axis=1)

# 'normalized_df' now contains the user IDs and the scaled data
# Reset the index and melt the DataFrame to long format
user_item_matrix = normalized_df.reset_index()
melted_data = pd.melt(user_item_matrix, id_vars=['UserID'], var_name='item', value_name='rating')
reader = Reader(rating_scale=(0, 10))

# Surprise expects the columns in the order user, item, rating
data = Dataset.load_from_df(melted_data, reader)

# Split the dataset into training and testing sets
trainset, testset = train_test_split(data, test_size=0.3)

sim_options = {
    "name": "cosine",
    "user_based": False,  # compute similarities between items
}
algo = KNNBasic(sim_options=sim_options)
algo.fit(trainset)
predictions = algo.test(testset)
accuracy.rmse(predictions)
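One thing that may be worth checking in the code above is the `reset_index()` call right before `pd.melt`. `reset_index()` adds the old row index as a new column named `index`, and because `pd.melt` is only told to keep `UserID` as an identifier, that `index` column is melted like any other item, with row positions as its "ratings". The sketch below reproduces the effect on a tiny made-up matrix (the user and item names are hypothetical). Whether this accounts for the RMSE reported in the question cannot be confirmed from the post alone, but it is one way for ratings far outside the 0 to 10 scale to enter the data.

```python
import pandas as pd

# Hypothetical toy matrix standing in for 'normalized_df' from the question
normalized_df = pd.DataFrame(
    {
        "UserID": ["u1", "u2", "u3"],
        "itemA": [0.0, 10.0, 5.0],
        "itemB": [10.0, 0.0, 0.0],
    }
)

# As in the question: reset_index() first, then melt
with_index = pd.melt(
    normalized_df.reset_index(),
    id_vars=["UserID"],
    var_name="item",
    value_name="rating",
)

# Variant without reset_index()
without_index = pd.melt(
    normalized_df,
    id_vars=["UserID"],
    var_name="item",
    value_name="rating",
)

items_with_index = sorted(with_index["item"].unique())
items_without_index = sorted(without_index["item"].unique())

# The spurious 'index' item carries row positions (0, 1, 2, ...) as ratings
index_ratings = with_index.loc[with_index["item"] == "index", "rating"].tolist()

print(items_with_index)
print(items_without_index)
print(index_ratings)
```

On a real matrix with thousands of users, the `index` pseudo-item would carry values in the thousands. Printing `melted_data["rating"].max()` and `melted_data["item"].unique()` before calling `Dataset.load_from_df` is a quick way to see whether that is happening.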

Tags: Python, artificial intelligence, KNN, recommendation engine, collaborative filtering
