Asked by: Marquee  Asked: 11/13/2023  Updated: 11/13/2023  Views: 30
How to evaluate a Learning To Rank model with XGBoost
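Q:
I am new to using the learning-to-rank approach with XGBRanker. My data consists of qid | customer data | product data | was_buy (0,1). I built a model with XGBRanker with the goal of recommending the top K products for each customer (I am not sure this is the right approach, but I believe it is). My question is: how do I validate the results?
- I prepared the data, defined qid, and split the data into train and test sets. I saved the important identifiers (product ID and user email) as a backup for interpreting the results, and removed them from the data that goes into the model.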
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
# Split the data into input features and label
X = df.drop('was_buy', axis=1)
y = df['was_buy']
# Add qid (email)
qid = X['email'].factorize()[0]  # Convert email to integer codes for qid
# Split data into train and test
X_train, X_test, y_train, y_test, qid_train, qid_test = train_test_split(X, y, qid, test_size=0.3, random_state=42, stratify=y)
# Save and delete identifiers
train_email_ids = X_train['email'].copy()
train_entity_ids = X_train['entity_id'].copy()
test_email_ids = X_test['email'].copy()
test_entity_ids = X_test['entity_id'].copy()
X_train = X_train.drop(['email', 'entity_id'], axis=1)
X_test = X_test.drop(['email', 'entity_id'], axis=1)
# Sort data by qid (XGBRanker expects rows of the same query to be contiguous)
sort_idx_train = qid_train.argsort()
X_train = X_train.iloc[sort_idx_train]
y_train = y_train.iloc[sort_idx_train]
qid_train = qid_train[sort_idx_train]
train_entity_ids = train_entity_ids.iloc[sort_idx_train]
sort_idx_test = qid_test.argsort()
X_test = X_test.iloc[sort_idx_test]
y_test = y_test.iloc[sort_idx_test]
qid_test = qid_test[sort_idx_test]
test_entity_ids = test_entity_ids.iloc[sort_idx_test]
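A side note on the split above: `train_test_split` with `stratify=y` samples rows independently, so rows of the same customer (the same qid) can end up in both train and test. If the goal is to keep each query whole, a group-aware split is one option. The following is only a minimal sketch on a toy frame; the column `feature` and all values are made up for illustration, and `GroupShuffleSplit` is swapped in for `train_test_split`, it is not what the code above does.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-in for the question's df (hypothetical values and feature column).
df = pd.DataFrame({
    "email": ["a", "a", "a", "b", "b", "c", "c", "c", "d", "d"],
    "entity_id": [1, 2, 3, 1, 2, 1, 2, 3, 2, 3],
    "feature": [0.1, 0.5, 0.3, 0.9, 0.2, 0.4, 0.8, 0.6, 0.7, 0.05],
    "was_buy": [0, 1, 0, 1, 0, 0, 1, 0, 0, 1],
})

# One integer query id per customer, as in the question.
qid = df["email"].factorize()[0]

# Split on groups so that every customer lands entirely in train or in test.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=42)
train_idx, test_idx = next(splitter.split(df, df["was_buy"], groups=qid))

train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]
qid_train, qid_test = qid[train_idx], qid[test_idx]

# No query id should appear on both sides of the split.
shared = set(qid_train) & set(qid_test)
print("queries shared between train and test:", len(shared))
```

Sorting each side by qid, as done above, would still apply before calling `fit`.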
- I defined an "ndcg_scorer" because I need it for the optimization; since it does not exist directly in the library, I had to define it this way.
from sklearn.metrics import make_scorer, ndcg_score
# Define scoring function (treats all rows passed in as one single query)
def ndcg_scorer(y_true, y_pred):
    return ndcg_score([y_true], [y_pred])
# Prepare scorer object; it scores the output of predict(), and higher is better
ndcg_scorer = make_scorer(ndcg_scorer, greater_is_better=True)
# I define the params here and optimize the model via Random Search
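The scorer above passes all rows to `ndcg_score` as one list, so it ranks products across customers rather than within each customer. A per-query variant could look roughly like the sketch below. `mean_ndcg_per_query` is a hypothetical helper name, not part of XGBoost or scikit-learn, and skipping queries with no purchase or a single row is an assumption about how such queries should be handled.

```python
import numpy as np
from sklearn.metrics import ndcg_score

def mean_ndcg_per_query(y_true, y_score, qid, k=None):
    """Average NDCG@k over queries (hypothetical helper, for illustration)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    qid = np.asarray(qid)
    scores = []
    for q in np.unique(qid):
        mask = qid == q
        # Assumption: skip queries with a single row or without any purchase,
        # because NDCG carries no ranking information for them.
        if mask.sum() < 2 or y_true[mask].sum() == 0:
            continue
        scores.append(ndcg_score([y_true[mask]], [y_score[mask]], k=k))
    return float(np.mean(scores)) if scores else float("nan")

# Toy data: two customers; the second one has its purchase ranked second.
toy_true = [1, 0, 0, 0, 1]
toy_score = [0.9, 0.2, 0.1, 0.8, 0.3]
toy_qid = [0, 0, 0, 1, 1]
toy_result = mean_ndcg_per_query(toy_true, toy_score, toy_qid)
print(toy_result)
```

On the test set this would be called with `y_test`, the model's predictions and `qid_test`, all in the same row order.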
- I re-ran the model with the best parameters and got the results.
best_parameters = random_search.best_params_
ranker_optimized = xgb.XGBRanker(
    tree_method="hist",
    objective="rank:ndcg",
    n_estimators=100,
    **best_parameters  # Unpacking the best parameters
)
ranker_optimized.fit(X_train, y_train, qid=qid_train)
preds_optimized = ranker_optimized.predict(X_test)
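# Final dataframe, where I add the identifiers of customer and product
result_df_optimized = pd.DataFrame({
    # test_email_ids was never reordered, so apply the qid sort here
    'email': test_email_ids.values[sort_idx_test],
    # test_entity_ids was already sorted by qid above; indexing it again would misalign it
    'entity_id': test_entity_ids.values,
    'pred_value': preds_optimized
})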
- How should I now define metrics to evaluate the model?
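For a top-K recommendation goal, one possible direction is to rank each customer's products by predicted score and check how many of the top K were actually bought. The sketch below is only an illustration on a toy frame shaped like the result dataframe; the `was_buy` column is an assumption (the true labels would have to be added, in the same row order as the predictions), and `top_k_metrics` is a hypothetical helper.

```python
import pandas as pd

# Toy stand-in for the result dataframe, with the true label added (assumption).
result_df = pd.DataFrame({
    "email": ["a", "a", "a", "b", "b", "b", "c", "c"],
    "entity_id": [1, 2, 3, 1, 2, 3, 1, 2],
    "pred_value": [0.9, 0.1, 0.4, 0.2, 0.7, 0.6, 0.3, 0.8],
    "was_buy": [1, 0, 0, 0, 0, 1, 0, 0],
})

def top_k_metrics(df, k=2):
    """Precision@k, recall@k and hit rate@k, averaged over customers."""
    # Rank products within each customer by descending predicted score.
    ranked = df.sort_values(["email", "pred_value"], ascending=[True, False])
    top_k = ranked.groupby("email").head(k)
    hits = top_k.groupby("email")["was_buy"].sum()
    positives = df.groupby("email")["was_buy"].sum()
    # Recall and hit rate are only defined for customers with a purchase.
    buyers = positives[positives > 0].index
    return {
        "precision_at_k": float((hits / k).mean()),
        "recall_at_k": float((hits[buyers] / positives[buyers]).mean()),
        "hit_rate_at_k": float((hits[buyers] > 0).mean()),
    }

metrics = top_k_metrics(result_df, k=2)
print(metrics)
```

Note that precision here divides by k even when a customer has fewer than k candidate products, which is a design choice and lowers the score for small groups.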
A: No answers yet