Using SHAP to explain xgboost: Check failed: end <= model.BoostedRounds() (309 vs. 133) : Out of range for tree layers

Asked by: lenhhoxung Asked: 11/8/2023 Updated: 11/16/2023 Views: 34

Q:

I am using SHAP to explain my xgboost model with the following code:

explainer = shap.TreeExplainer(model)
explainer.shap_values(pd_df)
# explainer(xgboost.DMatrix(pd_df, label=label))

But it throws the following error:

XGBoostError                              Traceback (most recent call last)
/app/dataiku/DSS_DATA_DIR/code-envs/python/env/lib/python3.9/site-packages/shap/explainers/_tree.py in shap_values(self, X, y, tree_limit, approximate, check_additivity, from_call)
    357                 try:
--> 358                     phi = self.model.original_model.predict(
    359                         X, iteration_range=(0, tree_limit), pred_contribs=True,

/app/dataiku/DSS_DATA_DIR/code-envs/python/env/lib/python3.9/site-packages/xgboost/core.py in predict(self, data, output_margin, pred_leaf, pred_contribs, approx_contribs, pred_interactions, validate_features, training, iteration_range, strict_shape)
   2295         dims = c_bst_ulong()
-> 2296         _check_call(
   2297             _LIB.XGBoosterPredictFromDMatrix(

/app/dataiku/DSS_DATA_DIR/code-envs/python/env/lib/python3.9/site-packages/xgboost/core.py in _check_call(ret)
    280     if ret != 0:
--> 281         raise XGBoostError(py_str(_LIB.XGBGetLastError()))
    282 

XGBoostError: [06:41:59] /workspace/src/gbm/gbtree.h:125: Check failed: end <= model.BoostedRounds() (309 vs. 133) : Out of range for tree layers.
Stack trace:
  [bt] (0) /app/dataiku/DSS_DATA_DIR/code-envs/python/env/lib/python3.9/site-packages/xgboost/lib/libxgboost.so(+0x45a59a) [0x7f315de3b59a]
  [bt] (1) /app/dataiku/DSS_DATA_DIR/code-envs/python/env/lib/python3.9/site-packages/xgboost/lib/libxgboost.so(+0x47123f) [0x7f315de5223f]
  [bt] (2) /app/dataiku/DSS_DATA_DIR/code-envs/python/env/lib/python3.9/site-packages/xgboost/lib/libxgboost.so(+0x47166b) [0x7f315de5266b]
  [bt] (3) /app/dataiku/DSS_DATA_DIR/code-envs/python/env/lib/python3.9/site-packages/xgboost/lib/libxgboost.so(+0x4c463b) [0x7f315dea563b]
  [bt] (4) /app/dataiku/DSS_DATA_DIR/code-envs/python/env/lib/python3.9/site-packages/xgboost/lib/libxgboost.so(XGBoosterPredictFromDMatrix+0x2be) [0x7f315db4de2e]
  [bt] (5) /lib64/libffi.so.6(ffi_call_unix64+0x4c) [0x7f371af33dec]
  [bt] (6) /lib64/libffi.so.6(ffi_call+0x1f5) [0x7f371af33715]
  [bt] (7) /usr/local/lib/python3.9/lib-dynload/_ctypes.cpython-39-x86_64-linux-gnu.so(+0x1286f) [0x7f371b14886f]
  [bt] (8) /usr/local/lib/python3.9/lib-dynload/_ctypes.cpython-39-x86_64-linux-gnu.so(+0xc2fb) [0x7f371b1422fb]

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-20-40f7d265c006> in <cell line: 1>()
----> 1 explainer.shap_values(pd_df)

/app/dataiku/DSS_DATA_DIR/code-envs/python/env/lib/python3.9/site-packages/shap/explainers/_tree.py in shap_values(self, X, y, tree_limit, approximate, check_additivity, from_call)
    365                         "See https://github.com/slundberg/shap/issues/580."
    366                     )
--> 367                     raise ValueError(emsg) from e
    368 
    369                 if check_additivity and self.model.model_output == "raw":

ValueError: This reshape error is often caused by passing a bad data matrix to SHAP. See https://github.com/slundberg/shap/issues/580.

The model's predictions work fine:

(model.predict(pd_df) == label).mean()
>> 0.8375209380234506

XGBoost version: 2.0.0, SHAP version: 0.43.0

What could be the cause?

XGBOOST SHAP

A:

0 votes forgetful_coder 11/16/2023 #1

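Sorry, I can't comment because I don't have enough reputation. Could it be that you are doing multi-class classification with eval_set and early stopping?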

Take this example: https://shap-lrjball.readthedocs.io/en/latest/example_notebooks/tree_explainer/XGBoost%20Multi-class%20Example.html

Replace the model-fitting statements with:

model = xgboost.XGBClassifier(objective="binary:logistic", max_depth=4, n_estimators=10, early_stopping_rounds=5)
model.fit(X_train, Y_train, eval_set=[(X_train, Y_train), (X_test, Y_test)], verbose=False)

It throws your error: XGBoostError: [16:49:37] /workspace/src/gbm/gbtree.h:125: Check failed: end <= model.BoostedRounds() (30 vs. 10) : Out of range for tree layers.
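The two numbers in that message can be pulled out and compared. This is only a rough sketch: parse_round_check is a hypothetical helper (not part of XGBoost or SHAP), and n_classes = 3 is my assumption about the dataset in the linked notebook. If that assumption holds, the requested end (30) is exactly classes times rounds (3 x 10), which would be consistent with a tree count being passed where a round count is expected. I have not confirmed that this is what actually happens inside SHAP.

```python
import re


def parse_round_check(message):
    """Extract (requested_end, boosted_rounds) from a message containing '(A vs. B)'."""
    match = re.search(r"\((\d+) vs\. (\d+)\)", message)
    if match is None:
        raise ValueError("no '(A vs. B)' pair found in message")
    return int(match.group(1)), int(match.group(2))


msg = (
    "Check failed: end <= model.BoostedRounds() (30 vs. 10) : "
    "Out of range for tree layers."
)
requested_end, boosted_rounds = parse_round_check(msg)

n_classes = 3  # assumption: number of classes in the example dataset

# True if the requested end equals one tree per class per boosting round.
print(requested_end == n_classes * boosted_rounds)
```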

If you revert to the original (or simply drop early_stopping), it should be fine:

model = xgboost.XGBClassifier(objective="binary:logistic", max_depth=4, n_estimators=10)
model.fit(X_train, Y_train)

shap_values = shap.TreeExplainer(model).shap_values(X_test)
shap.summary_plot(shap_values, X_test)

Of course, this doesn't identify the real root cause, but it might get you unstuck.
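To narrow it down on your own model, it may help to compare the number of boosting rounds with the total number of trees stored in the booster. A minimal sketch, assuming the object you pass in behaves like an xgboost.Booster (it only relies on num_boosted_rounds() and get_dump()); summarize_booster is a hypothetical helper name:

```python
def summarize_booster(booster):
    """Compare boosting rounds with the total number of trees in a Booster-like object."""
    rounds = booster.num_boosted_rounds()  # number of boosting rounds
    trees = len(booster.get_dump())        # get_dump() returns one string per tree
    return {
        "rounds": rounds,
        "trees": trees,
        "trees_per_round": trees / rounds if rounds else None,
    }


# Intended usage with a fitted scikit-learn style model (requires xgboost):
# print(summarize_booster(model.get_booster()))
```

If trees_per_round comes out greater than 1 (for example, equal to the number of classes), then a tree count and a round count are different numbers for your model, which is worth keeping in mind when reading the "(309 vs. 133)" in your error.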