h2o GBM checkpointing error when using h2o automl for the base model

Asked by: mindstorm84 Asked: 7/29/2023 Updated: 7/29/2023 Views: 26

Q:

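I want to use checkpointing to retrain my h2o model on a new set of observations, but my code fails at the training step when a checkpoint is specified. My original model was created with h2o automl, and I verified that aml.leader is a GBM model.

The error says that the max_depth field cannot be modified. However, I did not set the max_depth parameter anywhere in the gbm_continued definition.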

import h2o
from h2o.automl import H2OAutoML
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
# y (response column) and predictors (feature columns) are defined elsewhere
# ds_file is my local dataset with 4k rows
ds = h2o.import_file(ds_file)
splits = ds.split_frame(ratios=[0.8], seed=1)
train = splits[0]
test = splits[1]
aml = H2OAutoML(max_runtime_secs=60, seed=1, project_name='test')
aml.train(y=y, training_frame=train, leaderboard_frame=test)
# verify that aml.leader is the GBM model
print(aml.leader)
# H2OGradientBoostingEstimator : Gradient Boosting Machine
# Model Key: GBM_1_AutoML_1_20230727_145804
# ds2_file is my local dataset with 30k rows
ds2 = h2o.import_file(ds2_file)
splits2 = ds2.split_frame(ratios=[0.8], seed=1)
train2 = splits2[0]
test2 = splits2[1]
# continue training from the AutoML leader; this is the call that fails
gbm_continued = H2OGradientBoostingEstimator(model_id='gbm_continued', checkpoint=aml.leader)
gbm_continued.train(x=predictors, y=y, training_frame=train2)

This is the error message:

>>> gbm_continued.train(x=predictors, y = y, training_frame = train2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "h2o-dev/lib/lib/python3.8/site-packages/h2o/estimators/estimator_base.py", line 108, in train
    self._train(parms, verbose=verbose)
  File "dev_items/h2o-dev/lib/lib/python3.8/site-packages/h2o/estimators/estimator_base.py", line 187, in _train
    model_builder_json = h2o.api("POST /%d/ModelBuilders/%s" % (rest_ver, self.algo), data=parms)
  File "h2o-dev/lib/lib/python3.8/site-packages/h2o/h2o.py", line 124, in api
    return h2oconn.request(endpoint, data=data, json=json, filename=filename, save_to=save_to)
  File "h2o-dev/lib/lib/python3.8/site-packages/h2o/backend/connection.py", line 498, in request
    return self._process_response(resp, save_to)
  File "h2o-dev/lib/lib/python3.8/site-packages/h2o/backend/connection.py", line 852, in _process_response
    raise H2OResponseError(data)
h2o.exceptions.H2OResponseError: ModelBuilderErrorV3  (water.exceptions.H2OModelBuilderIllegalArgumentException):
    timestamp = 1690566243266
    error_url = '/3/ModelBuilders/gbm'
    msg = 'Illegal argument(s) for GBM model: gbm_continued.  Details: ERRR on field: _max_depth: Field _max_depth cannot be modified if checkpoint is specified!\nERRR on field: _ntrees: If checkpoint is specified then requested ntrees must be higher than 409'
    dev_msg = 'Illegal argument(s) for GBM model: gbm_continued.  Details: ERRR on field: _max_depth: Field _max_depth cannot be modified if checkpoint is specified!\nERRR on field: _ntrees: If checkpoint is specified then requested ntrees must be higher than 409'
    http_status = 412
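
The msg field in this response packs two separate validation failures into one string: one for _max_depth and one for _ntrees. As a minimal sketch (the helper name field_errors is hypothetical, and the parsing relies only on the "ERRR on field" layout visible in the message above, not on any documented H2O format), the entries can be split apart like this:

```python
import re

# msg copied from the error response above
msg = (
    "Illegal argument(s) for GBM model: gbm_continued.  Details: "
    "ERRR on field: _max_depth: Field _max_depth cannot be modified if checkpoint is specified!\n"
    "ERRR on field: _ntrees: If checkpoint is specified then requested ntrees must be higher than 409"
)


def field_errors(message):
    """Return {field_name: explanation} for each 'ERRR on field' entry."""
    # '.' does not match a newline, so each explanation stops at its line end
    pattern = re.compile(r"ERRR on field: _(\w+): (.*)")
    return {m.group(1): m.group(2).strip() for m in pattern.finditer(message)}


errors = field_errors(msg)
for field, reason in errors.items():
    print(f"{field}: {reason}")
```

Read this way, the message points at two parameters of the new estimator that conflict with the checkpoint: max_depth and ntrees.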

I found a related question on this topic, but it did not resolve the issue.

python h2o


A:

1 upvote Wendy 7/29/2023 #1

To resolve the error you are seeing, try the following:

# h2o.get_model expects a model id string, so pass the leader's model_id
gbm_autoML = h2o.get_model(aml.leader.model_id)
gbm_continued = H2OGradientBoostingEstimator(
    model_id='gbm_continued',
    max_depth=gbm_autoML.actual_params['max_depth'],
    ntrees=gbm_autoML.actual_params['ntrees'] + 2,
    checkpoint=aml.leader,
)

Continuing to train a GBM model means adding more trees to it, which is why I added 2 to the ntrees parameter. Feel free to change the 2 to any other value, as long as it is >= 1.
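
The same idea can be written as a small helper that derives the constructor arguments from the leader's parameters. This is only a sketch: the function name continuation_params and the frozen argument are hypothetical, the value 409 is taken from the error message, and max_depth=7 is a made-up value for illustration. It runs on a plain dict, so it can be tried without a cluster.

```python
def continuation_params(actual_params, extra_trees=2, frozen=("max_depth",)):
    """Build keyword arguments for an estimator that resumes from a checkpoint.

    Parameters named in `frozen` are copied unchanged from the checkpointed
    model; ntrees is raised by `extra_trees` so that new trees can be added.
    """
    if extra_trees < 1:
        raise ValueError("extra_trees must be >= 1")
    params = {name: actual_params[name] for name in frozen}
    params["ntrees"] = actual_params["ntrees"] + extra_trees
    return params


# Illustrative values only: 409 is from the error message, max_depth=7 is made up
leader_params = {"ntrees": 409, "max_depth": 7, "learn_rate": 0.1}
kwargs = continuation_params(leader_params)
print(kwargs)  # {'max_depth': 7, 'ntrees': 411}
```

With a running cluster, the result could be passed along as H2OGradientBoostingEstimator(model_id='gbm_continued', checkpoint=aml.leader, **continuation_params(aml.leader.actual_params)). If H2O then complains about another field that cannot be modified, adding that field's name to frozen should carry it over in the same way.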

Hope this helps, and good luck.