Cannot create DaskDMatrix

Asked by: crbl Asked: 11/15/2023 Updated: 11/18/2023 Views: 16

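Q:

I am trying to train an xgboost model with dask by following the documentation. My problem is that I am stuck at the step where I have to create the DaskDMatrix. Whatever I try, I get an error saying that this method does not exist. I tried: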

import dask.distributed
import dask_xgboost as dxgb

client = dask.distributed.Client()

params = {'objective': 'binary:logistic',
          'booster': 'dart',
          'n_estimators': 800,
          'max_depth': 4,
          'learning_rate': 0.02,
          'random_state': 42}

dtrain = dxgb.DaskDMatrix(client, X_train, y_train)
dval = dxgb.DaskDMatrix(client, X_val, y_val)

eval_set = [(dtrain, 'train'), (dval, 'validation')]

model = dxgb.train(client, params, dtrain, num_boost_round=10, evals=eval_set, eval_metric=['logloss', 'aucpr'], verbose=True)

==> AttributeError: module 'dask_xgboost' has no attribute 'DaskDMatrix'

Then I tried:

import xgboost as xgb

# Create a Dask DMatrix from a Dask DataFrame
dtrain = xgb.dask.DaskDMatrix(client, X_train, y_train)

==> AttributeError: module 'xgboost' has no attribute 'dask'

And also:

import dask_ml.xgboost as dxgb

# Create a Dask DMatrix from a Dask DataFrame
dtrain = dxgb.DaskDMatrix(client, X_train, y_train)

==> AttributeError: module 'dask_ml.xgboost' has no attribute 'DaskDMatrix'

Where can I find the correct code?

xgboost dask-distributed

A:

1 upvote Guillaume EB 11/18/2023 #1

Working code on my laptop, with a fresh conda install of xgboost and dask-xgboost, following the documentation:

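import xgboost as xgb
import dask.array as da
import dask.distributed

# Start a local Dask cluster and connect to it
client = dask.distributed.Client()

params = {'objective': 'binary:logistic',
          'booster': 'dart',
          'n_estimators': 800,
          'max_depth': 4,
          'learning_rate': 0.02,
          'random_state': 42}

# Random demo data: rows are samples, columns are features
X_train = da.random.random((100000, 10), chunks=(10000, 10))
# Binary labels (0 or 1), one per row, to match 'binary:logistic'
y_train = da.random.randint(0, 2, size=100000, chunks=10000)
dtrain = xgb.dask.DaskDMatrix(client, X_train, y_train)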
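
Once the DaskDMatrix is created, the training call from the question can go through xgboost's own Dask interface instead of dask_xgboost. Below is a minimal sketch, not tested against the asker's data. The function name train_demo, the random data, and the cluster size are made up for illustration. It assumes a recent xgboost with dask and distributed installed, imports the interface with `from xgboost import dask as dxgb`, and passes eval_metric inside params rather than as a keyword argument.

```python
def train_demo(num_boost_round=10):
    """Train a small binary classifier on random Dask arrays (illustrative only)."""
    # Imports live inside the function so that defining it needs no extra packages.
    import dask.array as da
    from dask.distributed import Client, LocalCluster
    from xgboost import dask as dxgb

    # Native training parameters; the number of trees is set by num_boost_round.
    params = {
        "objective": "binary:logistic",
        "booster": "dart",
        "max_depth": 4,
        "learning_rate": 0.02,
        "seed": 42,
        "eval_metric": ["logloss", "aucpr"],
    }

    # A small local cluster; the worker counts are arbitrary choices for the demo.
    with LocalCluster(n_workers=2, threads_per_worker=1) as cluster, Client(cluster) as client:
        # Rows are samples, columns are features; labels are 0 or 1.
        X = da.random.random((100_000, 10), chunks=(10_000, 10))
        y = da.random.randint(0, 2, size=100_000, chunks=10_000)

        dtrain = dxgb.DaskDMatrix(client, X, y)

        # train() returns a dict holding the fitted booster and the evaluation history.
        output = dxgb.train(
            client,
            params,
            dtrain,
            num_boost_round=num_boost_round,
            evals=[(dtrain, "train")],
        )
    return output["booster"], output["history"]
```

Calling `booster, history = train_demo()` should return the trained Booster together with a history dict keyed by the names given in evals, here "train".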

0 upvotes crbl 11/18/2023 #2

Yes, in the meantime I found out that xgb.dask requires xgb version > 0.9
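
For anyone hitting the same AttributeError, a quick way to check a given environment is to print the installed xgboost version and see whether the xgboost.dask submodule imports at all. This is only a rough sketch using the standard library; the helper name describe_xgboost_install is made up for illustration.

```python
import importlib


def describe_xgboost_install():
    """Report whether xgboost imports, its version, and whether xgboost.dask loads."""
    info = {"installed": False, "version": None, "has_dask": False}

    try:
        xgb = importlib.import_module("xgboost")
    except Exception:
        # xgboost is missing or fails to load in this environment.
        return info

    info["installed"] = True
    info["version"] = getattr(xgb, "__version__", None)

    try:
        # Fails on xgboost releases without a Dask interface,
        # and may also fail when dask itself is not installed.
        importlib.import_module("xgboost.dask")
        info["has_dask"] = True
    except Exception:
        pass

    return info


print(describe_xgboost_install())
```

If has_dask comes back False while installed is True, upgrading xgboost (for example with `pip install -U xgboost`) and making sure dask and distributed are installed is the first thing to try.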