Asked by: Anna Jackson  Asked: 11/16/2023  Updated: 11/16/2023  Views: 28
Why is ranger within tidymodels giving a different model to calling ranger directly?
Q:
Why don't I get the same model when I use ranger through tidymodels as when I call ranger directly?
Here is a reproducible example:
library(tidymodels)
library(ranger)
# load data
data("iris")
train <- iris |> slice_sample(prop = 0.7)
test <- iris |> anti_join(train)
# rf model specs
rf_mod <-
  rand_forest(trees = 10) |>
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE) |>
  set_mode("classification")
# fit model using tidymodels
set.seed(100)
rf_mod |> fit(Species ~ ., data = train) # OOB=4.81%
# fit model using ranger directly
set.seed(100)
ranger(Species ~ ., data = train,
       num.trees = 10, respect.unordered.factors = TRUE, probability = FALSE) # OOB=5.77%
A:
5 upvotes
EmilHvitfeldt
11/16/2023
#1
You get different models because the seed argument isn't set. If you set the same seed for both approaches, you get the same model fit:
library(tidymodels)
library(ranger)
# load data
data("iris")
train <- iris |> slice_sample(prop = 0.7)
test <- iris |> anti_join(train)
#> Joining with `by = join_by(Sepal.Length, Sepal.Width, Petal.Length,
#> Petal.Width, Species)`
# rf model specs
rf_mod <-
  rand_forest(trees = 10) |>
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE,
             seed = 1234) |>
  set_mode("classification")
# fit model using tidymodels
set.seed(100)
rf_mod |> fit(Species ~ ., data = train)
#> parsnip model object
#>
#> Ranger result
#>
#> Call:
#> ranger::ranger(x = maybe_data_frame(x), y = y, num.trees = ~10, respect.unordered.factors = ~TRUE, probability = ~FALSE, seed = ~1234, num.threads = 1, verbose = FALSE)
#>
#> Type: Classification
#> Number of trees: 10
#> Sample size: 105
#> Number of independent variables: 4
#> Mtry: 2
#> Target node size: 1
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error: 2.88 %
# fit model using ranger directly
set.seed(100)
ranger(Species ~ ., data = train,
       num.trees = 10, respect.unordered.factors = TRUE, probability = FALSE,
       seed = 1234)
#> Ranger result
#>
#> Call:
#> ranger(Species ~ ., data = train, num.trees = 10, respect.unordered.factors = TRUE, probability = FALSE, seed = 1234)
#>
#> Type: Classification
#> Number of trees: 10
#> Sample size: 105
#> Number of independent variables: 4
#> Mtry: 2
#> Target node size: 1
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error: 2.88 %
Created on 2023-11-15 with reprex v2.0.2
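To check the match programmatically rather than by reading the printed output, the two fits can be compared directly. This is a minimal sketch, not part of the original reprex. It assumes a recent parsnip that exports `translate()` and `extract_fit_engine()`. The names `parsnip_fit`, `engine_fit`, and `direct_fit` are illustrative, and the training sample is drawn with base R purely for this example.

```r
library(parsnip)
library(ranger)

# Illustrative training sample (an assumption of this sketch, not the original split)
set.seed(1)
train <- iris[sample(nrow(iris), 105), ]

rf_mod <-
  rand_forest(trees = 10) |>
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE,
             seed = 1234) |>
  set_mode("classification")

# Show the ranger::ranger() call that parsnip builds for this specification,
# including the seed passed through set_engine()
translate(rf_mod)

# Fit through parsnip, then pull out the underlying ranger object
parsnip_fit <- rf_mod |> fit(Species ~ ., data = train)
engine_fit <- extract_fit_engine(parsnip_fit)

# Fit with ranger directly, using the same seed
direct_fit <- ranger(Species ~ ., data = train,
                     num.trees = 10, respect.unordered.factors = TRUE,
                     probability = FALSE, seed = 1234)

# Compare the OOB error and the OOB predictions of the two forests
c(parsnip = engine_fit$prediction.error, direct = direct_fit$prediction.error)
identical(engine_fit$predictions, direct_fit$predictions)
```

If the seed is what drives the difference, the two OOB errors should agree, as they do in the output above. Dropping `seed = 1234` from both calls and relying on `set.seed()` alone is the case from the question, where the results differed.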
Comments
0 upvotes
Anna Jackson
11/17/2023
Thanks. I didn't realise that the seed argument of ranger() could be used in the tidymodels engine options :-)