Why does ranger within tidymodels give a different model from calling ranger directly?

Asked by: Anna Jackson  Asked: 11/16/2023  Updated: 11/16/2023  Views: 28

Q:

I would like to know why I do not get the same model when I use ranger within tidymodels as when I call ranger directly.

Here is a reproducible example:

library(tidymodels)
library(ranger)

# load data
data("iris")
train <- iris |> slice_sample(prop = 0.7)
test <- iris |> anti_join(train)

# rf model specs
rf_mod <- 
  rand_forest(trees = 10)  |>  
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE) |> 
  set_mode("classification")

# fit model using tidymodels
set.seed(100)
rf_mod |> fit(Species ~ ., data = train) # OOB=4.81%

# fit model using ranger directly
set.seed(100)
ranger(Species ~ ., data = train, 
       num.trees=10, respect.unordered.factors = TRUE, probability = FALSE) # OOB=5.77%
Tags: r, random-forest, tidymodels


A:

5 upvotes EmilHvitfeldt 11/16/2023 #1

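You are getting different models because the seed argument is not set. Calling set.seed() beforehand is not enough here; if you pass the same seed value to ranger in both approaches, you get the same model fit.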

library(tidymodels)
library(ranger)

# load data
data("iris")
train <- iris |> slice_sample(prop = 0.7)
test <- iris |> anti_join(train)
#> Joining with `by = join_by(Sepal.Length, Sepal.Width, Petal.Length,
#> Petal.Width, Species)`

# rf model specs
rf_mod <- 
  rand_forest(trees = 10)  |>  
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE, 
             seed = 1234) |> 
  set_mode("classification")

# fit model using tidymodels
set.seed(100)
rf_mod |> fit(Species ~ ., data = train)
#> parsnip model object
#> 
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, num.trees = ~10,      respect.unordered.factors = ~TRUE, probability = ~FALSE,      seed = ~1234, num.threads = 1, verbose = FALSE) 
#> 
#> Type:                             Classification 
#> Number of trees:                  10 
#> Sample size:                      105 
#> Number of independent variables:  4 
#> Mtry:                             2 
#> Target node size:                 1 
#> Variable importance mode:         none 
#> Splitrule:                        gini 
#> OOB prediction error:             2.88 %

# fit model using ranger directly
set.seed(100)
ranger(Species ~ ., data = train, 
       num.trees=10, respect.unordered.factors = TRUE, probability = FALSE, 
       seed = 1234)
#> Ranger result
#> 
#> Call:
#>  ranger(Species ~ ., data = train, num.trees = 10, respect.unordered.factors = TRUE,      probability = FALSE, seed = 1234) 
#> 
#> Type:                             Classification 
#> Number of trees:                  10 
#> Sample size:                      105 
#> Number of independent variables:  4 
#> Mtry:                             2 
#> Target node size:                 1 
#> Variable importance mode:         none 
#> Splitrule:                        gini 
#> OOB prediction error:             2.88 %

Created on 2023-11-15 with reprex v2.0.2
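
For anyone who wants to check this beyond comparing the printed OOB error, here is a minimal sketch, not part of the original reprex. It assumes that translate() and extract_fit_engine() are available in your parsnip version, and the set.seed(42) value is a hypothetical choice used only to make the train split reproducible. translate() prints the ranger call template that parsnip will use, so you can see what it passes for seed when you do not set one. My understanding is that parsnip supplies its own randomly drawn seed there, which would explain why set.seed() alone does not line the two fits up, but check the printed template for your version.

```r
library(tidymodels)
library(ranger)

# Hypothetical seed, only so that the train split itself is reproducible
set.seed(42)
train <- iris |> slice_sample(prop = 0.7)

# Spec without an explicit engine seed, as in the question
rf_no_seed <-
  rand_forest(trees = 10) |>
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE) |>
  set_mode("classification")

# Print the ranger call template, including the defaults parsnip fills in
rf_no_seed |> translate()

# Spec with an explicit engine seed, as in the answer
rf_seeded <-
  rand_forest(trees = 10) |>
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE,
             seed = 1234) |>
  set_mode("classification")

fit_parsnip <- rf_seeded |> fit(Species ~ ., data = train)
fit_direct <- ranger(Species ~ ., data = train,
                     num.trees = 10, respect.unordered.factors = TRUE,
                     probability = FALSE, seed = 1234)

# Pull the underlying ranger object out of the parsnip fit and compare
# the OOB prediction error of the two models numerically
engine_fit <- extract_fit_engine(fit_parsnip)
all.equal(engine_fit$prediction.error, fit_direct$prediction.error)
```

Note that probability = FALSE is kept in both calls on purpose: if the two fits differ in that setting, their prediction errors are not comparable.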

Comments

0 upvotes Anna Jackson 11/17/2023
Thank you. I had not realised that the seed argument of ranger() was meant to be set within the tidymodels engine options :-)