How to get SHAP values for caret models in R?

Asked by: Mustafa Kamal · Asked: 9/2/2023 · Last edited by: Mustafa Kamal · Updated: 9/3/2023 · Views: 71

Question:

I'm trying to get SHAP values for my model, which I built with caret. It is a random forest model, and the data is:

data = structure(list(Main_Street = structure(c(2L, 3L, 2L, 1L, 3L, 
2L, 3L, 1L, 2L, 2L), .Label = c("64", "70", "270"), class = "factor"), 
    Blocked_Lanes = c(3L, 4L, 2L, 1L, 1L, 2L, 6L, 3L, 3L, 3L), 
    Total_Vehicle_Count = c(1L, 2L, 2L, 2L, 1L, 4L, 3L, 2L, 2L, 
    1L), Tractor_Trailer_Count = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L), Weather_Winter_Storm = structure(c(1L, 2L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("No", "Yes"), class = "factor"), 
    Weather_Rain = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    2L, 1L), .Label = c("No", "Yes"), class = "factor"), Injuries_Count = c(0L, 
    0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L), Accident_Overturned_Car = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("No", "Yes"
    ), class = "factor"), Fatalities_Count = c(0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L), Speed = c(65L, 46L, 10L, 42L, 40L, 
    21L, 15L, 57L, 59L, 59L), Total_Volume = c(48.7, 22.5, 47.3, 
    102, 138, 75.3, 60.5, 83.3, 18, 26.7), Occupancy = c(3.5, 
    1.7, 40.8, 23.8, 14.1, 31, 27.1, 4.9, 2.6, 2.5), Lanes_Cleared_Duration = c(53L, 
    35L, 32L, 4L, 11L, 35L, 42L, 12L, 36L, 69L)), row.names = c(NA, 
-10L), class = "data.frame")

The random forest model is:

library(caret)

training <- data  # the example data shown above

fitControl <- trainControl(method = "repeatedcv", 
                           number = 10, 
                           repeats = 10) 

# caret's "rf" method only tunes mtry; expand.grid() already returns a data frame
randomforestGrid <- expand.grid(mtry = 2:floor(sqrt(61)))
set.seed(2356)
rf_model <- train(Lanes_Cleared_Duration ~ .,
                  data = training, 
                  method = "rf", 
                  trControl = fitControl, 
                  metric = "RMSE",
                  tuneGrid = randomforestGrid,
                  ntree = 500)  # randomForest() takes ntree; n.trees and verbose were silently ignored

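There are many sources on how to make SHAP plots, but none of them work with my data, and I keep getting errors. For example, this post asked a similar question but did not get a solution. This is the kind of plot I'd like to get: SHAP Plot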

Is it also possible to export a data frame containing the SHAP values for each variable?

Tags: machine-learning r-caret shap interpretation

Comments

0 votes · Michael M · 9/2/2023
Your code doesn't work.

Answer:

2 votes · Michael M · 9/3/2023 · #1

Here is an example, slightly modified from our {kernelshap} README:

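library(caret)
library(kernelshap)
library(shapviz)

# Random forest on iris, tuning mtry by out-of-bag error
fit <- train(
  Sepal.Length ~ ., 
  data = iris, 
  method = "rf", 
  tuneGrid = data.frame(mtry = 2:4),
  trControl = trainControl(method = "oob")
)

# X: rows to explain; bg_X: background data for the reference prediction
# (take a subsample as bg_X if the data has more than ~500 rows)
s <- kernelshap(fit, X = iris[, -1], bg_X = iris) 
sv <- shapviz(s)
sv_importance(sv, kind = "bee")                # beeswarm summary plot
sv_dependence(sv, v = colnames(iris[, -1]))    # one dependence plot per feature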

head(s$S)
     Sepal.Width Petal.Length Petal.Width     Species
[1,]  0.18710551   -0.7689923 -0.11966640 -0.02138098
[2,] -0.04975942   -0.8421627 -0.16929579 -0.02247297
[3,] -0.05134404   -0.9807516 -0.21007903 -0.02603232
[4,] -0.01474815   -0.8314441 -0.18571834 -0.02234505
[5,]  0.16345002   -0.8066228 -0.13735372 -0.02104766
[6,]  0.27269103   -0.6231013 -0.06449333 -0.01560054

[Images: SHAP beeswarm importance plot and SHAP dependence plots]
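Regarding the second part of the question (exporting a data frame of SHAP values per variable): a minimal sketch, assuming current CRAN versions of caret, randomForest and kernelshap. The SHAP values live in the numeric matrix s$S, and, as far as I know, s$baseline holds the average prediction on the background data, so each row of SHAP values plus the baseline should add up to that row's prediction. The single fit via trainControl(method = "none"), the 50-row background subsample, the shap_ column prefix and the file name shap_values.csv are illustrative choices, not part of the answer above.

```r
library(caret)
library(kernelshap)

set.seed(1)
# A single random forest fit (no resampling) to keep the example quick
fit <- train(
  Sepal.Length ~ .,
  data = iris,
  method = "rf",
  tuneGrid = data.frame(mtry = 2),
  trControl = trainControl(method = "none")
)

X  <- iris[1:20, -1]                    # rows to explain
bg <- iris[sample(nrow(iris), 50), -1]  # background subsample (smaller = faster)
s  <- kernelshap(fit, X = X, bg_X = bg, verbose = FALSE)

# s is a list; the SHAP values themselves are the numeric matrix s$S
shap_df <- as.data.frame(s$S)
names(shap_df) <- paste0("shap_", names(shap_df))

# One row per explained observation: feature values next to their SHAP values
out <- cbind(X, shap_df)
write.csv(out, file.path(tempdir(), "shap_values.csv"), row.names = FALSE)
head(out)
```

Prefixing the SHAP columns avoids duplicate column names when they sit next to the original features in one data frame.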

Comments

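1 vote · Mustafa Kamal · 9/5/2023
1) Why do I get a list instead of a data frame containing the SHAP values for each variable? 2) Why does the code take so long to run? 3) Important: when I installed the libraries, my R installation broke, and other libraries reported errors when loaded. I had to uninstall and reinstall R and RStudio to get the libraries working again. Thanks for the code!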
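0 votes · Michael M · 9/5/2023
No idea what caused that problem. The SHAP values are stored as a numeric matrix (s$S in the code above) inside a list that also contains other information. There are many ways to speed things up, but that requires a working example on your side.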
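On why the computation is slow: a Shapley value compares the model's average prediction over the background data for every possible coalition of "known" features. The base R sketch below uses a hypothetical helper, exact_shap(), which is a brute-force illustration and not kernelshap's algorithm (kernelshap solves a weighted regression instead). It enumerates all 2^p coalitions directly, so explaining one row costs 2^p model calls on nrow(bg) rows each. With the 12 predictors in the question's example data, that is already 4096 × nrow(bg) predictions per explained row. As far as I know, kernelshap only enumerates all coalitions for a small number of features (by default up to 8) and samples coalitions otherwise, so shrinking bg_X and X is the main lever for speed. For an additive linear model the result reduces to coefficient × (feature value − background mean), which makes the sketch easy to check.

```r
# Brute-force (exact) interventional SHAP values for a single row `x`.
# Hypothetical helper for illustration; kernelshap does NOT work this way internally.
exact_shap <- function(predict_fn, x, bg) {
  p <- ncol(bg)
  n_coal <- 2^p
  # v[m + 1]: average prediction over the background when the features in
  # coalition m (encoded as a bit mask) are set to their values in x
  v <- numeric(n_coal)
  for (m in 0:(n_coal - 1)) {
    Z <- bg
    for (j in seq_len(p)) {
      if (bitwAnd(m, 2^(j - 1)) > 0) Z[[j]] <- rep(x[[j]], nrow(bg))
    }
    v[m + 1] <- mean(predict_fn(Z))  # one model call on nrow(bg) rows
  }
  # Weighted marginal contributions, weight |S|! (p - |S| - 1)! / p!
  phi <- numeric(p)
  for (j in seq_len(p)) {
    bit <- 2^(j - 1)
    for (m in 0:(n_coal - 1)) {
      if (bitwAnd(m, bit) == 0) {
        s <- sum(bitwAnd(m, 2^(0:(p - 1))) > 0)  # size of coalition S
        w <- factorial(s) * factorial(p - s - 1) / factorial(p)
        phi[j] <- phi[j] + w * (v[m + bit + 1] - v[m + 1])
      }
    }
  }
  setNames(phi, names(bg))
}

# Usage on a small linear model (3 features -> 2^3 = 8 model calls per row)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
bg  <- mtcars[, c("wt", "hp", "qsec")]
x   <- bg[1, ]
phi <- exact_shap(function(d) predict(fit, newdata = d), x, bg)
phi
sum(phi)  # equals predict(fit, x) - mean(predict(fit, bg))
```

Doubling the number of features doubles the exponent, not the cost, which is why exact enumeration stops being practical well before 61 predictors.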