Asked by: Mustafa Kamal. Asked: 9/2/2023. Last edited by: Mustafa Kamal. Updated: 9/3/2023. Views: 71
How to get SHAP values for caret models in R?
Q:
I am trying to get SHAP values for my model, which I built with caret. I have a random forest (RF) model, and the data are:
data = structure(list(Main_Street = structure(c(2L, 3L, 2L, 1L, 3L,
2L, 3L, 1L, 2L, 2L), .Label = c("64", "70", "270"), class = "factor"),
Blocked_Lanes = c(3L, 4L, 2L, 1L, 1L, 2L, 6L, 3L, 3L, 3L),
Total_Vehicle_Count = c(1L, 2L, 2L, 2L, 1L, 4L, 3L, 2L, 2L,
1L), Tractor_Trailer_Count = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), Weather_Winter_Storm = structure(c(1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("No", "Yes"), class = "factor"),
Weather_Rain = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L), .Label = c("No", "Yes"), class = "factor"), Injuries_Count = c(0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L), Accident_Overturned_Car = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("No", "Yes"
), class = "factor"), Fatalities_Count = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), Speed = c(65L, 46L, 10L, 42L, 40L,
21L, 15L, 57L, 59L, 59L), Total_Volume = c(48.7, 22.5, 47.3,
102, 138, 75.3, 60.5, 83.3, 18, 26.7), Occupancy = c(3.5,
1.7, 40.8, 23.8, 14.1, 31, 27.1, 4.9, 2.6, 2.5), Lanes_Cleared_Duration = c(53L,
35L, 32L, 4L, 11L, 35L, 42L, 12L, 36L, 69L)), row.names = c(NA,
-10L), class = "data.frame")
The RF model is:
fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 10)
set.seed(2356)
randomforestGrid <- expand.grid(mtry = c(2:sqrt(61))) # better be a dataframe
set.seed(2356)
rf_model <- train(Lanes_Cleared_Duration ~ .,
                  data = training,
                  method = "rf",
                  trControl = fitControl,
                  metric = "RMSE",
                  verbose = FALSE,
                  tuneGrid = randomforestGrid,
                  n.trees = c(1:50)*100)
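There are many sources on how to make SHAP plots, but none of them worked for my data, and I keep getting errors. For example, this post asks a similar question, but it was never resolved. This is the kind of plot I would like to get:
Is it also possible to export a data frame containing the SHAP values for each variable?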
A:
2 upvotes
Michael M
9/3/2023
#1
Here is an example, slightly modified from our {kernelshap} README:
library(caret)
library(kernelshap)
library(shapviz)
fit <- train(
  Sepal.Length ~ .,
  data = iris,
  method = "rf",
  tuneGrid = data.frame(mtry = 2:4),
  trControl = trainControl(method = "oob")
)

# take subsample as bg_X if data has >500 rows or so
s <- kernelshap(fit, X = iris[, -1], bg_X = iris)
sv <- shapviz(s)
sv_importance(sv, kind = "bee")              # beeswarm summary plot
sv_dependence(sv, v = colnames(iris[, -1]))  # one dependence plot per feature

# SHAP values are stored as a numeric matrix in s$S
head(s$S)
     Sepal.Width Petal.Length Petal.Width     Species
[1,]  0.18710551   -0.7689923 -0.11966640 -0.02138098
[2,] -0.04975942   -0.8421627 -0.16929579 -0.02247297
[3,] -0.05134404   -0.9807516 -0.21007903 -0.02603232
[4,] -0.01474815   -0.8314441 -0.18571834 -0.02234505
[5,]  0.16345002   -0.8066228 -0.13735372 -0.02104766
[6,]  0.27269103   -0.6231013 -0.06449333 -0.01560054
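As a rough illustration of the `bg_X` comment in the code above, here is a minimal sketch of drawing a background subsample with base R. The sample size of 50 and the seed are arbitrary choices for illustration, not values from the README, and `iris` has only 150 rows, so subsampling is not actually needed there. The `kernelshap()` call is left commented out so the snippet runs without the package installed.

```r
# Draw a random subset of rows to use as background data.
set.seed(1)                         # arbitrary seed, only for reproducibility
n_bg <- 50                          # assumed background size; adjust for your data
bg_idx <- sample(nrow(iris), n_bg)  # row indices, sampled without replacement
bg_X <- iris[bg_idx, ]

dim(bg_X)

# The subsample would then replace the full data in the call from the answer:
# s <- kernelshap(fit, X = iris[, -1], bg_X = bg_X)
```

A smaller background set generally means fewer model predictions per explained row, which is usually where most of the run time goes.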
Comments
1 upvote
Mustafa Kamal
9/5/2023
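1. Why do I get a list instead of a data frame containing the SHAP values for each variable? 2. Why does the code take so long to run? 3. Important note: when I installed the libraries, my R installation broke, and other libraries reported errors when loaded. I had to uninstall and reinstall R and RStudio to get the libraries working. Thanks for the code!
0 upvotes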
Michael M
9/5/2023
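I don't know what caused that problem. The SHAP values are stored as a numeric matrix inside a list that also contains other things. There are many ways to make things faster, but that would require a working example from your side.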
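To make the matrix-in-a-list point concrete, here is a minimal sketch of turning the SHAP matrix into a data frame and exporting it. To keep it runnable without fitting a model, `S` below is a stand-in built from the first three rows of the `head(s$S)` output in the answer; with a real result you would use `s$S` directly. The file name is hypothetical.

```r
# Stand-in for s$S: first three rows of the output shown in the answer.
S <- matrix(
  c( 0.18710551, -0.7689923, -0.11966640, -0.02138098,
    -0.04975942, -0.8421627, -0.16929579, -0.02247297,
    -0.05134404, -0.9807516, -0.21007903, -0.02603232),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("Sepal.Width", "Petal.Length", "Petal.Width", "Species"))
)

# A numeric matrix converts directly to a data frame, one column per feature.
shap_df <- as.data.frame(S)
str(shap_df)

# Export to CSV (hypothetical file name, written to a temporary directory).
out_file <- file.path(tempdir(), "shap_values.csv")
write.csv(shap_df, out_file, row.names = FALSE)

# Mean absolute SHAP value per feature, a common global importance summary.
importance <- sort(colMeans(abs(S)), decreasing = TRUE)
importance
```

Each row of `shap_df` corresponds to one explained observation, so it can be bound column-wise to the original features if both are needed in one table.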