Asked by: Leonhard Geisler · Asked: 9/15/2021 · Last edited by: cazman, Leonhard Geisler · Updated: 9/15/2021 · Views: 137
How to handle forecast data (melt and "unmelt") generated by modeltime prediction - lost variables
Q:
Below I create some fake data to forecast with the tidyverse modeltime package. I have monthly data starting in 2016 and want to produce a test forecast for 2020. As you can see, the data I load is in wide format. For use with modeltime I convert it to long format. After the modeling stage I want to build a data frame of the 2020 predicted values, which means I somehow have to "unmelt" the data again. Unfortunately, I lose a lot of variables in the process: of the 240 variables I want to forecast, only 49 remain in the final result. Maybe I'm blind, or I don't know how to configure the modeltime functions correctly. I would really appreciate some help. Thanks in advance!
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(modeltime))
## create some senseless data to produce forecasts on...
dates <- ymd("2016-01-01")+ months(0:59)
fake_values <-
c(661,678,1094,1987,3310,2105,1452,983,1107,805,675,684,436,514,668,206,19,23,365,456,1174,1760,735,366,
510,580,939,1127,2397,1514,1370,832,765,661,497,328,566,631,983,1876,2784,2928,2543,1508,1175,8,1733,
862,779,1112,1446,2407,3917,2681,2397,1246,1125,1223,1234,1239,
661,678,1094,1987,3310,2105,1452,983,1107,805,675,684,436,514,668,206,19,23,365,456,1174,1760,735,366,
510,580,939,1127,2397,1514,1370,832,765,661,497,328,566,631,983,1876,2784,2928,2543,1508,1175,8,1733,
862,779,1112,1446,2407,3917,2681,2397,1246,1125,1223,1234,1239,
661,678,1094,1987,3310,2105,1452,983,1107,805,675,684,436,514,668,206,19,23,365,456,1174,1760,735,366,
510,580,939,1127,2397,1514,1370,832,765,661,497,328,566,631,983,1876,2784,2928,2543,1508,1175,8,1733,
862,779,1112,1446,2407,3917,2681,2397,1246,1125,1223,1234,1239,
661,678,1094,1987,3310,2105,1452,983,1107,805,675,684,436,514,668,206,19,23,365,456,1174,1760,735,366,
510,580,939,1127,2397,1514,1370,832,765,661,497,328,566,631,983,1876,2784,2928,2543,1508,1175,8,1733,
862,779,1112,1446,2407,3917,2681,2397,1246,1125,1223,1234,1239)
replicate <- rep(1,60) %*% t.default(fake_values)
replicate <- as.data.frame(replicate)
df <- bind_cols(replicate, dates) %>%
rename(c(dates = ...241))
## melt it down
data <- reshape2::melt(df, id.var='dates')
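As an aside, reshape2 is superseded; since library(tidyverse) is already attached, the same long format can be produced with tidyr::pivot_longer. A minimal sketch with a toy stand-in for the wide df above:

```r
library(tidyr)

# Toy stand-in for the wide df above: value columns plus a dates id column.
df <- data.frame(V1 = c(661, 678), V2 = c(510, 580),
                 dates = as.Date(c("2016-01-01", "2016-02-01")))

# Equivalent of reshape2::melt(df, id.var = 'dates'):
data <- pivot_longer(df, cols = -dates,
                     names_to  = "variable",
                     values_to = "value")
# One row per (dates, variable) pair; columns: dates, variable, value
```

One difference to watch: melt() returns `variable` as a factor while pivot_longer() returns character, so any later `as.numeric(variable)` filtering would need an explicit conversion (e.g. `names_transform = list(variable = as.factor)`).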
## make some senseless forecast on senseless data...
split_obj <- initial_time_split(data, prop = 0.8)
model_fit_prophet <- prophet_reg() %>%
set_engine(engine = "prophet") %>%
fit(value ~ dates, data = training(split_obj))
## model table
models_tbl_prophet <- modeltime_table(model_fit_prophet)
## calibration
calibration_tbl_prophet <- models_tbl_prophet %>%
modeltime_calibrate(new_data = testing(split_obj))
## forecast
fc_prophet <- calibration_tbl_prophet %>%
modeltime_forecast(
new_data = testing(split_obj),
actual_data = data,
keep_data = TRUE
)
## "unmelt" that bastard again
fc_prophet <- fc_prophet %>% filter(str_detect(.key, "prediction"))
fc_prophet <- fc_prophet[,c(4,9,10)]
fc_prophet <- dplyr::filter(fc_prophet, .index >= "2020-01-01", .index <= "2020-12-01")
#fc_prophet <- fc_prophet %>% subset(fc_prophet, as.character(.index) >"2020-01-01" & as.character(.index)< "2020-12-01" )
fc_wide_prophet <- fc_prophet %>%
pivot_wider(names_from = variable, values_from = value)
A:
1 upvote
Matt Dancho
9/15/2021
#1
Here's my full solution. I've also provided background on what I did here: https://github.com/business-science/modeltime/issues/133
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(modeltime))
library(timetk)
## create some senseless data to produce forecasts on...
dates <- ymd("2016-01-01")+ months(0:59)
fake_values <-
c(661,678,1094,1987,3310,2105,1452,983,1107,805,675,684,436,514,668,206,19,23,365,456,1174,1760,735,366,
510,580,939,1127,2397,1514,1370,832,765,661,497,328,566,631,983,1876,2784,2928,2543,1508,1175,8,1733,
862,779,1112,1446,2407,3917,2681,2397,1246,1125,1223,1234,1239,
661,678,1094,1987,3310,2105,1452,983,1107,805,675,684,436,514,668,206,19,23,365,456,1174,1760,735,366,
510,580,939,1127,2397,1514,1370,832,765,661,497,328,566,631,983,1876,2784,2928,2543,1508,1175,8,1733,
862,779,1112,1446,2407,3917,2681,2397,1246,1125,1223,1234,1239,
661,678,1094,1987,3310,2105,1452,983,1107,805,675,684,436,514,668,206,19,23,365,456,1174,1760,735,366,
510,580,939,1127,2397,1514,1370,832,765,661,497,328,566,631,983,1876,2784,2928,2543,1508,1175,8,1733,
862,779,1112,1446,2407,3917,2681,2397,1246,1125,1223,1234,1239,
661,678,1094,1987,3310,2105,1452,983,1107,805,675,684,436,514,668,206,19,23,365,456,1174,1760,735,366,
510,580,939,1127,2397,1514,1370,832,765,661,497,328,566,631,983,1876,2784,2928,2543,1508,1175,8,1733,
862,779,1112,1446,2407,3917,2681,2397,1246,1125,1223,1234,1239)
replicate <- rep(1,60) %*% t.default(fake_values)
replicate <- as.data.frame(replicate)
df <- bind_cols(replicate, dates) %>%
rename(c(dates = ...241))
## melt it down
data <- reshape2::melt(df, id.var='dates')
data %>% as_tibble() -> data
data %>%
filter(as.numeric(variable) %in% 1:9) %>%
group_by(variable) %>%
plot_time_series(dates, value, .facet_ncol = 3, .smooth = F)
## make some senseless forecast on senseless data...
split_obj <- initial_time_split(data, prop = 0.8)
split_obj %>%
tk_time_series_cv_plan() %>%
plot_time_series_cv_plan(dates, value)
split_obj_2 <- time_series_split(data, assess = "1 year", cumulative = TRUE)
split_obj_2 %>%
tk_time_series_cv_plan() %>%
plot_time_series_cv_plan(dates, value)
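The row-based initial_time_split() appears to be why variables go missing in the question: with 240 series stacked in long format and ordered by series, the last 20% of rows covers only the tail end of the variable list, so the test set never sees the others. A minimal base-R illustration (no modeltime needed; the numbers mirror the fake data above):

```r
# 240 series of 60 months, stacked long and ordered by series,
# as reshape2::melt() produces above.
long <- data.frame(
  variable = rep(seq_len(240), each = 60),
  month    = rep(seq_len(60), times = 240)
)

# What a row-based 80/20 split (initial_time_split, prop = 0.8)
# would leave for testing: the last 20% of rows.
test_rows <- tail(long, n = nrow(long) * 0.2)

length(unique(test_rows$variable))  # only 48 of the 240 series survive
```

That matches the roughly 49 variables the question reports, and is why the answer switches to time_series_split(assess = "1 year"), which holds out the last year of dates for every series instead of the last rows overall.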
model_fit_prophet <- prophet_reg() %>%
set_engine(engine = "prophet") %>%
fit(value ~ dates, data = training(split_obj))
## model table
models_tbl_prophet <- modeltime_table(model_fit_prophet)
## calibration
calibration_tbl_prophet <- models_tbl_prophet %>%
modeltime_calibrate(new_data = testing(split_obj_2))
## forecast
fc_prophet <- calibration_tbl_prophet %>%
modeltime_forecast(
new_data = testing(split_obj_2),
actual_data = data,
keep_data = TRUE
)
fc_prophet %>%
filter(as.numeric(variable) %in% 1:9) %>%
group_by(variable) %>%
plot_modeltime_forecast(.facet_ncol = 3)
## "unmelt" that bastard again
# fc_prophet <- fc_prophet %>% filter(str_detect(.key, "prediction"))
# fc_prophet <- fc_prophet[,c(4,9,10)]
# fc_prophet <- dplyr::filter(fc_prophet, .index >= "2020-01-01", .index <= "2020-12-01")
# #fc_prophet <- fc_prophet %>% subset(fc_prophet, as.character(.index) >"2020-01-01" & as.character(.index)< "2020-12-01" )
#
# fc_wide_prophet <- fc_prophet %>%
# pivot_wider(names_from = variable, values_from = value)
# Make a future forecast
refit_tbl_prophet <- calibration_tbl_prophet %>%
modeltime_refit(data = data)
future_fc_prophet <- refit_tbl_prophet %>%
modeltime_forecast(
new_data = data %>% group_by(variable) %>% future_frame(.length_out = "1 year"),
actual_data = data,
keep_data = TRUE
)
future_fc_prophet %>%
filter(as.numeric(variable) %in% 1:9) %>%
group_by(variable) %>%
plot_modeltime_forecast(.facet_ncol = 3)
# Reformat as wide
future_wide_tbl <- future_fc_prophet %>%
filter(.key == "prediction") %>%
select(.model_id, .model_desc, dates, variable, .value) %>%
pivot_wider(
id_cols = c(.model_id, .model_desc, dates),
names_from = variable,
values_from = .value
)
future_wide_tbl[names(df)]
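The id_cols argument to pivot_wider above makes explicit which columns identify an output row; leftover columns not listed there are silently dropped rather than multiplying the rows. A toy sketch with made-up data whose column names mirror the forecast output:

```r
library(tidyr)

# Hypothetical miniature of the forecast output: one row per
# (date, variable) plus model metadata columns.
fc <- data.frame(
  .model_id   = 1,
  .model_desc = "PROPHET",
  dates    = rep(c("2020-01-01", "2020-02-01"), times = 2),
  variable = rep(c("V1", "V2"), each = 2),
  .value   = c(10, 20, 30, 40)
)

wide <- pivot_wider(fc,
                    id_cols     = c(.model_id, .model_desc, dates),
                    names_from  = variable,
                    values_from = .value)
# Two rows (one per date), with V1 and V2 as columns.
```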
Comments
0 upvotes
Leonhard Geisler
9/16/2021
Thank you very much for the quick reply! I just implemented the solution with real data. Checking the results, I now get all the columns, which is great. Nevertheless, I get the exact same forecast for every variable (in-sample and out-of-sample). You can't observe this behavior in my example above, because I simply replicated the same variable 240 times. I could send you some example code with random data to illustrate this. Regards
1 upvote
Matt Dancho
9/16/2021
Yes, you're getting poor forecasts because you're using Prophet. Try xgboost with time series signature features. It's extremely fast for forecasting 1000+ time series with a single global model. I teach this in my Modeltime course. university.business-science.io/p/......
0 upvotes
Leonhard Geisler
9/16/2021
OK, I'll try some different models such as xgboost. Thanks again.
0 upvotes
Leonhard Geisler
11/3/2021
Dear Matt, thanks for the advice to switch from local models to a global model (it works well). However, I've only tested time series that all start on the same date. What if I have 200 time series with different observation start dates? If I forecast them locally there's no need to cluster them into blocks of equal-length series, since I can start each forecast at that series' exact start date. My guess is that mixing series of different lengths into one global model won't give satisfactory results, and that I'd need to do a lot of clustering by length. Am I on the right track?