How can I create a GAM that has two date scales that interact with a factor?

Asked by: adkane · Asked: 11/15/2023 · Updated: 11/15/2023 · Views: 15

Q:

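I want to model the relationship between my y variable and time. I suspect there is a seasonal effect in summer, and that an overall trend may show up across years. I would also like to explore an interaction with country. I'm not sure how to write this in a GAM. My idea is to have two separate interactions with time. For a single effect, I would do the following: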

 library(mgcv)
 gam(y ~ s(nmonth, Country, bs = "fs"), data = mydata,
     method = "REML")

Can I just tack on another interaction like this?

 gam(y ~ s(nmonth, Country, bs = "fs") + s(nyear, Country, bs = "fs"),
     data = mydata, method = "REML")

Here is some sample data, meant to show the structure rather than to fit the model:

data = structure(list(nmonth = c(12, 9, 4, 4, 3, 1, 1, 11, 9, 8, 8, 
8, 8, 8, 7, 7, 5, 5, 5, 4, 3, 1, 12, 1, 7, 6, 6, 5, 12, 12, 12, 
11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 7, 7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7), nyear = c(2008, 2011, 
2012, 2012, 2011, 2011, 2009, 2020, 2021, 2021, 2020, 2019, 2014, 
2014, 2017, 2014, 2020, 2014, 2010, 2022, 2016, 2012, 2010, 2010, 
2007, 2007, 2007, 2007, 2020, 2016, 2016, 2022, 2021, 2020, 2019, 
2014, 2022, 2021, 2020, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 
2018, 2018, 2018, 2018, 2013, 2013, 2013, 2013, 2013, 2009, 2021, 
2021, 2016, 2016, 2014, 2014, 2014, 2014, 2014, 2021, 2021, 2021, 
2020, 2018, 2017, 2017, 2016, 2016, 2016, 2016, 2016, 2015, 2015, 
2015, 2011, 2021, 2021, 2021, 2020, 2020, 2020, 2019, 2019, 2019, 
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018), 
    Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), levels = c("England", "Ireland", "Northern Ireland", 
    "Scotland", "Wales"), class = "factor"), y= c(-1, 
    -8, 0, -1, -15, -13, 6, -39, 2, -1, 4, 1, -15, -17, -6, -9, 
    2, -14, -2, -2, -2, -6, 1, -4, -1, -3, 4, -11, 8, 9, -7, 
    2, 0, 10, -12, NA, 6, 0, -36, -7, 0, -26, -9, -6, -2, -1, 
    4, -4, 11, 4, 4, -2, 3, 3, 8, 9, -3, 7, 12, 7, 5, 2, 0, -2, 
    1, -3, -21, 2, -8, 2, 3, -1, -8, NA, -8, -20, -14, -14, -10, 
    -19, -37, -3, -8, -4, 3, -23, 12, -8, -14, -4, -17, -18, 
    -15, -9, -3, -4, -5, 5, -8, -4)), row.names = c(NA, 100L), class = "data.frame")
Tags: r, time-series, interaction, gam

A:

1 upvote · Gavin Simpson · 12/3/2023 · #1

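Assuming `Country` is coded as a factor, you can do this, but there are several options. The model you chose implies that the smooths of month and of year each have the same wiggliness across all countries. The estimated shapes of the functions can differ between countries, but the wiggliness of each function is the same.

Other options allow the wiggliness to vary between countries, and there are a couple of choices.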

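  1. Standard factor `by` smooths:

    y ~ Country + s(nmonth, by = Country) + s(nyear, by = Country)
    
  2. Ordered factor `by` smooths:

    df <- df |> transform(oCountry = ordered(Country))
    contrasts(df$oCountry) <- "contr.treatment"
    # mgcv fits no `by` smooth for the first level of an ordered factor,
    # so s(nmonth) and s(nyear) supply the reference-level smooths
    y ~ oCountry + s(nmonth) + s(nmonth, by = oCountry) +
      s(nyear) + s(nyear, by = oCountry)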
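
As a rough illustration of the difference between the shared-wiggliness `fs` model and factor `by` smooths, the sketch below fits both to simulated data and compares how many smoothing parameters each estimates. The data frame `sim` and the simulated response are hypothetical and exist only for this example; they are not the asker's data.

```r
library(mgcv)

# Hypothetical simulated data: 5 countries x 16 years x 12 months
set.seed(1)
sim <- expand.grid(
  nmonth  = 1:12,
  nyear   = 2007:2022,
  Country = factor(c("England", "Ireland", "Northern Ireland",
                     "Scotland", "Wales"))
)
sim$y <- sin(2 * pi * sim$nmonth / 12) * as.integer(sim$Country) +
  0.2 * (sim$nyear - 2007) + rnorm(nrow(sim))

# Random factor smooths: smoothing parameters are shared across countries
m_fs <- gam(y ~ s(nmonth, Country, bs = "fs") + s(nyear, Country, bs = "fs"),
            data = sim, method = "REML")

# Factor by smooths: a separate smoothing parameter per country and term
m_by <- gam(y ~ Country + s(nmonth, by = Country) + s(nyear, by = Country),
            data = sim, method = "REML")

length(m_fs$sp)  # does not grow with the number of countries
length(m_by$sp)  # one per country for each of the two terms
```

Comparing `m_fs$sp` with `m_by$sp` shows where the extra flexibility of the `by` form comes from: each country gets its own penalty strength for each time variable.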
    

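These are just interactions between a single continuous time variable and the factor. It is quite possible, especially if you have a long enough record in terms of years sampled, that the seasonal pattern changes with the long-term trend. That would require a tensor product interaction between the two continuous variables, which in turn interacts with the `Country` factor. For example: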

gam(y ~ Country +
  s(nmonth, by = Country) +
  s(nyear, by = Country) +
  ti(nmonth, nyear, by = Country), ...)

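Use `oCountry` in place of `Country` for the ordered-factor version of the model I mentioned. Or, if you don't need to decompose the effects like that, this is one way: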

gam(y ~ Country + te(nmonth, nyear, by = Country), ...)
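
A runnable sketch of the two tensor product forms above, fitted to simulated data in which the seasonal amplitude drifts over the years. The data frame `sim`, the simulated signal, and the use of default basis sizes are assumptions made for illustration only.

```r
library(mgcv)

# Hypothetical simulated data: seasonal amplitude changes with year,
# and the rate of change differs by country
set.seed(2)
sim <- expand.grid(
  nmonth  = 1:12,
  nyear   = 2007:2022,
  Country = factor(c("England", "Ireland", "Northern Ireland",
                     "Scotland", "Wales"))
)
amp <- 1 + 0.1 * (sim$nyear - 2007) * as.integer(sim$Country) / 5
sim$y <- amp * sin(2 * pi * sim$nmonth / 12) + rnorm(nrow(sim), sd = 0.5)

# Decomposed form: main-effect smooths plus a pure interaction term
m_ti <- gam(y ~ Country +
              s(nmonth, by = Country) +
              s(nyear, by = Country) +
              ti(nmonth, nyear, by = Country),
            data = sim, method = "REML")

# Single tensor product surface per country
m_te <- gam(y ~ Country + te(nmonth, nyear, by = Country),
            data = sim, method = "REML")

summary(m_ti)$s.table  # one row per smooth, including each ti() term
AIC(m_ti, m_te)
```

The `ti()` rows of the summary table give a rough sense of whether the month-by-year interaction is needed for each country; the `te()` form folds main effects and interaction into one surface per country.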

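The equivalent of the `fs` smooth in the question is slightly trickier, because the `fs` basis only works for univariate continuous smooths. Instead, this form of `t2()` tensor product construction should be equivalent in the univariate case, and in the bivariate case it provides random surfaces in the same way that `fs` provides random smooths: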

gam(y ~ t2(nmonth, nyear, Country, bs = c("cr", "cr", "re"), full = TRUE), ...)
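
The `t2()` call above can be tried on simulated data as follows. This is only a sketch: the data frame `sim` and the simulated response are hypothetical, and the marginal basis sizes are left at their defaults.

```r
library(mgcv)

# Hypothetical simulated data: 5 countries x 16 years x 12 months
set.seed(3)
sim <- expand.grid(
  nmonth  = 1:12,
  nyear   = 2007:2022,
  Country = factor(c("England", "Ireland", "Northern Ireland",
                     "Scotland", "Wales"))
)
sim$y <- sin(2 * pi * sim$nmonth / 12) * (1 + 0.05 * (sim$nyear - 2007)) +
  0.3 * as.integer(sim$Country) + rnorm(nrow(sim), sd = 0.5)

# Random month-by-year surfaces per country: two "cr" margins for the
# continuous variables and an "re" margin for the factor
m_t2 <- gam(y ~ t2(nmonth, nyear, Country,
                   bs = c("cr", "cr", "re"), full = TRUE),
            data = sim, method = "REML")

summary(m_t2)
```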

You should set `k` on all of the smooths unless the defaults are sufficient. I haven't set `k` here for the sake of clarity.
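
A minimal sketch of setting `k` explicitly and then checking whether the basis sizes look adequate with `k.check()` from mgcv. The values `k = 8` and `k = 12` are arbitrary choices for this illustration, and `sim` is hypothetical simulated data; `k` cannot exceed the number of unique covariate values (12 months, 16 years here).

```r
library(mgcv)

# Hypothetical simulated data: 5 countries x 16 years x 12 months
set.seed(4)
sim <- expand.grid(
  nmonth  = 1:12,
  nyear   = 2007:2022,
  Country = factor(c("England", "Ireland", "Northern Ireland",
                     "Scotland", "Wales"))
)
sim$y <- sin(2 * pi * sim$nmonth / 12) * as.integer(sim$Country) +
  0.2 * (sim$nyear - 2007) + rnorm(nrow(sim))

# k values below are illustrative assumptions, not recommendations
m <- gam(y ~ Country +
           s(nmonth, by = Country, k = 8) +
           s(nyear, by = Country, k = 12),
         data = sim, method = "REML")

# One row per smooth: basis dimension k', edf, k-index and p-value
kc <- k.check(m)
kc
```

A low p-value together with an edf close to `k'` for a given smooth is the usual hint that `k` may be too small for that term.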