Asked by: adkane  Asked: 11/15/2023  Updated: 11/15/2023  Views: 15
How can I create a GAM that has two date scales that interact with a factor?
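Q:
I want to model the relationship between my y variable and time. I suspect there is a seasonal effect in summer, and that the overall trend may show up from year to year. I would also like to explore an interaction with country. I'm not sure how to write this in a GAM. My idea is to have two separate interactions with time. For a single effect I would do the following: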
gam(y~ s(nmonth,Country, bs = "fs"), data = mydata,
method = "REML")
Can I just tack on another interaction like this?
gam(y~ s(nmonth,Country, bs = "fs") + s(nyear,Country, bs = "fs"), data = mydata,
method = "REML")
Here is some example data, to show the structure rather than to fit the model to:
data = structure(list(nmonth = c(12, 9, 4, 4, 3, 1, 1, 11, 9, 8, 8,
8, 8, 8, 7, 7, 5, 5, 5, 4, 3, 1, 12, 1, 7, 6, 6, 5, 12, 12, 12,
11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7), nyear = c(2008, 2011,
2012, 2012, 2011, 2011, 2009, 2020, 2021, 2021, 2020, 2019, 2014,
2014, 2017, 2014, 2020, 2014, 2010, 2022, 2016, 2012, 2010, 2010,
2007, 2007, 2007, 2007, 2020, 2016, 2016, 2022, 2021, 2020, 2019,
2014, 2022, 2021, 2020, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2018, 2018, 2018, 2018, 2013, 2013, 2013, 2013, 2013, 2009, 2021,
2021, 2016, 2016, 2014, 2014, 2014, 2014, 2014, 2021, 2021, 2021,
2020, 2018, 2017, 2017, 2016, 2016, 2016, 2016, 2016, 2015, 2015,
2015, 2011, 2021, 2021, 2021, 2020, 2020, 2020, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018),
Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), levels = c("England", "Ireland", "Northern Ireland",
"Scotland", "Wales"), class = "factor"), y= c(-1,
-8, 0, -1, -15, -13, 6, -39, 2, -1, 4, 1, -15, -17, -6, -9,
2, -14, -2, -2, -2, -6, 1, -4, -1, -3, 4, -11, 8, 9, -7,
2, 0, 10, -12, NA, 6, 0, -36, -7, 0, -26, -9, -6, -2, -1,
4, -4, 11, 4, 4, -2, 3, 3, 8, 9, -3, 7, 12, 7, 5, 2, 0, -2,
1, -3, -21, 2, -8, 2, 3, -1, -8, NA, -8, -20, -14, -14, -10,
-19, -37, -3, -8, -4, 3, -23, 12, -8, -14, -4, -17, -18,
-15, -9, -3, -4, -5, 5, -8, -4)), row.names = c(NA, 100L), class = "data.frame")
A:
1 vote
Gavin Simpson
12/3/2023
#1
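Assuming `Country` is coded as a factor, you can do that, but there are several options. The model you chose implies that the smooths of month and of year each have the same wiggliness across all countries. The estimated shape of the functions can differ between countries, but the wiggliness of each function is the same.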
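To make the shared-wiggliness point concrete, here is a minimal sketch of the model from the question fitted to simulated data. The `sim` data frame and its signal are assumptions made up for illustration (a balanced grid of 5 countries, 16 years and 12 months), not the asker's data.

```r
library(mgcv)

set.seed(42)
# Simulated stand-in data (an assumption): one row per country x year x month
sim <- expand.grid(
  nmonth  = 1:12,
  nyear   = 2007:2022,
  Country = factor(c("England", "Ireland", "Northern Ireland", "Scotland", "Wales"))
)
# Seasonal cycle whose amplitude differs by country, plus a common linear trend
sim$y <- as.integer(sim$Country) * sin(2 * pi * sim$nmonth / 12) +
  0.2 * (sim$nyear - 2007) + rnorm(nrow(sim))

# The model from the question: one "fs" term per time variable
m_fs <- gam(y ~ s(nmonth, Country, bs = "fs") + s(nyear, Country, bs = "fs"),
            data = sim, method = "REML")

# Smoothing parameters belong to each fs term as a whole, not to single countries
m_fs$sp
summary(m_fs)
```

Looking at `m_fs$sp` shows that the smoothing parameters are attached to the two `fs` terms rather than to individual countries, which is what "same wiggliness" means here.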
Other options allow the wiggliness to vary between countries, and again there are several choices.
Standard factor `by` smooths:
y ~ Country + s(nmonth, by = Country) + s(nyear, by = Country)
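As a rough illustration of the factor `by` formula above, the sketch below fits it to simulated data. The `sim` data frame is an assumption made up for this example, not the asker's data.

```r
library(mgcv)

set.seed(42)
# Simulated stand-in data (an assumption): one row per country x year x month
sim <- expand.grid(
  nmonth  = 1:12,
  nyear   = 2007:2022,
  Country = factor(c("England", "Ireland", "Northern Ireland", "Scotland", "Wales"))
)
sim$y <- as.integer(sim$Country) * sin(2 * pi * sim$nmonth / 12) +
  0.2 * (sim$nyear - 2007) + rnorm(nrow(sim))

# A separate smooth, with its own smoothing parameter, for every country
m_by <- gam(y ~ Country + s(nmonth, by = Country) + s(nyear, by = Country),
            data = sim, method = "REML")

length(m_by$smooth)  # number of smooths: one per country per time variable
m_by$sp              # one smoothing parameter per smooth
```

The parametric `Country` term is needed because each `by` smooth is centred, so the country means have to come from somewhere else in the model.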
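Smooths by an ordered factor:
df <- df |> transform(oCountry = ordered(Country))
contrasts(df$oCountry) <- "contr.treatment"
# Note: with an ordered `by` factor mgcv fits no smooth for the reference level,
# so reference smooths s(nmonth) + s(nyear) are normally included as well.
y ~ oCountry + s(nmonth, by = oCountry) + s(nyear, by = oCountry)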
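Here is a hedged sketch of the ordered-factor version on simulated data. The `sim` data frame is made up for illustration. The reference smooths `s(nmonth)` and `s(nyear)` are my addition, based on mgcv's documented behaviour that an ordered `by` factor gets no smooth for its first level; the remaining smooths are then differences from the reference country.

```r
library(mgcv)

set.seed(42)
# Simulated stand-in data (an assumption): one row per country x year x month
sim <- expand.grid(
  nmonth  = 1:12,
  nyear   = 2007:2022,
  Country = factor(c("England", "Ireland", "Northern Ireland", "Scotland", "Wales"))
)
sim$y <- as.integer(sim$Country) * sin(2 * pi * sim$nmonth / 12) +
  0.2 * (sim$nyear - 2007) + rnorm(nrow(sim))

# Ordered copy of the factor, with treatment contrasts for the parametric term
sim$oCountry <- ordered(sim$Country)
contrasts(sim$oCountry) <- "contr.treatment"

# Reference smooths (first level) plus difference smooths for the other levels
m_ord <- gam(y ~ oCountry +
               s(nmonth) + s(nyear) +
               s(nmonth, by = oCountry) + s(nyear, by = oCountry),
             data = sim, method = "REML")

# Labels show which levels received a difference smooth
sapply(m_ord$smooth, function(sm) sm$label)
```

With this parameterisation the summary tests whether each country's smooth differs from the reference country's, which can be easier to interpret than ten separate curves.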
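These are all just interactions between one continuous time variable and a factor. It is quite possible, especially if you have a long enough record in terms of years sampled, that the seasonal pattern changes with the long-term trend. That would require a tensor product interaction between the two continuous variables, which in turn also interacts with the factor `Country`. For example: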
gam(y ~ Country +
s(nmonth, by = Country) +
s(nyear, by = Country) +
ti(nmonth, nyear, by = Country), ...)
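To show what the decomposed model looks like in practice, here is a small sketch on simulated data in which the seasonal amplitude grows over the years. The `sim` data frame and its signal are assumptions made up for this example.

```r
library(mgcv)

set.seed(42)
# Simulated stand-in data (an assumption): one row per country x year x month
sim <- expand.grid(
  nmonth  = 1:12,
  nyear   = 2007:2022,
  Country = factor(c("England", "Ireland", "Northern Ireland", "Scotland", "Wales"))
)
# The seasonal amplitude increases with year, so month and year interact
sim$y <- (1 + 0.15 * (sim$nyear - 2007)) * sin(2 * pi * sim$nmonth / 12) +
  0.2 * as.integer(sim$Country) + rnorm(nrow(sim))

# Main-effect smooths plus a pure-interaction ti() term, all by country
m_ti <- gam(y ~ Country +
              s(nmonth, by = Country) +
              s(nyear, by = Country) +
              ti(nmonth, nyear, by = Country),
            data = sim, method = "REML")

# One row per smooth; the ti() rows indicate whether the interaction is needed
summary(m_ti)$s.table
```

Because `ti()` excludes the main effects, its rows in the table speak directly to whether the seasonal pattern changes over the years in each country.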
That is for the factor `by` models I mentioned. Alternatively, if you don't need the effects decomposed like that, this is one way:
gam(y ~ Country + te(nmonth, nyear, by = Country), ...)
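The equivalent of the `fs` smooths in the question is slightly trickier, because the `fs` basis only works for univariate continuous smooths. Instead, this form of `t2()` tensor product construction should be equivalent in the univariate case, and in the bivariate case it gives the same kind of random surfaces as the `fs` random smooths: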
gam(y ~ t2(nmonth, nyear, Country, bs = c("cr", "cr", "re"), full = TRUE), ...)
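A minimal sketch of the `t2()` random-surface model, again on simulated data; the `sim` data frame is an assumption made up for illustration, and `k` is left at its defaults.

```r
library(mgcv)

set.seed(42)
# Simulated stand-in data (an assumption): one row per country x year x month
sim <- expand.grid(
  nmonth  = 1:12,
  nyear   = 2007:2022,
  Country = factor(c("England", "Ireland", "Northern Ireland", "Scotland", "Wales"))
)
sim$y <- (1 + 0.15 * (sim$nyear - 2007)) * sin(2 * pi * sim$nmonth / 12) +
  0.2 * as.integer(sim$Country) + rnorm(nrow(sim))

# Country enters as a random-effect ("re") margin of the tensor product,
# giving one month-by-year surface per country with shared smoothing parameters
m_t2 <- gam(y ~ t2(nmonth, nyear, Country,
                   bs = c("cr", "cr", "re"), full = TRUE),
            data = sim, method = "REML")

summary(m_t2)
```

The whole set of country surfaces is a single term here, so the countries share smoothing parameters in the same spirit as the `fs` model in the question.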
Unless the defaults are adequate, you should set `k` on all of the smooths. I haven't set it here, for the sake of clarity.
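As a sketch of what setting `k` might look like, the example below picks basis sizes by hand and then checks them with mgcv's `k.check()`. The `sim` data frame and the particular `k` values are assumptions for illustration only; `k` cannot exceed the number of unique covariate values (12 months and 16 years here).

```r
library(mgcv)

set.seed(42)
# Simulated stand-in data (an assumption): one row per country x year x month
sim <- expand.grid(
  nmonth  = 1:12,
  nyear   = 2007:2022,
  Country = factor(c("England", "Ireland", "Northern Ireland", "Scotland", "Wales"))
)
sim$y <- as.integer(sim$Country) * sin(2 * pi * sim$nmonth / 12) +
  0.2 * (sim$nyear - 2007) + rnorm(nrow(sim))

# Illustrative basis sizes, each below the number of unique covariate values
m_k <- gam(y ~ Country +
             s(nmonth, by = Country, k = 8) +
             s(nyear, by = Country, k = 12),
           data = sim, method = "REML")

# Basis-dimension diagnostics: one row per smooth
kc <- k.check(m_k)
kc
```

If the effective degrees of freedom of a smooth sit close to its maximum and the check flags it, that is usually a hint to refit with a larger `k` for that term.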