Asked by: Colton Stephens  Asked: 11/9/2023  Last edited by: Colton Stephens  Updated: 11/21/2023  Views: 106
R switches from PDT to PST at 1:36:14 and not at 2:00:00 - Lubridate assigns time zone before switch
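Q:
When looking at datetime values that fall within the overlap created by the change from PDT to PST, R appears to switch time zones at 1:36:14 rather than at 2:00:00 as expected. Specifically, R assigns the PST time zone to every datetime after 2021-11-07 01:36:14 (shown below):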
x <- c(
  "2021-11-07 1:00:00",
  "2021-11-07 1:00:01",
  "2021-11-07 1:35:00",
  "2021-11-07 1:36:00",
  "2021-11-07 1:36:10",
  "2021-11-07 1:36:14",
  "2021-11-07 1:36:15",
  "2021-11-07 1:36:30",
  "2021-11-07 1:36:59",
  "2021-11-07 1:45:00",
  "2021-11-07 1:59:59",
  "2021-11-07 2:00:00",
  "2021-11-07 2:30:00"
)
x_pst <- as.POSIXct(x, tz = "PST8PDT")
> x_pst
# ...
[5] "2021-11-07 01:36:10 PDT" "2021-11-07 01:36:14 PDT"
[7] "2021-11-07 01:36:15 PST" "2021-11-07 01:36:30 PST"
# ...
In addition, lubridate appears to assign PST to all of the datetimes, including those before the switch (using the same data):
x_pst <- lubridate::as_datetime(x, tz = "PST8PDT")
> x_pst
[1] "2021-11-07 01:00:00 PST" "2021-11-07 01:00:01 PST"
[3] "2021-11-07 01:35:00 PST" "2021-11-07 01:36:00 PST"
[5] "2021-11-07 01:36:10 PST" "2021-11-07 01:36:14 PST"
[7] "2021-11-07 01:36:15 PST" "2021-11-07 01:36:30 PST"
[9] "2021-11-07 01:36:59 PST" "2021-11-07 01:45:00 PST"
[11] "2021-11-07 01:59:59 PST" "2021-11-07 02:00:00 PST"
[13] "2021-11-07 02:30:00 PST"
x_pst <- lubridate::ymd_hms(x, tz = "PST8PDT")
> x_pst
# same output as above
So why does the time zone switch at such a specific time, and what is lubridate doing when it assigns PST to all of the datetimes before the change?
Session info:
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.0
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: US/Pacific
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
loaded via a namespace (and not attached):
[1] compiler_4.3.1 generics_0.1.3 tools_4.3.1
[4] lubridate_1.9.3 timechange_0.2.0
A:
This is not a complete answer, but I hope someone with more expertise can build on it.
as.POSIXct
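Before getting into the code, I first want to give some context. We start with as.POSIXct, which is an S3 generic function with several methods defined.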
as.POSIXct
#> function (x, tz = "", ...)
#> UseMethod("as.POSIXct")
methods(as.POSIXct)
#> [1] as.POSIXct.Date as.POSIXct.default as.POSIXct.numeric as.POSIXct.POSIXlt
#> see '?methods' for accessing help and source code
For the example given by the OP, since we are dealing with character data, the default method is the one that gets used:
as.POSIXct.default
#> function (x, tz = "", ...)
#> {
#> if (inherits(x, "POSIXct"))
#> return(if (missing(tz)) x else .POSIXct(x, tz))
#> if (is.null(x))
#> return(.POSIXct(numeric(), tz))
#> if (is.character(x) || is.factor(x))
#> return(as.POSIXct(as.POSIXlt(x, tz, ...), tz, ...))
#> if (is.logical(x) && all(is.na(x)))
#> return(.POSIXct(as.numeric(x), tz))
#> stop(gettextf("do not know how to convert '%s' to class %s",
#> deparse1(substitute(x)), dQuote("POSIXct")), domain = NA)
#> }
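This leads us (via the third condition above) to a call to as.POSIXlt, an S3 generic function that happens to have a character method: as.POSIXlt.character. I won't paste the source code, but the heart of that function is strptime.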
strptime
#> function (x, format, tz = "")
#> .Internal(strptime(if (is.character(x)) x else if (is.object(x)) `names<-`(as.character(x),
#> names(x)) else `storage.mode<-`(x, "character"), format, tz))
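The implementation behind this is C code in the R sources. I initially tried to follow it by reading through the logic, but that proved very difficult.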
RApiDatetime
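Fortunately, there is a package, RApiDatetime (thanks Dirk!), with a function that exposes exactly this: RApiDatetime::rapistrptime. Calling it on the values provided by the OP, we have: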
RApiDatetime::rapistrptime(x, fmt = "%Y-%m-%d %H:%M:%OS", "PST8PDT")
#> $sec
#> [1] 0 1 0 0 10 14 15 30 59 0 59 0 0
#>
#> $min
#> [1] 0 0 35 36 36 36 36 36 36 45 59 0 30
#>
#> $hour
#> [1] 1 1 1 1 1 1 1 1 1 1 1 2 2
#>
#> $mday
#> [1] 7 7 7 7 7 7 7 7 7 7 7 7 7
#>
#> $mon
#> [1] 10 10 10 10 10 10 10 10 10 10 10 10 10
#>
#> $year
#> [1] 121 121 121 121 121 121 121 121 121 121 121 121 121
#>
#> $wday
#> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0
#>
#> $yday
#> [1] 310 310 310 310 310 310 310 310 310 310 310 310 310
#>
#> $isdst
#> [1] 1 1 1 1 1 1 0 0 0 0 0 0 0
#>
#> $zone
#> [1] "PDT" "PDT" "PDT" "PDT" "PDT" "PDT" "PST" "PST" "PST" "PST" "PST" "PST" "PST"
#>
#> $gmtoff
#> [1] NA NA NA NA NA NA NA NA NA NA NA NA NA
#>
#> attr(,"class")
#> [1] "POSIXlt" "POSIXt"
#> attr(,"tzone")
#> [1] "PST8PDT" "PST" "PDT"
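We see that the isdst field looks worth investigating. After cloning the repo and making some crude use of printf, I found it easier to follow the path. We find that the real action behind isdst happens here: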
.
.
OK = tm->tm_year < 138 && tm->tm_year >= (have_broken_mktime() ? 70 : 02);
if(OK) {
res = (double) mktime(tm);
if (res == -1.) return res;
.
.
mktime
Finally, we get to my claim in the comments regarding mktime.
I wrote this very simple C++ function to see what happens to our struct after the call to mktime:
#include <Rcpp.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
using namespace Rcpp;

// Build 2021-11-07 01:36:<tm_sec>, pass it to mktime(), and print the
// normalized struct together with the returned epoch value.
// [[Rcpp::export]]
void CheckMkTime(int tm_sec) {
    struct tm info;
    info.tm_sec = tm_sec;
    info.tm_min = 36;
    info.tm_hour = 1;
    info.tm_mday = 7;
    info.tm_mon = 10;    // months are 0-based: 10 is November
    info.tm_year = 121;  // years since 1900: 2021
    info.tm_wday = 0;
    info.tm_yday = 310;
    info.tm_isdst = -1;  // -1: let mktime() decide whether DST applies
    time_t val = mktime(&info);
    // %jd expects intmax_t, so cast the time_t explicitly
    printf("mktime_res: %jd,\n tm_zone: %s,\n tm_gmtoff: %ld,\n tm_sec: %d,\n "
           "tm_min: %d,\n tm_hour: %d,\n tm_mday: %d,\n tm_mon: %d,\n "
           "tm_year: %d,\n tm_wday: %d,\n tm_yday: %d,\n tm_isdst: %d,\n",
           (intmax_t) val,
           info.tm_zone,
           info.tm_gmtoff,
           info.tm_sec,
           info.tm_min,
           info.tm_hour,
           info.tm_mday,
           info.tm_mon,
           info.tm_year,
           info.tm_wday,
           info.tm_yday,
           info.tm_isdst);
}
Calling it with tm_sec = 14, we have:
CheckMkTime(14)
#> mktime_res: 1636274174,
#> tm_zone: PDT,
#> tm_gmtoff: -25200,
#> tm_sec: 14,
#> tm_min: 36,
#> tm_hour: 1,
#> tm_mday: 7,
#> tm_mon: 10,
#> tm_year: 121,
#> tm_wday: 0,
#> tm_yday: 310,
#> tm_isdst: 1,
And with tm_sec = 15, we see:
CheckMkTime(15)
#> mktime_res: 1636277775,
#> tm_zone: PST,
#> tm_gmtoff: -28800,
#> tm_sec: 15,
#> tm_min: 36,
#> tm_hour: 1,
#> tm_mday: 7,
#> tm_mon: 10,
#> tm_year: 121,
#> tm_wday: 0,
#> tm_yday: 310,
#> tm_isdst: 0,
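To put numbers on the jump: the two mktime_res values above differ by 1636277775 - 1636274174 = 3601 seconds, that is, one hour plus the one second between the inputs. Below is a minimal C sketch for converting those two values back to wall-clock time, assuming a POSIX system. The helper name wall_clock_in_pacific and the spelled-out TZ rule string are my own choices for illustration and do not come from the R sources.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not from the R sources): convert an epoch value to
 * wall-clock time under US Pacific rules. The TZ value is a POSIX rule string
 * that spells out the transitions (second Sunday of March, first Sunday of
 * November), an assumption made so that the result does not depend on the
 * installed tz database. Note that it changes TZ for the whole process. */
static struct tm wall_clock_in_pacific(time_t epoch) {
    struct tm out;
    int rc = setenv("TZ", "PST8PDT,M3.2.0,M11.1.0", 1); /* overwrite existing TZ */
    assert(rc == 0);
    (void) rc;
    tzset();                    /* make the C library re-read TZ */
    localtime_r(&epoch, &out);  /* fills hour/min/sec and tm_isdst */
    return out;
}
```

Passing 1636274174 and then 1636277775 to this helper should give two wall-clock times one second apart, the first flagged as daylight time and the second as standard time, which matches the PDT and PST labels printed above.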
So the problem is mktime, right?
I'm not so sure...
I wrote the same thing in pure C:
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Build 2021-11-07 01:36:<tm_sec>, pass it to mktime(), and print the
 * normalized struct together with the returned epoch value. */
static void check_mktime(int tm_sec) {
    struct tm info;
    info.tm_sec = tm_sec;
    info.tm_min = 36;
    info.tm_hour = 1;
    info.tm_mday = 7;
    info.tm_mon = 10;    /* months are 0-based: 10 is November */
    info.tm_year = 121;  /* years since 1900: 2021 */
    info.tm_wday = 0;
    info.tm_yday = 310;
    info.tm_isdst = -1;  /* -1: let mktime() decide whether DST applies */
    time_t val = mktime(&info);
    /* %jd expects intmax_t, so cast the time_t explicitly */
    printf("mktime_res: %jd,\n tm_zone: %s,\n tm_gmtoff: %ld,\n tm_sec: %d,\n "
           "tm_min: %d,\n tm_hour: %d,\n tm_mday: %d,\n tm_mon: %d,\n "
           "tm_year: %d,\n tm_wday: %d,\n tm_yday: %d,\n tm_isdst: %d\n",
           (intmax_t) val,
           info.tm_zone,
           info.tm_gmtoff,
           info.tm_sec,
           info.tm_min,
           info.tm_hour,
           info.tm_mday,
           info.tm_mon,
           info.tm_year,
           info.tm_wday,
           info.tm_yday,
           info.tm_isdst);
}

int main(void) {
    check_mktime(14);
    printf("\n\n");
    check_mktime(15);
    return 0;
}
I compiled it and ran it in the terminal:
% clang time_shift.c -o time_shift
% ./time_shift
#> mktime_res: 1636266974,
#> tm_zone: EST,
#> tm_gmtoff: -18000,
#> tm_sec: 14,
#> tm_min: 36,
#> tm_hour: 1,
#> tm_mday: 7,
#> tm_mon: 10,
#> tm_year: 121,
#> tm_wday: 0,
#> tm_yday: 310,
#> tm_isdst: 0
#>
#>
#> mktime_res: 1636266975,
#> tm_zone: EST,
#> tm_gmtoff: -18000,
#> tm_sec: 15,
#> tm_min: 36,
#> tm_hour: 1,
#> tm_mday: 7,
#> tm_mon: 10,
#> tm_year: 121,
#> tm_wday: 0,
#> tm_yday: 310,
#> tm_isdst: 0
We don't see the problem here. However, we do notice that tm_zone is EST in both cases, whereas when we ran it in R after running RApiDatetime::rapistrptime(x, fmt = "%Y-%m-%d %H:%M:%OS", "PST8PDT"), we obtained PDT for 14 and PST for 15.
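With that in mind, I reran these examples in a fresh R session and obtained the same results as in our pure C implementation.
We also do not see this behavior in a fresh R session after calling base R's strptime.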
I tried looking at the mktime source code, but it is well beyond me.
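One thing that may help narrow this down: the standalone program never sets TZ, so it appears to have run in the machine's local zone (EST in the output above) rather than in PST8PDT. In addition, 01:36 on 2021-11-07 falls inside the repeated hour, where tm_isdst = -1 leaves the choice between the PDT and PST readings to the C library, and that choice may differ between implementations. Below is a hedged C sketch, assuming a POSIX system, that pins the zone and requests each reading explicitly. The helper name pacific_epoch and the spelled-out TZ rule string are assumptions for illustration, not part of R or RApiDatetime.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: epoch seconds for 2021-11-07 01:36:<sec> under US
 * Pacific rules, with the DST flag supplied by the caller.
 *   isdst =  1 -> read the wall-clock time as PDT (first pass through 01:36)
 *   isdst =  0 -> read it as PST (second pass, after the clocks go back)
 *   isdst = -1 -> let mktime() choose between the two readings
 * The TZ rule string is an assumption chosen so that the result does not
 * depend on the installed tz database; it changes TZ for the whole process. */
static time_t pacific_epoch(int sec, int isdst) {
    struct tm info = {0};
    int rc = setenv("TZ", "PST8PDT,M3.2.0,M11.1.0", 1);
    assert(rc == 0);
    (void) rc;
    tzset();              /* make the C library re-read TZ */
    info.tm_sec = sec;
    info.tm_min = 36;
    info.tm_hour = 1;
    info.tm_mday = 7;
    info.tm_mon = 10;     /* months are 0-based: 10 is November */
    info.tm_year = 121;   /* years since 1900: 2021 */
    info.tm_isdst = isdst;
    return mktime(&info);
}
```

Under these assumptions, pacific_epoch(14, 1) should reproduce the 1636274174 seen from CheckMkTime(14), and pacific_epoch(15, 0) the 1636277775 seen from CheckMkTime(15). Comparing those with pacific_epoch(14, -1) and pacific_epoch(15, -1) in the same process would show which reading mktime picks on its own, and whether that pick changes between calls.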
Session info
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.4.1
#>
#> Matrix products: default
#> BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: PST8PDT
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] RApiDatetime_0.0.8
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_4.3.1 tools_4.3.1 Rcpp_1.0.11