Asked by: plover  Asked: 11/10/2023  Last edited by: plover  Updated: 11/10/2023  Views: 98
Comparing dates between rows with lubridate
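Question:
I am trying to compare dates between rows and apply some logic to the result. I used the code here as a starting point, but I get an error in the actual date comparison.
What I am trying to determine is:
- the time (in days) between subsequent purchases,
- whether that time is 60 days or more, and
- what the current and previous purchases were.
An example follows: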
library(dplyr)
mydata <- data.frame(
  store = c('A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'),
  PurchaseDate = c('2023-01-01', '2023-01-01', '2023-02-01', '2023-03-01', '2023-04-01', '2023-04-01', '2023-05-01', '2023-06-01'),
  sales = c("apples", "bananas", "cherries", "bananas", "cherries", "cherries", "bananas", "bananas")
)
mydata$PurchaseDate <- as.Date(mydata$PurchaseDate)

mydata |>
  group_by(store) |>
  arrange(PurchaseDate, .by_group = TRUE)
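What I would like to get back is something like the following, showing the number of days between subsequent purchases, whether that gap is 60 days or more, and what the previous purchase was. Ultimately, I will compare the current sale against LastSale.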
store PurchaseDate sales    LastPurchase over60 LastSale
A     2023-01-01   apples   0            0      NA
A     2023-02-01   cherries 32           0      apples
A     2023-04-02   cherries 61           1      cherries
A     2023-05-01   bananas  31           0      cherries
B     2023-01-01   bananas  0            0      NA
B     2023-03-02   bananas  61           1      bananas
B     2023-04-01   cherries 31           0      bananas
B     2023-06-01   bananas  62           1      cherries
I tried the following. I now know that the difftime call here is wrong, but what I want is the difference between adjacent rows.
mydata %>%
group_by(store) |> arrange((PurchaseDate), .by_group = TRUE) |>
mutate(over60 = ifelse(abs(difftime(PurchaseDate, lag(sales, default =first(PurchaseDate)), units = "days")) < 60, 0, 1))
Answers:
1 upvote
Friede
11/10/2023
#1
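library(dplyr)

mydata |>
  group_by(store) |>
  arrange(PurchaseDate, .by_group = TRUE) |>
  mutate(
    # days since the previous purchase in the same store (NA for the first row)
    Day = as.numeric(difftime(as.Date(PurchaseDate), lag(as.Date(PurchaseDate)), units = "days")),
    # 1 if the gap is 60 days or more, as the question asks
    Over60 = ifelse(Day >= 60L, 1L, 0L),
    # item bought at the previous purchase
    LastSale = lag(sales)
  )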
which gives
# A tibble: 8 × 6
# Groups:   store [2]
  store PurchaseDate sales      Day Over60 LastSale
  <chr> <chr>        <chr>    <dbl>  <int> <chr>
1 A     2023-01-01   apples      NA     NA NA
2 A     2023-02-01   cherries    31      0 apples
3 A     2023-04-01   cherries    59      0 cherries
4 A     2023-05-01   bananas     30      0 cherries
5 B     2023-01-01   bananas     NA     NA NA
6 B     2023-03-01   bananas     59      0 bananas
7 B     2023-04-01   cherries    31      0 bananas
8 B     2023-06-01   bananas     61      1 cherries
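Having NA in column Day can be an advantage, since it marks rows that have no previous purchase. As always, whether that matters depends on the task. See ?lag for an explanation. Note that the lag function used here is the one from {dplyr}; base R provides another lag function (stats::lag) that has no default= argument.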
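To make the NA-versus-0 trade-off concrete, here is a minimal sketch on a small made-up vector of dates (the vector `dates` and all variable names are assumptions for illustration, not part of the question's data). It computes the gap once with `dplyr::lag` as is and once with a `default`, then derives the 60-day flag from each.

```r
library(dplyr)

# Hypothetical purchase dates for a single store, already sorted
dates <- as.Date(c("2023-01-01", "2023-02-01", "2023-04-01", "2023-06-01"))

# Without a default, the first gap is NA: "there is no previous purchase"
gap_na <- as.numeric(difftime(dates, dplyr::lag(dates), units = "days"))

# With default = first(dates), the first gap becomes 0
gap_zero <- as.numeric(
  difftime(dates, dplyr::lag(dates, default = first(dates)), units = "days")
)

# The NA propagates into the flag, so first purchases stay distinguishable
flag_na   <- ifelse(gap_na >= 60, 1L, 0L)
flag_zero <- ifelse(gap_zero >= 60, 1L, 0L)

# Summaries over the NA version have to skip the NA explicitly
mean_gap <- mean(gap_na, na.rm = TRUE)
```

With the NA version, a later filter such as `!is.na(Day)` can separate first purchases from short gaps; with the 0 version, both look the same.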
0 upvotes
akash87
11/10/2023
#2
mydata %>%
  arrange(store, PurchaseDate) %>%
  group_by(store) %>%
  mutate(
    # per-store default, so the first row of every store gets 0 days
    Days = difftime(PurchaseDate, lag(PurchaseDate, default = first(PurchaseDate)), units = "days"),
    # 1 if the gap is 60 days or more
    over60 = if_else(Days < 60, 0, 1),
    last_sale = lag(sales)
  ) %>%
  ungroup()
Result
  store PurchaseDate sales    Days    over60 last_sale
  <chr> <date>       <chr>    <drtn>   <dbl> <chr>
1 A     2023-01-01   apples    0 days      0 NA
2 A     2023-02-01   cherries 31 days      0 apples
3 A     2023-04-01   cherries 59 days      0 cherries
4 A     2023-05-01   bananas  30 days      0 cherries
5 B     2023-01-01   bananas   0 days      0 NA
6 B     2023-03-01   bananas  59 days      0 bananas
7 B     2023-04-01   cherries 31 days      0 bananas
8 B     2023-06-01   bananas  61 days      1 cherries
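A general caveat when choosing the `default` for `lag` in grouped data: a global value such as `min(mydata$PurchaseDate)` is the earliest date across all stores, so it yields 0 for each store's first row only when every store starts on the same date, as happens in the sample data. The sketch below uses a small made-up data frame (`toy`, an assumption for illustration) in which store B starts later, and contrasts the global default with a per-group `first(PurchaseDate)`.

```r
library(dplyr)

# Hypothetical data: store B's first purchase is later than store A's
toy <- data.frame(
  store = c("A", "A", "B", "B"),
  PurchaseDate = as.Date(c("2023-01-01", "2023-02-01", "2023-03-01", "2023-04-01"))
)

res <- toy %>%
  arrange(store, PurchaseDate) %>%
  group_by(store) %>%
  mutate(
    # global default: earliest date in the whole table
    days_global = as.numeric(difftime(
      PurchaseDate, lag(PurchaseDate, default = min(toy$PurchaseDate)), units = "days"
    )),
    # per-group default: earliest date within the current store
    days_group = as.numeric(difftime(
      PurchaseDate, lag(PurchaseDate, default = first(PurchaseDate)), units = "days"
    ))
  ) %>%
  ungroup()
```

In `res`, the first row of store B differs between the two columns: the global default measures the distance to store A's first purchase, while the per-group default gives 0.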
Comments
0 upvotes
plover
11/10/2023
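Thanks! For the columns that return 1, is there a way to pull the earlier sale? So for column 4: NA, apples, cherries, cherries, NA, bananas, cherries, bananas? I updated my question to make it clearer.
0 upvotes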
akash87
11/10/2023
@plover Can you check whether this is what you wanted?
0 upvotes
plover
11/11/2023
It's close, but it gives a default error - I figured it out with the other code provided here, so I appreciate everyone's help!
1 upvote
Sys_Tem
11/10/2023
#3
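I think this code creates the three specific columns in your request. The diff function in base R computes the difference between consecutive PurchaseDate values. The first LastPurchase value in each store is set to 0 because there is no previous PurchaseDate to compare against.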
mydata |>
  group_by(store) |>
  arrange(PurchaseDate, .by_group = TRUE) |>
  mutate(
    # gap in days to the previous purchase; 0 for the first purchase of each store
    LastPurchase = c(0, diff(PurchaseDate)),
    over60 = ifelse(LastPurchase < 60, 0, 1),
    LastSale = lag(sales)
  )
The result is as follows:
  store PurchaseDate sales    LastPurchase over60 LastSale
  <chr> <date>       <chr>           <dbl>  <dbl> <chr>
1 A     2023-01-01   apples              0      0 NA
2 A     2023-02-01   cherries           31      0 apples
3 A     2023-04-01   cherries           59      0 cherries
4 A     2023-05-01   bananas            30      0 cherries
5 B     2023-01-01   bananas             0      0 NA
6 B     2023-03-01   bananas            59      0 bananas
7 B     2023-04-01   cherries           31      0 bananas
8 B     2023-06-01   bananas            61      1 cherries
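For readers who want to see what `diff` does with dates outside of a dplyr pipeline, here is a base R sketch using the question's data (no packages; the object names `sorted` and `gaps_a` are illustrative assumptions). It shows that `diff` on a `Date` vector returns a `difftime`, and uses `ave()` to compute the gaps per store on the underlying day counts.

```r
mydata <- data.frame(
  store = c("A", "B", "A", "B", "A", "B", "A", "B"),
  PurchaseDate = as.Date(c("2023-01-01", "2023-01-01", "2023-02-01", "2023-03-01",
                           "2023-04-01", "2023-04-01", "2023-05-01", "2023-06-01")),
  sales = c("apples", "bananas", "cherries", "bananas",
            "cherries", "cherries", "bananas", "bananas")
)

# Sort by store, then date, so consecutive rows are consecutive purchases
sorted <- mydata[order(mydata$store, mydata$PurchaseDate), ]

# diff() on a Date vector returns a difftime object
gaps_a <- diff(sorted$PurchaseDate[sorted$store == "A"])
class(gaps_a)

# Dates are stored as day counts, so diff() on the numeric values gives days;
# ave() applies the function within each store and keeps the row order
sorted$LastPurchase <- ave(as.numeric(sorted$PurchaseDate), sorted$store,
                           FUN = function(d) c(0, diff(d)))

# 1 if the gap is 60 days or more, otherwise 0
sorted$over60 <- ifelse(sorted$LastPurchase >= 60, 1, 0)
```

Working on `as.numeric(PurchaseDate)` is a design choice that sidesteps the question of how `c()` combines a plain 0 with a `difftime`; the result is an ordinary numeric column of days.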