尝试group_by然后汇总 max 和 min - 遇到明确格式的错误

trying to group_by and then summarize max and min - running into error for unambiguous format

提问人:tchoup 提问时间:5/17/2022 最后编辑:Maëltchoup 更新时间:5/17/2022 访问量:434

问:

我的地址与 Kingwood 和 Humble 地址的信息重复。我正在尝试使用以下代码组合这些条目,保留最小首次报告日期和最大上次报告日期:

df <- df %>% group_by(id, street) %>% 
  summarise(firstReportedDate = min(as.Date(firstReportedDate))) %>% 
  summarise(lastReportedDate = max(as.Date(lastReportedDate)))

但是,由于某种原因,id == 1000 给了我错误:

Error: Problem with `summarise()` column `firstReportedDate`.
i `firstReportedDate = min(as.Date(firstReportedDate))`.
x character string is not in a standard unambiguous format
i The error occurred in group 3: id = "1000", street = "Po Box 203"

谁能帮我理解这个错误?数据示例如下:

dput(df)
structure(list(street = c("2200 Lake Village Dr", "1040 Marina Dr", 
"2200 Lake Village Dr", "1040 Marina Dr", "22302 Rustic Bridge Ln", 
"22302 Rustic Bridge Ln", "1060 Marina Dr", "3211 Laurel Point Ct", 
"Po Box 203", "19703 Highway 59 N", "6714 Dorylee Ln", "3511 Forest Row Dr", 
"3511 Forest Row Dr", "Acorn Ln"), city = c("Kingwood", "Humble", 
"Kingwood", "Kingwood", "Kingwood", "Humble", "Humble", "Kingwood", 
"Humble", "Humble", "Humble", "Kingwood", "Humble", "Humble"), 
    state = c("TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX", 
    "TX", "TX", "TX", "TX", "TX", "TX"), zip = c("77339", "77339", 
    "77339", "77339", "77339", "77339", "77339", "77339", "77347", 
    "77338", "77396", "77345", "77345", "77345"), firstReportedDate = c("5/25/2019", 
    "1/1/2015", "9/30/2017", "11/30/2015", "10/18/2017", "6/15/2017", 
    "9/30/2009", "10/12/2002", "9/22/2017", "1/1/2009", "3/5/2004", 
    "4/8/2012", "9/30/2009", "1/1/2009"), lastReportedDate = c("4/1/2022", 
    "1/1/2021", "9/30/2017", "11/30/2015", "4/1/2022", "6/15/2018", 
    "9/30/2009", "3/3/2004", "4/1/2022", "1/1/2011", "3/5/2004", 
    "4/1/2022", "9/30/2009", "1/1/2013"), id = c("357", "357", 
    "357", "357", "359", "359", "359", "359", "1000", "1000", 
    "1000", "1431", "1431", "1431")), row.names = c(NA, -14L), class = c("tbl_df", 
"tbl", "data.frame"))
R 数据帧 日期 dplyr 数据操作

评论

0赞 danh 5/17/2022
你还没有告诉如何解析你的日期。请尝试。as.Date()as.Date(firstReportedDate, '%m/%d/%Y')

答:

3赞 Maël 5/17/2022 #1

将所有内容嵌入到同一个摘要调用中。此外,您应该在数据不是国际日期格式的参数中指定日期的格式formatas.Date

dat %>% 
  mutate(across(ends_with("Date"), as.Date, format = "%m/%d/%Y")) %>% 
  group_by(id, street) %>% 
  summarise(firstReportedDate = min(firstReportedDate),
            lastReportedDate = max(lastReportedDate))

输出

# A tibble: 10 × 4
# Groups:   id [4]
   id    street                 firstReportedDate lastReportedDate
   <chr> <chr>                  <date>            <date>          
 1 1000  19703 Highway 59 N     2009-01-01        2011-01-01      
 2 1000  6714 Dorylee Ln        2004-03-05        2004-03-05      
 3 1000  Po Box 203             2017-09-22        2022-04-01      
 4 1431  3511 Forest Row Dr     2009-09-30        2022-04-01      
 5 1431  Acorn Ln               2009-01-01        2013-01-01      
 6 357   1040 Marina Dr         2015-01-01        2021-01-01      
 7 357   2200 Lake Village Dr   2017-09-30        2022-04-01      
 8 359   1060 Marina Dr         2009-09-30        2009-09-30      
 9 359   22302 Rustic Bridge Ln 2017-06-15        2022-04-01      
10 359   3211 Laurel Point Ct   2002-10-12        2004-03-03