在 R 中使用 pivot_longer 时的标头类型问题

Headers type problem when using pivot_longer in R

提问人:Laudine Carbuccia 提问时间:11/16/2023 更新时间:11/16/2023 访问量:27

问:

我有一个数据帧,对应于我的研究助理每天工作的小时数。它看起来像这样:

structure(list(SurveyorId = c("Zineb", "Elisa", "AudreyB", "CamilleP", 
"CamilleV", "AudreyG", "CharlesE", "Lou", "Ludivine", "Elise", 
"Jason", "Mathilde", "Hassina", "Nihel", "Dauphine"), `16/10/2023` = c(8, 
7, 3, 7, 6, 4, NA, 2.5, NA, 7, NA, NA, 4, NA, NA), `17/10/2023` = c(8, 
7, 5.5, 4, 2.5, 4.5, NA, 2, NA, 4, NA, 5.5, 7.5, NA, NA), `18/10/2023` = c(8, 
5, 7, NA, NA, 2.5, NA, NA, 2, NA, NA, 8, 7.5, NA, NA), `19/10/2023` = c(8, 
7, NA, NA, NA, NA, NA, NA, 1.5, NA, NA, 2.5, 9, NA, NA), `20/10/2023` = c(8, 
NA, 7.5, NA, NA, 2, NA, 2, NA, NA, 2.2, NA, 3.5, NA, NA), `21/10/2023` = c(NA, 
NA, NA, 6, 6, NA, 7.5, 5, 4.5, 4, 4.5, NA, NA, NA, NA), `23/10/2023` = c(8, 
9, NA, 7.5, 7, 4, NA, 1, NA, 7.5, NA, 2.5, 4, NA, NA), `24/10/2023` = c(8, 
NA, NA, 4.5, 2, 4.5, 8, 3.75, NA, NA, NA, 5.5, 7.5, NA, NA), 
    `25/10/2023` = c(8, 5, NA, NA, NA, 2, 1.5, NA, 1.5, NA, NA, 
    4, 8, NA, NA), `26/10/2023` = c(8, 8, 8, NA, NA, NA, NA, 
    1, 2, 7, 4, NA, 8, NA, NA), `27/10/2023` = c(8, NA, 4, NA, 
    NA, 2, NA, NA, 2, NA, NA, NA, 2.75, NA, NA), `28/10/2023` = c(NA, 
    NA, NA, 7.5, 6, NA, 8, 6, 6.5, 4, NA, 3, NA, NA, NA), `30/10/2023` = c(8, 
    9, 6, 2, NA, 4, NA, 1, 1.5, 7, NA, NA, 4.75, 3, NA), `31/10/2023` = c(8, 
    NA, 4, 2, NA, 4, 8, 2.5, 0.5, 0, NA, 3, 8.5, 3.5, NA), `01/11/2023` = c(NA, 
    5, NA, NA, NA, 2, NA, NA, NA, 7, NA, 3, NA, 3, NA), `02/11/2023` = c(3, 
    6.5, 4, NA, NA, NA, NA, 2, 2, NA, 3.5, 3, 8, 5.4, NA), `03/11/2023` = c(8, 
    NA, NA, NA, NA, NA, NA, NA, 2, NA, 2, 4.5, 3, 3.5, NA), `04/11/2023` = c(1, 
    NA, NA, NA, 7, NA, NA, 4.5, NA, NA, 3.75, NA, NA, 2.4, 7), 
    `06/11/2023` = c(8, 7, 2, 7, 6, 4, NA, 2, NA, 5, NA, 5, 4.6, 
    NA, 7), `07/11/2023` = c(7.5, NA, 0.5, 4, 3.5, 4, 8, 4, NA, 
    NA, NA, 4, 8, NA, 7), `08/11/2023` = c(7.5, 6.5, 2, NA, NA, 
    NA, 2, NA, 3, NA, NA, NA, 8.15, NA, 7), `09/11/2023` = c("7.5", 
    "5.5", NA, NA, NA, NA, NA, "2", "2", "7", "3.2", "5", "9", 
    NA, "5.5"), `10/11/2023` = c(9, NA, 1.5, NA, NA, NA, NA, 
    NA, 2.5, NA, 2, NA, 3.5, NA, NA), `11/11/2023` = c(NA, NA, 
    NA, 7, 7, NA, 7, NA, NA, 4, 3.2, NA, NA, 5, 6), `13/11/2023` = c("7.5", 
    NA, "2", NA, "7", "3", NA, NA, NA, "5.5", NA, "4.5", "4.25", 
    NA, NA), `14/11/2023` = c(8, NA, NA, NA, 2, NA, 6.5, 4.5, 
    NA, NA, NA, 3.5, 8.5, NA, NA), `15/11/2023` = c(NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `16/11/2023` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    `17/11/2023` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -15L))

为了对这个数据框进行统计,我想对它进行透视,以便我把它们的所有名字都放在一列中,把所有的日期放在一列中,把所有工时都放在一列里。为此,我想按如下方式使用pivot_longer:

transposed_data <- Recensement_volume_horaire %>%
  pivot_longer(cols = -"SurveyorId", names_to = "Date", values_to = "Hours_Worked")

但是,运行该命令时出现以下错误消息:

Error in pivot_longer():
! Can't combine `16/10/2023` <double> and `09/11/2023` <character>.
Backtrace:
 1. Recensement_volume_horaire %>% ...
 3. tidyr:::pivot_longer.data.frame(., cols = -"SurveyorId", names_to = "Date", values_to = "Hours_Worked")

我已经尝试了一切将所有列的标题放入相同的格式,但我未能解决这个问题。我所有的解决方案都是代码块的变体,例如:

colnames(Recensement_volume_horaire) <- as.character(colnames(Recensement_volume_horaire))

你可以帮我吗?它会救我!!

R DataFrame DPLYR 标头

评论


答:

3赞 r2evans 11/16/2023 #1

查看该帧的 ucture 表明某些列不是数字:str

str(Recensement_volume_horaire)
# tibble [15 × 30] (S3: tbl_df/tbl/data.frame)
#  $ SurveyorId: chr [1:15] "Zineb" "Elisa" "AudreyB" "CamilleP" ...
#  $ 16/10/2023: num [1:15] 8 7 3 7 6 4 NA 2.5 NA 7 ...
#  $ 17/10/2023: num [1:15] 8 7 5.5 4 2.5 4.5 NA 2 NA 4 ...
#  $ 18/10/2023: num [1:15] 8 5 7 NA NA 2.5 NA NA 2 NA ...
#  $ 19/10/2023: num [1:15] 8 7 NA NA NA NA NA NA 1.5 NA ...
#  $ 20/10/2023: num [1:15] 8 NA 7.5 NA NA 2 NA 2 NA NA ...
#  $ 21/10/2023: num [1:15] NA NA NA 6 6 NA 7.5 5 4.5 4 ...
#  $ 23/10/2023: num [1:15] 8 9 NA 7.5 7 4 NA 1 NA 7.5 ...
#  $ 24/10/2023: num [1:15] 8 NA NA 4.5 2 4.5 8 3.75 NA NA ...
#  $ 25/10/2023: num [1:15] 8 5 NA NA NA 2 1.5 NA 1.5 NA ...
#  $ 26/10/2023: num [1:15] 8 8 8 NA NA NA NA 1 2 7 ...
#  $ 27/10/2023: num [1:15] 8 NA 4 NA NA 2 NA NA 2 NA ...
#  $ 28/10/2023: num [1:15] NA NA NA 7.5 6 NA 8 6 6.5 4 ...
#  $ 30/10/2023: num [1:15] 8 9 6 2 NA 4 NA 1 1.5 7 ...
#  $ 31/10/2023: num [1:15] 8 NA 4 2 NA 4 8 2.5 0.5 0 ...
#  $ 01/11/2023: num [1:15] NA 5 NA NA NA 2 NA NA NA 7 ...
#  $ 02/11/2023: num [1:15] 3 6.5 4 NA NA NA NA 2 2 NA ...
#  $ 03/11/2023: num [1:15] 8 NA NA NA NA NA NA NA 2 NA ...
#  $ 04/11/2023: num [1:15] 1 NA NA NA 7 NA NA 4.5 NA NA ...
#  $ 06/11/2023: num [1:15] 8 7 2 7 6 4 NA 2 NA 5 ...
#  $ 07/11/2023: num [1:15] 7.5 NA 0.5 4 3.5 4 8 4 NA NA ...
#  $ 08/11/2023: num [1:15] 7.5 6.5 2 NA NA NA 2 NA 3 NA ...
#  $ 09/11/2023: chr [1:15] "7.5" "5.5" NA NA ...
#  $ 10/11/2023: num [1:15] 9 NA 1.5 NA NA NA NA NA 2.5 NA ...
#  $ 11/11/2023: num [1:15] NA NA NA 7 7 NA 7 NA NA 4 ...
#  $ 13/11/2023: chr [1:15] "7.5" NA "2" NA ...
#  $ 14/11/2023: num [1:15] 8 NA NA NA 2 NA 6.5 4.5 NA NA ...
#  $ 15/11/2023: logi [1:15] NA NA NA NA NA NA ...
#  $ 16/11/2023: logi [1:15] NA NA NA NA NA NA ...
#  $ 17/11/2023: logi [1:15] NA NA NA NA NA NA ...

即,和 都代替了 .我们可以修复它们,然后进行调整:`09/11/2023``13/11/2023`chrnum

Recensement_volume_horaire %>%
  mutate(across(where(is.character) & ends_with("/2023"), ~ as.numeric(.))) %>%
  pivot_longer(cols = -"SurveyorId", names_to = "Date", values_to = "Hours_Worked")
# # A tibble: 435 × 3
#    SurveyorId Date       Hours_Worked
#    <chr>      <chr>             <dbl>
#  1 Zineb      16/10/2023            8
#  2 Zineb      17/10/2023            8
#  3 Zineb      18/10/2023            8
#  4 Zineb      19/10/2023            8
#  5 Zineb      20/10/2023            8
#  6 Zineb      21/10/2023           NA
#  7 Zineb      23/10/2023            8
#  8 Zineb      24/10/2023            8
#  9 Zineb      25/10/2023            8
# 10 Zineb      26/10/2023            8
# # ℹ 425 more rows
# # ℹ Use `print(n = ...)` to see more rows

我使用了一个明确的“是字符并以 2023 年结尾”的条件来确认所有都是数字,如果您需要收紧它(或松开它?