How do I concatenate consecutive rows in R based on a unique set of ID variables and a consecutively increasing and resetting variable?

Asked by: Loudog3232 | Asked: 10/13/2023 | Last edited by: thelatemail, Loudog3232 | Updated: 10/13/2023 | Views: 47

Question:

Example data frame:

    df <- data.frame(ID = c(1, 1, 2, 2, 2, 2, 2),
                 Name = c("Alice", "Alice", "Bob", "Bob", "Bob", "Bob", "Bob"),
                 Age = c(25, 25, 30, 30, 30, 30, 30),
                 LINE = c(1, 2, 1, 2, 1, 2, 3),
                 NOTE_TEXT = c("This is the fir", "st note",
                               "This is the seco", "nd note",
                               "This is ", "the th", "ird note"))

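Because of a character limit when pulling from the data source, each full "note" is split into "NOTE_TEXT" substrings spread across several rows. The substrings that belong to the same note are numbered consecutively by another variable, "LINE", which restarts at 1 for each note and can run anywhere from 1 to 65 (most notes span 4 lines or fewer). I want to merge the "NOTE_TEXT" pieces of each note into a single row and create a new variable that numbers each distinct note within the same ID/Name/Age group.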

The resulting data frame should look like this:

    data.frame(ID = c(1, 2, 2),
                 Name = c("Alice", "Bob", "Bob"),
                 Age = c(25, 30, 30),
                 Note = c(1, 1, 2),
                 NOTE_TEXT = c("This is the first note",
                               "This is the second note",
                               "This is the third note"))

I think I need some kind of for loop over "LINE" for each unique set of these variables, but I'm not sure where to start. Thanks for your help!

Tags: r, dplyr, string-concatenation



Answers:

1 vote | Jon Spring | 10/13/2023 | #1
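  library(dplyr)  # .by needs dplyr >= 1.1.0; the native pipe |> needs R >= 4.1
  df |>
    # number the notes: a new note starts wherever LINE resets to 1
    mutate(note_num = cumsum(LINE == 1), .by = c(ID, Name, Age)) |>
    # glue each note's pieces together in their existing row order
    summarize(NOTE_TEXT = paste(NOTE_TEXT, collapse = ""),
              .by = c(ID, Name, Age, note_num))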

Result:

  ID  Name Age note_num               NOTE_TEXT
1  1 Alice  25        1  This is the first note
2  2   Bob  30        1 This is the second note
3  2   Bob  30        2  This is the third note
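The key step in this answer is `cumsum(LINE == 1)`: every row where `LINE` resets to 1 adds one to a running count, so all rows of the same note share one number. A minimal base-R illustration, using a hypothetical standalone vector `line` holding Bob's `LINE` values from the example (this is not part of the answer itself):

```r
# Bob's LINE values from the example data: two notes, of 2 and 3 lines
line <- c(1, 2, 1, 2, 3)

# TRUE marks the first line of a new note
starts <- line == 1
print(starts)
# [1]  TRUE FALSE  TRUE FALSE FALSE

# The running count of note starts gives each row its note number
note_num <- cumsum(starts)
print(note_num)
# [1] 1 1 2 2 2
```

In the answer, `.by = c(ID, Name, Age)` restarts this count for each person, which is why Alice's only note and Bob's first note are both numbered 1.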

Comments

0 votes | Loudog3232 | 10/14/2023
Thanks, Jon, this is exactly what I needed. I appreciate the help.
1 vote | gl00ten | 10/13/2023 | #2
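    # Assumes the rows are sorted so that each note's pieces are adjacent
    # and in LINE order
    text <- df$NOTE_TEXT[1]              # note currently being built
    CONCAT_NOTE_TEXT <- character(0)     # finished notes
    
    for (i in seq_len(nrow(df))[-1]) {   # rows 2..n (no iterations if only one row)
      if (df$LINE[i] != 1) {
        # continuation line: append it to the current note
        text <- paste0(text, df$NOTE_TEXT[i])
      } else {
        # LINE reset to 1: store the finished note and start a new one
        CONCAT_NOTE_TEXT <- c(CONCAT_NOTE_TEXT, text)
        text <- df$NOTE_TEXT[i]
      }
    }
    
    # store the last note, which the loop never closes
    CONCAT_NOTE_TEXT <- c(CONCAT_NOTE_TEXT, text)
    
    # one output row per note, taking ID/Name/Age from each note's first line
    result_df <- data.frame(
      ID = df$ID[df$LINE == 1],
      Name = df$Name[df$LINE == 1],
      Age = df$Age[df$LINE == 1],
      CONCAT_NOTE_TEXT = CONCAT_NOTE_TEXT
    )
    result_df
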
  ID  Name Age        CONCAT_NOTE_TEXT
1  1 Alice  25  This is the first note
2  2   Bob  30 This is the second note
3  2   Bob  30  This is the third note
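For comparison, a base-R sketch that avoids the explicit loop. This is not from either answer: it swaps in `ave()` for the per-group note numbering and `aggregate()` for the concatenation, and, like both answers, it assumes each note's rows are already adjacent and in `LINE` order. Unlike the loop above, it also produces the `Note` column the question asks for.

```r
df <- data.frame(ID = c(1, 1, 2, 2, 2, 2, 2),
                 Name = c("Alice", "Alice", "Bob", "Bob", "Bob", "Bob", "Bob"),
                 Age = c(25, 25, 30, 30, 30, 30, 30),
                 LINE = c(1, 2, 1, 2, 1, 2, 3),
                 NOTE_TEXT = c("This is the fir", "st note",
                               "This is the seco", "nd note",
                               "This is ", "the th", "ird note"))

# Number the notes within each ID/Name/Age group: a running count of the
# rows where LINE == 1 (the base-R analogue of cumsum(LINE == 1) with .by)
df$Note <- ave(as.integer(df$LINE == 1), df$ID, df$Name, df$Age,
               FUN = cumsum)

# Paste each note's pieces together; rows keep their original order inside
# each group, so the substrings are joined in the order they appear
result_df <- aggregate(NOTE_TEXT ~ ID + Name + Age + Note, data = df,
                       FUN = paste, collapse = "")
result_df
```

`aggregate()` returns the groups sorted by the grouping columns rather than in order of first appearance; for this example the two orders coincide, giving the same three notes as above with their note numbers 1, 1 and 2.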

Comments

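1 vote | Loudog3232 | 10/14/2023
Thanks, gl00ten, for the reply and for showing me how to do this with a loop in base R. I prefer the dplyr approach because it is more concise and scales to my actual dataset, but I learned a lot from this answer.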