How do I concatenate consecutive rows in R based on a unique set of ID variables and a consecutively increasing and resetting variable?

Asked by: Loudog3232 Asked: 10/13/2023 Last edited by: thelatemailLoudog3232 Updated: 10/13/2023 Views: 47

Q:

Example data frame:

    df <- data.frame(ID = c(1, 1, 2, 2, 2, 2, 2),
                     Name = c("Alice", "Alice", "Bob", "Bob", "Bob", "Bob", "Bob"),
                     Age = c(25, 25, 30, 30, 30, 30, 30),
                     LINE = c(1, 2, 1, 2, 1, 2, 3),
                     NOTE_TEXT = c("This is the fir", "st note",
                                   "This is the seco", "nd note",
                                   "This is ", "the th", "ird note"))

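Essentially, because of a character limit in the pull from my data source, each full "Note" is split into "NOTE_TEXT" substrings that span multiple rows. Substrings belonging to the same "Note" are listed consecutively and numbered by another variable called "LINE", which can take any value from 1 to 65 (the vast majority of notes are 4 lines or fewer). I would like to combine the "NOTE_TEXT" values that belong to the same "Note" into a single row, and create a new variable that identifies each "Note" within the same set of ID, Name, and Age variables.

The resulting data frame would look like this: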

    data.frame(ID = c(1, 2, 2),
               Name = c("Alice", "Bob", "Bob"),
               Age = c(25, 30, 30),
               Note = c(1, 1, 2),
               NOTE_TEXT = c("This is the first note",
                             "This is the second note",
                             "This is the third note"))

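I think I need to use some kind of for loop to loop over "LINE" for each unique set of variables, but I'm not sure where to start. Thanks for your help!

Tags: r dplyr string-concatenation

A:

1 upvote Jon Spring 10/13/2023 #1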

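  library(dplyr)  # .by requires dplyr >= 1.1.0

  df |>
    # LINE == 1 marks the start of a note; cumsum() numbers the notes within each group
    mutate(note_num = cumsum(LINE == 1), .by = c(ID, Name, Age)) |>
    # paste the fragments of each note together, in row order
    summarize(NOTE_TEXT = paste(NOTE_TEXT, collapse = ""),
              .by = c(ID, Name, Age, note_num))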

Result

  ID  Name Age note_num               NOTE_TEXT
1  1 Alice  25        1  This is the first note
2  2   Bob  30        1 This is the second note
3  2   Bob  30        2  This is the third note
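
To see why `cumsum(LINE == 1)` works as a note counter, it can help to look at the intermediate column before summarizing. The sketch below rebuilds the question's example data and stops after the `mutate()` step. The name `steps` is only illustrative, and the sketch assumes dplyr 1.1.0 or later (for `.by`) and rows that are already in their original order.

```r
library(dplyr)

# Example data from the question
df <- data.frame(ID = c(1, 1, 2, 2, 2, 2, 2),
                 Name = c("Alice", "Alice", "Bob", "Bob", "Bob", "Bob", "Bob"),
                 Age = c(25, 25, 30, 30, 30, 30, 30),
                 LINE = c(1, 2, 1, 2, 1, 2, 3),
                 NOTE_TEXT = c("This is the fir", "st note",
                               "This is the seco", "nd note",
                               "This is ", "the th", "ird note"))

# Stop after the mutate() step to inspect the running note counter.
# LINE == 1 is TRUE on the first line of every note, so its cumulative
# sum goes up by one each time a new note starts within a group.
steps <- df |>
  mutate(note_num = cumsum(LINE == 1), .by = c(ID, Name, Age)) |>
  select(ID, LINE, note_num)

print(steps)
# note_num is 1, 1 for ID 1 and 1, 1, 2, 2, 2 for ID 2
```

Because the counter is computed per `ID`, `Name`, `Age` group, it restarts at 1 for each person, which matches the `Note` column the question asks for.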

0 upvotes Loudog3232 10/14/2023

Thanks Jon, this is exactly what I needed. I appreciate your help.

1 upvote gl00ten 10/13/2023 #2

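    # Assumes rows are in their original order and every note starts with LINE == 1
    text <- df$NOTE_TEXT[1]
    CONCAT_NOTE_TEXT <- character(0)

    # seq_len(...)[-1] is empty for a one-row data frame, unlike 2:nrow(df)
    for (i in seq_len(nrow(df))[-1]) {
      if (df$LINE[i] != 1) {
        # continuation line: append to the note being built
        text <- paste0(text, df$NOTE_TEXT[i])
      } else {
        # LINE == 1: store the finished note and start a new one
        CONCAT_NOTE_TEXT <- c(CONCAT_NOTE_TEXT, text)
        text <- df$NOTE_TEXT[i]
      }
    }

    # store the last note
    CONCAT_NOTE_TEXT <- c(CONCAT_NOTE_TEXT, text)

    # one output row per note, taking the ID columns from each note's first line
    result_df <- data.frame(
      ID = df$ID[df$LINE == 1],
      Name = df$Name[df$LINE == 1],
      Age = df$Age[df$LINE == 1],
      CONCAT_NOTE_TEXT = CONCAT_NOTE_TEXT
    )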

  ID  Name Age        CONCAT_NOTE_TEXT
1  1 Alice  25  This is the first note
2  2   Bob  30 This is the second note
3  2   Bob  30  This is the third note
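
For comparison, the same grouping idea can probably be written in base R without an explicit loop. This is only a sketch, not part of either answer: it uses `ave()` to build a per-group note counter and `aggregate()` to collapse the text, and it assumes the rows are already in their original order. The names `Note` and `result` are illustrative.

```r
# Example data from the question
df <- data.frame(ID = c(1, 1, 2, 2, 2, 2, 2),
                 Name = c("Alice", "Alice", "Bob", "Bob", "Bob", "Bob", "Bob"),
                 Age = c(25, 25, 30, 30, 30, 30, 30),
                 LINE = c(1, 2, 1, 2, 1, 2, 3),
                 NOTE_TEXT = c("This is the fir", "st note",
                               "This is the seco", "nd note",
                               "This is ", "the th", "ird note"))

# Number the notes within each ID/Name/Age group:
# a new note starts wherever LINE == 1.
df$Note <- ave(as.integer(df$LINE == 1), df$ID, df$Name, df$Age, FUN = cumsum)

# Collapse the fragments of each note into one string.
# Extra arguments (collapse = "") are passed on to paste().
result <- aggregate(NOTE_TEXT ~ ID + Name + Age + Note, data = df,
                    FUN = paste, collapse = "")

# Order rows by person and note number, then reset the row names.
result <- result[order(result$ID, result$Note), ]
rownames(result) <- NULL

print(result)
```

Unlike the loop, this version also yields the note number per person, at the cost of relying on `aggregate()` keeping the within-group row order.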

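1 upvote Loudog3232 10/14/2023

Thanks gl00ten for replying and showing me how to do this in base R with a loop. I prefer the dplyr approach because it is more concise and scales to my actual dataset, but I learned a lot from this answer.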
