Select rows between known index and fill a new column with the first value in each index

Q:

I have a df that looks like:

structure(list(id = c("2023021112", "2023021112", "2023021112", 
"2023021112", "2023021112", "2023021112", "2023021112", "2023021112", 
"2023021112", "2023021112", "2023021113", "2023021113", "2023021113", 
"2023021113", "2023021113", "2023021113", "2023021113", "2023021113", 
"2023021113", "2023021113"), response = c("1", "Happy", "Sad", 
"Neutral", "Fearful", "2", "Disgusted", "Happy", "Sad", "Surprised", "2", "Sad", "Sad", 
"Neutral", "Fearful", "1", "Disgusted", "Happy", "Sad", "Fearful"
)), row.names = c(1L, 2L, 3L, 4L, 5L, 72L, 73L, 74L, 75L, 76L, 6L, 7L, 8L, 9L, 10L, 77L, 78L, 79L, 80L, 91L), class = "data.frame")

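For the rows running from each number in column 2 up to (but not including) the next number, I want to append -0n to id, where n is that number (i.e. the first set of rows becomes 2023021112-01). Then I want to delete every row whose column 2 holds one of those numeric values, so that I end up with: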

structure(list(id = c("2023021112-01", "2023021112-01", 
"2023021112-01", "2023021112-01", "2023021112-02", "2023021112-02", "2023021112-02", 
"2023021112-02", "2023021113-02", "2023021113-02", "2023021113-02", "2023021113-02", 
"2023021113-01", "2023021113-01", "2023021113-01", 
"2023021113-01"), response = c("Happy", "Sad", 
"Neutral", "Fearful", "Disgusted", "Happy", "Sad", "Surprised", "Sad", "Sad", 
"Neutral", "Fearful", "Disgusted", "Happy", "Sad", "Fearful"
)), row.names = c(2L, 3L, 4L, 5L, 73L, 74L, 75L, 76L, 7L, 8L, 9L, 10L, 78L, 79L, 80L, 91L), class = "data.frame")

In a previous Q&A, @M-- provided the following partial solution:

library(dplyr)

suppressWarnings(
df1 %>% 
  mutate(id = paste(id, 
                    sprintf("%02d", cumsum(!is.na(as.numeric(response)))),
                    sep = "-")) %>% 
  filter(is.na(as.numeric(response)))
)

But because of cumsum, it appends "-01", "-02", "-03", "-04" instead of "-01", "-02", "-02", "-01":

structure(list(id = c("2023021112-01", "2023021112-01", 
"2023021112-01", "2023021112-01", "2023021112-02", "2023021112-02", "2023021112-02", 
"2023021112-02", "2023021113-03", "2023021113-03", "2023021113-03", "2023021113-03", 
"2023021113-04", "2023021113-04", "2023021113-04", "2023021113-04"), response = c("Happy", "Sad", "Neutral", "Fearful", "Disgusted", "Happy", "Sad", "Surprised", "Sad", "Sad", "Neutral", "Fearful", "Disgusted", "Happy", "Sad", "Fearful"
)), row.names = c(2L, 3L, 4L, 5L, 73L, 74L, 75L, 76L, 7L, 8L, 9L, 10L, 78L, 79L, 80L, 91L), class = "data.frame")
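To see why cumsum gives a running count rather than each block's own number, it helps to look at the response column on its own. This base-R snippet is only an illustration: it copies the column by hand, assumes that a response made only of digits (the regex ^[0-9]+$, which is not part of the question) marks the start of a block, and assumes the very first row is such a number.

```r
# The response column from the question, copied as a plain character vector
response <- c("1", "Happy", "Sad", "Neutral", "Fearful",
              "2", "Disgusted", "Happy", "Sad", "Surprised",
              "2", "Sad", "Sad", "Neutral", "Fearful",
              "1", "Disgusted", "Happy", "Sad", "Fearful")

# TRUE on the rows that hold a number, i.e. the first row of each block
is_num <- grepl("^[0-9]+$", response)

# cumsum() only counts how many blocks have started so far: 1, 2, 3, 4
block <- cumsum(is_num)

# The wanted suffix is the number that opened each block; indexing the
# vector of block numbers by `block` carries that number down to every row
block_value <- as.integer(response[is_num])[block]

print(block)
print(block_value)
```

Here block runs 1, 2, 3, 4 (five rows each), which is the -01 to -04 pattern above, while block_value runs 1, 2, 2, 1, which is the pattern the question asks for.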

How can I replace cumsum with something that gives the expected result (as shown in code block 2)?

Tags: r, dataframe, dplyr

           id  response           id2
1  2023021112         1 2023021112-01
2  2023021112     Happy 2023021112-01
3  2023021112       Sad 2023021112-01
4  2023021112   Neutral 2023021112-01
5  2023021112   Fearful 2023021112-01
72 2023021112         2 2023021112-02
73 2023021112 Disgusted 2023021112-02
74 2023021112     Happy 2023021112-02
75 2023021112       Sad 2023021112-02
76 2023021112 Surprised 2023021112-02
6  2023021113         2 2023021113-02
7  2023021113       Sad 2023021113-02
8  2023021113       Sad 2023021113-02
9  2023021113   Neutral 2023021113-02
10 2023021113   Fearful 2023021113-02
77 2023021113         1 2023021113-01
78 2023021113 Disgusted 2023021113-01
79 2023021113     Happy 2023021113-01
80 2023021113       Sad 2023021113-01
91 2023021113   Fearful 2023021113-01

I left id in so you can compare it with id2.

Created on 2023-11-16 with reprex v2.0.2
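The code that produced the id2 table above is not shown on this page. As a hedged sketch only (not that answerer's code), one dplyr way to get a comparable id2 column, in the spirit of the title's "first value in each index", is to group on cumsum() of the numeric-row flag and take first() of each group. It assumes numeric rows are those matching ^[0-9]+$ and rebuilds df from the question's structure() call without its row names.

```r
library(dplyr)

# The question's data, rebuilt without the original row names
df <- data.frame(
  id = rep(c("2023021112", "2023021113"), each = 10),
  response = c("1", "Happy", "Sad", "Neutral", "Fearful",
               "2", "Disgusted", "Happy", "Sad", "Surprised",
               "2", "Sad", "Sad", "Neutral", "Fearful",
               "1", "Disgusted", "Happy", "Sad", "Fearful")
)

out <- df %>%
  # cumsum() of the "is this a number?" flag gives one group per block
  mutate(grp = cumsum(grepl("^[0-9]+$", response))) %>%
  group_by(grp) %>%
  # the first response in each block is its number; zero-pad it and append
  mutate(id2 = paste(id, sprintf("%02d", as.integer(first(response))), sep = "-")) %>%
  ungroup() %>%
  select(id, response, id2)

print(out, n = 20)
```

Dropping the numeric rows afterwards, e.g. with filter(!grepl("^[0-9]+$", response)), would give the question's final layout.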

Answer #3, by M-- on 11/17/2023 (1 upvote)

library(dplyr)
library(tidyr)

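suppressWarnings(  # as.numeric() warns on the non-numeric responses
df %>% 
  # on the number rows, build "<id>-<zero-padded number>"; elsewhere set id to NA
  mutate(id = if_else(is.na(as.numeric(response)), NA_character_, 
                      paste(id, sprintf("%02d", as.numeric(response)), 
                            sep = "-"))) %>% 
  # carry each new id down to the rows that follow it
  fill(id, .direction = "down") %>% 
  # drop the number rows themselves
  filter(is.na(as.numeric(response))) 
)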
#>               id  response
#> 1  2023021112-01     Happy
#> 2  2023021112-01       Sad
#> 3  2023021112-01   Neutral
#> 4  2023021112-01   Fearful
#> 5  2023021112-02 Disgusted
#> 6  2023021112-02     Happy
#> 7  2023021112-02       Sad
#> 8  2023021112-02 Surprised
#> 9  2023021113-02       Sad
#> 10 2023021113-02       Sad
#> 11 2023021113-02   Neutral
#> 12 2023021113-02   Fearful
#> 13 2023021113-01 Disgusted
#> 14 2023021113-01     Happy
#> 15 2023021113-01       Sad
#> 16 2023021113-01   Fearful

Created on 2023-11-16 with reprex v2.0.2
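If the suppressWarnings() wrapper is a concern, a variant of the same fill() approach can flag the numeric rows with a regular expression instead of as.numeric(), so no coercion warnings are raised. This is a sketch, not M--'s code; the ^[0-9]+$ pattern and the helper column is_num are assumptions, and df is rebuilt from the question's structure() call without its row names.

```r
library(dplyr)
library(tidyr)

# The question's data, rebuilt without the original row names
df <- data.frame(
  id = rep(c("2023021112", "2023021113"), each = 10),
  response = c("1", "Happy", "Sad", "Neutral", "Fearful",
               "2", "Disgusted", "Happy", "Sad", "Surprised",
               "2", "Sad", "Sad", "Neutral", "Fearful",
               "1", "Disgusted", "Happy", "Sad", "Fearful")
)

result <- df %>%
  # flag the rows whose response is a whole number (the block headers)
  mutate(is_num = grepl("^[0-9]+$", response),
         # non-header responses are set to NA before as.integer(), so it never
         # sees text like "Happy" and raises no warning; header rows get
         # "<id>-<zero-padded number>", all other rows get NA
         id = if_else(is_num,
                      paste(id, sprintf("%02d", as.integer(replace(response, !is_num, NA))),
                            sep = "-"),
                      NA_character_)) %>%
  # copy each header's new id down to the rows below it
  fill(id, .direction = "down") %>%
  # drop the header rows and the helper column
  filter(!is_num) %>%
  select(-is_num)

print(result)
```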