Select rows between known index and fill a new column with the first value in each index

Asked by: grace.cutler Asked: 11/17/2023 Last edited by: M--grace.cutler Updated: 11/17/2023 Views: 62

Q:

I have a df that looks like:

structure(list(id = c("2023021112", "2023021112", "2023021112", 
"2023021112", "2023021112", "2023021112", "2023021112", "2023021112", 
"2023021112", "2023021112", "2023021113", "2023021113", "2023021113", 
"2023021113", "2023021113", "2023021113", "2023021113", "2023021113", 
"2023021113", "2023021113"), response = c("1", "Happy", "Sad", 
"Neutral", "Fearful", "2", "Disgusted", "Happy", "Sad", "Surprised", "2", "Sad", "Sad", 
"Neutral", "Fearful", "1", "Disgusted", "Happy", "Sad", "Fearful"
)), row.names = c(1L, 2L, 3L, 4L, 5L, 72L, 73L, 74L, 75L, 76L, 6L, 7L, 8L, 9L, 10L, 77L, 78L, 79L, 80L, 91L), class = "data.frame")

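For the rows from the first number that appears in column 2 up to the row before the next number, I want to append -0n to id (i.e. the first set of rows becomes 2023021112-01). Then I want to remove all rows whose column 2 contains these numeric values, so that: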

structure(list(id = c("2023021112-01", "2023021112-01", 
"2023021112-01", "2023021112-01", "2023021112-02", "2023021112-02", "2023021112-02", 
"2023021112-02", "2023021113-02", "2023021113-02", "2023021113-02", "2023021113-02", 
"2023021113-01", "2023021113-01", "2023021113-01", 
"2023021113-01"), response = c("Happy", "Sad", 
"Neutral", "Fearful", "Disgusted", "Happy", "Sad", "Surprised", "Sad", "Sad", 
"Neutral", "Fearful", "Disgusted", "Happy", "Sad", "Fearful"
)), row.names = c(2L, 3L, 4L, 5L, 73L, 74L, 75L, 76L, 7L, 8L, 9L, 10L, 78L, 79L, 80L, 91L), class = "data.frame")

In a previous Q&A, @M-- provided the following partial solution:

suppressWarnings(
df1 %>% 
  mutate(id = paste(id, 
                    sprintf("%02d", cumsum(!is.na(as.numeric(response)))),
                    sep = "-")) %>% 
  filter(is.na(as.numeric(response)))
)

But because of cumsum, it appends "-01", "-02", "-03", "-04" instead of "-01", "-02", "-02", "-01":

structure(list(id = c("2023021112-01", "2023021112-01", 
"2023021112-01", "2023021112-01", "2023021112-02", "2023021112-02", "2023021112-02", 
"2023021112-02", "2023021113-03", "2023021113-03", "2023021113-03", "2023021113-03", 
"2023021113-04", "2023021113-04", "2023021113-04", "2023021113-04"), response = c("Happy", "Sad", 
"Neutral", "Fearful", "Disgusted", "Happy", "Sad", "Surprised", "Sad", "Sad", 
"Neutral", "Fearful", "Disgusted", "Happy", "Sad", "Fearful"
)), row.names = c(2L, 3L, 4L, 5L, 73L, 74L, 75L, 76L, 7L, 8L, 9L, 10L, 78L, 79L, 80L, 91L), class = "data.frame")
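
To make the difference concrete, here is a minimal base R sketch on a shortened, made-up response vector (not the full data above). It assumes marker rows consist only of digits and that the column starts with a marker. It compares the running count that cumsum produces with the marker's own value, which is what the expected output needs.

```r
# Toy version of the response column: digit-only rows mark the start of a block.
response <- c("1", "Happy", "Sad", "2", "Disgusted", "2", "Sad", "1", "Happy")

# TRUE on marker rows (values made up only of digits).
is_marker <- grepl("^[0-9]+$", response)

# cumsum() counts the markers seen so far, so the suffix keeps increasing.
running_count <- cumsum(is_marker)

# Using that counter as an index into the marker values recovers
# the value of the marker that opened each block.
marker_value <- as.integer(response[is_marker])[running_count]

print(running_count)  # 1 1 1 2 2 3 3 4 4
print(marker_value)   # 1 1 1 2 2 2 2 1 1
```

If the column did not begin with a marker, running_count would be 0 for the leading rows and the indexing step would silently drop them, so this sketch only applies when the first row is a marker.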

How can I replace cumsum with something that gives the expected result (as shown in code block 2)?

r dataframe dplyr

Comments

0 upvotes grace.cutler 11/29/2023
Oh, okay, I thought this was a way of telling the OP to delete the question

Answers:

1 upvote akash87 11/17/2023 #1

You will need two libraries for this

library(tidyverse)
library(zoo)



data %>% 
mutate(id2 = na.locf(if_else(grepl("([0-9]+).*$", response), paste0(id, '-0', gsub("([0-9]+).*$", "\\1", response)), NA_character_))) %>%
filter(!grepl("([0-9]+).*$", response))

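grepl tests whether the value in response contains any digit; if it does, this code uses paste0 to join id and response with "-0" in between. The zoo package has a function, na.locf, that takes the previous non-NA value and carries it forward until the next non-NA value.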

           id  response           id2
1  2023021112         1 2023021112-01
2  2023021112     Happy 2023021112-01
3  2023021112       Sad 2023021112-01
4  2023021112   Neutral 2023021112-01
5  2023021112   Fearful 2023021112-01
72 2023021112         2 2023021112-02
73 2023021112 Disgusted 2023021112-02
74 2023021112     Happy 2023021112-02
75 2023021112       Sad 2023021112-02
76 2023021112 Surprised 2023021112-02
6  2023021113         2 2023021113-02
7  2023021113       Sad 2023021113-02
8  2023021113       Sad 2023021113-02
9  2023021113   Neutral 2023021113-02
10 2023021113   Fearful 2023021113-02
77 2023021113         1 2023021113-01
78 2023021113 Disgusted 2023021113-01
79 2023021113     Happy 2023021113-01
80 2023021113       Sad 2023021113-01
91 2023021113   Fearful 2023021113-01

I left id in place so you can compare it with id2.
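
As a small illustration of the carry-forward step on its own, the sketch below applies na.locf to a made-up character vector (the values are illustrative, not taken from the full data frame). Labels exist only on marker rows and every other row is NA.

```r
library(zoo)

# Labels are present only where a marker row was found; the rest are NA.
labels <- c("2023021112-01", NA, NA, "2023021112-02", NA)

# na.locf() ("last observation carried forward") replaces each NA
# with the most recent non-NA value before it.
filled <- na.locf(labels)

print(filled)
```

By default na.locf uses na.rm = TRUE, which drops any leading NAs and therefore shortens the vector. If the column might not start with a marker, passing na.rm = FALSE keeps the length unchanged, which mutate needs.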

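Comments

0 upvotes grace.cutler 11/17/2023
This works well, but it leaves the numbers in the response column, so I checked another answer below
0 upvotes akash87 11/17/2023
I updated the code.

2 upvotes jkatam 11/17/2023 #2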

Please try the code below; it only needs library(tidyverse)

df %>% mutate(id2 = ifelse(str_detect(response, '\\d'), 
                           paste0(id,'-0',response), NA)) %>% 
       fill(id2)
           id  response           id2
1  2023021112         1 2023021112-01
2  2023021112     Happy 2023021112-01
3  2023021112       Sad 2023021112-01
4  2023021112   Neutral 2023021112-01
5  2023021112   Fearful 2023021112-01
72 2023021112         2 2023021112-02
73 2023021112 Disgusted 2023021112-02
74 2023021112     Happy 2023021112-02
75 2023021112       Sad 2023021112-02
76 2023021112 Surprised 2023021112-02
6  2023021113         2 2023021113-02
7  2023021113       Sad 2023021113-02
8  2023021113       Sad 2023021113-02
9  2023021113   Neutral 2023021113-02
10 2023021113   Fearful 2023021113-02
77 2023021113         1 2023021113-01
78 2023021113 Disgusted 2023021113-01
79 2023021113     Happy 2023021113-01
80 2023021113       Sad 2023021113-01
91 2023021113   Fearful 2023021113-01

Created on 2023-11-16 with reprex v2.0.2
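
The output above still contains the marker rows ("1", "2") in response. One possible way to drop them, sketched here on a shortened stand-in data frame (df_small is hypothetical), is to add a filter step after fill. This assumes marker rows consist only of digits, which is why the pattern is anchored as "^\\d+$".

```r
library(dplyr)
library(tidyr)
library(stringr)

# Shortened, made-up stand-in for the question's data.
df_small <- data.frame(
  id = c(rep("2023021112", 3), rep("2023021113", 3)),
  response = c("1", "Happy", "Sad", "2", "Sad", "Neutral")
)

result <- df_small %>%
  # Build the label on marker rows only; all other rows get NA.
  mutate(id2 = ifelse(str_detect(response, "^\\d+$"),
                      paste0(id, "-0", response), NA)) %>%
  # Carry each label down to the rows that follow it.
  fill(id2) %>%
  # Drop the marker rows themselves.
  filter(!str_detect(response, "^\\d+$"))

print(result)
```

Note that the hard-coded "-0" only gives two-digit suffixes for markers 1 to 9; a marker of 10 would become "-010". Formatting with sprintf("%02d", as.numeric(response)) avoids that.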

1 upvote M-- 11/17/2023 #3
library(dplyr)
library(tidyr)

suppressWarnings(
df %>% 
  mutate(id = if_else(is.na(as.numeric(response)), NA, 
                      paste(id, sprintf("%02d", as.numeric(response)), 
                            sep = "-"))) %>% 
  fill(id, .direction ="down") %>% 
  filter(is.na(as.numeric(response))) 
)
#>               id  response
#> 1  2023021112-01     Happy
#> 2  2023021112-01       Sad
#> 3  2023021112-01   Neutral
#> 4  2023021112-01   Fearful
#> 5  2023021112-02 Disgusted
#> 6  2023021112-02     Happy
#> 7  2023021112-02       Sad
#> 8  2023021112-02 Surprised
#> 9  2023021113-02       Sad
#> 10 2023021113-02       Sad
#> 11 2023021113-02   Neutral
#> 12 2023021113-02   Fearful
#> 13 2023021113-01 Disgusted
#> 14 2023021113-01     Happy
#> 15 2023021113-01       Sad
#> 16 2023021113-01   Fearful

Created on 2023-11-16 with reprex v2.0.2