Asked by: grace.cutler Asked: 11/17/2023 Last edited by: M--grace.cutler Updated: 11/17/2023 Views: 62
Select rows between known index and fill a new column with the first value in each index
Q:
I have a df that looks like:
structure(list(id = c("2023021112", "2023021112", "2023021112",
"2023021112", "2023021112", "2023021112", "2023021112", "2023021112",
"2023021112", "2023021112", "2023021113", "2023021113", "2023021113",
"2023021113", "2023021113", "2023021113", "2023021113", "2023021113",
"2023021113", "2023021113"), response = c("1", "Happy", "Sad",
"Neutral", "Fearful", "2", "Disgusted", "Happy", "Sad", "Surprised", "2", "Sad", "Sad",
"Neutral", "Fearful", "1", "Disgusted", "Happy", "Sad", "Fearful"
)), row.names = c(1L, 2L, 3L, 4L, 5L, 72L, 73L, 74L, 75L, 76L, 6L, 7L, 8L, 9L, 10L, 77L, 78L, 79L, 80L, 91L), class = "data.frame")
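For the rows from the first number that appears in column 2 up to the row before the next number, I want to append -0n to id, where n is that number (i.e. the first set of rows becomes 2023021112-01). Then I want to remove every row where column 2 contains one of these numeric values, so that: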
structure(list(id = c("2023021112-01", "2023021112-01",
"2023021112-01", "2023021112-01", "2023021112-02", "2023021112-02", "2023021112-02",
"2023021112-02", "2023021113-02", "2023021113-02", "2023021113-02", "2023021113-02",
"2023021113-01", "2023021113-01", "2023021113-01",
"2023021113-01"), response = c("Happy", "Sad",
"Neutral", "Fearful", "Disgusted", "Happy", "Sad", "Surprised", "Sad", "Sad",
"Neutral", "Fearful", "Disgusted", "Happy", "Sad", "Fearful"
)), row.names = c(2L, 3L, 4L, 5L, 73L, 74L, 75L, 76L, 7L, 8L, 9L, 10L, 78L, 79L, 80L, 91L), class = "data.frame")
In a previous Q&A, @M-- provided the following partial solution:
suppressWarnings(
  df1 %>%
    mutate(id = paste(id,
                      sprintf("%02d", cumsum(!is.na(as.numeric(response)))),
                      sep = "-")) %>%
    filter(is.na(as.numeric(response)))
)
But because of cumsum, it appends "-01", "-02", "-03", "-04" instead of "-01", "-02", "-02", "-01":
structure(list(id = c("2023021112-01", "2023021112-01",
"2023021112-01", "2023021112-01", "2023021112-02", "2023021112-02", "2023021112-02",
"2023021112-02", "2023021113-03", "2023021113-03", "2023021113-03", "2023021113-03",
"2023021113-04", "2023021113-04", "2023021113-04", "2023021113-04"),
response = c("Happy", "Sad", "Neutral", "Fearful", "Disgusted", "Happy", "Sad", "Surprised",
"Sad", "Sad", "Neutral", "Fearful", "Disgusted", "Happy", "Sad", "Fearful"
)), row.names = c(2L, 3L, 4L, 5L, 73L, 74L, 75L, 76L, 7L, 8L, 9L, 10L, 78L, 79L, 80L, 91L), class = "data.frame")
How can I replace cumsum with something that gives the expected result (as shown in code block 2)?
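To see why cumsum numbers the blocks 1 through 4, it may help to look at the running count on its own. This is only a small sketch on a shortened, hypothetical version of the response column (the vector below is not the full data from the question):

```r
# Shortened, hypothetical response column: numeric strings mark block starts.
response <- c("1", "Happy", "Sad", "2", "Disgusted",   # rows for id 2023021112
              "2", "Sad", "1", "Happy")                # rows for id 2023021113

# TRUE on rows whose response converts to a number (the marker rows).
is_marker <- !is.na(suppressWarnings(as.numeric(response)))

# cumsum() counts how many markers have been seen so far, regardless of
# what number each marker actually holds.
running_count <- cumsum(is_marker)
print(running_count)        # 1 1 1 2 2 3 3 4 4

# The values the suffix should come from are the marker values themselves.
print(response[is_marker])  # "1" "2" "2" "1"
```

The running count keeps increasing across ids, whereas the wanted suffix is the value stored in the most recent marker row.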
A:
1 upvote
akash87
11/17/2023
#1
You will need two libraries for this:
library(tidyverse)
library(zoo)
data %>%
  mutate(id2 = na.locf(if_else(grepl("([0-9]+).*$", response),
                               paste0(id, '-0', gsub("([0-9]+).*$", "\\1", response)),
                               NA_character_))) %>%
  filter(!grepl("([0-9]+).*$", response))
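grepl tests whether the value in response contains any digit; if it does, this code uses paste0 to join id and response with "-0" in between. The zoo package has a function, na.locf, that takes the previous non-NA value and rolls it forward until the next non-NA value.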
           id  response           id2
1  2023021112         1 2023021112-01
2  2023021112     Happy 2023021112-01
3  2023021112       Sad 2023021112-01
4  2023021112   Neutral 2023021112-01
5  2023021112   Fearful 2023021112-01
72 2023021112         2 2023021112-02
73 2023021112 Disgusted 2023021112-02
74 2023021112     Happy 2023021112-02
75 2023021112       Sad 2023021112-02
76 2023021112 Surprised 2023021112-02
6  2023021113         2 2023021113-02
7  2023021113       Sad 2023021113-02
8  2023021113       Sad 2023021113-02
9  2023021113   Neutral 2023021113-02
10 2023021113   Fearful 2023021113-02
77 2023021113         1 2023021113-01
78 2023021113 Disgusted 2023021113-01
79 2023021113     Happy 2023021113-01
80 2023021113       Sad 2023021113-01
91 2023021113   Fearful 2023021113-01
I left id in place so you can compare it with id2.
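As a minimal sketch of the roll-forward step on its own, here is na.locf applied to a short made-up vector (the labels "A-01", "A-02" and "B-02" are placeholders, not values from the question):

```r
library(zoo)

# Hypothetical vector: a label on each marker row, NA on the rows after it.
x <- c("A-01", NA, NA, "A-02", NA, "B-02", NA)

# na.locf() ("last observation carried forward") replaces each NA with the
# most recent non-NA value that precedes it.
filled <- na.locf(x)
print(filled)
```

One thing to watch: by default na.locf drops leading NAs (na.rm = TRUE), which would shorten the vector if the first row were not a marker. Passing na.rm = FALSE keeps the length unchanged in that case.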
Comments
0 upvotes
grace.cutler
11/17/2023
This works well, but it leaves the numbers in the response column, so I checked another answer below.
0 upvotes
akash87
11/17/2023
I updated the code.
2 upvotes
jkatam
11/17/2023
#2
Please try the code below; it only needs the tidyverse:
library(tidyverse)
df %>% mutate(id2 = ifelse(str_detect(response, '\\d'),
paste0(id,'-0',response), NA)) %>%
fill(id2)
           id  response           id2
1  2023021112         1 2023021112-01
2  2023021112     Happy 2023021112-01
3  2023021112       Sad 2023021112-01
4  2023021112   Neutral 2023021112-01
5  2023021112   Fearful 2023021112-01
72 2023021112         2 2023021112-02
73 2023021112 Disgusted 2023021112-02
74 2023021112     Happy 2023021112-02
75 2023021112       Sad 2023021112-02
76 2023021112 Surprised 2023021112-02
6  2023021113         2 2023021113-02
7  2023021113       Sad 2023021113-02
8  2023021113       Sad 2023021113-02
9  2023021113   Neutral 2023021113-02
10 2023021113   Fearful 2023021113-02
77 2023021113         1 2023021113-01
78 2023021113 Disgusted 2023021113-01
79 2023021113     Happy 2023021113-01
80 2023021113       Sad 2023021113-01
91 2023021113   Fearful 2023021113-01
Created on 2023-11-16 with reprex v2.0.2
1 upvote
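The output above still contains the numeric marker rows and keeps the new value in a separate id2 column. If the goal is the exact shape asked for in the question, one possible extension is sketched below. It is not part of the original answer: the is_marker helper column, the use of str_pad for zero padding, and the shortened sample data are all assumptions made for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Small hypothetical sample in the same shape as the question's data.
df <- data.frame(
  id = c(rep("2023021112", 5), rep("2023021113", 4)),
  response = c("1", "Happy", "Sad", "2", "Disgusted", "2", "Sad", "1", "Happy")
)

result <- df %>%
  # Flag rows whose response consists only of digits (hypothetical helper).
  mutate(is_marker = str_detect(response, "^\\d+$"),
         # Build the suffixed id on marker rows, NA elsewhere.
         id = if_else(is_marker,
                      paste0(id, "-", str_pad(response, 2, pad = "0")),
                      NA_character_)) %>%
  # Carry each suffixed id down to the rows that follow it.
  fill(id) %>%
  # Drop the marker rows and the helper column.
  filter(!is_marker) %>%
  select(-is_marker)

print(result)
```

Padding with str_pad rather than pasting a literal "-0" would also keep the suffix two characters wide if a marker ever reached 10 or more, although the question's data only uses 1 and 2.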
M--
11/17/2023
#3
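library(dplyr)
library(tidyr)
suppressWarnings(
  df %>%
    # Build the suffixed id only on rows whose response is numeric
    mutate(id = if_else(is.na(as.numeric(response)), NA_character_,
                        paste(id, sprintf("%02d", as.numeric(response)),
                              sep = "-"))) %>%
    # Carry each suffixed id down to the rows below it
    fill(id, .direction = "down") %>%
    # Drop the numeric marker rows
    filter(is.na(as.numeric(response)))
)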
#>               id  response
#> 1  2023021112-01     Happy
#> 2  2023021112-01       Sad
#> 3  2023021112-01   Neutral
#> 4  2023021112-01   Fearful
#> 5  2023021112-02 Disgusted
#> 6  2023021112-02     Happy
#> 7  2023021112-02       Sad
#> 8  2023021112-02 Surprised
#> 9  2023021113-02       Sad
#> 10 2023021113-02       Sad
#> 11 2023021113-02   Neutral
#> 12 2023021113-02   Fearful
#> 13 2023021113-01 Disgusted
#> 14 2023021113-01     Happy
#> 15 2023021113-01       Sad
#> 16 2023021113-01   Fearful
Created on 2023-11-16 with reprex v2.0.2
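For comparison, the same idea can be sketched in base R without any packages. This is an illustrative alternative rather than anything posted in the thread; it assumes the first row of the data is always a marker row, and the sample data is a shortened, hypothetical version of the question's.

```r
# Small hypothetical sample in the same shape as the question's data.
df <- data.frame(
  id = c(rep("2023021112", 5), rep("2023021113", 4)),
  response = c("1", "Happy", "Sad", "2", "Disgusted", "2", "Sad", "1", "Happy"),
  stringsAsFactors = FALSE
)

# TRUE on rows whose response consists only of digits.
is_marker <- grepl("^[0-9]+$", df$response)

# cumsum() is used only as a block index: 1 for rows in the first block,
# 2 for rows in the second block, and so on.
block <- cumsum(is_marker)

# Look up each row's suffix from the marker value that opened its block.
# Assumes block never contains 0, i.e. the first row is a marker.
suffix <- as.integer(df$response[is_marker])[block]

# Append the zero-padded suffix, then drop the marker rows.
df$id <- sprintf("%s-%02d", df$id, suffix)
result <- df[!is_marker, ]
print(result)
```

Here cumsum stays in the pipeline, but only to say which block a row belongs to; the suffix itself comes from the marker value, which is why repeated or out-of-order numbers are preserved.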