Asked by: Daniel AG Asked: 4/8/2023 Updated: 4/8/2023 Views: 63
Identify the first row when a new sequence begins in a dataframe
Q:
I have a larger dataset that contains many sequences. Example:
Number <- c(1, 1, 1, 1, 2, 2, 2, 2)
Day <- c(1, 2, 3, 4, 1, 2, 3, 4)
Letter <- c("a", "a", "a", "a", "b", "b", "b", "b")
df <- data.frame(Number, Day, Letter)
df
#>   Number Day Letter
#> 1      1   1      a
#> 2      1   2      a
#> 3      1   3      a
#> 4      1   4      a
#> 5      2   1      b
#> 6      2   2      b
#> 7      2   3      b
#> 8      2   4      b
Created on 2023-04-08 with reprex v2.0.2
I want to create a new column that tells me when a new sequence begins. Example:
df_des
#>   Number Day Letter first
#> 1      1   1      a   yes
#> 2      1   2      a    no
#> 3      1   3      a    no
#> 4      1   4      a    no
#> 5      2   1      b   yes
#> 6      2   2      b    no
#> 7      2   3      b    no
#> 8      2   4      b    no
Created on 2023-04-08 with reprex v2.0.2
A:
2 upvotes
Ronak Shah
4/8/2023
#1
Here is a base R approach:
# Columns to consider for sequence change
cols <- c('Number', 'Letter')
# Create a new column with everything as "No"
df$first <- 'No'
# Replace the first value of each sequence with "Yes"
df$first[!duplicated(df[cols])] <- 'Yes'
df
#  Number Day Letter first
#1      1   1      a   Yes
#2      1   2      a    No
#3      1   3      a    No
#4      1   4      a    No
#5      2   1      b   Yes
#6      2   2      b    No
#7      2   3      b    No
#8      2   4      b    No
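To see why the assignment above works, it can help to look at the logical vector that !duplicated() returns on its own. The following is a minimal sketch on a smaller, made-up data frame; small_df and is_first are illustrative names and are not part of the answer.

```r
# Hypothetical four-row data frame, used only for illustration
small_df <- data.frame(
  Number = c(1, 1, 2, 2),
  Letter = c("a", "a", "b", "b")
)

# duplicated() on a data frame compares whole rows, so a row is TRUE
# when the same Number/Letter combination has already appeared above it
is_first <- !duplicated(small_df[c("Number", "Letter")])
is_first
#> [1]  TRUE FALSE  TRUE FALSE

# Logical indexing then overwrites only the first row of each combination
small_df$first <- "No"
small_df$first[is_first] <- "Yes"
small_df$first
```

Because the comparison is on whole rows, adding or removing names in the column vector changes what counts as a new sequence.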
2 upvotes
TarJae
4/8/2023
#2
Here is a dplyr approach using row_number in an ifelse statement:
library(dplyr) # the .by argument needs dplyr >= 1.1.0
df %>%
  mutate(first = ifelse(row_number() == 1, "yes", "no"), .by = Number)
  Number Day Letter first
1      1   1      a   yes
2      1   2      a    no
3      1   3      a    no
4      1   4      a    no
5      2   1      b   yes
6      2   2      b    no
7      2   3      b    no
8      2   4      b    no
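The .by argument used above is only available from dplyr 1.1.0 onward. On an older dplyr, the same per-group logic can be sketched with group_by() and ungroup(). This variant is not part of the original answer; it assumes the df from the question, rebuilt here so the block runs on its own, and res is an illustrative name.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  Number = c(1, 1, 1, 1, 2, 2, 2, 2),
  Day    = c(1, 2, 3, 4, 1, 2, 3, 4),
  Letter = c("a", "a", "a", "a", "b", "b", "b", "b")
)

res <- df %>%
  group_by(Number) %>%                                       # one group per sequence
  mutate(first = ifelse(row_number() == 1, "yes", "no")) %>% # row 1 within each group
  ungroup()                                                  # drop grouping afterwards

res$first
```

The explicit ungroup() keeps later operations on res from being applied per group by accident, which is the main practical difference from the .by form.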
1 upvote
akrun
4/8/2023
#3
An option in base R:
transform(df, first = c("no", "yes")[1 + !duplicated(Number)])
Output:
  Number Day Letter first
1      1   1      a   yes
2      1   2      a    no
3      1   3      a    no
4      1   4      a    no
5      2   1      b   yes
6      2   2      b    no
7      2   3      b    no
8      2   4      b    no
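The one-liner above relies on R coercing a logical vector to 0/1 when it is added to a number, and then using the result as a position into c("no", "yes"). A step-by-step sketch of that idea follows; is_first, idx and flag are illustrative names, not from the answer.

```r
Number <- c(1, 1, 1, 1, 2, 2, 2, 2)

# TRUE where a value of Number appears for the first time
is_first <- !duplicated(Number)

# Adding 1 coerces TRUE to 1 and FALSE to 0, giving positions 2 and 1
idx <- 1 + is_first
idx
#> [1] 2 1 1 1 2 1 1 1

# Position 1 picks "no", position 2 picks "yes"
flag <- c("no", "yes")[idx]
flag
```

This avoids ifelse() entirely, at the cost of being a little less obvious to read.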
Comments
Functions mentioned: dplyr::group_by, dplyr::consecutive_id, data.table::rleid (with reference to the columns Number and Letter and the values 1 and "a").
group_by(df, Number, Letter) %>% mutate(first = c("yes", rep("no", n()-1)))
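dplyr::consecutive_id and data.table::rleid both number consecutive runs of values. That distinction matters if the same Number can reappear later in the data, because duplicated() flags only the very first occurrence of a value. The following base R sketch shows the run-based idea, assuming the rows are already in order and contain no NA values; the vector x and the result names are made up for illustration.

```r
# Hypothetical input where the value 1 starts a second run later on
x <- c(1, 1, 2, 2, 1, 1)

# A run starts at row 1 and wherever the value differs from the row before
run_start <- c(TRUE, x[-1] != x[-length(x)])
first_run <- ifelse(run_start, "yes", "no")
first_run

# For comparison: duplicated() does not flag the second run of 1
first_dup <- ifelse(!duplicated(x), "yes", "no")
first_dup
```

Here first_run marks rows 1, 3 and 5, while first_dup marks only rows 1 and 3. Which behavior is wanted depends on whether a repeated value should count as a new sequence.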