Asked by: littleworth Asked: 2/14/2020 Last edited by: ThomasIsCoding littleworth Updated: 8/1/2021 Views: 723
How to replace NA with a set of values
Q:
I have the following data frame:
library(dplyr)
library(tibble)
df <- tibble(
source = c("a", "b", "c", "d", "e"),
score = c(10, 5, NA, 3, NA ) )
df
It looks like this:
# A tibble: 5 x 2
source score
<chr> <dbl>
1 a 10 # current max value
2 b 5
3 c NA
4 d 3
5 e NA
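What I want to do is replace each NA in the score column of df with a value that continues on from the existing maximum, i.e. max + n, where n starts at 1 and increases by one for each successive NA.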
Resulting in this (hand-coded):
source score
a 10
b 5
c 11 # obtained from 10 + 1
d 3
e 12 # obtained from 10 + 2
How can I achieve this?
A:
6 upvotes
ThomasIsCoding
2/14/2020
#1
A base R solution
df$score[is.na(df$score)] <- seq(which(is.na(df$score))) + max(df$score,na.rm = TRUE)
such that
> df
# A tibble: 5 x 2
source score
<chr> <dbl>
1 a 10
2 b 5
3 c 11
4 d 3
5 e 12
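One edge case worth noting: when exactly one value is missing, which() returns a single position k, and seq(k) then yields 1:k rather than a sequence of length one. Below is a minimal sketch of a variant that counts the NAs with seq_len(sum(...)) instead. The helper name fill_na_after_max is hypothetical (it is not part of the answer above), and the sketch assumes the vector contains at least one non-missing value.

```r
# Hypothetical helper: replace each NA with max + 1, max + 2, ... in order.
# Assumes x has at least one non-missing value.
fill_na_after_max <- function(x) {
  na_pos <- is.na(x)
  # seq_len(k) is always 1..k (and empty when k is 0),
  # unlike seq(which(na_pos)) when there is only a single NA.
  x[na_pos] <- max(x, na.rm = TRUE) + seq_len(sum(na_pos))
  x
}

fill_na_after_max(c(10, 5, NA, 3, NA))  # 10 5 11 3 12
fill_na_after_max(c(10, 5, NA, 3))      # 10 5 11 3
```

On the data frame from the question this would be used as df$score <- fill_na_after_max(df$score).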
Comments
0 upvotes
s_baldur
2/14/2020
It is already about as concise as it gets, but seq(which(is.na(df$score))) could be shortened to 1:sum(is.na(df$score))
0 upvotes
ThomasIsCoding
2/14/2020
@sindri_baldur Thanks. That one is already given in stackoverflow.com/a/60222864/12158757
3 upvotes
Rui Barradas
2/14/2020
#2
A dplyr solution.
df %>%
mutate(na_count = cumsum(is.na(score)),
score = ifelse(is.na(score), max(score, na.rm = TRUE) + na_count, score)) %>%
select(-na_count)
## A tibble: 5 x 2
# source score
# <chr> <dbl>
#1 a 10
#2 b 5
#3 c 11
#4 d 3
#5 e 12
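To see why the cumsum(is.na(score)) idea works, it may help to look at the intermediate vectors in plain base R. This is only an illustrative sketch on a bare vector; the names candidate and filled are introduced here for explanation and do not appear in the answer.

```r
score <- c(10, 5, NA, 3, NA)

# Running count of missing values seen so far: 0 0 1 1 2
na_count <- cumsum(is.na(score))

# Value each position would get if it were missing: 10 10 11 11 12
candidate <- max(score, na.rm = TRUE) + na_count

# Keep the original value where present, otherwise take the candidate.
filled <- ifelse(is.na(score), candidate, score)
filled  # 10  5 11  3 12
```

The counter only increases at NA positions, so the k-th missing value always receives max + k regardless of where it sits in the column.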
6 upvotes
Sotos
2/14/2020
#3
Here is one way with dplyr,
df %>%
mutate(score = replace(score,
is.na(score),
(max(score, na.rm = TRUE) + (cumsum(is.na(score))))[is.na(score)])
)
which gives,
# A tibble: 5 x 2
source score
<chr> <dbl>
1 a 10
2 b 5
3 c 11
4 d 3
5 e 12
4 upvotes
Aron Strandberg
2/14/2020
#4
With dplyr:
library(dplyr)
df %>%
mutate_at("score", ~ ifelse(is.na(.), max(., na.rm = TRUE) + cumsum(is.na(.)), .))
Result:
# A tibble: 5 x 2
source score
<chr> <dbl>
1 a 10
2 b 5
3 c 11
4 d 3
5 e 12
10 upvotes
Ronak Shah
2/14/2020
#5
Another option:
transform(df, score = pmin(max(score, na.rm = TRUE) +
cumsum(is.na(score)), score, na.rm = TRUE))
# source score
#1 a 10
#2 b 5
#3 c 11
#4 d 3
#5 e 12
If you want it in dplyr:
library(dplyr)
df %>% mutate(score = pmin(max(score, na.rm = TRUE) +
cumsum(is.na(score)), score, na.rm = TRUE))
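The pmin() trick can look opaque at first, so here is a small base R sketch of what happens element by element. The variable names candidate and filled_pmin are illustrative only and are not part of the answer.

```r
score <- c(10, 5, NA, 3, NA)

# max + running NA count: 10 10 11 11 12
candidate <- max(score, na.rm = TRUE) + cumsum(is.na(score))

# Every candidate is >= max(score), so pmin() returns the original score
# wherever one exists; with na.rm = TRUE it ignores the NA and returns
# the candidate in the missing positions.
filled_pmin <- pmin(candidate, score, na.rm = TRUE)
filled_pmin  # 10  5 11  3 12
```

This relies on the candidate never being smaller than any existing score, which holds because it is built from the column maximum plus a non-negative count.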
2 upvotes
Łukasz Deryło
2/14/2020
#6
Another one, quite similar to ThomasIsCoding's solution:
> df$score[is.na(df$score)]<-max(df$score, na.rm=T)+(1:sum(is.na(df$score)))
> df
# A tibble: 5 x 2
source score
<chr> <dbl>
1 a 10
2 b 5
3 c 11
4 d 3
5 e 12
2 upvotes
Serhii
2/14/2020
#7
Not very elegant compared to the base R solutions, but still possible with data.table:
library(data.table)
setDT(df)
max.score = df[, max(score, na.rm = TRUE)]
df[is.na(score), score :=(1:.N) + max.score]
Or in one line, but slightly slower:
df[is.na(score), score := (1:.N) + df[, max(score, na.rm = TRUE)]]
df
source score
1: a 10
2: b 5
3: c 11
4: d 3
5: e 12