Question:

I have two logical vectors in a data frame:

df <- data.frame(log1 = c(FALSE, FALSE, TRUE, FALSE, TRUE), log2 = c(TRUE, FALSE, FALSE, FALSE, TRUE))

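I want to create a third column by combining the two. However, this new column should not just contain logical values. Instead, it should assign one of three values ("high", "outlier", or "normal") to each row. "high" takes precedence, so the third column should show "high" rather than "outlier" for row 5.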

I thought this could be done using if and else, but I can't get it to work with the following code:

df$new <- NA
if(df$log1 == TRUE){
  df$new <-  "high"
  } else if(df$log2 == TRUE) {
    df$new  <-  "outlier"
    } else {
      df$new  <-  "normal"
      }

Can anyone help?


base R

ifelse(df$log1, "high", ifelse(df$log2, "outlier", "normal"))
# [1] "outlier" "normal" "high"   "normal" "high"
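
The attempt in the question fails because the if statement expects a single TRUE or FALSE, while df$log1 has five elements (an error in R 4.2.0 and later, a warning before that). ifelse() instead evaluates its test element by element. A minimal sketch, assuming the same example data as the question, that stores the result in the new column:

```r
# Example data from the question
df <- data.frame(log1 = c(FALSE, FALSE, TRUE, FALSE, TRUE),
                 log2 = c(TRUE, FALSE, FALSE, FALSE, TRUE))

# ifelse() is vectorized: the outer test decides "high" first,
# and only rows where log1 is FALSE fall through to the inner test
df$new <- ifelse(df$log1, "high", ifelse(df$log2, "outlier", "normal"))

df$new
# [1] "outlier" "normal"  "high"    "normal"  "high"
```

Because the log1 test comes first, row 5 (TRUE in both columns) is labelled "high", which is the priority the question asks for.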

dplyr

We can nest dplyr::if_else, but nesting is usually a cue to use case_when instead.

library(dplyr)
df %>%
  mutate(
    new1 = if_else(log1, "high", if_else(log2, "outlier", "normal")), 
    new2 = case_when(log1 ~ "high", log2 ~ "outlier", TRUE ~ "normal")
  )
#    log1  log2    new1    new2
# 1 FALSE  TRUE outlier outlier
# 2 FALSE FALSE  normal  normal
# 3  TRUE FALSE    high    high
# 4 FALSE FALSE  normal  normal
# 5  TRUE  TRUE    high    high

data.table

Similarly, fifelse and fcase:

library(data.table)
as.data.table(df)[, new1 := fifelse(log1, "high", fifelse(log2, "outlier", "normal"))
  ][, new2 := fcase(log1, "high", log2, "outlier", default = "normal")][]
#      log1   log2    new1    new2
#    <lgcl> <lgcl>  <char>  <char>
# 1:  FALSE   TRUE outlier outlier
# 2:  FALSE  FALSE  normal  normal
# 3:   TRUE  FALSE    high    high
# 4:  FALSE  FALSE  normal  normal
# 5:   TRUE   TRUE    high    high

Note that while dplyr::case_when above uses tilde formulas, as in cond1 ~ value1, cond2 ~ value2, the fcase variant uses alternating arguments: fcase(cond1, value1, cond2, value2, ...).

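Also, default= works as long as its argument is a constant. If you need a dynamic default (i.e., one based on the table's contents), you need an all-true vector as the final condition, as in fcase(..., rep(TRUE, .N), NEWVALUE).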
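
The dynamic-default pattern described above could be sketched as follows. The fallback column is hypothetical and exists only for illustration; it stands in for whatever per-row value should be used when no earlier condition matches.

```r
library(data.table)

dt <- data.table(
  log1 = c(FALSE, FALSE, TRUE, FALSE, TRUE),
  log2 = c(TRUE, FALSE, FALSE, FALSE, TRUE),
  fallback = c("a", "b", "c", "d", "e")  # hypothetical per-row default
)

# The final condition rep(TRUE, .N) matches every row not already
# claimed by an earlier condition, so fallback acts as a per-row default
dt[, new := fcase(
  log1, "high",
  log2, "outlier",
  rep(TRUE, .N), fallback
)]

dt$new
# [1] "outlier" "b"       "high"    "d"       "high"
```

Since fcase takes the first condition that is TRUE for each row, the trailing all-true condition only affects rows where both log1 and log2 are FALSE.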

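This is a nice alternative and it works well! One suggestion: whenever I use this approach in my code, I always put the highest-priority reassignment last. The conditions here do not overlap at all (so nothing changes for this data), but in the more general case the conditions may not be so perfectly complementary, and putting the highest priority last ensures it overwrites any other value.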
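
The comment above appears to refer to the sequential subset-assignment approach (the one benchmarked as "subsetting" further down). A minimal sketch of the suggestion, assuming the example data from the question: assign in order of increasing priority so the most important label is written last and no negated condition is needed.

```r
df <- data.frame(log1 = c(FALSE, FALSE, TRUE, FALSE, TRUE),
                 log2 = c(TRUE, FALSE, FALSE, FALSE, TRUE))

df$new <- "normal"            # lowest priority: default for every row
df$new[df$log2] <- "outlier"  # middle priority: may be overwritten below
df$new[df$log1] <- "high"     # highest priority last, so it always wins

df$new
# [1] "outlier" "normal"  "high"    "normal"  "high"
```

Row 5 is first set to "outlier" and then overwritten with "high", which is why the order of the assignments encodes the priority.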

2 votes jblood94 9/28/2023 #3

Using indexing:

df$new <- with(df, c("normal", "outlier", "high", "high")[2L*log1 + log2 + 1L])
df
#>    log1  log2     new
#> 1 FALSE  TRUE outlier
#> 2 FALSE FALSE  normal
#> 3  TRUE FALSE    high
#> 4 FALSE FALSE  normal
#> 5  TRUE  TRUE    high
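
To see why the index expression works, it may help to enumerate the four possible combinations. In arithmetic, FALSE counts as 0 and TRUE as 1, so 2L*log1 + log2 + 1L maps each combination to a position from 1 to 4 in the label vector. A small sketch (the names combos, idx, and labels are only for illustration):

```r
# All four combinations of the two logical columns
combos <- expand.grid(log2 = c(FALSE, TRUE), log1 = c(FALSE, TRUE))

# log1 contributes 0 or 2, log2 contributes 0 or 1, plus 1 for 1-based indexing
idx <- with(combos, 2L * log1 + log2 + 1L)

labels <- c("normal", "outlier", "high", "high")
cbind(combos, idx, label = labels[idx])
#    log2  log1 idx   label
# 1 FALSE FALSE   1  normal
# 2  TRUE FALSE   2 outlier
# 3 FALSE  TRUE   3    high
# 4  TRUE  TRUE   4    high
```

Positions 3 and 4 both hold "high", which is how the lookup vector encodes that log1 takes precedence regardless of log2.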

The data.table solutions pointed out by @r2evans are the fastest and the most memory-efficient. Indexing wins among the base R solutions.

f1 <- function(df) {
  within(df, new <- ifelse(log1, "high", ifelse(log2, "outlier", "normal")))
}

f2 <- function(df) {
  df %>%
    mutate(
      new = if_else(log1, "high", if_else(log2, "outlier", "normal"))
    )
}

f3 <- function(df) {
  df %>%
    mutate(
      new = case_when(log1 ~ "high", log2 ~ "outlier", TRUE ~ "normal")
    )
}

f4 <- function(df) {
  setDT(df)[, new := fifelse(log1, "high", fifelse(log2, "outlier", "normal"))]
}

f5 <- function(df) {
  setDT(df)[, new := fcase(log1, "high", log2, "outlier", default = "normal")]
}

f6 <- function(df) {
  df$new <- "normal"
  df[df$log1, ]$new <- "high"
  df[!df$log1 & df$log2, ]$new <- "outlier"
  df
}

f7 <- function(df) {
  within(df, new <- c("normal", "outlier", "high", "high")[2L*log1 + log2 + 1L])
}

Benchmark:

df <- data.frame(log1 = sample(!0:1, 1e5, 1), log2 = sample(!0:1, 1e5, 1))

bench::mark(
  ifelse = f1(df),
  if_else = f2(df),
  case_when = f3(df),
  fifelse = f4(df),
  fcase = f5(df),
  subsetting = f6(df),
  indexing = f7(df),
  check = FALSE # mix of data.table and data.frame
)
#> # A tibble: 7 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 ifelse      40.32ms   41.9ms      24.0   10.32MB     12.0
#> 2 if_else      8.13ms   8.92ms     111.     9.95MB     33.0
#> 3 case_when    6.58ms   7.09ms     137.     7.36MB     45.7
#> 4 fifelse      1.68ms   2.01ms     483.     3.42MB     18.0
#> 5 fcase        1.45ms   1.57ms     618.      1.2MB     22.6
#> 6 subsetting   6.99ms    8.2ms     120.    10.85MB     59.9
#> 7 indexing     1.57ms   2.54ms     389.     4.27MB     50.9
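
Because the benchmark runs with check = FALSE, bench::mark does not verify that the approaches agree. A separate sanity check could look like the sketch below, which compares three of the base R variants on random data. The seed, the size n, and the variable names are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
n <- 1e4
log1 <- sample(c(TRUE, FALSE), n, replace = TRUE)
log2 <- sample(c(TRUE, FALSE), n, replace = TRUE)

# Nested ifelse
by_ifelse <- ifelse(log1, "high", ifelse(log2, "outlier", "normal"))

# Lookup-vector indexing
by_index <- c("normal", "outlier", "high", "high")[2L * log1 + log2 + 1L]

# Sequential subset assignment, highest priority last
by_subset <- rep("normal", n)
by_subset[log2] <- "outlier"
by_subset[log1] <- "high"

identical(by_ifelse, by_index)   # TRUE
identical(by_ifelse, by_subset)  # TRUE
```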
