Rename columns in R using str_replace_all for more than two string types

Asked by: Jen  Asked: 11/10/2023  Last edited by: r2evans  Updated: 11/11/2023  Views: 34

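Q:

I have a dataset (dataraw) that contains column labels such as

condition1_men, condition1_women, condition2_men, condition3_women (etc.)

I want to replace the strings "condition1", "condition2" with their names.

condition1_women = related_women;

condition2_men = unrelated_men;

condition3_men = filler_men;

Current code: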

data <- dataraw %>%
 rename_all(~ str_replace_all(str_replace(., 'condition1', "related"), 'condition2', "unrelated"))

This works for up to 2 strings, but every time I try to add a third string I get an "unexpected symbol" error.

 data <- dataraw %>%
rename_all(~ str_replace_all(str_replace((., 'condition1', "related"), 'condition2', "unrelated"), 'condition3', "filler")))

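I'm sure this must be simple, but whichever combination I try, I get an error. Can anyone point out the simple mistake I'm making? Thank you.

r dplyr replace stringr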

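Comments

2 upvotes Limey 11/10/2023
When you want to make multiple replacements, you need to pass both pattern and replacement as vectors (or define a function). The details are in the online documentation. Alternatively, you could pivot_longer, manipulate, and pivot_wider. Or (my preferred option), just pivot_longer to make your data frame tidy.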
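
One way to read the comment above, sketched under the assumption that stringr's named-vector form is acceptable: str_replace_all() takes a named character vector whose names are the patterns and whose values are the replacements, so no nesting of calls is needed. The column names and the lookup vector below are made up to mirror the question.

```r
library(stringr)

# Hypothetical column names modeled on the question
old_names <- c("condition1_men", "condition1_women",
               "condition2_men", "condition3_women")

# Names are the (regex) patterns, values are the replacements
lookup <- c(condition1 = "related",
            condition2 = "unrelated",
            condition3 = "filler")

# Every pattern in `lookup` is applied to every element of `old_names`
new_names <- str_replace_all(old_names, lookup)
print(new_names)
```

Inside a pipeline the same call could be passed to rename_with(), for example rename_with(dataraw, ~ str_replace_all(.x, lookup)). Note that the patterns are regular expressions, so "condition1" would also match the start of a label such as "condition10".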

A:

3 upvotes r2evans 11/10/2023 #1

rename_all was superseded 6 years ago in favor of rename_with, which I'll use:

library(dplyr)
dataraw <- data.frame(condition1_men=1, condition1_women=2, condition2_men=3, condition2_women=4, condition3_men=5)
dataraw
#   condition1_men condition1_women condition2_men condition2_women condition3_men
# 1              1                2              3                4              5
dataraw |>
  rename_with(.fn = ~ sub("^condition1_", "related_", sub("^condition2_", "unrelated_", .)))
#   related_men related_women unrelated_men unrelated_women condition3_men
# 1           1             2             3               4              5

If you have a (named) vector of "from=to" assignments, we can also do this a bit more generically:

conds <- c(condition1="related", condition2="unrelated")
dataraw |>
  rename_with(.fn = ~ Reduce(function(st, i) sub(names(conds)[i], conds[i], st), seq_along(conds), init = .x))
#   related_men related_women unrelated_men unrelated_women condition3_men
# 1           1             2             3               4              5

We need Reduce because we need to retain all of the changes made by the previous condition mappings.
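
To see why the carried-over state matters, here is a small base-R sketch that is not part of the original answer: the same Reduce() call with accumulate = TRUE, which returns every intermediate vector. The mapping (extended with a hypothetical condition3 entry) and the names are illustrative only.

```r
# Hypothetical mapping and column names, for illustration only
conds <- c(condition1 = "related", condition2 = "unrelated",
           condition3 = "filler")
nms <- c("condition1_men", "condition2_women", "condition3_men")

# accumulate = TRUE keeps the result after each mapping is applied,
# so steps[[1]] is the input and steps[[4]] is the fully renamed vector
steps <- Reduce(
  function(st, i) sub(names(conds)[i], conds[i], st, fixed = TRUE),
  seq_along(conds),
  init = nms,
  accumulate = TRUE
)

final <- steps[[length(steps)]]
print(steps)
```

Each step starts from the output of the one before it. Applying the mappings independently to the original names would give three separately renamed vectors instead of one combined result.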

I often find that data like this is easier to work with in a long format (for later wrangling/analysis), as Limey suggested. For that, we can also do:

dataraw |>
  tidyr::pivot_longer(cols = everything(), names_pattern = "(.*)_(.*)",
                      names_to = c("cond", ".value")) |>
  mutate(cond2 = conds[match(sub("_.*", "", cond), names(conds))])
# # A tibble: 3 × 4
#   cond         men women cond2    
#   <chr>      <dbl> <dbl> <chr>    
# 1 condition1     1     2 related  
# 2 condition2     3     4 unrelated
# 3 condition3     5    NA NA       

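Though it may be simpler (for data management, visualization, updating, etc.) if your mapping lives in a separate frame, which we can merge/join onto the original data: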

cond_df <- tribble(
  ~ cond, ~ cond2
  , "condition1", "related"
  , "condition2", "unrelated"
)
dataraw |>
  tidyr::pivot_longer(cols = everything(), names_pattern = "(.*)_(.*)",
                      names_to = c("cond", ".value")) |>
  left_join(cond_df, by = "cond")
# # A tibble: 3 × 4
#   cond         men women cond2    
#   <chr>      <dbl> <dbl> <chr>    
# 1 condition1     1     2 related  
# 2 condition2     3     4 unrelated
# 3 condition3     5    NA NA       

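Comments

1 upvote Jen 11/10/2023
Brilliant - this has done exactly what I needed. Thank you so much for taking the time to write up a clear explanation as well as the solution!
0 upvotes Adriano Mello 11/11/2023 #2

@r2evans's answer is the best. If for some reason you need to avoid pivoting, here is a possible solution using colnames() and purrr: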

library(dplyr)    # tibble(), tribble()
library(purrr)    # map2_chr(), modify(), keep()
library(stringr)  # str_detect(), str_replace()

dataraw <- tibble(
  condition1_women = c("a", "b", "c"),
  condition2_men   = c("x", "y", "z"),
  condition3_men   = c("i", "j", "k"))
dataraw
# # A tibble: 3 × 3
#   condition1_women condition2_men condition3_men
#   <chr>            <chr>          <chr>         
# 1 a                x              i             
# 2 b                y              j             
# 3 c                z              k             

labels <- tribble(
  ~ old, ~ new,
  "condition1", "related",
  "condition2", "unrelated",
  "condition3", "filler")
labels
# # A tibble: 3 × 2
#   old        new      
#   <chr>      <chr>    
# 1 condition1 related  
# 2 condition2 unrelated
# 3 condition3 filler   

old_names <- colnames(dataraw)
old_names
# [1] "condition1_women" "condition2_men"   "condition3_men"  

# For each old/new pair, keep the column names containing the old label and
# replace it. map2_chr() needs exactly one match per pair, so this assumes one
# column per label, with the labels listed in column order.
new_names <- map2_chr(
  labels$old, labels$new,
  \(y, z) modify(
    keep(old_names, \(x) str_detect(x, y)),
    \(x) str_replace(x, y, z)))
new_names
# [1] "related_women" "unrelated_men" "filler_men"

colnames(dataraw) <- new_names
dataraw
# # A tibble: 3 × 3
#   related_women unrelated_men filler_men
#   <chr>         <chr>         <chr>     
# 1 a             x             i         
# 2 b             y             j         
# 3 c             z             k         
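
If several columns share the same label (as in the question, where condition1 appears for both men and women), a variant that folds the label pairs over the whole name vector may be easier to work with. This is only a sketch, not part of the answer above: it assumes purrr's reduce2() and stringr's fixed() are available, and the names and label vectors are hypothetical.

```r
library(purrr)
library(stringr)

# Hypothetical names: two columns share the "condition1" label
old_names <- c("condition1_men", "condition1_women",
               "condition2_men", "condition3_men")
labels_old <- c("condition1", "condition2", "condition3")
labels_new <- c("related", "unrelated", "filler")

# reduce2() walks the two label vectors in parallel and passes the
# partially renamed vector from one step on to the next
new_names <- reduce2(
  labels_old, labels_new,
  \(nms, old, new) str_replace(nms, fixed(old), new),
  .init = old_names
)
print(new_names)
```

Because the full vector is carried through each step, the result keeps the original column order and length, so it can be assigned back with colnames(dataraw) <- new_names.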