Asked by: Jen  Asked: 11/10/2023  Last edited by: r2evans, Jen  Updated: 11/11/2023  Views: 34
Rename columns in R using str_replace_all for more than two string types
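Q:
I have a dataset (dataraw) that contains column labels such as
condition1_men, condition1_women, condition2_men, condition3_women (etc.)
I want to replace the strings "condition1", "condition2", and so on with their names:
condition1_women = related_women;
condition2_men = unrelated_men;
condition3_men = filler_men;
Current code: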
data <- dataraw %>%
rename_all(~ str_replace_all(str_replace(., 'condition1', "related"), 'condition2', "unrelated"))
This works for up to 2 strings, but every time I try to add a third one I get an "unexpected symbol" error.
data <- dataraw %>%
rename_all(~ str_replace_all(str_replace((., 'condition1', "related"), 'condition2', "unrelated"), 'condition3', "filler")))
I'm sure this must be simple, but whichever combination I try, I get an error. Can anyone point out the simple mistake I'm making? Thank you.
A:
3 votes
r2evans
11/10/2023
#1
rename_all was superseded 6 years ago in favor of rename_with, which I'll use:
library(dplyr)
dataraw <- data.frame(condition1_men=1, condition1_women=2, condition2_men=3, condition2_women=4, condition3_men=5)
dataraw
# condition1_men condition1_women condition2_men condition2_women condition3_men
# 1 1 2 3 4 5
dataraw |>
rename_with(.fn = ~ sub("^condition1_", "related_", sub("^condition2_", "unrelated_", .)))
# related_men related_women unrelated_men unrelated_women condition3_men
# 1 1 2 3 4 5
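For the third mapping from the question, the same nesting can be extended by one more call. This is only a sketch on the toy data above: three replacements need three nested calls, whereas the attempt in the question has two calls plus a stray opening parenthesis in str_replace((., which is the likely source of the syntax error. The "filler" label is taken from the question.

```r
library(dplyr)

# Toy data, same shape as in the answer above
dataraw <- data.frame(condition1_men = 1, condition1_women = 2,
                      condition2_men = 3, condition2_women = 4,
                      condition3_men = 5)

# One sub() per mapping; the innermost call receives the column names (.x),
# and each call has exactly one opening and one closing parenthesis.
renamed <- dataraw |>
  rename_with(.fn = ~ sub("^condition1_", "related_",
                      sub("^condition2_", "unrelated_",
                      sub("^condition3_", "filler_", .x))))

names(renamed)
# [1] "related_men"     "related_women"   "unrelated_men"   "unrelated_women" "filler_men"
```

Nesting gets hard to read beyond two or three mappings, which is where a lookup vector becomes more comfortable.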
If you have a (named) vector of "from=to" assignments, we can also do this a little more generally:
conds <- c(condition1="related", condition2="unrelated")
dataraw |>
rename_with(.fn = ~ Reduce(function(st, i) sub(names(conds)[i], conds[i], st), seq_along(conds), init = .x))
# related_men related_women unrelated_men unrelated_women condition3_men
# 1 1 2 3 4 5
We need Reduce because each mapping must be applied to the result of the previous one, so that all earlier changes are retained.
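Since the question is about stringr, it may be worth noting that str_replace_all also accepts a named vector (names are the patterns, values are the replacements) and applies the pairs one after another, much like the Reduce call. A minimal sketch comparing the two on a plain character vector; cols is a stand-in for the column names, and the third mapping ("filler") comes from the question:

```r
library(stringr)

conds <- c(condition1 = "related", condition2 = "unrelated", condition3 = "filler")
cols  <- c("condition1_men", "condition1_women", "condition2_men",
           "condition2_women", "condition3_men")

# Reduce() threads the partially renamed vector through every mapping in turn.
via_reduce <- Reduce(function(st, i) sub(names(conds)[i], conds[i], st),
                     seq_along(conds), cols)

# str_replace_all() with a named vector performs the same chain of replacements.
via_stringr <- str_replace_all(cols, conds)

via_stringr
# [1] "related_men"     "related_women"   "unrelated_men"   "unrelated_women" "filler_men"
all(via_reduce == via_stringr)
```

Inside the pipeline this would read rename_with(~ str_replace_all(.x, conds)). The names are treated as regular expressions, so a pattern like condition1 would also match the start of condition10; anchoring (for example "^condition1_") avoids that.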
I often find that data like this is easier to work with in a long format for later wrangling and analysis (as Limey suggested). For that, we can also do:
dataraw |>
tidyr::pivot_longer(cols = everything(), names_pattern = "(.*)_(.*)",
names_to = c("cond", ".value")) |>
mutate(cond2 = conds[match(sub("_.*", "", cond), names(conds))])
# # A tibble: 3 × 4
# cond men women cond2
# <chr> <dbl> <dbl> <chr>
# 1 condition1 1 2 related
# 2 condition2 3 4 unrelated
# 3 condition3 5 NA NA
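Though it might be simpler (for data management, visualization, updating, etc.) if your mapping lives in a separate frame, which we can merge/join onto the original data: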
cond_df <- tribble(
~ cond, ~ cond2
, "condition1", "related"
, "condition2", "unrelated"
)
dataraw |>
tidyr::pivot_longer(cols = everything(), names_pattern = "(.*)_(.*)",
names_to = c("cond", ".value")) |>
left_join(cond_df, by = "cond")
# # A tibble: 3 × 4
# cond men women cond2
# <chr> <dbl> <dbl> <chr>
# 1 condition1 1 2 related
# 2 condition2 3 4 unrelated
# 3 condition3 5 NA NA
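In both long-format outputs, condition3 ends up with NA in cond2 because it has no entry in the mapping. If keeping the original label for unmapped conditions is preferable, one option is to fall back with coalesce. This is a sketch under that assumption, not part of the original answer:

```r
library(dplyr)
library(tidyr)

dataraw <- data.frame(condition1_men = 1, condition1_women = 2,
                      condition2_men = 3, condition2_women = 4,
                      condition3_men = 5)

cond_df <- tribble(
  ~cond,        ~cond2,
  "condition1", "related",
  "condition2", "unrelated"
)

long <- dataraw |>
  pivot_longer(cols = everything(), names_pattern = "(.*)_(.*)",
               names_to = c("cond", ".value")) |>
  left_join(cond_df, by = "cond") |>
  # Conditions missing from cond_df come back as NA; reuse the original label.
  mutate(cond2 = coalesce(cond2, cond))

long$cond2
# [1] "related"    "unrelated"  "condition3"
```

Adding a "condition3", "filler" row to cond_df would of course make the fallback unnecessary for this data.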
Comments
1 vote
Jen
11/10/2023
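Fantastic - this has done exactly what I needed. Thank you so much for taking the time to write out a clear explanation along with the solution!
0 votes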
Adriano Mello
11/11/2023
#2
@r2evans's answer is the best. If for some reason pivoting is a concern, here is a possible solution using colnames() and purrr:
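library(tibble)   # tibble(), tribble()
library(purrr)    # map2_chr(), modify(), keep()
library(stringr)  # str_detect(), str_replace()

dataraw <- tibble(
condition1_women = c("a", "b", "c"),
condition2_men = c("x", "y", "z"),
condition3_men = c("i", "j", "k"))
# A tibble: 3 × 3 -------------------
#   condition1_women condition2_men condition3_men
#   <chr>            <chr>          <chr>
# 1 a                x              i
# 2 b                y              j
# 3 c                z              k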
labels <- tribble(
~ old, ~ new,
"condition1", "related",
"condition2", "unrelated",
"condition3", "filler")
# A tibble: 3 × 2 -------------------
#   old        new
#   <chr>      <chr>
# 1 condition1 related
# 2 condition2 unrelated
# 3 condition3 filler
old_names <- colnames(dataraw)
# -----------------------------------
# [1] "condition1_women" "condition2_men" "condition3_men"
new_names <- map2_chr(
labels$old, labels$new,
\(y, z) modify(
keep(old_names,\(x) str_detect(x, y)),
\(x) str_replace(x, y, z)))
# -----------------------------------
# [1] "related_women" "unrelated_men" "filler_men"
colnames(dataraw) <- new_names
# A tibble: 3 × 3 -------------------
#   related_women unrelated_men filler_men
#   <chr>         <chr>         <chr>
# 1 a             x             i
# 2 b             y             j
# 3 c             z             k
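One caveat with the code above: map2_chr expects exactly one string per (old, new) pair, so it seems to rely on each label matching a single column. When several columns share a condition, as in the original question, a possible variation is purrr's reduce2, which carries the whole vector of names through every pair. A sketch with a hypothetical four-column tibble:

```r
library(purrr)
library(stringr)
library(tibble)

# Hypothetical data where condition1 appears in two columns
dataraw <- tibble(
  condition1_men   = c("a", "b"),
  condition1_women = c("c", "d"),
  condition2_men   = c("x", "y"),
  condition3_men   = c("i", "j"))

labels <- tribble(
  ~old,         ~new,
  "condition1", "related",
  "condition2", "unrelated",
  "condition3", "filler")

# reduce2() starts from all column names (.init) and applies each
# (old, new) pair in turn, so column order and count are preserved.
new_names <- reduce2(
  labels$old, labels$new,
  \(nms, old, new) str_replace(nms, old, new),
  .init = colnames(dataraw))

colnames(dataraw) <- new_names
colnames(dataraw)
# [1] "related_men"   "related_women" "unrelated_men" "filler_men"
```

Columns that match none of the labels simply keep their original names.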