Asked by: Yahel · Asked: 11/13/2023 · Updated: 11/13/2023 · Views: 35
Extract values that match a string from a range of columns in R
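Q:
I'm writing code in R. I have a data frame (called realData) with 10 variables named "RealAttribute_X", where X is a number from 1 to 10 (inclusive). Each column contains one of ten attributes: "intelligence", "attractiveness", "charisma", "ambition", "laziness", "generosity", "joyfulness", "friendliness", "arrogance", "calmness". For each row, the attributes are randomly assigned to the 10 "RealAttribute_X" columns.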
dt<-structure(list(session_id = c("17472631", "17472632", "17472633",
"17472635", "17472636", "17472638"), RealAttribute_1 = c("Moderately ugly",
"Very dull", "Very distant", "Very joyful", "Moderately joyful",
"Very distant"), RealAttribute_2 = c("Very nervous", "Very gloomy",
"Very generous", "Moderately charismatic", "Moderately hard working",
"Moderately modest"), RealAttribute_3 = c("Slightly generous",
"Moderately ugly", "Moderately arrogant", "Moderately calm",
"Moderately charismatic", "Moderately charismatic"), RealAttribute_4 = c("Moderately arrogant",
"Slightly generous", "Very dull", "Moderately distant", "Slightly distant",
"Slightly ambitious"), RealAttribute_5 = c("Slightly unambitious",
"Moderately calm", "Moderately unambitious", "Moderately lazy",
"Very modest", "Moderately intelligent"), RealAttribute_6 = c("Slightly dull",
"Slightly ambitious", "Very calm", "Moderately ambitious", "Moderately generous",
"Moderately generous"), RealAttribute_7 = c("Very intelligent",
"Slightly distant", "Very intelligent", "Slightly ugly", "Very good-looking",
"Very lazy"), RealAttribute_8 = c("Very joyful", "Slightly modest",
"Slightly joyful", "Very arrogant", "Very ambitious", "Slightly good-looking"
), RealAttribute_9 = c("Very distant", "Very lazy", "Slightly good-looking",
"Very generous", "Moderately intelligent", "Moderately gloomy"
), RealAttribute_10 = c("Slightly lazy", "Moderately intelligent",
"Slightly hard working", "Moderately intelligent", "Very calm",
"Moderately nervous")), class = "data.frame", row.names = c(NA,
-6L))
head(dt)
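Each attribute can take one of six values. The attributes and their possible values are:
intelligence = "Very unintelligent", "Moderately unintelligent", "Slightly unintelligent", "Slightly intelligent", "Moderately intelligent", "Very intelligent"
attractiveness = "Very ugly", "Moderately ugly", "Slightly ugly", "Slightly good-looking", "Moderately good-looking", "Very good-looking"
laziness = "Very lazy", "Moderately lazy", "Slightly lazy", "Slightly hard working", "Moderately hard working", "Very hard working"
friendliness = "Very distant", "Moderately distant", "Slightly distant", "Slightly friendly", "Moderately friendly", "Very friendly"
charisma = "Very dull", "Moderately dull", "Slightly dull", "Slightly charismatic", "Moderately charismatic", "Very charismatic"
calmness = "Very nervous", "Moderately nervous", "Slightly nervous", "Slightly calm", "Moderately calm", "Very calm"
generosity = "Very stingy", "Moderately stingy", "Slightly stingy", "Slightly generous", "Moderately generous", "Very generous"
joyfullness = "Very gloomy", "Moderately gloomy", "Slightly gloomy", "Slightly joyful", "Moderately joyful", "Very joyful"
arrogance = "Very arrogant", "Moderately arrogant", "Slightly arrogant", "Slightly modest", "Moderately modest", "Very modest"
ambition = "Very unambitious", "Moderately unambitious", "Slightly unambitious", "Slightly ambitious", "Moderately ambitious", "Very ambitious"
I'm trying to write code in R that creates 10 new columns, one per attribute ("intelligence", "attractiveness", "charisma", "ambition", "laziness", "generosity", "joyfulness", "friendliness", "arrogance", "calmness"), finds the variable in the existing data frame that holds the relevant attribute, and assigns that attribute's value to the matching new variable.
For example, these are the first four rows of the data frame:
RealAttribute_1: "Moderately ugly", "Very dull", "Very distant", "Very joyful"
RealAttribute_2: "Very nervous", "Very gloomy", "Very generous", "Moderately charismatic"
RealAttribute_3: "Slightly generous", "Moderately ugly", "Moderately arrogant", "Moderately calm"
RealAttribute_4: "Moderately arrogant", "Slightly generous", "Very dull", "Moderately distant"
RealAttribute_5: "Slightly unambitious", "Moderately calm", "Moderately unambitious", "Moderately lazy"
RealAttribute_6: "Slightly dull", "Slightly ambitious", "Very calm", "Moderately ambitious"
RealAttribute_7: "Very intelligent", "Slightly distant", "Very intelligent", "Slightly ugly"
RealAttribute_8: "Very joyful", "Slightly modest", "Slightly joyful", "Very arrogant"
RealAttribute_9: "Very distant", "Very lazy", "Slightly good-looking", "Very generous"
RealAttribute_10: "Slightly lazy", "Moderately intelligent", "Slightly hard working", "Moderately intelligent"
So the final result for the first four rows should look like this:
Intelligence: "Very intelligent", "Moderately intelligent", "Very intelligent", "Moderately intelligent"
Attractiveness: "Moderately ugly", "Moderately ugly", "Slightly good-looking", "Slightly ugly"
Laziness: "Slightly lazy", "Very lazy", "Slightly hard working", "Moderately lazy"
Friendliness: "Very distant", "Slightly distant", "Very distant", "Moderately distant"
Charisma: "Slightly dull", "Very dull", "Very dull", "Moderately charismatic"
Calmness: "Very nervous", "Moderately calm", "Very calm", "Moderately calm"
Generosity: "Slightly generous", "Slightly generous", "Very generous", "Very generous"
Joyfulness: "Very joyful", "Very gloomy", "Slightly joyful", "Very joyful"
Arrogance: "Moderately arrogant", "Slightly modest", "Moderately arrogant", "Very arrogant"
Ambition: "Slightly unambitious", "Slightly ambitious", "Moderately unambitious", "Moderately ambitious"
I have defined all the attribute values: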
attribute_values <- list(
  intelligence = c("Very unintelligent", "Moderately unintelligent", "Slightly unintelligent",
                   "Slightly intelligent", "Moderately intelligent", "Very intelligent"),
  attractiveness = c("Very ugly", "Moderately ugly", "Slightly ugly",
                     "Slightly good-looking", "Moderately good-looking", "Very good-looking"),
  laziness = c("Very lazy", "Moderately lazy", "Slightly lazy",
               "Slightly hard working", "Moderately hard working", "Very hard working"),
  friendliness = c("Very distant", "Moderately distant", "Slightly distant",
                   "Slightly friendly", "Moderately friendly", "Very friendly"),
  charisma = c("Very dull", "Moderately dull", "Slightly dull",
               "Slightly charismatic", "Moderately charismatic", "Very charismatic"),
  calmness = c("Very nervous", "Moderately nervous", "Slightly nervous",
               "Slightly calm", "Moderately calm", "Very calm"),
  generosity = c("Very stingy", "Moderately stingy", "Slightly stingy",
                 "Slightly generous", "Moderately generous", "Very generous"),
  joyfullness = c("Very gloomy", "Moderately gloomy", "Slightly gloomy",
                  "Slightly joyful", "Moderately joyful", "Very joyful"),
  arrogance = c("Very arrogant", "Moderately arrogant", "Slightly arrogant",
                "Slightly modest", "Moderately modest", "Very modest"),
  ambition = c("Very unambitious", "Moderately unambitious", "Slightly unambitious",
               "Slightly ambitious", "Moderately ambitious", "Very ambitious")
)
But I started by testing my code on the first variable only (intelligence):
intelligence_values <- c("Very unintelligent", "Moderately unintelligent", "Slightly unintelligent",
                         "Slightly intelligent", "Moderately intelligent", "Very intelligent")
realData$Intelligence <- apply(dt[, grep("RealAttribute_", colnames(realData), value = TRUE)], 1, function(row) {
  match_value <- which(row %in% intelligence_values)[1]
  if (is.na(match_value)) {
    return(NA)
  } else {
    return(intelligence_values[match_value])
  }
})
This code returns only NA in the Intelligence column. I also tried:
realData <- realData %>%
  rowwise() %>%
  mutate(Intelligence = intelligence_values[match(c_across(starts_with("RealAttribute")), intelligence_values)])
But I got the following error:
Error:
! `c_across()` must only be used inside data-masking verbs like `mutate()`, `filter()`, and `group_by()`.
After running rlang::last_trace():
<error/rlang_error>
Error:
! `c_across()` must only be used inside data-masking verbs like `mutate()`, `filter()`, and `group_by()`.
---
Backtrace:
 +-realData %>% rowwise() %>% ...
 +-plyr::mutate(...)
 | \-base::eval(cols[[col]], .data, parent.frame())
 |   \-base::eval(cols[[col]], .data, parent.frame())
 +-base::match(c_across(starts_with("RealAttribute")), intelligence_values)
 \-dplyr::c_across(starts_with("RealAttribute"))
Run rlang::last_trace(drop = FALSE) to see 4 hidden frames.
Any ideas what went wrong, or how I should write this? Thanks in advance!
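One likely reason the first attempt yields NA: `match_value` is a position within the row (1 to 10), but it is then used to index `intelligence_values`, which has only six elements. The call also mixes `dt` and `realData`. The base R sketch below indexes the row itself instead and loops over the attribute list. It is only an illustration under assumptions: `dt_small` is a three-column, three-row excerpt of the sample data, `first_match` is a hypothetical helper, and only two of the ten attributes are listed.

```r
# Excerpt of the sample data: rows 1-3, columns 1, 7 and 10 only.
dt_small <- data.frame(
  session_id       = c("17472631", "17472632", "17472633"),
  RealAttribute_1  = c("Moderately ugly", "Very dull", "Very distant"),
  RealAttribute_7  = c("Very intelligent", "Slightly distant", "Very intelligent"),
  RealAttribute_10 = c("Slightly lazy", "Moderately intelligent", "Slightly hard working")
)

# Two of the ten attribute definitions from the question.
attribute_values <- list(
  intelligence = c("Very unintelligent", "Moderately unintelligent", "Slightly unintelligent",
                   "Slightly intelligent", "Moderately intelligent", "Very intelligent"),
  laziness = c("Very lazy", "Moderately lazy", "Slightly lazy",
               "Slightly hard working", "Moderately hard working", "Very hard working")
)

# Character matrix holding only the RealAttribute_X columns.
attr_cols <- grep("^RealAttribute_", names(dt_small), value = TRUE)
attr_mat  <- as.matrix(dt_small[attr_cols])

# Hypothetical helper: first value in a row that belongs to `allowed`, else NA.
first_match <- function(row, allowed) {
  hit <- row[row %in% allowed]          # index the row, not the allowed values
  if (length(hit) == 0) NA_character_ else unname(hit[1])
}

# One new column per attribute.
for (nm in names(attribute_values)) {
  dt_small[[nm]] <- apply(attr_mat, 1, first_match, allowed = attribute_values[[nm]])
}

print(dt_small[c("session_id", names(attribute_values))])
```

With the full `dt` and all ten entries of `attribute_values`, the same loop should produce the ten requested columns; rows where an attribute is absent get NA.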
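A:
I think this is simpler if we build a table that links the attribute words we might encounter (it looks like 1 or 2 per category) to their respective categories. Then we can reshape the data to long format, extract the attribute word from each value, join it to the lookup table, and use the resulting category to decide which column each entry should go in.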
library(tidyverse)
# Lookup table: the attribute word found in each value and the category it belongs to
attribute_table <- data.frame(
  attribute = c("ugly", "nervous", "generous", "arrogant", "unambitious", "dull",
                "intelligent", "joyful", "distant", "lazy", "gloomy", "calm",
                "ambitious", "modest", "good-looking", "hard working", "charismatic"),
  category = c("attractiveness", "calmness", "generosity", "arrogance", "ambition", "charisma",
               "intelligence", "joyfulness", "friendliness", "laziness", "joyfulness", "calmness",
               "ambition", "arrogance", "attractiveness", "laziness", "charisma")
)
dt |>
  # One row per session and RealAttribute_X column
  pivot_longer(-session_id) |>
  # Split e.g. "Moderately hard working" into degree = "Moderately", attribute = "hard working"
  separate(value, c("degree", "attribute"), sep = " ", extra = "merge", remove = FALSE) |>
  # Attach the category for each attribute word
  left_join(attribute_table, by = "attribute") |>
  select(session_id, value, category) |>
  # One column per category, holding the full original value
  pivot_wider(names_from = category, values_from = value)
Result
# A tibble: 6 × 11
session_id attractiveness calmness generosity arrogance ambition charisma intelligence joyfulness friendliness laziness
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 17472631 Moderately ugly Very nervous Slightly generous Moderately arrogant Slightly unambitious Slightl… Very intell… Very joyf… Very distant Slightl…
2 17472632 Moderately ugly Moderately calm Slightly generous Slightly modest Slightly ambitious Very du… Moderately … Very gloo… Slightly di… Very la…
3 17472633 Slightly good-looking Very calm Very generous Moderately arrogant Moderately unambiti… Very du… Very intell… Slightly … Very distant Slightl…
4 17472635 Slightly ugly Moderately calm Very generous Very arrogant Moderately ambitious Moderat… Moderately … Very joyf… Moderately … Moderat…
5 17472636 Very good-looking Very calm Moderately generous Very modest Very ambitious Moderat… Moderately … Moderatel… Slightly di… Moderat…
6 17472638 Slightly good-looking Moderately nervous Moderately generous Moderately modest Slightly ambitious Moderat… Moderately … Moderatel… Very distant Very la…
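As a variation on the lookup-table idea, the table could also be derived from the `attribute_values` list defined in the question. That way values absent from the sample data (for example the "stingy" or "friendly" levels) are covered, and the join can use the full value so no `separate()` step is needed. The sketch below is an illustration under assumptions, not the answerer's code: `dt_small`, `lookup`, and `wide` are hypothetical names, and only three columns and three attributes are included to keep it short.

```r
library(dplyr)
library(tidyr)

# Excerpt of the sample data: rows 1-3, columns 1, 7 and 10 only.
dt_small <- data.frame(
  session_id       = c("17472631", "17472632", "17472633"),
  RealAttribute_1  = c("Moderately ugly", "Very dull", "Very distant"),
  RealAttribute_7  = c("Very intelligent", "Slightly distant", "Very intelligent"),
  RealAttribute_10 = c("Slightly lazy", "Moderately intelligent", "Slightly hard working")
)

# Three of the ten attribute definitions from the question.
attribute_values <- list(
  intelligence = c("Very unintelligent", "Moderately unintelligent", "Slightly unintelligent",
                   "Slightly intelligent", "Moderately intelligent", "Very intelligent"),
  attractiveness = c("Very ugly", "Moderately ugly", "Slightly ugly",
                     "Slightly good-looking", "Moderately good-looking", "Very good-looking"),
  laziness = c("Very lazy", "Moderately lazy", "Slightly lazy",
               "Slightly hard working", "Moderately hard working", "Very hard working")
)

# Flatten the list into a two-column lookup: full value -> category.
lookup <- stack(attribute_values)
names(lookup) <- c("value", "category")
lookup$category <- as.character(lookup$category)

wide <- dt_small |>
  pivot_longer(-session_id, names_to = "source_column", values_to = "value") |>
  left_join(lookup, by = "value") |>
  # Values whose attribute is not in this shortened list have no category; drop them.
  filter(!is.na(category)) |>
  select(session_id, category, value) |>
  pivot_wider(names_from = category, values_from = value)

print(wide)
```

With all ten entries in `attribute_values` every value should find a category, so the `filter()` step becomes a safeguard rather than a necessity.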
Comments
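A note on the error in the question: the backtrace lists `plyr::mutate(...)`, which suggests plyr was attached after dplyr so that its `mutate()` masked dplyr's. That would explain why `c_across()` complains about not being inside a data-masking verb. Separately, the indexing expression returns one element per column rather than a single value per row, so it would likely still fail under `rowwise()`. The sketch below assumes that diagnosis; the three-column `realData` excerpt is a stand-in for the real data frame.

```r
library(dplyr)

# Stand-in for the asker's data frame: rows 1-3, columns 1, 7 and 10 of the sample.
realData <- data.frame(
  session_id       = c("17472631", "17472632", "17472633"),
  RealAttribute_1  = c("Moderately ugly", "Very dull", "Very distant"),
  RealAttribute_7  = c("Very intelligent", "Slightly distant", "Very intelligent"),
  RealAttribute_10 = c("Slightly lazy", "Moderately intelligent", "Slightly hard working")
)

intelligence_values <- c("Very unintelligent", "Moderately unintelligent", "Slightly unintelligent",
                         "Slightly intelligent", "Moderately intelligent", "Very intelligent")

realData <- realData %>%
  dplyr::rowwise() %>%
  # Explicit dplyr:: prefix so a masking plyr::mutate() cannot be picked up.
  dplyr::mutate(
    Intelligence = {
      vals <- dplyr::c_across(dplyr::starts_with("RealAttribute"))
      hit  <- vals[vals %in% intelligence_values]
      # Return exactly one value per row, as rowwise mutate() expects.
      if (length(hit) == 0) NA_character_ else hit[1]
    }
  ) %>%
  dplyr::ungroup()

print(realData$Intelligence)
```

Loading plyr before dplyr, or not attaching plyr at all, should avoid the masking without the explicit prefixes.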