Extract values that match a string from a range of columns in R

Asked by: Yahel  Asked: 11/13/2023  Updated: 11/13/2023  Views: 35

Q:

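I am writing this in R. I have a data frame (called realData; the sample below is called dt) with 10 variables named "RealAttribute_X", where X is a number from 1 to 10 (inclusive). Each column holds one of ten attributes: "intelligence", "attractiveness", "charisma", "ambition", "laziness", "generosity", "joyfulness", "friendliness", "arrogance", "calmness". For each row, the attributes are randomly assigned to the 10 "RealAttribute_X" columns, one attribute per column.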

dt<-structure(list(session_id = c("17472631", "17472632", "17472633", 
"17472635", "17472636", "17472638"), RealAttribute_1 = c("Moderately ugly", 
"Very dull", "Very distant", "Very joyful", "Moderately joyful", 
"Very distant"), RealAttribute_2 = c("Very nervous", "Very gloomy", 
"Very generous", "Moderately charismatic", "Moderately hard working", 
"Moderately modest"), RealAttribute_3 = c("Slightly generous", 
"Moderately ugly", "Moderately arrogant", "Moderately calm", 
"Moderately charismatic", "Moderately charismatic"), RealAttribute_4 = c("Moderately arrogant", 
"Slightly generous", "Very dull", "Moderately distant", "Slightly distant", 
"Slightly ambitious"), RealAttribute_5 = c("Slightly unambitious", 
"Moderately calm", "Moderately unambitious", "Moderately lazy", 
"Very modest", "Moderately intelligent"), RealAttribute_6 = c("Slightly dull", 
"Slightly ambitious", "Very calm", "Moderately ambitious", "Moderately generous", 
"Moderately generous"), RealAttribute_7 = c("Very intelligent", 
"Slightly distant", "Very intelligent", "Slightly ugly", "Very good-looking", 
"Very lazy"), RealAttribute_8 = c("Very joyful", "Slightly modest", 
"Slightly joyful", "Very arrogant", "Very ambitious", "Slightly good-looking"
), RealAttribute_9 = c("Very distant", "Very lazy", "Slightly good-looking", 
"Very generous", "Moderately intelligent", "Moderately gloomy"
), RealAttribute_10 = c("Slightly lazy", "Moderately intelligent", 
"Slightly hard working", "Moderately intelligent", "Very calm", 
"Moderately nervous")), class = "data.frame", row.names = c(NA, 
-6L))

head(dt)

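Each attribute can take one of six values. The attributes and their possible values are:

intelligence = "Very unintelligent", "Moderately unintelligent", "Slightly unintelligent", "Slightly intelligent", "Moderately intelligent", "Very intelligent"
attractiveness = "Very ugly", "Moderately ugly", "Slightly ugly", "Slightly good-looking", "Moderately good-looking", "Very good-looking"
laziness = "Very lazy", "Moderately lazy", "Slightly lazy", "Slightly hard working", "Moderately hard working", "Very hard working"
friendliness = "Very distant", "Moderately distant", "Slightly distant", "Slightly friendly", "Moderately friendly", "Very friendly"
charisma = "Very dull", "Moderately dull", "Slightly dull", "Slightly charismatic", "Moderately charismatic", "Very charismatic"
calmness = "Very nervous", "Moderately nervous", "Slightly nervous", "Slightly calm", "Moderately calm", "Very calm"
generosity = "Very stingy", "Moderately stingy", "Slightly stingy", "Slightly generous", "Moderately generous", "Very generous"
joyfullness = "Very gloomy", "Moderately gloomy", "Slightly gloomy", "Slightly joyful", "Moderately joyful", "Very joyful"
arrogance = "Very arrogant", "Moderately arrogant", "Slightly arrogant", "Slightly modest", "Moderately modest", "Very modest"
ambition = "Very unambitious", "Moderately unambitious", "Slightly unambitious", "Slightly ambitious", "Moderately ambitious", "Very ambitious"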

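I am trying to write R code that creates 10 new columns, one per attribute ("intelligence", "attractiveness", "charisma", "ambition", "laziness", "generosity", "joyfulness", "friendliness", "arrogance", "calmness"), finds the existing column that holds that attribute in each row, and assigns its value to the matching new column.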

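For example, these are the first four rows of the data frame:

RealAttribute_1: "Moderately ugly", "Very dull", "Very distant", "Very joyful"
RealAttribute_2: "Very nervous", "Very gloomy", "Very generous", "Moderately charismatic"
RealAttribute_3: "Slightly generous", "Moderately ugly", "Moderately arrogant", "Moderately calm"
RealAttribute_4: "Moderately arrogant", "Slightly generous", "Very dull", "Moderately distant"
RealAttribute_5: "Slightly unambitious", "Moderately calm", "Moderately unambitious", "Moderately lazy"
RealAttribute_6: "Slightly dull", "Slightly ambitious", "Very calm", "Moderately ambitious"
RealAttribute_7: "Very intelligent", "Slightly distant", "Very intelligent", "Slightly ugly"
RealAttribute_8: "Very joyful", "Slightly modest", "Slightly joyful", "Very arrogant"
RealAttribute_9: "Very distant", "Very lazy", "Slightly good-looking", "Very generous"
RealAttribute_10: "Slightly lazy", "Moderately intelligent", "Slightly hard working", "Moderately intelligent"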

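So the final result for the first four rows should look like this:

Intelligence: "Very intelligent", "Moderately intelligent", "Very intelligent", "Moderately intelligent"
Attractiveness: "Moderately ugly", "Moderately ugly", "Slightly good-looking", "Slightly ugly"
Laziness: "Slightly lazy", "Very lazy", "Slightly hard working", "Moderately lazy"
Friendliness: "Very distant", "Slightly distant", "Very distant", "Moderately distant"
Charisma: "Slightly dull", "Very dull", "Very dull", "Moderately charismatic"
Calmness: "Very nervous", "Moderately calm", "Very calm", "Moderately calm"
Generosity: "Slightly generous", "Slightly generous", "Very generous", "Very generous"
Joyfulness: "Very joyful", "Very gloomy", "Slightly joyful", "Very joyful"
Arrogance: "Moderately arrogant", "Slightly modest", "Moderately arrogant", "Very arrogant"
Ambition: "Slightly unambitious", "Slightly ambitious", "Moderately unambitious", "Moderately ambitious"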

I defined all the attribute values:

attribute_values <- list(
  intelligence = c("Very unintelligent", "Moderately unintelligent", "Slightly unintelligent", 
                   "Slightly intelligent", "Moderately intelligent", "Very intelligent"),
  attractiveness = c("Very ugly", "Moderately ugly", "Slightly ugly", 
                     "Slightly good-looking", "Moderately good-looking", "Very good-looking"),
  laziness = c("Very lazy", "Moderately lazy", "Slightly lazy", 
               "Slightly hard working", "Moderately hard working", "Very hard working"),
  friendliness = c("Very distant", "Moderately distant", "Slightly distant", 
                   "Slightly friendly", "Moderately friendly", "Very friendly"),
  charisma = c("Very dull", "Moderately dull", "Slightly dull", 
               "Slightly charismatic", "Moderately charismatic", "Very charismatic"),
  calmness = c("Very nervous", "Moderately nervous", "Slightly nervous", 
               "Slightly calm", "Moderately calm", "Very calm"),
  generosity = c("Very stingy", "Moderately stingy", "Slightly stingy", 
                 "Slightly generous", "Moderately generous", "Very generous"),
  joyfullness = c("Very gloomy", "Moderately gloomy", "Slightly gloomy", 
                  "Slightly joyful", "Moderately joyful", "Very joyful"),
  arrogance = c("Very arrogant", "Moderately arrogant", "Slightly arrogant", 
                "Slightly modest", "Moderately modest", "Very modest"),
  ambition = c("Very unambitious", "Moderately unambitious", "Slightly unambitious", 
               "Slightly ambitious", "Moderately ambitious", "Very ambitious")
)

But I started by testing my code on the first variable (intelligence):

intelligence_values <- c("Very unintelligent", "Moderately unintelligent", "Slightly unintelligent", 
                         "Slightly intelligent", "Moderately intelligent", "Very intelligent")

realData$Intelligence <- apply(dt[, grep("RealAttribute_", colnames(realData), value = TRUE)], 1, function(row) {
  match_value <- which(row %in% intelligence_values)[1]
  if (is.na(match_value)) {
    return(NA)
  } else {
    return(intelligence_values[match_value])
  }
})

This code returns only NA in the Intelligence column. I also tried:

realData <- realData %>%
  rowwise() %>%
  mutate(Intelligence = intelligence_values[match(c_across(starts_with("RealAttribute")), intelligence_values)])

But I got the following error: Error in `c_across()`: ! Must only be used inside data-masking verbs like `mutate()`, `filter()`, and `group_by()`.

After running rlang::last_trace():

<error/rlang_error>
Error in `c_across()`:
! Must only be used inside data-masking verbs like `mutate()`, `filter()`, and `group_by()`.
---
Backtrace:
    x
  1. +-realData %>% rowwise() %>% ...
  2. \-plyr::mutate(...)
  3.   +-base::eval(cols[[col]], .data, parent.frame())
  4.   \-base::eval(cols[[col]], .data, parent.frame())
  5.     +-base::match(c_across(starts_with("RealAttribute")), intelligence_values)
  6.     \-dplyr::c_across(starts_with("RealAttribute"))

Run rlang::last_trace(drop = FALSE) to see 4 hidden frames.

Any ideas about what went wrong, or how I should write this? Thanks in advance!

Tags: r, search

Comments

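0 votes  Jon Spring  11/14/2023
Another thing to watch for in your code: it looks like you have both plyr and dplyr loaded. The two packages share some function names, such as mutate and summarize, which can cause problems because they work differently (for example, plyr::mutate does not see groups created with group_by).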
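
A minimal sketch of one way around the conflict described in this comment, assuming plyr was attached after dplyr so that plyr::mutate masks dplyr::mutate (the question does not show its library() calls). Calling dplyr::mutate() by its full name sidesteps the masking. The toy data frame is made up for illustration, and the expression is reduced to a single value per row, since a rowwise mutate() expects length-one results; the asker's original match() expression yields one value per column and would likely still fail for that reason.

```r
library(dplyr)

# Hypothetical two-column stand-in for the question's data
toy <- data.frame(
  RealAttribute_1 = c("Very dull", "Very intelligent"),
  RealAttribute_2 = c("Slightly intelligent", "Very calm"),
  stringsAsFactors = FALSE
)

intelligence_values <- c("Very unintelligent", "Moderately unintelligent", "Slightly unintelligent",
                         "Slightly intelligent", "Moderately intelligent", "Very intelligent")

res <- toy %>%
  rowwise() %>%
  # explicit namespace: works even if plyr::mutate masks dplyr::mutate
  dplyr::mutate(Intelligence = {
    vals <- c_across(starts_with("RealAttribute"))
    # keep the first cell in this row that is an intelligence value
    vals[vals %in% intelligence_values][1]
  }) %>%
  ungroup()

res$Intelligence
```

Another option is to avoid attaching plyr at all and call the few plyr functions that are needed with the plyr:: prefix.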

A:

1 vote  Jon Spring  11/13/2023  #1

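I think this is simpler if we make a table that links each attribute word we might encounter (there look to be 1 or 2 per category) to its category. Then we can reshape the data to long form, extract the attribute word, join it to the lookup table, and use the category to decide which column each entry belongs in.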

library(tidyverse)

attribute_table <- data.frame(
  attribute = c("ugly", "nervous", "generous", "arrogant", "unambitious", "dull", 
                "intelligent", "joyful", "distant", "lazy", "gloomy", "calm", 
                "ambitious", "modest", "good-looking", "hard working", "charismatic"),
  category = c("attractiveness", "calmness", "generosity", "arrogance", "ambition", "charisma",
               "intelligence", "joyfulness", "friendliness", "laziness", "joyfulness", "calmness",
               "ambition", "arrogance", "attractiveness", "laziness", "charisma")
)

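dt |>
  # one row per (session, RealAttribute_X) pair
  pivot_longer(-session_id) |>
  # split e.g. "Very hard working" into degree = "Very", attribute = "hard working"
  separate(value, c("degree", "attribute"), sep = " ", extra = "merge", remove = FALSE) |>
  # look up the category that each attribute word belongs to
  left_join(attribute_table, by = "attribute") |>
  select(session_id, value, category) |>
  # one column per category, holding the full original value
  pivot_wider(names_from = category, values_from = value)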

Result

# A tibble: 6 × 11
  session_id attractiveness        calmness           generosity          arrogance           ambition             charisma intelligence joyfulness friendliness laziness
  <chr>      <chr>                 <chr>              <chr>               <chr>               <chr>                <chr>    <chr>        <chr>      <chr>        <chr>   
1 17472631   Moderately ugly       Very nervous       Slightly generous   Moderately arrogant Slightly unambitious Slightl… Very intell… Very joyf… Very distant Slightl…
2 17472632   Moderately ugly       Moderately calm    Slightly generous   Slightly modest     Slightly ambitious   Very du… Moderately … Very gloo… Slightly di… Very la…
3 17472633   Slightly good-looking Very calm          Very generous       Moderately arrogant Moderately unambiti… Very du… Very intell… Slightly … Very distant Slightl…
4 17472635   Slightly ugly         Moderately calm    Very generous       Very arrogant       Moderately ambitious Moderat… Moderately … Very joyf… Moderately … Moderat…
5 17472636   Very good-looking     Very calm          Moderately generous Very modest         Very ambitious       Moderat… Moderately … Moderatel… Slightly di… Moderat…
6 17472638   Slightly good-looking Moderately nervous Moderately generous Moderately modest   Slightly ambitious   Moderat… Moderately … Moderatel… Very distant Very la…
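
For comparison, here is a base-R sketch of the same idea built on the asker's attribute_values list instead of a hand-typed lookup table. It also shows where the apply() attempt in the question likely went wrong: which() returns the position of the matching column within the row, so that position should index the row, not intelligence_values. The two-attribute list, the toy data frame, and the helper name pick_attribute are shortened, hypothetical stand-ins, not the real data.

```r
# Shortened stand-ins for the question's attribute_values and dt
attribute_values <- list(
  intelligence = c("Very unintelligent", "Moderately unintelligent", "Slightly unintelligent",
                   "Slightly intelligent", "Moderately intelligent", "Very intelligent"),
  calmness = c("Very nervous", "Moderately nervous", "Slightly nervous",
               "Slightly calm", "Moderately calm", "Very calm")
)
toy <- data.frame(
  session_id = c("a", "b"),
  RealAttribute_1 = c("Very nervous", "Very intelligent"),
  RealAttribute_2 = c("Slightly intelligent", "Moderately calm"),
  stringsAsFactors = FALSE
)

attr_cols <- grep("^RealAttribute_", names(toy), value = TRUE)

# For one attribute: from each row, return the cell whose value is in `values`
pick_attribute <- function(values) {
  unname(apply(toy[, attr_cols, drop = FALSE], 1, function(row) {
    hit <- which(row %in% values)[1]            # position within the row
    if (is.na(hit)) NA_character_ else row[hit] # index the row, not `values`
  }))
}

# One new column per attribute, named after the list entries
new_cols <- lapply(attribute_values, pick_attribute)
result <- cbind(toy["session_id"], as.data.frame(new_cols, stringsAsFactors = FALSE))
result
```

Because the columns are named after the entries of attribute_values, the asker's spelling joyfullness would carry over to the output unless the list entry is renamed.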