Remove a column from a dataframe where two conditions are true at the same time

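Question:

I have dozens of survey data files. Each file has several columns of numeric data followed by some columns of character data. I need to dynamically remove one column from each file based on two conditions that must both be true:

The column is numeric, and...
The column header contains the word "Overall"

I can't simply deselect any/all columns containing the word "Overall", because one of the character columns I have to keep has "Overall" in its header.

I can't simply deselect it by name or position, because the files are not consistent in position or naming. The only thing they have in common is that the column is numeric and its header contains "Overall". In addition, not all files have such a column, which is why I am trying to do this dynamically.

Below is a very simplified example of one such file as it appears in a data frame: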

#### reproducible example ####

columns <- c("rating A", "rating B", "Student Overall Rating", 
              "feedback 1", "feedback 2", "Student Overall Feedback")
c1 <- c(4, 4, 3)
c2 <- c(5, 4, 4)
c3 <- c(4.5, 4, 3.5)
c4 <- c("blah", "blah", "blah")
c5 <- c("blah", "blah", "blah")
c6 <- c("blahblah", "blahblah", "blahblah")

df <- as.data.frame(cbind(c1, c2, c3, c4, c5, c6))
names(df) <- columns
df$`rating A` <- as.numeric(df$`rating A`)
df$`rating B` <- as.numeric(df$`rating B`)
df$`Student Overall Rating` <- as.numeric(df$`Student Overall Rating`)

str(df)  # shows relative structure I am dealing with

'data.frame':   3 obs. of  6 variables:
 $ rating A                : num  4 4 3
 $ rating B                : num  5 4 4
 $ Student Overall Rating  : num  4.5 4 3.5
 $ feedback 1              : chr  "blah" "blah" "blah"
 $ feedback 2              : chr  "blah" "blah" "blah"
 $ Student Overall Feedback: chr  "blahblah" "blahblah" "blahblah"

I have searched extensively and tried a few things:

df <- df %>% select(!intersect(is.numeric(df), df %like% "Overall"))

This gives me:

Error in `select()`:
! Can't subset columns with `intersect(is.numeric(df), df %like% "Overall")`.
✖ `intersect(is.numeric(df), df %like% "Overall")` must be numeric or character, not `FALSE`.

I also tried...

df <- df %>% select(!where(is.numeric | contains("Overall")))

which results in:

Error in `select()`:
! Problem while evaluating `where(is.numeric | contains("Overall"))`.
Caused by error in `is.numeric | contains("Overall")`:
! operations are possible only for numeric, logical or complex types

For files that have a numeric "Student Overall Rating" field, the result I want looks like this:

'data.frame':   3 obs. of  5 variables:
 $ rating A                : num  4 4 3
 $ rating B                : num  5 4 4
 $ feedback 1              : chr  "blah" "blah" "blah"
 $ feedback 2              : chr  "blah" "blah" "blah"
 $ Student Overall Feedback: chr  "blahblah" "blahblah" "blahblah"

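I know I can do either condition by itself, but is there a way to satisfy both at the same time? Is there another approach besides select(where()) that would do this? I'm really trying to avoid manually manipulating each file.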

r dataframe dplyr

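As an aside, in constructing your sample data, the reason you have to go back and convert columns with df$`rating A` <- as.numeric(df$`rating A`) is cbind. cbind creates a matrix, which can only hold one data type, so everything is converted to character. If you change as.data.frame(cbind(...)) to data.frame(), you skip creating the matrix and skip those erroneous conversions, which means you don't have to convert the data back to what it should be. You should generally avoid as.data.frame(cbind()) and just use data.frame().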
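A minimal sketch of what this comment suggests, applied to the question's example data. The check.names = FALSE and stringsAsFactors = FALSE arguments are additions not mentioned in the comment: the first keeps the headers with spaces intact, the second keeps the text columns as character on R versions before 4.0.

```r
# Build the example directly with data.frame(); each column keeps its own type,
# so no as.numeric() conversions are needed afterwards.
df <- data.frame(
  "rating A" = c(4, 4, 3),
  "rating B" = c(5, 4, 4),
  "Student Overall Rating" = c(4.5, 4, 3.5),
  "feedback 1" = c("blah", "blah", "blah"),
  "feedback 2" = c("blah", "blah", "blah"),
  "Student Overall Feedback" = c("blahblah", "blahblah", "blahblah"),
  check.names = FALSE,       # assumption: keep spaces in the column names
  stringsAsFactors = FALSE   # assumption: keep text as character on older R
)

str(df)  # numeric columns show as num, text columns as chr
```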

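1 upvote Gregor Thomas 11/17/2023

Welcome to the site! Thanks for including a minimal, reproducible, and verifiable example and for showing plenty of effort! This is a very good first question!

0 upvotes Brandon Signorino 11/17/2023

@GregorThomas Thanks for the note about building my sample data. It was a quick and dirty thing, and I was frustrated with the main problem, so I (obviously) didn't do it in the most efficient way. Also, thanks for the kind words. I'm often on the side of providing troubleshooting, so when I need help I try hard to help others help me.

0 upvotes Gregor Thomas 11/21/2023

Yes, as.data.frame(cbind()) is just an anti-pattern I see a lot that sometimes causes problems, so I try to point it out when I see it.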

Answer:

2 upvotes Gregor Thomas 11/17/2023 #1

You were so close!

library(dplyr)
df |> select(!(where(is.numeric) & contains("Overall")))
#   rating A rating B feedback 1 feedback 2 Student Overall Feedback
# 1        4        5       blah       blah                 blahblah
# 2        4        4       blah       blah                 blahblah
# 3        3        4       blah       blah                 blahblah

Some explanation:

df <- df %>% select(!intersect(is.numeric(df), df %like% "Overall"))

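The above fails because you are applying is.numeric to df, the data frame. A data frame is a list, it is not numeric, so is.numeric(df) is FALSE. The same goes for %like%. You need to work with the column names, not the data frame itself.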
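To illustrate the point about whole-data-frame versus per-column tests, here is a base R sketch on a reduced version of the example data. The variable names is_num, has_overall, and result are made up for illustration, and the use of vapply() and grepl() is one possible approach rather than the answer's own method. Note that grepl() with fixed = TRUE is case-sensitive, while contains() ignores case by default.

```r
# Reduced version of the example data (same column names, fewer columns).
df <- data.frame(
  "rating A" = c(4, 4, 3),
  "Student Overall Rating" = c(4.5, 4, 3.5),
  "Student Overall Feedback" = c("blahblah", "blahblah", "blahblah"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Applied to the whole data frame, is.numeric() gives a single answer.
is.numeric(df)
# [1] FALSE

# Applied column by column, it gives one logical value per column.
is_num <- vapply(df, is.numeric, logical(1))

# The name test runs on the column names, not on the data frame.
has_overall <- grepl("Overall", names(df), fixed = TRUE)

# Keep every column that is not (numeric AND has "Overall" in its name).
result <- df[, !(is_num & has_overall), drop = FALSE]
names(result)
# [1] "rating A"                 "Student Overall Feedback"
```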

df <- df %>% select(!where(is.numeric | contains("Overall")))

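This one is close, it's just like the !where used in my working solution. The problem here is that where() needs a function inside it. is.numeric is a function, but is.numeric | contains("Overall") is not a function. Putting contains() inside where() doesn't quite work either. where() will evaluate the function you give it on each column, one at a time. contains() wants to look at the names of a bunch of columns at once.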
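Since the question mentions that not every file has such a column, the working selection can be wrapped in a small helper and tried on both cases. This is only a sketch: drop_numeric_overall, with_col, and without_col are hypothetical names, and the two small data frames stand in for real survey files.

```r
library(dplyr)

# Hypothetical helper: drop columns that are numeric AND have "Overall"
# in the name. where() tests each column's contents, contains() tests names,
# and & intersects the two selections before ! negates the result.
drop_numeric_overall <- function(data) {
  data |> select(!(where(is.numeric) & contains("Overall")))
}

# A file that has the numeric "Overall" column.
with_col <- data.frame(
  "rating A" = c(4, 3),
  "Student Overall Rating" = c(4.5, 3.5),
  "Student Overall Feedback" = c("blahblah", "blahblah"),
  check.names = FALSE
)

# A file that does not have it; nothing should be dropped.
without_col <- data.frame(
  "rating A" = c(4, 3),
  "Student Overall Feedback" = c("blahblah", "blahblah"),
  check.names = FALSE
)

names(drop_numeric_overall(with_col))
names(drop_numeric_overall(without_col))
```

When no column meets both conditions the intersection is empty, so its negation keeps every column and the helper can be applied to every file without checking first.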
