How to remove the first words of specific rows that appear in another column?

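Question:

Is there a way to remove the first n words of the "content" column when those words are present in the "keyword" column?

I am working with a data frame similar to this one: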

keyword <- c("Mr. Jones", "My uncle Sam", "Tom", "", "The librarian")
content <- c("Mr. Jones is drinking coffee", "My uncle Sam is sitting in the kitchen with my uncle Richard", "Tom is playing with Tom's family's dog", "Cassandra is jogging for her first time", "The librarian is jogging with her")
data <- data.frame(keyword, content)
data

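In some cases, the words of the "keyword" string appear at the start of the "content" string. In other cases, the "keyword" string is empty and only "content" is filled.

What I want to achieve is to remove the first occurrence of the word combination in "keyword" from "content" in the same row. Unfortunately, I have only managed to write code that removes all matching words. But as you can see, some words (such as "uncle" or "Tom") occur more than once in a cell. I want to remove only the first occurrence and keep everything after it in the same cell.

My next best solution was the following code: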

data$content <- mapply(function(x, y) gsub(x, "", y), gsub(" ", "|", data$keyword), data$content)

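This code removes from "content" every word that is present in "keyword" of the same row. (It was originally posted here.)

Another option I tried was to design a function for this. First, I created a new variable that counts the number of words in the "keyword" string of the corresponding row: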

numw <- lengths(gregexpr("\\S+", data$keyword))
data <- cbind(data, numw)

Second, I tried to write a function that removes the first n words of content[i], where n = numw[i]:

shorten <- function(v, z){
  v <- gsub(".*^\\w+", z, v)
}

shorten(data$content, data$numw)

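Unfortunately, I could not get the function to work, and it produces the following error message:

Error in gsub(".*^\\w+", z, v) : invalid 'replacement' argument

So I would be very glad if someone could help me write a function that handles this problem properly.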
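
Two details of the word-count attempt are worth noting. The replacement argument of gsub() is the text that gets substituted in, not a count, so passing numw there cannot drop n words. Also, lengths(gregexpr("\\S+", "")) returns 1 for an empty string, so the empty keyword in row 4 would be counted as one word. The following is only a sketch of how the counting idea might be made to work, assuming words are separated by whitespace; drop_first_words is a hypothetical helper name, and the function removes the first n words whether or not they actually match the keyword.

```r
keyword <- c("Mr. Jones", "My uncle Sam", "Tom", "", "The librarian")
content <- c("Mr. Jones is drinking coffee",
             "My uncle Sam is sitting in the kitchen with my uncle Richard",
             "Tom is playing with Tom's family's dog",
             "Cassandra is jogging for her first time",
             "The librarian is jogging with her")
data <- data.frame(keyword, content, stringsAsFactors = FALSE)

# Count words per keyword; an empty keyword counts as 0 words
numw <- ifelse(nzchar(data$keyword),
               lengths(gregexpr("\\S+", data$keyword)),
               0L)

# Hypothetical helper: drop the first n whitespace-separated words of each string
drop_first_words <- function(text, n) {
  # One pattern per row: n runs of non-whitespace, each followed by optional whitespace
  pattern <- sprintf("^(\\S+\\s*){%d}", n)
  # sub() uses only one pattern at a time, so pair each pattern with its text
  mapply(function(p, s) sub(p, "", s, perl = TRUE),
         pattern, text, USE.NAMES = FALSE)
}

result <- drop_first_words(data$content, numw)
result
```

Because sub() anchors the pattern at the start of the string with ^, later occurrences such as "Tom's" in row 3 are left untouched.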

r string data-manipulation gsub

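Answer:

library(tidyverse)

data |>
  # Treat empty keywords as missing so they can be skipped below
  mutate(keyword = na_if(keyword, '')) |>
  mutate(content = case_when(
    # str_remove() drops only the first match of keyword in content
    !is.na(keyword) ~ str_remove(content, keyword),
    # No keyword: leave content unchanged
    is.na(keyword) ~ content))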
#>         keyword                                          content
#> 1     Mr. Jones                               is drinking coffee
#> 2  My uncle Sam  is sitting in the kitchen with my uncle Richard
#> 3           Tom               is playing with Tom's family's dog
#> 4          <NA>          Cassandra is jogging for her first time
#> 5 The librarian                              is jogging with her
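
Note that str_remove() interprets keyword as a regular expression by default, so the "." in "Mr. Jones" matches any character, and the result keeps the space that followed the removed keyword. A base R variant is sketched below, assuming literal matching and trimmed output are wanted; remove_first is a hypothetical helper name, not something from the original answer.

```r
keyword <- c("Mr. Jones", "My uncle Sam", "Tom", "", "The librarian")
content <- c("Mr. Jones is drinking coffee",
             "My uncle Sam is sitting in the kitchen with my uncle Richard",
             "Tom is playing with Tom's family's dog",
             "Cassandra is jogging for her first time",
             "The librarian is jogging with her")
data <- data.frame(keyword, content, stringsAsFactors = FALSE)

# Hypothetical helper: remove the first literal occurrence of key from text
remove_first <- function(key, text) {
  # An empty keyword means there is nothing to remove
  if (!nzchar(key)) return(text)
  # fixed = TRUE matches the keyword literally; sub() replaces only the first match
  trimws(sub(key, "", text, fixed = TRUE))
}

# Apply row by row, pairing each keyword with its content
data$content <- mapply(remove_first, data$keyword, data$content,
                       USE.NAMES = FALSE)
data$content
```

This sketch removes the first occurrence wherever it appears in the string, not only at the start, which matches the sample data but may need an anchored pattern for other inputs.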
