Asked by: CrunchyTopping Asked: 9/2/2023 Last edited by: CrunchyTopping Updated: 9/2/2023 Views: 36
Remove words from string if duplicated in a group of rows
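Q:
I have a table where the first column (V1) holds a long string containing at least two shorter strings run together. For each row I need to pick the one that matches V2 and keep only that one in the longer string.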
library(qdap)
library(magrittr)
t<- read.table(text="
V1,V2
Video of all presentations and discussionPart 1Part 2Video,Part 1
Video of all presentations and discussionPart 1Part 2Video,Part 2
Video of all presentations and discussionPart 1Part 2Video,Video
Background - PDFVideo Soil management and update (pdf)Video,PDF
Background - PDFVideo Soil management and update (pdf)Video,Video
Background - PDFVideo Soil management and update (pdf)Video,Soil management and update (pdf)
Background - PDFVideo Soil management and update (pdf)Video,Video
",
header=T,sep = ",")
So for this example, I want to omit "Part 2" from V1 in the first row, and omit "Part 1" from V1 in the second row.
This is what I tried:
t%>%
split(.,.$V1)%>%
lapply(.,function(x){(unique(x$V2))})%>%
lapply(.,function(y){mgsub(pattern=y[[1]],replacement="",names(y))})
This attempt neither changes the longer string nor keeps the unique shorter string.
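One likely reason the attempt has no effect can be sketched in base R with two of the sample rows (the names `df`, `groups`, and `labels` are illustrative, not from the post). After the first `lapply`, each element is an unnamed character vector, so `names(y)` inside the second `lapply` is NULL; the long V1 string is the name of the list element, not of `y`. In addition, `y[[1]]` passes only the first V2 value as the pattern.

```r
# Two rows from the sample data, built directly instead of via read.table
df <- data.frame(
  V1 = rep("Video of all presentations and discussionPart 1Part 2Video", 2),
  V2 = c("Part 1", "Part 2"),
  stringsAsFactors = FALSE
)

groups <- split(df, df$V1)                              # one data frame per V1 value
labels <- lapply(groups, function(x) unique(x$V2))      # unnamed character vectors

names(labels)        # the V1 strings are the names of the list itself
names(labels[[1]])   # the vector inside has no names, so this is NULL
```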
The answer should look like this:
t<- read.table(text="
V1,V2
Video of all presentations and discussionPart 1,Part 1
Video of all presentations and discussionPart 2,Part 2
Video of all presentations and discussionVideo,Video
Background - PDF,PDF
Background - Video,Video
Background - Soil management and update (pdf),Soil management and update (pdf)
Background - Video,Video
",
header=T,sep = ",")
A:
1 upvote
andrew_reece
9/2/2023
#1
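If everything before the "-" is correct, and the only thing you want after the dash is the string in V2, then you can grab the first part of the string and concatenate it with V2: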
library(tidyverse)
t |>
mutate(str_segment = str_split(V1, "-", n = 2)) |>
unnest_wider(str_segment, names_sep = "_") |>
mutate(new_v1 = paste0(str_segment_1, "-", V2)) |>
select(new_v1, V2)
# A tibble: 5 × 2
new_v1 V2
<chr> <chr>
1 " Video of Animals-Elephant" Elephant
2 " Video of Animals-Rhino" Rhino
3 " Audio at loud volume-Sirens" Sirens
4 " Audio at loud volume-Horns" Horns
5 " Audio at loud volume-Crickets" Crickets
Alternatively:
t |>
mutate(prefix = map_chr(str_split(V1, "-", n = 2), \(x) pluck(x, 1)), # map_chr returns a character vector rather than a list
new_v1 = paste0(prefix, "-", V2)) |>
select(new_v1, V2)
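A minimal base R sketch of the same idea, run on two rows from the question's sample data rather than on the data shown in the output above. The names `df`, `has_dash`, and `prefix` are illustrative only. It assumes the first "-" marks the end of the part worth keeping; rows without a dash are set to NA instead of being guessed at.

```r
# Two rows from the question's sample data, one without a dash and one with
df <- data.frame(
  V1 = c("Video of all presentations and discussionPart 1Part 2Video",
         "Background - PDFVideo Soil management and update (pdf)Video"),
  V2 = c("Part 1", "PDF"),
  stringsAsFactors = FALSE
)

has_dash <- grepl("-", df$V1, fixed = TRUE)   # is there anything to split on?
prefix   <- sub("-.*$", "", df$V1)            # drop the first dash and everything after it

# Rebuild V1 only where a dash was found
df$new_v1 <- ifelse(has_dash, paste0(prefix, "-", df$V2), NA_character_)
df[, c("new_v1", "V2")]
```

The first row has no dash, so this approach has nothing to split on there. Because the split consumes the dash and everything after it, the space that follows the dash in the sample data is not carried over.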
Comments
0 upvotes
r2evans
9/2/2023
This is simple. Do you have to str_split and unnest? I think you can do
mutate(V1new = mapply(sub, x = V1, pattern = "(-)[^-]*$", replace = paste0("-", V2)))
0 upvotes
CrunchyTopping
9/2/2023
My bad, FYI: there isn't always a dash "-" to do the str_split on.
0 upvotes
andrew_reece
9/2/2023
@r2evans Yes, you're right, thanks for the simplification.
0 upvotes
andrew_reece
9/2/2023
@CrunchTopping Consider adding some detail to your post about which parts of the data are fixed/reliable and which parts vary.
0 upvotes
CrunchyTopping
9/2/2023
Okay, thanks.
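Since the thread notes that V1 does not always contain a dash, here is one possible base R sketch that does not split on "-" at all. It is not from the answer above. It assumes that, within each V1 group, the end of V1 is a run of that group's V2 values, optionally separated by whitespace; the helper `strip_trailing_labels` and the column `V1_new` are hypothetical names. The helper peels V2 values off the end of V1 until none match, and the row's own V2 is then appended to what remains.

```r
# The question's sample data, built directly instead of via read.table
g1 <- "Video of all presentations and discussionPart 1Part 2Video"
g2 <- "Background - PDFVideo Soil management and update (pdf)Video"
df <- data.frame(
  V1 = c(g1, g1, g1, g2, g2, g2, g2),
  V2 = c("Part 1", "Part 2", "Video",
         "PDF", "Video", "Soil management and update (pdf)", "Video"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: repeatedly remove any label found at the end of v1
strip_trailing_labels <- function(v1, labels) {
  labels <- labels[!is.na(labels) & nzchar(labels)]  # guard against empty labels
  labels <- labels[order(-nchar(labels))]            # try longer labels first
  repeat {
    trimmed <- trimws(v1, which = "right")           # ignore whitespace between labels
    hit <- labels[endsWith(trimmed, labels)]
    if (length(hit) == 0) return(v1)                 # nothing left to peel off
    v1 <- substr(trimmed, 1, nchar(trimmed) - nchar(hit[1]))
  }
}

# Collect the V2 values that belong to each V1 group
labels_by_group <- split(df$V2, df$V1)

# Strip the shared tail from every row, then append that row's own V2
prefix <- mapply(
  function(v1, labs) strip_trailing_labels(v1, unique(labs)),
  df$V1, labels_by_group[df$V1],
  USE.NAMES = FALSE
)
df$V1_new <- paste0(prefix, df$V2)
df[, c("V1_new", "V2")]
```

Only matches at the end of the string are removed, which is why the leading "Video" in the first group is left alone. If the real data has labels in the middle of V1 or in a different order than this assumption allows, the helper would need adjusting.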