Asked by: Selk   Asked: 11/15/2023   Last edited by: ThomasIsCoding, Selk   Updated: 11/17/2023   Views: 868
How do I split a string by a character without ignoring trailing split-characters?
Question:
I have a string similar to the following:
my_string <- "apple,banana,orange,"
I want to split it to produce the output:
list(c('apple', 'banana', 'orange', ""))
I thought strsplit would do this, but it treats the trailing ',' as if it weren't there:
my_string <- "apple,banana,orange,"
strsplit(my_string, split = ',')
#> [[1]]
#> [1] "apple" "banana" "orange"
Created on 2023-11-15 by the reprex package (v2.0.1)
What is the simplest way to achieve the desired output?
More test cases, with example strings and desired outputs:
string1 = "apple,banana,orange,"
output1 = list(c('apple', 'banana', 'orange', ''))
string2 = "apple,banana,orange,pear"
output2 = list(c('apple', 'banana', 'orange', 'pear'))
string3 = ",apple,banana,orange"
output3 = list(c('', 'apple', 'banana', 'orange'))
## Examples of non-comma separated strings
# '|' separator
string4 = "|apple|banana|orange|"
output4 = list(c('', 'apple', 'banana', 'orange', ''))
# 'x' separator
string5 = "xapplexbananaxorangex"
output5 = list(c('', 'apple', 'banana', 'orange', ''))
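Edit:
Ideally, the solution should generalize to any split character.
I would also prefer a base-R solution (though please still link any packages that provide this functionality, as their source code may be helpful to look at!).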
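To compare candidate answers against the test cases above, a small checking sketch may help. The names `cases` and `check_splitter()` are hypothetical, introduced only for illustration, and the expected outputs are stored as bare character vectors rather than wrapped in `list()`. As a baseline it runs plain `strsplit()` (with `fixed = TRUE` so `|` is treated literally) to show which cases it fails.

```r
# The question's test cases: input, separator, and expected fields.
cases <- list(
  list(x = "apple,banana,orange,",     sep = ",", want = c("apple", "banana", "orange", "")),
  list(x = "apple,banana,orange,pear", sep = ",", want = c("apple", "banana", "orange", "pear")),
  list(x = ",apple,banana,orange",     sep = ",", want = c("", "apple", "banana", "orange")),
  list(x = "|apple|banana|orange|",    sep = "|", want = c("", "apple", "banana", "orange", "")),
  list(x = "xapplexbananaxorangex",    sep = "x", want = c("", "apple", "banana", "orange", ""))
)

# Hypothetical helper: `splitter(x, sep)` must return a character vector.
# Returns one TRUE/FALSE per test case (TRUE = exact match).
check_splitter <- function(splitter, tests = cases) {
  vapply(tests, function(tc) identical(splitter(tc$x, tc$sep), tc$want),
         logical(1))
}

# Plain strsplit(), with fixed = TRUE so "|" is literal, as the candidate.
plain <- check_splitter(function(x, sep) strsplit(x, sep, fixed = TRUE)[[1]])
plain
#> [1] FALSE  TRUE  TRUE FALSE FALSE
```

Every failing case is one where the string ends with the separator.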
Answers:
12 votes
GuedesBF
11/15/2023
#1
Using stringr:
library(stringr)
str_split(my_string, ",")
[[1]]
[1] "apple" "banana" "orange" ""
Comments
1 vote
thelatemail
11/15/2023
This works (+1), but interestingly it still doesn't work with strsplit, only with stringr::str_split.
1 vote
rawr
11/15/2023
This is expected; the examples in ?strsplit point it out: ## Note that final empty strings are not produced
3 votes
Selk
11/15/2023
I think this answer can be simplified to just stringr::str_split(), since it handles both leading and trailing separators:
stringr::str_split(",apple,banana,orange,", pattern = ",")
3 votes
Selk
11/15/2023
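This is a great solution and may be useful to future viewers. The only reason it isn't marked as the accepted answer is my preference for a base-R solution.
1 vote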
Adriano Mello
11/15/2023
If you want simplicity, stringr::str_split_1(my_string, ",") returns a character vector instead of a list:
[1] "apple" "banana" "orange" ""
4 votes
Adesoji Alu
11/15/2023
#2
I used this:
my_string <- "apple,banana,orange,"
# Append an extra character (here 'X'), then split
result <- strsplit(paste0(my_string, "X"), ",X")
result
Then, for the test cases:
split_string <- function(s) {
  # Add a special character at the beginning and end if the string starts or ends with a comma
  if (startsWith(s, ",")) {
    s <- paste0("SPECIALCHAR", s)
  }
  if (endsWith(s, ",")) {
    s <- paste0(s, "SPECIALCHAR")
  }
  # Split the string by comma
  parts <- strsplit(s, ",", fixed = TRUE)[[1]]
  # Replace the special character with an empty string
  parts <- gsub("SPECIALCHAR", "", parts)
  return(parts)
}
# Test cases
string1 <- "apple,banana,orange,"
string2 <- "apple,banana,orange,pear"
string3 <- ",apple,banana,orange"
output1 <- split_string(string1)
output2 <- split_string(string2)
output3 <- split_string(string3)
output1 # Expected: "apple", "banana", "orange", ""
output2 # Expected: "apple", "banana", "orange", "pear"
output3 # Expected: "", "apple", "banana", "orange"
Comments
3 votes
thelatemail
11/15/2023
This doesn't work: it doesn't add an empty string at the end, and it doesn't split the original string either.
2 votes
thelatemail
11/15/2023
Though I think your initial idea was right: just add another separator, and then strsplit should work:
strsplit(paste0(my_string, ","), ",")
0 votes
Selk
11/15/2023
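@thelatemail strsplit(paste0(my_string, ","), ",") is another neat solution, but it's worth noting that it doesn't generalize to regex/escaped split values. It solves all my test cases, but only if, for the "|" separator, you use fixed=TRUE rather than trying to escape it with "\\|".
0 votes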
thelatemail
11/15/2023
@Selk - I think this works for me: strsplit(paste0(string4, "|"), split="\\|")
1 vote
Selk
11/15/2023
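@thelatemail Yes, I mean that if you generalize it into a function that works for any separator, i.e. strsplit2(x, sep), you'd have to add some logic to strip the double backslashes from sep before pasting, if that makes sense. I agree that your scan solution looks like the most promising base-R solution. If you're interested in putting together an answer describing both approaches, I think it would make a great official answer!
14 votes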
ThomasIsCoding
11/15/2023
#3
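Why doesn't strsplit give the desired output?
If you type ?strsplit, you will read the following statement:
Note that this means that if there is a match at the beginning of a (non-empty) string, the first element of the output is "", but if there is a match at the end of the string, the output is the same as with the match removed.
That is exactly what you see with strsplit: a leading separator produces "", but a trailing one does not.
Here are some demos: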
> strsplit("apple,banana,orange,", ",")
[[1]]
[1] "apple" "banana" "orange"
> strsplit(",apple,banana,orange,", ",")
[[1]]
[1] "" "apple" "banana" "orange"
> strsplit(",apple,banana,orange", ",")
[[1]]
[1] "" "apple" "banana" "orange"
> strsplit("apple,banana,orange", ",")
[[1]]
[1] "apple" "banana" "orange"
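One consequence of the documented rule that these demos do not show: only a single trailing empty string is dropped, so a string ending in two separators still keeps one "". This is also why appending one extra separator, as suggested elsewhere in this thread, recovers the missing field. A minimal base R check:

```r
# A single trailing separator yields no trailing "" ...
one <- strsplit("a,b,", ",")[[1]]
one        # "a" "b"

# ... but with two, only the last (empty) field is dropped.
two <- strsplit("a,b,,", ",")[[1]]
two        # "a" "b" ""

# Appending one extra separator therefore recovers the missing field.
fixed_up <- strsplit(paste0("a,b,", ","), ",")[[1]]
fixed_up   # "a" "b" ""
```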
Base R workaround
If you want a coding exercise, one base R option is to define a custom (recursive) function like the one below:
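f <- function(x, sep = ",") {
  # Everything before the first separator (sep is used as a regex)
  pat <- sprintf("^(.*?)%s.*", sep)
  s1 <- sub(pat, "\\1", x)
  # Everything after the first separator
  s2 <- sub(paste0("^.*?", sep), "", x)
  # Nothing was removed, so there is no separator left: x is the last field
  if (s2 == x) {
    return(x)
  }
  c(s1, Recall(s2, sep))
}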
Or with substr + regexpr:
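f <- function(x, sep = ",") {
  # Position of the first separator (-1 if there is none)
  idx <- regexpr(sep, x)
  # Text before and after it; idx + 1 assumes sep matches exactly one character
  s1 <- substr(x, 1, idx - 1)
  s2 <- substr(x, idx + 1, nchar(x))
  # No separator found: substr() returned the whole string
  if (s2 == x) {
    return(x)
  }
  c(s1, Recall(s2, sep))
}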
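A non-recursive base R alternative, sketched under the assumption that the separator should be matched literally (fixed = TRUE), so "|" or a multi-character separator needs no escaping: find every separator position with gregexpr(), then cut out the fields between them with substring(). The name split_keep_empty() is hypothetical, and this version handles one string at a time. Because it does not recurse, its call depth does not grow with the number of fields.

```r
# Hypothetical non-recursive splitter. `sep` is matched literally
# (fixed = TRUE), so "|" or "." need no escaping. Handles one string.
split_keep_empty <- function(x, sep = ",") {
  pos <- gregexpr(sep, x, fixed = TRUE)[[1]]
  if (pos[1] == -1) {
    return(x)                        # no separator: the whole string is one field
  }
  starts <- c(1, pos + nchar(sep))   # a field starts right after each separator
  ends <- c(pos - 1, nchar(x))       # and ends right before the next one
  substring(x, starts, ends)         # start > end gives "", i.e. an empty field
}

split_keep_empty("apple,banana,orange,")
# returns "apple" "banana" "orange" ""
split_keep_empty("xapplexbananaxorangex", sep = "x")
# returns "" "apple" "banana" "orange" ""
```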
so that
> f("apple,banana,orange,")
[1] "apple" "banana" "orange" ""
> f(",apple,banana,orange,")
[1] "" "apple" "banana" "orange" ""
> f(",apple,banana,orange")
[1] "" "apple" "banana" "orange"
> f("apple,banana,orange")
[1] "apple" "banana" "orange"
12 votes
thelatemail
11/15/2023
#4
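Pasting another separator onto the end should let strsplit work as intended. Otherwise, you can fall back on scan, the function underpinning read.csv/read.table (shown further down).
With strsplit: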
strsplit(paste0(string1, ","), ",")
##[[1]]
##[1] "apple" "banana" "orange" ""
More generally, accounting for separators that need regex escaping:
L <- list(string1, string2, string3, string4, string5)
mapply(
function(x,s) strsplit(paste0(x, gsub("\\\\", "", s)), split=s),
L,
c(",", ",", ",", "\\|", "x")
)
##[[1]]
##[1] "apple" "banana" "orange" ""
##
##[[2]]
##[1] "apple" "banana" "orange" "pear"
##
##[[3]]
##[1] "" "apple" "banana" "orange"
##
##[[4]]
##[1] "" "apple" "banana" "orange" ""
##
##[[5]]
##[1] "" "apple" "banana" "orange" ""
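A variant of the paste-a-separator approach that avoids stripping backslashes: with fixed = TRUE the separator is taken literally, so the same string can be both pasted onto the end and used as split. This is a sketch; strsplit_keep() is a hypothetical name, and it assumes the separator is literal text rather than a regex.

```r
# Hypothetical helper: split each string in `x` on the literal text `sep`,
# keeping leading and trailing empty fields.
strsplit_keep <- function(x, sep) {
  # The extra trailing `sep` is the match strsplit() discards, so the
  # genuine trailing empty field (if any) survives.
  strsplit(paste0(x, sep), split = sep, fixed = TRUE)
}

strings <- c("apple,banana,orange,", "apple,banana,orange,pear",
             ",apple,banana,orange", "|apple|banana|orange|",
             "xapplexbananaxorangex")
seps <- c(",", ",", ",", "|", "x")

# mapply() pairs each string with its separator; [[1]] unwraps the
# one-element list that strsplit() returns for a single string.
res <- mapply(function(x, s) strsplit_keep(x, s)[[1]], strings, seps,
              SIMPLIFY = FALSE, USE.NAMES = FALSE)
```

The separators are passed as plain "|" and "x", with no "\\|" escaping.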
The scan option:
scan(text=string1, sep=",", what="")
##Read 4 items
##[1] "apple" "banana" "orange" ""
Generalized:
mapply(
function(x,s) scan(text=x, sep=s, what=""),
L,
c(",", ",", ",", "|", "x")
)
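One caveat with scan() is that, by default, it treats ' and " as quote characters, so a field such as it's, or a quoted value containing the separator, is not split literally. Below is a sketch of a wrapper (scan_split() is a hypothetical name) that turns quoting off with quote = "" and silences the "Read N items" message with quiet = TRUE. Like the original, it assumes a single-character separator, which is what scan's sep argument accepts.

```r
# Hypothetical wrapper: split one string on the single-character `sep`
# using base::scan(), with quoting disabled so quotes stay literal.
scan_split <- function(x, sep = ",") {
  scan(text = x, what = "", sep = sep, quote = "", quiet = TRUE)
}

scan_split("apple,banana,orange,")        # "apple" "banana" "orange" ""
scan_split("|apple|banana|orange|", "|")  # "" "apple" "banana" "orange" ""
scan_split("it's,\"b,c\",")               # "it's" "\"b" "c\"" ""
```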
Comments
0 votes
ThomasIsCoding
11/15/2023
I think scan is the cheapest base R workaround for this question, cheers!
2 votes
Selk
11/15/2023
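Marked as the answer since it meets all the criteria (base R implementation, output exactly as stated in the question). For future reference, ThomasIsCoding's answer describes an alternative base R solution that is also very good. Anyone who doesn't need a base R implementation should see GuedesBF's answer for a simple solution using stringr.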
Comments
stringi::stri_split_fixed
strsplit
scan(text=my_string, sep=",", what="")