Asked by: bvowe  Asked: 11/9/2023  Last edited by: thelatemail, bvowe  Updated: 11/11/2023  Views: 47
R stringr Parse Cases Letters
Q:
HAVE            WANT1  WANT2
CLStephen Five  CL     Stephen Five
RTQQuent Lou X  RTQ    Quent Lou X
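Our school's system has a data-entry error. I have the column "HAVE" and want to split it into "WANT1" and "WANT2":
WANT1 = the leading run of CAPITAL letters, minus its last letter (the first n-1 of n capitals)
WANT2 = everything that remains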
A:
2 votes
thelatemail
11/9/2023
#1
Try this, in stringr and in base R:
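x <- c("CLStephen Five", "RTQQuent Lou X")

# stringr: the pattern matches from the first capital that is followed by a
# non-capital (i.e. the last of the leading capitals) to the end of the string
library(stringr)
str_remove(x, "[A-Z][^A-Z].+")    # drop that match, leaving WANT1
#[1] "CL" "RTQ"
str_extract(x, "[A-Z][^A-Z].+")   # keep that match, giving WANT2
#[1] "Stephen Five" "Quent Lou X"

# base R equivalents
sub("[A-Z][^A-Z].+", "", x)
#[1] "CL" "RTQ"
sub("[A-Z]+([A-Z][^A-Z].+)", "\\1", x)
#[1] "Stephen Five" "Quent Lou X"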
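The two columns can also be pulled out in a single pass with capture groups. The following is a minimal base R sketch, not part of the answer above: the anchored pattern, the helper `pick`, and the choice to return NA for strings that do not match are all assumptions made for illustration. `perl = TRUE` is used so that `[A-Z]` is not affected by locale-dependent range handling.

```r
x <- c("CLStephen Five", "RTQQuent Lou X")

# Group 1: the leading capitals except the last one.
# Group 2: the last leading capital, one non-capital, and the rest of the string.
pattern <- "^([A-Z]+)([A-Z][^A-Z].*)$"

# regexec() records the full match plus each capture group;
# regmatches() turns those positions into character vectors.
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Hypothetical helper: non-matching strings yield character(0), mapped here to NA.
pick <- function(parts, i) if (length(parts) >= i) parts[i] else NA_character_

result <- data.frame(
  HAVE  = x,
  WANT1 = vapply(m, pick, character(1), i = 2),
  WANT2 = vapply(m, pick, character(1), i = 3),
  stringsAsFactors = FALSE
)
result$WANT1
#[1] "CL"  "RTQ"
result$WANT2
#[1] "Stephen Five" "Quent Lou X"
```

Anchoring with `^` and `$` means a string either splits cleanly into both parts or is reported as NA, instead of being passed through unchanged as `sub()` would do on a non-match.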
1 vote
Adriano Mello
11/11/2023
#2
Another, newer solution: tidyr::separate_wider_regex
library(dplyr)
library(tidyr)

df <- tibble(have = c("CLStephen Five", "RTQQuent Lou X"))

# --------------------------
separate_wider_regex(
  df,
  cols = have,
  patterns = c(
    want1 = "[A-Z]+(?=[A-Z][^A-Z])",
    want2 = "[A-Z]{1}[^A-Z].*"))

# A tibble: 2 × 2 ---------
#   want1 want2
#   <chr> <chr>
# 1 CL    Stephen Five
# 2 RTQ   Quent Lou X
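Real data-entry errors are rarely uniform, so some rows may not start with a run of capitals at all. By default `separate_wider_regex()` raises an error for such rows. The sketch below assumes a hypothetical third row with no leading capitals and uses the documented `too_few = "align_start"` option, which pads short matches with NA instead of failing. It is an illustration under those assumptions, not part of the answer above.

```r
library(tidyr)
library(tibble)

# The third row is made up for illustration: it has no leading capital letters.
df <- tibble(have = c("CLStephen Five", "RTQQuent Lou X", "no leading capitals"))

out <- separate_wider_regex(
  df,
  cols = have,
  patterns = c(
    want1 = "[A-Z]+(?=[A-Z][^A-Z])",
    want2 = "[A-Z][^A-Z].*"
  ),
  # Pad rows that match too few pieces with NA rather than raising an error.
  too_few = "align_start"
)
out
```

While investigating which rows fail, `too_few = "debug"` can be used instead. It adds diagnostic columns to the output and is meant for exploration rather than final code.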