Asked by: emily20wood Asked: 5/31/2023 Last edited by: oguz ismail, emily20wood Updated: 6/20/2023 Views: 91
Re-arranging a list of names in R from "SURNAMES first names", to "first initial. SURNAMES"
Q:
I have a list of names that looks like this:
c("CASEY Aoife", "CREMEN Margaret", "MORCH-PEDERSEN Marie",
"RORVIK Jenny Marie", "MIGUEL GOMES Natalia", "ROHNER Maria-Clara")
To display them in a table, I would like them to look like this:
c("A. CASEY", "M. CREMEN", "M. MORCH-PEDERSEN",
"J. RORVIK", "N. MIGUEL GOMES", "M. ROHNER")
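The challenge is that some people have several first names or several surnames, and hyphenated names need to be handled as well.
I tried the function below, but it does not give the output I want: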
library(stringr)  # provides str_split() and str_extract()

convert_name <- function(name) {
  parts <- str_split(name, " ")[[1]]  # Split name into parts
  # Extract initials and last name
  initials <- str_extract(parts, "\\b\\p{L}")  # Extract first letter of each part
  last_name <- parts[length(parts)]
  # Concatenate initials and last name with space
  converted_name <- paste(initials, last_name, sep = ". ")
  return(converted_name)
}
Answers:
1 upvote
Vons
5/31/2023
#1
Use sapply to apply a function to each name that rearranges its parts.
x <- c("CASEY Aoife", "CREMEN Margaret", "MORCH-PEDERSEN Marie",
       "RORVIK Jenny Marie", "MIGUEL GOMES Natalia", "ROHNER Maria-Clara")
sapply(strsplit(x, " "), \(y) {
  # j is the index of the last leading part written entirely in uppercase
  j <- 1
  for (i in seq_along(y)) {
    if (identical(y[i], toupper(y[i]))) {
      j <- i
    } else {
      break
    }
  }
  # Initial of the first part after the surname, then the surname parts
  paste0(substr(y[j + 1], 1, 1), ". ", paste0(y[1:j], collapse = " "))
})
Another option without a for loop:
sapply(strsplit(x, " "), function(y) {
  ix <- y == toupper(y)  # TRUE for parts written entirely in uppercase
  paste0(substr(y[!ix][1], 1, 1), ". ", paste(y[ix], collapse = " "))
})
Output
[1] "A. CASEY" "M. CREMEN" "M. MORCH-PEDERSEN"
[4] "J. RORVIK" "N. MIGUEL GOMES" "M. ROHNER"
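Both versions above assume every surname part is written entirely in uppercase, so a name like "Van-DORP Wianka" (raised later in this thread) would leave the surname empty. As a minimal sketch, not part of the original answer, a part could instead be treated as surname when it contains two consecutive ASCII uppercase letters. The helper name initial_surname and that rule are assumptions made for illustration only.

```r
# Hypothetical helper: a part counts as surname if it contains two
# consecutive ASCII uppercase letters (an assumed rule, not from the answer).
initial_surname <- function(x) {
  vapply(strsplit(x, " ", fixed = TRUE), function(y) {
    is_surname <- grepl("[A-Z]{2}", y, perl = TRUE)
    first_name <- y[!is_surname][1]  # first part that is not surname
    paste0(substr(first_name, 1, 1), ". ",
           paste(y[is_surname], collapse = " "))
  }, character(1))
}

initial_surname(c("RORVIK Jenny Marie", "Van-DORP Wianka",
                  "FERNANDES-Da-VEIGA Paulo"))
```

This rule would misclassify surnames with accented capitals or a first name typed in capitals, so it is only a starting point.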
3 upvotes
GKi
5/31/2023
#2
You can use sub:
sub("(.*[A-Z]) ([A-Z]).*", "\\2. \\1", s)
#[1] "A. CASEY" "M. CREMEN" "M. MORCH-PEDERSEN"
#[4] "J. RORVIK" "N. MIGUEL GOMES" "M. ROHNER"
#[7] "P. FERNANDES-Da-VEIGA" "W. Van-DORP" "G. De-VITA"
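Here (.*[A-Z]) matches everything up to and including an uppercase letter that is followed by a space, and the parentheses () store that match in \\1. The space is followed by an uppercase letter, which ([A-Z]) stores in \\2, and the final .* matches whatever comes after it.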
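To see what the two groups capture, the same pattern can be passed to regexec and regmatches from base R. This is only an illustrative sketch; the variable names pattern, m and groups are not from the answer.

```r
s <- c("RORVIK Jenny Marie", "MIGUEL GOMES Natalia")
pattern <- "(.*[A-Z]) ([A-Z]).*"

# Each list element holds the full match first, then group 1, then group 2
m <- regmatches(s, regexec(pattern, s))

# Keep only the two capture groups, one row per name
groups <- t(sapply(m, `[`, 2:3))
colnames(groups) <- c("surname", "initial")
groups
```

Because .* is greedy, the first group extends to the last place where an uppercase letter, a space and another uppercase letter meet, which is why "MIGUEL GOMES" stays together while "Jenny Marie" is reduced to its first initial.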
Data
s <- c("CASEY Aoife", "CREMEN Margaret", "MORCH-PEDERSEN Marie",
"RORVIK Jenny Marie", "MIGUEL GOMES Natalia", "ROHNER Maria-Clara",
"FERNANDES-Da-VEIGA Paulo", "Van-DORP Wianka", "De-VITA Giuseppe")
Comments
0 upvotes
emily20wood
6/1/2023
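Thanks, this works well with my example. I have tried it on a larger dataset and realised that I have some more complex names with lowercase letters in the surname, e.g. c("FERNANDES-Da-VEIGA Paulo", "Van-DORP Wianka", "De-VITA Giuseppe"). Any ideas on how to account for these as well as those in my original example?
0 upvotes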
GKi
6/1/2023
See the update. Hopefully this works for the other cases.
Comments
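sub("^([-A-Z]+)\\s+([A-Z]).+$", "\\2. \\1", x, perl=TRUE) handles hyphenated surnames through ([-A-Z]+). Using ([A-Z ]+) instead, with a space after the Z, covers surnames made of several words such as "MIGUEL GOMES Natalia". The call passes perl=TRUE.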
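The effect of the two character classes in this comment can be checked side by side. The sketch below is an illustration only: the third class, [-A-Z ], which allows both hyphens and spaces, is an assumption added here and does not appear in the thread, and the names classes and res are likewise hypothetical.

```r
x <- c("MORCH-PEDERSEN Marie", "MIGUEL GOMES Natalia")

# Character classes to try for the surname group; "both" is an assumed variant
classes <- c(hyphen = "[-A-Z]", space = "[A-Z ]", both = "[-A-Z ]")

# One column per class, one row per name
res <- sapply(classes, function(cls) {
  sub(paste0("^(", cls, "+)\\s+([A-Z]).+$"), "\\2. \\1", x, perl = TRUE)
})
res
```

With the hyphen-only class the two-word surname is cut at its first space, and with the space-only class the hyphenated name is left unchanged because the pattern does not match. None of these three classes covers surnames containing lowercase letters such as "Van-DORP".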