Asked by: george1994 Asked: 10/11/2023 Updated: 10/11/2023 Views: 47
How can I accurately detect city names in a character vector in R?
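Q:
I have a character vector of names and want to accurately identify whether each element contains a city name. To do this, I initially used the following code: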
name <- c(
  "Business Applications for New York",
  "Proprietors' Farm Income in New York",
  "Farm Business (Included in Nonfinancial Corporate and Noncorporate Business Sectors); Nonresidential Structures, Current Cost Basis, Transactions"
)
library(maps)
city <- c()
for (j in 1:length(name)) {
  testresult <- c()
  for (i in 1:length(us.cities$name)) {
    testresult[i] <- agrepl(us.cities$name[i], name[j], max.distance = 3, ignore.case = TRUE, fixed = TRUE)
  }
  if (sum(testresult > 0)) {
    city[j] <- 1
  } else {
    city[j] <- 0
  }
}
city
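However, this code wrongly concludes that every element of the name vector contains a city name. Is there a better way to accurately detect city names in each element of a character vector in R? Your insights and code examples would be greatly appreciated. Thank you!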
A:
1 upvote
jpsmith
10/11/2023
#1
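In this case, one approach could be to leverage the built-in state.abb, which contains the state abbreviations, and remove these from the maps us.cities$name data using gsub with paste(..., collapse = "|"). Then use grepl to see if there are any matches: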
cities_only <- trimws(gsub(paste(state.abb, collapse = "|"), "", us.cities$name))
# See comparison:
head(us.cities$name)
# [1] "Abilene TX" "Akron OH" "Alameda CA" "Albany GA" "Albany NY" "Albany OR"
head(cities_only)
# [1] "Abilene" "Akron" "Alameda" "Albany" "Albany" "Albany"
grepl(paste0(cities_only, collapse = "|"), name)
# [1] TRUE TRUE FALSE
# or if you want it 1/0, add `+`:
+grepl(paste0(cities_only, collapse = "|"), name)
# [1] 1 1 0
(Note: trimws trims the whitespace left behind by gsub.)
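A possible refinement, offered only as a sketch: the pattern built from state.abb is unanchored, so gsub removes a matching two-letter code wherever it occurs in a name, and the combined city pattern is interpreted as a regular expression. The variant below instead strips only a trailing two-letter code with sub and matches each city as a literal string with fixed = TRUE. The city_state vector is a hypothetical stand-in for us.cities$name (assumed to follow the same "City ST" format) so that the example runs without the maps package, and the third name is shortened for readability.

```r
# Hypothetical stand-in for maps::us.cities$name ("City ST" format);
# swap in us.cities$name when the maps package is available.
city_state <- c("Abilene TX", "Albany NY", "New York NY", "St. Louis MO")

name <- c(
  "Business Applications for New York",
  "Proprietors' Farm Income in New York",
  "Farm Business (Included in Nonfinancial Corporate and Noncorporate Business Sectors)"
)

# Remove only a trailing space plus two capital letters, not every occurrence.
cities_only <- unique(sub(" [A-Z]{2}$", "", city_state))

# For each name, test every city as a literal string (no regex interpretation),
# so characters such as "." in "St. Louis" are matched as written.
has_city <- vapply(
  name,
  function(x) any(vapply(cities_only, grepl, logical(1), x = x, fixed = TRUE)),
  logical(1),
  USE.NAMES = FALSE
)
has_city
```

Literal matching still finds substrings, so a short city name can match inside a longer word; handling word boundaries would need to be added separately if that matters for the data.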