Asked by: Gabriel Choo  Asked: 9/26/2023  Last edited by: Gabriel Choo  Updated: 9/26/2023  Views: 33
How to Implement Custom Distance Metrics in Sklearn Nearest Neighbor
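Q:
I am trying to implement my own distance metric in Sklearn's nearest neighbors, specifically the Jaro distance, but I am getting an error. I searched online but could not find a solution. Here is what I did: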
# libraries
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import CountVectorizer
import jellyfish
# Jaro distance function
def jaro_distance(s1,s2):
    return 1 - jellyfish.jaro_similarity(s1,s2)
# create samples and a namelist to compare against samples
samples = pd.DataFrame({'NAME':['Saige Fuentes','Bowen Higgins','Kylan Gentry','Amelie Griffith','Jaylen Blackwell']})
namelist = pd.DataFrame({'NAME':['Bowen Higgins','Jaylen Blackwell','Marceline Avila']})
cvec = CountVectorizer(ngram_range=(1,4))
X_names = cvec.fit_transform(namelist['NAME'])
nbrs = NearestNeighbors(n_neighbors = 1, metric = jaro_distance).fit(X_names)
input_vec = cvec.transform(samples['NAME'])
distances, indices = nbrs.kneighbors(input_vec, n_neighbors = 1)
This is where I get a TypeError: 'csr_matrix' object cannot be converted to 'PyString'.
I would like to know how to solve this. Thanks!
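One possible direction, offered as a sketch rather than a verified answer: the error text suggests the callable was handed rows of the vectorized matrix, while jellyfish expects plain strings. A commonly used workaround is to give NearestNeighbors integer positions instead of text and let the metric look the strings up itself. The names `all_names` and `jaro_distance_by_index` below are hypothetical helpers, not part of either library, and the DataFrames from the question are reduced to plain lists. `algorithm='brute'` is set on the assumption that the tree-based searches are unsuitable here, since they expect a true metric. If jellyfish is not installed, the sketch falls back to `difflib`, which is a stand-in and not the Jaro similarity.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

try:
    from jellyfish import jaro_similarity  # same function the question uses
except ImportError:
    # Stand-in so the sketch still runs without jellyfish.
    # NOTE: difflib's ratio is NOT the Jaro similarity.
    from difflib import SequenceMatcher

    def jaro_similarity(s1, s2):
        return SequenceMatcher(None, s1, s2).ratio()

samples = ['Saige Fuentes', 'Bowen Higgins', 'Kylan Gentry',
           'Amelie Griffith', 'Jaylen Blackwell']
namelist = ['Bowen Higgins', 'Jaylen Blackwell', 'Marceline Avila']

# Shared lookup table (hypothetical helper): reference names first,
# then the query names.
all_names = namelist + samples

def jaro_distance_by_index(a, b):
    # a and b arrive as 1-D numeric arrays holding one index each.
    s1 = all_names[int(a[0])]
    s2 = all_names[int(b[0])]
    return 1.0 - jaro_similarity(s1, s2)

# Column vectors of integer positions into all_names.
X_ref = np.arange(len(namelist)).reshape(-1, 1)
X_query = np.arange(len(namelist), len(all_names)).reshape(-1, 1)

# Brute-force search evaluates the callable on every query/reference pair.
nbrs = NearestNeighbors(n_neighbors=1, metric=jaro_distance_by_index,
                        algorithm='brute').fit(X_ref)
distances, indices = nbrs.kneighbors(X_query)

for name, dist, idx in zip(samples, distances[:, 0], indices[:, 0]):
    print(f"{name!r} -> {namelist[idx]!r} (distance {dist:.3f})")
```

With this layout the CountVectorizer step is no longer needed, because the distance is computed on the raw strings. The cost is one Python call per pair, so it may be slow for long name lists.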
A: No answers yet