Asked by: econ_grad12345 Asked: 5/22/2023 Updated: 5/22/2023 Views: 34
recordlinkage in Python returns 0 candidates: how to fix it?
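Q:
I am trying to merge two datasets by fuzzy matching with the recordlinkage package. I am sure there are no duplicate unique IDs in either dataset. However, I get an error saying there are no potential candidates. How can I fix this?
Here is my code: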
import pandas as pd
import recordlinkage

# Load both datasets, using the unique 'id' column as the index
reference_usa = pd.read_csv('all_reference_usa.csv', index_col='id')
oc_sample = pd.read_csv('oc_sample.csv', index_col='id', low_memory=False)

# Block on 'state': only records with identical 'state' values become candidate pairs
indexer = recordlinkage.Index()
indexer.block(left_on='state', right_on='state')
candidates = indexer.index(reference_usa, oc_sample)
print(len(candidates))

# Compare candidate pairs: exact match on state, fuzzy match on company name
compare = recordlinkage.Compare()
compare.exact('state', 'state', label='state')
compare.string('companyname', 'name', threshold=0.95, label='company')
features = compare.compute(candidates, reference_usa, oc_sample)
Here is the error:
/Users/anaconda3/lib/python3.10/site-packages/recordlinkage/algorithms/string.py:55: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
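One thing worth checking (a guess, not a confirmed diagnosis): blocking keeps only pairs whose `state` values are exactly equal, so stray whitespace, different casing, or abbreviations in one file and full names in the other would leave zero candidates. The FutureWarning shown is likely only a side effect of comparing an empty candidate set. The sketch below uses only the standard library to compare the blocking keys of two columns before and after a simple cleanup. `normalize_key` and `blocking_key_report` are hypothetical helper names, and the sample values are made up for illustration.

```python
import math
from collections import Counter


def normalize_key(value):
    """Return a trimmed, upper-cased string key, or None for missing values."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    text = str(value).strip().upper()
    return text or None


def blocking_key_report(left_values, right_values):
    """Compare blocking keys of two columns, raw and after normalization."""
    left_values = list(left_values)
    right_values = list(right_values)

    # Raw keys: what exact-equality blocking would see (missing values skipped)
    left_raw = {v for v in left_values if normalize_key(v) is not None}
    right_raw = {v for v in right_values if normalize_key(v) is not None}

    # Normalized keys, counted so the number of candidate pairs can be estimated
    left_norm = Counter(k for k in map(normalize_key, left_values) if k is not None)
    right_norm = Counter(k for k in map(normalize_key, right_values) if k is not None)
    shared = sorted(set(left_norm) & set(right_norm))

    return {
        "raw_shared": sorted(left_raw & right_raw, key=str),
        "normalized_shared": shared,
        "pairs_after_normalizing": sum(left_norm[k] * right_norm[k] for k in shared),
    }


# Made-up sample values: trailing space and lower case on the left side
left = ["CA ", "ny", "TX"]
right = ["CA", "NY", "FL", float("nan")]
report = blocking_key_report(left, right)
print(report)
```

With the real data, the same function could be called as `blocking_key_report(reference_usa['state'], oc_sample['state'])`. If `raw_shared` is empty but `normalized_shared` is not, cleaning the `state` column in both frames before calling `indexer.block` may be enough. If both are empty, the two files probably encode the state differently and need a mapping first.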
A: No answers yet