Python recordlinkage returns 0 candidates: how do I fix it?

Asked by: econ_grad12345 Asked: 5/22/2023 Updated: 5/22/2023 Views: 34

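Q:

I am trying to merge two datasets by fuzzy matching with record linkage. I am sure that neither dataset contains duplicate unique IDs. However, the error I get says there are no potential candidates. How can I fix this error?

Here is my code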

import pandas as pd
import recordlinkage

# Load both datasets, using the unique 'id' column as the index
reference_usa = pd.read_csv('all_reference_usa.csv', index_col='id')
oc_sample = pd.read_csv('oc_sample.csv', index_col='id', low_memory=False)

# Block on state: only records with equal 'state' values become candidate pairs
indexer = recordlinkage.Index()
indexer.block(left_on='state', right_on='state')
candidates = indexer.index(reference_usa, oc_sample)
print(len(candidates))

# Compare candidate pairs: exact match on state, fuzzy match on company name
compare = recordlinkage.Compare()
compare.exact('state', 'state', label='state')
compare.string('companyname', 'name', threshold=0.95, label='company')
features = compare.compute(candidates, reference_usa, oc_sample)

Here is the error

/Users/anaconda3/lib/python3.10/site-packages/recordlinkage/algorithms/string.py:55: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
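
A possible first check, offered as a sketch and not a confirmed diagnosis: blocking pairs records only when the blocking key is exactly equal on both sides, so zero candidates would result if the `state` values are written differently in the two files (for example `CA` vs. `ca`, or with trailing spaces). The warning above mentions an empty Series, which appears consistent with an empty candidate set. The helper names `key_counts` and `blocking_overlap` and the sample values below are hypothetical, and treating missing values as never matching is an assumption. The sketch uses only the standard library and counts how many pairs exact blocking would produce, before and after a simple normalization.

```python
from collections import Counter


def is_missing(value):
    """True for None and for float NaN (NaN is the only float unequal to itself)."""
    return value is None or (isinstance(value, float) and value != value)


def key_counts(values, normalize=False):
    """Count occurrences of each blocking key, skipping missing values."""
    counts = Counter()
    for value in values:
        if is_missing(value):
            continue
        # Optional normalization: cast to text, trim whitespace, upper-case.
        key = str(value).strip().upper() if normalize else value
        if normalize and key == "":
            continue  # a blank string carries no usable key
        counts[key] += 1
    return counts


def blocking_overlap(left_values, right_values, normalize=False):
    """Return (shared_keys, n_pairs) that exact-equality blocking would yield."""
    left = key_counts(left_values, normalize)
    right = key_counts(right_values, normalize)
    shared = set(left) & set(right)
    # Each shared key contributes (left count) x (right count) candidate pairs.
    n_pairs = sum(left[key] * right[key] for key in shared)
    return shared, n_pairs


# Hypothetical sample values standing in for the two 'state' columns.
left_states = ["CA", "NY ", "tx", float("nan")]
right_states = ["ca", "NY", "TX", "TX"]

raw_result = blocking_overlap(left_states, right_states)
clean_result = blocking_overlap(left_states, right_states, normalize=True)
print(raw_result)
print(clean_result)
```

With the real data, the same call could be made as `blocking_overlap(reference_usa['state'], oc_sample['state'])`, since iterating a pandas Series yields its values. If the raw pair count is 0 but the normalized count is positive, cleaning the `state` column in both frames before calling `indexer.index` would likely produce candidates. If both counts are 0, the two files probably encode states differently (for example abbreviations vs. full names) or share no states at all.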

python pandas record-linkage candidate-key


A: No answers yet