Asked by: spectre Asked: 10/26/2023 Last edited by: XMehdi01, spectre Updated: 11/7/2023 Views: 51
Create nested dictionary based on some custom rules
Q:
I have a Python dictionary like this:
ip_dict = {
    "img_folder/144-64ee3d9bb7-3.png": "COMMERCIAL PROPERTY ",
    "img_folder/144-64ee3d9bb7-2.png": "CBIC COMMERCIAL ",
    "img_folder/144-64ee3d9bb7-4.png": "CBIC COMMERCIAL GENERAL",
    "img_folder/144-64ee3d9bb7-1.png": "Contractors Bonding",
    "img_folder/144-64ee3d9bb7-5.png": "CBIC",
    "img_folder/Excess-Liability-8.png": " Energy laswance ",
    "img_folder/144-64ee3d9bb7-0.png": "CONTRACTORS BONDING AND INSURANCE ",
    "img_folder/Excess-Liability-10.png": " FOLLOWING FORM",
    "img_folder/Excess-Liability-14.png": " (2) property and",
    "img_folder/Excess-Liability-0.png": " Energy ",
    "img_folder/Excess-Liability-5.png": " The additional premium",
    "img_folder/Excess-Liability-3.png": "Ein Enos asurance Maral",
    "img_folder/Excess-Liability-4.png": " IV. Conditions ",
    "img_folder/Excess-Liability-13.png": " FOLLOWING FORM ",
    "img_folder/Excess-Liability-12.png": " FOLLOWING FORM EXCESS",
    "img_folder/Excess-Liability-9.png": " Surplus Lines",
    "img_folder/Excess-Liability-11.png": " ALL OTHER TERMS",
    "img_folder/Excess-Liability-2.png": " Il. Limit of",
    "img_folder/Excess-Liability-6.png": " (G) Notice of",
    "img_folder/Excess-Liability-7.png": "Ss So Ss The ",
    "img_folder/Excess-Liability-1.png": "eee ee ee"
}
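It contains text extracted from the pages of 2 different PDF files (144-64ee3d9bb7-3 and Excess-Liability). I want to convert the dictionary above into a nested dictionary in which the top-level keys are the PDF names and each nested dictionary holds the matching entries from above. So the output would look like this: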
op_dict = {
    "144-64ee3d9bb7.png": {
        "img_folder/144-64ee3d9bb7-3.png": "COMMERCIAL PROPERTY ",
        "img_folder/144-64ee3d9bb7-2.png": "CBIC COMMERCIAL ",
        "img_folder/144-64ee3d9bb7-4.png": "CBIC COMMERCIAL GENERAL",
        "img_folder/144-64ee3d9bb7-1.png": "Contractors Bonding",
        "img_folder/144-64ee3d9bb7-5.png": "CBIC",
        "img_folder/144-64ee3d9bb7-0.png": "CONTRACTORS BONDING AND INSURANCE "
    },
    "Excess Liability.png": {
        "img_folder/Excess Liability-8.png": " Energy laswance ",
        "img_folder/Excess Liability-10.png": " FOLLOWING FORM",
        "img_folder/Excess Liability-14.png": " (2) property and",
        "img_folder/Excess Liability-0.png": " Energy ",
        "img_folder/Excess Liability-5.png": " The additional premium",
        "img_folder/Excess Liability-3.png": "Ein Enos asurance Maral",
        "img_folder/Excess Liability-4.png": " IV. Conditions ",
        "img_folder/Excess Liability-13.png": " FOLLOWING FORM ",
        "img_folder/Excess Liability-12.png": " FOLLOWING FORM EXCESS",
        "img_folder/Excess Liability-9.png": " Surplus Lines",
        "img_folder/Excess Liability-11.png": " ALL OTHER TERMS",
        "img_folder/Excess Liability-2.png": " Il. Limit of",
        "img_folder/Excess Liability-6.png": " (G) Notice of",
        "img_folder/Excess Liability-7.png": "Ss So Ss The ",
        "img_folder/Excess Liability-1.png": "eee ee ee"
    }
}
I tried the following logic, but it does not work as expected:
op_dict = {}
for key, value in ip_dict.items():
    doc_name = key.split("/")[-1]
    if doc_name not in op_dict:
        op_dict[doc_name] = {}
    op_dict[doc_name][key] = value
Any help is appreciated!
A:
0 votes
Marcin Mrugas
10/26/2023
#1
You also need to strip the trailing number and add the extension back to the file name.
op_dict = {}
for key, value in ip_dict.items():
    # "img_folder/144-64ee3d9bb7-3.png" -> "144-64ee3d9bb7-3.png"
    doc_name_with_number = key.split("/")[-1]
    # Drop the last "-"-separated piece (page number plus extension)
    array_without_number = doc_name_with_number.split("-")[:-1]
    doc_name = "-".join(array_without_number)
    doc_name_with_extension = f"{doc_name}.png"
    if doc_name_with_extension not in op_dict:
        op_dict[doc_name_with_extension] = {}
    op_dict[doc_name_with_extension][key] = value
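The same idea can be written more compactly. The following is only a sketch: the helper name `group_by_document`, its `extension` parameter, and the three-entry `sample` are illustrative and not part of the answer above. It assumes every file name ends in `-<page number>.<extension>`; a name without any hyphen would end up under the bare extension.

```python
def group_by_document(pages, extension=".png"):
    """Group page entries under the name of the document they came from."""
    grouped = {}
    for path, text in pages.items():
        file_name = path.split("/")[-1]           # e.g. "144-64ee3d9bb7-3.png"
        # rpartition splits at the LAST hyphen, so hyphens inside the
        # document name itself are preserved.
        stem, _, _ = file_name.rpartition("-")    # e.g. "144-64ee3d9bb7"
        # setdefault creates the inner dict the first time a document is seen.
        grouped.setdefault(stem + extension, {})[path] = text
    return grouped


# Small illustrative subset of the question's data
sample = {
    "img_folder/144-64ee3d9bb7-3.png": "COMMERCIAL PROPERTY ",
    "img_folder/Excess-Liability-8.png": " Energy laswance ",
    "img_folder/144-64ee3d9bb7-5.png": "CBIC",
}
result = group_by_document(sample)
print(list(result))
```

With this sample, the two 144-64ee3d9bb7 pages land under one key and the Excess-Liability page under another.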
0 votes
krisstinkou
10/26/2023
#2
As I understand it, you need to remove the unique number from the document name. You can do it as follows (if you need the file extension):
import re
op_dict = {}
for key, value in ip_dict.items():
    doc_name = key.split("/")[-1]
    doc_name = "".join(re.split(r"-\d+(\.\w+)$", doc_name))
    if doc_name not in op_dict:
        op_dict[doc_name] = {}
    op_dict[doc_name][key] = value
In this case you get the following names: 144-64ee3d9bb7.png, Excess-Liability.png
Or, if you only need the name (without the file extension):
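The extension survives here because `re.split` includes the text of any capturing group in its result list. A short check on a single file name (the variable names are illustrative) shows the difference between a pattern with and without the group:

```python
import re

name = "Excess-Liability-10.png"

# With a capturing group around the extension, re.split keeps ".png"
# as its own list element, so joining the pieces restores it.
with_group = re.split(r"-\d+(\.\w+)$", name)
print(with_group)            # ['Excess-Liability', '.png', '']
print("".join(with_group))   # Excess-Liability.png

# Without the group, the whole "-10.png" suffix is consumed as the separator.
without_group = re.split(r"-\d+\.\w+$", name)
print(without_group)         # ['Excess-Liability', '']
```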
import re
op_dict = {}
for key, value in ip_dict.items():
    doc_name = key.split("/")[-1]
    doc_name = re.split(r"-\d+\.\w+$", doc_name)[0]
    if doc_name not in op_dict:
        op_dict[doc_name] = {}
    op_dict[doc_name][key] = value
In this case you get the following names: 144-64ee3d9bb7, Excess-Liability
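Both variants could be folded into one function, roughly as sketched below. The names `nest_by_pdf`, `PAGE_SUFFIX`, and `keep_extension` are hypothetical, and skipping file names that lack a `-<number>.<extension>` suffix is an assumption made here, not something the question specifies.

```python
import re
from collections import defaultdict

# Matches the trailing "-<page number>.<extension>" part of a file name
PAGE_SUFFIX = re.compile(r"-\d+(\.\w+)$")


def nest_by_pdf(pages, keep_extension=True):
    """Nest page entries under their PDF name, with or without the extension."""
    nested = defaultdict(dict)
    for path, text in pages.items():
        file_name = path.split("/")[-1]
        match = PAGE_SUFFIX.search(file_name)
        if match is None:
            continue  # assumption: ignore names without a page-number suffix
        doc_name = file_name[:match.start()]
        if keep_extension:
            doc_name += match.group(1)
        nested[doc_name][path] = text
    return dict(nested)


# Small illustrative subset of the question's data
sample = {
    "img_folder/144-64ee3d9bb7-0.png": "CONTRACTORS BONDING AND INSURANCE ",
    "img_folder/Excess-Liability-10.png": " FOLLOWING FORM",
    "img_folder/Excess-Liability-9.png": " Surplus Lines",
}
with_ext = nest_by_pdf(sample)
without_ext = nest_by_pdf(sample, keep_extension=False)
print(list(with_ext))
print(list(without_ext))
```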