Asked by: skeetastax  Asked: 11/8/2023  Updated: 11/8/2023  Views: 98
How do I calculate the sum and the average of all values in a python dict by key name?
Q:
I have a dict object from which I want to calculate the sum and the mean of specifically named values.
Setup:
import yaml  # PyYAML

tree_str = """{
    'trees': [
        {
            'tree_idx': 0,
            'dimensions': (1120, 640),
            'branches': [
                'leaves': [
                    {'geometry': [[0.190673828125, 0.0859375], [0.74609375, 0.1181640625]]},
                    {'geometry': [[0.1171875, 0.1162109375], [0.8076171875, 0.15625]]}
                ],
                'leaves': [
                    {'geometry': [[0.2197265625, 0.1552734375], [0.7119140625, 0.1943359375]]},
                    {'geometry': [[0.2060546875, 0.1923828125], [0.730712890625, 0.23046875]]}
                ]
            ]
        }
    ]
}"""
tree_dict = yaml.load(tree_str, Loader=yaml.Loader)
Where:
# assume for the sake of coding
{'geometry': ((xmin, ymin), (xmax, ymax))}
# where dimensions are relative to an image of a tree
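Now that I have the dict object, how do I:
- get the count of all the leaves?
- get the average width and the average height of all the leaves?
I can access the values and traverse the tree using: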
tree_dict['trees'][0]['branches'][0]['leaves'][0]['geometry'][1][1]
So I can do this with nested for loops:
leafcount = 0
leafwidth = 0
leafheight = 0
sumleafwidth = 0
sumleafheight = 0
avgleafwidth = 0
avgleafheight = 0
for tree in tree_dict['trees']:
    print("TREE")
    for branch in tree['branches']:
        print("\tBRANCH")
        for leaf in branch['leaves']:
            leafcount += 1
            (lxmin, lymin), (lxmax, lymax) = leaf['geometry']
            leafwidth = lxmax - lxmin
            leafheight = lymax - lymin
            print("\t\tLEAF: x1({}), y1({}), x2({}), y2({})\n\t\t\tWIDTH: {}\n\t\t\tHEIGHT: {}".format(lxmin, lymin, lxmax, lymax, leafwidth, leafheight))
            sumleafwidth += lxmax - lxmin
            sumleafheight += lymax - lymin
avgleafwidth = sumleafwidth / leafcount
avgleafheight = sumleafheight / leafcount
print("LEAVES\n\tCOUNT: {}\n\tAVERAGE WIDTH: {}\n\tAVERAGE HEIGHT: {}".format(leafcount, avgleafwidth, avgleafheight))
But is there a better way?
# pseudo code
leafcount = count(tree_dict['trees'][*]['branches'][*]['leaves'][*])
leaves = (tree_dict['trees'][*]['branches'][*]['leaves'][*])
sumleafwidth = sum(leaves[*]['geometry'][1][*]-leaves[*]['geometry'][0][*])
sumleafheight = sum(leaves[*]['geometry'][*][1]-leaves[*]['geometry'][*][0])
avgleafwidth = sumleafwidth / leafcount
avgleafheight = sumleafheight / leafcount
A:
2 upvotes
Matmozaur
11/8/2023
#1
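I think that although a Python dict can serve as a tree representation in most cases, it is not the best data structure if you want to handle more advanced tree-related tasks like the ones above. There are many implementations of tree-like structures in Python, for example treelib. You can move from a dict to a Tree like this: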
from treelib import Tree

def dict_to_tree(data, parent_node=None, tree=None):
    # Only dict values are recursed into; lists are stored as plain leaf nodes.
    # Note: treelib identifiers must be unique, so this assumes no key name repeats.
    if tree is None:
        tree = Tree()
    for key, value in data.items():
        if isinstance(value, dict):
            # Create a node for the key
            tree.create_node(tag=key, identifier=key, parent=parent_node)
            # Recursively call the function to process the sub-dictionary
            dict_to_tree(value, parent_node=key, tree=tree)
        else:
            # Create a node for the key and value
            tree.create_node(tag=f"{key}: {value}", identifier=key, parent=parent_node)
    return tree
With the right data structure, you should be able to solve your problem in a simpler and more elegant way.
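For comparison, here is a minimal sketch of walking the nested structure directly, without any third-party library. It is not part of the answer above: `iter_values_by_key` is a hypothetical helper, and the literal `tree_dict` below assumes the loaded data has `branches` as a list of dicts that each hold a `leaves` list. The generator yields every value stored under a given key name at any depth, which is roughly what the question's `[*]` wildcards express.

```python
def iter_values_by_key(node, key):
    """Yield every value stored under `key` at any depth of nested dicts/lists/tuples."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            else:
                yield from iter_values_by_key(v, key)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_values_by_key(item, key)


# Assumed shape of tree_dict after loading (sample data from the question).
tree_dict = {
    'trees': [
        {
            'tree_idx': 0,
            'branches': [
                {'leaves': [
                    {'geometry': [[0.190673828125, 0.0859375], [0.74609375, 0.1181640625]]},
                    {'geometry': [[0.1171875, 0.1162109375], [0.8076171875, 0.15625]]},
                ]},
                {'leaves': [
                    {'geometry': [[0.2197265625, 0.1552734375], [0.7119140625, 0.1943359375]]},
                    {'geometry': [[0.2060546875, 0.1923828125], [0.730712890625, 0.23046875]]},
                ]},
            ],
        }
    ]
}

# Each geometry is ((xmin, ymin), (xmax, ymax)).
geometries = list(iter_values_by_key(tree_dict, 'geometry'))
leafcount = len(geometries)
avgleafwidth = sum(xmax - xmin for (xmin, _), (xmax, _) in geometries) / leafcount
avgleafheight = sum(ymax - ymin for (_, ymin), (_, ymax) in geometries) / leafcount
print(leafcount, avgleafwidth, avgleafheight)
```

Because the helper matches on the key name only, it would also pick up any other `geometry` entries elsewhere in the structure, so it fits best when that key is used only for leaves.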
2 upvotes
Corralien
11/8/2023
#2
Possibly not the answer you expected, but if you are comfortable with data analysis, you can use pandas and numpy and reshape your dataset.
# pip install pandas
import pandas as pd
import numpy as np
# Build trees
branches = pd.json_normalize(tree_dict['trees'], 'branches', 'tree_idx')
leaves = pd.json_normalize(branches.pop('leaves')).melt(var_name='branch_idx', value_name='geometry', ignore_index=False)
trees = leaves.merge(branches, left_on='branch_idx', right_index=True)
# Extract geometry
geom = np.concatenate(trees.pop('geometry').str['geometry'].values).reshape(-1, 4)  # one row of (x1, y1, x2, y2) per leaf
geom = pd.DataFrame(geom, columns=['x1', 'y1', 'x2', 'y2'], index=leaves.index)
trees = pd.concat([trees, geom], axis=1).sort_index().reset_index(names='leaf_idx')
# Width and Height
trees['width'] = trees['x2'] - trees['x1']
trees['height'] = trees['y2'] - trees['y1']
The output will be:
>>> trees
   leaf_idx  branch_idx  tree_idx        x1        y1        x2        y2     width    height
0         0           0         0  0.190674  0.085938  0.746094  0.118164  0.555420  0.032227
1         0           1         0  0.117188  0.116211  0.807617  0.156250  0.690430  0.040039
2         1           0         0  0.219727  0.155273  0.711914  0.194336  0.492188  0.039062
3         1           1         0  0.206055  0.192383  0.730713  0.230469  0.524658  0.038086
Other uses:
# Average width
>>> trees['width'].mean()
0.565673828125
# Average height
>>> trees['height'].mean()
0.037353515625
# How many trees?
>>> trees['tree_idx'].nunique()
1
# How many branches?
>>> trees['branch_idx'].nunique()
2
# How many leaves?
>>> len(trees)
4
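If pandas feels like too much, the same "one flat row per leaf" idea can be sketched with the standard library only. This is an illustrative assumption, not the answer's method: the `rows` records and the `avg_width_by_branch` grouping are hypothetical names, the indices come from `enumerate` rather than from the DataFrame above, and the literal `tree_dict` assumes `branches` is a list of dicts that each hold a `leaves` list.

```python
from collections import defaultdict

# Assumed shape of tree_dict after loading (sample data from the question).
tree_dict = {
    'trees': [
        {
            'tree_idx': 0,
            'branches': [
                {'leaves': [
                    {'geometry': [[0.190673828125, 0.0859375], [0.74609375, 0.1181640625]]},
                    {'geometry': [[0.1171875, 0.1162109375], [0.8076171875, 0.15625]]},
                ]},
                {'leaves': [
                    {'geometry': [[0.2197265625, 0.1552734375], [0.7119140625, 0.1943359375]]},
                    {'geometry': [[0.2060546875, 0.1923828125], [0.730712890625, 0.23046875]]},
                ]},
            ],
        }
    ]
}

# One flat record per leaf, similar in spirit to one DataFrame row per leaf.
rows = [
    {
        'tree_idx': tree['tree_idx'],
        'branch_idx': branch_idx,
        'leaf_idx': leaf_idx,
        'width': xmax - xmin,
        'height': ymax - ymin,
    }
    for tree in tree_dict['trees']
    for branch_idx, branch in enumerate(tree['branches'])
    for leaf_idx, ((xmin, ymin), (xmax, ymax)) in enumerate(
        leaf['geometry'] for leaf in branch['leaves']
    )
]

# Group the widths by (tree, branch) to get a per-branch average.
widths_by_branch = defaultdict(list)
for row in rows:
    widths_by_branch[(row['tree_idx'], row['branch_idx'])].append(row['width'])

avg_width_by_branch = {key: sum(ws) / len(ws) for key, ws in widths_by_branch.items()}
print(len(rows), avg_width_by_branch)
```

Once the data is flat, any other aggregate (per tree, per branch, overall) is a short loop or comprehension over `rows`.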
Comments
0 upvotes
skeetastax
11/8/2023
Your answer is good, although it may be overkill for my purposes, and I don't fully understand it yet.
1 upvote
Corralien
11/8/2023
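@skeetastax. That's why I said "if you are comfortable with data analysis" :-). If your tree is large, Pandas can be a good alternative because the computation is vectorized.
1 upvote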
Swifty
11/8/2023
#3
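OK, here is another answer. Although it is not strictly necessary, I take advantage of numpy's vectorization to sum the leaf widths and heights at the same time: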
import numpy as np
leaves = [leaf for tree in tree_dict['trees'] for branch in tree['branches'] for leaf in branch['leaves']]
leafsums = sum(np.array(leaf['geometry'][1]) - np.array(leaf['geometry'][0]) for leaf in leaves)
print(f"LEAVES\n\tCOUNT: {len(leaves)}\n\tAVERAGE WIDTH: {leafsums[0]/len(leaves)}\n\tAVERAGE HEIGHT: {leafsums[1]/len(leaves)}")
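Since numpy is not strictly required here, the following is a minimal standard-library sketch of the same idea, assuming Python 3.8+ for `statistics.fmean`. The literal `tree_dict` is an assumption about the loaded shape (`branches` as a list of dicts that each hold a `leaves` list), and `sizes`, `avg_width` and `avg_height` are illustrative names.

```python
from statistics import fmean

# Assumed shape of tree_dict after loading (sample data from the question).
tree_dict = {
    'trees': [
        {
            'tree_idx': 0,
            'branches': [
                {'leaves': [
                    {'geometry': [[0.190673828125, 0.0859375], [0.74609375, 0.1181640625]]},
                    {'geometry': [[0.1171875, 0.1162109375], [0.8076171875, 0.15625]]},
                ]},
                {'leaves': [
                    {'geometry': [[0.2197265625, 0.1552734375], [0.7119140625, 0.1943359375]]},
                    {'geometry': [[0.2060546875, 0.1923828125], [0.730712890625, 0.23046875]]},
                ]},
            ],
        }
    ]
}

# Flatten trees -> branches -> leaves into a single list.
leaves = [
    leaf
    for tree in tree_dict['trees']
    for branch in tree['branches']
    for leaf in branch['leaves']
]

# One (width, height) pair per leaf: max corner minus min corner.
sizes = [
    (xmax - xmin, ymax - ymin)
    for (xmin, ymin), (xmax, ymax) in (leaf['geometry'] for leaf in leaves)
]

# Transpose the pairs into a tuple of widths and a tuple of heights.
widths, heights = zip(*sizes)
avg_width = fmean(widths)
avg_height = fmean(heights)
print(f"LEAVES\n\tCOUNT: {len(leaves)}\n\tAVERAGE WIDTH: {avg_width}\n\tAVERAGE HEIGHT: {avg_height}")
```

Note that `zip(*sizes)` and `fmean` both fail on an empty input, so a tree with no leaves would need a guard.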
Comments
leaves = [leaf for tree in tree_dict['trees'] for branch in tree['branches'] for leaf in branch['leaves']]
leafcount = len(leaves)