How do I calculate the sum and the average of all values in a python dict by key name?

Asked by: skeetastax Asked: 11/8/2023 Updated: 11/8/2023 Views: 98

Q:

I have a dict object from which I want to calculate the sum and the average of specific named values.

Setup:

tree_str = """{
    'trees': [
        {
            'tree_idx': 0,
            'dimensions': (1120, 640),
            'branches': [
                {'leaves': [
                    {'geometry': [[0.190673828125, 0.0859375], [0.74609375, 0.1181640625]]},
                    {'geometry': [[0.1171875, 0.1162109375], [0.8076171875, 0.15625]]}
                ]},
                {'leaves': [
                    {'geometry': [[0.2197265625, 0.1552734375], [0.7119140625, 0.1943359375]]},
                    {'geometry': [[0.2060546875, 0.1923828125], [0.730712890625, 0.23046875]]}
                ]}
            ]
        }
    ]
}"""

import yaml  # PyYAML

tree_dict = yaml.load(tree_str, Loader=yaml.Loader)

where:

# assume for the sake of coding
{'geometry': ((xmin, ymin), (xmax, ymax))}
# where dimensions are relative to an image of a tree

Now that I have the dict object, how do I:

  1. get a count of all the leaves?
  2. get the average width and average height of all the leaves?

I can access the values and traverse the tree using:

tree_dict['trees'][0]['branches'][0]['leaves'][0]['geometry'][1][1]

So I can do this with nested for loops:

leafcount = 0
leafwidth = 0
leafheight = 0
sumleafwidth = 0
sumleafheight = 0
avgleafwidth = 0
avgleafheight = 0

for tree in tree_dict['trees']:
    print("TREE")
    for branch in tree['branches']:
        print("\tBRANCH")
        for leaf in branch['leaves']:
            leafcount += 1
            (lxmin, lymin), (lxmax, lymax) = leaf['geometry']
            leafwidth = lxmax - lxmin
            leafheight = lymax - lymin
            print("\t\tLEAF: x1({}), y1({}), x2({}), y2({})\n\t\t\tWIDTH: {}\n\t\t\tHEIGHT: {}".format(lxmin, lymin, lxmax, lymax, leafwidth, leafheight))
            sumleafwidth += lxmax - lxmin
            sumleafheight += lymax - lymin

avgleafwidth = sumleafwidth / leafcount
avgleafheight = sumleafheight / leafcount

print("LEAVES\n\tCOUNT: {}\n\tAVERAGE WIDTH: {}\n\tAVERAGE HEIGHT: {}".format(leafcount, avgleafwidth, avgleafheight))

But is there a better way?

# pseudocode: [*] stands for "every element" and is not valid Python
leafcount = count(tree_dict['trees'][*]['branches'][*]['leaves'][*])
leaves = (tree_dict['trees'][*]['branches'][*]['leaves'][*])
sumleafwidth = sum(leaves[*]['geometry'][1][0] - leaves[*]['geometry'][0][0])
sumleafheight = sum(leaves[*]['geometry'][1][1] - leaves[*]['geometry'][0][1])
avgleafwidth = sumleafwidth / leafcount
avgleafheight = sumleafheight / leafcount

python dictionary sum average key-value

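Comments

0 votes Swifty 11/8/2023
Here is something that should fit the second line of your pseudocode: leaves = [leaf for tree in tree_dict['trees'] for branch in tree['branches'] for leaf in branch['leaves']]; and of course leafcount = len(leaves)
0 votes TylerH 12/2/2023
This question was being used as a "known good" audit item, but it is actually off-topic at the moment because it asks multiple questions, one of which is opinion-based. Please edit your question to ask only one thing, and make sure it can be answered objectively. If you have a working solution of your own, please post it as an answer rather than adding it to the question. Asking opinion-based questions such as "is there a better way" is off-topic.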

Answers:

2 votes Matmozaur 11/8/2023 #1

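I think that, although a Python dict can serve as a tree representation in most cases, it is not the best data structure if you want to handle more advanced tree-related tasks like the one above. There are many implementations of tree-like structures in Python, for example treelib. You can move from a dict to a Tree like this: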

# pip install treelib
from treelib import Tree

def dict_to_tree(data, parent_node=None, tree=None):
    if tree is None:
        tree = Tree()

    # Note: identifier=key assumes every key in the nested dict is unique,
    # because treelib rejects duplicate node identifiers.
    for key, value in data.items():
        if isinstance(value, dict):
            # Create a node for the key
            tree.create_node(tag=key, identifier=key, parent=parent_node)
            # Recursively call the function to process the sub-dictionary
            dict_to_tree(value, parent_node=key, tree=tree)
        else:
            # Create a node for the key and value
            tree.create_node(tag=f"{key}: {value}", identifier=key, parent=parent_node)

    return tree

With the right data structure, you should be able to solve your problem in a simpler and more elegant way.
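
Independently of treelib, the idea of walking a nested structure and collecting values by key name can be sketched in plain Python. The helper `find_values` and the `sample` data below are hypothetical names introduced only for illustration; this is a minimal sketch that assumes the structure contains only dicts, lists, and tuples, and it does not search inside a value once its key has matched:

```python
def find_values(obj, key):
    """Yield every value stored under `key`, at any depth of nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v  # matched: return the value, do not descend into it
            else:
                yield from find_values(v, key)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from find_values(item, key)

# Small illustrative sample (not the question's data)
sample = {'trees': [{'branches': [
    {'leaves': [{'geometry': [[0.0, 0.0], [0.5, 0.25]]}]},
    {'leaves': [{'geometry': [[0.25, 0.5], [1.0, 1.0]]}]},
]}]}

geometries = list(find_values(sample, 'geometry'))
widths = [xmax - xmin for (xmin, ymin), (xmax, ymax) in geometries]
heights = [ymax - ymin for (xmin, ymin), (xmax, ymax) in geometries]

print(len(geometries))              # 2
print(sum(widths) / len(widths))    # 0.625
print(sum(heights) / len(heights))  # 0.375
```

Because the search is by key name only, it keeps working if the nesting depth changes, at the cost of ignoring where in the tree each value was found.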

2 votes Corralien 11/8/2023 #2

Probably not the answer you expected, but if you are comfortable with data analysis, you can use pandas and numpy to reshape your dataset.

# pip install pandas
import pandas as pd
import numpy as np

# Build trees
branches = pd.json_normalize(tree_dict['trees'], 'branches', 'tree_idx')
leaves = pd.json_normalize(branches.pop('leaves')).melt(var_name='branch_idx', value_name='geometry', ignore_index=False)
trees = leaves.merge(branches, left_on='branch_idx', right_index=True)

# Extract geometry
# One row of (x1, y1, x2, y2) per leaf, whatever the number of leaves
geom = np.concatenate(trees.pop('geometry').str['geometry'].values).reshape(-1, 4)
geom = pd.DataFrame(geom, columns=['x1', 'y1', 'x2', 'y2'], index=leaves.index)
trees = pd.concat([trees, geom], axis=1).sort_index().reset_index(names='leaf_idx')

# Width and Height
trees['width'] = trees['x2'] - trees['x1']
trees['height'] = trees['y2'] - trees['y1']

The output will be:

>>> trees
   leaf_idx  branch_idx tree_idx        x1        y1        x2        y2     width    height
0         0           0        0  0.190674  0.085938  0.746094  0.118164  0.555420  0.032227
1         0           1        0  0.117188  0.116211  0.807617  0.156250  0.690430  0.040039
2         1           0        0  0.219727  0.155273  0.711914  0.194336  0.492188  0.039062
3         1           1        0  0.206055  0.192383  0.730713  0.230469  0.524658  0.038086

Other uses:

# Average width
>>> trees['width'].mean()
0.565673828125

# Average height
>>> trees['height'].mean()
0.037353515625

# How many trees?
>>> trees['tree_idx'].nunique()
1

# How many branches?
>>> trees['branch_idx'].nunique()
2

# How many leaves?
>>> len(trees)
4

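Comments

0 votes skeetastax 11/8/2023
Your answer is great, although it may be overkill for my purposes and I don't fully understand it yet.
1 vote Corralien 11/8/2023
@skeetastax. That's why I said "if you are comfortable with data analysis" :-). If your tree is large, Pandas can be a good alternative because the computation is vectorized.

1 vote Swifty 11/8/2023 #3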

OK, here is another answer; although it isn't strictly necessary, I use numpy's vectorization to sum the leaves' widths and heights at the same time:

import numpy as np

leaves = [leaf for tree in tree_dict['trees'] for branch in tree['branches'] for leaf in branch['leaves']]
leafsums = sum(np.array(leaf['geometry'][1]) - np.array(leaf['geometry'][0]) for leaf in leaves)

print(f"LEAVES\n\tCOUNT: {len(leaves)}\n\tAVERAGE WIDTH: {leafsums[0]/len(leaves)}\n\tAVERAGE HEIGHT: {leafsums[1]/len(leaves)}")
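
The `[*]` wildcard from the question's pseudocode can also be approximated with a small path-following generator and the standard library's `statistics.fmean`. The names `select`, `WILDCARD`, and `sample` are hypothetical and exist only for this sketch, which assumes that `'*'` is only ever applied to lists:

```python
from statistics import fmean

WILDCARD = '*'

def select(obj, *path):
    """Yield every item reached by following `path`; '*' means 'each element of a list'."""
    if not path:
        yield obj
        return
    head, rest = path[0], path[1:]
    if head == WILDCARD:
        for item in obj:
            yield from select(item, *rest)
    else:
        yield from select(obj[head], *rest)

# Small illustrative sample (not the question's data)
sample = {'trees': [{'branches': [
    {'leaves': [{'geometry': [[0.0, 0.0], [0.5, 0.25]]},
                {'geometry': [[0.25, 0.25], [0.5, 0.5]]}]},
    {'leaves': [{'geometry': [[0.25, 0.5], [1.0, 0.75]]}]},
]}]}

geoms = list(select(sample, 'trees', '*', 'branches', '*', 'leaves', '*', 'geometry'))
leafcount = len(geoms)
avgleafwidth = fmean(xmax - xmin for (xmin, ymin), (xmax, ymax) in geoms)
avgleafheight = fmean(ymax - ymin for (xmin, ymin), (xmax, ymax) in geoms)

print(leafcount, avgleafwidth, avgleafheight)  # 3 0.5 0.25
```

Unlike a search by key name alone, the explicit path only matches leaves at that exact position in the structure.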