Asked by: Ruan Carlo Weiers Britzke  Asked: 6/16/2023  Updated: 7/25/2023  Views: 54
How to logically combine the data in the columns of a pandas DataFrame to generate a new DataFrame?
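Q:
I wrote a program that returns a "belonging table" DataFrame for the edges of a Multigraph that models an electrical grid. Each row is a path between a load and a source, and the columns are the names of the lines that connect the load to the source.
The program produces an output df that looks like this one, only much larger.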
import pandas as pd
belonging = pd.DataFrame({'A': {0: False, 1: False, 2: True, 3: True},
                          'B': {0: False, 1: False, 2: True, 3: True},
                          'C': {0: True, 1: True, 2: False, 3: False},
                          'D': {0: False, 1: True, 2: False, 3: True},
                          'E': {0: True, 1: False, 2: True, 3: False},
                          'F': {0: True, 1: True, 2: True, 3: True}})
>>>
       A      B      C      D      E     F
0  False  False   True  False   True  True
1  False  False   True   True  False  True
2   True   True  False  False   True  True
3   True   True  False   True  False  True
Now I need to generate a "failure modes" table whose output looks like this:
result = pd.DataFrame(
    {'Failure Modes': {0: 'F', 1: 'A // C', 2: 'B // C', 3: 'D // E'},
     'Order of Failure': {0: 1, 1: 2, 2: 2, 3: 2}
    }
)
>>>
  Failure Modes  Order of Failure
0             F                 1
1        A // C                 2
2        B // C                 2
3        D // E                 2
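The failure modes table is built from the boolean values of the columns. If every item in a column is True, that column is a first-order failure. The second order of failure checks the truth values of every pair of columns, excluding the columns already found to be first-order.
And so on up to order n, with n <= len(belonging.columns).
Describing it sounds simpler than coding it has been for me. Thanks in advance.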
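A:
I would first identify the first-order columns, then use itertools.combinations and numpy.logical_xor to test all pairs of the remaining columns, and finally combine the results with pandas.concat: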
from itertools import combinations
import numpy as np
import pandas as pd

# first-order failures: columns that are True on every row
first = belonging.columns[belonging.all()]
# the remaining columns are the candidates for second-order pairs
tmp = belonging.drop(columns=first)
out = pd.concat([
    pd.DataFrame({'Failure Modes': first, 'Order of Failure': 1}),
    pd.DataFrame({'Failure Modes': [f'{a}//{b}' for a, b in combinations(tmp, 2)
                                    if np.logical_xor(tmp[a], tmp[b]).all()],
                  'Order of Failure': 2})
], ignore_index=True)
NB. The example is ambiguous, so if you don't need the True values to be exclusive, you can use np.logical_or instead of np.logical_xor.
Output:
  Failure Modes  Order of Failure
0             F                 1
1          A//C                 2
2          B//C                 2
3          D//E                 2
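To make the note about np.logical_or versus np.logical_xor concrete, here is a minimal sketch comparing the two tests on made-up columns. The Series `x`, `y` and `z` are hypothetical and are not taken from the question's data. With xor a pair passes only when exactly one of the two columns is True on every row; with or it also passes when both are True on some rows.

```python
import numpy as np
import pandas as pd

# Hypothetical columns, for illustration only
x = pd.Series([True, False, True, False])
y = pd.Series([False, True, False, True])   # exact complement of x
z = pd.Series([True, True, False, True])    # True together with x on row 0

# Complementary pair: passes both tests
xor_xy = bool(np.logical_xor(x, y).all())
or_xy = bool(np.logical_or(x, y).all())

# Overlapping pair: fails xor (row 0 is True in both columns), passes or
xor_xz = bool(np.logical_xor(x, z).all())
or_xz = bool(np.logical_or(x, z).all())

print(xor_xy, or_xy, xor_xz, or_xz)  # True True False True
```

Which test is right depends on whether two lines that are both present on the same path should still count as a joint failure mode, which the question leaves open.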
To get the n-th order failure modes, you need to construct the powerset of the columns, then evaluate your logical operation on each member:
import pandas as pd
from itertools import combinations
from numpy import logical_xor
belonging = pd.DataFrame({
    'A': {0: False, 1: False, 2: True, 3: True},
    'B': {0: False, 1: False, 2: True, 3: True},
    'C': {0: True, 1: True, 2: False, 3: False},
    'D': {0: False, 1: True, 2: False, 3: True},
    'E': {0: True, 1: False, 2: True, 3: False},
    'F': {0: True, 1: True, 2: True, 3: True},
})
def powerset(entities):
    for i in range(len(entities)+1):
        yield from combinations(entities, r=i)
failures = {}
for cols in powerset(belonging.columns):
    if len(cols) == 0: continue
    failures['//'.join(cols)] = {
        'failure': logical_xor.reduce(belonging[list(cols)], axis=1).all(),
        'order': len(cols)
    }
test_df = pd.DataFrame.from_dict(failures, orient='index')
print(test_df)
                  failure  order
A                   False      1
B                   False      1
C                   False      1
D                   False      1
E                   False      1
...                   ...    ...
A//B//C//E//F       False      5
A//B//D//E//F       False      5
A//C//D//E//F        True      5
B//C//D//E//F        True      5
A//B//C//D//E//F    False      6
[63 rows x 2 columns]
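As a quick sanity check on that row count: a set of n columns has 2**n subsets, so dropping the empty one leaves 2**6 - 1 = 63 combinations for six columns. The sketch below repeats the powerset helper so that it runs on its own; the `columns` list is just a stand-in that mirrors the column names of the example.

```python
from itertools import combinations


def powerset(entities):
    """Yield every subset of `entities` as a tuple, from size 0 up to len(entities)."""
    for i in range(len(entities) + 1):
        yield from combinations(entities, r=i)


# Stand-in for belonging.columns
columns = ['A', 'B', 'C', 'D', 'E', 'F']

# Drop the empty subset, which carries no failure mode
non_empty = [cols for cols in powerset(columns) if cols]
print(len(non_empty), 2 ** len(columns) - 1)
```

The count doubles with every extra column, so enumerating the full powerset is only practical for small tables.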
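Now that we have evaluated our test against every member of the powerset, we can use a simple filtering operation to find the members that satisfy the logical test: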
print(
test_df.loc[lambda d: d['failure']]
)
               failure  order
F                 True      1
A//C              True      2
B//C              True      2
D//E              True      2
A//B//F           True      3
A//B//D//E        True      4
A//C//D//E//F     True      5
B//C//D//E//F     True      5
To account for the columns you have already encountered at earlier orders, we can update our for loop as follows:
from collections import defaultdict
# same as the above code (data & powerset fn)
# ...
failures = {}
seen = defaultdict(set)
for cols in powerset(belonging.columns):
    if len(cols) == 0 or seen[len(cols)].issuperset(cols):
        continue
    failures['//'.join(cols)] = {
        'failure': logical_xor.reduce(belonging[list(cols)], axis=1).all(),
        'order': len(cols)
    }
    # update current order seen data w/ previous order seen data
    seen[len(cols)] = seen[len(cols)].union(seen[len(cols)-1])
    # update current order seen data w/ columns that exhibit a failure
    if failures['//'.join(cols)]['failure']:
        seen[len(cols)] = seen[len(cols)].union(cols)
    # break out early once we have attributed failures to all columns
    if len(seen[len(cols)]) == len(belonging.columns):
        break
test_df = pd.DataFrame.from_dict(failures, orient='index')
print(test_df)
      failure  order
A       False      1
B       False      1
C       False      1
D       False      1
E       False      1
F        True      1
A//B    False      2
A//C     True      2
A//D    False      2
A//E    False      2
B//C     True      2
B//D    False      2
B//E    False      2
C//D    False      2
C//E    False      2
D//E     True      2
print(test_df.loc[lambda d: d['failure']])
      failure  order
F        True      1
A//C     True      2
B//C     True      2
D//E     True      2
Using the solutions proposed by @Cammeron Riddel and @Mozway, I arrived at this solution to the problem:
from collections import defaultdict
from itertools import combinations
import pandas as pd
from numpy import logical_xor
bt = pd.DataFrame({
    'a': [False, False, True, True],
    'b': [False, False, True, True],
    'c': [True, True, False, False],
    'd': [False, True, False, True],
    'e': [True, False, True, False],
    'f': [True, True, True, True],
})
# bt = belonging_table
def power_set(entities):
    for i in range(len(entities)+1):
        yield from combinations(entities, r=i)
seen = defaultdict(set)
failures = pd.DataFrame(columns=['Order'])
for columns in power_set(bt.columns):
    order = len(columns)
    last_order = order - 1
    # carry over the columns already attributed to lower-order failures
    seen[order] = seen[order].union(seen[last_order])
    # skip the empty set and any combination that reuses such a column
    if len(columns) == 0 or set(columns).intersection(seen[last_order]):
        continue
    if logical_xor.reduce(bt[list(columns)], axis=1).all():
        failures.loc['//'.join([str(edge) for edge in columns])] = order
        seen[order] = seen[order].union(columns)
    # stop once every column has been attributed to a failure
    if len(seen[order]) == len(bt.columns):
        break
failures
      Order
f         1
a//c      2
b//c      2
d//e      2
The changes were made because of errors found in larger origin-destination pairs with up to 500 different paths.
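For larger tables the full powerset cannot be enumerated, since it grows as 2**n with the number of columns. One possible mitigation, sketched here under assumptions, is to walk the combinations order by order and stop at a maximum order. The function name `failure_modes` and its `max_order` parameter are hypothetical and are not part of any of the answers. The sketch keeps the same xor test and the same rule of skipping columns already attributed to a lower-order failure, but it finishes each order before checking whether every column has been covered.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def failure_modes(table, max_order=2):
    """Hypothetical helper: collect failure modes up to `max_order`."""
    seen = set()   # columns already attributed to lower-order failures
    rows = []
    for order in range(1, max_order + 1):
        # only columns not yet attributed can take part in this order
        candidates = [col for col in table.columns if col not in seen]
        found = set()
        for cols in combinations(candidates, order):
            values = table[list(cols)].to_numpy()
            # same test as above: xor across the chosen columns holds on every row
            if np.logical_xor.reduce(values, axis=1).all():
                rows.append({'Failure Modes': '//'.join(map(str, cols)),
                             'Order': order})
                found.update(cols)
        seen |= found
        # stop early once every column belongs to some failure mode
        if len(seen) == len(table.columns):
            break
    return pd.DataFrame(rows, columns=['Failure Modes', 'Order'])


bt = pd.DataFrame({
    'a': [False, False, True, True],
    'b': [False, False, True, True],
    'c': [True, True, False, False],
    'd': [False, True, False, True],
    'e': [True, False, True, False],
    'f': [True, True, True, True],
})
result = failure_modes(bt, max_order=2)
print(result)
```

On the six-column example this finds f at order 1 and a//c, b//c, d//e at order 2. Capping the order only bounds the work; whether a cap is acceptable depends on how high an order of failure matters for the grid being modelled.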