Asked by: Bakira Asked: 1/8/2023 Updated: 1/8/2023 Views: 43
Slice a DataFrame into sub-DataFrames when a specific string is found in a column
Q:
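Assume I have the DataFrame df and want to slice it into multiple DataFrames, storing each one in a list (list_of_dfs).
Each sub-DataFrame should contain only the "Result" rows. A sub-DataFrame starts at a row where the "Point" column has the value "P1" and the "X_Y" column has the value "X".
I tried this by first finding the index of each "P1" and then using those indices to slice the whole DataFrame in a list comprehension. But I get a list containing two empty DataFrames. Can anyone advise? Thanks!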
import pandas as pd
df = pd.DataFrame(
    {
        "Step": (
            "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "Result", "Result", "Result", "Result", "Result",
            "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "Result", "Result", "Result", "Result", "Result"
        ),
        "Point": (
            "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3",
            "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3", "P1", "P2", "P2", "P3", "P3",
        ),
        "X_Y": (
            "X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y",
            "X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y", "X", "X", "Y", "X", "Y",
        ),
        "Value A": (
            70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72,
            70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72,
        ),
        "Value B": (
            70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72,
            70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72, 70, 68, 66.75, 68.08, 66.72,
        ),
    }
)
dff = df.loc[df["Step"] == "Result"]
value = "P1"
tuple_of_positions = list()
result = dff.isin([value])
seriesObj = result.any()
columnNames = list(seriesObj[seriesObj == True].index)
for col in columnNames:
    rows = list(result[col][result[col] == True].index)
    for row in rows:
        tuple_of_positions.append((row, col))
length_of_one_df = (len(dff["Point"].unique().tolist()) * 2 ) - 1
list_of_dfs = [dff.iloc[x : x + length_of_one_df] for x in rows]
print(list_of_dfs)
A:
0 votes
Mustafa Aydın
1/8/2023
#1
sub = df.query("Step == \"Result\"")
pivots = sub[["Point", "X_Y"]].eq(["P1", "X"]).all(axis=1)
out = [fr for _, fr in sub.groupby(pivots.cumsum())]
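- Get the subset of the frame where Step equals "Result"
- Check which rows hold the pair "P1", "X" in the "Point" and "X_Y" columns
  - this gives a True/False Series
- Its cumulative sum determines the groups: each "pivot" (turning) point is True and increments the sum, while False counts as 0 in a numeric context and leaves it unchanged
- Iterating over a GroupBy object yields "group_label, sub_frame" pairs, from which we keep only the sub_frames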
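The steps listed above can be wrapped in a small reusable function. This is only a sketch: the name `split_on_start`, its `start_values` parameter, and the `demo` frame are hypothetical and not part of pandas or of the answer, and dropping the rows that come before the first start row is an assumption made here for illustration.

```python
import pandas as pd


def split_on_start(frame, start_values):
    """Split `frame` into consecutive chunks, each beginning at a start row.

    `start_values` maps column name -> value that marks the start of a chunk.
    Hypothetical helper for illustration; not part of pandas.
    """
    cols = list(start_values)
    # True on rows where every listed column holds its start value.
    is_start = frame[cols].eq(list(start_values.values())).all(axis=1)
    # Running count of start rows seen so far: one integer label per chunk.
    group_ids = is_start.cumsum().to_numpy()
    # Assumption: rows before the first start row (label 0) are discarded.
    keep = group_ids > 0
    return [chunk for _, chunk in frame[keep].groupby(group_ids[keep])]


# Small made-up frame: start rows sit at positions 1 and 3.
demo = pd.DataFrame(
    {
        "Point": ["P2", "P1", "P2", "P1", "P2", "P3"],
        "X_Y": ["Y", "X", "X", "X", "X", "Y"],
    }
)
chunks = split_on_start(demo, {"Point": "P1", "X_Y": "X"})
print([chunk.index.tolist() for chunk in chunks])
```

For the question's data, the equivalent call would presumably be `split_on_start(df[df["Step"] == "Result"], {"Point": "P1", "X_Y": "X"})`.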
For the question's data, out is:
>>> out
[      Step Point X_Y  Value A  Value B
 10  Result    P1   X    70.00    70.00
 11  Result    P2   X    68.00    68.00
 12  Result    P2   Y    66.75    66.75
 13  Result    P3   X    68.08    68.08
 14  Result    P3   Y    66.72    66.72,
       Step Point X_Y  Value A  Value B
 25  Result    P1   X    70.00    70.00
 26  Result    P2   X    68.00    68.00
 27  Result    P2   Y    66.75    66.75
 28  Result    P3   X    68.08    68.08
 29  Result    P3   Y    66.72    66.72]
The intermediate values are:
>>> sub
      Step Point X_Y  Value A  Value B
10  Result    P1   X    70.00    70.00
11  Result    P2   X    68.00    68.00
12  Result    P2   Y    66.75    66.75
13  Result    P3   X    68.08    68.08
14  Result    P3   Y    66.72    66.72
25  Result    P1   X    70.00    70.00
26  Result    P2   X    68.00    68.00
27  Result    P2   Y    66.75    66.75
28  Result    P3   X    68.08    68.08
29  Result    P3   Y    66.72    66.72
>>> pivots
10 True
11 False
12 False
13 False
14 False
25 True
26 False
27 False
28 False
29 False
dtype: bool
# groups
>>> pivots.cumsum()
10 1
11 1
12 1
13 1
14 1
25 2
26 2
27 2
28 2
29 2
dtype: int32
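As a side note on why the attempt in the question returns two empty frames (an inference from the posted code, not something the answer states): the `rows` list holds index labels (10 and 25 after filtering), while `iloc` slices by position, and the filtered frame has only 10 rows. The sketch below reproduces that mismatch on a made-up four-row frame; the names `frame`, `start_labels`, `by_position` and `by_label` are illustrative only.

```python
import pandas as pd

# Four rows whose index labels (10, 11, 25, 26) differ from their positions (0..3).
frame = pd.DataFrame({"Point": ["P1", "P2", "P1", "P2"]}, index=[10, 11, 25, 26])

# These are index labels, not positions.
start_labels = frame.index[frame["Point"] == "P1"].tolist()

# iloc is positional: positions 10 and 25 lie past the end, so each slice is empty.
by_position = [frame.iloc[x : x + 2] for x in start_labels]

# loc is label-based and includes both ends of the slice.
by_label = [frame.loc[x : x + 1] for x in start_labels]

print([len(part) for part in by_position])
print([len(part) for part in by_label])
```

Slicing by label with `loc` only works here because the labels inside each block are consecutive integers; the cumulative-sum grouping above does not rely on that.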