Asked by: Fazli  Asked: 10/11/2021  Updated: 10/11/2021  Views: 566
Python get data from columns based on condition
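Q:
Given a DataFrame, I want to check whether DS1.ColA or DS1.ColB contains "Type 1"; if so, I want to insert the corresponding DS1.Val into the column Value. The same goes for DS2: check whether DS2.ColA or DS2.ColB contains "Type 1", and if so, insert the corresponding DS2.Val into the column Value.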
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'ID': ['AB01', 'AB02', 'AB03', 'AB04', 'AB05', 'AB06'],
        'DS1.ColA': ["Type 1", "Undef", np.nan, "Undef", "Type 1", ""],
        'DS1.ColB': ["N", "Type 1", "", "", "Y", np.nan],
        'DS1.Val': [85, 87, 18, 94, 81, 54],
        'DS2.ColA': ["Type 1", "Undef", "Type 1", "Undef", "Type 1", ""],
        'DS2.ColB': ["N", "Type 2", "", "", "Y", "Type 1"],
        'DS2.Val': [45, 98, 1, 45, 66, 36]
    }
)
var_check = "Type 1"
ds1_col_check = ["DS1.ColA","DS1.ColB","DS1.Val"]
ds2_col_check = ["DS2.ColA","DS2.ColB","DS2.Val"]
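The last element of ds1_col_check and ds2_col_check is always the column whose value should be placed in the new column (the lists may contain more columns to check). The final df should then contain the resulting Value column. How can I achieve this in Python?
A:
3 votes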
jezrael
10/11/2021
#1
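If there may be several such lists, you can collect them in a list L and, for each sublist, test whether the condition matches and set the value in the column Value. To avoid overwriting values that are already set, use Series.fillna: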
var_check = "Type 1"
ds1_col_check = ["DS1.ColA","DS1.ColB","DS1.Val"]
ds2_col_check = ["DS2.ColA","DS2.ColB","DS2.Val"]
L = [ds1_col_check, ds2_col_check]
df['Value'] = np.nan
for val in L:
    df.loc[df[val[:-1]].eq(var_check).any(axis=1), 'Value'] = df['Value'].fillna(df[val[-1]])
print(df)
     ID  DS1.ColA  DS1.ColB  DS1.Val  DS2.ColA  DS2.ColB  DS2.Val  Value
0  AB01    Type 1         N       85    Type 1         N       45   85.0
1  AB02     Undef    Type 1       87     Undef    Type 2       98   87.0
2  AB03       NaN                 18    Type 1                  1    1.0
3  AB04     Undef                 94     Undef                 45    NaN
4  AB05    Type 1         Y       81    Type 1         Y       66   81.0
5  AB06                 NaN       54              Type 1       36   36.0
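To see why the row selection works, it can help to look at the boolean mask on its own. The following is a minimal sketch using only the DS1 columns from the question; the variable name mask is just for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, trimmed to the DS1 columns.
df = pd.DataFrame({
    'DS1.ColA': ["Type 1", "Undef", np.nan, "Undef", "Type 1", ""],
    'DS1.ColB': ["N", "Type 1", "", "", "Y", np.nan],
    'DS1.Val': [85, 87, 18, 94, 81, 54],
})

# eq() compares element-wise (NaN and "" never equal "Type 1");
# any(axis=1) collapses each row to a single True/False.
mask = df[['DS1.ColA', 'DS1.ColB']].eq("Type 1").any(axis=1)
print(mask.tolist())  # [True, True, False, False, True, False]
```

Only rows where the mask is True receive a value from DS1.Val; the remaining rows stay NaN until a later group matches.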
Alternatively, without a loop:
var_check = "Type 1"
ds1_col_check = ["DS1.ColA","DS1.ColB","DS1.Val"]
ds2_col_check = ["DS2.ColA","DS2.ColB","DS2.Val"]
df.loc[df[ds1_col_check[:-1]].eq(var_check).any(axis=1), 'Value'] = df[ds1_col_check[-1]]
df.loc[df[ds2_col_check[:-1]].eq(var_check).any(axis=1), 'Value'] = df['Value'].fillna(df[ds2_col_check[-1]])
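If more groups of columns are expected, the same idea could be wrapped in a small function. This is only a sketch: the name first_match_value and its signature are made up for illustration, and it assumes each group lists its check columns first and its value column last, as in the question.

```python
import numpy as np
import pandas as pd


def first_match_value(df, groups, target):
    """Hypothetical helper: per row, take the value column of the first
    group whose check columns equal `target`; NaN if no group matches."""
    result = pd.Series(np.nan, index=df.index, dtype="float64")
    for *check_cols, val_col in groups:
        matched = df[check_cols].eq(target).any(axis=1)
        # Fill only rows that match here and are still empty,
        # so earlier groups keep priority.
        result = result.mask(matched & result.isna(), df[val_col])
    return result


df = pd.DataFrame({
    'ID': ['AB01', 'AB02', 'AB03', 'AB04', 'AB05', 'AB06'],
    'DS1.ColA': ["Type 1", "Undef", np.nan, "Undef", "Type 1", ""],
    'DS1.ColB': ["N", "Type 1", "", "", "Y", np.nan],
    'DS1.Val': [85, 87, 18, 94, 81, 54],
    'DS2.ColA': ["Type 1", "Undef", "Type 1", "Undef", "Type 1", ""],
    'DS2.ColB': ["N", "Type 2", "", "", "Y", "Type 1"],
    'DS2.Val': [45, 98, 1, 45, 66, 36],
})
groups = [
    ["DS1.ColA", "DS1.ColB", "DS1.Val"],
    ["DS2.ColA", "DS2.ColB", "DS2.Val"],
]
df['Value'] = first_match_value(df, groups, "Type 1")
print(df['Value'].tolist())  # [85.0, 87.0, 1.0, nan, 81.0, 36.0]
```

Building the result in a separate Series rather than writing into df inside the loop keeps the function free of side effects; the caller decides which column name to assign it to.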
0 votes
sammywemmy
10/11/2021
#2
pyjanitor has a case_when implementation in its dev version that may help here, since it abstracts multiple conditions (under the hood, it uses pd.Series.mask):
# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import numpy as np
import pandas as pd
import janitor as jn
# it has a syntax of
# condition, value,
# condition, value,
# more condition, value pairing,
# default if none of the conditions match
# column name to assign values to
# similar to a case when in SQL
df.case_when(
    df['DS1.ColA'].str.contains('Type 1') | df['DS1.ColB'].str.contains('Type 1'), df['DS1.Val'],
    df['DS2.ColA'].str.contains('Type 1') | df['DS2.ColB'].str.contains('Type 1'), df['DS2.Val'],
    np.nan,
    column_name = 'Value')
     ID  DS1.ColA  DS1.ColB  DS1.Val  DS2.ColA  DS2.ColB  DS2.Val  Value
0  AB01    Type 1         N       85    Type 1         N       45   85.0
1  AB02     Undef    Type 1       87     Undef    Type 2       98   87.0
2  AB03       NaN                 18    Type 1                  1    1.0
3  AB04     Undef                 94     Undef                 45    NaN
4  AB05    Type 1         Y       81    Type 1         Y       66   81.0
5  AB06                 NaN       54              Type 1       36   36.0
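For readers who would rather not install a dev build, a similar "first matching condition wins" pattern can be sketched with numpy.select. This is a swapped-in technique, not what pyjanitor does internally, and it assumes an exact match on "Type 1" (via eq) instead of the substring match done by str.contains.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['AB01', 'AB02', 'AB03', 'AB04', 'AB05', 'AB06'],
    'DS1.ColA': ["Type 1", "Undef", np.nan, "Undef", "Type 1", ""],
    'DS1.ColB': ["N", "Type 1", "", "", "Y", np.nan],
    'DS1.Val': [85, 87, 18, 94, 81, 54],
    'DS2.ColA': ["Type 1", "Undef", "Type 1", "Undef", "Type 1", ""],
    'DS2.ColB': ["N", "Type 2", "", "", "Y", "Type 1"],
    'DS2.Val': [45, 98, 1, 45, 66, 36],
})

# One boolean array per group; np.select picks the first True per row.
conditions = [
    df[['DS1.ColA', 'DS1.ColB']].eq('Type 1').any(axis=1).to_numpy(),
    df[['DS2.ColA', 'DS2.ColB']].eq('Type 1').any(axis=1).to_numpy(),
]
choices = [df['DS1.Val'].to_numpy(), df['DS2.Val'].to_numpy()]

# Rows matching no condition fall back to the default (NaN).
df['Value'] = np.select(conditions, choices, default=np.nan)
print(df['Value'].tolist())  # [85.0, 87.0, 1.0, nan, 81.0, 36.0]
```

The order of the conditions list sets the priority, which mirrors the condition/value pairing order in the case_when call.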