Python pandas: add a column with values based on condition and pattern

Q:

My pandas DataFrame, originally loaded from Excel with the openpyxl engine, can be described in simplified form as:

df1 = pd.DataFrame({"col1":["",99,88,np.nan,66,55,np.nan,11,22],"col2":['Catg0','Asset1','Other','Catg1','H & F','Large Item','Catg2','Fragile','Delicate item'],"col3":["",0,0,np.nan,99,155,np.nan,83,115]})

  col1           col2 col3
0               Catg0
1   99         Asset1    0
2   88          Other    0
3  NaN          Catg1  NaN
4   66          H & F   99
5   55     Large Item  155
6  NaN          Catg2  NaN
7   11        Fragile   83
8   22  Delicate item  115

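I am trying to modify it further by pivoting values out of another column (col2): whenever all the other columns in a row are empty or NaN, the col2 value of that row should become the value of a new column for every following row, until the next such row is met.
The empty row itself should be dropped after pivoting. The expected result is: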

   col1   col4           col2  col3
0    99  Catg0         Asset1     0
1    88  Catg0          Other     0
2    66  Catg1          H & F    99
3    55  Catg1     Large Item   155
4    11  Catg2        Fragile    83
5    22  Catg2  Delicate item   115

import pandas as pd
import numpy as np
df1 = pd.DataFrame({"col1":["",99,88,np.nan,66,55,np.nan,11,22],"col2":['Catg0','Asset1','Other','Catg1','H & F','Large Item','Catg2','Fragile','Delicate item'],"col3":["",0,0,np.nan,99,155,np.nan,83,115]})
df1.insert(1, "col4", 'Catg')

I am trying to find a way to add this pattern- or condition-based logic to fill "col4" and drop those rows.

Tags: python, pandas, pivot

m = df1.drop('col2', axis=1).fillna('').eq('').all(axis=1)
df1.insert(1, "col4", df1['col2'].where(m).ffill())

out = df1[~m].reset_index(drop=True)
print (out)
  col1   col4           col2 col3
0   99  Catg0         Asset1    0
1   88  Catg0          Other    0
2   66  Catg1          H & F   99
3   55  Catg1     Large Item  155
4   11  Catg2        Fragile   83
5   22  Catg2  Delicate item  115
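To see why this works, it may help to look at the two intermediate values separately. The following is only a sketch that repeats the steps above on the sample data from the question; the variable name `headers` is introduced here for illustration and is not part of the answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "col1": ["", 99, 88, np.nan, 66, 55, np.nan, 11, 22],
    "col2": ["Catg0", "Asset1", "Other", "Catg1", "H & F", "Large Item",
             "Catg2", "Fragile", "Delicate item"],
    "col3": ["", 0, 0, np.nan, 99, 155, np.nan, 83, 115],
})

# True where every column except col2 is an empty string or NaN (the header rows)
m = df1.drop("col2", axis=1).fillna("").eq("").all(axis=1)
print(m.tolist())
# [True, False, False, True, False, False, True, False, False]

# Keep col2 only on header rows, then carry each header down to the rows below it
headers = df1["col2"].where(m).ffill()
print(headers.tolist())
# ['Catg0', 'Catg0', 'Catg0', 'Catg1', 'Catg1', 'Catg1', 'Catg2', 'Catg2', 'Catg2']
```

Filtering with `~m` afterwards removes the header rows themselves, which leaves the six data rows shown in the output.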

Another approach is to select the header rows by pattern, e.g. here the values in col2 that start with Catg followed by digits:

m = df1['col2'].str.contains(r'^Catg\d+$')
df1.insert(1, "col4", df1['col2'].where(m).ffill())

out = df1[~m].reset_index(drop=True)
print (out)
  col1   col4           col2 col3
0   99  Catg0         Asset1    0
1   88  Catg0          Other    0
2   66  Catg1          H & F   99
3   55  Catg1     Large Item  155
4   11  Catg2        Fragile   83
5   22  Catg2  Delicate item  115
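Note that `DataFrame.insert` raises a ValueError when the column already exists, so the snippets above have to start from a fresh df1 (without the placeholder "col4" from the question). As a possible alternative, the sketch below builds the result with `assign` instead, so df1 is left unchanged and the code can be re-run. Using `str.fullmatch` and `na=False` is a choice made here, assuming that missing values in col2 should never count as headers.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "col1": ["", 99, 88, np.nan, 66, 55, np.nan, 11, 22],
    "col2": ["Catg0", "Asset1", "Other", "Catg1", "H & F", "Large Item",
             "Catg2", "Fragile", "Delicate item"],
    "col3": ["", 0, 0, np.nan, 99, 155, np.nan, 83, 115],
})

# Header rows: col2 is exactly "Catg" followed by one or more digits.
# na=False treats a missing col2 value as "not a header" (assumption).
m = df1["col2"].str.fullmatch(r"Catg\d+", na=False)

# Add col4 on a copy, drop the header rows, and put the columns in the wanted order
out = (
    df1.assign(col4=df1["col2"].where(m).ffill())
       .loc[~m, ["col1", "col4", "col2", "col3"]]
       .reset_index(drop=True)
)
print(out)
```

The pattern-based mask only looks at col2, so it also works when the header rows happen to carry values in the other columns.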

0 votes PaulS 11/15/2023 #2

Another possible solution is to define groups with cumsum, then use groupby and apply to modify each group:

(df1.assign(grp = (df1['col3'].isna() | df1['col3'].eq('')).cumsum())
 .groupby('grp', as_index = False)
 .apply(lambda x: x.tail(len(x)-1)
        .assign(col4 = x.head(1)['col2'].tolist() * (len(x)-1)))
 .reset_index(drop = True)[['col1','col4','col2','col3']])

Output:

  col1   col4           col2 col3
0   99  Catg0         Asset1    0
1   88  Catg0          Other    0
2   66  Catg1          H & F   99
3   55  Catg1     Large Item  155
4   11  Catg2        Fragile   83
5   22  Catg2  Delicate item  115
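The same grouping idea can probably be expressed without apply. The sketch below is a variation, not PaulS's code: it keeps the cumsum grouping but takes the first col2 value of each group with `transform("first")`. The names `is_header` and `grp` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "col1": ["", 99, 88, np.nan, 66, 55, np.nan, 11, 22],
    "col2": ["Catg0", "Asset1", "Other", "Catg1", "H & F", "Large Item",
             "Catg2", "Fragile", "Delicate item"],
    "col3": ["", 0, 0, np.nan, 99, 155, np.nan, 83, 115],
})

# A row starts a new group when col3 is NaN or an empty string
is_header = df1["col3"].isna() | df1["col3"].eq("")
grp = is_header.cumsum()  # 1,1,1,2,2,2,3,3,3 for the sample data

# Broadcast the first col2 value of each group (the header) to all rows of that group
col4 = df1.groupby(grp)["col2"].transform("first")

out = (
    df1.assign(col4=col4)
       .loc[~is_header, ["col1", "col4", "col2", "col3"]]
       .reset_index(drop=True)
)
print(out)
```

Like the apply version, this relies on each group beginning with its header row.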
