pandas: assign a column values by slice with method chaining

Asked by: gregV Asked: 8/22/2023 Updated: 8/23/2023 Views: 54

Q:

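In the toy example below, I am trying to add a status column based on the result of an outer merge. The challenge is to preserve the method-chaining style best described in tom's blog. The commented-out line is my attempt, but it does not work.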

import pandas as pd

# Create sample data frames A and B
A = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'value': [1, 2, 3, 4]
})

B = pd.DataFrame({
    'key': ['C', 'D', 'E', 'F'],
    'value': [3, 4, 5, 6]
})

# Merge data frames A and B on the 'key' column and add an indicator column
merged = pd.merge(A, B, on='key', how='outer', indicator=True)

# add a status column
#{'both':'no change',
 #'left_only': 'added',
 #'right_only': 'removed'}

merged = (merged
          .assign (status = 'no change')
          #.assign(status = lambda x: x.loc[x._merge == 'left_only'], 'added')
          .drop('_merge', axis=1)
          )
Tags: pandas, method-chaining

A:

2 votes sammywemmy 8/22/2023 #1

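Something like this should suffice. In general, because you are assigning rather than slicing in place, you need to express the slice as a conditional (map, np.where, np.select, Series.where, etc.).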

(A
.merge(B, on='key', how='outer', indicator=True)
.assign(status = lambda f: f._merge.map({"left_only":"added", 
                                          "both":"no change", 
                                          "right_only":"removed"}))
)
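
For comparison, here is a minimal sketch of the same chain written with np.select, one of the conditionals mentioned above. It rebuilds the question's sample frames; the name result and the choice to drop _merge at the end are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frames from the question
A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
B = pd.DataFrame({'key': ['C', 'D', 'E', 'F'], 'value': [3, 4, 5, 6]})

result = (A
    .merge(B, on='key', how='outer', indicator=True)
    # np.select takes the first matching condition per row;
    # rows matching neither condition fall back to the default
    .assign(status=lambda f: np.select(
        [f._merge == 'left_only', f._merge == 'right_only'],
        ['added', 'removed'],
        default='no change'))
    .drop(columns='_merge')
)
```

np.select may be preferable to map when the conditions involve more than one column, since each condition is an arbitrary boolean mask.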
1 vote taller 8/22/2023 #2

Use DataFrame.apply to get the status.

merged = (merged
          .assign(status = merged.apply(lambda x: 
                  'added' if x._merge == "left_only" else "", axis=1))
          .drop('_merge', axis=1)
          )
  key  value_x  value_y status
0   A      1.0      NaN  added
1   B      2.0      NaN  added
2   C      3.0      3.0
3   D      4.0      4.0
4   E      NaN      5.0
5   F      NaN      6.0
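
The row-wise apply above labels only the left_only rows. A vectorized sketch that stays closer to the asker's attempt (start from a default of 'no change', then overwrite by condition) could use Series.mask inside a lambda, so the chain never refers to an outside variable. This is one possible approach under the question's sample data, not the answer author's code.

```python
import pandas as pd

# Sample frames from the question
A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
B = pd.DataFrame({'key': ['C', 'D', 'E', 'F'], 'value': [3, 4, 5, 6]})

merged = (pd.merge(A, B, on='key', how='outer', indicator=True)
          .assign(status='no change')
          # mask replaces values where the condition is True
          .assign(status=lambda x: x['status']
                  .mask(x['_merge'] == 'left_only', 'added')
                  .mask(x['_merge'] == 'right_only', 'removed'))
          .drop('_merge', axis=1)
          )
```

Because the lambda receives the intermediate frame, this also works when the merge itself is part of the chain.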
1 vote Scott Boston 8/22/2023 #3

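Here is a way to do this in one line using the "walrus" operator, :=, together with a predefined dictionary, map, and passing the indicator column name as a string in merge: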

import pandas as pd

# Create sample data frames A and B
A = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'value': [1, 2, 3, 4]
})

B = pd.DataFrame({
    'key': ['C', 'D', 'E', 'F'],
    'value': [3, 4, 5, 6]
})

d = {'both':'no_change',
     'left_only':'added',
     'right_only':'removed'}

merged = (merged_out:=pd.merge(A, B, on='key', how='outer', indicator='status'))\
            .assign(status=merged_out['status'].map(d))

merged 

Output:

  key  value_x  value_y     status
0   A      1.0      NaN      added
1   B      2.0      NaN      added
2   C      3.0      3.0  no_change
3   D      4.0      4.0  no_change
4   E      NaN      5.0    removed
5   F      NaN      6.0    removed
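
As a variation on the same idea, and only as a sketch: the indicator column produced by merge is categorical, so its categories can be renamed directly with Series.cat.rename_categories inside a lambda. This avoids binding the extra name merged_out in the enclosing scope. The result stays categorical rather than becoming a plain string column, which may or may not be what you want.

```python
import pandas as pd

# Sample frames from the question
A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
B = pd.DataFrame({'key': ['C', 'D', 'E', 'F'], 'value': [3, 4, 5, 6]})

d = {'both': 'no_change',
     'left_only': 'added',
     'right_only': 'removed'}

merged = (pd.merge(A, B, on='key', how='outer', indicator='status')
          # rename the categories of the indicator column in place of map
          .assign(status=lambda f: f['status'].cat.rename_categories(d)))
```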