Asked by: Hugoz13 Asked: 7/25/2023 Last edited by: Hugoz13 Updated: 7/26/2023 Views: 42
Trying to dynamically fill columns based on list values and length
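Q:
I have been trying to optimize, as far as possible, a two-part data manipulation. I am working with a pandas column in which each value is either a list of varying length or NaN, and I want to extract those values and place them in columns according to their position in the list.
Here is a reproducible example, along with what I have written so far: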
import numpy as np
import pandas as pd

df = pd.DataFrame({'first_name': ['Charles', 'Charles2', 'Charles3'],
                   'last_name': ['George', 'George2', 'George3'],
                   'banking_number': [['NaN'], ['UK421'], ['UK123', 'FR789']]}
                  )

# length of the longest list among the non-missing cells
len_banking_number = max(
    df.loc[df['banking_number'].notna(), 'banking_number'].str.len()
)
len_list = list(range(1, len_banking_number + 1))

# create one empty bank_<i> column per list position
for i in len_list:
    df[f'bank_{i}'] = np.nan
How would you get this result?
df_final = pd.DataFrame({'first_name': ['Charles', 'Charles2', 'Charles3'],
                         'last_name': ['George', 'George2', 'George3'],
                         'banking_number': [['NaN'], ['UK421'], ['UK123', 'FR789']],
                         'bank_1': ['NaN', 'UK', 'UK'],
                         'bank_2': ['NaN', 'NaN', 'FR']}
                        )
Thank you for your time and help.
A:
1 upvote
Juliette B
7/26/2023
#1
You can try the following:
import pandas as pd

df = pd.DataFrame({'first_name': ['Charles', 'Charles2', 'Charles3'],
                   'last_name': ['George', 'George2', 'George3'],
                   'banking_number': [[None], ['UK421'], ['UK123', 'FR789']]}
                  )

# split the banking_number column
df[['bank_1', 'bank_2']] = pd.DataFrame(df['banking_number'].tolist(), index=df.index)

# keep only the first 2 characters (maybe not needed but I wanted to match your expected output)
def get_first_2_char(x):
    # slice strings only; missing values (None or NaN) pass through unchanged
    return x[0:2] if isinstance(x, str) else x

for col in ['bank_1', 'bank_2']:
    df[col] = df[col].apply(get_first_2_char)
Edit
Following @itprorh86's comment, here is an updated version that handles a variable number of banks.
import pandas as pd

df = pd.DataFrame({'first_name': ['Charles', 'Charles2', 'Charles3'],
                   'last_name': ['George', 'George2', 'George3'],
                   'banking_number': [[None], ['UK421'], ['UK123', 'FR789']]}
                  )

# one bank_<i> column per position in the longest list
max_len = df['banking_number'].apply(len).max()
column_names = [f'bank_{i}' for i in range(1, max_len + 1)]

# split the banking_number column
df[column_names] = pd.DataFrame(df['banking_number'].tolist(), index=df.index)

# keep only the first 2 characters
def get_first_2_char(x):
    # slice strings only; missing values (None or NaN) pass through unchanged
    return x[0:2] if isinstance(x, str) else x

for col in column_names:
    df[col] = df[col].apply(get_first_2_char)
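As a possible variation on the loop above, the two-character prefix can also be taken with the pandas string accessor, which leaves missing values as missing without a helper function. This is only a sketch under the same sample data; the names `expanded`, `prefixes` and `result` are introduced here for illustration and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Charles', 'Charles2', 'Charles3'],
    'last_name': ['George', 'George2', 'George3'],
    'banking_number': [[None], ['UK421'], ['UK123', 'FR789']],
})

# Expand each list into its own columns; shorter lists are padded with missing values.
expanded = pd.DataFrame(df['banking_number'].tolist(), index=df.index)
expanded.columns = [f'bank_{i}' for i in range(1, expanded.shape[1] + 1)]

# .str[:2] slices every string and keeps missing entries missing.
prefixes = expanded.apply(lambda col: col.str[:2])

# Attach the new columns to the original frame without modifying df in place.
result = df.join(prefixes)
print(result)
```

Building the new columns in a separate frame and joining at the end keeps `df` untouched, which can make it easier to compare against the expected output.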
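Comments
0 upvotes
Hugoz13
7/26/2023
Would you say your solution has a better scaling/performance profile than the second solution? I have a hard time judging. In any case, thank you very much for looking into this!!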
1 upvote
Juliette B
7/26/2023
I don't think iterating over rows is a good idea, but honestly I haven't taken the time to compare the two solutions!
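One way to settle the question would be to time both styles on a larger frame. The following is a rough sketch, not a benchmark from the thread: the data is synthetic, and `expand_with_tolist` and `expand_row_by_row` are hypothetical stand-ins for the column-wise and row-wise approaches rather than the exact code from either answer. No timings are quoted here because they depend on the machine and the pandas version.

```python
import random
import timeit

import pandas as pd


def expand_with_tolist(df):
    """Column-wise: build all bank_N columns with one DataFrame constructor call."""
    expanded = pd.DataFrame(df['banking_number'].tolist(), index=df.index)
    expanded.columns = [f'bank_{i}' for i in range(1, expanded.shape[1] + 1)]
    return df.join(expanded)


def expand_row_by_row(df):
    """Row-wise: walk the rows with iloc and pad each list by hand."""
    max_banks = max(len(x) for x in df['banking_number'])
    rows = []
    for r in range(df.shape[0]):
        banks = df.iloc[r]['banking_number']
        rows.append(list(banks) + [None] * (max_banks - len(banks)))
    columns = [f'bank_{i}' for i in range(1, max_banks + 1)]
    return df.join(pd.DataFrame(rows, columns=columns, index=df.index))


# Synthetic data (an assumption, not the asker's real data): 2,000 rows, 1 to 3 codes each.
random.seed(0)
n_rows = 2000
sample = pd.DataFrame({
    'first_name': [f'Charles{i}' for i in range(n_rows)],
    'banking_number': [
        [random.choice(['UK123', 'FR789', 'DE456']) for _ in range(random.randint(1, 3))]
        for _ in range(n_rows)
    ],
})

# Average wall-clock time per call over a few repetitions.
for func in (expand_with_tolist, expand_row_by_row):
    seconds = timeit.timeit(lambda: func(sample), number=3)
    print(f'{func.__name__}: {seconds / 3:.4f} s per call')
```

Raising `n_rows` should show how each version scales with the number of rows.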
0 upvotes
itprorh66
7/26/2023
#2
Here is a slightly different approach from the one suggested by @JulietteB.
import pandas as pd

def parseDF(df_in: pd.DataFrame) -> pd.DataFrame:
    # output columns: the original ones plus one bank_<n> column per list position
    key_cols = list(df_in.columns)
    max_banks = max(len(x) for x in df_in['banking_number'].to_list())
    for nk in range(max_banks):
        key_cols.append(f'bank_{nk+1}')
    rslt = []
    # build each output row from the corresponding row of df_in
    for r in range(df_in.shape[0]):
        row = []
        row.append(df_in.iloc[r]['first_name'])
        row.append(df_in.iloc[r]['last_name'])
        b_number = df_in.iloc[r]['banking_number']
        row.append(b_number)
        # pad rows that have fewer than max_banks entries with 'NaN'
        for nw in range(max_banks):
            if nw < len(b_number):
                row.append(b_number[nw])
            else:
                row.append('NaN')
        rslt.append(row)
    return pd.DataFrame(data=rslt, columns=key_cols)
This approach gives the same answer, but has the advantage of handling a variable number of banks.
Given your input data, running parseDF(df) yields:
  first_name last_name  banking_number bank_1 bank_2
0    Charles    George           [NaN]    NaN    NaN
1   Charles2   George2         [UK421]  UK421    NaN
2   Charles3   George3  [UK123, FR789]  UK123  FR789
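The question also mentions cells that are NaN rather than lists, and calling len() on such a cell would raise a TypeError. A minimal sketch of one way to guard against that follows, assuming a missing cell should simply yield empty bank columns. The sample frame with a real np.nan and the names `as_lists`, `expanded` and `result` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first_name': ['Charles', 'Charles2', 'Charles3'],
    'last_name': ['George', 'George2', 'George3'],
    # The first cell is a real missing value, not a list (assumed for illustration).
    'banking_number': [np.nan, ['UK421'], ['UK123', 'FR789']],
})

# Replace anything that is not a list with an empty list so the expansion is safe.
as_lists = df['banking_number'].apply(lambda x: x if isinstance(x, list) else [])

# Expand the lists; rows with fewer entries are padded with missing values.
expanded = pd.DataFrame(as_lists.tolist(), index=df.index)
expanded.columns = [f'bank_{i}' for i in range(1, expanded.shape[1] + 1)]

result = df.join(expanded)
print(result)
```

The original banking_number column is left as it was, so the missing cell stays visible in the output.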
Comments