Concatenate alternate rows (one with data, one with some NaN) in a Pandas DataFrame

Q:

I have a DataFrame produced by a PDF reader, so the extracted data is somewhat messy.

Column A	Column B	Column C	Column D	Column E
ABC	5	10	Hello	ABC
DEF	NaN	NaN	NaN	NaN
GHI	25	30	Bye	Lol
JKL	NaN	NaN	NaN	NaN

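This is an example of the kind of DataFrame I get. Column A consists of strings. Every alternate row contains a string in Column A, but all the other columns contain NaN.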
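For anyone who wants to experiment, here is a minimal sketch that rebuilds the sample input as a DataFrame. The values are copied from the table above; the name `is_continuation` is hypothetical. It flags the continuation rows as those where every column except Column A is missing.

```python
import numpy as np
import pandas as pd

# Reconstruction of the PDF-extracted sample: each data row is followed by a
# continuation row that only carries the rest of the Column A string.
df = pd.DataFrame({
    "Column A": ["ABC", "DEF", "GHI", "JKL"],
    "Column B": [5, np.nan, 25, np.nan],
    "Column C": [10, np.nan, 30, np.nan],
    "Column D": ["Hello", np.nan, "Bye", np.nan],
    "Column E": ["ABC", np.nan, "Lol", np.nan],
})

# Hypothetical helper: True where every column except Column A is NaN.
is_continuation = df.drop(columns="Column A").isna().all(axis=1)
print(is_continuation.tolist())  # [False, True, False, True]
```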

I want to get the following:

Column A	Column B	Column C	Column D	Column E
ABCDEF	5	10	Hello	ABC
GHIJKL	25	30	Bye	Lol

I tried isolating every 2 rows and aggregating them with agg(sum), but did not get the desired output.

python pandas dataframe string-concatenation

A:

0 votes, answered by mozway on 9/22/2023 (#1)

Use a custom groupby.agg, assuming that "Column B" can be used to identify the groups:

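from pandas.api.types import is_string_dtype

# concatenate the string columns, keep the first value for all others
f = {col: ''.join if is_string_dtype(df[col]) else 'first'
     for col in df}
# a new group starts on every row where "Column B" is not NaN
group = df['Column B'].notna().cumsum()

out = df.groupby(group, as_index=False).agg(f)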
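A self-contained sketch of the same groupby/agg idea on the sample data. It differs from the answer in two assumptions: instead of detecting text columns with `is_string_dtype` (whose behaviour on object columns changed in pandas 2.0), it names Column A explicitly as the only column to concatenate, and it discards the group keys with `reset_index(drop=True)` rather than `as_index=False`. `agg_map` is a hypothetical name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Column A": ["ABC", "DEF", "GHI", "JKL"],
    "Column B": [5, np.nan, 25, np.nan],
    "Column C": [10, np.nan, 30, np.nan],
    "Column D": ["Hello", np.nan, "Bye", np.nan],
    "Column E": ["ABC", np.nan, "Lol", np.nan],
})

# Assumption: Column A is the only column split across rows.
# "first" takes the first non-null value in each group.
agg_map = {col: "first" for col in df.columns}
agg_map["Column A"] = "".join

# A new group starts at every row where Column B is present.
group = df["Column B"].notna().cumsum()

out = df.groupby(group).agg(agg_map).reset_index(drop=True)
print(out)
```

Because `"first"` skips missing values, the NaN padding in the continuation rows simply drops out, while `"".join` glues the Column A fragments together in row order.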

Alternatively, start a new group on every row that has no missing values:

group = df.notna().all(axis=1).cumsum()
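On the sample data this alternative key produces the same groups as the Column B key. A short check, assuming the same sample frame; note the implicit assumption that every genuine data row is complete, because a data row containing any NaN would not start a new group and would be merged into the previous record.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Column A": ["ABC", "DEF", "GHI", "JKL"],
    "Column B": [5, np.nan, 25, np.nan],
    "Column C": [10, np.nan, 30, np.nan],
    "Column D": ["Hello", np.nan, "Bye", np.nan],
    "Column E": ["ABC", np.nan, "Lol", np.nan],
})

# True on rows where every column is filled in, i.e. the start of a record.
starts = df.notna().all(axis=1)
group = starts.cumsum()

print(group.tolist())                                 # [1, 1, 2, 2]
print(group.equals(df["Column B"].notna().cumsum()))  # True
```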

Or, if you really do have strict pairs of rows:

import numpy as np

group = np.arange(len(df))//2
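The positional key ignores the content entirely, so it is only safe when every record occupies exactly two rows. A small sketch on a hypothetical frame (not from the question) where the first record was split over three rows shows how the positional and content-based keys diverge:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the first record is split over THREE rows,
# so the rows no longer come in strict pairs.
df = pd.DataFrame({
    "Column A": ["AB", "C", "DEF", "GHI", "JKL"],
    "Column B": [5, np.nan, np.nan, 25, np.nan],
})

positional = np.arange(len(df)) // 2       # [0, 0, 1, 1, 2], ignores content
content = df["Column B"].notna().cumsum()  # [1, 1, 1, 2, 2], follows the data

by_position = df.groupby(positional)["Column A"].agg("".join).tolist()
by_content = df.groupby(content)["Column A"].agg("".join).tolist()
print(by_position)  # ['ABC', 'DEFGHI', 'JKL']
print(by_content)   # ['ABCDEF', 'GHIJKL']
```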

Output:

  Column A  Column B  Column C Column D Column E
0   ABCDEF       5.0      10.0    Hello      ABC
1   GHIJKL      25.0      30.0      Bye      Lol
