Python dataframe: split a string column into many

Asked by: Mainland Asked: 10/31/2023 Updated: 10/31/2023 Views: 54

Q:

My imported data arrives in an irregular format.

df = 
# following data all in one column
1 CABATT   CAR BATTERY VOLTAGE  -10.0 200.0
2 CPTEMP   CAR DAS PANEL TEMP C -10.0  200.0
3 CAPTMA   CAR PANEL A TEMP C  -10.0   200.0

205 SPPT4P   SPEED INPUT 4 CYCLINDER  0.0  32000.0

# Slicing the first three digital numbers as SNO

print(df[df.columns[0]].str.extract('(?P<SNO>\S\d{3}\S)'))

0   NaN
1   NaN
2   NaN
3   NaN
4   NaN

Expected output:

SNo ABBRE    DESCRIPTION                                                     MIN        MAX
1 CABATT   CAR BATTERY VOLTAGE                                              -10.0      200.0
2 CPTEMP   CAR DAS PANEL TEMP C                                             -10.0      200.0
3 CAPTMA   CAR PANEL A TEMP C                                                -10.0      200.0

205 SPPT4P   SPEED INPUT 4 CYCLINDER                                              0.0    32000.0
python pandas regex dataframe

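Comments

0 votes Xukrao 10/31/2023
Your code example is a bit confusing, because it contains both Python code and (I assume) console output. Could you edit your code example so that it contains only runnable code?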

A:

1 vote Nick 10/31/2023 #1

You can use this regex to extract the data:

^(?P<SNo>\d+)\s+(?P<ABBRE>\w+)\s+(?P<DESCRIPTION>.*?)\s+(?P<MIN>-?\d+(?:\.\d+)?)\s+(?P<MAX>-?\d+(?:\.\d+)?)$

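It matches:

  • ^ : the start of the string
  • (?P<SNo>\d+) : one or more digits, captured in group SNo
  • (?P<ABBRE>\w+) : one or more word characters, captured in group ABBRE
  • (?P<DESCRIPTION>.*?) : a minimal number of characters, captured in group DESCRIPTION
  • (?P<MIN>-?\d+(?:\.\d+)?) : a possibly negative integer or floating-point number, captured in group MIN
  • (?P<MAX>-?\d+(?:\.\d+)?) : a possibly negative integer or floating-point number, captured in group MAX
  • $ : the end of the string

Each capture group is separated from the next by one or more whitespace characters (\s+).

Regex demo on regex101

In Python: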
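To see how the named groups line up with a single row, here is a minimal sketch using only the standard `re` module. The names `pattern`, `line`, and `fields` are illustrative and not part of the original answer; the sample row is taken from the question.

```python
import re

# The same pattern as above, split over two lines for readability.
pattern = re.compile(
    r'^(?P<SNo>\d+)\s+(?P<ABBRE>\w+)\s+(?P<DESCRIPTION>.*?)\s+'
    r'(?P<MIN>-?\d+(?:\.\d+)?)\s+(?P<MAX>-?\d+(?:\.\d+)?)$'
)

# One sample row from the question; the description itself contains a digit.
line = "205 SPPT4P   SPEED INPUT 4 CYCLINDER  0.0  32000.0"

match = pattern.match(line)
# groupdict() maps each group name to the text it captured.
fields = match.groupdict() if match else {}
print(fields)
```

Because the two numeric groups are anchored to the end of the string by `$`, the lazy DESCRIPTION group is forced to grow until exactly two numbers remain, so the "4" inside the description is not mistaken for MIN.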

out = df[df.columns[0]].str.extract(r'^(?P<SNo>\d+)\s+(?P<ABBRE>\w+)\s+(?P<DESCRIPTION>.*?)\s+(?P<MIN>-?\d+(?:\.\d+)?)\s+(?P<MAX>-?\d+(?:\.\d+)?)$')

Output:

   SNo   ABBRE              DESCRIPTION    MIN      MAX
0    1  CABATT      CAR BATTERY VOLTAGE  -10.0    200.0
1    2  CPTEMP     CAR DAS PANEL TEMP C  -10.0    200.0
2    3  CAPTMA       CAR PANEL A TEMP C  -10.0    200.0
3  205  SPPT4P  SPEED INPUT 4 CYCLINDER    0.0  32000.0
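
Note that `str.extract` returns every captured column as strings, so SNo, MIN and MAX need converting before any numeric work. A self-contained sketch of the whole round trip, assuming the raw lines sit in a single column (the column name `raw` is made up here, and the data is the sample from the question):

```python
import pandas as pd

# Hypothetical single-column frame holding the raw lines from the question.
df = pd.DataFrame({"raw": [
    "1 CABATT   CAR BATTERY VOLTAGE  -10.0 200.0",
    "2 CPTEMP   CAR DAS PANEL TEMP C -10.0  200.0",
    "3 CAPTMA   CAR PANEL A TEMP C  -10.0   200.0",
    "205 SPPT4P   SPEED INPUT 4 CYCLINDER  0.0  32000.0",
]})

pattern = (
    r'^(?P<SNo>\d+)\s+(?P<ABBRE>\w+)\s+(?P<DESCRIPTION>.*?)\s+'
    r'(?P<MIN>-?\d+(?:\.\d+)?)\s+(?P<MAX>-?\d+(?:\.\d+)?)$'
)

# Named groups become the column names of the result.
out = df["raw"].str.extract(pattern)

# The extracted values are strings; convert the numeric columns explicitly.
for col in ["SNo", "MIN", "MAX"]:
    out[col] = pd.to_numeric(out[col])

print(out)
print(out.dtypes)
```

Rows that do not match the pattern come back as NaN in every column, which makes them easy to spot with `out.isna().any(axis=1)`.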