Asked by: Mainland  Asked: 10/31/2023  Updated: 10/31/2023  Views: 54
Python dataframe: split a string column into many
Q:
The data I imported arrives in an irregular format, with every field packed into a single column.
df =
# following data all in one column
1 CABATT CAR BATTERY VOLTAGE -10.0 200.0
2 CPTEMP CAR DAS PANEL TEMP C -10.0 200.0
3 CAPTMA CAR PANEL A TEMP C -10.0 200.0
205 SPPT4P SPEED INPUT 4 CYCLINDER 0.0 32000.0
# Slicing the first three digital numbers as SNO
print(df[df.columns[0]].str.extract('(?P<SNO>\S\d{3}\S)'))
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
Expected output:
SNo ABBRE DESCRIPTION MIN MAX
1 CABATT CAR BATTERY VOLTAGE -10.0 200.0
2 CPTEMP CAR DAS PANEL TEMP C -10.0 200.0
3 CAPTMA CAR PANEL A TEMP C -10.0 200.0
205 SPPT4P SPEED INPUT 4 CYCLINDER 0.0 32000.0
A:
1 upvote
Nick
10/31/2023
#1
You can use this regex to extract the data:
^(?P<SNo>\d+)\s+(?P<ABBRE>\w+)\s+(?P<DESCRIPTION>.*?)\s+(?P<MIN>-?\d+(?:\.\d+)?)\s+(?P<MAX>-?\d+(?:\.\d+)?)$
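It matches:
^ : beginning of string
(?P<SNo>\d+) : some digits, captured in group SNo
(?P<ABBRE>\w+) : some number of word characters, captured in group ABBRE
(?P<DESCRIPTION>.*?) : a minimal number of characters, captured in group DESCRIPTION
(?P<MIN>-?\d+(?:\.\d+)?) : a possibly negative integer or floating point number, captured in group MIN
(?P<MAX>-?\d+(?:\.\d+)?) : a possibly negative integer or floating point number, captured in group MAX
$ : end of string
Each of the capture groups is separated by some number of whitespace characters (\s+).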
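To see how the groups divide up a single row, here is a minimal sketch that applies the same pattern with Python's standard re module, outside pandas. The names PATTERN, line and fields are illustrative only and are not part of the original answer.

```python
import re

# Same pattern as above, split across two string literals for readability.
PATTERN = re.compile(
    r'^(?P<SNo>\d+)\s+(?P<ABBRE>\w+)\s+(?P<DESCRIPTION>.*?)\s+'
    r'(?P<MIN>-?\d+(?:\.\d+)?)\s+(?P<MAX>-?\d+(?:\.\d+)?)$'
)

# Sample row taken from the question; its description contains a digit.
line = '205 SPPT4P SPEED INPUT 4 CYCLINDER 0.0 32000.0'

match = PATTERN.match(line)
# groupdict() maps each named group to the text it captured.
fields = match.groupdict() if match else {}
print(fields)
```

Because DESCRIPTION is lazy and the pattern is anchored with $, the description grows only until the remainder of the line is exactly two numbers, so the "4" inside the description is not mistaken for MIN.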
Regex demo on regex101
In python:
out = df[df.columns[0]].str.extract(r'^(?P<SNo>\d+)\s+(?P<ABBRE>\w+)\s+(?P<DESCRIPTION>.*?)\s+(?P<MIN>-?\d+(?:\.\d+)?)\s+(?P<MAX>-?\d+(?:\.\d+)?)$')
Output:
SNo ABBRE DESCRIPTION MIN MAX
0 1 CABATT CAR BATTERY VOLTAGE -10.0 200.0
1 2 CPTEMP CAR DAS PANEL TEMP C -10.0 200.0
2 3 CAPTMA CAR PANEL A TEMP C -10.0 200.0
3 205 SPPT4P SPEED INPUT 4 CYCLINDER 0.0 32000.0
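Note that str.extract returns every captured group as text, so SNo, MIN and MAX are still strings at this point. The following is a sketch of one way to convert them, assuming every row matched the pattern; the column name raw and the reconstructed sample frame are hypothetical stand-ins for the asker's data.

```python
import pandas as pd

# Hypothetical reconstruction of the question's data: one string column.
df = pd.DataFrame({'raw': [
    '1 CABATT CAR BATTERY VOLTAGE -10.0 200.0',
    '2 CPTEMP CAR DAS PANEL TEMP C -10.0 200.0',
    '3 CAPTMA CAR PANEL A TEMP C -10.0 200.0',
    '205 SPPT4P SPEED INPUT 4 CYCLINDER 0.0 32000.0',
]})

pattern = (
    r'^(?P<SNo>\d+)\s+(?P<ABBRE>\w+)\s+(?P<DESCRIPTION>.*?)\s+'
    r'(?P<MIN>-?\d+(?:\.\d+)?)\s+(?P<MAX>-?\d+(?:\.\d+)?)$'
)

# One output column per named group; all values are strings here.
out = df[df.columns[0]].str.extract(pattern)

# Cast the numeric columns. This assumes no row failed to match:
# an unmatched row would be NaN and the integer cast would raise.
out = out.astype({'SNo': 'int64', 'MIN': 'float64', 'MAX': 'float64'})
print(out.dtypes)
```

If some rows might not match, pd.to_numeric(out['SNo'], errors='coerce') is a more forgiving alternative to the integer cast.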