提问人:Anwar San 提问时间:7/23/2021 最后编辑:MT0Anwar San 更新时间:11/3/2023 访问量:476
如何在Python中将列中的数据拆分为一些单独的列?
How to split data in a column into some separate columns in Python?
问:
所以,我有一个数据框,下面给出:
import pandas as pd
df = pd.DataFrame(
{
"id": [8233037, 8233313],
"geometry": [
"{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612386, -6.93127], [107.612681, -6.930843], [107.612796, -6.930606], [107.61285, -6.930428], [107.612932, -6.929788], [107.612968, -6.929409], [107.613061, -6.928426], [107.613064, -6.927535], [107.613086, -6.927077], [107.613084, -6.926826], [107.612923, -6.926555], [107.612596, -6.926166], [107.611872, -6.925277], [107.61177, -6.925134], [107.611729, -6.925015], [107.611715, -6.92489], [107.611715, -6.92473], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]]]}",
"{'type': 'MultiLineString', 'coordinates': [[[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832], [107.625456, -6.910273], [107.625764, -6.910353], [107.625871, -6.910358], [107.626035, -6.910264]]]}",
],
}
)
df
我想在og数据帧中分别使用一些单行字符串,例如[107.625764,-6.910353],[107.625871,-6.910358],拆分为107.625764,-6.910353。预期结果的详细信息如下图所示。预期结果
据我所知,我们可以应用str.split方法指定任何特定的分隔符。方法如下:
df[
["coordinate1", "coordinate2", "coordinate3", "coordinate4", "coordinate-n"]
] = df.geometry.str.split(
" ",
expand=True,
)
问题:我不知道我应该在str.split(“ ”)之后放置正确的分隔符。
如何操作数据帧列中的值,直到我得到预期的表,如下图所示?预期结果
答:
1赞
nay
7/23/2021
#1
我有一个纯python的解决方案。
首先,它是一个类似JSON的字符串,但是它有语法错误,键是单引号,JSON需要双引号,所以我用 .
然后我只需要格式化它以列出数据geometry
yaml
import yaml
df = pd.DataFrame({ 'id':[8233037,8233313],
'geometry': ["{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612386, -6.93127], [107.612681, -6.930843], [107.612796, -6.930606], [107.61285, -6.930428], [107.612932, -6.929788], [107.612968, -6.929409], [107.613061, -6.928426], [107.613064, -6.927535], [107.613086, -6.927077], [107.613084, -6.926826], [107.612923, -6.926555], [107.612596, -6.926166], [107.611872, -6.925277], [107.61177, -6.925134], [107.611729, -6.925015], [107.611715, -6.92489], [107.611715, -6.92473], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]]]}","{'type': 'MultiLineString', 'coordinates': [[[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832], [107.625456, -6.910273], [107.625764, -6.910353], [107.625871, -6.910358], [107.626035, -6.910264]]]}"]})
data = []
for _,row in df.iterrows():
id = row['id']
geo = yaml.load(row['geometry'])['coordinates']
geos = []
for g in geo:
geos += g
data += [[id,g[0],g[1]] for g in geos]
df_new = pd.DataFrame(data,columns=['id','latitude','longtitude'])
df_new
id latitude longtitude
0 8233037 107.612018 -6.921755
1 8233037 107.611888 -6.923030
2 8233037 107.611715 -6.924730
3 8233037 107.611715 -6.924890
4 8233037 107.611729 -6.925015
... ... ... ...
199 8233313 107.623747 -6.909832
200 8233313 107.625456 -6.910273
201 8233313 107.625764 -6.910353
202 8233313 107.625871 -6.910358
203 8233313 107.626035 -6.910264
204 rows × 3 columns
评论
0赞
Anwar San
7/24/2021
谢谢,这个简单的语法就是我所需要的,语法和你的解释都可以理解。我还不知道yaml,但我认为它真的很强大。你给出了简单的方法。谢谢你,它对我有用。
0赞
furas
7/23/2021
#2
geometry
看起来像字符串,所以首先我会使用模块将其从字符串转换为普通的列表/字典。而后期,您可以更简单地访问值。JSON
json
但这是不正确的,所以我可以使用模块JSON
dirtyjson
df['data'] = df['geometry'].apply(lambda row:dirtyjson.loads(row))
print(df['data'])
或者(幸运的是)我可以替换为以获得正确的 JSON'
"
df['data'] = df['geometry'].apply(lambda row:json.loads(row.replace("'", '"')))
结果
0 {'type': 'MultiLineString', 'coordinates': [[[...
1 {'type': 'MultiLineString', 'coordinates': [[[...
Name: data, dtype: object
接下来我只得到coordinates
df['coordinates'] = df['data'].apply(lambda row:row['coordinates'])
print(df['coordinates'])
结果
0 [[[107.612018, -6.921755], [107.611888, -6.923...
1 [[[107.614077, -6.91033], [107.614837, -6.9100...
Name: coordinates, dtype: object
它是嵌套列表,所以我把它扁平化了
def flatten(row):
result = []
for item in row:
result += item
return result
df['coordinates'] = df['coordinates'].apply(flatten)
print(df['coordinates'])
或者我可以用 作为起始值sum()
[]
df['coordinates'] = df['coordinates'].apply(lambda row: sum(row, []))
结果
0 [[107.612018, -6.921755], [107.611888, -6.9230...
1 [[107.614077, -6.91033], [107.614837, -6.91005...
Name: coordinates, dtype: object
现在我可以将每一对放在单独的行中explode
id
df = df.explode('coordinates')
print(df[['id', 'coordinates']])
结果
id coordinates
0 8233037 [107.612018, -6.921755]
0 8233037 [107.611888, -6.92303]
0 8233037 [107.611715, -6.92473]
0 8233037 [107.611715, -6.92489]
0 8233037 [107.611729, -6.925015]
.. ... ...
1 8233313 [107.623747, -6.909832]
1 8233313 [107.625456, -6.910273]
1 8233313 [107.625764, -6.910353]
1 8233313 [107.625871, -6.910358]
1 8233313 [107.626035, -6.910264]
[204 rows x 2 columns]
我可以用它来转换成两行Series
coordinates
df[ ['lat', 'long'] ] = df['coordinates'].apply(pd.Series)
print(df)
结果:
id lat long
0 8233037 107.612018 -6.921755
0 8233037 107.611888 -6.923030
0 8233037 107.611715 -6.924730
0 8233037 107.611715 -6.924890
0 8233037 107.611729 -6.925015
.. ... ... ...
1 8233313 107.623747 -6.909832
1 8233313 107.625456 -6.910273
1 8233313 107.625764 -6.910353
1 8233313 107.625871 -6.910358
1 8233313 107.626035 -6.910264
[204 rows x 3 columns]
完整的工作代码
import pandas as pd
import json
import dirtyjson
df = pd.DataFrame({
'id': [8233037, 8233313],
'geometry': ["{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612386, -6.93127], [107.612681, -6.930843], [107.612796, -6.930606], [107.61285, -6.930428], [107.612932, -6.929788], [107.612968, -6.929409], [107.613061, -6.928426], [107.613064, -6.927535], [107.613086, -6.927077], [107.613084, -6.926826], [107.612923, -6.926555], [107.612596, -6.926166], [107.611872, -6.925277], [107.61177, -6.925134], [107.611729, -6.925015], [107.611715, -6.92489], [107.611715, -6.92473], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]]]}","{'type': 'MultiLineString', 'coordinates': [[[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832], [107.625456, -6.910273], [107.625764, -6.910353], [107.625871, -6.910358], [107.626035, -6.910264]]]}"]
})
#df['data'] = df['geometry'].apply(lambda row:dirtyjson.loads(row))
df['data'] = df['geometry'].apply(lambda row:json.loads(row.replace("'", '"')))
print(df['data'])
df['coordinates'] = df['data'].apply(lambda row:row['coordinates'])
print(df['coordinates'])
def flatten(row):
result = []
for item in row:
result += item
return result
#df['coordinates'] = df['coordinates'].apply(flatten)
df['coordinates'] = df['coordinates'].apply(lambda row: sum(row, []))
print(df['coordinates'])
df = df.explode('coordinates')
print(df[['id', 'coordinates']])
df[ ['lat', 'long'] ] = df['coordinates'].apply(pd.Series)
print(df[['id', 'lat', 'long']])
评论
0赞
Anwar San
7/24/2021
非常详细!您解释的步骤是可以理解的。如果有任何相似之处,我将把这些步骤用于另一种情况。但是我用简单的方法得到了解决方案,用于解析刺痛中单引号的语法错误。我非常感谢您解释的详细步骤。yaml
评论
geometry
json