Asked by: Mehran   Asked: 3/24/2023   Last edited by: Mehran   Updated: 3/25/2023   Views: 67
How to flatten a nested list in Python while keeping each comma-separated element of the list intact?
Question:
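I want to flatten the nested list that I create below for the keys of the track_ip dictionary (or avoid creating it in the first place), while keeping each comma-separated value intact.
I have a dataset df. I am trying to track source_ip and destination_ip in a dictionary called track_ip. I have also created one column for the time since the previous event of a given source_ip, and another that flags whether destination_ip differs from the destinations seen earlier for that source_ip.
For each source_ip key, I want a list of the destination_ip values it refers to (repeats allowed). I wanted to use the append() method, but it doesn't work because the value is a string, unless I wrap the key's value in a list. When I do that, I get a nested list of lists that I then need to flatten. If I flatten it with the method I use, the comma-separated elements are not preserved.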
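A minimal illustration of why both attempts misbehave, using plain Python only (the IP strings are taken from the sample data). Wrapping the old value and the new one in a fresh two-element list nests one level deeper on every repeat, and `list.extend` iterates its argument, so extending with a string adds it one character at a time. Keeping the value a list from the very first occurrence lets `append` add whole strings; `dict.setdefault` is one standard way to do that.

```python
# Root cause 1: wrapping [old_value, new_value] nests one level per repeat.
value = '135.b1d10.13fe9.91'
for new in ['135.b1d10.13fe9.71', '135.b1d10.d1c38.37']:
    value = [value, new]
print(value)
# [['135.b1d10.13fe9.91', '135.b1d10.13fe9.71'], '135.b1d10.d1c38.37']

# Root cause 2: list.extend() iterates its argument, and iterating a str yields characters.
out = []
out.extend('135.b1')
print(out)  # ['1', '3', '5', '.', 'b', '1']

# Fix: store a list the first time a key is seen, then append whole strings to it.
track_ip = {}
pairs = [
    ('135.0777d.04511.237', '135.b1d10.13fe9.91'),
    ('135.0777d.04511.237', '135.b1d10.13fe9.71'),
    ('135.b1d10.d1c38.20', '135.0777d.04511.237'),
]
for src, dst in pairs:
    # setdefault inserts [] only when src is missing and returns the stored list either way
    track_ip.setdefault(src, []).append(dst)
print(track_ip)
```

`collections.defaultdict(list)` gives the same behavior if you prefer `track_ip[src].append(dst)`.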
Here is a shortened version of the dataset:
df.head(10).to_dict('list')
{'source_ip': ['135.b1d10.d1c38.20',
'135.0777d.04511.237',
'135.0777d.04511.237',
'135.b1d10.d1c38.119',
'135.b1d10.13fe9.56',
'135.b1d10.d1c38.72',
'135.b1d10.d1c38.126',
'135.0777d.04511.237',
'135.0777d.04511.237',
'135.0777d.04511.237'],
'destination_ip': ['135.0777d.04511.237',
'135.b1d10.13fe9.91',
'135.b1d10.13fe9.71',
'135.0777d.04511.237',
'135.0777d.04511.237',
'135.0777d.04511.237',
'135.0777d.04511.237',
'135.b1d10.d1c38.37',
'135.b1d10.d1c38.112',
'135.b1d10.d1c38.20'],
'start_time': [1415749946,
1415477729,
1415702327,
1415754478,
1415749597,
1415745508,
1415754317,
1415427333,
1415584036,
1415582789]}
Here is my code:
import numpy as np
import pandas as pd

# import the dataframe
df = pd.read_csv('df.csv')

# loop through the data
df.loc[:, 'time_since_last'] = 0
df.loc[:, 'diff_destination_ip'] = 0
last_time = dict()
track_ip = dict()
for i, row in df.iterrows():
    if row['source_ip'] in last_time:
        # record delta time since last time under time_since_last
        df.loc[i, 'time_since_last'] = df.loc[i, 'start_time'] - last_time[row['source_ip']]
        # check if destination_ip was different for the source_ip and set diff_destination_ip to 1
        if row['destination_ip'] not in track_ip[row['source_ip']]:
            df.loc[i, 'diff_destination_ip'] = 1
    # record the current time as last time for the source_ip
    last_time[row['source_ip']] = row['start_time']
    # record destination_ip; if source_ip is already present, add the destination_ip to the list
    if row['source_ip'] in track_ip:
        track_ip[row['source_ip']] = [track_ip[row['source_ip']], row['destination_ip']]
        # flatten nested lists for track_ip[row['source_ip']]
        out = []
        for sublist in track_ip[row['source_ip']]:
            out.extend(sublist)
        track_ip[row['source_ip']] = out
    else:
        track_ip[row['source_ip']] = row['destination_ip']
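One way to avoid the nested lists altogether is to store a list for every source_ip from its first row and `append` to it. As a side effect, the membership test `dst not in track_ip[src]` then compares whole strings; with a bare string value, `in` does a substring check instead. This is a sketch, assuming the 10-row sample above stands in for `df.csv`; the column names are the question's own, and `data`, `src`, `dst` are illustrative names.

```python
import pandas as pd

# The 10-row sample from the question, standing in for pd.read_csv('df.csv')
data = {
    'source_ip': ['135.b1d10.d1c38.20', '135.0777d.04511.237', '135.0777d.04511.237',
                  '135.b1d10.d1c38.119', '135.b1d10.13fe9.56', '135.b1d10.d1c38.72',
                  '135.b1d10.d1c38.126', '135.0777d.04511.237', '135.0777d.04511.237',
                  '135.0777d.04511.237'],
    'destination_ip': ['135.0777d.04511.237', '135.b1d10.13fe9.91', '135.b1d10.13fe9.71',
                       '135.0777d.04511.237', '135.0777d.04511.237', '135.0777d.04511.237',
                       '135.0777d.04511.237', '135.b1d10.d1c38.37', '135.b1d10.d1c38.112',
                       '135.b1d10.d1c38.20'],
    'start_time': [1415749946, 1415477729, 1415702327, 1415754478, 1415749597,
                   1415745508, 1415754317, 1415427333, 1415584036, 1415582789],
}
df = pd.DataFrame(data)

df['time_since_last'] = 0
df['diff_destination_ip'] = 0
last_time = {}
track_ip = {}

for i, row in df.iterrows():
    src, dst = row['source_ip'], row['destination_ip']
    if src in last_time:
        # time since the previous row with the same source_ip
        df.loc[i, 'time_since_last'] = row['start_time'] - last_time[src]
        # flag a destination this source_ip has not contacted before
        if dst not in track_ip[src]:
            df.loc[i, 'diff_destination_ip'] = 1
    last_time[src] = row['start_time']
    # the value is always a list, so append() adds the whole string and nothing nests
    track_ip.setdefault(src, []).append(dst)

print(track_ip)
```

Every value is a list here, including sources with a single destination. If a bare string is really wanted for those, as in the expected output, a final pass such as `{k: v[0] if len(v) == 1 else v for k, v in track_ip.items()}` converts them, although uniform lists are simpler for later membership tests. Rows are processed in file order, as in the original loop, so `time_since_last` can be negative when the file is not sorted by `start_time`.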
What I am trying to get is the following output for track_ip:
print(track_ip)
{'135.b1d10.d1c38.20': '135.0777d.04511.237', '135.0777d.04511.237': ['135.b1d10.13fe9.91', '135.b1d10.13fe9.71', '135.b1d10.d1c38.37', '135.b1d10.d1c38.112', '135.b1d10.d1c38.20'], '135.b1d10.d1c38.119': '135.0777d.04511.237', '135.b1d10.13fe9.56': '135.0777d.04511.237', '135.b1d10.d1c38.72': '135.0777d.04511.237', '135.b1d10.d1c38.126': '135.0777d.04511.237'}
The actual dataset has 3.5e5 rows. I cannot have nested lists in track_ip.
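With 3.5e5 rows, `iterrows` plus a `df.loc` write per row tends to be slow. The following is a vectorized sketch of the same three results using standard pandas operations, assuming the semantics of the loop in the question: rows are taken in file order, and a destination counts as new only if its source_ip appeared earlier and that (source_ip, destination_ip) pair did not. The `track_ip` line is close to the `groupby`/`agg` suggestion in the comments, with `list` instead of `np.stack` so each value is a plain Python list. Only the question's 10-row sample is used here.

```python
import pandas as pd

# The 10-row sample from the question, standing in for the full 3.5e5-row file
df = pd.DataFrame({
    'source_ip': ['135.b1d10.d1c38.20', '135.0777d.04511.237', '135.0777d.04511.237',
                  '135.b1d10.d1c38.119', '135.b1d10.13fe9.56', '135.b1d10.d1c38.72',
                  '135.b1d10.d1c38.126', '135.0777d.04511.237', '135.0777d.04511.237',
                  '135.0777d.04511.237'],
    'destination_ip': ['135.0777d.04511.237', '135.b1d10.13fe9.91', '135.b1d10.13fe9.71',
                       '135.0777d.04511.237', '135.0777d.04511.237', '135.0777d.04511.237',
                       '135.0777d.04511.237', '135.b1d10.d1c38.37', '135.b1d10.d1c38.112',
                       '135.b1d10.d1c38.20'],
    'start_time': [1415749946, 1415477729, 1415702327, 1415754478, 1415749597,
                   1415745508, 1415754317, 1415427333, 1415584036, 1415582789],
})

# Difference to the previous row with the same source_ip (file order);
# the first row of each source has no predecessor, so its NaN becomes 0.
df['time_since_last'] = (
    df.groupby('source_ip')['start_time'].diff().fillna(0).astype(int)
)

# 1 when the source_ip has appeared before but this exact pair has not.
seen_source = df.duplicated('source_ip')
seen_pair = df.duplicated(['source_ip', 'destination_ip'])
df['diff_destination_ip'] = (seen_source & ~seen_pair).astype(int)

# One flat list of destinations per source; sort=False keeps first-appearance order.
track_ip = df.groupby('source_ip', sort=False)['destination_ip'].agg(list).to_dict()
print(track_ip)
```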
If I flatten with the method I use, I get the following output:
{'135.b1d10.d1c38.20': '135.0777d.04511.237', '135.0777d.04511.237': ['1', '3', '5', '.', 'b', '1', 'd', '1', '0', '.', '1', '3', 'f', 'e', '9', '.', '9', '1', '1', '3', '5', '.', 'b', '1', 'd', '1', '0', '.', '1', '3', 'f', 'e', '9', '.', '7', '1', '1', '3', '5', '.', 'b', '1', 'd', '1', '0', '.', 'd', '1', 'c', '3', '8', '.', '3', '7', '1', '3', '5', '.', 'b', '1', 'd', '1', '0', '.', 'd', '1', 'c', '3', '8', '.', '1', '1', '2', '1', '3', '5', '.', 'b', '1', 'd', '1', '0', '.', 'd', '1', 'c', '3', '8', '.', '2', '0'], '135.b1d10.d1c38.119': '135.0777d.04511.237', '135.b1d10.13fe9.56': '135.0777d.04511.237', '135.b1d10.d1c38.72': '135.0777d.04511.237', '135.b1d10.d1c38.126': '135.0777d.04511.237'}
If I skip the flattening step, I get a nested list for the key '135.0777d.04511.237', as follows:
{'135.b1d10.d1c38.20': '135.0777d.04511.237', '135.0777d.04511.237': [[[['135.b1d10.13fe9.91', '135.b1d10.13fe9.71'], '135.b1d10.d1c38.37'], '135.b1d10.d1c38.112'], '135.b1d10.d1c38.20'], '135.b1d10.d1c38.119': '135.0777d.04511.237', '135.b1d10.13fe9.56': '135.0777d.04511.237', '135.b1d10.d1c38.72': '135.0777d.04511.237', '135.b1d10.d1c38.126': '135.0777d.04511.237'}
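If a nested value like the one above already exists, it can still be flattened without splitting the strings by recursing only into lists (and tuples) and treating every `str` as a single element. This is a minimal sketch; `flatten_keep_strings` is a hypothetical helper name, not a library function.

```python
def flatten_keep_strings(value):
    """Return a flat list, descending into lists/tuples but never into strings."""
    if isinstance(value, (list, tuple)):
        flat = []
        for item in value:
            flat.extend(flatten_keep_strings(item))
        return flat
    # a str (or any other non-list value) is kept whole as one element
    return [value]


nested = [[[['135.b1d10.13fe9.91', '135.b1d10.13fe9.71'], '135.b1d10.d1c38.37'],
           '135.b1d10.d1c38.112'], '135.b1d10.d1c38.20']
print(flatten_keep_strings(nested))
```

Because the original loop adds one level of nesting per repeated row, a source_ip that occurs more than roughly a thousand times in the full dataset would exceed Python's default recursion limit here, so building a flat list from the first row is the more robust fix.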
Answers: none yet
Comments
df = pd.read_csv('df.csv')
df = pd.DataFrame(...)
df.head().to_dict('list')
track_ip = df.groupby("source_ip")["destination_ip"].agg(np.stack).to_dict()
track_ip.setdefault(row['source_ip'], []).append(row['destination_ip'])