How to flatten nested lists in Python while retaining the comma-separated list elements?

Question:

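I want to flatten the nested list that I create below for the keys of the track_ip dictionary (or avoid creating it in the first place), while keeping each comma-separated value intact as its own element.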

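I have a dataset, df. I am trying to track source_ip and destination_ip in a dictionary called track_ip. I also create one column that records the time since the last event for a given source_ip, and another that indicates whether the destination_ip differs from the last event.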

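For each source_ip used as a key, I want a list of the destination_ip values it refers to (duplicates are allowed). I wanted to use the append() method, but that does not work (because the value is a string) unless I wrap the key's value in a list. When I do that, I get a nested list of lists, which I then need to flatten. If I flatten it with the method I am using, I cannot keep the comma-separated values intact as elements.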

Here is a shortened version of the dataset:

df.head(10).to_dict('list')



{'source_ip': ['135.b1d10.d1c38.20',
  '135.0777d.04511.237',
  '135.0777d.04511.237',
  '135.b1d10.d1c38.119',
  '135.b1d10.13fe9.56',
  '135.b1d10.d1c38.72',
  '135.b1d10.d1c38.126',
  '135.0777d.04511.237',
  '135.0777d.04511.237',
  '135.0777d.04511.237'],
 'destination_ip': ['135.0777d.04511.237',
  '135.b1d10.13fe9.91',
  '135.b1d10.13fe9.71',
  '135.0777d.04511.237',
  '135.0777d.04511.237',
  '135.0777d.04511.237',
  '135.0777d.04511.237',
  '135.b1d10.d1c38.37',
  '135.b1d10.d1c38.112',
  '135.b1d10.d1c38.20'],
 'start_time': [1415749946,
  1415477729,
  1415702327,
  1415754478,
  1415749597,
  1415745508,
  1415754317,
  1415427333,
  1415584036,
  1415582789]}

Here is my code:

import numpy as np
import pandas as pd

#import the dataframe
df = pd.read_csv('df.csv')

#loop through the data
df.loc[:, 'time_since_last'] = 0
df.loc[:, 'diff_destination_ip'] = 0

last_time = dict()
track_ip = dict()

for i,row in df.iterrows():
    if row['source_ip'] in last_time:
        #record delta time since last time under time_since_last
        df.loc[i,'time_since_last']=df.loc[i,'start_time']-last_time[row['source_ip']]
        #check if destination_ip was different for the source_ip and set value diff_destination_ip to 1
        if row['destination_ip'] not in track_ip[row['source_ip']]:
            df.loc[i,'diff_destination_ip'] = 1
    #record the current time as last time for the source_ip
    last_time[row['source_ip']] = row['start_time']
    #record destination_ip, if source_ip already present add the destination_ip to the list
    if row['source_ip'] in track_ip:
        track_ip[row['source_ip']] = [track_ip[row['source_ip']],row['destination_ip']]
        #flatten nested lists for track_ip[row['source_ip']]
        out = []
        for sublist in track_ip[row['source_ip']]:
            out.extend(sublist)
        track_ip[row['source_ip']] = out
        
    else:
        track_ip[row['source_ip']] = row['destination_ip']

What I am trying to get is output from track_ip like this:

print(track_ip)

{'135.b1d10.d1c38.20': '135.0777d.04511.237', '135.0777d.04511.237': ['135.b1d10.13fe9.91', '135.b1d10.13fe9.71', '135.b1d10.d1c38.37', '135.b1d10.d1c38.112', '135.b1d10.d1c38.20'], '135.b1d10.d1c38.119': '135.0777d.04511.237', '135.b1d10.13fe9.56': '135.0777d.04511.237', '135.b1d10.d1c38.72': '135.0777d.04511.237', '135.b1d10.d1c38.126': '135.0777d.04511.237'}

The actual dataset has 3.5e5 rows. I can't have nested lists in track_ip.

If I flatten with the method I am using, I get the following output:

{'135.b1d10.d1c38.20': '135.0777d.04511.237', '135.0777d.04511.237': ['1', '3', '5', '.', 'b', '1', 'd', '1', '0', '.', '1', '3', 'f', 'e', '9', '.', '9', '1', '1', '3', '5', '.', 'b', '1', 'd', '1', '0', '.', '1', '3', 'f', 'e', '9', '.', '7', '1', '1', '3', '5', '.', 'b', '1', 'd', '1', '0', '.', 'd', '1', 'c', '3', '8', '.', '3', '7', '1', '3', '5', '.', 'b', '1', 'd', '1', '0', '.', 'd', '1', 'c', '3', '8', '.', '1', '1', '2', '1', '3', '5', '.', 'b', '1', 'd', '1', '0', '.', 'd', '1', 'c', '3', '8', '.', '2', '0'], '135.b1d10.d1c38.119': '135.0777d.04511.237', '135.b1d10.13fe9.56': '135.0777d.04511.237', '135.b1d10.d1c38.72': '135.0777d.04511.237', '135.b1d10.d1c38.126': '135.0777d.04511.237'}
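The character-by-character output above is what list.extend produces when it is handed a string: a string is itself an iterable of one-character strings, so extend adds each character separately. A minimal sketch of the difference, using two addresses from the sample data (the variable names chars and whole are illustrative only):

```python
# list.extend iterates over its argument; a str iterates character by character.
chars = []
chars.extend('135.b1d10.13fe9.91')
print(chars[:4])   # ['1', '3', '5', '.']

# list.append adds its argument as a single element, so the string stays whole.
whole = []
whole.append('135.b1d10.13fe9.91')
whole.append('135.b1d10.13fe9.71')
print(whole)       # ['135.b1d10.13fe9.91', '135.b1d10.13fe9.71']
```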

If I don't use the flattening method, I get a nested list for the key '135.0777d.04511.237', as follows:

{'135.b1d10.d1c38.20': '135.0777d.04511.237', '135.0777d.04511.237': [[[['135.b1d10.13fe9.91', '135.b1d10.13fe9.71'], '135.b1d10.d1c38.37'], '135.b1d10.d1c38.112'], '135.b1d10.d1c38.20'], '135.b1d10.d1c38.119': '135.0777d.04511.237', '135.b1d10.13fe9.56': '135.0777d.04511.237', '135.b1d10.d1c38.72': '135.0777d.04511.237', '135.b1d10.d1c38.126': '135.0777d.04511.237'}
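One possible way to avoid building the nested list in the first place is to store a list under every key from the start, so that append always works. The sketch below rests on assumptions rather than on the code in the question: it uses a plain dict shaped like the to_dict('list') output above instead of reading df.csv, the helper name track_destinations is hypothetical, and every value is a list (a key with a single destination holds a one-element list rather than the bare string shown in the desired output).

```python
def track_destinations(source_ips, destination_ips):
    """Map each source IP to the list of its destination IPs, in row order."""
    track_ip = {}
    for src, dst in zip(source_ips, destination_ips):
        # setdefault creates an empty list the first time a key is seen,
        # so append always targets a list and no nesting can occur.
        track_ip.setdefault(src, []).append(dst)
    return track_ip

# Illustrative sample: the first four rows of the dataset shown above.
data = {
    'source_ip': ['135.b1d10.d1c38.20', '135.0777d.04511.237',
                  '135.0777d.04511.237', '135.b1d10.d1c38.119'],
    'destination_ip': ['135.0777d.04511.237', '135.b1d10.13fe9.91',
                       '135.b1d10.13fe9.71', '135.0777d.04511.237'],
}

track_ip = track_destinations(data['source_ip'], data['destination_ip'])
print(track_ip['135.0777d.04511.237'])
# ['135.b1d10.13fe9.91', '135.b1d10.13fe9.71']
```

With a DataFrame, zip(df['source_ip'], df['destination_ip']) would supply the same pairs. For a dataset of this size, df.groupby('source_ip')['destination_ip'].agg(list).to_dict() should build the same mapping without an explicit Python loop.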

Python, pandas, DataFrame, nested lists, flatten
