Refactoring code from pandas to vaex | loc was useful in pandas, however it cannot be used in vaex

Asked by: Arafat Absi  Asked: 6/20/2023  Updated: 6/20/2023  Views: 28

Q:

I am struggling to get my code working. It was written in pandas, and I am now refactoring it to use vaex; however, loc does not exist in vaex. Can anyone help me with this?

Idea: replace the missing values in the start_time column with end_time minus conversation_time (an integer that needs to be converted to seconds).

start_time   ,conversation_time,           end_time;
2023-06-01 19:14:42,        112,2023-06-01 19:16:34;
2023-06-01 19:16:33,          0,2023-06-01 19:16:33;
2023-06-01 19:11:44,        290,2023-06-01 19:16:34;
                   ,          0,2023-06-01 19:16:32;
2023-06-01 19:16:33,          0,2023-06-01 19:16:33;
2023-06-01 19:16:07,         26,2023-06-01 19:16:33;
                   ,        116,2023-06-01 19:16:33;
2023-06-01 19:16:33,          0,2023-06-01 19:16:33;
2023-06-01 19:16:32,          0,2023-06-01 19:16:32;
                   ,        217,2023-06-01 19:00:01
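
For the three rows above whose start_time is blank, the idea works out as shown in the small sketch below. It uses only the Python standard library; the helper name derive_start is hypothetical and is there only to illustrate the arithmetic, not to suggest how it should be done in vaex.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def derive_start(end_time: str, conversation_time: int) -> str:
    """Return end_time minus conversation_time seconds, in the same text format."""
    end = datetime.strptime(end_time, FMT)
    return (end - timedelta(seconds=conversation_time)).strftime(FMT)


# (end_time, conversation_time) of the three sample rows with a blank start_time
missing_rows = [
    ("2023-06-01 19:16:32", 0),
    ("2023-06-01 19:16:33", 116),
    ("2023-06-01 19:00:01", 217),
]

derived = [derive_start(end, seconds) for end, seconds in missing_rows]
for value in derived:
    print(value)
# 2023-06-01 19:16:32
# 2023-06-01 19:14:37
# 2023-06-01 18:56:24
```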

The old code using pandas works fine:

import pandas as pd

# Convert conversation_time to numeric
DF['conversation_time'] = pd.to_numeric(DF['conversation_time'])
# Convert end_time to datetime 
DF['end_time'] = pd.to_datetime(DF['end_time'], format='%Y-%m-%d %H:%M:%S')
# Filter to get end_time and conversation_time for rows where start_time is empty
end_conv = DF.loc[DF.start_time == '                   ', ['end_time', 'conversation_time']]
# For empty start_times, calculate start_time by subtracting conversation_time from end_time 
DF.loc[DF.start_time == '                   ', 'start_time'] = [str(data[0] - pd.Timedelta(seconds=data[1])) for data in end_conv.values]

Using vaex:

import vaex as vx
import numpy as np
from datetime import datetime

DF = vx.read_csv('data.csv', sep=',', header=None)

# Function to convert to datetime
def convert_to_datetime(date_string):
    return np.datetime64(datetime.strptime(str(date_string), '%Y-%m-%d %H:%M:%S'))

# Convert end_time to datetime 
DF['end_time'] = DF['end_time'].astype(str).apply(convert_to_datetime)

# Filter to get end_time and conversation_time for rows where start_time is empty
end_conv = DF.filter(DF['start_time'] == '                   ')['end_time', 'conversation_time']

# For empty start_times, calculate start_time by subtracting conversation_time from end_time 
DF['start_time'] = DF['start_time'].apply(lambda x:  [row['end_time'] - np.timedelta64(1, 's') * row['conversation_time'] for index,row in end_conv.iterrows()]  if x == '                   ' else x
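
One loc-free way to express the same fill is as a whole-column expression: compute end_time minus conversation_time for every row, then pick between that value and the existing start_time with a boolean mask. The sketch below shows the pattern with plain NumPy arrays on a few of the sample rows. It is not vaex code; whether the same expression carries over to vaex columns (for example through a where-style function) is an assumption that would need checking against the vaex documentation.

```python
import numpy as np

BLANK = " " * 19  # how a missing start_time appears in the sample data

# A few of the sample rows, held as plain Python lists / NumPy arrays
start_time = ["2023-06-01 19:14:42", BLANK, "2023-06-01 19:16:07", BLANK]
conversation_time = np.array([112, 0, 26, 116])
end_time = ["2023-06-01 19:16:34", "2023-06-01 19:16:32",
            "2023-06-01 19:16:33", "2023-06-01 19:16:33"]


def to_datetime64(values):
    """Parse 'YYYY-mm-dd HH:MM:SS' strings; blank strings become NaT."""
    iso = [v.replace(" ", "T") if v.strip() else "NaT" for v in values]
    return np.array(iso, dtype="datetime64[s]")


start_dt = to_datetime64(start_time)
end_dt = to_datetime64(end_time)

# Candidate start_time for every row: end_time minus conversation_time seconds
computed = end_dt - conversation_time.astype("timedelta64[s]")

# Keep the existing start_time where present, use the computed one otherwise
is_missing = np.isnat(start_dt)
filled = np.where(is_missing, computed, start_dt)

print(filled)
```

Because the mask and the subtraction are applied to whole columns, no row-by-row assignment is needed, which is what loc was doing in the pandas version.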

I tried many approaches and finally arrived at the lines of code shown above.

python pandas refactoring vaex

A: No answers yet