Asked by: Arafat Absi  Asked: 6/20/2023  Modified: 6/20/2023  Viewed: 28 times
Refactoring code from pandas into vaex | loc was useful in pandas, however it cannot be used in vaex
Q:
I'm struggling to get my code working. It was written in pandas, and I am now refactoring it with vaex; however, loc() does not exist in vaex. Can anyone help me solve this?
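Idea: replace the missing values in the start_time column by subtracting conversation_time (an integer that needs to be converted to seconds) from end_time, i.e. start_time = end_time - conversation_time.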
start_time ,conversation_time, end_time;
2023-06-01 19:14:42, 112,2023-06-01 19:16:34;
2023-06-01 19:16:33, 0,2023-06-01 19:16:33;
2023-06-01 19:11:44, 290,2023-06-01 19:16:34;
, 0,2023-06-01 19:16:32;
2023-06-01 19:16:33, 0,2023-06-01 19:16:33;
2023-06-01 19:16:07, 26,2023-06-01 19:16:33;
, 116,2023-06-01 19:16:33;
2023-06-01 19:16:33, 0,2023-06-01 19:16:33;
2023-06-01 19:16:32, 0,2023-06-01 19:16:32;
, 217,2023-06-01 19:00:01
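For a single row the fill is just a timedelta subtraction. A minimal standard-library sketch (the helper name `start_from_end` is hypothetical, not from the question) applied to the three sample rows above whose start_time is empty:

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def start_from_end(end_time: str, conversation_time: int) -> str:
    """Hypothetical helper: start_time = end_time minus conversation_time seconds."""
    end = datetime.strptime(end_time.strip(), FMT)  # strip stray spaces from the CSV
    return (end - timedelta(seconds=int(conversation_time))).strftime(FMT)

# The three sample rows whose start_time is empty
print(start_from_end("2023-06-01 19:16:32", 0))    # 2023-06-01 19:16:32
print(start_from_end("2023-06-01 19:16:33", 116))  # 2023-06-01 19:14:37
print(start_from_end("2023-06-01 19:00:01", 217))  # 2023-06-01 18:56:24
```

As a sanity check, the rows that do have a start_time agree with the formula, e.g. 2023-06-01 19:16:34 minus 112 seconds is 2023-06-01 19:14:42.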
The old code using pandas works fine:
import pandas as pd

# Convert conversation_time to numeric
DF['conversation_time'] = pd.to_numeric(DF['conversation_time'])
# Convert end_time to datetime
DF['end_time'] = pd.to_datetime(DF['end_time'], format='%Y-%m-%d %H:%M:%S')
# Filter to get end_time and conversation_time for rows where start_time is empty
end_conv = DF.loc[DF.start_time == ' ', ['end_time', 'conversation_time']]
# For empty start_times, calculate start_time by subtracting conversation_time from end_time
DF.loc[DF.start_time == ' ', 'start_time'] = [str(data[0] - pd.Timedelta(seconds=data[1])) for data in end_conv.values]
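The pandas version above walks end_conv row by row in a list comprehension. The same fill can also be written without loc assignment: compute a candidate start for every row, then choose between it and the existing value with a boolean mask. This compute-then-select shape is what expression-based libraries such as vaex work with. A sketch, assuming pandas is installed and using a small hypothetical in-memory DataFrame in place of data.csv (blank start_time cells are ' ', as the question's comparison implies):

```python
import pandas as pd

# Hypothetical in-memory copy of a few sample rows; blank start_time is ' '
DF = pd.DataFrame({
    "start_time": ["2023-06-01 19:14:42", " ", " ", "2023-06-01 19:16:07"],
    "conversation_time": ["112", "0", "116", "26"],
    "end_time": ["2023-06-01 19:16:34", "2023-06-01 19:16:32",
                 "2023-06-01 19:16:33", "2023-06-01 19:16:33"],
})

DF["conversation_time"] = pd.to_numeric(DF["conversation_time"])
DF["end_time"] = pd.to_datetime(DF["end_time"], format="%Y-%m-%d %H:%M:%S")

# Boolean mask of rows whose start_time contains only whitespace
blank = DF["start_time"].str.strip() == ""

# Candidate start for every row, computed in one vectorized step
computed = DF["end_time"] - pd.to_timedelta(DF["conversation_time"], unit="s")

# Series.where keeps values where the condition is True and takes `other` elsewhere
DF["start_time"] = DF["start_time"].where(~blank, computed.dt.strftime("%Y-%m-%d %H:%M:%S"))
print(DF["start_time"].tolist())
```

If the real file is read with pd.read_csv, empty cells may arrive as NaN rather than ' ', in which case the mask would also need DF["start_time"].isna().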
Using vaex:
import vaex as vx
import numpy as np
from datetime import datetime

DF = vx.read_csv('data.csv', sep=',', header=None)
# Function to convert to datetime
def convert_to_datetime(date_string):
    return np.datetime64(datetime.strptime(str(date_string), '%Y-%m-%d %H:%M:%S'))
# Convert end_time to datetime
DF['end_time'] = DF['end_time'].astype(str).apply(convert_to_datetime)
# Filter to get end_time and conversation_time for rows where start_time is empty
end_conv = DF.filter(DF['start_time'] == ' ')['end_time', 'conversation_time']
# For empty start_times, calculate start_time by subtracting conversation_time from end_time
DF['start_time'] = DF['start_time'].apply(lambda x: [row['end_time'] - np.timedelta64(1, 's') * row['conversation_time'] for index, row in end_conv.iterrows()] if x == ' ' else x)
I tried many approaches and finally arrived at the lines of code provided above.
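Since vaex has no loc-style assignment, one workaround is the same compute-then-select shape, expressed as whole-column operations instead of a row loop inside apply. vaex provides a row-wise `where` (`vaex.functions.where`, reachable as `df.func.where(condition, x, y)`); how a particular vaex version handles datetime64/timedelta64 arithmetic on expressions is an assumption you would need to check. The underlying array logic, shown here with plain NumPy on hypothetical sample arrays (no vaex required), is:

```python
import numpy as np

# Hypothetical sample columns mirroring the question's data (blank start_time is ' ')
start_time = np.array(["2023-06-01 19:14:42", " ", " ", "2023-06-01 19:16:07"])
conversation_time = np.array([112, 0, 116, 26])
end_time = np.array(["2023-06-01 19:16:34", "2023-06-01 19:16:32",
                     "2023-06-01 19:16:33", "2023-06-01 19:16:33"])

# Vectorized parsing instead of calling datetime.strptime once per row
end = end_time.astype("datetime64[s]")

# Mask of rows whose start_time contains only whitespace
blank = np.char.strip(start_time) == ""

# Candidate start for every row: end_time minus conversation_time seconds
computed = end - conversation_time.astype("timedelta64[s]")

# Parse existing start times; blanks become NaT so parsing does not fail
existing = np.where(blank, "NaT", start_time).astype("datetime64[s]")

# Row-wise selection: computed value where blank, existing value otherwise
result = np.where(blank, computed, existing)
print(result)
```

In vaex the arrays would instead be column expressions, and the last step would roughly become an assignment of `df.func.where(...)` to a column. Because vaex evaluates expressions lazily, no rows are iterated in Python.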
A: No answers yet