collect() 在转换为行类型时将时间戳显示为“datetime.datetime”

collect() showing timestamp as "datetime.datetime" while converting to row type

提问人:Saswat Ray 提问时间:11/6/2023 更新时间:11/7/2023 访问量:57

问:

我有以下数据帧作为行类型(转换为 dataframe.collect())以进一步处理,但 它正在转换为 datetime.datetime(),我怎样才能摆脱 datetime.datetime 并只显示时间戳。

[Row(unit_id='00001597', country='ITA', gateway_id='8988228', first_hb_info=Row(hb_time=datetime.datetime(2022, 3, 15, 12, 56, 19), battery=60, ctrl=4, service=24, rssi=11, power=1, op_mode='IDL'), last_hb_info=Row(hb_time=datetime.datetime(2023, 11, 2, 20, 18, 21), battery=100, ctrl=4, service=27, rssi=10, power=1, op_mode='\x00\x00\x00'), last_op_mode_hb_info=Row(hb_time=datetime.datetime(2023, 11, 2, 20, 10, 21), battery=100, ctrl=4, service=24, rssi=10, power=1, op_mode='IDL')]

Python Azure 字典 datetime databricks

评论

0赞 JayashankarGS 11/7/2023
您的期望输出是多少?
0赞 Saswat Ray 11/7/2023
感谢您伸出援手。输出应为字符串格式。例如[{'hb_time': '2022-03-15 12:56:19', 'battery': '60', 'ctrl': '4', 'service': '24', 'rssi': '11', 'power': '1', 'op_mode': 'IDL'}, {'hb_time': '2023-11-02 20:18:21', 'battery': '100', 'ctrl': '4', 'service': '27', 'rssi': '10', 'power': '1', 'op_mode': '\x00\x00\x00'}.我开始知道to_json可以做,但是我已经嵌套了结构元素,所以看看如何使用结构类型处理该to_json。

答:

1赞 JayashankarGS 11/7/2023 #1

您得到的原因是嵌套列内的字段是 .datetime.datetimeTimestampType

您需要将其转换为字符串并进行收集。

下面是转换它的代码。

cols_to_alter = ['first_hb_info', 'last_hb_info', 'last_op_mode_hb_info']
struct_cols = ['hb_time', 'battery', 'ctrl', 'service', 'rssi', 'power', 'op_mode']

final_df = df_mod #your_current_dataframe.
for i in cols_to_alter:
    final_df = final_df.select('*',f'{i}.*').withColumn('hb_time',F.date_format('hb_time','yyyy-MM-dd HH:mm:ss'))
    final_df = final_df.drop(i).select('*',F.struct(*struct_cols).alias(i)).drop(*struct_cols)
final_df.collect()

输出:

[Row(unit_id='00001597', country='ITA', gateway_id='8988228',
 first_hb_info=Row(hb_time='2022-03-15 10:30:00', battery=60, ctrl=4, service=24, rssi=11, power=1, op_mode='IDL'), 
 last_hb_info=Row(hb_time='2022-03-15 10:30:00', battery=60, ctrl=4, service=24, rssi=11, power=1, op_mode='IDL'), 
 last_op_mode_hb_info=Row(hb_time='2022-03-15 10:30:00', battery=60, ctrl=4, service=24, rssi=11, power=1, op_mode='IDL'))]

如果你想在json中使用下面的代码。

final_df.toJSON().collect()

输出:

['{"unit_id":"00001597","country":"ITA","gateway_id":"8988228",
"first_hb_info":{"hb_time":"2022-03-15 10:30:00","battery":60,"ctrl":4,"service":24,"rssi":11,"power":1,"op_mode":"IDL"},
"last_hb_info":{"hb_time":"2022-03-15 10:30:00","battery":60,"ctrl":4,"service":24,"rssi":11,"power":1,"op_mode":"IDL"},
"last_op_mode_hb_info":{"hb_time":"2022-03-15 10:30:00","battery":60,"ctrl":4,"service":24,"rssi":11,"power":1,"op_mode":"IDL"}}']