Asked by: Vaiva Petrikaite | Asked: 5/31/2023 | Last edited by: Koedlt, Vaiva Petrikaite | Updated: 5/31/2023 | Views: 25
How to join two tables in PySpark that were obtained by two inline() calls and have no common column
Q:
I am parsing a JSON file in PySpark. The file has "many branches". I have attached a picture of the branch I am interested in. I get two DataFrames:
from pyspark.sql import functions as F
df1 = df.select(F.expr("inline_outer(features.geometry)"))
df2 = df.select(F.expr("inline_outer(features.properties)"))
Each has 10 rows. How can I join them so that the rows are matched up correctly? They have no common column. The JSON file is structured as follows:
{"features": [{"id":0,
"geometry":{"type" : "Polygon", "coordinates": "Polygon_1_coordinates"},
"properties":{"area_id":101, "date": "2022-01-01", "amount":1002 }
},
{"id":2,
"geometry":{"type" : "Polygon", "coordinates": "Polygon_2_coordinates"},
"properties":{"area_id":102, "date": "2022-06-05", "amount":33}
},
{"id":3,
"geometry":{"type" : "Polygon", "coordinates": "Polygon_3_coordinates"},
"properties":{"area_id":103, "date": "2022-08-05", "amount":12}
},
{"id":4,
"geometry":{"type" : "Polygon", "coordinates": "Polygon_4_coordinates"},
"properties":{"area_id":104, "date": "2021-06-05", "amount":7895}
}]}
I need a table with the columns "Area_ID", "Amount", and "Coordinates".
[1]: https://i.stack.imgur.com/5NRon.png
A: No answers yet