How to join two tables in PySpark that were obtained by two inline() calls and have no common column

Asked by: Vaiva Petrikaite  Asked: 5/31/2023  Last edited by: Koedlt, Vaiva Petrikaite  Updated: 5/31/2023  Views: 25

Q:

I am parsing a JSON file in PySpark. The file has "many branches". I have attached a picture of the branch I am interested in [1]. I get two DataFrames:

from pyspark.sql import functions as F
df1 = df.select(F.expr("inline_outer(features.geometry)"))
df2 = df.select(F.expr("inline_outer(features.properties)"))

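Each has 10 rows. How can I join them so that the rows are matched up correctly? They have no common column. The JSON file has the following structure: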

{"features": [{"id":0, 
         "geometry":{"type" : "Polygon", "coordinates": "Polygon_1_coordinates"},
         "properties":{"area_id":101, "date": "2022-01-01", "amount":1002 }
        },
        {"id":2, 
         "geometry":{"type" : "Polygon", "coordinates": "Polygon_2_coordinates"},
         "properties":{"area_id":102, "date": "2022-06-05", "amount":33}
        },
        {"id":3, 
         "geometry":{"type" : "Polygon", "coordinates": "Polygon_3_coordinates"},
         "properties":{"area_id":103, "date": "2022-08-05", "amount":12}
        },
        {"id":4, 
         "geometry":{"type" : "Polygon", "coordinates": "Polygon_4_coordinates"},
         "properties":{"area_id":104, "date": "2021-06-05", "amount":7895}
        }]}

I need a table with the columns "Area_ID", "Amount" and "Coordinates".

[1]: https://i.stack.imgur.com/5NRon.png

json pyspark

Comments

0 upvotes notNull 5/31/2023
Add your expected output..!

A: No answers yet