递归广度遍历查询(最大深度),返回嵌套 JSON,过滤初始种子

Recursive Breadth Traversal Query (with max depth), Returning Nested JSON, Filtering the Initial Seed

提问人:Philippe Fanaro 提问时间:8/21/2023 最后编辑:NickPhilippe Fanaro 更新时间:8/27/2023 访问量:155

问:

(我最初在DBA StackExchange上问过这个问题,但没有得到太多活动)。

在我的数据库中,我有它们之间的表,创建了一个可能的稀疏图,其中包含岛屿:itemsconnections

CREATE TABLE IF NOT EXISTS items (
    --------------------------------------------------------
    id    UUID         NOT NULL DEFAULT uuid_generate_v4(),
    --------------------------------------------------------
    title VARCHAR(300) NOT NULL,
    --------------------------------------------------------
    CONSTRAINT     items_pk
       PRIMARY KEY (id)
    --------------------------------------------------------
);

CREATE TABLE IF NOT EXISTS connections (
    ---------------------------------------------------------------------
    id                  UUID         NOT NULL DEFAULT uuid_generate_v4(),
    ---------------------------------------------------------------------
    origin_item_id      UUID         NOT NULL,
    destination_item_id UUID         NOT NULL,
    ---------------------------------------------------------------------
    title               VARCHAR(300) NOT NULL,
    ---------------------------------------------------------------------
    CONSTRAINT     origin_item_fk
       FOREIGN KEY (origin_item_id)
        REFERENCES items (id),
    CONSTRAINT     destination_item_fk
       FOREIGN KEY (destination_item_id)
        REFERENCES items (id),
    CONSTRAINT     connections_pk
       PRIMARY KEY (id)
    ---------------------------------------------------------------------
);

我正在使用 ,我想从初始种子(例如 ExpressJS 上的 url 参数)进行广度遍历,直至一定深度(从我在文档中看到的内容来看,深度部分可能是最简单的)。node-pg

理想情况下,我希望以嵌套方式返回内容,但是,通过阅读这些文档(WITH 查询),我无法弄清楚如何返回,或者是否可能。

INSERT INTO items (title)
VALUES ('Software'), ('Pyongyang'), ('Burma'), ('Shenzhen'), ('Jerusalem'), ('Hostage');

INSERT INTO connections (origin_item_id, destination_item_id, title)
VALUES
    (
        (SELECT id FROM items WHERE title ~ 'Pyongyang'),
        (SELECT id FROM items WHERE title ~ 'Shenzhen'),
        'Same Author - Guy Delisle - 1'
    ),
    (
        (SELECT id FROM items WHERE title ~ 'Burma'),
        (SELECT id FROM items WHERE title ~ 'Pyongyang'),
        'Same Author - Guy Delisle - 2'
    ),
    (
        (SELECT id FROM items WHERE title ~ 'Jerusalem'),
        (SELECT id FROM items WHERE title ~ 'Shenzhen'),
        'Same Author - Guy Delisle - 3'
    ),
    (
        (SELECT id FROM items WHERE title ~ 'Hostage'),
        (SELECT id FROM items WHERE title ~ 'Jerusalem'),
        'Same Author - Guy Delisle - 4'
    ),

然后,递归查询将导致 for 和 — 请注意,人质软件都不应该出现 — :PyongyangmaxDepth = 2

{
  "title": "Pyongyang",
  "connections": [
    {
      "title": "Same Author - Guy Delisle - 1",
      "destination_item": {
        "title": "Shenzhen",
        "connections": [
          {
            "title": "Same Author - Guy Delisle - 2",
            "origin_item": {
              "title": "Jerusalem"
            }
          }
        ]
      }
    },
    {
      "title": "Same Author - Guy Delisle - 2",
      "origin_item": {
        "title": "Burma"
      }
    }
  ]
}

(我不是 100% 确定我是否已经用这个例子涵盖了所有情况。

这种类型的嵌套在 PostgreSQL 中是否可能(比如说,或者 )?我该怎么做?jsonb_build_objectjsonb_agg

这是我能够做的:

WITH RECURSIVE connections_and_items AS (
    SELECT
         c.id AS connection_id,
         c.title AS connection_title,
         c.origin_item_id,
         i.title AS origin_item_title,
         c.destination_item_id,
        it.title AS destination_item_title

    FROM connections c
    JOIN items       i
      ON  i.id = c.origin_item_id
    JOIN items       it
      ON it.id = c.destination_item_id
),
connected_items AS (
    SELECT *
    FROM connections_and_items
    WHERE origin_item_id = '${itemId}'::UUID

    UNION 
        
    SELECT c.*
    FROM connections_and_items c
    JOIN connected_items       ci
      ON ci.destination_item_id = c.origin_item_id
      OR ci.origin_item_id      = c.destination_item_id
) 

SELECT 
    ci.connection_id, 
    ci.connection_title,
    jsonb_build_object(
        'id',    ci.origin_item_id,
        'title', ci.origin_item_title
    ) origin_item,
    jsonb_build_object(
        'id',    ci.destination_item_id,
        'title', ci.destination_item_title
    ) destination_item

FROM connected_items ci

这实际上确实在两个方向上成功地遍历了图形 (!但是,我无法添加约束(无法检测周期),并且嵌套仍然不存在。maxDepth

所以,总而言之,这是我所拥有的和我所缺少的:

  • ✔ 递归广度遍历
  • ✗ 最大深度
  • ✗ 嵌套 JSON
  • ✔ 筛选初始种子

引用

sql json 递归 节点-postgres

评论

0赞 0xts 8/24/2023
示例查询结果的示例与尝试对查询执行的操作有很大不同,任何一个格式化的结果都适合您吗?另外,您是否希望能够输入查询,或者只是希望查询尽可能深入而不提供任何深度?maxDepth
0赞 Philippe Fanaro 8/24/2023
你的意思是我能够提出的查询是错误的,还是只是格式?如果是格式,那么它确实丢失了,但这就是问题的核心。如果错了,你能简单描述一下问题所在吗?
0赞 Philippe Fanaro 8/24/2023
我宁愿能够将其输入为参数,而不是硬编码。我认为将其包装在PSQL函数中会更好,当我问这个问题时,我有点忘记了这一点。maxDepth
0赞 0xts 8/24/2023
数据中是否存在循环连接?
1赞 Nick 8/27/2023
你真的应该等到赏金期结束再奖励它。你永远不知道什么时候你会得到另一个,可能更好的答案。

答:

1赞 Adrian Maxwell 8/21/2023 #1

对于最大深度限制,请尝试以下逻辑(即 DIY 液位计数器)

WITH RECURSIVE t(depth) AS (
    SELECT 1
    UNION ALL
    SELECT depth+1 FROM t WHERE depth < 10 -- specify max depth here
)
SELECT n FROM t;

例如:

WITH RECURSIVE connections_and_items AS (
    SELECT
         c.id AS connection_id,
         c.title AS connection_title,
         c.origin_item_id,
         i.title AS origin_item_title,
         c.destination_item_id,
         it.title AS destination_item_title,
         1 AS depth,
         c.origin_item_id AS parent_id
    FROM connections c
    JOIN items       i
      ON  i.id = c.origin_item_id
    JOIN items       it
      ON it.id = c.destination_item_id
    WHERE c.origin_item_id = '${itemId}'::UUID

    UNION 
        
    SELECT 
        c.connection_id,
        c.connection_title,
        c.origin_item_id,
        c.origin_item_title,
        c.destination_item_id,
        c.destination_item_title,
        ci.depth + 1 AS depth,
        ci.connection_id AS parent_id
    FROM connections_and_items c
    JOIN connected_items       ci
      ON ci.destination_item_id = c.origin_item_id
      OR ci.origin_item_id      = c.destination_item_id
    WHERE ci.depth < ${maxDepth}
)

并且(未经测试)这可能对嵌套的 JSON 有所帮助:

SELECT 
    ci.connection_id, 
    json_build_object(
        'title', ci.connection_title,
        'origin_item', json_build_object(
            'id',    ci.origin_item_id,
            'title', ci.origin_item_title
        ),
        'destination_item', json_build_object(
            'id',    ci.destination_item_id,
            'title', ci.destination_item_title
        ),
        'children', json_agg(ci) FILTER (WHERE ci.parent_id = c.connection_id)
    ) connection
FROM connected_items ci
GROUP BY ci.connection_id,
    ci.connection_title,
    ci.origin_item_id,
    ci.origin_item_title,
    ci.destination_item_id,
    ci.destination_item_title

评论

0赞 Philippe Fanaro 8/21/2023
我想我确实尝试了你的建议。它不太管用,因为它只会在循环中迷失方向。然后我尝试第 7.8.2.2 节中的所有内容。 PostgreSQL 文档的循环检测,但没有成功。depth
2赞 p3consulting 8/24/2023 #2

你说“(我不是 100% 确定我是否已经用这个例子涵盖了所有情况)”:你是对的,当一个项目是多个连接的起点或目的地时,你试图实现的 JSON 格式是模棱两可的。

递归查询可以作为 JSON 的基础,如下所示: 主要区别在于层次结构遍历中没有混合节点项/连接,这是您尝试实现的格式的主要困难,即从项(而查询是连接驱动的)然后转到连接,然后返回项, 此外,您还有 2 个层次结构:“从”和“到”......

递归查询很容易导致以下类型的 JSON:主要是在递归查询之后,您需要另一个 CTE 来计算层次结构中的 LAG 和 LEAD 级别 + item 节点的 JSON,最终查询将添加 “children” 节点并使用 lead/lag 信息来正确关闭层次结构,例如关闭 “children” 节点所需的正确“]}”数量):

[
  {
    "conn_title" : "Same Author - Guy Delisle - 1",
    "org_title" : "Pyongyang",
    "dest_title" : "Shenzhen",
    "children" :
    [
      {
        "conn_title" : "Same Author - Guy Delisle - 2",
        "org_title" : "Burma",
        "dest_title" : "Pyongyang",
        "children" :
        [
          {
            "conn_title" : "Same Author - Guy Delisle - 1",
            "org_title" : "Pyongyang",
            "dest_title" : "Shenzhen"
          }
        ]
      },
      {
        "conn_title" : "Same Author - Guy Delisle - 3",
        "org_title" : "Jerusalem",
        "dest_title" : "Shenzhen",
        "children" :
        [
          {
            "conn_title" : "Same Author - Guy Delisle - 1",
            "org_title" : "Pyongyang",
            "dest_title" : "Shenzhen"
          },
          {
            "conn_title" : "Same Author - Guy Delisle - 4",
            "org_title" : "Hostage",
            "dest_title" : "Jerusalem",
            "children" :
            [
              {
                "conn_title" : "Same Author - Guy Delisle - 3",
                "org_title" : "Jerusalem",
                "dest_title" : "Shenzhen"
              }
            ]
          }
        ]
      }
    ]
  }
]

根据要求,代码,但不要忘记它是针对 ORACLE,而不是 PostgreSQL:

with items(id, title) as (
    select 1, 'Software' from dual union all
    select 2, 'Pyongyang' from dual union all
    select 3, 'Burma' from dual union all
    select 4, 'Shenzhen' from dual union all
    select 5, 'Jerusalem' from dual union all
    select 6, 'Hostage' from dual -- union all
),
connections(id, org_item_id, dest_item_id, title) as (
    select 1, 2, 4, 'Same Author - Guy Delisle - 1' from dual union all
    select 2, 3, 2, 'Same Author - Guy Delisle - 2' from dual union all
    select 3, 5, 4, 'Same Author - Guy Delisle - 3' from dual union all
    select 4, 6, 5, 'Same Author - Guy Delisle - 4' from dual -- union all
),
connections_and_items as (
    select 
        c.id as conn_id,
        c.title AS conn_title,
        c.org_item_id,
        ito.title AS org_title,
        c.dest_item_id,
        itd.title AS dest_title
    from connections c
    join items ito on ito.id = c.org_item_id
    join items itd on itd.id = c.dest_item_id
),
connected_items(lvl, parent_id,
    conn_id, conn_title, org_item_id, org_title, dest_item_id, dest_title
) as (
    SELECT 1, null,
        c.conn_id, c.conn_title, 
        c.org_item_id, c.org_title, 
        c.dest_item_id, c.dest_title
    FROM connections_and_items c
    WHERE org_item_id = 2
    UNION ALL   
    SELECT lvl+1, ci.conn_id,
        c.conn_id, c.conn_title, 
        c.org_item_id, c.org_title, 
        c.dest_item_id, c.dest_title
    FROM connections_and_items c
    JOIN connected_items ci
      ON 
        ci.org_item_id      = c.dest_item_id
      OR 
        ci.dest_item_id     = c.org_item_id
      OR 
        (ci.org_item_id     = c.org_item_id AND c.conn_id <> ci.conn_id)
      OR 
        (ci.dest_item_id     = c.dest_item_id AND c.conn_id <> ci.conn_id)
    where lvl < 10 -- set the max as desired
)
search depth first by conn_id set rn
CYCLE org_item_id, dest_item_id SET is_loop TO 1 DEFAULT 0
, rel_hier_with_leadlag AS (
  SELECT r.*
    , LAG(lvl) OVER(ORDER BY rn) AS lag_lvl
    , LEAD(lvl,1) OVER(ORDER BY rn) AS llead_lvl -- we need to know which the latest node
    , LEAD(lvl,1,1) OVER(ORDER BY rn) AS lead_lvl  -- for the latest node we need to use 1 instead of NULL
    , JSON_OBJECT(
      -- 'conn_id'         VALUE conn_id, 
      'conn_title'      VALUE conn_title,
      -- 'org_item_id'     VALUE org_item_id,
      'org_title'       VALUE org_title,
      -- 'dest_item_id'    VALUE dest_item_id,
      'dest_title'      VALUE dest_title
      ABSENT ON NULL
      RETURNING CLOB
    ) js
    FROM connected_items r WHERE is_loop = 0
)
select
  JSON_QUERY(   
     XMLCAST(   
        XMLAGG(
            XMLELEMENT(e,
                  CASE WHEN rn = 1 THEN '[' END ||
                  CASE
                    WHEN lvl - lag_lvl = 1 THEN ',"children":['    -- Level incremented by one, so child level, start array
                    WHEN lvl > 1 then ','                          -- appending when not first level
                    WHEN rn>1 AND parent_id IS NULL THEN ','      -- appending when a root node but not the first one
                  END ||
                  SUBSTR(js, 1, LENGTH(js) - 1) ||                  -- remove last brace, as we are controlling children
                  CASE
                    WHEN lvl >= lead_lvl then '}' ||               -- Level same or greater than next level, so close json_object
                         RPAD(' ', (lvl - lead_lvl) * 2 + 1, ']}') -- and add as many closing array / object blocks as required
                  END ||
                  CASE WHEN llead_lvl IS NULL THEN ']' END          -- when the latest node 
                )
                ORDER BY rn
            )
    AS CLOB
    )
  , '$' RETURNING CLOB PRETTY
) 
    as js
from rel_hier_with_leadlag ;

评论

2赞 p3consulting 8/24/2023
好吧,对我来说,这不是“新”代码:我在另一个上下文中使用它(“算法”)来满足我自己的需求......此外,我不知道将其从 ORACLE 转换为 PostgreSQL 所需的工作量......(它在最终的 SELECT 中使用 JSON_OBJECT、JSON_QUERY、XMLCAST、XMLAGG、XMLELEMENT,并且在递归 CTE 上使用 SEARCH DEPTH FIRST 和 CYCLE),更不用说调整它以输出所需的 JSON 格式(我怀疑这部分的可行性)dbfiddle.uk/xcEdeBnI
0赞 Philippe Fanaro 8/25/2023
@p3consulting将代码放在答案的链接中,请
1赞 0xts 8/26/2023 #3

回答有点晚了,但是,为了能够构建这样一个具有节点的分层结构,其中每个节点都可能与其他节点相连并且可能存在循环,因此必须跟踪什么在什么地方进行,正如@p3consulting所指出的那样 -

主要是在递归查询之后,需要另一个 CTE 来计算层次结构中的 LAG 和 LEAD 级别 + item 节点的 JSON,最终查询会添加 “children” 节点,并使用 lead/lag 信息正确关闭层次结构

为了做到这一点(跟踪什么去了哪里),我们必须将一些元数据附加到一行,或者计算它。我通过添加一些元数据来接近它,从本质上讲,哪一行是哪一行的父行,以及特定行存在的深度是多少。添加元数据后,我们可以使用它来增量构建层次结构。


需要注意的几件事

  • 此脚本的工作假设是,尽管数据中可能存在循环,但我们只希望显示层次结构中到达的所有节点,而没有任何循环。也就是说,在任何不同的深度都不会有相同的分层数据。我也会在答案中进一步讨论它。
  • 可以这么说,此脚本仅适用于此方案(当每个节点最多有 2 个连接时),并且不使用通用方法。但是,很容易从中获取提示以进一步扩展它。
  • 脚本片段是在说明之间添加的,单独使用时可能不起作用,因此请参阅最终脚本。
  • 这种方法看起来很肮脏。

方法

具有最大深度的递归广度遍历

为了进行具有最大深度的递归广度遍历,我认为最好始终向两个方向查看,以便我们可以将 OP 的脚本修改为这样 -

WITH RECURSIVE connections_and_items AS (
    SELECT
        c.id AS connection_id,
        c.title AS connection_title,
        c.origin_item_id AS origin_item_id,
        i.title AS origin_item_title,
        c.destination_item_id AS destination_item_id,
        it.title AS destination_item_title,
        1 AS depth,
        `${itemId}`::UUID AS parent,
        gen_random_uuid() AS child_of
    FROM connections c
    JOIN items i
     ON i.id = c.origin_item_id
    JOIN items it
     ON it.id = c.destination_item_id
), base_items AS (
    SELECT 
        *
    FROM connections_and_items cai
    WHERE origin_item_id = `${itemId}`::UUID OR destination_item_id = `${itemId}`::UUID
), connected_items AS (
    SELECT 
        *
    FROM base_items
    
    UNION
    
    SELECT
        cai.connection_id,
        cai.connection_title,
        cai.origin_item_id,
        cai.origin_item_title,
        cai.destination_item_id,
        cai.destination_item_title,
        ci.depth + 1 AS depth,
        CASE WHEN ci.origin_item_id <> cai.origin_item_id THEN ci.destination_item_id ELSE ci.origin_item_id END AS parent,
        CASE WHEN ci.origin_item_id = cai.origin_item_id THEN ci.destination_item_id ELSE ci.origin_item_id END AS child_of
    FROM connections_and_items cai, connected_items ci
    WHERE ci.depth < `${maxDepth}`
    AND ((ci.destination_item_id = cai.destination_item_id)
    OR (ci.origin_item_id = cai.origin_item_id))
    
)

哪里是查询的第一个结果,然后我们做一个递归联合查询。在这里,深度是记录将成为记录的子行的行,并且将成为记录中父行的行。就目前而言,这个定义可能看起来模棱两可,但稍后会变得更加清晰。base_itemsconnected_itemsdepthchild_ofUUIDparentUUID

构建初始对象记录

对于我们从查询中获得的所有行,我们将为它们构建一个初始对象,该对象将进入结果,到一个名为 的列中,添加另一个 CTE,利用以下事实:if is not equal to the 我们知道,这将是标题,反之亦然 -jsonbjson_connparentdestination_item_idorigin_item_title

initial AS (
    SELECT
    DISTINCT ON (connection_id)
             connection_title,
             depth,
             parent,
             child_of,
             jsonb_build_object('title', CASE WHEN cr.parent <> cr.destination_item_id THEN cr.origin_item_title ELSE cr.destination_item_title END, 'connections', jsonb_agg(jsonb_build_object('title', cr.connection_title, CASE WHEN cr.parent <> cr.destination_item_id THEN 'destination_item' ELSE 'origin_item' END,jsonb_build_object('title', CASE WHEN cr.parent <> cr.destination_item_id THEN cr.destination_item_title ELSE cr.origin_item_title END))) ) json_conn
    FROM connected_items cr GROUP BY cr.connection_id, cr.connection_title, cr.origin_item_id, cr.origin_item_title, cr.destination_item_id, cr.destination_item_title, cr.depth, cr.parent, cr.child_of
)

有了这个,我们将得到另一列,其中的值如下所示 -json_conn

 {
    "title": "Shenzen",
    "connections": [
      {
        "title": "Same Author - Guy Delisle - 3",
        "origin_item": {
          "title": "Jerusalem"
        }
      }
    ]
  }

描述目的地和起点之间的关系,保持节点扫描的顺序。

请注意,我们正在执行 DISTINCT ON (connection_id) 来删除任何重复的连接,从而删除循环,如果我们不添加这个,结果可能会变得出乎意料,因为只有信息来跟踪节点是深度和父节点,脚本会感到困惑。如果还需要添加所有重复周期的层次结构,则必须添加 FQP(完全限定路径)元数据来保留跟踪。

这样,我们将获得非重复连接,每个深度最多存在两次。

更新初始记录对象

我们的初始对象记录显示了父项和子项之间的关系,但是,我们没有父项来自何处的连接名称,因此我们将像这样添加它 -

intermediate AS (
SELECT * FROM (SELECT 
    i.depth, 
    i.parent, 
    i.child_of,
    CASE WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,0,destination_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,0,title}'),'destination_item',i.json_conn)
    WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,1,destination_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,1,title}'),'destination_item',i.json_conn)
    WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,0,origin_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,0,title}'),'origin_item',i.json_conn)
    WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,1,origin_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,1,title}'),'origin_item',i.json_conn)
    ELSE NULL
    END AS json_conn
FROM initial i INNER JOIN initial im ON i.child_of = im.parent AND (i.depth - 1) = im.depth) AS _intermediate WHERE json_conn != 'null' UNION SELECT DISTINCT ON (i.depth, i.parent) i.depth, i.parent, i.child_of, jsonb_set(i.json_conn, '{connections}', gc.json_conn) FROM initial i, (SELECT depth, parent, jsonb_agg(elems) AS json_conn FROM initial, jsonb_array_elements(initial.json_conn->'connections') AS elems WHERE initial.depth = 1 GROUP BY initial.depth, initial.parent) AS gc WHERE i.depth = 1)

我们在 CTE 上进行内部连接,条件是匹配 a 匹配 a,其中差值为 1,从那里,我们可以确定父项是作为 an 还是 a 扫描。我们还合并了我们为 1 提供的 2 条记录,因为此时我们可以,并且它不会造成任何伤害 这就是深度 1 处的记录的样子 -intialchild_ofparentdepthorigin_itemdestination_itemdepth

{
  "title": "Pyongyang",
  "connections": [
    {
      "title": "Same Author - Guy Delisle - 2",
      "origin_item": {
        "title": "Burma"
      }
    },
    {
      "title": "Same Author - Guy Delisle - 1",
      "destination_item": {
        "title": "Shenzen"
      }
    }
  ]
}

这就是深度低于 1 的任何记录的外观,以及连接信息(将其与我们之前在initial) -

{
  "title": "Same Author - Guy Delisle - 1",
  "destination_item": {
    "title": "Shenzen",
    "connections": [
      {
        "title": "Same Author - Guy Delisle - 3",
        "origin_item": {
          "title": "Jerusalem"
        }
      }
    ]
  }
}

此时,所有行中的所有 ' 都已准备好按深度合并其子行,然后包装在一个数组中并添加到其父行的连接中。json_conn

我们可以从上到下或从下到上构建层次结构。如果我们使用前一个选项,我们将不得不操作大量嵌套数据,为了避免这种情况,我们将进行自下而上的构建。为此,我们必须从最大深度到最小深度并合并记录。

走向最终结果

我们将添加另一个递归 CTE,以自下而上构建层次结构,如下所示 -

max_depth AS (
    SELECT MAX(depth) max_depth FROM intermediate
),
final_result AS (
    
    SELECT i.*  FROM intermediate i, max_depth WHERE i.depth = max_depth
    
    UNION
    
    (SELECT
        fr.depth - 1,
        i.parent,
        i.child_of,
        CASE 
          WHEN fr.child_of = i.parent THEN
            CASE 
              WHEN (fr.json_conn @? '$.destination_item.title') AND (i.json_conn @? '$.destination_item.connections[*].destination_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{destination_item,connections,0,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{destination_item,connections,1,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,1}',fr.json_conn)
                  END
              WHEN (fr.json_conn @? '$.origin_item.title') AND (i.json_conn @? '$.destination_item.connections[*].origin_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{destination_item,connections,0,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{destination_item,connections,1,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,1}',fr.json_conn)
                  END
              WHEN (fr.json_conn @? '$.origin_item.title') AND (i.json_conn @? '$.origin_item.connections[*].origin_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{origin_item,connections,0,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{origin_item,connections,1,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,1}',fr.json_conn)
                  END
              WHEN (fr.json_conn @? '$.destination_item.title') AND (i.json_conn @? '$.origin_item.connections[*].destination_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{origin_item,connections,0,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{origin_item,connections,1,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,1}',fr.json_conn)
                  END
            END
          ELSE i.json_conn END AS json_conn
    FROM final_result fr, intermediate i WHERE (fr.depth - 1) = i.depth)
    
)

这看起来真的很讨厌,但是,我们要做的是,从底部开始,记录该记录,然后检查我们的 CTE 以查看深度差异为 1 的匹配项在哪里,对于列,检查成员资格与连接的位置并将其添加到那里。所有这些嵌套看起来都很糟糕,并且可以通过使用jsonb_path_exists来减少,我没有这样做是因为,嗯,懒惰。intermediatefinal_result.child_ofintermediate.parentjson_connCASE-WHEN-ELSE-END

通过此查询,我们将获得 2 的记录,其中所有子项都合并到需要合并的地方,我们需要做的就是将其聚合到一个数组中并替换我们之前合并的 1 条记录的值。一个简单的最后语句,得到合并的结果就可以了——depthconnectionsdepth

SELECT CASE WHEN max_depth > 1 THEN jsonb_set(i.json_conn, '{connections}', (SELECT jsonb_agg(json_conn) FROM final_result WHERE depth = 2)) ELSE i.json_conn END AS result FROM intermediate i, max_depth WHERE i.depth = 1 GROUP BY i.json_conn, max_depth;

请注意这里,当 为 1 时,final_result执行非常小的计算和结果,因为没有什么可以合并的,而且我们已经得到了结果,所以我们有条件地得到结果。CASEmax_depthmax_depthnullintermediate

完整脚本

您可以在此处找到完整的脚本。我无法在此处添加完整的脚本,因为 markdown 在代码格式方面给我带来了困难。如果您知道如何去做,请随时更新答案并在此处添加。该脚本为我的测试提供了硬编码的值,以及来自我的测试的值。itemIdmaxDepth

测试

在我的表格中有了这个循环数据——

// items table
                  id                  |   title   
--------------------------------------+-----------
 908b18ea-6a98-45a5-bdf9-1dcf7f97f35c | Software
 999c02fd-8112-4f01-864a-d29a423d1704 | Burma
 79cd129f-ffd7-468e-9551-2eca974a6308 | Shenzen
 cbf5e622-773b-4b42-8002-1be91cf4efda | Jerusalem
 bc5ae91c-83b3-4381-a73c-08ed43d0a770 | Hostage
 b7f06283-1ba0-4d79-8cda-6b3ee266d166 | Pyongyang

// connections table

                  id                  |            origin_item_id            |         destination_item_id          |             title             
--------------------------------------+--------------------------------------+--------------------------------------+-------------------------------
 be5e99f2-e272-4d3a-ac0b-183805901ad9 | b7f06283-1ba0-4d79-8cda-6b3ee266d166 | 79cd129f-ffd7-468e-9551-2eca974a6308 | Same Author - Guy Delisle - 1
 6a74d269-0573-4ad9-a66a-5e681032250e | 999c02fd-8112-4f01-864a-d29a423d1704 | b7f06283-1ba0-4d79-8cda-6b3ee266d166 | Same Author - Guy Delisle - 2
 1bf720a6-27f0-4fca-b4a0-0f4ff34965eb | bc5ae91c-83b3-4381-a73c-08ed43d0a770 | cbf5e622-773b-4b42-8002-1be91cf4efda | Same Author - Guy Delisle - 4
 8477cf0f-8a39-42d0-90bf-d9bfaa31a0a0 | cbf5e622-773b-4b42-8002-1be91cf4efda | 79cd129f-ffd7-468e-9551-2eca974a6308 | Same Author - Guy Delisle - 3
 01e3b341-0ca3-4049-abdf-0c4d1ab0aa52 | cbf5e622-773b-4b42-8002-1be91cf4efda | 999c02fd-8112-4f01-864a-d29a423d1704 | Same Author - Guy Delisle - 5
 d237bc88-1fa3-411f-baec-d003da193a51 | 999c02fd-8112-4f01-864a-d29a423d1704 | bc5ae91c-83b3-4381-a73c-08ed43d0a770 | Same Author - Guy Delisle - 6

并运行附加的脚本(深度 - 3),我们得到这个 -

{
  "title": "Pyongyang",
  "connections": [
    {
      "title": "Same Author - Guy Delisle - 2",
      "origin_item": {
        "title": "Burma",
        "connections": [
          {
            "title": "Same Author - Guy Delisle - 6",
            "destination_item": {
              "title": "Hostage"
            }
          }
        ]
      }
    },
    {
      "title": "Same Author - Guy Delisle - 1",
      "destination_item": {
        "title": "Shenzen",
        "connections": [
          {
            "title": "Same Author - Guy Delisle - 3",
            "origin_item": {
              "title": "Jerusalem",
              "connections": [
                {
                  "title": "Same Author - Guy Delisle - 5",
                  "destination_item": {
                    "title": "Burma"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  ]
}

最终脚本

WITH RECURSIVE connections_and_items AS (
    SELECT
        c.id AS connection_id,
        c.title AS connection_title,
        c.origin_item_id AS origin_item_id,
        i.title AS origin_item_title,
        c.destination_item_id AS destination_item_id,
        it.title AS destination_item_title,
        1 AS depth,
        'b7f06283-1ba0-4d79-8cda-6b3ee266d166'::UUID AS parent,
        gen_random_uuid() AS child_of
    FROM connections c
    JOIN items i
     ON i.id = c.origin_item_id
    JOIN items it
     ON it.id = c.destination_item_id
), base_items AS (
    SELECT 
        *
    FROM connections_and_items cai
    WHERE origin_item_id = 'b7f06283-1ba0-4d79-8cda-6b3ee266d166'::UUID OR destination_item_id = 'b7f06283-1ba0-4d79-8cda-6b3ee266d166'::UUID
), connected_items AS (
    SELECT 
        *
    FROM base_items
    
    UNION
    
    SELECT
        cai.connection_id,
        cai.connection_title,
        cai.origin_item_id,
        cai.origin_item_title,
        cai.destination_item_id,
        cai.destination_item_title,
        ci.depth + 1 AS depth,
        CASE WHEN ci.origin_item_id <> cai.origin_item_id THEN ci.destination_item_id ELSE ci.origin_item_id END AS parent,
        CASE WHEN ci.origin_item_id = cai.origin_item_id THEN ci.destination_item_id ELSE ci.origin_item_id END AS child_of
    FROM connections_and_items cai, connected_items ci
    WHERE ci.depth < 3
    AND ((ci.destination_item_id = cai.destination_item_id)
    OR (ci.origin_item_id = cai.origin_item_id))
    
),
initial AS (
    SELECT
    DISTINCT ON (connection_id)
             connection_title,
             depth,
             parent,
             child_of,
             jsonb_build_object('title', CASE WHEN cr.parent <> cr.destination_item_id THEN cr.origin_item_title ELSE cr.destination_item_title END, 'connections', jsonb_agg(jsonb_build_object('title', cr.connection_title, CASE WHEN cr.parent <> cr.destination_item_id THEN 'destination_item' ELSE 'origin_item' END,jsonb_build_object('title', CASE WHEN cr.parent <> cr.destination_item_id THEN cr.destination_item_title ELSE cr.origin_item_title END))) ) json_conn
    FROM connected_items cr GROUP BY cr.connection_id, cr.connection_title, cr.origin_item_id, cr.origin_item_title, cr.destination_item_id, cr.destination_item_title, cr.depth, cr.parent, cr.child_of
), intermediate AS (
SELECT * FROM (SELECT 
    i.depth, 
    i.parent, 
    i.child_of,
    CASE WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,0,destination_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,0,title}'),'destination_item',i.json_conn)
    WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,1,destination_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,1,title}'),'destination_item',i.json_conn)
    WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,0,origin_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,0,title}'),'origin_item',i.json_conn)
    WHEN jsonb_build_object('title', i.json_conn #> '{title}') <@ (im.json_conn #> '{connections,1,origin_item}')::jsonb THEN jsonb_build_object('title',(im.json_conn #> '{connections,1,title}'),'origin_item',i.json_conn)
    ELSE NULL
    END AS json_conn
FROM initial i INNER JOIN initial im ON i.child_of = im.parent AND (i.depth - 1) = im.depth) AS _intermediate WHERE json_conn != 'null' UNION SELECT DISTINCT ON (i.depth, i.parent) i.depth, i.parent, i.child_of, jsonb_set(i.json_conn, '{connections}', gc.json_conn) FROM initial i, (SELECT depth, parent, jsonb_agg(elems) AS json_conn FROM initial, jsonb_array_elements(initial.json_conn->'connections') AS elems WHERE initial.depth = 1 GROUP BY initial.depth, initial.parent) AS gc WHERE i.depth = 1),
max_depth AS (
    SELECT MAX(depth) max_depth FROM intermediate
),
final_result AS (
    
    SELECT i.*  FROM intermediate i, max_depth WHERE i.depth = max_depth
    
    UNION
    
    (SELECT
        fr.depth - 1,
        i.parent,
        i.child_of,
        CASE 
          WHEN fr.child_of = i.parent THEN
            CASE 
              WHEN (fr.json_conn @? '$.destination_item.title') AND (i.json_conn @? '$.destination_item.connections[*].destination_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{destination_item,connections,0,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{destination_item,connections,1,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,1}',fr.json_conn)
                  END
              WHEN (fr.json_conn @? '$.origin_item.title') AND (i.json_conn @? '$.destination_item.connections[*].origin_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{destination_item,connections,0,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{destination_item,connections,1,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{destination_item,connections,1}',fr.json_conn)
                  END
              WHEN (fr.json_conn @? '$.origin_item.title') AND (i.json_conn @? '$.origin_item.connections[*].origin_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{origin_item,connections,0,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{origin_item,title}') <@ (i.json_conn #> '{origin_item,connections,1,origin_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,1}',fr.json_conn)
                  END
              WHEN (fr.json_conn @? '$.destination_item.title') AND (i.json_conn @? '$.origin_item.connections[*].destination_item')
                 THEN
                  CASE
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{origin_item,connections,0,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,0}',fr.json_conn)
                    WHEN jsonb_build_object('title', fr.json_conn #> '{destination_item,title}') <@ (i.json_conn #> '{origin_item,connections,1,destination_item}')::jsonb
                      THEN jsonb_set(i.json_conn, '{origin_item,connections,1}',fr.json_conn)
                  END
            END
          ELSE i.json_conn END AS json_conn
    FROM final_result fr, intermediate i WHERE (fr.depth - 1) = i.depth)
    
) SELECT CASE WHEN max_depth > 1 THEN jsonb_set(i.json_conn, '{connections}', (SELECT jsonb_agg(json_conn) FROM final_result WHERE depth = 2)) ELSE i.json_conn END AS result FROM intermediate i, max_depth WHERE i.depth = 1 GROUP BY i.json_conn, max_depth;
1赞 Nick 8/27/2023 #4

晚才参加聚会,但我认为解决这个问题的最简单方法是使用递归函数。这可以采用起始节点的参数和要搜索的最大深度。此函数还具有可见节点的第三个输入,以防止沿树的任何分支循环:

CREATE OR REPLACE FUNCTION build_node(node INT, max_depth INT, seen INT[] default ARRAY[]::INT[])
RETURNS JSONB
AS $$
DECLARE 
  _conns JSONB;
BEGIN
  IF max_depth = 0 THEN
    RETURN jsonb_build_object(
      'title', (SELECT title FROM items WHERE id = node)
    );
  END IF;
  seen = ARRAY_APPEND(seen, node);
  SELECT jsonb_agg(json_build_object('title', title, 
    CASE WHEN origin_item_id = node THEN 'destination_item' ELSE 'origin_item' END,
    build_node(CASE WHEN origin_item_id = node THEN destination_item_id ELSE origin_item_id END, max_depth - 1, seen)
    )
  ) INTO _conns
  FROM connections
  WHERE origin_item_id = node AND NOT destination_item_id = ANY(seen)
     OR destination_item_id = node AND NOT origin_item_id = ANY(seen);
  IF jsonb_array_length(_conns) > 0 THEN
    RETURN jsonb_build_object(
      'title', (SELECT title FROM items WHERE id = node),
      'connections', _conns
    );
  ELSE
    RETURN jsonb_build_object(
      'title', (SELECT title FROM items WHERE id = node)
    );
  END IF;
END;
$$
LANGUAGE plpgsql

用法示例:

SELECT build_node(2, 2)

输出(针对示例数据):

{
    "title": "Pyongyang",
    "connections": [{
        "title": "Same Author - Guy Delisle - 1",
        "destination_item": {
            "title": "Shenzhen",
            "connections": [{
                "title": "Same Author - Guy Delisle - 3",
                "origin_item": {
                    "title": "Jerusalem"
                }
            }]
        }
    }, {
        "title": "Same Author - Guy Delisle - 2",
        "origin_item": {
            "title": "Burma"
        }
    }]
}

dbfiddle 上的演示

评论

0赞 Philippe Fanaro 8/28/2023
该死的。如果可以的话,我会为这个答案创建第二个赏金。非常感谢,@Nick!
0赞 Nick 8/28/2023
@PhilippeFanaro完全不用担心 - 我很高兴你发现它很有用。