Neo4j Cypher 查询执行优化 WHERE 条件

Neo4j Cypher query performnce optimization with WHERE condition

提问人:alexanoid 提问时间:11/13/2023 最后编辑:cybersamalexanoid 更新时间:11/14/2023 访问量:48

问:

我有以下Cypher查询:

PROFILE
MATCH (childDStat:JobableStatistic {jobableId: childD.id}) 
WITH collect({`childDStat`:childDStat, `childD`:childD}) as childDStats 
CALL apoc.cypher.mapParallel2(
  " WITH _.childD as childD, _.childDStat as childDStat 
    WITH childD, childDStat 
    UNWIND childD.detailedCriterionIds as dCId 
    WITH childD, childDStat, dCId + coalesce(childDStat['replaceableCriterionIds.' + dCId],[]) as cGroup 
    WITH childD, childDStat, cGroup 
    WHERE NOT AlL(x IN cGroup WHERE x IN $zeroCriterionIds ) 
    WITH childD, childDStat, collect(cGroup) as cGroups 
    WHERE size(cGroups) >= size(childD.detailedCriterionIds) 
    UNWIND cGroups as cGroup 
    WITH childD, childDStat, cGroup 
    WHERE ANY(x IN cGroup WHERE x IN $detailedCriterionIds) 
    WITH childD, childDStat, collect(cGroup) as cGroups 
    WHERE size(cGroups) > 1 
    RETURN childD, childDStat, cGroups ",
  {`detailedCriterionIds`: [3, 5, 7, 8, 12, 13, 14, 15, 16, 18, 20, 21, 23, 26, 28, 29, 30, 31, 33, 35, 36, 40, 42, 44, 45, 46, 47, 51, 54],
   `zeroCriterionIds`: []},
  childDStats, 6, 10) 
YIELD value
WITH value.childD as childD, value.childDStat as childDStat, value.cGroups as cGroups WITH collect({`childDStat`:childDStat, `childD`:childD, `cGroups`:cGroups}) as childDStats 

CALL apoc.cypher.mapParallel2(
  " WITH _.childD as childD, _.childDStat as childDStat, _.cGroups as cGroups WITH childD, childDStat, size(cGroups) as cGroupsSize, cGroups 
    UNWIND cGroups as cGroup 
    WITH childD, childDStat, cGroupsSize, cGroup 
    UNWIND cGroup as cId WITH childD, childDStat, cGroupsSize, cGroup, cId, cGroup[0] as cG0

    WITH childD, childDStat, cGroupsSize, cGroup, cId, cG0, childDStat['criterionAvgVoteWeights.' + cG0] as childDStatCriterionAvgVoteWeight,
      childDStat['criterionExperienceMonths.' + cG0] as childDStatCriterionExperienceMonth, 
      $criterionAvgVoteWeights[toString(cId)] as criterionAvgVoteWeight, $criterionExperienceMonths[toString(cId)] as criterionExperienceMonth 
    WHERE 
      (childDStatCriterionAvgVoteWeight = 0 OR childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight OR criterionAvgVoteWeight IS NULL) AND
      (childDStatCriterionExperienceMonth = 0 OR childDStatCriterionExperienceMonth <= criterionExperienceMonth OR criterionExperienceMonth IS NULL)

    WITH childD, childDStat, cGroupsSize, cG0, collect(cId) as cIds WITH childD, childDStat, cGroupsSize, collect(DISTINCT cG0 + cIds) as cGroups
    WHERE size(cGroups) >= cGroupsSize
    RETURN childD, childDStat, cGroups ",
  {`detailedCriterionIds`: [3, 5, 7, 8, 12, 13, 14, 15, 16, 18, 20, 21, 23, 26, 28, 29, 30, 31, 33, 35, 36, 40, 42, 44, 45, 46, 47, 51, 54],
   `zeroCriterionIds`: [],
   `criterionAvgVoteWeights`: {`51`:5.0, `8`:0.0, `33`:5.0, `21`:0.0, `31`:0.0, `26`:4.0, `14`:5.0, `36`:3.0, `46`:3.0, `12`:3.0, `18`:5.0, `28`:5.0, `16`:2.0, `7`:5.0, `40`:1.0, `5`:5.0, `44`:4.0, `3`:1.0, `54`:4.0, `20`:4.0, `42`:4.0, `30`:3.0, `15`:4.0, `47`:1.0, `35`:1.0, `13`:3.0, `45`:3.0, `23`:4.0, `29`:1.0},
   `criterionExperienceMonths`: {`8`:109, `33`:8, `21`:184, `31`:14, `26`:100, `14`:157, `36`:140, `46`:123, `12`:85, `18`:96, `28`:116, `16`:15, `7`:63, `40`:56, `5`:166, `44`:101, `3`:129, `42`:84, `20`:102, `30`:173, `15`:97, `47`:54, `13`:91, `35`:137, `45`:119, `23`:162, `29`:97}
  },
  childDStats, 6, 10) 
YIELD value
RETURN value.childD.id

以下条件占用了大部分查询执行时间:

WHERE 
  (childDStatCriterionAvgVoteWeight = 0 OR childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight OR criterionAvgVoteWeight IS NULL) AND
  (childDStatCriterionExperienceMonth = 0 OR childDStatCriterionExperienceMonth <= criterionExperienceMonth OR criterionExperienceMonth IS NULL)

有了这个条件,查询就可以工作,没有它。~1000ms5ms

为了提高查询性能,我是否可以对此条件做些什么?

附言

这只是一个测试查询。实际查询对每个值使用参数。

Neo4J 密码 查询优化

评论

0赞 Vincent Rupp 11/14/2023
请编辑您的代码,以避免阅读它所需的大量水平滚动。
0赞 alexanoid 11/14/2023
当然,编辑了查询
0赞 cybersam 11/14/2023
为了清楚起见,我进一步编辑了。

答:

1赞 cybersam 11/14/2023 #1

在:(childDStatCriterionAvgVoteWeight = 0 OR childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight OR criterionAvgVoteWeight IS NULL)

  • 你应该先测试。这样,如果是 ,那么您就不会浪费时间使用表达式中的值进行测试(将被视为 ),并且您也不需要测试 。如果值经常是 .criterionAvgVoteWeight IS NULLNULLNULLchildDStatCriterionAvgVoteWeight <= criterionAvgVoteWeightfalsechildDStatCriterionAvgVoteWeight = 0NULL
  • 如果始终为 >= 0(当它不是 时),则可以完全消除测试。criterionAvgVoteWeightNULLchildDStatCriterionAvgVoteWeight = 0

简而言之,您可以考虑改用以下表达式:.(criterionAvgVoteWeight IS NULL OR childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight)

类似的注意事项也适用于 。(childDStatCriterionExperienceMonth = 0 OR childDStatCriterionExperienceMonth <= criterionExperienceMonth OR criterionExperienceMonth IS NULL)

这些更改可能会节省一些时间,具体取决于您的实际数据特征,但筛选不是免费的。

评论

0赞 alexanoid 11/14/2023
谢谢你的回答。我是否需要以下字段的索引(这些字段是筛选逻辑中条件的一部分)- ?我希望不是,因为我可能有多达 100k 个这样的不同领域childDStat['criterionAvgVoteWeights.' + cG0]
1赞 cybersam 11/14/2023
在逻辑的这一点上,索引没有任何帮助为时已晚。您必须使用顶部的索引 ()。此外,正如您所说,您需要大量索引,这是无法支持的。MATCH ... WHERE ...