提问人:alexanoid 提问时间:11/13/2023 最后编辑:cybersamalexanoid 更新时间:11/14/2023 访问量:48
Neo4j Cypher 查询执行优化 WHERE 条件
Neo4j Cypher query performnce optimization with WHERE condition
问:
我有以下Cypher查询:
PROFILE
MATCH (childDStat:JobableStatistic {jobableId: childD.id})
WITH collect({`childDStat`:childDStat, `childD`:childD}) as childDStats
CALL apoc.cypher.mapParallel2(
" WITH _.childD as childD, _.childDStat as childDStat
WITH childD, childDStat
UNWIND childD.detailedCriterionIds as dCId
WITH childD, childDStat, dCId + coalesce(childDStat['replaceableCriterionIds.' + dCId],[]) as cGroup
WITH childD, childDStat, cGroup
WHERE NOT AlL(x IN cGroup WHERE x IN $zeroCriterionIds )
WITH childD, childDStat, collect(cGroup) as cGroups
WHERE size(cGroups) >= size(childD.detailedCriterionIds)
UNWIND cGroups as cGroup
WITH childD, childDStat, cGroup
WHERE ANY(x IN cGroup WHERE x IN $detailedCriterionIds)
WITH childD, childDStat, collect(cGroup) as cGroups
WHERE size(cGroups) > 1
RETURN childD, childDStat, cGroups ",
{`detailedCriterionIds`: [3, 5, 7, 8, 12, 13, 14, 15, 16, 18, 20, 21, 23, 26, 28, 29, 30, 31, 33, 35, 36, 40, 42, 44, 45, 46, 47, 51, 54],
`zeroCriterionIds`: []},
childDStats, 6, 10)
YIELD value
WITH value.childD as childD, value.childDStat as childDStat, value.cGroups as cGroups WITH collect({`childDStat`:childDStat, `childD`:childD, `cGroups`:cGroups}) as childDStats
CALL apoc.cypher.mapParallel2(
" WITH _.childD as childD, _.childDStat as childDStat, _.cGroups as cGroups WITH childD, childDStat, size(cGroups) as cGroupsSize, cGroups
UNWIND cGroups as cGroup
WITH childD, childDStat, cGroupsSize, cGroup
UNWIND cGroup as cId WITH childD, childDStat, cGroupsSize, cGroup, cId, cGroup[0] as cG0
WITH childD, childDStat, cGroupsSize, cGroup, cId, cG0, childDStat['criterionAvgVoteWeights.' + cG0] as childDStatCriterionAvgVoteWeight,
childDStat['criterionExperienceMonths.' + cG0] as childDStatCriterionExperienceMonth,
$criterionAvgVoteWeights[toString(cId)] as criterionAvgVoteWeight, $criterionExperienceMonths[toString(cId)] as criterionExperienceMonth
WHERE
(childDStatCriterionAvgVoteWeight = 0 OR childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight OR criterionAvgVoteWeight IS NULL) AND
(childDStatCriterionExperienceMonth = 0 OR childDStatCriterionExperienceMonth <= criterionExperienceMonth OR criterionExperienceMonth IS NULL)
WITH childD, childDStat, cGroupsSize, cG0, collect(cId) as cIds WITH childD, childDStat, cGroupsSize, collect(DISTINCT cG0 + cIds) as cGroups
WHERE size(cGroups) >= cGroupsSize
RETURN childD, childDStat, cGroups ",
{`detailedCriterionIds`: [3, 5, 7, 8, 12, 13, 14, 15, 16, 18, 20, 21, 23, 26, 28, 29, 30, 31, 33, 35, 36, 40, 42, 44, 45, 46, 47, 51, 54],
`zeroCriterionIds`: [],
`criterionAvgVoteWeights`: {`51`:5.0, `8`:0.0, `33`:5.0, `21`:0.0, `31`:0.0, `26`:4.0, `14`:5.0, `36`:3.0, `46`:3.0, `12`:3.0, `18`:5.0, `28`:5.0, `16`:2.0, `7`:5.0, `40`:1.0, `5`:5.0, `44`:4.0, `3`:1.0, `54`:4.0, `20`:4.0, `42`:4.0, `30`:3.0, `15`:4.0, `47`:1.0, `35`:1.0, `13`:3.0, `45`:3.0, `23`:4.0, `29`:1.0},
`criterionExperienceMonths`: {`8`:109, `33`:8, `21`:184, `31`:14, `26`:100, `14`:157, `36`:140, `46`:123, `12`:85, `18`:96, `28`:116, `16`:15, `7`:63, `40`:56, `5`:166, `44`:101, `3`:129, `42`:84, `20`:102, `30`:173, `15`:97, `47`:54, `13`:91, `35`:137, `45`:119, `23`:162, `29`:97}
},
childDStats, 6, 10)
YIELD value
RETURN value.childD.id
以下条件占用了大部分查询执行时间:
WHERE
(childDStatCriterionAvgVoteWeight = 0 OR childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight OR criterionAvgVoteWeight IS NULL) AND
(childDStatCriterionExperienceMonth = 0 OR childDStatCriterionExperienceMonth <= criterionExperienceMonth OR criterionExperienceMonth IS NULL)
有了这个条件,查询就可以工作,没有它。~1000ms
5ms
为了提高查询性能,我是否可以对此条件做些什么?
附言
这只是一个测试查询。实际查询对每个值使用参数。
答:
1赞
cybersam
11/14/2023
#1
在:(childDStatCriterionAvgVoteWeight = 0 OR childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight OR criterionAvgVoteWeight IS NULL)
- 你应该先测试。这样,如果是 ,那么您就不会浪费时间使用表达式中的值进行测试(将被视为 ),并且您也不需要测试 。如果值经常是 .
criterionAvgVoteWeight IS NULL
NULL
NULL
childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight
false
childDStatCriterionAvgVoteWeight = 0
NULL
- 如果始终为 >= 0(当它不是 时),则可以完全消除测试。
criterionAvgVoteWeight
NULL
childDStatCriterionAvgVoteWeight = 0
简而言之,您可以考虑改用以下表达式:.(criterionAvgVoteWeight IS NULL OR childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight)
类似的注意事项也适用于 。(childDStatCriterionExperienceMonth = 0 OR childDStatCriterionExperienceMonth <= criterionExperienceMonth OR criterionExperienceMonth IS NULL)
这些更改可能会节省一些时间,具体取决于您的实际数据特征,但筛选不是免费的。
评论
0赞
alexanoid
11/14/2023
谢谢你的回答。我是否需要以下字段的索引(这些字段是筛选逻辑中条件的一部分)- ?我希望不是,因为我可能有多达 100k 个这样的不同领域childDStat['criterionAvgVoteWeights.' + cG0]
1赞
cybersam
11/14/2023
在逻辑的这一点上,索引没有任何帮助为时已晚。您必须使用顶部的索引 ()。此外,正如您所说,您需要大量索引,这是无法支持的。MATCH ... WHERE ...
评论