Mongodb大时间序列抽取/查询改进

Mongodb large time series decimation / query improvments

提问人:DaminouTav 提问时间:3/24/2022 最后编辑:DaminouTav 更新时间:3/27/2022 访问量:126

问:

我有一个传感器数据的时间序列集合,例如:

{"date":{"$date":"2018-01-01T00:00:05.045Z"},"metaField":{"assets":{"city":"Meylan","couleur":"BlougeCeyBienBlouge","country":"France","machine_name":"totomachine","monitoring_id":666},"input_name":"inputname1","results":{"head":{"dimension":1,"id":"HEAD","is_complex":0,"magnitude_key":"Acceleration","measure_status":2,"name":"FFT 1: FFT1: OvPwre [1]-Input 1","overall_level_status":0,"physical_quantity":"","result_type":1,"size":9,"unit_label":null,"unit_name":null,"version":5,"weighted_ol_status":0,"weighting_window":0},"info":{"global_level":0,"id":"INFO","info_mask":0,"tacho":0,"version":1,"weighted_global_level":0},"module_id":10,"process_id":21,"scal":{"id":"SCAL","imag":0,"version":2}}},"_id":{"$oid":"623b3c53f492b4b87a47f0d6"},"value":51}

{"date":{"$date":"2018-01-01T00:00:10.122Z"},"metaField":{"assets":{"city":"Meylan","couleur":"BlougeCeyBienBlouge","country":"France","machine_name":"totomachine","monitoring_id":666},"input_name":"inputname1","results":{"head":{"dimension":1,"id":"HEAD","is_complex":0,"magnitude_key":"Acceleration","measure_status":2,"name":"FFT 1: FFT1: OvPwre [1]-Input 1","overall_level_status":0,"physical_quantity":"","result_type":1,"size":9,"unit_label":null,"unit_name":null,"version":5,"weighted_ol_status":0,"weighting_window":0},"info":{"global_level":0,"id":"INFO","info_mask":0,"tacho":0,"version":1,"weighted_global_level":0},"module_id":10,"process_id":21,"scal":{"id":"SCAL","imag":0,"version":2}}},"_id":{"$oid":"623b3c53f492b4b87a47f0dc"},"value":77}

{"date":{"$date":"2018-01-01T00:00:15.165Z"},"metaField":{"assets":{"city":"Meylan","couleur":"BlougeCeyBienBlouge","country":"France","machine_name":"totomachine","monitoring_id":666},"input_name":"inputname1","results":{"head":{"dimension":1,"id":"HEAD","is_complex":0,"magnitude_key":"Acceleration","measure_status":2,"name":"FFT 1: FFT1: OvPwre [1]-Input 1","overall_level_status":0,"physical_quantity":"","result_type":1,"size":9,"unit_label":null,"unit_name":null,"version":5,"weighted_ol_status":0,"weighting_window":0},"info":{"global_level":0,"id":"INFO","info_mask":0,"tacho":0,"version":1,"weighted_global_level":0},"module_id":10,"process_id":21,"scal":{"id":"SCAL","imag":0,"version":2}}},"_id":{"$oid":"623b3c53f492b4b87a47f0e2"},"value":100}

索引为:日期 -1、input_name 1、metaField.results.module_id 1 和 metaField.results.process_id 1。

可能是一百个传感器,每 10 秒推送一次样本。我想从 input_name、module_id 和 process_id 查询数组。如果范围日期很大,则需要抽取。

我首先尝试聚合每年/每月/每天/小时的值(如果范围时间不太大,则为分钟):

{"$match": {"metaField.input_name": "inputname1", "metaField.results.module_id": 10, "metaField.results.process_id": 21, "date:"{"$gte": new ISODate("2018-01-01T01:01:01Z"), "$lt": new ISODate("2020-01-01T01:01:01Z")}},
"$group": {"_id": {
            "input_name": "$metaField.input_name",
            "hour": {"$hour": "$date"},
            "day": {"$dayOfMonth": "$date"},
            "month": {"$month": "$date"},
            "year": {"$year": "$date"}},
            "date": {"$first": "$date"},
            "value": {"$first": "$value"}},
"$sort": {"date": 1},
"$group": {"_id": "$_id.input_name",
                                        "data": {"$push": "$value"},
                                        "date": {"$push": "$date"}}}

但是对于巨大的日期范围来说,性能是不可接受的......(30 秒,持续 3 年......

你有什么更好的主意,或者有一种方法可以不基于日期进行抽取吗?比如检索 1/10 值?

MongoDB 抽取

评论

0赞 YuTing 3/24/2022
也许您可以重新排列数据或查询。给出更多有效的测试数据并显示有效的预期结果。mongoplayground (英语)
0赞 DaminouTav 3/24/2022
你好!有效测试数据:与此类似,对于传感器,只有值会随时间而变化(传感器由其名称/模块/过程定义),每个传感器每 10 秒一个文档。这意味着每个传感器每年有 30 亿个文档。预期结果是值数组和日期数组,以便绘制趋势数据。但是,如果我想要一个年份范围,我不需要 30 亿个值(并且很容易达到 16mb bson 限制)。
0赞 prasad_ 3/24/2022
您能否更具体(明确)地说明您在馆藏中的索引?阶段中有一个查询筛选器,但没有正确引用文档字段。还包括一个清晰的示例文档,其中包含查询筛选器中使用的所有字段。$match
0赞 DaminouTav 3/24/2022
谢谢@prasad_,我已经编辑了我的帖子,添加了一些样本和索引。
0赞 prasad_ 3/24/2022
查询可以从单个复合索引(查询筛选器中使用所有或大部分字段)而不是多个单字段索引中受益。

答: 暂无答案