Asked by: DHP  Asked: 10/21/2023  Updated: 10/22/2023  Views: 73
Read bincode serialized structs in parallel from file
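Q:
I am currently using serde-jsonlines to serialize a large number of identical structs to a file. I then use rayon to read the data back out of this file in parallel via par_bridge: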
let reader = JsonLinesReader::new(input_file);
let results: Vec<ResultStruct> = reader
    .read_all::<MyStruct>()
    .par_bridge()
    .into_par_iter()
    .map(|my_struct| {
        // do processing of my struct and return result
        result
    })
    .collect();
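This works because JsonLinesReader returns an iterator over the lines of the input file. I would like to use bincode to encode my structs instead, since this results in a smaller file on disk. I have the following playground, which works as expected: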
use bincode;
use serde::{Deserialize, Serialize};
use std::fs::File;
use std::io::{BufWriter, Write};

#[derive(Debug, Deserialize, Serialize)]
struct MyStruct {
    name: String,
    value: Vec<u64>,
}

pub fn playground() {
    let s1 = MyStruct {
        name: "Hello".to_string(),
        value: vec![1, 2, 3],
    };
    let s2 = MyStruct {
        name: "World!".to_string(),
        value: vec![3, 4, 5, 6],
    };

    let out_file = File::create("test.bin").expect("Unable to create file");
    let mut writer = BufWriter::new(out_file);
    let s1_encoded: Vec<u8> = bincode::serialize(&s1).unwrap();
    writer.write_all(&s1_encoded).expect("Unable to write data");
    let s2_encoded: Vec<u8> = bincode::serialize(&s2).unwrap();
    writer.write_all(&s2_encoded).expect("Unable to write data");
    drop(writer);

    let mut in_file = File::open("test.bin").expect("Unable to open file");
    let s1_decoded: MyStruct =
        bincode::deserialize_from(&mut in_file).expect("Unable to read data");
    let s2_decoded: MyStruct =
        bincode::deserialize_from(&mut in_file).expect("Unable to read data");
    println!("s1_decoded: {:?}", s1_decoded);
    println!("s2_decoded: {:?}", s2_decoded);
}
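Is it possible to read the structs in parallel in a way similar to what I am currently doing with serde-jsonlines? I suspect it may not be possible, because the structs are not terminated by a newline, so there is no sensible way to chunk the input stream for multiple threads to process.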
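One common way to make a binary stream splittable without newlines is to write a length prefix before each record. The sketch below is an assumption, not something from the original post: write_frame and read_frame are hypothetical helper names, the 8-byte little-endian prefix is an arbitrary choice, and only the standard library is used. The payload would be whatever bytes bincode::serialize produced.

```rust
use std::io::{self, Read, Write};

/// Write one record as an 8-byte little-endian length followed by the payload.
/// (Hypothetical helper; the prefix width is an arbitrary choice.)
fn write_frame<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = payload.len() as u64;
    writer.write_all(&len.to_le_bytes())?;
    writer.write_all(payload)
}

/// Read the next record. Returns `Ok(None)` when the input ends before a
/// full length prefix could be read, and an error if a payload is truncated.
fn read_frame<R: Read>(reader: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 8];
    match reader.read_exact(&mut len_buf) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    // Allocate exactly the announced number of bytes and fill them.
    let mut payload = vec![0u8; u64::from_le_bytes(len_buf) as usize];
    reader.read_exact(&mut payload)?;
    Ok(Some(payload))
}
```

With frames like these, a single reader can hand out raw byte buffers cheaply, and each worker thread could then decode its own buffer (for example with bincode::deserialize), so the decoding itself would also run in parallel.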
A:
2 upvotes
Jmb
10/22/2023
#1
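Note that the serde-jsonlines code uses a single thread to parse the JSON and only uses multiple threads for the processing in map. The same thing can be done with bincode: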
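// Needs `use std::iter;` and `use rayon::prelude::*;` in scope.
let results: Vec<ResultStruct> = iter::from_fn(
    // Decode one struct per call. `.ok()` ends the iteration at end of file,
    // but it also stops silently on any other deserialization error.
    move || bincode::deserialize_from::<_, MyStruct>(&mut in_file).ok())
    .par_bridge()
    .map(|my_struct| {
        // do processing of my struct and return result
        result
    })
    .collect();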
(I also removed the redundant into_par_iter call, since par_bridge already creates a parallel iterator.)
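To make the "one thread decodes, many threads process" pattern concrete, here is a minimal standard-library sketch. It is not rayon's implementation: read_u64 is a hypothetical stand-in for bincode::deserialize_from, process_in_parallel is an invented name, and doubling each value is a placeholder for the real work. The idea it illustrates is the same one par_bridge relies on: a sequential iterator is shared behind a lock, and workers pull items from it one at a time.

```rust
use std::io::Read;
use std::iter;
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for `bincode::deserialize_from`:
/// decodes one little-endian u64 from the reader.
fn read_u64<R: Read>(reader: &mut R) -> std::io::Result<u64> {
    let mut buf = [0u8; 8];
    reader.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Decode records sequentially and process them on `workers` threads.
/// Results are sorted before returning because completion order is arbitrary.
fn process_in_parallel<R: Read + Send>(mut reader: R, workers: usize) -> Vec<u64> {
    // Sequential source: each call decodes one record and yields `None`
    // at end of input (or on any other read error).
    let source = Mutex::new(iter::from_fn(move || read_u64(&mut reader).ok()));
    let results = Mutex::new(Vec::new());

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // The lock is held only while pulling the next record.
                let next = source.lock().unwrap().next();
                match next {
                    Some(value) => {
                        let processed = value * 2; // placeholder for real work
                        results.lock().unwrap().push(processed);
                    }
                    None => break,
                }
            });
        }
    });

    let mut out = results.into_inner().unwrap();
    out.sort_unstable();
    out
}
```

Because decoding happens under the lock, this only pays off when the per-record processing is noticeably more expensive than the decoding, which is the same trade-off the answer describes for par_bridge.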
Comments
iter::from_fn (move || bincode::deserialize_from(&mut in_file).ok()).par_bridge()…
.par_bridge().into_par_iter()
.par_bridge()