Record processor function output trait
The return type must satisfy the Processable trait, which requires:
- `Clone`, because two rkeys can refer to the same record by CID, which may only appear once in the CAR file.
- `Serialize + DeserializeOwned`, so it can be spilled to disk.
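To illustrate why `Clone` is needed, here is a minimal std-only sketch, assuming processed outputs are kept once per CID (as blocks are in the CAR file) and handed out per rkey. `Output`, `resolve`, and the `bafy-a` CID string are hypothetical placeholders, and the serde bounds are omitted to keep it self-contained:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a processed record. The real output type only
/// needs `Clone` (plus `Serialize + DeserializeOwned`, omitted here).
#[derive(Debug, Clone, PartialEq)]
struct Output(String);

/// Resolve each (rkey, cid) pair to its processed output.
/// Outputs are stored once per CID, mirroring the CAR file, where a block
/// appears only once even if several rkeys point at it.
fn resolve(
    entries: &[(&str, &str)],
    by_cid: &HashMap<&str, Output>,
) -> Vec<(String, Output)> {
    entries
        .iter()
        .filter_map(|(rkey, cid)| {
            // Each rkey gets its own copy, so the shared output must be cloned.
            by_cid.get(cid).map(|out| (rkey.to_string(), out.clone()))
        })
        .collect()
}

fn main() {
    let mut by_cid = HashMap::new();
    // "bafy-a" is a placeholder, not a real CID.
    by_cid.insert("bafy-a", Output("hello".to_string()));

    // Two rkeys, one CID: the single stored output is handed out twice.
    let entries = [("rkey1", "bafy-a"), ("rkey2", "bafy-a")];
    let resolved = resolve(&entries, &by_cid);
    assert_eq!(resolved.len(), 2);
    assert_eq!(resolved[0].1, resolved[1].1);
}
```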
One required method must be implemented, `get_size()`: it should return the
approximate total off-stack (heap) size of the value, in bytes. The on-stack
size is added automatically via `std::mem::size_of`.
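A rough sketch of this sizing contract, using a hypothetical `HeapSize` trait as a stand-in for `Processable` (serde bounds omitted): implementors report heap bytes only, and a hypothetical `total_size` helper adds the on-stack part with `std::mem::size_of`. This is an illustration under those assumptions, not necessarily how the crate combines the two numbers.

```rust
/// Hypothetical mirror of the `get_size()` contract: report only the
/// heap (off-stack) bytes owned by the value.
trait HeapSize {
    fn get_size(&self) -> usize;
}

impl HeapSize for String {
    fn get_size(&self) -> usize {
        // The String's heap buffer; the (ptr, cap, len) header is on the stack.
        self.capacity()
    }
}

/// Approximate total footprint: on-stack size plus reported heap size.
fn total_size<T: HeapSize>(value: &T) -> usize {
    std::mem::size_of::<T>() + value.get_size()
}

fn main() {
    let s = String::with_capacity(64);
    println!("approx total bytes: {}", total_size(&s));
}
```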
Note that the process function is not guaranteed to run on a block before
the block is stored in memory or on disk: there is no way to know whether a
block is a record without actually walking the MST, so the best we can do is
apply process to any block that we know cannot be an MST node, and store the
raw block bytes for everything else.
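The store-or-process rule above can be sketched as follows. `Stored`, `store_block`, and the `known_not_mst` predicate are hypothetical names; the predicate stands in for whatever the walker actually knows about a block, which this sketch makes no attempt to model:

```rust
/// What ends up stored for a block: either the processor's output, or the
/// untouched bytes when the block might still turn out to be an MST node.
#[derive(Debug, PartialEq)]
enum Stored<T> {
    Processed(T),
    Raw(Vec<u8>),
}

/// Apply `process` only when the block is known not to be an MST node;
/// otherwise keep the raw bytes for later.
fn store_block<T>(
    bytes: Vec<u8>,
    known_not_mst: impl Fn(&[u8]) -> bool,
    process: impl Fn(Vec<u8>) -> T,
) -> Stored<T> {
    if known_not_mst(&bytes) {
        Stored::Processed(process(bytes))
    } else {
        // Can't rule out an MST node yet, so keep the raw bytes.
        Stored::Raw(bytes)
    }
}

fn main() {
    let never_mst = |_: &[u8]| true;
    let maybe_mst = |_: &[u8]| false;
    let len = |b: Vec<u8>| b.len();

    assert_eq!(store_block(vec![1, 2, 3], never_mst, len), Stored::Processed(3));
    assert_eq!(store_block(vec![1, 2, 3], maybe_mst, len), Stored::Raw(vec![1, 2, 3]));
}
```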
Here's a silly processing function that just collects every `eyy` found in the raw record bytes:
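```rust
use serde::{Deserialize, Serialize};
// `Processable` also needs to be in scope (imported from this crate).

#[derive(Debug, Clone, Serialize, Deserialize)]
struct Eyy(usize, String);

impl Processable for Eyy {
    fn get_size(&self) -> usize {
        // don't need to count the usize, it's on the stack
        self.1.capacity() // in-mem size from the string's capacity, in bytes
    }
}

fn process(raw: Vec<u8>) -> Vec<Eyy> {
    let mut out = Vec::new();
    let to_find = b"eyy";
    // windows() visits every 3-byte slice, including the final one, and
    // yields nothing (instead of underflowing) when raw is shorter than 3
    for (i, window) in raw.windows(to_find.len()).enumerate() {
        if window == to_find {
            out.push(Eyy(i, "eyy".to_string()));
        }
    }
    out
}
```

The memory sizing stuff is a little sketchy, but it probably works at least approximately.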
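To sanity-check the estimate for the `Vec<Eyy>` that `process` actually returns, a rough std-only sketch might count the Vec's own heap buffer plus each element's `get_size()`. `vec_heap_size` is a hypothetical helper, not part of this crate, and the serde derives and `Processable` impl are dropped here to keep the block self-contained:

```rust
/// Std-only stand-in for the `Eyy` type above (serde derives dropped).
#[derive(Debug, Clone)]
struct Eyy(usize, String);

impl Eyy {
    /// Same estimate as the `get_size` impl above: heap bytes only.
    fn get_size(&self) -> usize {
        self.1.capacity()
    }
}

/// Hypothetical estimate for a whole `Vec<Eyy>`: the Vec's own heap buffer
/// (capacity * size_of::<Eyy>()) plus each element's heap usage.
/// The Vec header itself (ptr, cap, len) is on the stack and not counted.
fn vec_heap_size(items: &Vec<Eyy>) -> usize {
    items.capacity() * std::mem::size_of::<Eyy>()
        + items.iter().map(Eyy::get_size).sum::<usize>()
}

fn main() {
    let out = vec![Eyy(0, "eyy".to_string()), Eyy(5, "eyy".to_string())];
    println!("approx heap bytes: {}", vec_heap_size(&out));
}
```

Note that the Vec's spare capacity is counted too, since that memory is allocated even if unused.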
Traits
- Processable: Output trait for record processing
Functions
- noop: Processor that just returns the raw blocks