Module process

Record processor function output trait

The return type must satisfy the Processable trait, which requires:

  • Clone, because two rkeys can refer to the same record by CID, and that record's block may appear only once in the CAR file.
  • Serialize + DeserializeOwned so it can be spilled to disk.
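To illustrate why Clone is needed, here is a minimal sketch (not this crate's code) in which each block's processed output is stored once per CID and then handed out once per rkey. The Cid alias and the resolve function are hypothetical; real CIDs are not plain strings.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a content identifier (real CIDs are multihash-based).
type Cid = String;

/// Resolve each (rkey, CID) pair from the MST to its processed output.
/// `by_cid` holds each block's output exactly once, because a CID appears
/// only once in the CAR file. When several rkeys share a CID, each emitted
/// record needs its own copy of the output, hence the `Clone` bound.
fn resolve<T: Clone>(rkeys: &[(&str, Cid)], by_cid: &HashMap<Cid, T>) -> Vec<(String, T)> {
    rkeys
        .iter()
        .filter_map(|(rkey, cid)| by_cid.get(cid).map(|out| (rkey.to_string(), out.clone())))
        .collect()
}
```

Without Clone, the second rkey pointing at the same CID would have nothing left to take once the first had consumed the stored output.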

Implementors must provide one function, get_size(), which should return the approximate off-stack (heap) size of the value in bytes. The on-stack size is added automatically via std::mem::size_of.
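As a rough sketch of that accounting (the OffStackSize trait and total_size helper below are hypothetical stand-ins, not this module's API), a value's total footprint is its inline size_of plus whatever get_size reports:

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the `get_size()` half of `Processable`.
trait OffStackSize {
    /// Approximate heap ("off-stack") bytes owned by this value.
    fn get_size(&self) -> usize;
}

impl OffStackSize for String {
    fn get_size(&self) -> usize {
        // A String owns a heap buffer of `capacity()` bytes.
        self.capacity()
    }
}

/// Approximate total footprint: the inline size known at compile time,
/// plus the implementor's estimate of heap bytes.
fn total_size<T: OffStackSize>(value: &T) -> usize {
    size_of::<T>() + value.get_size()
}
```

This is why get_size only needs to report heap usage: the inline part (for a String, its pointer, length and capacity fields) is already covered by size_of.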

Note that the process function is not guaranteed to run on a block before that block is stored in memory or on disk. A block can't be identified as a record without actually walking the MST, so the best we can do is apply process to any block that we know cannot be an MST node, and store the raw block bytes for everything else.
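A sketch of that storage decision, assuming a caller-supplied could_be_mst_node predicate. The MaybeProcessed enum, store_block, and the predicate are all hypothetical; how the crate actually recognises possible MST nodes isn't described here.

```rust
/// A block as it might be held in memory or spilled to disk: either already
/// processed, or kept as raw bytes because it might be an MST node.
#[derive(Debug, Clone, PartialEq)]
enum MaybeProcessed<T> {
    Raw(Vec<u8>),
    Processed(T),
}

/// `could_be_mst_node` is a hypothetical predicate: it must return `true`
/// for any block that *might* be an MST node, since that can't be known for
/// sure until the tree is walked.
fn store_block<T>(
    raw: Vec<u8>,
    could_be_mst_node: impl Fn(&[u8]) -> bool,
    process: impl Fn(Vec<u8>) -> T,
) -> MaybeProcessed<T> {
    if could_be_mst_node(raw.as_slice()) {
        // Keep the bytes intact: the MST walk may still need them.
        MaybeProcessed::Raw(raw)
    } else {
        // Definitely not an MST node, so it's safe to process now.
        MaybeProcessed::Processed(process(raw))
    }
}
```

The predicate has to err on the conservative side: a false "might be an MST node" only leaves some records unprocessed until later, while a false "not an MST node" would hand a tree node to process and lose the bytes the walk depends on.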

Here's a silly processing function that just collects the "eyy"s found in the raw record bytes:

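use serde::{Deserialize, Serialize};
// `Processable` is the trait defined in this module; import it from
// wherever the crate exposes it.

#[derive(Debug, Clone, Serialize, Deserialize)]
struct Eyy(usize, String);

impl Processable for Eyy {
    fn get_size(&self) -> usize {
        // the usize field is inline, so the on-stack size already covers it
        self.1.capacity() // heap bytes owned by the String (its capacity)
    }
}

/// Record every byte offset at which "eyy" appears in the raw record.
fn process(raw: Vec<u8>) -> Vec<Eyy> {
    let mut out = Vec::new();
    let to_find = b"eyy";
    // `windows` yields nothing for inputs shorter than 3 bytes (so there's no
    // `raw.len() - 3` underflow) and includes the final window.
    for (i, window) in raw.windows(to_find.len()).enumerate() {
        if window == to_find {
            out.push(Eyy(i, "eyy".to_string()));
        }
    }
    out
}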

The memory-sizing logic is a little sketchy, but it probably works at least approximately.
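Since process returns a Vec<Eyy>, the size that matters is that of the whole vector. Here is one plausible way to tally it, offered as a hypothetical sketch rather than what the crate does: the Vec's inline size, its heap buffer (capacity times element size), and each element's own heap bytes. Eyy is redeclared without the serde derives so the sketch runs with std alone, and vec_total_size is a hypothetical helper.

```rust
use std::mem::size_of;

/// Stand-in for `Eyy` above, minus the serde derives.
#[derive(Debug, Clone)]
struct Eyy(usize, String);

impl Eyy {
    /// Same estimate as the `Processable` impl: only the String's heap buffer.
    fn get_size(&self) -> usize {
        self.1.capacity()
    }
}

/// Hypothetical estimate for a whole `Vec<Eyy>` output.
fn vec_total_size(v: &Vec<Eyy>) -> usize {
    size_of::<Vec<Eyy>>()                          // the Vec's own pointer/len/capacity
        + v.capacity() * size_of::<Eyy>()          // its heap buffer of inline elements
        + v.iter().map(Eyy::get_size).sum::<usize>() // each element's heap bytes
}
```

Using capacity rather than length counts allocated-but-unused slots too, which errs on the side of overestimating memory use.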

Traits

Processable
Output trait for record processing

Functions

noop
Processor that just returns the raw blocks