

Function queue_nodes_from_swhids_csv 

pub fn queue_nodes_from_swhids_csv<G, R: Read + Send>(
    graph: &G,
    reader: R,
    column_name: &str,
    tx: SyncSender<Box<[(String, NodeId)]>>,
    batch_size: usize,
) -> Result<()>

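Reads CSV records from reader, looks up the node id of the SWHID found in each record's column_name column, and sends the resulting (SWHID, node id) pairs to tx in batches of at most batch_size pairs, preserving the input order.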
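The channel protocol implied by the signature can be illustrated with a minimal, std-only sketch. It assumes NodeId is a plain usize and replaces the graph lookup with a caller-supplied closure; send_in_batches and collect_batches are hypothetical names, not part of swh-graph. The producer sends batches of at most batch_size pairs over a bounded SyncSender, and the consumer flattens them back into one ordered sequence.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread;

/// Stand-in for swh-graph's node id type (assumption: a plain usize).
type NodeId = usize;

/// Hypothetical producer: sends (SWHID, node id) pairs to `tx` in batches of
/// at most `batch_size`, in input order. `lookup` stands in for the graph's
/// SWHID -> node id mapping.
fn send_in_batches<F>(
    swhids: Vec<String>,
    lookup: F,
    tx: SyncSender<Box<[(String, NodeId)]>>,
    batch_size: usize,
) -> Result<(), String>
where
    F: Fn(&str) -> Option<NodeId>,
{
    assert!(batch_size > 0, "batch_size must be positive");
    let mut batch = Vec::with_capacity(batch_size);
    for swhid in swhids {
        let node = lookup(swhid.as_str())
            .ok_or_else(|| format!("Unknown SWHID: {}", swhid))?;
        batch.push((swhid, node));
        if batch.len() == batch_size {
            // Swap the full batch for an empty one and send it
            let full = std::mem::replace(&mut batch, Vec::with_capacity(batch_size));
            tx.send(full.into_boxed_slice()).map_err(|e| e.to_string())?;
        }
    }
    if !batch.is_empty() {
        // Flush the last, possibly shorter, batch
        tx.send(batch.into_boxed_slice()).map_err(|e| e.to_string())?;
    }
    // `tx` is dropped here, which ends the consumer's loop
    Ok(())
}

/// Consumer side: receives batches until the sender is dropped and returns
/// the size of each batch plus the flattened, ordered pairs.
fn collect_batches(swhids: Vec<String>, batch_size: usize) -> (Vec<usize>, Vec<(String, NodeId)>) {
    // Bounded channel: the producer blocks once 2 batches are pending
    let (tx, rx) = sync_channel(2);
    let producer = thread::spawn(move || {
        // Toy lookup: the "node id" is simply the string length
        send_in_batches(swhids, |s| Some(s.len()), tx, batch_size)
    });
    let mut sizes = Vec::new();
    let mut pairs = Vec::new();
    for batch in rx {
        sizes.push(batch.len());
        pairs.extend(batch.into_vec());
    }
    producer.join().unwrap().unwrap();
    (sizes, pairs)
}
```

Because the channel is bounded, a slow consumer applies backpressure: the producer blocks on send instead of buffering the whole CSV file in memory.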

This is equivalent to:

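use anyhow::{Context, Result};

std::thread::spawn(move || -> Result<()> {
    let mut reader = csv::ReaderBuilder::new()
        .has_headers(true)
        .from_reader(reader);

    for record in reader.deserialize() {
        // InputRecord stands for a serde-deserializable struct whose swhid
        // field is read from the column_name column
        let InputRecord { swhid, .. } =
            record.context("Could not deserialize record")?;
        let node = graph
            .properties()
            .node_id_from_string_swhid(&swhid)
            .with_context(|| format!("Unknown SWHID: {}", swhid))?;

        // tx carries batches; this sequential version sends batches of one pair
        tx.send(vec![(swhid, node)].into_boxed_slice())
            .context("Could not send (SWHID, node id) pair")?;
    }
    Ok(())
});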
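The sequential version above reads the SWHID through a field of InputRecord and does not show how column_name selects the column. One plausible reading is that the column is located by name in the header row. The std-only sketch below illustrates that step; column_values is a hypothetical helper, and, unlike the csv crate used above, its naive comma split does not handle quoted fields or escaped commas.

```rust
use std::io::{BufRead, BufReader, Read};

/// Hypothetical helper: returns the values of the column named `column_name`
/// from CSV data with a header row. Simplification: fields are split on ','
/// with no support for quoting (the csv crate handles that).
fn column_values<R: Read>(reader: R, column_name: &str) -> Result<Vec<String>, String> {
    let mut lines = BufReader::new(reader).lines();
    let header = lines
        .next()
        .ok_or("Empty CSV: missing header row")?
        .map_err(|e| e.to_string())?;
    // Locate the requested column once, from the header
    let index = header
        .split(',')
        .position(|name| name == column_name)
        .ok_or_else(|| format!("Missing column: {}", column_name))?;

    let mut values = Vec::new();
    for (line_no, line) in lines.enumerate() {
        let line = line.map_err(|e| e.to_string())?;
        let field = line
            .split(',')
            .nth(index)
            // +2: 1-based numbering, plus the header line
            .ok_or_else(|| format!("Line {} has no column {}", line_no + 2, column_name))?;
        values.push(field.to_string());
    }
    Ok(values)
}
```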

but resolves SWHIDs to node ids in parallel internally, because node_id() could otherwise be a bottleneck on systems where reading graph.order has high latency (network and/or compressed filesystems). On the 2023-09-06 graph stored on a ZSTD-compressed ZFS, this reduces the runtime from a couple of weeks to less than a day.
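The page does not describe how the inner parallelism is implemented. The following is a minimal sketch of one way to parallelize lookups while keeping the input order, using std::thread::scope; resolve_batch_parallel, num_threads and the lookup closure are hypothetical stand-ins (assuming NodeId is a plain usize), not swh-graph's actual code. Each batch is split into contiguous chunks, each chunk is resolved on its own thread, and the results are concatenated in chunk order.

```rust
use std::thread;

/// Stand-in for swh-graph's node id type (assumption: a plain usize).
type NodeId = usize;

/// Hypothetical order-preserving parallel lookup. `lookup` stands in for the
/// (possibly high-latency) SWHID -> node id mapping.
fn resolve_batch_parallel<F>(
    batch: Vec<String>,
    lookup: &F,
    num_threads: usize,
) -> Result<Box<[(String, NodeId)]>, String>
where
    F: Fn(&str) -> Option<NodeId> + Sync,
{
    let num_threads = num_threads.max(1);
    // Ceiling division so that every element lands in some chunk
    let chunk_size = ((batch.len() + num_threads - 1) / num_threads).max(1);
    let results: Vec<Result<Vec<(String, NodeId)>, String>> = thread::scope(|s| {
        let handles: Vec<_> = batch
            .chunks(chunk_size)
            .map(|chunk| {
                // Each scoped thread resolves one contiguous chunk
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|swhid| {
                            lookup(swhid.as_str())
                                .map(|node| (swhid.clone(), node))
                                .ok_or_else(|| format!("Unknown SWHID: {}", swhid))
                        })
                        .collect::<Result<Vec<_>, String>>()
                })
            })
            .collect();
        // Joining in spawn order keeps chunks in input order; a panicking
        // worker is propagated by unwrap()
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    let mut out = Vec::with_capacity(batch.len());
    for chunk in results {
        out.extend(chunk?);
    }
    Ok(out.into_boxed_slice())
}
```

Splitting into contiguous chunks and joining the threads in spawn order restores the input order without any sorting. A real implementation might instead use a thread pool such as rayon, whose indexed parallel iterators also preserve order when collected into a Vec.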

reader is buffered internally, so callers do not need to wrap it in a BufReader.