pub fn queue_nodes_from_swhids_csv<G, R: Read + Send>(
graph: &G,
reader: R,
column_name: &str,
tx: SyncSender<Box<[(String, NodeId)]>>,
batch_size: usize,
) -> Result<()>
Reads CSV records from reader, and queues their SWHIDs and node ids to tx in batches,
preserving the order.
This is equivalent to:
std::thread::spawn(move || -> Result<()> {
    let mut reader = csv::ReaderBuilder::new()
        .has_headers(true)
        .from_reader(reader);
    for record in reader.deserialize() {
        let InputRecord { swhid, .. } =
            record.context("Could not deserialize record")?;
        // Borrow the SWHID so it can still be reported and sent below.
        let node = graph
            .properties()
            .node_id_from_string_swhid(&swhid)
            .with_context(|| format!("Unknown SWHID: {}", swhid))?;
        tx.send((swhid, node))?;
    }
    Ok(())
});
but uses inner parallelism as node_id() could otherwise be a bottleneck on systems
where accessing graph.order has high latency (network and/or compressed filesystem).
This reduces the runtime from a couple of weeks to less than a day on the 2023-09-06
graph on a ZSTD-compressed ZFS.
reader is buffered internally.
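
The inner parallelism described above can be illustrated with a minimal, standard-library-only sketch. It is not this crate's implementation. The function name queue_in_ordered_batches, the resolve closure (a hypothetical stand-in for the graph's SWHID lookup), the num_workers parameter, and the NodeId alias are all assumptions made for illustration. Each batch is split into contiguous parts that are resolved on separate threads, and the parts are joined in spawn order so the output order matches the input order.

```rust
use std::sync::mpsc::SyncSender;
use std::thread;

/// Assumed alias for illustration; the real crate defines its own NodeId.
type NodeId = usize;

/// Resolves `keys` batch by batch, running the lookups of each batch on up to
/// `num_workers` threads, and sends every resolved batch to `tx` in input order.
pub fn queue_in_ordered_batches<F>(
    keys: &[String],
    resolve: F, // hypothetical stand-in for the slow SWHID -> node id lookup
    tx: SyncSender<Box<[(String, NodeId)]>>,
    batch_size: usize,
    num_workers: usize,
) -> Result<(), String>
where
    F: Fn(&str) -> Option<NodeId> + Sync,
{
    assert!(batch_size > 0 && num_workers > 0);
    let resolve = &resolve;

    for batch in keys.chunks(batch_size) {
        // Contiguous parts, one per worker; a batch from `chunks` is never empty.
        let per_worker = (batch.len() + num_workers - 1) / num_workers;

        let resolved: Result<Vec<Vec<(String, NodeId)>>, String> = thread::scope(|s| {
            // Spawn all workers first so the lookups overlap.
            let handles: Vec<_> = batch
                .chunks(per_worker)
                .map(|part| {
                    s.spawn(move || {
                        part.iter()
                            .map(|key| {
                                resolve(key)
                                    .map(|node| (key.clone(), node))
                                    .ok_or_else(|| format!("Unknown key: {key}"))
                            })
                            .collect::<Result<Vec<_>, String>>()
                    })
                })
                .collect();

            // Joining in spawn order keeps the results in input order.
            handles
                .into_iter()
                .map(|handle| handle.join().expect("worker thread panicked"))
                .collect()
        });

        let batch_out: Vec<(String, NodeId)> = resolved?.into_iter().flatten().collect();
        // Blocks when the channel is full, which applies backpressure to the producer.
        tx.send(batch_out.into_boxed_slice())
            .map_err(|e| e.to_string())?;
    }
    Ok(())
}
```

A caller would create the channel with std::sync::mpsc::sync_channel, run this function on a producer thread, and iterate over the receiver to consume batches as they arrive. Spawning threads for every batch keeps the sketch short; a long-lived worker pool would avoid that per-batch cost.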