

Function cosine_join_with 

pub fn cosine_join_with(
    c: &Corpus,
    t: f64,
    mode: Concurrency,
) -> Vec<(usize, usize, f64)>

Runs the cosine join under the chosen Concurrency backend. Returns (j, i, cos) triples with j < i and cos ≥ t. Scores are f64 in every mode (the Gpu mode’s f32 cosines are widened to f64 losslessly).
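The return contract can be pinned down with a brute-force reference join. This is a minimal sketch, not the crate's implementation: it assumes dense `Vec<f64>` rows as a stand-in for `Corpus` (whose layout is not shown here), treats zero-norm rows as having cosine 0.0 (an assumed convention), and uses the hypothetical names `cosine` and `reference_cosine_join`. The output order shown here (by i, then j) is also an assumption; the documented function does not state its ordering.

```rust
/// Exact f64 cosine similarity of two dense rows.
/// Assumption: a zero-norm row yields 0.0 rather than NaN.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Hypothetical O(n^2) reference join over dense rows (stand-in for `Corpus`).
/// Emits (j, i, cos) with j < i and cos >= t, matching the documented contract.
fn reference_cosine_join(rows: &[Vec<f64>], t: f64) -> Vec<(usize, usize, f64)> {
    let mut out = Vec::new();
    for i in 0..rows.len() {
        for j in 0..i {
            let c = cosine(&rows[j], &rows[i]);
            if c >= t {
                out.push((j, i, c));
            }
        }
    }
    out
}
```

A reference like this is useful as a test oracle: per the description, the Cpu and GpuPlusCpu modes should agree with an exact f64 join on every pair.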

  • Concurrency::Cpu: runs cosine_join; exact f64, CPU only, available on every platform.
  • Concurrency::GpuPlusCpu: exact f64 hybrid. The CPU generates survivor pairs, the GPU’s f32 cosine filters out the clear rejects, and the CPU recomputes the exact f64 score for the pairs that pass. Output is byte-identical to Cpu, and both engines are kept fully busy. Roughly 1.7–2× faster than Cpu on bandwidth-bound real data.
  • Concurrency::Gpu: GPU-dominant f32. The CPU generates survivor pairs, the GPU scores them, and those scores are emitted directly with no f64 re-verification. This is the fastest mode (~2×). It differs from the exact answer only on pairs whose true cosine lies within ~1e-6 of t (measured: at most 1 pair in millions).
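The difference between the two GPU modes is a filter-then-verify pattern versus a filter-only pattern. The sketch below emulates both on the CPU, with an f32 cosine standing in for the GPU kernel. It is illustrative only: the names `cosine_f32`, `hybrid_join`, `f32_only_join`, and the reject margin `F32_MARGIN` are hypothetical, and the crate's actual margin (if any) and comparison rule are not documented here.

```rust
/// Hypothetical slack below t for the f32 prefilter; chosen to exceed f32 rounding error.
const F32_MARGIN: f64 = 1e-4;

/// Exact f64 cosine; 0.0 for a zero-norm row (assumed convention).
fn cosine_f64(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// f32 cosine, standing in for the GPU kernel's lower-precision score.
fn cosine_f32(a: &[f64], b: &[f64]) -> f32 {
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (x as f32, y as f32);
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na.sqrt() * nb.sqrt()) }
}

/// GpuPlusCpu-style: cheap f32 score rejects clear misses, exact f64 decides the rest.
fn hybrid_join(rows: &[Vec<f64>], candidates: &[(usize, usize)], t: f64) -> Vec<(usize, usize, f64)> {
    candidates
        .iter()
        .filter_map(|&(j, i)| {
            let approx = f64::from(cosine_f32(&rows[j], &rows[i]));
            if approx < t - F32_MARGIN {
                return None; // clear reject: f32 score is well below t
            }
            let exact = cosine_f64(&rows[j], &rows[i]);
            (exact >= t).then_some((j, i, exact)) // exact f64 verdict and score
        })
        .collect()
}

/// Gpu-style: the f32 score is compared and emitted directly, widened to f64.
fn f32_only_join(rows: &[Vec<f64>], candidates: &[(usize, usize)], t: f64) -> Vec<(usize, usize, f64)> {
    candidates
        .iter()
        .filter_map(|&(j, i)| {
            let approx = f64::from(cosine_f32(&rows[j], &rows[i])); // lossless f32 -> f64
            (approx >= t).then_some((j, i, approx))
        })
        .collect()
}
```

The margin is what lets the hybrid stay exact: only pairs the f32 score places clearly below t are dropped without an f64 check. The f32-only path skips that recheck, so a pair whose true cosine sits within f32 rounding error of t can land on the wrong side.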

When the gpu feature is off, the target isn’t macOS, or no Metal device can be acquired, the GPU modes transparently fall back to cosine_join (the same behavior as Rationer).

This convenience entry point compiles the Metal library and uploads the GPU corpus on every call. That is fine for a one-shot join, but for repeated joins on the same corpus, build a CosineJoiner once and call CosineJoiner::join. It holds the device, compiled kernel, and uploaded CSR across calls, and it avoids the driver instability of compiling a Metal library hundreds of times in a tight loop.
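The fallback rule and the "acquire once, reuse across calls" advice can be sketched together. Everything here is a hypothetical stand-in: the local `Concurrency` enum mirrors the documented variant names, `GpuHandle` represents a device plus compiled kernel, `try_acquire_gpu` is an assumed probe that always fails so the sketch runs on any platform, and `SketchJoiner` only illustrates the pattern; it is not CosineJoiner's API.

```rust
/// Local mirror of the documented backend variants (not the crate's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Concurrency {
    Cpu,
    GpuPlusCpu,
    Gpu,
}

/// Hypothetical handle standing in for a Metal device + compiled kernel + uploaded CSR.
struct GpuHandle;

/// Hypothetical probe. Returns None when not built with a `gpu` feature on macOS;
/// real device acquisition is out of scope, so it also returns None otherwise.
fn try_acquire_gpu() -> Option<GpuHandle> {
    if !cfg!(all(feature = "gpu", target_os = "macos")) {
        return None; // feature off or non-macOS target: no GPU path
    }
    None // placeholder for an actual Metal device + kernel setup
}

/// Pays the GPU setup cost once in `new`, then reuses the result on every call.
struct SketchJoiner {
    gpu: Option<GpuHandle>,
}

impl SketchJoiner {
    fn new() -> Self {
        Self { gpu: try_acquire_gpu() }
    }

    /// Backend that would actually run for a requested mode.
    fn backend_for(&self, mode: Concurrency) -> Concurrency {
        match (mode, &self.gpu) {
            // Cpu always runs on the CPU; GPU modes fall back when no device exists.
            (Concurrency::Cpu, _) | (_, None) => Concurrency::Cpu,
            (m, Some(_)) => m,
        }
    }
}
```

Keeping the handle inside a long-lived value is the design point: the expensive, driver-sensitive setup happens once per corpus instead of once per join.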