1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
"""difflib-fast — fast, byte-for-byte exact difflib Ratcliff-Obershelp similarity + clustering.
A drop-in for ``difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()``, computed with a suffix
automaton (Rust), plus exact single-linkage clustering of a corpus.
``ratio`` is overloaded: two strings → one float; a list of ``(a, b)`` pairs → a list of floats,
computed across all cores inside Rust (rayon, GIL released) — the contention-free way to score a batch.
``Rationer`` is the stateful handle: on a macOS wheel built with the ``gpu`` feature it runs
``cluster_canonicals`` on the Metal GPU (else CPU, identical output).
"""
=
...
...
"""Exact ``difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()`` — byte-for-byte.
- ``ratio(a, b)`` → one float for the pair.
- ``ratio(pairs)`` → one float per ``(a, b)`` pair, computed in parallel across all cores inside
Rust (the GIL is released), so a batch saturates every core with no ``ThreadPoolExecutor`` and no
per-call overhead. ``ratio(pairs)[i] == ratio(*pairs[i])``, in order. Pass ``threads=N`` to cap
the pool to N workers for this call (``threads=0``, the default, uses every core — itself tunable
process-wide via the ``RAYON_NUM_THREADS`` environment variable).
"""
return
return