//! Fast bitwise Hamming distance using auto-vectorization with runtime SIMD
//! detection on x86.
//!
//! # Quick Start
//!
//! ```
//! use hamming_bitwise_fast::array;
//!
//! let a: [u8; 128] = [0xFF; 128]; // 1024-bit vectors
//! let b: [u8; 128] = [0x00; 128];
//!
//! // Single comparison
//! let distance = array::distance(&a, &b); // 1024
//!
//! // One source vs many targets
//! let targets = vec![a, b];
//! let mut distances = vec![0u32; 2];
//! array::batch(&a, &targets, &mut distances);
//! ```
//!
//! # Choosing an API
//!
//! ## Fixed-size arrays vs slices
//!
//! If the vector size is known at compile time (e.g., 1024-bit embeddings are
//! `[u8; 128]`), use the [`mod@array`] module for the best performance.
//!
//! Use the [`mod@slice`] module when vector sizes are not known until runtime
//! or vary between calls.
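//!
//! The difference can be illustrated with a minimal sketch. The functions
//! below are hypothetical stand-ins written for this explanation, not this
//! crate's implementation. With a const-generic array length `N`, the loop's
//! trip count is a compile-time constant that the optimizer can typically
//! exploit (unrolling, fixed-width vectorization), while the slice version
//! must handle any length and check at runtime that both inputs match.
//!
//! ```rust
//! // Hypothetical fixed-size version: `N` is known at compile time, and the
//! // type system already guarantees both inputs have the same length.
//! fn distance_array<const N: usize>(a: &[u8; N], b: &[u8; N]) -> u32 {
//!     a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
//! }
//!
//! // Hypothetical slice version: the length is only known at runtime.
//! fn distance_slice(a: &[u8], b: &[u8]) -> u32 {
//!     // `zip` would silently truncate mismatched inputs, so reject them.
//!     assert_eq!(a.len(), b.len(), "inputs must have the same length");
//!     a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
//! }
//! ```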
//!
//! ## Single vs batch
//!
//! When comparing one source against many targets, use [`array::batch`] or
//! [`slice::batch`]; it is the fastest approach for one-to-many comparisons.
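//!
//! As a rough illustration of the one-to-many pattern (a sketch, not this
//! crate's code; `batch_sketch` is a hypothetical name), a batch routine
//! reuses a single source vector and writes one result per target into a
//! caller-provided buffer, so nothing is allocated per comparison. One
//! plausible reason batching wins, which is an assumption here, is that
//! per-call overhead such as runtime CPU-feature dispatch is paid once per
//! batch instead of once per pair.
//!
//! ```rust
//! // Hypothetical batch helper: distance from `source` to each target.
//! fn batch_sketch<const N: usize>(source: &[u8; N], targets: &[[u8; N]], out: &mut [u32]) {
//!     assert_eq!(targets.len(), out.len(), "need one output slot per target");
//!     for (target, slot) in targets.iter().zip(out.iter_mut()) {
//!         // XOR the two vectors byte by byte and count the differing bits.
//!         *slot = source
//!             .iter()
//!             .zip(target)
//!             .map(|(x, y)| (x ^ y).count_ones())
//!             .sum();
//!     }
//! }
//! ```
//!
//! With the Quick Start data, `batch_sketch(&a, &targets, &mut distances)`
//! fills `distances` with `[0, 1024]`.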
//!
//! # Platform Behavior
//!
//! | Platform | Configuration | Behavior |
//! |----------|---------------|----------|
//! | x86/x86_64 | Default | Runtime CPU detection via [`multiversion`](https://crates.io/crates/multiversion) (AVX-512/AVX2/SSE4.2) |
//! | x86/x86_64 | `default-features = false` | Baseline SSE2 only (slow) |
//! | ARM (AArch64) | Default | NEON is baseline; already optimized |
//!
//! On x86, the default build detects and uses the best available SIMD
//! instructions at runtime, so adding the crate with default features is enough:
//! ```sh
//! cargo add hamming-bitwise-fast
//! ```
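//!
//! The idea behind runtime dispatch can be sketched by hand with only the
//! standard library. This is not the code `multiversion` generates; it is a
//! simplified, hypothetical equivalent (all function names are made up) that
//! compiles one copy of the loop with AVX2 and POPCNT enabled and selects it
//! only after `is_x86_feature_detected!` confirms the CPU supports them.
//!
//! ```rust
//! // Portable fallback: XOR each byte pair and count the set bits.
//! fn distance_generic(a: &[u8], b: &[u8]) -> u32 {
//!     a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
//! }
//!
//! // Same loop, compiled with AVX2 and POPCNT enabled so the optimizer may
//! // use them. Calling it on a CPU without those features is undefined
//! // behavior, hence `unsafe`.
//! #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
//! #[target_feature(enable = "avx2,popcnt")]
//! unsafe fn distance_avx2(a: &[u8], b: &[u8]) -> u32 {
//!     a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
//! }
//!
//! // Checks CPU features at runtime and calls the best available version.
//! fn distance_dispatch(a: &[u8], b: &[u8]) -> u32 {
//!     assert_eq!(a.len(), b.len(), "inputs must have the same length");
//!     #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
//!     {
//!         if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("popcnt") {
//!             // SAFETY: the required CPU features were just detected.
//!             return unsafe { distance_avx2(a, b) };
//!         }
//!     }
//!     distance_generic(a, b)
//! }
//! ```
//!
//! The standard library caches the detection result, so each call costs a
//! cached lookup and a branch. `multiversion` automates generating and
//! selecting such variants for several feature sets at once.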
//!
//! For best single-call performance on x86, enable LTO so the compiler can
//! auto-vectorize across the crate boundary:
//! ```toml
//! [profile.release]
//! lto = true
//! ```
//!
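//! For maximum performance, also compile with `-C target-cpu=native` (for
//! example, `RUSTFLAGS="-C target-cpu=native" cargo build --release`). This
//! eliminates the runtime dispatch overhead, but the resulting binary may not
//! run on CPUs that lack the build machine's instruction-set extensions.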
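//!
//! To check what a given build may assume, `cfg!(target_feature = "...")` is
//! evaluated at compile time and reflects flags such as
//! `-C target-cpu=native`. The helper below is a hypothetical diagnostic, not
//! part of this crate: when it reports `avx2`, the compiler was already free
//! to use AVX2 throughout, so a runtime check for it is unnecessary in that
//! build.
//!
//! ```rust
//! // Hypothetical helper: lists SIMD features enabled at compile time.
//! fn static_simd_features() -> Vec<&'static str> {
//!     let mut found = Vec::new();
//!     // Each `cfg!` is resolved by the compiler to a constant `true`/`false`.
//!     if cfg!(target_feature = "sse2") { found.push("sse2"); }
//!     if cfg!(target_feature = "sse4.2") { found.push("sse4.2"); }
//!     if cfg!(target_feature = "avx2") { found.push("avx2"); }
//!     if cfg!(target_feature = "neon") { found.push("neon"); }
//!     found
//! }
//! ```
//!
//! On x86_64, `sse2` is always listed because it is part of the baseline;
//! `avx2` appears only when the build targets a CPU that has it.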
//!
//! On ARM (including Apple Silicon), the default build is already fast.
//!
//! # Feature Flags
//!
//! - `multiversion_x86` *(enabled by default)*: Enables runtime CPU detection
//! for optimal SIMD on x86 via the [`multiversion`](https://crates.io/crates/multiversion) crate.
//! Disable with `default-features = false` if you need zero dependencies or
//! are targeting a known CPU with `-C target-cpu=native`.
// ============================================================================
// Shared implementation functions
// ============================================================================
/// x86 distance implementation using u64 chunks for auto-vectorization.
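///
/// A minimal sketch of the u64-chunk technique, written independently of this
/// crate (the names `distance_u64_chunks` and `load_u64` are hypothetical):
/// each pair of 8-byte chunks is XORed as one `u64` and its set bits counted,
/// and any 0 to 7 trailing bytes are handled one at a time. Byte order does
/// not matter, because both inputs are loaded the same way and a popcount
/// ignores bit positions.
///
/// ```rust
/// // Loads exactly 8 bytes as a native-endian u64 (panics on other lengths).
/// fn load_u64(bytes: &[u8]) -> u64 {
///     let mut buf = [0u8; 8];
///     buf.copy_from_slice(bytes);
///     u64::from_ne_bytes(buf)
/// }
///
/// fn distance_u64_chunks(a: &[u8], b: &[u8]) -> u32 {
///     assert_eq!(a.len(), b.len(), "inputs must have the same length");
///     let a_chunks = a.chunks_exact(8);
///     let b_chunks = b.chunks_exact(8);
///     // Trailing bytes that do not fill a whole u64.
///     let tail: u32 = a_chunks
///         .remainder()
///         .iter()
///         .zip(b_chunks.remainder())
///         .map(|(x, y)| (x ^ y).count_ones())
///         .sum();
///     // Full 8-byte chunks: one XOR and one popcount per u64.
///     let body: u32 = a_chunks
///         .zip(b_chunks)
///         .map(|(x, y)| (load_u64(x) ^ load_u64(y)).count_ones())
///         .sum();
///     body + tail
/// }
/// ```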
pub
/// Non-x86 distance implementation using simple byte iteration.
pub
/// Convenience alias for [`slice::distance`] that matches the crate name.
///
/// For fixed-size arrays, consider [`array::distance`], or [`array::batch`]
/// when comparing one source against many targets.