//! SIMD-accelerated DSV semi-indexing.
//!
//! This module provides vectorized implementations of DSV (CSV/TSV) parsing
//! that process multiple bytes at once using SIMD instructions.
//!
//! The algorithm is based on the hw-dsv approach:
//! - Use SIMD to find all quotes, delimiters, and newlines in parallel
//! - Use arithmetic carry propagation to mask out characters inside quotes
//! - The trick: quote positions form a mask in which odd-numbered quotes "open" a quoted region and even-numbered quotes "close" it
//!
//! ## Algorithm
//!
//! For each 64-byte chunk:
//! 1. Find all quote, delimiter, and newline positions using SIMD comparisons
//! 2. Compute the "in-quote" mask using prefix XOR (or BMI2 PDEP / SVE2 BDEP on supported CPUs)
//! 3. Mask out delimiters and newlines that are inside quotes
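/*!

The three steps above can be sketched in scalar form. This is an illustration under stated assumptions, not this module's implementation: the function names and signatures are hypothetical, the byte-by-byte `byte_mask` stands in for the SIMD comparisons, and the quote character is assumed to be `"`.

```rust
/// Prefix XOR: bit `i` of the result is the XOR of bits `0..=i` of `x`.
/// Applied to a quote mask, it sets every bit from an opening quote up to
/// (but not including) the matching closing quote.
fn prefix_xor(mut x: u64) -> u64 {
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    x
}

/// Scalar stand-in for a SIMD compare: bit `i` is set when `chunk[i] == byte`.
/// Only the first 64 bytes of `chunk` are considered.
fn byte_mask(chunk: &[u8], byte: u8) -> u64 {
    chunk
        .iter()
        .take(64)
        .enumerate()
        .fold(0u64, |mask, (i, &b)| mask | (((b == byte) as u64) << i))
}

/// Classify one chunk of up to 64 bytes (hypothetical helper).
///
/// `in_quote` says whether the previous chunk ended inside a quoted field.
/// Returns the unquoted delimiter mask, the unquoted newline mask, and the
/// in-quote state to carry into the next chunk.
fn classify_chunk(chunk: &[u8], delimiter: u8, in_quote: bool) -> (u64, u64, bool) {
    // Step 1: locate quotes, delimiters, and newlines.
    let quotes = byte_mask(chunk, b'"');
    let delims = byte_mask(chunk, delimiter);
    let newlines = byte_mask(chunk, b'\n');

    // Step 2: compute the in-quote mask, inverted if we started inside quotes.
    let mut inside = prefix_xor(quotes);
    if in_quote {
        inside = !inside;
    }

    // The top bit reflects the quote parity after the last byte of the chunk.
    let carry = (inside >> 63) & 1 == 1;

    // Step 3: drop delimiters and newlines that fall inside quotes.
    (delims & !inside, newlines & !inside, carry)
}
```

A caller would pass the returned carry flag into the next chunk so that quoted fields spanning a chunk boundary stay masked.
*/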
//!
//! ## x86_64 Instruction Sets
//!
//! - **BMI2 + AVX2** (fastest): Uses PDEP for quote masking, ~10x faster than prefix_xor
//! - **AVX2** (fast): 32 bytes/iteration with prefix_xor, ~95% availability (2013+)
//! - **SSE2** (baseline): 16 bytes/iteration, universal availability
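/*!

The preference order above could be selected at runtime along the following lines. This is a minimal sketch that assumes `std` is available; `SimdTier` and `detect_tier` are hypothetical names that do not appear in this module, and only the `is_x86_feature_detected!` macro comes from the standard library.

```rust
/// Hypothetical label for the implementation tier chosen at runtime.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimdTier {
    Bmi2Avx2,
    Avx2,
    Sse2,
    Scalar,
}

/// Pick the fastest tier the running CPU supports (x86_64 only).
#[cfg(target_arch = "x86_64")]
fn detect_tier() -> SimdTier {
    let avx2 = std::is_x86_feature_detected!("avx2");
    let bmi2 = std::is_x86_feature_detected!("bmi2");
    if avx2 && bmi2 {
        SimdTier::Bmi2Avx2
    } else if avx2 {
        SimdTier::Avx2
    } else {
        // SSE2 is part of the x86_64 baseline, so it is always present.
        SimdTier::Sse2
    }
}

/// On other architectures this sketch simply reports a scalar fallback.
#[cfg(not(target_arch = "x86_64"))]
fn detect_tier() -> SimdTier {
    SimdTier::Scalar
}
```

Detecting once and caching the result, rather than probing on every call, is a common design choice for this kind of dispatch.
*/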
//!
//! ## ARM aarch64
//!
//! - **SVE2-BITPERM + NEON** (fastest): Uses BDEP for quote masking, ~10x faster than prefix_xor
//!   - Supported: Azure Cobalt 100, AWS Graviton 4, Neoverse N2/V2
//! - **NEON** (baseline): 16 bytes/iteration with prefix_xor, universal on aarch64
use std::arch::is_aarch64_feature_detected;
// ============================================================================
// ARM exports with runtime dispatch (SVE2 > NEON)
// ============================================================================
/// Build a DSV index using the fastest available SIMD implementation.
///
/// Runtime dispatch order (fastest to slowest):
/// 1. SVE2-BITPERM + NEON: Uses BDEP for quote masking (~10x faster than prefix_xor)
/// 2. NEON: Uses prefix_xor for quote masking (fallback)
// Without the `std` feature, runtime detection is unavailable, so default to NEON
pub use build_index_simd;
// ============================================================================
// x86_64 exports with runtime dispatch
// ============================================================================
/// Build a DSV index using the fastest available SIMD implementation.
///
/// Runtime dispatch order (fastest to slowest):
/// 1. BMI2 + AVX2: Uses PDEP for quote masking (~10x faster than prefix_xor)
/// 2. AVX2: Uses prefix_xor for quote masking
/// 3. SSE2: Fallback for older CPUs
// Without the `std` feature, runtime detection is unavailable, so default to SSE2
pub use build_index_simd;
// ============================================================================
// Fallback for other platforms
// ============================================================================