//! Sharded trie storage for Google Books n-gram import.
//!
//! This module provides a sharded storage architecture that distributes n-grams
//! across multiple trie instances based on prefix routing. This eliminates the
//! single-writer bottleneck of a centralized trie, enabling true parallel writes.
//!
//! # Architecture
//!
//! ```text
//! ┌─────────────────────────────────────────┐
//! │            ShardCoordinator             │
//! │   (orchestrates shards, checkpoints)    │
//! └───────────┬─────────────────────────────┘
//!             │
//!     ┌───────┼───────┬───────┐
//!     │       │       │       │
//! ┌───┴───┐ ┌─┴─┐   ┌─┴─┐   ┌─┴─┐
//! │Shard a│ │...│   │th │   │zz │  ← Each shard written lock-free
//! └───────┘ └───┘   └───┘   └───┘
//! ```
//!
//! # Sharding Strategy
//!
//! N-grams are routed to shards based on the first character(s) of the first word:
//!
//! - **1-grams**: 26 shards (a-z)
//! - **2-5 grams**: 676 shards (aa-zz)
//!
//! This matches Google Books file partitioning, enabling lock-free parallel writes
//! where each worker writes to its own shard without coordination.
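//!
//! The routing rule above can be sketched as follows. This is a minimal
//! illustration, not this module's actual implementation: `shard_key` is a
//! hypothetical helper, and both the `|` word separator and the `_` fallback
//! for characters outside a-z are assumptions.
//!
//! ```rust
//! /// Hypothetical helper: derive a shard key from the first word of an n-gram.
//! /// `prefix_len` is 1 for 1-grams (a-z) and 2 for 2-5 grams (aa-zz).
//! fn shard_key(ngram: &str, prefix_len: usize) -> String {
//!     // Assumption: words are `|`-separated, as in "the|quick|brown".
//!     let first_word = ngram.split('|').next().unwrap_or("");
//!     let mut key: String = first_word
//!         .chars()
//!         .map(|c| c.to_ascii_lowercase())
//!         // Assumption: anything outside a-z goes to a `_` catch-all.
//!         .map(|c| if c.is_ascii_lowercase() { c } else { '_' })
//!         .take(prefix_len)
//!         .collect();
//!     // Assumption: words shorter than the prefix are padded with `_`.
//!     while key.chars().count() < prefix_len {
//!         key.push('_');
//!     }
//!     key
//! }
//! ```
//!
//! Under these assumptions, `shard_key("the|quick|brown", 2)` yields `"th"`,
//! so every n-gram whose first word starts with "th" lands in the same shard.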
//!
//! # Example
//!
//! ```ignore
//! use libgrammstein::sources::google_books::sharding::{
//!     MergeCoordinator, ShardConfig, ShardCoordinator, ShardGranularity,
//! };
//!
//! // Create coordinator with adaptive sharding
//! let config = ShardConfig::new("/tmp/shards")
//!     .with_granularity(ShardGranularity::Adaptive)
//!     .with_max_writers(8);
//!
//! let coordinator = ShardCoordinator::create(config)?;
//!
//! // Workers write to different shards in parallel. The lock-free overlay
//! // lets concurrent `store_ngram` calls proceed with no writer token or lock.
//! coordinator.store_ngram("the|quick|brown", 100)?;
//!
//! // After import, merge all shards into a single in-memory n-gram map.
//! let merged = MergeCoordinator::new(&coordinator).merge_to_memory()?;
//! ```
//!
//! # Checkpoint & Recovery
//!
//! Each shard maintains its own WAL (Write-Ahead Log) for crash recovery.
//! A global checkpoint coordinates per-shard checkpoints for consistent recovery.
//!
//! # Merge Strategy
//!
//! After import completes, shards are merged using parallel reduction:
//!
//! 1. **Pairwise merge**: Merge adjacent shards in parallel
//! 2. **Reduce**: Repeat until a single shard remains
//! 3. **Export**: Materialize as a byte-keyed trie (`merge_to_trie`) or in-memory map (`merge_to_memory`)
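//!
//! The pairwise reduction in steps 1 and 2 can be sketched as follows. This is
//! an illustration under stated assumptions, not the `MergeCoordinator`
//! implementation: each shard is modelled as a plain `HashMap`, counts for the
//! same n-gram are assumed to be summed, and `Counts`, `merge_pair`, and
//! `reduce_shards` are hypothetical names.
//!
//! ```rust
//! use std::collections::HashMap;
//! use std::thread;
//!
//! /// Stand-in for one shard's contents: n-gram -> count.
//! type Counts = HashMap<String, u64>;
//!
//! /// Merge `right` into `left`, summing counts for shared keys.
//! fn merge_pair(mut left: Counts, right: Counts) -> Counts {
//!     for (ngram, count) in right {
//!         *left.entry(ngram).or_insert(0) += count;
//!     }
//!     left
//! }
//!
//! /// Merge adjacent shards in parallel, round by round, until one remains.
//! fn reduce_shards(mut shards: Vec<Counts>) -> Counts {
//!     while shards.len() > 1 {
//!         // Group adjacent shards; an odd trailing shard has no partner.
//!         let mut pending = shards.into_iter();
//!         let mut pairs = Vec::new();
//!         while let Some(left) = pending.next() {
//!             pairs.push((left, pending.next()));
//!         }
//!         // One scoped thread per pair; unpaired shards pass through as-is.
//!         shards = thread::scope(|scope| {
//!             let handles: Vec<_> = pairs
//!                 .into_iter()
//!                 .map(|(left, right)| {
//!                     scope.spawn(move || match right {
//!                         Some(right) => merge_pair(left, right),
//!                         None => left,
//!                     })
//!                 })
//!                 .collect();
//!             handles
//!                 .into_iter()
//!                 .map(|h| h.join().expect("merge thread panicked"))
//!                 .collect::<Vec<Counts>>()
//!         });
//!     }
//!     shards.pop().unwrap_or_default()
//! }
//! ```
//!
//! Each round halves the number of shards, so 676 shards need ten rounds. A
//! real implementation would likely cap the thread count rather than spawn one
//! thread per pair.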
// Re-export commonly used types
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;