// Copyright 2025 LunaOS Contributors
// SPDX-License-Identifier: Apache-2.0
//! # Deduplication Table (DDT)
//!
//! This module implements block-level deduplication for LCPFS.
//!
//! ## Overview
//!
//! Deduplication identifies and eliminates duplicate blocks by storing
//! only one copy of each unique block. When a duplicate block is written,
//! LCPFS returns a reference to the existing block instead of allocating
//! new storage.
//!
//! ## How It Works
//!
//! 1. When a block is written, compute its SHA-256 hash
//! 2. Look up the hash in the DDT
//! 3. If found: increment reference count, return existing DVA
//! 4. If not found: allocate new block, register in DDT
//!
//! ```text
//! ┌─────────────────────────────────────────────────────────────┐
//! │                     Deduplication Flow                      │
//! ├─────────────────────────────────────────────────────────────┤
//! │                                                             │
//! │  Write Block ──► Hash (SHA-256) ──► DDT Lookup              │
//! │                                         │                   │
//! │                        ┌────────────────┴─────────────┐     │
//! │                        ▼                              ▼     │
//! │                     [Found]                      [Not Found]│
//! │                        │                              │     │
//! │                        ▼                              ▼     │
//! │               Increment RefCount            Allocate Block  │
//! │               Return Existing DVA           Register in DDT │
//! │                                             Return New DVA  │
//! └─────────────────────────────────────────────────────────────┘
//! ```
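//!
//! The four steps above can be sketched as follows. This is a minimal
//! illustration under stated assumptions, not this module's implementation:
//! `SketchDva`, `SketchEntry`, and `SketchDdt` are hypothetical stand-ins, the
//! caller is assumed to supply the 32-byte SHA-256 digest, and a bump counter
//! stands in for real block allocation.
//!
//! ```rust
//! use std::collections::BTreeMap;
//!
//! /// Hypothetical stand-in for the crate's `Dva` (vdev + offset).
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! struct SketchDva {
//!     vdev: u32,
//!     offset: u64,
//! }
//!
//! /// Hypothetical entry: block location plus reference count.
//! struct SketchEntry {
//!     dva: SketchDva,
//!     refcount: u64,
//! }
//!
//! /// Hypothetical table keyed by the 32-byte content hash.
//! struct SketchDdt {
//!     entries: BTreeMap<[u8; 32], SketchEntry>,
//!     next_offset: u64, // bump counter standing in for a real allocator
//! }
//!
//! impl SketchDdt {
//!     fn new() -> Self {
//!         SketchDdt { entries: BTreeMap::new(), next_offset: 0 }
//!     }
//!
//!     /// Steps 2-4: look up `hash`, returning the DVA and whether the block
//!     /// was a duplicate. Step 1 (hashing) is assumed done by the caller.
//!     fn write(&mut self, hash: [u8; 32], block_size: u64) -> (SketchDva, bool) {
//!         if let Some(entry) = self.entries.get_mut(&hash) {
//!             entry.refcount += 1; // found: one more reference to the same block
//!             return (entry.dva, true);
//!         }
//!         // Not found: "allocate" a block and register it with refcount 1.
//!         let dva = SketchDva { vdev: 0, offset: self.next_offset };
//!         self.next_offset += block_size;
//!         self.entries.insert(hash, SketchEntry { dva, refcount: 1 });
//!         (dva, false)
//!     }
//!
//!     /// Current reference count for `hash`, if the block is registered.
//!     fn refcount(&self, hash: &[u8; 32]) -> Option<u64> {
//!         self.entries.get(hash).map(|e| e.refcount)
//!     }
//! }
//! ```
//!
//! In this sketch, writing the same digest twice returns the same DVA and
//! leaves a reference count of 2, while a different digest gets a new DVA.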
//!
//! ## Space Savings
//!
//! Deduplication is most effective for:
//! - Backup storage (many similar files)
//! - Virtual machine images (common OS blocks)
//! - Development environments (similar codebases)
//!
//! ## Memory Usage
//!
//! The DDT requires approximately 320 bytes per unique block:
//! - 32 bytes: SHA-256 hash
//! - 12 bytes: DVA (vdev + offset)
//! - 8 bytes: reference count
//! - Overhead: BTreeMap node structure
//!
//! For a 1 TiB pool with 128 KiB blocks (8,388,608 blocks), a DDT in which
//! every block is unique needs about 2.5 GiB of RAM (8,388,608 × 320 bytes).
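//!
//! The arithmetic behind such an estimate can be sketched as below. It assumes
//! the worst case, where every block in the pool is unique, and uses the
//! 320-byte per-entry figure from above; `ddt_bytes_worst_case` is a
//! hypothetical helper, not part of this module.
//!
//! ```rust
//! /// Approximate per-entry cost quoted above (hash + DVA + refcount + overhead).
//! const BYTES_PER_ENTRY: u64 = 320;
//!
//! /// Worst-case DDT size in bytes: one entry per block in the pool.
//! fn ddt_bytes_worst_case(pool_bytes: u64, block_bytes: u64) -> u64 {
//!     (pool_bytes / block_bytes) * BYTES_PER_ENTRY
//! }
//! ```
//!
//! For a 1 TiB pool with 128 KiB blocks, `ddt_bytes_worst_case(1 << 40, 128 << 10)`
//! evaluates to 2,684,354,560 bytes (2.5 GiB). Pools that contain duplicate
//! data need proportionally fewer entries.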
use crateDva;
use BTreeMap;
use lazy_static;
use Mutex;
// ═══════════════════════════════════════════════════════════════════════════════
// DDT ENTRY
// ═══════════════════════════════════════════════════════════════════════════════
/// Entry in the deduplication table.
///
/// Each unique block has one DDT entry that tracks its location and
/// how many references point to it.
// ═══════════════════════════════════════════════════════════════════════════════
// DEDUP TABLE
// ═══════════════════════════════════════════════════════════════════════════════
/// Deduplication table - maps content hashes to block locations.
///
/// The DDT is a critical data structure for deduplication. It must be
/// persisted to disk to survive reboots.
lazy_static!