datafold 0.1.28

A personal database for data sovereignty with AI-powered ingestion and S3 support
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
# Storage Abstraction Refactoring Plan

## Executive Summary

The current storage abstraction layer provides value for basic CRUD operations but leaks backend-specific concerns (async/sync execution models, flush semantics, deadlock handling) throughout the codebase. This plan outlines a refactoring to make the abstraction truly backend-agnostic.

## Current Problems

### 1. Execution Model Leakage
**Problem**: Backend-specific execution models (async vs sync) leak into business logic.

**Evidence**:
- `is_dynamodb()` checks in 37+ locations
- `flush_sync()` method specifically for DynamoDB
- Conditional `spawn_blocking` avoidance for DynamoDB
- `run_async()` helper with deadlock workarounds

**Impact**: 
- Business logic must know about backend implementation details
- Deadlock issues when mixing sync/async contexts
- Difficult to add new backends without modifying business logic

### 2. Flush Semantics Mismatch
**Problem**: Flush means different things for different backends.

**Current State**:
- Sled: `flush()` = actual disk write, blocking operation
- DynamoDB: `flush()` = no-op (eventually consistent)
- Code must check backend type to decide flush behavior

**Impact**:
- `flush_sync()` workaround needed
- Confusion about when data is actually persisted
- Potential data loss if flush semantics misunderstood

### 3. Backend-Specific Code Paths
**Problem**: Business logic branches based on backend type.

**Examples**:
```rust
// mutation_manager.rs
if is_dynamodb {
    // Special handling
} else {
    // Different handling
}

// operation_processor.rs
if is_dynamodb {
    // Avoid spawn_blocking
} else {
    // Use spawn_blocking
}
```

**Impact**:
- Violates abstraction principle
- Makes testing harder
- Increases maintenance burden

## Proposed Solution: Async-First Abstraction

### Core Principle
**Make all storage operations async, even for sync backends.**

This eliminates the async/sync split and makes the abstraction truly backend-agnostic.

### Architecture Changes

#### 1. Unified Async Trait Interface

**Current** (Mixed):
```rust
// Some operations are async
async fn flush(&self) -> StorageResult<()>;

// But we need sync versions
fn flush_sync(&self) -> Result<(), SchemaError>;
```

**Proposed** (Unified):
```rust
#[async_trait]
pub trait KvStore: Send + Sync {
    // All operations are async
    async fn flush(&self) -> StorageResult<()>;
    
    // Backend capabilities exposed via metadata
    fn execution_model(&self) -> ExecutionModel;
    fn flush_behavior(&self) -> FlushBehavior;
}
```

#### 2. Execution Model Metadata

```rust
/// Describes how the backend executes operations
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionModel {
    /// Backend is truly async (network I/O)
    Async,
    /// Backend is sync but wrapped in async (local I/O)
    SyncWrapped,
}

/// Describes flush behavior
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FlushBehavior {
    /// Flush is a no-op (eventually consistent backend)
    NoOp,
    /// Flush performs actual persistence (strongly consistent)
    Persists,
}
```

#### 3. Backend Implementations

**Sled Backend** (Sync → Async Wrapper):
```rust
impl KvStore for SledKvStore {
    async fn flush(&self) -> StorageResult<()> {
        // Wrap sync operation in spawn_blocking
        tokio::task::spawn_blocking({
            let tree = self.tree.clone();
            move || tree.flush()
        })
        .await
        .map_err(|e| StorageError::Internal(e.to_string()))?
        .map_err(|e| StorageError::Io(e))
    }
    
    fn execution_model(&self) -> ExecutionModel {
        ExecutionModel::SyncWrapped
    }
    
    fn flush_behavior(&self) -> FlushBehavior {
        FlushBehavior::Persists
    }
}
```

**DynamoDB Backend** (Native Async):
```rust
impl KvStore for DynamoDbKvStore {
    async fn flush(&self) -> StorageResult<()> {
        // DynamoDB is eventually consistent, flush is no-op
        Ok(())
    }
    
    fn execution_model(&self) -> ExecutionModel {
        ExecutionModel::Async
    }
    
    fn flush_behavior(&self) -> FlushBehavior {
        FlushBehavior::NoOp
    }
}
```

#### 4. Remove Backend-Specific Checks

**Before**:
```rust
if is_dynamodb {
    // Special path
} else {
    // Normal path
}
```

**After**:
```rust
// All paths use async, no branching needed
db_ops.flush().await?;
```

#### 5. Mutation Path Refactoring

**Current Problem**: Mutations use `spawn_blocking` which deadlocks with DynamoDB.

**Solution**: Make mutation operations fully async.

**Before**:
```rust
// operation_processor.rs
let mut ids = if is_dynamodb {
    // Direct call (no spawn_blocking)
    db_guard.mutation_manager.write_mutations_batch(vec![mutation])?
} else {
    // Use spawn_blocking
    tokio::task::spawn_blocking(move || {
        node.mutate_batch(vec![mutation])
    }).await?
};
```

**After**:
```rust
// operation_processor.rs
let mut ids = db_guard.mutation_manager
    .write_mutations_batch_async(vec![mutation])
    .await?;
```

**Mutation Manager Changes**:
```rust
impl MutationManager {
    // New async version
    pub async fn write_mutations_batch_async(
        &mut self,
        mutations: Vec<Mutation>
    ) -> Result<Vec<String>, SchemaError> {
        // All storage operations are async
        // No run_async() needed, no spawn_blocking
        // Direct async/await throughout
    }
    
    // Keep sync version for backward compatibility (deprecated)
    #[deprecated]
    pub fn write_mutations_batch(
        &mut self,
        mutations: Vec<Mutation>
    ) -> Result<Vec<String>, SchemaError> {
        // Wrapper that calls async version
        tokio::runtime::Handle::current()
            .block_on(self.write_mutations_batch_async(mutations))
    }
}
```

## Implementation Plan

### Phase 1: Add Execution Model Metadata (Week 1)

**Goal**: Expose backend capabilities without breaking changes.

**Tasks**:
1. Add `ExecutionModel` and `FlushBehavior` enums
2. Add methods to `KvStore` trait:
   - `fn execution_model(&self) -> ExecutionModel`
   - `fn flush_behavior(&self) -> FlushBehavior`
3. Implement for all backends (Sled, DynamoDB, InMemory)
4. Add tests to verify metadata is correct

**Files to Modify**:
- `src/storage/traits.rs`
- `src/storage/sled_backend.rs`
- `src/storage/dynamodb_backend.rs`
- `src/storage/inmemory_backend.rs`

**Success Criteria**:
- All backends expose correct metadata
- No breaking changes to existing code
- Tests pass

### Phase 2: Make Sled Backend Fully Async (Week 2)

**Goal**: Wrap all Sled operations in async, eliminating sync/async split.

**Tasks**:
1. Update `SledKvStore` to wrap all sync operations in `spawn_blocking`
2. Ensure all trait methods are async
3. Update error handling for async context
4. Add performance benchmarks to ensure no regression

**Files to Modify**:
- `src/storage/sled_backend.rs`
- `src/storage/tests.rs` (add async tests)

**Success Criteria**:
- All Sled operations are async
- No performance regression (< 5% overhead)
- All tests pass

### Phase 3: Create Async Mutation Manager (Week 3)

**Goal**: Make mutation operations fully async, eliminating deadlock issues.

**Tasks**:
1. Create `write_mutations_batch_async()` method
2. Refactor mutation logic to use direct async/await
3. Remove `run_async()` helper (no longer needed)
4. Update mutation paths to use async version
5. Keep sync version as deprecated wrapper

**Files to Modify**:
- `src/fold_db_core/mutation_manager.rs`
- `src/datafold_node/operation_processor.rs`
- `src/datafold_node/db.rs` (add async mutate methods)

**Success Criteria**:
- Mutations work without deadlocks
- No `is_dynamodb()` checks in mutation path
- All mutation tests pass

### Phase 4: Remove Backend-Specific Checks (Week 4)

**Goal**: Eliminate all `is_dynamodb()` and backend name checks.

**Tasks**:
1. Remove `is_dynamodb()` method from `DbOperationsV2`
2. Remove `flush_sync()` method (use async `flush()` everywhere)
3. Update all call sites to use async operations
4. Remove conditional execution paths
5. Update error messages to be backend-agnostic

**Files to Modify**:
- `src/db_operations/core_refactored.rs`
- `src/fold_db_core/mutation_manager.rs`
- `src/datafold_node/operation_processor.rs`
- `src/db_operations/schema_operations_v2.rs`
- All files with `is_dynamodb()` checks

**Success Criteria**:
- Zero `is_dynamodb()` checks in business logic
- Zero `backend_name == "dynamodb"` checks
- All tests pass
- Code is cleaner and more maintainable

### Phase 5: Documentation and Cleanup (Week 5)

**Goal**: Document the new architecture and remove deprecated code.

**Tasks**:
1. Update architecture documentation
2. Add migration guide for developers
3. Remove deprecated sync methods (after migration period)
4. Update examples and tutorials
5. Add performance benchmarks comparing old vs new

**Files to Modify**:
- `docs/STORAGE_ABSTRACTION_DESIGN.md`
- `docs/STORAGE_ABSTRACTION_IMPLEMENTATION.md`
- `README.md`
- Examples and tests

**Success Criteria**:
- Documentation is complete and accurate
- All deprecated code removed
- Examples work with new API

## Migration Strategy

### Backward Compatibility

**Approach**: Gradual migration with deprecation warnings.

1. **Phase 1-2**: Add new async methods alongside existing sync methods
2. **Phase 3**: Update call sites to use async methods
3. **Phase 4**: Mark sync methods as deprecated
4. **Phase 5**: Remove sync methods after migration period (e.g., 2 releases)

### Testing Strategy

1. **Unit Tests**: Test each backend independently
2. **Integration Tests**: Test with all backends (Sled, DynamoDB, InMemory)
3. **Performance Tests**: Ensure no regression with async wrappers
4. **Deadlock Tests**: Verify mutations don't deadlock with DynamoDB
5. **Compatibility Tests**: Ensure deprecated methods still work

## Benefits

### 1. True Abstraction
- Business logic doesn't need to know about backend implementation
- Easy to add new backends (PostgreSQL, Redis, etc.)
- Backend-specific code isolated to storage layer

### 2. Eliminates Deadlocks
- No more `spawn_blocking` + `block_on` deadlocks
- Consistent async execution model
- Easier to reason about concurrency

### 3. Better Performance
- No unnecessary thread pool overhead for async backends
- Consistent execution model allows better optimization
- Can use async batching more effectively

### 4. Improved Maintainability
- Less code (no conditional paths)
- Easier to test (consistent async model)
- Clearer error messages (backend-agnostic)

### 5. Future-Proof
- Easy to add new async backends
- Can leverage async ecosystem (tokio, async-std)
- Better support for distributed systems

## Risks and Mitigations

### Risk 1: Performance Regression for Sled
**Mitigation**: 
- Benchmark before/after
- Use `spawn_blocking` efficiently (batch operations)
- Consider thread-local caching for hot paths

### Risk 2: Breaking Changes
**Mitigation**:
- Gradual migration with deprecation
- Keep sync methods during transition
- Comprehensive test coverage

### Risk 3: Increased Complexity
**Mitigation**:
- Clear documentation
- Code examples
- Incremental rollout

## Success Metrics

1. **Code Quality**:
   - Zero `is_dynamodb()` checks in business logic
   - Zero `backend_name` checks in business logic
   - Reduced cyclomatic complexity

2. **Reliability**:
   - Zero deadlock issues
   - All tests pass
   - No performance regression

3. **Maintainability**:
   - Reduced lines of code
   - Clearer error messages
   - Better documentation

## Timeline

- **Week 1**: Execution model metadata
- **Week 2**: Sled async wrapper
- **Week 3**: Async mutation manager
- **Week 4**: Remove backend checks
- **Week 5**: Documentation and cleanup

**Total**: 5 weeks for complete refactoring

## Next Steps

1. Review and approve this plan
2. Create GitHub issues for each phase
3. Set up performance benchmarking infrastructure
4. Begin Phase 1 implementation