minigraf 1.0.0

Zero-config, single-file, embedded graph database with bi-temporal Datalog queries
# Minigraf Design Philosophy

> "Minigraf is not trying to replace Neo4j. It's trying to replace `serde_json` for graph data."

Minigraf aims to be **the embedded graph memory layer for AI agents, mobile apps, and the browser** — built on the SQLite philosophy: small, fast, reliable, zero-configuration, single-file.

## Why Datalog?

Minigraf uses Datalog as its query language. Here's why it's the right choice:

### 1. Better Philosophy Alignment

**Datalog is simpler** → Aligns with "do less, do it perfectly":
- Datalog spec: ~50 pages of core concepts
- Smaller surface area = fewer bugs, faster to production

**Datalog is proven** → 40+ years of research and use, with production systems such as Datomic (since 2012), XTDB, and LogicBlox

**Datalog is reliable** → Well-understood semantics, extensive research

### 2. Natural Fit for Temporal Databases

**Bi-temporal support was always the plan.** Datalog makes it natural:
- Facts are tuples: `(Entity, Attribute, Value, ValidFrom, ValidTo, TxTime)`
- Time is just another dimension in relations
- Temporal queries use simple predicates: `[(<= ?valid-from ?query-time)]`
- No special temporal syntax needed - it's just data
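
To make the "time is just another dimension" point concrete, here is a minimal Rust sketch of a bi-temporal visibility check over the tuple shape above. It is not Minigraf's implementation: the `Fact` struct, the integer timestamps, the half-open `[valid_from, valid_to)` interval, and the `visible_at` and `sample_facts` names are all assumptions made for illustration.

```rust
/// One bi-temporal fact, mirroring the tuple shape described above.
/// Field types are assumptions for this sketch; timestamps are plain integers.
#[derive(Debug, Clone, PartialEq)]
struct Fact {
    entity: u64,
    attribute: &'static str,
    value: &'static str,
    valid_from: i64, // inclusive start of real-world validity
    valid_to: i64,   // exclusive end of real-world validity
    tx_time: i64,    // when the fact was recorded
}

/// Returns the facts already recorded at `as_of` (transaction time)
/// that were valid in the real world at `valid_at` (valid time).
fn visible_at(facts: &[Fact], as_of: i64, valid_at: i64) -> Vec<&Fact> {
    facts
        .iter()
        .filter(|f| f.tx_time <= as_of)
        .filter(|f| f.valid_from <= valid_at && valid_at < f.valid_to)
        .collect()
}

/// Hypothetical sample data: entity 1 lived in Paris, then moved to Berlin.
fn sample_facts() -> Vec<Fact> {
    vec![
        Fact { entity: 1, attribute: ":city", value: "Paris",
               valid_from: 0, valid_to: 100, tx_time: 1 },
        Fact { entity: 1, attribute: ":city", value: "Berlin",
               valid_from: 100, valid_to: i64::MAX, tx_time: 2 },
    ]
}
```

Both filters are ordinary comparisons on ordinary fields, which is the sense in which no special temporal syntax is needed.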

Bi-temporal support is roughly 3-4 months of work that follows proven patterns (the Datomic/XTDB model).

### 3. Graph Traversal is MORE Powerful

**Recursive rules are first-class in Datalog:**
```datalog
[(reachable ?from ?to)
 [?from :connected ?to]]

[(reachable ?from ?to)
 [?from :connected ?intermediate]
 (reachable ?intermediate ?to)]
```

Transitive closure is native, not bolted on.
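
As an illustration of how an engine might evaluate the two `reachable` rules, the following Rust sketch computes the same transitive closure with semi-naive evaluation. It is a standalone sketch, not Minigraf's evaluator: the `reachable` function and the `(u32, u32)` edge representation are assumptions made for the example.

```rust
use std::collections::HashSet;

/// Computes `reachable` from `:connected` edges using semi-naive evaluation:
/// each round joins only the pairs derived in the previous round (the delta)
/// against the base edges, instead of re-joining everything.
fn reachable(edges: &[(u32, u32)]) -> HashSet<(u32, u32)> {
    // Base rule: every edge is a reachable pair.
    let mut all: HashSet<(u32, u32)> = edges.iter().copied().collect();
    let mut delta: Vec<(u32, u32)> = all.iter().copied().collect();

    while !delta.is_empty() {
        let mut next = Vec::new();
        // Recursive rule: [?from :connected ?mid] joined with (reachable ?mid ?to).
        for &(from, mid) in edges {
            for &(start, to) in &delta {
                // `insert` returns true only for pairs not seen before,
                // so cycles cannot cause an infinite loop.
                if start == mid && all.insert((from, to)) {
                    next.push((from, to));
                }
            }
        }
        delta = next;
    }
    all
}
```

The loop stops once a round derives no new pairs, which is the fixpoint the recursive rule describes.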

### 4. Faster Path to Production

**Datalog roadmap**: 12-15 months to production (proven implementation patterns)

We can ship a useful, reliable database faster with Datalog.

### 5. Unique Market Position

**Datalog space**: Gap exists for single-file embedded bi-temporal DB

Minigraf = embedded graph memory for agents/mobile/browser + SQLite's simplicity + Datomic's temporal model (no one else offers this combination)

---

## Core Inspiration: SQLite

SQLite's success comes from a clear philosophy: be a library, not a server. Be small, not feature-complete. Be reliable, not cutting-edge. Minigraf adopts these same principles for graph databases.

## Guiding Principles

### 1. Zero-Configuration

**Philosophy**: It should just work, immediately, with no setup.

**Implementation**:
- No installation process beyond adding a dependency
- No server process to start or manage
- No configuration files to edit
- No connection strings or authentication for local use
- `Minigraf::open("data.graph")` and you're done

**Anti-pattern**: Requiring users to install external dependencies, start services, or edit config files.
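
A sketch of what "open and you're done" can mean at the file level, using only the Rust standard library. This is not the body of `Minigraf::open`: the `open_or_create` function and the `MAGIC` bytes are hypothetical and only show that a path can be the sole input.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical magic bytes for this sketch; not the real Minigraf header.
const MAGIC: &[u8; 8] = b"SKETCHDB";

/// Opens the database file at `path`, creating and initializing it if it
/// does not exist yet. The caller supplies nothing except the path.
fn open_or_create(path: &Path) -> io::Result<File> {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(false) // never wipe an existing database
        .open(path)?;
    // A brand-new file is empty, so stamp it with the header exactly once.
    if file.metadata()?.len() == 0 {
        file.write_all(MAGIC)?;
        file.sync_all()?;
    }
    Ok(file)
}
```

Calling it twice on the same path is safe: the second call finds a non-empty file and leaves it untouched.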

### 2. Embedded-First Design

**Philosophy**: Minigraf is a library you link against, not a server you connect to.

**Implementation**:
- In-process execution - direct function calls, no network overhead
- Runs in the same address space as your application
- No client-server architecture for embedded use
- Network protocols are opt-in extensions, not core features

**Anti-pattern**: Designing for client-server first and retrofitting embedded mode.

**Target statement**: "The graph database you compile into your app, not connect to."

### 3. Single-File Database

**Philosophy**: All data in one portable file that's easy to manage.

**Implementation**:
- Single `.graph` file contains nodes, edges, properties, indexes, schema
- Easy to backup: copy one file
- Easy to share: email, USB drive, version control (for small DBs)
- Easy to delete: remove one file
- WASM: Store in browser's IndexedDB as single blob

**Anti-pattern**: Multiple files, directories, or complex file structures that are hard to manage.

### 4. Self-Contained

**Philosophy**: Minimal dependencies. Small binary size. No external requirements.

**Implementation**:
- Pure Rust implementation
- Minimal dependency tree (currently: serde, uuid, anyhow)
- No required system libraries (optional backends OK)
- Target: <1MB binary for core engine
- No runtime dependencies (no JVM, no Python, no Node.js)

**Anti-pattern**: Requiring external services, libraries, or runtimes to function.

### 5. Cross-Platform Portability

**Philosophy**: Run anywhere, from embedded devices to browsers to servers.

**Implementation**:
- Native: Linux, macOS, Windows, BSD, mobile
- WebAssembly: Run in any modern browser
- File format is endian-agnostic and cross-platform
- No platform-specific features in core (OS-specific optimizations OK)

**Target platforms**:
- Desktop: Windows, macOS, Linux
- Mobile: iOS, Android (via FFI/JNI)
- Web: WASM in browsers
- Embedded: Raspberry Pi, IoT devices
- Server: As a library in server applications

**Anti-pattern**: Platform-specific code in the core engine.

### 6. Reliability Over Features

**Philosophy**: It's better to do less and do it perfectly than to do more and do it poorly.

**Implementation**:
- ACID transactions (Atomicity, Consistency, Isolation, Durability)
- Write-ahead logging (WAL) for crash recovery
- Data integrity checks on every operation
- Rigorous testing (aim for 100% branch coverage)
- Conservative feature addition
- No data loss, ever

**Quality bar**:
- Every feature must be fully tested
- Every feature must handle edge cases
- Every feature must be crash-safe
- Prefer proven algorithms over novel ones

**Anti-pattern**: Adding features before existing ones are bulletproof.
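
The crash-recovery idea behind write-ahead logging can be sketched in a few lines of Rust. This is a simplified in-memory model, not Minigraf's WAL format: `LogRecord`, `recover`, and `put` are hypothetical names, and a real log would live on disk with checksums and fsync.

```rust
use std::collections::{HashMap, HashSet};

/// Records in a simplified write-ahead log (names are hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum LogRecord {
    Put { tx: u64, key: String, value: String },
    Commit { tx: u64 },
}

/// Rebuilds state from the log, applying only writes whose transaction
/// reached a `Commit` record. Writes from a transaction that was cut off
/// by a crash are ignored, which is what makes commits atomic.
fn recover(log: &[LogRecord]) -> HashMap<String, String> {
    let committed: HashSet<u64> = log
        .iter()
        .filter_map(|r| match r {
            LogRecord::Commit { tx } => Some(*tx),
            LogRecord::Put { .. } => None,
        })
        .collect();

    let mut state = HashMap::new();
    for record in log {
        if let LogRecord::Put { tx, key, value } = record {
            if committed.contains(tx) {
                state.insert(key.clone(), value.clone());
            }
        }
    }
    state
}

/// Small helper to build `Put` records in examples.
fn put(tx: u64, key: &str, value: &str) -> LogRecord {
    LogRecord::Put { tx, key: key.to_string(), value: value.to_string() }
}
```

If the process dies after logging transaction 2's writes but before its `Commit` record, `recover` returns only what transaction 1 wrote.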

### 7. Stability & Backwards Compatibility

**Philosophy**: Your graph database files should work forever.

**Implementation**:
- Stable file format once v1.0 ships
- Goal: files remain readable 20+ years after they were created
- API stability: semantic versioning, no breaking changes in minor versions
- Clear migration paths when absolutely necessary
- Deprecation warnings 12+ months before removal

**Commitment**: Once v1.0 ships, file format is stable for decades.

**Anti-pattern**: Breaking changes, format churn, forced migrations.

### 8. Performance Through Simplicity

**Philosophy**: Fast because simple, not simple because fast.

**Implementation**:
- Optimize the common case (small to medium graphs, <1M nodes)
- Page-based storage with locality of reference
- Indexes for frequently queried patterns
- Memory-mapped I/O where beneficial
- Avoid premature optimization

**Target performance**:
- Sub-millisecond queries for indexed lookups
- Thousands of transactions per second on commodity hardware
- Efficient memory usage (<100MB for medium graphs)

**Anti-pattern**: Complex optimization that sacrifices reliability or adds dependencies.
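
To illustrate "indexes for frequently queried patterns", here is a sketch of two sort orders over the same facts, using `BTreeSet` from the Rust standard library. It is an in-memory toy, not Minigraf's on-disk index: the `Datom` tuple, the `Indexes` struct, and its methods are assumptions, and only two orderings (entity-first and attribute-value-first) are shown.

```rust
use std::collections::BTreeSet;

/// (entity, attribute, value, tx); concrete types are assumptions for this sketch.
type Datom = (u64, &'static str, i64, u64);

/// The same facts stored twice, each copy sorted for a different lookup.
#[derive(Default)]
struct Indexes {
    eavt: BTreeSet<(u64, &'static str, i64, u64)>, // entity-first order
    avet: BTreeSet<(&'static str, i64, u64, u64)>, // attribute+value-first order
}

impl Indexes {
    fn insert(&mut self, (e, a, v, t): Datom) {
        self.eavt.insert((e, a, v, t));
        self.avet.insert((a, v, e, t));
    }

    /// All facts about one entity: a prefix scan over the entity-first order.
    fn by_entity(&self, e: u64) -> Vec<Datom> {
        let start: Datom = (e, "", i64::MIN, 0);
        self.eavt
            .range(start..)
            .take_while(|d| d.0 == e)
            .copied()
            .collect()
    }

    /// Entities whose attribute `a` equals `v`: a prefix scan over the
    /// attribute+value-first order, with no full scan of the data.
    fn entities_with(&self, a: &'static str, v: i64) -> Vec<u64> {
        let start: (&'static str, i64, u64, u64) = (a, v, 0, 0);
        self.avet
            .range(start..)
            .take_while(|d| d.0 == a && d.1 == v)
            .map(|d| d.2)
            .collect()
    }
}
```

Each lookup seeks to the first matching key and stops at the first non-matching one, so its cost depends on the number of results rather than the size of the graph.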

### 9. Well-Documented

**Philosophy**: Documentation is as important as code.

**Implementation**:
- Every public API has rustdoc comments with examples
- Query language reference manual (like SQL reference)
- Architecture documentation for contributors
- Performance tuning guide
- Common patterns and recipes
- Migration guides between versions

**Documentation types**:
- API reference (generated from code)
- User guide (getting started, tutorials)
- Query language specification
- Internals guide (for contributors)

**Anti-pattern**: "The code is the documentation."

### 10. Long-Term Support

**Philosophy**: This is a marathon, not a sprint.

**Implementation**:
- Commitment to decades of support
- Conservative, deliberate feature additions
- No rewrites or "version 2.0" churn
- Security patches for old versions
- Focus on stability over novelty

**Inspiration**: SQLite has been maintained for 20+ years, and its developers have committed to supporting it through 2050.

**Anti-pattern**: Framework churn, major rewrites, abandoned versions.

## What Minigraf IS

✅ **An embedded graph database library**
- Link it into your application like SQLite
- Direct function calls, no network overhead
- Runs in-process with your app

✅ **A bi-temporal database**
- Track when facts were recorded (transaction time)
- Track when facts were valid in the real world (valid time)
- Time travel queries: see any point in history
- Audit trails and compliance built-in

✅ **A Datalog query engine**
- Recursive rules for graph traversal
- Logic programming paradigm
- Simpler than SQL, more powerful for graphs
- Proven semantics (40+ years of research)

✅ **A local-first storage solution**
- Perfect for desktop applications
- Ideal for mobile apps
- Great for WASM in browsers
- Suitable for embedded devices

✅ **A single-file graph store**
- One `.graph` file, easy to manage
- Portable across platforms
- Simple backup and versioning

✅ **A reliable, ACID-compliant database** (Phase 5)
- Transactions with rollback support
- Crash recovery via WAL
- Data integrity guarantees

✅ **A learning-friendly implementation**
- Readable Rust code
- Well-documented internals
- Clear architecture

## What Minigraf IS NOT

❌ **Not a distributed database**
- No clustering, no sharding, no replication
- Single-node only (by design)
- If you need distributed, use Neo4j or similar

❌ **Not a graph analytics engine**
- No built-in PageRank, community detection, etc.
- You can build these on top, or use external tools
- Focus is on storage and queries, not analytics

❌ **Not a client-server system**
- No network protocol in core
- No authentication/authorization layer
- No multi-user access control (use OS permissions)

❌ **Not enterprise-focused**
- No role-based access control (RBAC)
- No audit logging
- No high-availability features
- (These can be built on top if needed)

❌ **Not trying to be Neo4j**
- Different use case (embedded vs. server)
- Different scale (millions vs. billions of nodes)
- Different philosophy (library vs. service)

❌ **Not chasing feature parity with XTDB/Datomic**
- Simpler scope: single-file only
- No distributed features
- No vector search (separate crate if needed)
- Focus on reliability over features

## Target Use Cases

**Primary use cases** (optimize for these):

1. **Audit-heavy applications** - Finance, healthcare, legal (bi-temporal = compliance)
2. **Event sourcing** - Full history, time travel debugging
3. **Personal knowledge bases** - Obsidian, Logseq, Roam-like apps with provenance
4. **Mobile applications** - Local graph storage on phones/tablets
5. **Desktop applications** - Apps that need relationship data (IDEs, note-taking, etc.)
6. **Web applications (WASM)** - Client-side graph storage in browsers
7. **AI/RAG systems** - Knowledge graphs with temporal provenance
8. **Embedded devices** - IoT, edge computing with graph data
9. **Development/testing** - Local graph database for testing
10. **Small to medium production apps** - Where embedded DB is sufficient

**Secondary use cases** (should work, but not optimized for):

11. **Server applications** - Using Minigraf as an embedded component
12. **Data analysis** - Exploring graph datasets locally
13. **Education** - Learning Datalog and temporal databases

**Non-use cases** (explicitly out of scope):

- Large-scale distributed systems
- Multi-datacenter replication
- Billion-node graphs
- Real-time analytics at scale

## Design Decision Framework

When evaluating a feature or design choice, ask:

### 1. Does it align with "SQLite for graphs"?
- Would SQLite do this?
- Does it keep things simple and embedded?

### 2. Does it compromise reliability?
- Can it cause data loss or corruption?
- Does it make the codebase harder to test?

### 3. Does it add complexity?
- How many lines of code?
- How many new dependencies?
- Does it complicate the API?

### 4. Does it serve the primary use cases?
- Is this needed for embedded/mobile/WASM?
- Or is it only useful for enterprise/distributed?

### 5. Can it be a separate crate instead?
- Could this be an optional feature flag?
- Could this be a separate library on top of Minigraf?

### Decision rubric:
- **YES**: Aligns with philosophy, improves reliability, serves primary use cases
- **MAYBE**: Useful but adds complexity, consider making optional
- **NO**: Violates philosophy, compromises reliability, or only serves non-use cases

## Success Metrics

You'll know Minigraf has succeeded when:

1. **Ubiquity**: Developers say "just use Minigraf" for embedded graph storage
2. **Trust**: Known for never losing data, crash-safe, reliable
3. **Simplicity**: New users are productive in under 5 minutes
4. **Size**: Core binary under 1MB, minimal dependencies
5. **Portability**: Runs everywhere from Raspberry Pi to browsers
6. **Stability**: API hasn't broken in years
7. **Documentation**: Comprehensive docs with examples
8. **Longevity**: Still maintained and improved 10+ years later

## Non-Goals

To maintain focus, these are explicitly NOT goals:

- ❌ Distributed consensus algorithms
- ❌ Multi-master replication
- ❌ Built-in authentication/authorization
- ❌ Competing with Neo4j/TigerGraph on their turf
- ❌ Real-time analytics (OLAP workloads)
- ❌ Graph visualization (provide data, let others visualize)
- ❌ Built-in ML/AI (provide APIs for external tools)

## Testing Philosophy

Inspired by SQLite's legendary testing rigor:

**Test coverage goals**:
- 100% branch coverage (aspirational)
- Property-based testing (quickcheck, proptest)
- Fuzz testing (cargo-fuzz)
- Fault injection (simulate disk errors, OOM)
- Memory safety (miri, valgrind)
- Cross-platform testing (CI on Linux, macOS, Windows)

**Test-to-code ratio**: Aim for 5:1 (5x more test code than library code)

**Release criteria**:
- All tests pass on all platforms
- No memory leaks detected
- No undefined behavior (miri clean)
- Performance benchmarks within 5% of baseline
- Documentation complete for new features
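
As one example of the fault-injection item above, a test can swap the real file for a writer that fails after a fixed number of bytes. The sketch below uses only the standard `std::io::Write` trait. `FailingWriter` and its `budget` field are hypothetical names, not part of Minigraf's test suite.

```rust
use std::io::{self, Write};

/// A writer that accepts at most `budget` bytes and then reports an I/O
/// error, simulating a disk that fails partway through a write.
struct FailingWriter {
    written: Vec<u8>, // bytes that "reached the disk" before the failure
    budget: usize,    // how many more bytes may be written
}

impl Write for FailingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.budget == 0 {
            return Err(io::Error::new(io::ErrorKind::Other, "injected disk failure"));
        }
        // Accept only as much as the remaining budget allows (a short write).
        let n = buf.len().min(self.budget);
        self.written.extend_from_slice(&buf[..n]);
        self.budget -= n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Code under test that takes any `impl Write` can then be checked for how it behaves when only a prefix of a record was persisted, for example by asserting that recovery discards the torn record.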

## File Format Principles

The `.graph` file format must be:

1. **Stable** - Once v1.0 ships, format is frozen for decades
2. **Self-describing** - Header with magic number and version
3. **Portable** - Endian-agnostic, cross-platform
4. **Efficient** - Page-based, locality of reference
5. **Extensible** - Can add features without breaking old readers
6. **Verifiable** - Checksums for integrity validation
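
A few of these principles (self-describing, portable, verifiable) can be sketched as a fixed-size header. The layout below is hypothetical and is not the actual `.graph` format: the `SKGR` magic, the 16-byte size, the `page_size` field, and the choice of a 32-bit FNV-1a checksum are all assumptions made for illustration.

```rust
/// Hypothetical magic bytes and header size for this sketch.
const MAGIC: [u8; 4] = *b"SKGR";
const HEADER_LEN: usize = 16;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Header {
    version: u32,
    page_size: u32,
}

#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic,
    BadChecksum,
}

/// 32-bit FNV-1a hash, used here as a simple integrity checksum.
fn fnv1a(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

fn read_u32_le(bytes: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// Layout: magic (4) | version (4) | page size (4) | checksum of bytes 0..12 (4).
/// All integers are written little-endian, so the bytes are identical on
/// every host regardless of its native byte order.
fn encode(h: &Header) -> [u8; HEADER_LEN] {
    let mut buf = [0u8; HEADER_LEN];
    buf[0..4].copy_from_slice(&MAGIC);
    buf[4..8].copy_from_slice(&h.version.to_le_bytes());
    buf[8..12].copy_from_slice(&h.page_size.to_le_bytes());
    let sum = fnv1a(&buf[0..12]);
    buf[12..16].copy_from_slice(&sum.to_le_bytes());
    buf
}

fn decode(bytes: &[u8]) -> Result<Header, HeaderError> {
    if bytes.len() < HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    if bytes[0..4] != MAGIC {
        return Err(HeaderError::BadMagic);
    }
    if read_u32_le(bytes, 12) != fnv1a(&bytes[0..12]) {
        return Err(HeaderError::BadChecksum);
    }
    Ok(Header {
        version: read_u32_le(bytes, 4),
        page_size: read_u32_le(bytes, 8),
    })
}
```

A reader can reject a foreign or corrupted file from the first 16 bytes alone, and the explicit version field leaves room for later format revisions.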

## API Design Principles

1. **Simple common case**: `db.add_node()` should be one line
2. **Safe by default**: Require `unsafe` only where truly needed
3. **Transactions explicit**: Clear when you're in a transaction
4. **Ergonomic errors**: `Result<T, Error>` with helpful messages
5. **Builder patterns**: Complex operations use builders
6. **Zero-cost abstractions**: No runtime penalty for nice APIs
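
Principles 4 and 5 can be sketched together: a builder for a less common operation that returns a `Result` with a readable error. Everything here is hypothetical (`ConfigBuilder`, `Config`, `ConfigError`, and the `page_size` and `read_only` options are invented for illustration) and does not describe Minigraf's actual API.

```rust
use std::fmt;

/// Hypothetical error type with a message aimed at the person reading it.
#[derive(Debug, PartialEq)]
enum ConfigError {
    PageSizeNotPowerOfTwo(u32),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::PageSizeNotPowerOfTwo(n) => write!(
                f,
                "page size {} is not a power of two; try 4096 or 8192",
                n
            ),
        }
    }
}

impl std::error::Error for ConfigError {}

#[derive(Debug, PartialEq)]
struct Config {
    page_size: u32,
    read_only: bool,
}

/// Builder with sensible defaults, so the common case needs no options.
struct ConfigBuilder {
    page_size: u32,
    read_only: bool,
}

impl ConfigBuilder {
    fn new() -> Self {
        Self { page_size: 4096, read_only: false }
    }

    fn page_size(mut self, bytes: u32) -> Self {
        self.page_size = bytes;
        self
    }

    fn read_only(mut self, yes: bool) -> Self {
        self.read_only = yes;
        self
    }

    /// Validation happens once, here, instead of at every later use.
    fn build(self) -> Result<Config, ConfigError> {
        if !self.page_size.is_power_of_two() {
            return Err(ConfigError::PageSizeNotPowerOfTwo(self.page_size));
        }
        Ok(Config { page_size: self.page_size, read_only: self.read_only })
    }
}
```

The default path stays a one-liner (`ConfigBuilder::new().build()`), while unusual settings are opt-in and checked before anything touches disk.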

## Evolution Strategy

**Phase 1**: ✅ Prove the concept (COMPLETE)
- Basic graph model, simple queries, in-memory storage

**Phase 2**: ✅ Embeddability (COMPLETE)
- Single-file storage, persistent graph database, embedded API

**Phase 3**: ✅ Datalog Core (COMPLETE)
- EAV data model, basic facts and queries, recursive rules, semi-naive evaluation

**Phase 4**: ✅ Bi-temporal Support (COMPLETE - March 2026)
- Transaction time (`tx_id`, `tx_count`) + valid time (`valid_from`, `valid_to`)
- `:as-of` and `:valid-at` time travel queries, file format v2

**Phase 5**: ✅ ACID + WAL (COMPLETE)
- Write-ahead logging, transactions, crash recovery

**Phase 6**: ✅ Performance (COMPLETE - March 2026)
- Covering indexes (EAVT, AEVT, AVET, VAET), packed pages, LRU page cache, on-disk B+tree (file format v6), query optimizer

**Phase 7**: 🎯 Datalog Completeness (next — 6-8 weeks)
- Stratified negation (`not` / `not-join`), aggregation (`count`, `sum`, `min`, `max`), disjunction (`or` / `or-join`)
- ≥90% branch coverage target

**Phase 8**: 🎯 Cross-platform (3-4 months)
- WASM (browser via wasm-pack + npm; server-side via WASI)
- Mobile bindings (iOS `.xcframework`, Android `.aar` via UniFFI)
- Language bindings (Python, JavaScript, C)

**Phase 9**: 🎯 Ecosystem & Tooling (ongoing)
- Developer tools: database inspector, query profiler, time travel visualizer
- Documentation: Datalog language spec, cookbook, performance tuning guide
- Integration examples: GraphRAG pattern, LangChain/LlamaIndex agent memory, annotated end-to-end scenarios
- Ecosystem libraries: graph algorithms crate, schema validation, import/export, backup utilities
- Exploratory: database branching (`db.branch()` → independent `.graph` fork for speculative writes, agent sandboxing, test isolation)

**v1.0.0**: 9-12 months

See ROADMAP.md for detailed feature breakdown.

## When to Say "No"

It's important to say "no" to preserve the project's focus:

**Say NO to**:
- Features that only serve enterprise/distributed use cases
- Complexity that compromises reliability
- Dependencies that increase binary size significantly
- Breaking changes without overwhelming justification
- Features that should be separate libraries
- Premature optimization

**It's OK to say**: "That's a great feature, but it's better suited for a library built on top of Minigraf."

## Inspirations

Beyond SQLite, we draw inspiration from:

- **Datomic**: Immutable facts, temporal queries, Datalog
- **XTDB**: Bi-temporal database, time travel
- **Cozo**: Embedded Datalog, graph algorithms
- **Redis**: Simple, focused, well-documented
- **Git**: Single-file stores (packfiles), content-addressed storage
- **DuckDB**: Modern analytics, SQLite-style
- **Local-first software**: Offline-capable, user-owned data

## Closing Thoughts

Minigraf is a decades-long project. We optimize for:
- **Reliability** over features
- **Simplicity** over flexibility
- **Longevity** over hype
- **Users** over competitors

The goal is not to be the most feature-complete graph database. The goal is to be the one that's always there when you need it, works reliably, and never gets in your way.

Be boring. Be reliable. Be Minigraf.

---

*This document is a living guide. When in doubt, refer back to these principles.*