hd-cas 0.1.0

Content-addressable store with BLAKE3 hashing and CDC chunking for hyperdocker
# Hyperdocker

**A Rust-native, content-addressed container runtime that replaces Docker's layer-based rebuild model with an incremental Merkle DAG.**

[![Crates.io](https://img.shields.io/crates/v/hd-cli.svg)](https://crates.io/crates/hd-cli)
[![CI](https://github.com/omeedtehrani/hyperdocker/actions/workflows/ci.yml/badge.svg)](https://github.com/omeedtehrani/hyperdocker/actions)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

---

## Why Hyperdocker

Docker changed how we ship software. But its inner loop -- the edit-build-test cycle during development -- has barely improved since 2013. The fundamental problem is Docker's layer model: every `RUN`, `COPY`, and `ADD` instruction creates an opaque filesystem layer. When you change a single source file, Docker invalidates that layer and every layer after it, then re-executes all of them from scratch.

This means:

- **Changing one line of application code** can trigger a full `npm install` / `pip install` / `cargo build` whenever the dependency-install step comes after the instruction that copies the source tree.
- **Layer ordering is fragile.** Reordering Dockerfile instructions to optimize caching is a dark art. Get it wrong and you rebuild everything.
- **There is no content awareness.** Docker does not know that only `src/utils.ts` changed. It sees "the COPY context changed" and invalidates everything downstream.
- **Rebuilds are sequential.** Within a build stage, each layer waits for the previous one, so the instructions of a single stage cannot run in parallel.

Hyperdocker takes a fundamentally different approach. Instead of layers, it uses a **content-addressed store** (CAS) backed by a **Merkle DAG** that tracks every file, package, and build step as an individually hashed node. When a file changes, hyperdocker walks the DAG bottom-up, rehashing only the nodes whose inputs actually changed. Everything else is untouched.

```
  Docker: Layer-Based Rebuild             Hyperdocker: Merkle DAG Invalidation
  ===========================             ====================================

  +---------------------+                         +-------+
  | FROM ubuntu:22.04   | <-- always cached        | Env   |
  +---------------------+                         +---+---+
  | RUN apt-get install | <-- cached if above OK      |
  +---------------------+                     +-------+-------+
  | COPY package.json . | <-- INVALIDATED!     |               |
  +---------------------+   (file changed)   Pkg(node)    Dir(src/)
  | RUN npm install     | <-- RE-RUN!           |         /       \
  +---------------------+   (layer above      (cached)  main.rs  lib.rs
  | COPY . .            |    changed)                   CHANGED   (cached)
  +---------------------+                                 |
  | RUN npm run build   | <-- RE-RUN!             Only main.rs rehashed.
  +---------------------+                         Dir(src/) rehashed.
  | CMD ["node","app"]  |                         Env root rehashed.
  +---------------------+                         Everything else: untouched.
                                                   npm install: NOT re-run.
  Result: ~60s rebuild                             Result: ~200ms update
```

The key insight is that **most changes during development touch a tiny fraction of the dependency graph.** Hyperdocker exploits this by making the unit of caching a content-addressed chunk, not an ordered layer. If the content did not change, the hash did not change, and there is nothing to rebuild.

---

## Key Features

- **Content-addressed storage** -- Every file is split into content-defined chunks (FastCDC), hashed with BLAKE3, and deduplicated at the chunk level. Identical content is stored once, regardless of filename or path.

- **Merkle DAG engine** -- Files, directories, packages, and build steps are nodes in a directed acyclic graph. Each node's identity is derived from its content and its children's hashes. Changing a leaf rehashes only the path from that leaf to the root.

- **Bottom-up invalidation** -- When a file changes on disk, hyperdocker walks the DAG upward, rebuilding only the ancestor nodes whose child hashes changed. Siblings are left untouched.

- **FUSE-projected filesystem** -- The environment filesystem is projected from the DAG via FUSE. Files are materialized lazily from the CAS on read. An in-memory overlay captures writes without modifying the immutable DAG.

- **File watching with debouncing** -- Filesystem changes are detected via `notify`, filtered against include/exclude patterns, debounced to coalesce rapid edits, and then fed into the DAG invalidation engine.

- **Service management** -- Services are defined in `hd.toml` with watch patterns, dependency ordering, and restart policies. When a watched file changes, only the affected services are restarted.

- **OCI compatibility** -- OCI/Docker images can be used as base images. Layers are unpacked into the CAS. Dockerfiles can be translated to `hd.toml` via `hd ingest`.

- **Zstd compression** -- Chunks larger than 512 bytes are transparently compressed with zstd (level 3) on disk and decompressed on read.

- **Reference-counting garbage collection** -- Unreferenced manifests and chunks are cleaned up by the GC. Active environments hold references to keep their data alive.

- **Deterministic lockfile** -- Resolved dependencies are recorded in `hd.lock` with exact versions and artifact hashes, ensuring reproducible builds across machines.

- **Single static binary** -- Ships as `hd`, a single Rust binary with no runtime dependencies beyond FUSE.

---

## How It Works

### Content-Addressed Store (CAS)

Every piece of data in hyperdocker lives in a content-addressed store. Files are split into variable-size chunks using [FastCDC](https://github.com/nlfiedler/fastcdc-rs) (4 KB min, 16 KB target, 64 KB max), and each chunk is hashed with [BLAKE3](https://github.com/BLAKE3-team/BLAKE3). The chunk hash determines its storage path (`objects/<first-2-hex>/<remaining-62-hex>`). A **manifest** records the ordered list of chunk hashes, file size, and permissions for each file.
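
The sharded layout described above can be illustrated with a small helper. This is a minimal sketch, not the crate's actual code: the function name `object_path` and the choice to reject anything that is not a 64-character hex string are assumptions made for illustration.

```rust
use std::path::PathBuf;

/// Map a 64-character hex digest to `objects/<first-2-hex>/<remaining-62-hex>`
/// under `root`. Returns `None` if the input is not a 64-character hex string.
/// (Hypothetical helper; the real store may validate differently.)
fn object_path(root: &str, hex_hash: &str) -> Option<PathBuf> {
    let is_hex = hex_hash.len() == 64 && hex_hash.bytes().all(|b| b.is_ascii_hexdigit());
    if !is_hex {
        return None;
    }
    // The first two hex characters select the shard directory;
    // the remaining 62 characters name the object file.
    let (shard, rest) = hex_hash.split_at(2);
    Some(PathBuf::from(root).join("objects").join(shard).join(rest))
}
```

Sharding on the first two hex characters spreads objects over at most 256 directories, which keeps any single directory from growing very large.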

Because chunking is content-defined (not offset-based), inserting or deleting bytes in the middle of a file shifts chunk boundaries only locally. The chunks before and after the edit remain identical and are deduplicated automatically.
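
To make the idea of content-defined boundaries concrete, here is a simplified gear-hash chunker. It is a sketch of the general technique only and is not the FastCDC algorithm or the `fastcdc` crate API: the gear table is generated with a SplitMix64-style mixer, the cut mask is an assumption, and `chunk_lengths` is a hypothetical name. Only the 4 KB minimum and 64 KB maximum come from the text above.

```rust
/// Chunk-size bounds taken from the text (4 KB min, 64 KB max).
const MIN: usize = 4 * 1024;
const MAX: usize = 64 * 1024;
/// Cut when the top 14 bits of the rolling hash are zero, so a cut is
/// expected roughly every 16 KiB past the minimum (assumed mask).
const MASK: u64 = 0x3FFF_u64 << 50;

/// Build a fixed table of 256 pseudo-random values (SplitMix64-style mixing).
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for slot in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *slot = z ^ (z >> 31);
    }
    table
}

/// Split `data` into chunks and return the length of each chunk.
/// Boundaries depend on the bytes themselves, not on absolute offsets.
fn chunk_lengths(data: &[u8]) -> Vec<usize> {
    let gear = gear_table();
    let mut lengths = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let remaining = &data[start..];
        let limit = remaining.len().min(MAX);
        let mut hash: u64 = 0;
        // Fall back to a forced cut at MAX, or at the end of the input.
        let mut cut = limit;
        for (i, &byte) in remaining[..limit].iter().enumerate() {
            // Gear rolling hash: shift out old bytes, mix in the new one.
            hash = (hash << 1).wrapping_add(gear[byte as usize]);
            if i + 1 >= MIN && (hash & MASK) == 0 {
                cut = i + 1;
                break;
            }
        }
        lengths.push(cut);
        start += cut;
    }
    lengths
}
```

Because the hash restarts at each chunk and depends only on recent bytes, an edit in one place tends to move only nearby boundaries, and later boundaries usually fall back on the same content as before.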

### Merkle DAG

On top of the CAS sits a Merkle DAG with five node types:

| Node Type   | Identity Derived From                                    |
|-------------|----------------------------------------------------------|
| `File`      | Path + manifest hash                                     |
| `Dir`       | Path + sorted list of (child name, child hash) pairs     |
| `Package`   | Provider + name + version + artifact hash                |
| `BuildStep` | Command + input hashes + sorted environment variables    |
| `Env`       | Name + ordered list of child hashes (root of the DAG)    |

Each node's content hash is computed deterministically from its fields. A `Dir` node's hash incorporates all its children's hashes. An `Env` node (the root) incorporates everything. This means the root hash is a cryptographic summary of the entire environment state.
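
As an illustration of how a `Dir` identity can be derived from sorted (child name, child hash) pairs, consider the sketch below. It uses the standard library's 64-bit `DefaultHasher` as a stand-in for BLAKE3, and the function names and the exact fields fed to the hasher are assumptions, not the encoding used by `hd-engine`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Identity of a file node: path plus manifest hash.
/// (64-bit std hash used here in place of a BLAKE3 digest.)
fn file_hash(path: &str, manifest_hash: u64) -> u64 {
    let mut h = DefaultHasher::new();
    ("file", path, manifest_hash).hash(&mut h);
    h.finish()
}

/// Identity of a directory node: path plus the sorted list of
/// (child name, child hash) pairs, so listing order does not matter.
fn dir_hash(path: &str, children: &[(&str, u64)]) -> u64 {
    let mut sorted = children.to_vec();
    sorted.sort();
    let mut h = DefaultHasher::new();
    ("dir", path).hash(&mut h);
    sorted.hash(&mut h);
    h.finish()
}
```

Sorting the children before hashing is what makes the result deterministic: two directory listings with the same entries in a different order produce the same hash, while changing any child's hash changes the parent's.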

### Incremental Invalidation

When a file changes:

1. The file is re-chunked and re-hashed in the CAS.
2. A new `File` node is inserted into the DAG with the updated manifest hash.
3. The invalidation engine walks upward from the changed node, finding all parent nodes.
4. Each parent is rebuilt with the updated child hash, producing a new parent hash.
5. This continues until the root `Env` node is reached and rebuilt.
6. Sibling nodes that were not affected retain their original hashes and are not touched.

The result is a new DAG root that shares the vast majority of its structure with the previous root. Only the path from the changed leaf to the root is new.
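
The steps above can be sketched with a small in-memory structure. This is a simplified model under stated assumptions: each node has a single parent (a tree rather than a general DAG), nodes are updated in place instead of producing a new root, and the 64-bit standard-library hash stands in for BLAKE3. The `Dag`, `Node`, and `update_leaf` names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct Node {
    name: String,
    parent: Option<usize>,
    children: Vec<usize>,
    content: u64, // leaf content hash (e.g. a manifest hash); 0 for inner nodes
    hash: u64,
}

struct Dag {
    nodes: Vec<Node>,
}

impl Dag {
    fn new() -> Self {
        Dag { nodes: Vec::new() }
    }

    /// Add a node and link it under `parent`. Returns the new node's id.
    fn add(&mut self, name: &str, parent: Option<usize>, content: u64) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { name: name.to_string(), parent, children: Vec::new(), content, hash: 0 });
        if let Some(p) = parent {
            self.nodes[p].children.push(id);
        }
        id
    }

    /// Recompute one node's hash from its own fields and its children's hashes.
    fn rehash(&mut self, id: usize) {
        let node = &self.nodes[id];
        let mut h = DefaultHasher::new();
        node.name.hash(&mut h);
        node.content.hash(&mut h);
        for &child in &node.children {
            self.nodes[child].hash.hash(&mut h);
        }
        let new_hash = h.finish();
        self.nodes[id].hash = new_hash;
    }

    /// Hash every node once. Children always have larger ids than their
    /// parents in this sketch, so reverse id order visits children first.
    fn hash_all(&mut self) {
        for id in (0..self.nodes.len()).rev() {
            self.rehash(id);
        }
    }

    /// Record new content for a leaf and rehash only the path to the root.
    /// Returns the number of nodes that were rehashed.
    fn update_leaf(&mut self, leaf: usize, content: u64) -> usize {
        self.nodes[leaf].content = content;
        let mut current = Some(leaf);
        let mut rehashed = 0;
        while let Some(id) = current {
            self.rehash(id);
            rehashed += 1;
            current = self.nodes[id].parent;
        }
        rehashed
    }
}
```

For an environment with a root, a `src` directory holding two files, and one package, updating one file rehashes three nodes (the file, `src`, and the root); the sibling file and the package keep their hashes.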

### FUSE Projection

The environment filesystem is not a traditional directory tree on disk. Instead, it is a FUSE mount that resolves paths against the DAG and serves file content from the CAS on demand. An in-memory overlay captures any writes made by running processes, so the immutable DAG is never modified directly.
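
The read-through overlay idea can be shown without FUSE at all. The sketch below models the immutable, DAG-backed content as a borrowed map and the overlay as an owned map that shadows it. `OverlayView` is a hypothetical type for illustration and is not the `Overlay` or `ProjectedFs` type from `hd-mount`.

```rust
use std::collections::HashMap;

/// A view that layers in-memory writes over immutable base content.
struct OverlayView<'a> {
    base: &'a HashMap<String, Vec<u8>>, // stands in for content served from the CAS
    overlay: HashMap<String, Vec<u8>>,  // writes made by running processes
}

impl<'a> OverlayView<'a> {
    fn new(base: &'a HashMap<String, Vec<u8>>) -> Self {
        OverlayView { base, overlay: HashMap::new() }
    }

    /// Reads prefer the overlay and fall back to the base content.
    fn read(&self, path: &str) -> Option<&[u8]> {
        match self.overlay.get(path) {
            Some(bytes) => Some(bytes.as_slice()),
            None => self.base.get(path).map(|bytes| bytes.as_slice()),
        }
    }

    /// Writes only ever touch the overlay; the base is never modified.
    fn write(&mut self, path: &str, data: &[u8]) {
        self.overlay.insert(path.to_string(), data.to_vec());
    }
}
```

Because the base is held by shared reference, the type system itself guarantees that a write cannot alter it, which mirrors the immutability of the DAG.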

### Architecture Diagram

```
+---------------------------------------------------+
|                    hd (CLI)                        |
|  init | up | down | status | exec | ingest | ...  |
+--+----+----+------+--------+------+--------+------+
   |         |              |               |
   v         v              v               v
+------+ +--------+ +------------+ +-------------+
|  hd  | |   hd   | |     hd     | |     hd      |
| spec | | engine | |   sandbox  | |     oci     |
+--+---+ +--+-----+ +-----+------+ +------+------+
   |        |              |               |
   |        v              v               |
   |   +--------+   +----------+           |
   |   |   hd   |   |    hd    |           |
   |   |  mount |   |   watch  |           |
   |   +--+-----+   +----+-----+           |
   |      |              |                 |
   v      v              v                 v
 +--------------------------------------------+
 |              hd-cas (CAS)                   |
 |  ContentStore | Manifest | Chunk | GC       |
 +--------------------------------------------+
 |           On-disk storage (~/.hd/cas)       |
 |  objects/<shard>/<hash>  (zstd-compressed)  |
 |  manifests/<shard>/<hash>                   |
 |  refs/<shard>/<hash>     (ref counts)       |
 +---------------------------------------------+
```

---

## Quick Start

### Installation

```bash
# From crates.io (once published)
cargo install hd-cli

# From source
git clone https://github.com/omeedtehrani/hyperdocker.git
cd hyperdocker
cargo install --path crates/hd-cli
```

On macOS, install macFUSE first:
```bash
brew install macfuse
```

On Linux, install FUSE:
```bash
sudo apt-get install fuse3 libfuse3-dev   # Debian/Ubuntu
sudo dnf install fuse3 fuse3-devel        # Fedora
```

### Initialize a Project

```bash
cd your-project
hd init
```

This creates an `hd.toml` in the current directory with a starter template.

### Configure Your Environment

Edit `hd.toml` to describe your environment:

```toml
[environment]
name = "myapp"
base = "node:20-alpine"

[dependencies]
apt = ["curl", "git"]

[dependencies.npm]
file = "package.json"

[build]
steps = ["npm install", "npm run build"]
cache = ["node_modules"]

[services.web]
command = "npm run dev"
watch = ["src/**/*.ts", "src/**/*.tsx"]
port = 3000

[files]
include = ["src", "public", "package.json", "tsconfig.json"]
exclude = [".git", "node_modules/.cache", "*.log"]
```

### Start the Environment

```bash
hd up
```

This parses `hd.toml`, compiles it into a Merkle DAG, ingests your files into the CAS, and starts your services. File watching begins automatically.

### Migrate from Docker

If you have an existing Dockerfile, translate it:

```bash
hd ingest Dockerfile
```

This generates an `hd.toml` from the Dockerfile's `FROM`, `RUN`, and `CMD` instructions. Review and customize the output.

---

## Configuration Reference

The `hd.toml` file is the single source of truth for an environment. Here is a complete reference for every section.

### `[environment]` (required)

| Key    | Type   | Description                                        |
|--------|--------|----------------------------------------------------|
| `name` | string | Name of the environment. Used as the DAG root name.|
| `base` | string | Base OCI image reference (e.g., `ubuntu:22.04`).   |

```toml
[environment]
name = "myapp"
base = "ubuntu:22.04"
```

### `[dependencies]`

Declares system and language-level dependencies. Keys are provider names; values vary by format.

**Package list** -- install specific packages from a provider:
```toml
[dependencies]
apt = ["curl", "git", "build-essential"]
```

**Version string** -- install a specific version of a runtime:
```toml
[dependencies]
node = "20.x"
python = "3.11"
```

**File reference** -- resolve dependencies from a manifest file:
```toml
[dependencies.npm]
file = "package.json"

[dependencies.pip]
file = "requirements.txt"
```

### `[build]`

| Key     | Type         | Description                                              |
|---------|--------------|----------------------------------------------------------|
| `steps` | list[string] | Ordered build commands. Each becomes a `BuildStep` node. |
| `cache` | list[string] | Directories to preserve across rebuilds.                 |

```toml
[build]
steps = [
    "npm install",
    "npm run build",
]
cache = ["node_modules", "dist"]
```

Build steps are chained: each step's DAG node includes the hash of the previous step as an input. If a step's inputs have not changed, it is skipped entirely.
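
A rough sketch of this chaining follows. It assumes each step's key is a hash of its command, the previous step's key (or an initial input hash for the first step), and its environment variables, and that a set of previously seen keys acts as the cache. The names `step_key` and `plan` are hypothetical, the 64-bit standard-library hash stands in for BLAKE3, and what counts as the first step's input is left open by the text.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashSet};
use std::hash::{Hash, Hasher};

/// Key for one build step: command + input hash + environment variables.
/// A `BTreeMap` keeps the variables sorted, so the key is deterministic.
fn step_key(command: &str, input_hash: u64, env: &BTreeMap<String, String>) -> u64 {
    let mut h = DefaultHasher::new();
    command.hash(&mut h);
    input_hash.hash(&mut h);
    env.hash(&mut h);
    h.finish()
}

/// Walk the steps in order and return the commands that still need to run.
/// Steps whose key is already in `cache` are skipped.
fn plan(
    steps: &[&str],
    input_hash: u64,
    env: &BTreeMap<String, String>,
    cache: &mut HashSet<u64>,
) -> Vec<String> {
    let mut to_run = Vec::new();
    let mut previous = input_hash;
    for &command in steps {
        let key = step_key(command, previous, env);
        // `insert` returns true only when the key was not cached yet.
        if cache.insert(key) {
            to_run.push(command.to_string());
        }
        // Chain: this step's key becomes the next step's input.
        previous = key;
    }
    to_run
}
```

With this scheme, editing only the second command leaves the first step's key unchanged, so only the second step is scheduled; changing the initial input hash changes every key in the chain.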

### `[services.<name>]`

Define long-running processes. Each service has its own configuration block.

| Key              | Type         | Default    | Description                                     |
|------------------|--------------|------------|-------------------------------------------------|
| `command`        | string       | (required) | The command to run.                             |
| `watch`          | list[string] | `[]`       | Glob patterns. Service restarts when matched files change. |
| `port`           | integer      | (none)     | Port the service listens on.                    |
| `depends_on`     | list[string] | `[]`       | Services that must start before this one.       |
| `restart_policy` | string       | `"always"` | One of `always`, `on_failure`, `never`.         |

```toml
[services.web]
command = "npm run dev"
watch = ["src/**/*.ts", "src/**/*.tsx"]
port = 3000

[services.worker]
command = "node worker.js"
watch = ["worker.js", "lib/**"]
depends_on = ["web"]
restart_policy = "on_failure"
```

Services are started in topological order (respecting `depends_on`) and stopped in reverse order.
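
One way to compute such an order is Kahn's algorithm over the `depends_on` lists. The sketch below is illustrative only; `start_order` is a hypothetical function, and treating both cycles and unknown service names as an error (`None`) is an assumption.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Return service names in start order, or `None` if `depends_on`
/// contains a cycle or names a service that is not defined.
fn start_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    // For each service, the set of dependencies that have not started yet.
    let mut remaining: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for (&name, wanted) in deps {
        if wanted.iter().any(|d| !deps.contains_key(*d)) {
            return None; // unknown dependency
        }
        remaining.insert(name, wanted.iter().copied().collect());
    }

    let mut order = Vec::new();
    while !remaining.is_empty() {
        // Services whose dependencies have all been started already.
        let ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, pending)| pending.is_empty())
            .map(|(&name, _)| name)
            .collect();
        if ready.is_empty() {
            return None; // every remaining service waits on another: a cycle
        }
        for name in ready {
            remaining.remove(name);
            for pending in remaining.values_mut() {
                pending.remove(name);
            }
            order.push(name.to_string());
        }
    }
    Some(order)
}
```

For the example configuration above, `web` has no dependencies and `worker` depends on `web`, so the start order is `web`, then `worker`; the stop order is simply the reverse of that list.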

### `[files]`

Controls which files are tracked by the file watcher and ingested into the CAS.

| Key       | Type         | Default | Description                                   |
|-----------|--------------|---------|-----------------------------------------------|
| `include` | list[string] | `[]`    | Path prefixes to watch. Empty means all.      |
| `exclude` | list[string] | `[]`    | Glob patterns to exclude (e.g., `*.log`, `.git`). |

```toml
[files]
include = ["src", "public", "package.json", "tsconfig.json"]
exclude = [".git", "node_modules/.cache", "*.log"]
```

The following paths are always excluded, whether or not they appear in `exclude`: `.git`, `node_modules/.cache`, `.DS_Store`, `target`.
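
The include/exclude semantics can be approximated in a few lines. This is a deliberately simplified sketch and not the `PathFilter` from `hd-watch`: it handles only literal names and `*suffix` patterns rather than full globs, ignores the built-in exclusions, and the function names are hypothetical.

```rust
/// True if `path` matches any exclude pattern.
/// Supports only two pattern shapes (an assumption for this sketch):
/// "*suffix" (e.g. "*.log") and literal names (e.g. ".git").
fn is_excluded(path: &str, exclude: &[&str]) -> bool {
    exclude.iter().any(|pattern| {
        if let Some(suffix) = pattern.strip_prefix('*') {
            // "*.log" style: match on the end of the path.
            path.ends_with(suffix)
        } else {
            // Literal such as ".git" or "node_modules/.cache":
            // match only when it appears as whole path components.
            let padded = format!("/{}/", path);
            padded.contains(&format!("/{}/", pattern))
        }
    })
}

/// True if `path` should be watched and ingested.
/// An empty `include` list means every path is a candidate.
fn is_tracked(path: &str, include: &[&str], exclude: &[&str]) -> bool {
    let included = include.is_empty()
        || include
            .iter()
            .any(|prefix| path == *prefix || path.starts_with(&format!("{}/", prefix)));
    included && !is_excluded(path, exclude)
}
```

Matching prefixes on whole components means `src` covers `src/app/main.ts` but not `srcfoo/x.ts`, and exclusion always wins over inclusion.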

### `[options]`

| Key             | Type   | Default | Description                                       |
|-----------------|--------|---------|---------------------------------------------------|
| `restart_grace` | string | `"5s"`  | Grace period before force-killing a restarting service. |

```toml
[options]
restart_grace = "5s"
```

---

## CLI Reference

The `hd` binary provides all commands for managing hyperdocker environments.

### `hd init`

Create a new `hd.toml` in the current directory with a starter template.

```bash
hd init
```

Fails if `hd.toml` already exists.

### `hd up`

Parse `hd.toml`, compile the environment into a Merkle DAG, and start all services.

```bash
hd up
```

This command:
1. Reads and validates `hd.toml`.
2. Opens (or creates) the CAS at `~/.hd/cas`.
3. Resolves dependencies via registered providers.
4. Compiles the spec into a DAG and prints the root hash.
5. Registers a GC reference for the new root.

### `hd down`

Stop the running environment and all its services.

```bash
hd down
```

### `hd status`

Show the current environment configuration and service states.

```bash
hd status
```

Output includes the environment name, base image, and each service's command and watch patterns.

### `hd exec <command> [args...]`

Run a command inside the environment context.

```bash
hd exec npm test
hd exec python -m pytest
hd exec sh -c "echo hello"
```

### `hd ingest <dockerfile-path>`

Translate a Dockerfile into an `hd.toml`.

```bash
hd ingest Dockerfile
```

Parses `FROM`, `RUN`, and `CMD` instructions. `COPY`, `WORKDIR`, and `ENV` are noted but may require manual adjustment in the generated `hd.toml`.

### `hd lock`

Resolve all dependencies and write `hd.lock`.

```bash
hd lock
```

The lockfile records each dependency's provider, name, exact version, and artifact hash. It is sorted deterministically so that identical dependency sets always produce identical lockfiles.

### `hd dag show`

Print the Merkle DAG tree for the current environment.

```bash
hd dag show
```

Example output:
```
DAG root: a1b2c3d4e5f6...
Env(myapp)
  Pkg(oci/node:20-alpine latest)
  Pkg(npm/express 4.18.2)
  Build(npm install)
  Build(npm run build)
```

### `hd cas stats`

Show storage statistics for the content-addressed store.

```bash
hd cas stats
```

Output:
```
CAS Statistics:
  Chunks: 1,247
  Manifests: 83
```

### `hd cas gc`

Run garbage collection on the CAS. Removes unreferenced manifests and chunks.

```bash
hd cas gc
```

Output:
```
Garbage collection complete:
  Manifests removed: 12
  Chunks removed: 94
```
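
Conceptually, a collection pass of this kind first drops manifests that nothing references and then drops chunks that no surviving manifest lists. The sketch below models that with in-memory maps. The `Store` layout, the idea that reference counts are kept per manifest, and the `gc` return value are assumptions for illustration, not the on-disk behavior of `hd-cas`.

```rust
use std::collections::{HashMap, HashSet};

struct Store {
    manifests: HashMap<String, Vec<String>>, // manifest hash -> ordered chunk hashes
    chunks: HashSet<String>,                 // chunk hashes present in the store
    refs: HashMap<String, u32>,              // manifest hash -> reference count
}

impl Store {
    /// Remove unreferenced manifests, then chunks no live manifest uses.
    /// Returns (manifests removed, chunks removed).
    fn gc(&mut self) -> (usize, usize) {
        let manifests_before = self.manifests.len();
        let refs = &self.refs;
        // Keep only manifests with a positive reference count.
        self.manifests
            .retain(|hash, _| refs.get(hash).copied().unwrap_or(0) > 0);

        // Every chunk listed by a surviving manifest is still live.
        let live: HashSet<&String> = self.manifests.values().flatten().collect();
        let chunks_before = self.chunks.len();
        self.chunks.retain(|chunk| live.contains(chunk));

        (
            manifests_before - self.manifests.len(),
            chunks_before - self.chunks.len(),
        )
    }
}
```

A chunk shared by a live manifest and a dead one survives, which is why deduplicated chunks cannot be deleted based on a single manifest alone.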

---

## Architecture

Hyperdocker is organized as a Cargo workspace with eight crates. Each crate has a single responsibility and well-defined boundaries.

### Crate Dependency Graph

```
                        hd-cli
                       /  |   \     \
                      /   |    \     \
                     v    v     v     v
                hd-spec  hd-mount  hd-sandbox  hd-oci
                 / |       / |        |           |  \
                /  |      /  |        |           |   \
               v   v     v   v        v           v    v
          hd-engine    hd-engine   hd-spec     hd-cas  hd-spec
              |           |                       |
              v           v                       |
           hd-cas      hd-cas                     |
                                                  |
                        hd-watch                  |
                       /   |   \                  |
                      v    v    v                 |
                 hd-cas hd-engine hd-spec         |
                    |                             |
                    +-----------------------------+
```

### Crate Descriptions

| Crate          | Purpose                                                                                 |
|----------------|-----------------------------------------------------------------------------------------|
| **hd-cas**     | Content-addressed store. BLAKE3 hashing, FastCDC chunking, zstd compression, manifests, sharded on-disk layout, reference-counting garbage collector. The foundation everything else builds on. |
| **hd-engine**  | Merkle DAG engine. Defines the five node types (`File`, `Dir`, `Package`, `BuildStep`, `Env`), the in-memory DAG with CAS persistence, bottom-up invalidation, and DAG diffing. |
| **hd-spec**    | Configuration layer. Parses `hd.toml` into `EnvSpec`, validates service dependency graphs (cycle detection), compiles specs into DAGs via the provider registry, and manages the `hd.lock` lockfile. |
| **hd-mount**   | Filesystem projection. `ProjectedFs` resolves paths against the DAG and serves content from the CAS. `Overlay` captures writes. `FuseFs` bridges to the FUSE kernel interface. `MountManager` tracks mount lifecycle. |
| **hd-watch**   | File watching. Uses `notify` with configurable poll intervals, `PathFilter` for include/exclude rules (with hardcoded defaults for `.git`, `target`, etc.), `Debouncer` for coalescing rapid changes, and `PathMap` for bidirectional path-to-hash lookups. |
| **hd-sandbox** | Process management. `ManagedProcess` wraps `std::process::Child` with lifecycle control. `Service` adds watch-pattern matching and restart policies. `Sandbox` orchestrates multiple services with topological ordering. |
| **hd-oci**     | OCI/Docker interop. Parses image references (Docker Hub, GHCR, private registries), unpacks tar/tar+gzip layers into the CAS, and translates Dockerfiles into `EnvSpec`. |
| **hd-cli**     | The `hd` binary. Clap-based CLI with subcommands for `init`, `up`, `down`, `status`, `exec`, `ingest`, `lock`, `dag show`, `cas stats`, and `cas gc`. |

### Key Dependencies

| Dependency | Version | Role                                              |
|------------|---------|---------------------------------------------------|
| blake3     | 1.6     | Cryptographic hashing (256-bit, ~3x faster than SHA-256) |
| fastcdc    | 3.1     | Content-defined chunking for deduplication         |
| zstd       | 0.13    | Transparent chunk compression                      |
| fuser      | 0.15    | FUSE filesystem in userspace                       |
| notify     | 7.0     | Cross-platform filesystem event watching           |
| clap       | 4.5     | CLI argument parsing with derive macros            |
| serde      | 1.0     | Serialization for DAG nodes, manifests, specs      |
| bincode    | 2.0     | Compact binary serialization for CAS objects       |
| rayon      | 1.10    | Data parallelism for chunking and hashing          |
| nix        | 0.29    | Unix signal handling for process management        |
| reqwest    | 0.12    | HTTP client for OCI registry communication         |
| tar/flate2 | 0.4/1.0 | OCI layer unpacking                                |

---

## Performance

Hyperdocker is designed around four principles that make it fast where Docker is slow.

### 1. Content-Defined Chunking

Files are split into variable-size chunks using FastCDC, not at fixed byte offsets. This means that inserting or removing bytes in the middle of a file only affects the chunks immediately surrounding the edit. All other chunks remain identical and are deduplicated automatically.

In practice, editing a single function in a 500 KB source file typically affects 1-2 chunks (16-32 KB). The other ~30 chunks are unchanged and already in the store.

### 2. Bottom-Up Invalidation

Docker invalidates top-down: if layer N changes, layers N+1 through the end are all re-executed. Hyperdocker invalidates bottom-up: only the ancestors of a changed node are rebuilt, and "rebuilt" means rehashing -- not re-executing commands.

Consider an environment with 500 source files, 200 npm packages, and 3 build steps. Editing one source file in Docker might trigger a full `npm install` (30-60 seconds). In hyperdocker, it triggers a rehash of the file node, its parent directory node, and the root env node -- a sub-second operation.

### 3. Lazy Materialization

The FUSE-projected filesystem does not extract the entire environment to disk. Files are served on-demand from the CAS when a process reads them. If your application only touches 50 of 500 source files during a test run, only those 50 files are ever read from the store. The rest exist as DAG nodes but are never materialized.

### 4. Cross-File and Cross-Version Deduplication

Because the CAS is content-addressed at the chunk level, deduplication happens automatically:

- Two copies of the same library in `node_modules` share chunks.
- Successive versions of a file that differ by a few lines share most chunks.
- Multiple environments using the same base image share all base-image chunks.

This means disk usage grows proportionally to unique content, not to the number of environments or versions.

---

## Comparison with Docker

| Dimension                     | Docker                                    | Hyperdocker                              |
|-------------------------------|-------------------------------------------|------------------------------------------|
| **Unit of caching**           | Ordered filesystem layer                  | Content-addressed chunk                  |
| **Invalidation direction**    | Top-down (layer N invalidates N+1..end)   | Bottom-up (only ancestors of changed node) |
| **Invalidation granularity**  | Entire layer                              | Individual file                          |
| **Deduplication scope**       | Identical layers across images            | Identical chunks across all files        |
| **Rebuild after 1-file edit** | Re-run all layers after the changed layer | Rehash one file + ancestors (~ms)        |
| **Filesystem model**          | Copy-on-write overlay (overlayfs)         | FUSE projection from DAG + overlay       |
| **File watching**             | Not built-in (requires polling or tools)  | Built-in with debouncing and filtering   |
| **Service management**        | `docker-compose` (separate tool)          | Built into `hd.toml`                     |
| **Configuration**             | Dockerfile + docker-compose.yml           | Single `hd.toml`                         |
| **Compression**               | Layer-level (gzip/zstd)                   | Chunk-level (zstd, transparent)          |
| **Garbage collection**        | `docker system prune`                     | Reference-counting GC (`hd cas gc`)      |
| **Hash algorithm**            | SHA-256                                   | BLAKE3 (~3x faster)                      |
| **Implementation language**   | Go                                        | Rust                                     |

---

## Comparison with Alternatives

### Nix

Nix pioneered hash-addressed package management and reproducible builds. Hyperdocker borrows the idea of addressing everything by hash but differs in scope and approach:

- **Nix** is a full package manager and build system with its own functional language. Hyperdocker is a container runtime -- it manages environments, not package builds.
- **Nix** hashes derivation inputs. Hyperdocker hashes file content directly (content-addressed, not input-addressed).
- **Nix** has a steep learning curve (the Nix language). Hyperdocker uses a simple TOML file.
- **Nix** does not include file watching, service management, or FUSE projection.

### Bazel

Bazel is a build system with content-addressed caching and remote execution.

- **Bazel** focuses on build artifact caching across a monorepo. Hyperdocker focuses on development environment lifecycle.
- **Bazel** requires BUILD files and a complex rule system. Hyperdocker requires a single `hd.toml`.
- **Bazel** does not manage running services or provide a container-like filesystem.
- **Bazel's** remote cache is analogous to a distributed CAS, which hyperdocker plans for v2.

### Devbox (by Jetify)

Devbox wraps Nix to provide a simpler developer experience.

- **Devbox** focuses on dependency isolation via Nix packages. Hyperdocker provides a full environment runtime with file watching and service management.
- **Devbox** does not chunk or deduplicate your source files.
- **Devbox** does not provide incremental invalidation of build steps.

### Dev Containers

Dev Containers (VS Code) use Docker under the hood and inherit all of Docker's layer-based limitations.

- **Dev Containers** are most commonly used with VS Code and depend on Docker. Hyperdocker is editor-agnostic and Docker-free.
- **Dev Containers** rebuild via Dockerfile. Hyperdocker rebuilds incrementally via DAG invalidation.
- **Dev Containers** do not provide built-in file watching or service management.

### OrbStack

OrbStack is a fast Docker Desktop replacement for macOS.

- **OrbStack** optimizes Docker's execution (faster VM, better I/O). Hyperdocker replaces Docker's model entirely.
- **OrbStack** still uses layers. A one-file change still invalidates downstream layers.
- **OrbStack** is macOS-only and closed source. Hyperdocker is cross-platform and MIT-licensed.

---

## Roadmap

### v1 (Current)

- [x] Content-addressed store with BLAKE3 + FastCDC + zstd
- [x] Merkle DAG with five node types
- [x] Bottom-up invalidation engine
- [x] DAG diffing
- [x] `hd.toml` spec parser with validation
- [x] Dependency provider registry (trait-based, extensible)
- [x] Deterministic lockfile (`hd.lock`)
- [x] Spec-to-DAG compiler
- [x] FUSE filesystem projection with overlay
- [x] File watcher with include/exclude filtering and debouncing
- [x] Service management with topological ordering and watch-based restart
- [x] OCI image reference parsing and layer unpacking
- [x] Dockerfile-to-`hd.toml` translation
- [x] Reference-counting garbage collection
- [x] CLI: `init`, `up`, `down`, `status`, `exec`, `ingest`, `lock`, `dag show`, `cas stats`, `cas gc`

### v2 (Planned)

- **Distributed CAS** -- Share the content-addressed store across machines. Push/pull chunks to a remote store (S3, GCS, or a dedicated server). Team-wide deduplication.
- **Docker socket shim** -- Expose a Docker-compatible API so that tools expecting `docker build` and `docker run` can use hyperdocker transparently.
- **Language-aware reloaders** -- Instead of restarting a service on file change, inject the changed module at runtime (hot module replacement for Node.js, hot reload for Go/Rust).
- **Checkpoint/restore** -- Snapshot a running environment's state (processes, memory, network) and restore it instantly on another machine using CRIU.
- **Programmable API** -- Expose the DAG, CAS, and invalidation engine as a Rust library with a stable API for building custom tooling on top.
- **Built-in dependency providers** -- Ship providers for apt, npm, pip, cargo, and brew out of the box, with automatic resolution and CAS ingestion.
- **Parallel build steps** -- Execute independent build steps concurrently when the DAG shows no data dependencies between them.
- **Remote execution** -- Run build steps on remote machines and pull only the output artifacts into the local CAS.
- **Layer-compatible export** -- Export a hyperdocker environment as an OCI image for deployment to Kubernetes or any container runtime.
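
As a rough illustration of the planned "parallel build steps" item, the sketch below groups steps into waves, where every step in a wave depends only on steps from earlier waves and could therefore run concurrently. The `parallel_waves` function and the step names are assumptions made for this example. The feature is not implemented yet, and its eventual design may differ.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group build steps into waves. `deps` maps each step to the steps it
/// needs. Steps within one wave have no dependencies on each other.
/// Returns `None` if there is a cycle or a dependency on an unknown step.
fn parallel_waves<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<Vec<&'a str>>> {
    let mut done: BTreeSet<&'a str> = BTreeSet::new();
    let mut waves = Vec::new();
    while done.len() < deps.len() {
        // A step is ready once all of its dependencies have completed.
        let ready: Vec<&'a str> = deps
            .iter()
            .filter(|(step, needs)| {
                !done.contains(*step) && needs.iter().all(|d| done.contains(d))
            })
            .map(|(step, _)| *step)
            .collect();
        if ready.is_empty() {
            return None; // no progress is possible
        }
        done.extend(ready.iter().copied());
        waves.push(ready);
    }
    Some(waves)
}
```

For example, given steps `fetch-deps` and `lint` with no dependencies, `compile` needing `fetch-deps`, `test` needing `compile`, and `package` needing both `compile` and `lint`, the function yields three waves: `fetch-deps` and `lint`, then `compile`, then `package` and `test`. A `BTreeMap` is used so that the order within each wave is deterministic.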

---

## Contributing

### Prerequisites

- Rust 1.75+ (2021 edition)
- macFUSE (macOS) or FUSE3 (Linux)

### Building from Source

```bash
git clone https://github.com/omeedtehrani/hyperdocker.git
cd hyperdocker
cargo build
```

The `hd` binary is built to `target/debug/hd`.

### Running Tests

```bash
# Run all tests
cargo test --workspace

# Run tests for a specific crate
cargo test -p hd-cas
cargo test -p hd-engine
cargo test -p hd-spec
cargo test -p hd-mount
cargo test -p hd-watch
cargo test -p hd-sandbox
cargo test -p hd-oci
cargo test -p hd-cli
```

Note: `hd-watch` tests rely on filesystem polling and short sleeps, so they can be flaky on very slow CI runners. `hd-mount` FUSE tests require FUSE privileges and are skipped in environments without FUSE support.

### Project Structure

```
hyperdocker/
  Cargo.toml              # Workspace root
  Cargo.lock
  crates/
    hd-cas/               # Content-addressed store
      src/
        hash.rs           # BLAKE3 ContentHash type
        chunk.rs          # FastCDC content-defined chunking
        manifest.rs       # File manifest (chunk list + metadata)
        store.rs          # On-disk store with sharding and compression
        gc.rs             # Reference-counting garbage collector
    hd-engine/            # Merkle DAG engine
      src/
        node.rs           # Five DAG node types
        dag.rs            # In-memory DAG with CAS persistence
        invalidation.rs   # Bottom-up invalidation algorithm
        diff.rs           # DAG diffing (added/removed/changed)
    hd-spec/              # Configuration and compilation
      src/
        spec.rs           # hd.toml parser and validator
        provider.rs       # Dependency provider trait and registry
        compiler.rs       # Spec-to-DAG compiler
        lockfile.rs       # hd.lock serialization
    hd-mount/             # Filesystem projection
      src/
        projected.rs      # DAG-backed virtual filesystem
        overlay.rs        # In-memory write overlay
        fuse.rs           # FUSE filesystem adapter
        manager.rs        # Mount lifecycle management
    hd-watch/             # File watching
      src/
        watcher.rs        # notify-based recursive file watcher
        filter.rs         # Include/exclude path filtering
        debounce.rs       # Event coalescing
        pathmap.rs        # Bidirectional path <-> hash mapping
    hd-sandbox/           # Process management
      src/
        process.rs        # Managed child process wrapper
        service.rs        # Service with watch patterns and restart
        sandbox.rs        # Multi-service orchestrator
    hd-oci/               # OCI/Docker interop
      src/
        registry.rs       # Image reference parsing
        unpack.rs         # Tar/tar+gzip layer unpacking into CAS
        dockerfile.rs     # Dockerfile to hd.toml translation
    hd-cli/               # CLI binary
      src/
        main.rs           # Clap CLI definition
        commands/         # One module per subcommand
          init.rs
          up.rs
          down.rs
          status.rs
          exec.rs
          ingest.rs
          lock.rs
          dag.rs
          cas.rs
```

---

## License

MIT -- see [LICENSE](LICENSE) for details.