vsf 0.1.13

Versatile Storage Format
<p align="center">
  <img src="vsf.png" alt="VSF Logo" width="400"/>
</p>

# VSF (Versatile Storage Format)

A self-describing binary format designed for optimal integer encoding, mathematical correctness, and type safety.

VSF addresses a fundamental challenge in binary formats: how to efficiently encode integers of any size while maintaining O(1) skip-ability. The solution enables efficient storage of everything from a single photon's wavelength to the number of atoms in the observable universe (and yes, both fit comfortably).

---

## Core Innovation: Exponential-Width Integer Encoding

Most binary formats face a tradeoff when encoding integers:

**Fixed-width approach** (TIFF, PNG, HDF5):
- Fast to parse (known size)
- Wastes space on small values
- Hard limits (4GB for u32, etc.)

**Variable-width approach** (Protobuf varints, LEB128):
- Compact encoding (7 bits per byte, continuation bit in MSB)
- **Cannot skip** - must read every byte to find the end (O(n) parse cost)
- Caps at 64 bits (Protobuf integers stop at 2^64-1; MessagePack, which uses type-prefixed fixed widths rather than continuation bits, has the same cap)
- Can't encode "Planck volumes in observable universe" (~10^185)

**VSF's solution** - Exponential-width with explicit size markers:

```
Value 42:                'u' '3' 0x2A              (2 decimal digits)
Value 4,096:             'u' '4' 0x10 0x00         (4 digits)
Value 2^32-1:            'u' '5' + 4 bytes         (10 digits)
Value 2^64-1:            'u' '6' + 8 bytes         (20 digits)
Value 2^128-1:           'u' '7' + 16 bytes        (39 digits)
Value 2^256-1:           'u' '8' + 32 bytes        (78 digits)
RSA-16384 modulus:       'u' 'D' + 2048 bytes      (4932 digits)
Actual max:              'u' 'Z' + 8 GB            (~20 billion digits)
```

**Properties:**
- ✅ O(1) skip - read single byte exponent, immediately skip that number of bytes
- ✅ Optimal size - automatically selects minimal encoding
- ✅ No hard limits - can encode arbitrarily large values (assuming you have the storage)
- ✅ 40-70% space savings vs fixed-width on typical data
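
The "automatically selects minimal encoding" property can be sketched for values that fit in a `u64`. This is a minimal illustration, not the crate's implementation: `encode_unsigned` is a hypothetical name, only the markers '3' to '6' from the table above are covered, and the payload is assumed to be big-endian, as the feature list later in this README states.

```rust
/// Encode `value` with the smallest size class that holds it.
/// Markers '3'..'6' (8 to 64 bits) follow the table above.
/// `encode_unsigned` is an illustrative name, not the crate's API.
fn encode_unsigned(value: u64) -> Vec<u8> {
    let (marker, width) = if value <= u64::from(u8::MAX) {
        (b'3', 1)
    } else if value <= u64::from(u16::MAX) {
        (b'4', 2)
    } else if value <= u64::from(u32::MAX) {
        (b'5', 4)
    } else {
        (b'6', 8)
    };
    let mut out = vec![b'u', marker];
    // Keep only the low `width` bytes of the big-endian representation.
    out.extend_from_slice(&value.to_be_bytes()[8 - width..]);
    out
}
```

Calling `encode_unsigned(42)` yields the three bytes `'u' '3' 0x2A`, matching the first row of the table.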

---

## Why VSF Has Literally No Limits (Unlike Every Other Format)

### The Universal Integer Encoding Problem

Every binary format faces this question: **"How do you encode a number when you don't know how big it will be?"**

Until VSF, every format in existence picked one of these three bad answers:

**Answer 0: "We'll use fixed sizes"** (TIFF, PNG, HDF5)
- Store everything as u32 or u64
- **Problem**: Hits hard limits (4GB for u32) and wastes space for small numbers

**Answer 1: "We'll use continuation bits"** (Protobuf varints, LEB128)
- 7 bits per byte, MSB indicates "more bytes follow"
- **Problem**: Must read every byte to find the end (literally cannot skip), hard cap at 64 bits for native integers
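
To make the skip cost concrete, here is a small sketch of finding the end of a base-128 varint (the continuation-bit scheme Protobuf uses). The function name `varint_len` is hypothetical; the point is only that the length cannot be known without inspecting each byte in turn.

```rust
/// Length in bytes of the base-128 varint at the start of `buf`,
/// or `None` if the buffer ends before a terminating byte.
fn varint_len(buf: &[u8]) -> Option<usize> {
    // Every byte must be inspected: the high bit says whether another follows.
    buf.iter()
        .position(|byte| byte & 0x80 == 0)
        .map(|index| index + 1)
}
```

The scan is linear in the encoded length, which is what "O(n) skip" refers to.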

**Answer 2: "We'll store the length first"** (Most TLV formats)
- Store length as u32, then data
- **Problem**: Length field itself has a limit! Recursion required for bigger lengths, small numbers waste space

### VSF's Answer: Exponential Width Encoding (EWE)

VSF introduces **Exponential Width Encoding (EWE)** - a novel byte-aligned scheme where ASCII markers map directly to exponential size classes:

```
How it works:
0. Type: 'u' (unsigned), 'i' (signed), etc.
1. Size: ASCII character '0'-'Z'
2. Data: Exactly 2^n bits follow, where n is the size class named by the marker
3. 0=bool, 3=8 bits, 4=16 bits, 5=32 bits, 6=64 bits, ..., Z=2^36 bits (8 GB)

Example: 'u' '5' [0x01234567]
          │   │  └─ Data (2^5 bits = 32 bits = 4 bytes)
          │   └─ Size class marker
          └─ Type marker

Result: O(1) seekability + unbounded integers
```

**Why this works:**

The marker stores the base-2 logarithm of the payload width, so the width in bits is always a power of two:
- **Small numbers** → small widths → small markers ('3', '4')
- **Large numbers** → large widths → large markers ('D', 'Z')
- **The ASCII marker IS the exponent** (directly encoded, no recursion needed)

**Novel properties of EWE:**
- **Byte-aligned** - no bit-shifting, works with standard I/O
- **O(1) seekability** - read one marker (two bytes), know exact size
- **ASCII-readable** - markers are printable characters for debugging
- **Unbounded** - bool to 8 GB (that's a HUGE number!)
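
The O(1) seekability claim can be sketched as a skip routine. This is an assumption-laden illustration: `skip_value` is a hypothetical name, it handles only the digit markers '3' to '8' whose widths are listed in this README (8 to 256 bits), and it returns `None` for any other marker rather than guessing the mapping for letters.

```rust
/// Given the offset of a type marker, return the offset of the next value.
/// Only markers '3'..='8' are handled in this sketch.
fn skip_value(buf: &[u8], pos: usize) -> Option<usize> {
    let marker = *buf.get(pos.checked_add(1)?)?;
    if !(b'3'..=b'8').contains(&marker) {
        return None; // letter markers and sub-byte classes are left out here
    }
    // '3' = 1 byte, '4' = 2 bytes, '5' = 4 bytes, and so on.
    let width_bytes = 1usize << (marker - b'3');
    let next = pos + 2 + width_bytes;
    if next <= buf.len() {
        Some(next)
    } else {
        None // payload is truncated
    }
}
```

No payload byte is read: the two marker bytes alone determine where the next value starts.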

### Overhead Analysis: From Tiny to Googolplex

Let's look at what it costs to encode numbers of different magnitudes:
```
Value 42:           2 bytes overhead + 1 byte data = 3 bytes total
Value 2^64-1:       2 bytes overhead + 8 bytes data = 10 bytes total
RSA-16384 modulus:  2 bytes overhead + 2048 bytes = 2050 bytes total
```

**The overhead stays negligible even for numbers larger than the universe.**
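
The totals above follow from rounding the required bit count up to the next power of two and adding the two marker bytes. A small sketch of that arithmetic, assuming integer classes start at 8 bits (`encoded_len` is a hypothetical helper, not part of the crate):

```rust
/// Total encoded size in bytes for an integer that needs `bits_needed` bits:
/// two marker bytes plus a payload rounded up to a power-of-two bit width.
fn encoded_len(bits_needed: u64) -> u64 {
    // Integer size classes in this sketch start at 8 bits (marker '3').
    let class_bits = bits_needed.max(8).next_power_of_two();
    2 + class_bits / 8
}
```

For example, 42 needs 6 bits and lands in the 8-bit class (3 bytes total), while a 16384-bit value lands exactly on a class boundary (2050 bytes total).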

### Comparison: What CAN'T Other Formats Handle?

Here are real-world numbers that **break** other formats but VSF handles trivially:

#### Protobuf/MessagePack: Caps at 2^64-1
```
❌ Planck volumes in observable universe: ~10^185
   (Needs ~615 bits, Protobuf stops at 64)

✅ VSF: next size class is 1024 bits: 2 marker bytes + 128 bytes = 130 bytes total
```

#### JSON: Precision loss above 2^53
```
❌ Cryptographic keys (RSA-16384 = 2048 bytes)
   JSON can't represent integers > 2^53 exactly

✅ VSF: 'u' 'D' + 2048 bytes = 2050 bytes
```

#### HDF5: 64-bit everywhere!
```
❌ Storing 1 million boolean flags as u64
   Wastes 8 MB instead of 125 KB

✅ VSF bitpacked: 125KB (64x smaller)
```
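
The 125 KB figure is simply one bit per flag. A minimal sketch of packing booleans MSB-first (the bit order this README gives for bitpacked data); `pack_flags` is a hypothetical helper and produces only the payload, not a VSF header:

```rust
/// Pack booleans one bit each, most significant bit first within each byte.
fn pack_flags(flags: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (flags.len() + 7) / 8];
    for (i, &flag) in flags.iter().enumerate() {
        if flag {
            out[i / 8] |= 0x80 >> (i % 8);
        }
    }
    out
}
```

One million flags pack into 125,000 bytes, against 8,000,000 bytes when each flag occupies a `u64`.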

### Theoretical Limits: Universe Runs Out First

With marker 'Z' (ASCII 90), VSF can encode:
```
2^(2^36) = 2^68,719,476,736 possible values

That's a number with ~20.7 billion decimal digits!

For context:
- Atoms in universe: ~10^80 (needs 266 bits)
- Planck volumes in universe: ~10^185 (needs 615 bits)

VSF handles all of these with **two bytes of overhead.**
```

**You will run out of storage, memory, and life WAY before VSF hits any limits.**
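
The bit counts and digit counts quoted in this section can be checked with two logarithms. The helpers below are hypothetical names and use `f64`, which is precise enough at these magnitudes:

```rust
/// Bits needed to hold 10^exp10 (valid for exp10 >= 1).
fn bits_for_power_of_ten(exp10: u32) -> u32 {
    (f64::from(exp10) * 10.0_f64.log2()).ceil() as u32
}

/// Decimal digits in the largest value of a `bits`-wide unsigned integer,
/// i.e. floor(bits * log10(2)) + 1.
fn decimal_digits_of_max(bits: u64) -> u64 {
    (bits as f64 * 2.0_f64.log10()).floor() as u64 + 1
}
```

`bits_for_power_of_ten(80)` and `bits_for_power_of_ten(185)` reproduce the 266-bit and 615-bit figures, and `decimal_digits_of_max(1 << 36)` lands at roughly 20.7 billion.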

### Why This Matters: Future-Proof Architecture

Today's "unreasonably large" is tomorrow's "barely sufficient":

- **1980s**: "640KB ought to be enough for anybody"
- **1990s**: "Why would anyone need more than 4GB?" (u32 addresses)
- **2010s**: "2^64 is effectively infinite" (filesystems, address spaces)
- **2020s**: Quantum computing, cosmological simulations, genomic databases hitting 2^64 limits

**VSF's design principle**: Stop predicting the future. Build a format that **mathematically cannot** impose artificial limits.

### The Core Innovation

VSF is the only format that combines:
- **Optimal space efficiency** (no wasted bits on small numbers)
- **Arbitrary size support** (no maximum value)
- **O(1) seekability** (know size without parsing)
- **Byte-aligned** (no bit-shifting overhead)

This is possible because I solved the fundamental problem: **How do you easily encode the exponent of arbitrarily large numbers?**

Answer: **Directly**, using ASCII characters as exponential size class markers (Exponential Width Encoding).

Every other format either:
0. Uses fixed exponents (hits limits, wastes space on small numbers), or
1. Uses variable exponents but can't encode their length efficiently (not seekable), or
2. Doesn't try at all (caps at 64 bits)

---

## Type Safety Thru Exhaustive Pattern Matching

VSF is written entirely in Rust with zero wildcards in all match statements:

```rust
match self {
    VsfType::u0(value) => encode_bool(value),
    VsfType::u3(value) => encode_u8(value),
    VsfType::u4(value) => encode_u16(value),
    // ... 208 more explicit cases ...
    VsfType::p(tensor) => encode_bitpacked(tensor),
}
// No _ => I forgot?
```

**Why this matters:**
- Add a type? Won't compile until handled everywhere
- Remove a type? Compiler shows all affected code
- Refactor? Guided thru every impact
- Ship unhandled cases? Not possible
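
A stripped-down illustration of the same discipline, using a hypothetical three-variant `Value` enum rather than the real `VsfType`:

```rust
/// Hypothetical miniature of a typed value enum (not the real `VsfType`).
enum Value {
    U8(u8),
    U16(u16),
    U32(u32),
}

/// Payload width in bytes, with every variant matched explicitly.
fn payload_len(value: &Value) -> usize {
    match value {
        Value::U8(v) => std::mem::size_of_val(v),
        Value::U16(v) => std::mem::size_of_val(v),
        Value::U32(v) => std::mem::size_of_val(v),
        // No `_ =>` arm: adding a `U64` variant turns this match into a
        // compile error until the new case is handled.
    }
}
```

Because there is no catch-all arm, the compiler's exhaustiveness check does the bookkeeping described in the list above.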

**Why Rust specifically?** It's the only language that gives you proven:
- Memory safety **without** garbage collection
- Thread safety **without** runtime checks
- Zero-cost abstractions (no interpreter, no VM, no GC pauses)
- **All enforced at compile time**

This isn't possible in any other language:
- C/C++: Manual memory (use-after-free, double-free, null pointers)
- Java/C#/Go/Python/JS: Garbage collection (pauses, unpredictability)
- Everything else: Pick your poison ☠️

**Ways VSF can break:**
0. **Cosmic rays** (hardware bit flips) → Use ECC RAM, hash check will fail
1. **Python FFI** → Just don't. Spirix bans it anyway
2. **You modify the code** → Compiler catches it before you ship

That's it. Those are the **only** ways VSF breaks. Everything else is systematically impossible! Cool eh?

---

## What VSF Enables

### 0. Efficient Bitpacked Tensors

Camera RAW data, scientific sensors, and ML models often use non-standard bit depths:

```rust
// 12-bit camera RAW (common in photography)
BitPackedTensor {
    shape: vec![4096, 3072],  // 12.6 megapixels
    bit_depth: 12,
    data: packed_bytes,
}
// 18 MB vs 24 MB as a 16-bit array
```

Supports 1-256 bits per element efficiently.

### 1. Cryptographic Primitives as Types

Hashes, signatures, and keys are first-class types:

```rust
VsfType::a(algorithm, mac_tag)    // Message Authentication Code
VsfType::h(algorithm, hash)       // Hash (BLAKE3, SHA-256, etc.)
VsfType::g(algorithm, signature)  // Signature (Ed25519, ECDSA, RSA)
VsfType::k(algorithm, pubkey)     // Public key
```

### 2. Mathematically Correct Arithmetic (Spirix Integration)

VSF natively supports Spirix - two's complement floating-point that legitimately preserves mathematical identities:

```rust
VsfType::s53(spirix_scalar)  // 32-bit fraction, 8-bit exponent Scalar (F5E3)
VsfType::c64(spirix_circle)  // 64-bit fractions, 16-bit exponent Circle (complex numbers!)
```

**Why Spirix exists:** IEEE-754 breaks fundamental math:
- NaN (wat)
- Two different zeros: +0 and -0 (but +0 == -0 returns true?!)
- Very small numbers underflow to zero, breaking *a × b = 0 iff a = 0 or b = 0*
- Infinity from overflow, not just division by zero
- Sign-magnitude representation requires special-case branching EVERYWHERE!
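
Several of these IEEE-754 behaviours are easy to reproduce with plain `f64`. The function names below are illustrative only; nothing here uses Spirix:

```rust
/// Product of two nonzero doubles that underflows to exactly zero.
fn underflowed_product() -> f64 {
    let a = 1e-200_f64;
    let b = 1e-200_f64;
    a * b // the true result, 1e-400, is below the smallest subnormal
}

/// Infinity produced by overflow, with no division by zero involved.
fn overflowed_product() -> f64 {
    f64::MAX * 2.0
}

/// Returns (do the two zeros compare equal?, do their sign bits differ?).
fn two_zeros() -> (bool, bool) {
    let pos = 0.0_f64;
    let neg = -0.0_f64;
    (pos == neg, pos.is_sign_negative() != neg.is_sign_negative())
}
```

The first function is the case that breaks *a × b = 0 iff a = 0 or b = 0*: both inputs are nonzero, yet the product compares equal to zero.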

**What Spirix fixes:**
- **One Zero.** Not two. Just one. I don't remember there ever being two zeros in math class?
- **One Infinity.** Reserved for actual mathematical singularities (like 1/0), not overflow
- **Vanished values** - numbers too small to represent normally but **not zero** (preserves sign/orientation)
- **Exploded values** - numbers too large to represent but **not infinite** (preserves sign/orientation)
- **Two's complement thruout** - no sign bit shenanigans, no special cases
- **a × b = 0 iff a = 0 or b = 0** - all the time, every time, 100% of the time
- **Customizable precision AND range** - pick your fraction and exponent sizes independently! (F3E3 to F7E7)

**Undefined states that actually tell you what went wrong:**
Instead of IEEE's generic NaN, Spirix tracks *why* something became undefined:
- `[℘ ⬆+⬆]` - You added two exploded values (whoops!)
- `[℘ ⬇/⬇]` - You divided two vanished values
- `[℘ ⬆×⬇]` - You multiplied an exploded value by a vanished value
- Dozens more - your debugger will thank you!

VSF stores all 25 Scalar types (F3-F7 × E3-E7) and 25 Circle types as first-class primitives.

### 3. Geographic Precision (Dymaxion WorldCoord)

Store Earth coordinates with millimeter precision:

```rust
VsfType::w(WorldCoord::from_lat_lon(47.6062, -122.3321))
```

Uses Fuller's Dymaxion projection - 2.14mm precision in 8 bytes.

### 4. Huffman-Compressed Text

Unicode strings with global frequency table:

```rust
VsfType::x(text)  // Automatically compressed
// ~36% compression on English text
// 83 MB/s encode, 100+ MB/s decode on an average 2025 CPU
```

---

## Comparison with Other Formats

### TIFF
- **Strength**: Widely supported, good for images
- **Limitation**: 4GB file limit (u32 offsets), 12 bytes minimum overhead per tag
- **VSF approach**: Variable-width encoding, no size limits

### PNG
- **Strength**: Lossless compression, ubiquitous
- **Limitation**: 12 bytes per chunk overhead, u32 length limits
- **VSF approach**: Minimal overhead per field, arbitrary sizes

### HDF5
- **Strength**: Hierarchical data, scientific community adoption
- **Limitation**: Complex spec, u64 everywhere wastes space
- **VSF approach**: Optimal size selection, simpler spec

### Protobuf
- **Strength**: Cross-language, schema evolution
- **Limitation**: Varint requires sequential parsing (O(n) skip)
- **VSF approach**: O(1) skip with explicit size markers

### JSON
- **Strength**: Human-readable, debuggable, universal
- **Limitation**: Text encoding bloat, precision loss, no binary data
- **VSF approach**: Binary format, full precision, efficient

---

### Working Now

✅ **Pretty damn complete type system** - 211 variants:
- Primitives: u3-u7 (8-128 bit), i3-i7 (signed), f5-f6 (float), j5-j6 (complex)
- Spirix: 50 types (25 Scalar + 25 Circle varieties)
- Tensors: 130 flavors (65 contiguous + 65 strided)
- Bitpacked: 1-256 bit bins
- Metadata: strings, time, hashes, signatures, keys, MACs

✅ **Encoding/decoding**
- Full round-trip validation
- Variable-length integer encoding
- Big-endian byte order

✅ **Huffman text compression**
- Global Unicode frequency table
- 36% compression typical, not just English
- Low overhead

✅ **Cryptographic support**
- Hash algorithms: BLAKE3 (default), SHA-256, SHA-512
- Signatures: Ed25519, ECDSA-P256, RSA-2048
- Keys: Ed25519, X25519, P-256, RSA-2048
- MACs: HMAC-SHA256/512, Poly1305, BLAKE3-keyed, CMAC
- **Mandatory file hash integrity** - automatic on every build

✅ **Camera RAW builders**
- Extensive metadata support (CFA pattern, black/white levels, etc.)
- Calibration frame hashes with algorithm IDs
- Camera settings (ISO, shutter, aperture, focal length)
- Lens metadata (make, model, focal range, aperture range)

✅ **Builder pattern with dot notation**
- Ergonomic field access: `raw.camera.iso_speed = Some(800.0)`
- Nested builders for organized metadata
- Already implemented and working

✅ **Zero-copy mmap support**
- BitPackedTensor data is raw bytes after header
- Parse `'p' [bit_depth] [ndim] [shapes...]` then mmap the data
- No "unboxed sections" needed - bulk data types are already mmap-able

✅ **Hierarchical field names**
- Section names support dots: `"camera.sensor"`, `"raw.calibration"`, etc.
- Validation enforces clean syntax (no leading/trailing dots, no double dots)
- Already implemented - use `builder.add_section("camera.sensor", items)`
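
The naming rules stated here (no leading dot, no trailing dot, no double dots) reduce to "no empty segment". A sketch of that check, where `is_valid_section_name` is a hypothetical helper and the crate's real validation may enforce more than this:

```rust
/// True when `name` has no leading dot, trailing dot, or double dot.
/// An empty name is also rejected (an assumption of this sketch).
fn is_valid_section_name(name: &str) -> bool {
    // Splitting on '.' yields an empty segment in each of the bad cases,
    // so one check covers all of them.
    name.split('.').all(|segment| !segment.is_empty())
}
```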

✅ **Type-safe schema system**
- Pattern-based TypeConstraint validation (no type system duplication)
- Named field encoding with d-type keys: `(d"field_name":value)`
- Automatic type conversion via IntoVsfType/FromVsfType traits
- Parse → modify → re-encode workflow
- Official schemas: image, camera, audio, network_peer, announce
- See [schema/README.md](src/schema/README.md) for examples

### Coming Next (v0.2.0)

🚧 **Structured capability tokens** - Formal capability types built on existing crypto primitives (`g`, `k`, `h`, `a`)

**Note on File I/O:** VSF gives you bytes - do whatever you want with them:
```rust
let bytes = encode(&my_data)?;
std::fs::write("data.vsf", &bytes)?;  // Or network, database, embedded, etc.
```
File I/O is intentionally out of scope - you know your use case better than we do. Network streaming? Memory-mapped regions? SQLite blobs? Custom compression? VSF has no opinions about your storage layer.

---

## Quick Start

```rust
use vsf::{VsfType, BitPackedTensor, Tensor};

// Store 12-bit camera RAW
let raw = BitPackedTensor::pack(12, vec![4096, 3072], &pixel_data);
let encoded = VsfType::p(raw).flatten();

// Store a tensor (8-bit grayscale image)
let tensor = Tensor::new(vec![1920, 1080], grayscale_data);
let img = VsfType::t_u3(tensor);

// Store text (automatically Huffman compressed)
let doc = VsfType::x("Hello, world!".to_string());

// Store a hash (BLAKE3)
use vsf::crypto_algorithms::HASH_BLAKE3;
let hash = VsfType::h(HASH_BLAKE3, hash_bytes);

// Round-trip (`original` stands for the VsfType value that was flattened into `encoded`)
let decoded = VsfType::parse(&encoded)?;
assert_eq!(original, decoded);
```

### Minimal Camera RAW

```rust
use vsf::builders::build_raw_image;
use vsf::types::BitPackedTensor;

// Just image data - no metadata
let samples: Vec<u64> = vec![2048; 4096 * 3072]; // 12-bit, mid-gray
let image = BitPackedTensor::pack(12, vec![4096, 3072], &samples);

let bytes = build_raw_image(image, None, None, None)?;
// That's it! File includes mandatory BLAKE3 hash automatically
```

### Camera RAW with Full Metadata

```rust
use vsf::builders::build_raw_image;
use vsf::types::BitPackedTensor;
use vsf::crypto_algorithms::HASH_BLAKE3;

// Create image with metadata
let samples: Vec<u64> = vec![2048; 4096 * 3072];
let image = BitPackedTensor::pack(12, vec![4096, 3072], &samples);

let bytes = build_raw_image(
    image, // Only required field
    Some(RawMetadata {
        cfa_pattern: Some(vec![b'R', b'G', b'G', b'B']),
        black_level: Some(64.),
        white_level: Some(4095.),
        // Calibration frame hashes (algorithm + bytes)
        dark_frame_hash: Some((HASH_BLAKE3, dark_hash)),
        flat_field_hash: Some((HASH_BLAKE3, flat_hash)),
        // ...
    }),
    Some(CameraSettings {
        iso_speed: Some(800.),
        shutter_time_s: Some(1./60.),
        aperture_f_number: Some(2.8),
        focal_length_m: Some(0.024),  // 24mm in meters
        // ...
    }),
    Some(LensInfo { /* lens details */ }),
)?;
// File hash computed automatically - no additional steps needed!
```

---

## Data Provenance & Verification

VSF treats integrity verification and data provenance as architectural requirements, not optional add-ons. While other formats bolt on checksums as an afterthought (or skip them entirely), VSF makes verification impossible to ignore.

### Automatic File Integrity

**Every VSF file includes mandatory BLAKE3 verification** - no exceptions, no opt-out:

```rust
// Just build - hash is computed automatically
let bytes = builder.build()?;

// Verify integrity later
verify_file_hash(&bytes)?;  // Returns Ok(()) or Err("corruption detected")
```

**How it works:**
0. Header contains `hb3[32][hash]` placeholder covering entire file
1. `build()` automatically computes BLAKE3 over the complete file (using zero-out procedure)
2. Hash written into placeholder position atomically
3. Parser expects hash field - files without it are invalid VSF
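
The zero-out procedure in these steps can be sketched without any dependencies. Everything below is an assumption made for illustration: `seal`, `verify`, and the slot offset are hypothetical, and `toy_hash` is a non-cryptographic FNV-1a stand-in used in place of BLAKE3 so the example compiles with the standard library alone.

```rust
const HASH_LEN: usize = 32;

/// Toy 32-byte digest built from four seeded FNV-1a passes.
/// NOT cryptographic: it only stands in for BLAKE3 in this sketch.
fn toy_hash(data: &[u8]) -> [u8; HASH_LEN] {
    let mut out = [0u8; HASH_LEN];
    for (lane, chunk) in out.chunks_mut(8).enumerate() {
        let mut h: u64 = 0xcbf2_9ce4_8422_2325 ^ lane as u64;
        for &byte in data {
            h = (h ^ u64::from(byte)).wrapping_mul(0x0000_0100_0000_01b3);
        }
        chunk.copy_from_slice(&h.to_be_bytes());
    }
    out
}

/// Zero the hash slot, hash the whole buffer, then write the digest into the slot.
fn seal(file: &mut [u8], slot: usize) {
    file[slot..slot + HASH_LEN].fill(0);
    let digest = toy_hash(file);
    file[slot..slot + HASH_LEN].copy_from_slice(&digest);
}

/// Recompute the digest over a copy with the slot zeroed and compare.
fn verify(file: &[u8], slot: usize) -> bool {
    let mut copy = file.to_vec();
    copy[slot..slot + HASH_LEN].fill(0);
    toy_hash(&copy)[..] == file[slot..slot + HASH_LEN]
}
```

Because the slot is zeroed before hashing on both sides, the digest covers every byte of the file, including the header that contains it.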

**Why this matters:**
- Can't accidentally ship unverifiable files (hash is mandatory)
- Can't strip verification without breaking the format
- Corruption detected immediately on parse
- Zero-overhead verification (hash computed once during build)

**Performance: BLAKE3 is essentially free**

"But won't hashing everything slow down my writes?" Nope!

BLAKE3 throughput on modern hardware:
- **~3-7 GB/s single-threaded** (faster than most SSDs)
- **~10-15 GB/s multi-threaded** (saturates NVMe drives)
- **SIMD-optimized** (AVX2/AVX-512 on x86, NEON on ARM)

For context, typical hardware limits:
- Consumer SSD: ~500 MB/s (SATA) to ~3 GB/s (NVMe Gen3)
- Enterprise NVMe: ~7 GB/s (Gen4)

**BLAKE3 is faster than your storage.** The hash computation happens while you're waiting for the disk write anyway - literally zero added latency in most cases.

This is similar to Rust's bounds checking: "But won't array bounds checks slow me down?" In practice, the optimizer eliminates most checks, and the remaining ones are drowned out by cache misses. Safety first, performance second - and you get both anyway.

Traditional formats like TIFF and HDF5 make integrity checks optional, if they support them at all, and PNG's per-chunk CRC-32 only catches accidental corruption. VSF makes cryptographic verification unavoidable.

### Cryptographic Types as First-Class Citizens

Hashes, signatures, and keys aren't byte blobs - they're strongly-typed primitives:

```rust
VsfType::h(algorithm, hash)       // Integrity verification
VsfType::g(algorithm, signature)  // Authentication & authorization
VsfType::k(algorithm, pubkey)     // Identity
VsfType::a(algorithm, mac_tag)    // Message authentication
```

**These primitives enable:**
- **File integrity** - Mandatory BLAKE3 hash on every file
- **Data provenance** - Sign sections to establish chain of custody
- **Capability-based security** - Signatures as unforgeable permission tokens
- **Distributed trust** - No central authority required

Algorithm identifiers prevent type confusion - the compiler enforces verification.

**Concrete examples:**
```rust
// Hash - integrity verification
VsfType::h(HASH_BLAKE3, hash_bytes)      // Algorithm ID prevents confusion
VsfType::h(HASH_SHA256, sha256_bytes)    // Type system enforces verification

// Signature - authentication and non-repudiation
VsfType::g(SIG_ED25519, signature)       // 64 bytes, Ed25519
VsfType::g(SIG_ECDSA_P256, signature)    // NIST P-256

// Public key - identity
VsfType::k(KEY_ED25519, pubkey)          // 32 bytes
VsfType::k(KEY_RSA_2048, pubkey)         // 256 bytes

// MAC - message authentication
VsfType::a(MAC_HMAC_SHA256, mac_tag)     // 32 bytes
```

**Algorithm identifiers prevent type confusion attacks:**
- Can't substitute SHA-256 hash where BLAKE3 expected
- Compiler enforces signature verification before payload access
- "Forgot to verify signature" becomes a compile error

**Supported algorithms:**
- **Hashes**: BLAKE3, SHA-256, SHA-512, SHA3-256, SHA3-512
- **Signatures**: Ed25519, ECDSA-P256, RSA-2048/3072/4096
- **Keys**: Ed25519, X25519, P-256, P-384, RSA
- **MACs**: HMAC-SHA256/512, Poly1305, BLAKE3-keyed, CMAC-AES

### Per-Section Provenance

Lock specific sections with signatures while allowing other sections to be modified freely:

```rust
// Camera signs RAW sensor data at capture
let bytes = raw.build()?;
let bytes = sign_section(bytes, "raw", &camera_private_key)?;

// Later: Add thumbnail without breaking RAW signature
builder.add_section("thumbnail", thumbnail_data);

// Later: Add EXIF metadata without breaking RAW signature
builder.add_section("exif", exif_data);

// Verify original RAW data is untouched
verify_section_signature(&bytes, "raw", &camera_public_key)?;
```

**Use cases:**
- **Forensic photography**: Camera signs RAW at capture, establishes chain of custody. Lab adds analysis metadata without invalidating signature.
- **Scientific instruments**: Sensor signs measurement data. Researchers annotate results without compromising provenance.
- **Medical imaging**: Scanner signs DICOM data. Radiologist adds diagnosis without altering signed pixels.
- **Legal documents**: Notary signs document hash. Clerk adds filing metadata without breaking signature.

**Why per-section matters:**

Traditional whole-file signatures break on any modification - even benign metadata updates. VSF's per-section signatures enable:
- **Immutable provenance** for critical data (sensor readings, RAW pixels)
- **Flexible metadata** that doesn't require re-signing
- **Layered trust** - multiple parties can sign different sections

### Calibration Frame Verification

Camera RAW files reference external calibration frames (dark, flat, bias). VSF embeds cryptographic hashes to verify frame integrity:

```rust
RawMetadata {
    // Each hash includes algorithm ID + hash bytes
    dark_frame_hash: Some((HASH_BLAKE3, dark_hash)),
    flat_field_hash: Some((HASH_BLAKE3, flat_hash)),
    bias_frame_hash: Some((HASH_BLAKE3, bias_hash)),
    vignette_correction_hash: Some((HASH_BLAKE3, vignette_hash)),
    distortion_correction_hash: Some((HASH_BLAKE3, distortion_hash)),
}
```

**Why this matters:**
- Prevents using wrong calibration frames (would corrupt image)
- Detects calibration frame corruption before processing
- Enables distributed workflows (send RAW + hashes, verify calibration locally)
- Supports multiple hash algorithms (BLAKE3 today, post-quantum tomorrow)

### Why VSF Is Different

**Other formats treat verification as optional or external:**

| Format | File Integrity | Cryptographic Types | Per-Section Signing |
|--------|---------------|---------------------|---------------------|
| TIFF | ❌ None | ❌ Byte blobs | ❌ Not supported |
| PNG | ⚠️ Per-chunk CRC-32 (not cryptographic) | ❌ Byte blobs | ❌ Not supported |
| HDF5 | ⚠️ Optional checksums | ❌ Byte blobs | ❌ Not supported |
| JPEG | ❌ None | ❌ Byte blobs | ❌ Not supported |
| Protobuf | ❌ None | ❌ Byte blobs | ❌ Not supported |
| **VSF** |**Mandatory BLAKE3** |**First-class types** |**Built-in support** |

**VSF makes data provenance impossible to ignore:**
0. **Can't create unverifiable files** - hash is computed automatically
1. **Can't strip verification** - removing the hash field breaks the file structure
2. **Can't ignore signatures** - type system enforces verification
3. **Can't use wrong algorithm** - algorithm ID embedded in type

For systems where data integrity matters - forensic photography, scientific measurements, medical imaging, financial records, legal documents - VSF provides cryptographic guarantees from the ground up, not as a retrofit.

### Verification is O(1) Skip-able

Despite mandatory hashing, VSF maintains O(1) seek performance:
- Hash stored in header (known location)
- Sections have byte offsets in header
- Can skip to any section without reading others
- Verify only sections you care about

**Traditional formats force a choice:** fast seeking OR integrity checks. VSF gives you both.

### Toward Capability-Based Security

VSF's cryptographic types aren't just for verification - they're the foundation for capability-based permissions:

**Traditional ACLs** (what UNIX does):
- File metadata: "user alice can read, group lab can write"
- Requires central identity database (UID/GID)
- Doesn't work in distributed systems

**Capabilities** (what VSF enables):
- Signature: "holder of this token can read file with hash 0xABCD..."
- Self-contained, unforgeable, delegatable
- Works across distributed systems with TOKEN identities

```rust
// v0.2+ will enable:
let capability = Capability {
    resource: VsfType::h(HASH_BLAKE3, file_hash),
    permission: "read",
    granted_to: VsfType::k(KEY_ED25519, editor_pubkey),
    granted_by: VsfType::k(KEY_ED25519, camera_pubkey),
    expires: EtType::f6(eagle_time + thirty_days),  // thirty_days: a 30-day duration
    location: WorldCoord::from_lat_lon(47.6062, -122.3321),
};

let signed_cap = VsfType::g(SIG_ED25519, sign(&capability, camera_private_key));

// Signature proves: camera granted permission to editor
// No central authority needed - crypto proves authorization
// Capability is self-contained, unforgeable, delegatable
```

VSF v0.1 provides the cryptographic primitives (`g`, `k`, `h`, `a`). v0.2 will add structured capability types built on these foundations.

---

## Design Principles

### 0. Information-Theoretic Optimality

Variable-width encoding that's provably optimal for byte-aligned systems. Small numbers use small encodings, large numbers use large encodings.

### 1. Type Safety

211 strongly-typed variants with complete pattern matching. Compiler verifies every case is handled.

### 2. Mathematical Correctness

Integrates Spirix for arithmetic that preserves mathematical identities. Eagle Time for physics-bounded timestamps.

### 3. Cryptographic Foundation

Signatures, hashes, keys, and MACs as first-class types, not afterthoughts.

### 4. Self-Describing

Each value includes its type information. Files can be parsed without external schema.

---

## Use Cases

### Genomics & Bioinformatics
- DNA sequencing quality scores (Phred) use 6 bits but get stored in 8-bit ASCII. A human genome (3 billion bases) wastes 750MB on padding. VSF bitpacking eliminates this overhead while embedding cryptographic signatures to verify data provenance.

### Financial Systems & Audit Trails
- Currency amounts need exact arithmetic, and IEEE-754 binary floats cannot provide it (0.1 + 0.2 ≠ 0.3). Stored as integer cents, VSF's variable-width integers encode $0.42 in 3 bytes and $1,234,567.89 in 6 bytes (2 marker bytes + a 32-bit payload). HMAC tags verify transaction integrity without external databases.
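
A short sketch of why integer cents are preferred over binary floats for money. The helper names are hypothetical and nothing here depends on VSF:

```rust
/// Does the f64 sum 0.1 + 0.2 compare equal to 0.3?
fn float_sum_is_exact() -> bool {
    0.1_f64 + 0.2_f64 == 0.3_f64
}

/// Convert a dollars-and-cents amount to an exact integer count of cents.
fn to_cents(dollars: u64, cents: u8) -> u64 {
    dollars * 100 + u64::from(cents)
}
```

`to_cents(1_234_567, 89)` gives 123,456,789, which fits in 32 bits, so the amount is stored exactly in a 4-byte payload.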

### Geospatial Systems & Navigation
- GPS coordinates as IEEE doubles use 16 bytes for precision you don't need. Dymaxion WorldCoord provides 2.14mm accuracy in 8 bytes. Useful for drone navigation, autonomous vehicles, and surveying equipment.

### Game Development & Asset Pipelines
- Animation data has mixed precision requirements: keyframe times (16-bit), quaternions (32-bit), visibility flags (1-bit). VSF bitpacked tensors let each channel use its natural width. A 10-minute mocap recording: 45MB → 18MB.

### Machine Learning & Model Distribution
- Quantized neural networks use 4-bit or 8-bit weights. Formats that store each weight in a 32-bit slot waste 4-8x. A 4-bit quantized LLaMA-7B: 3.5GB actual, 28GB at 32 bits per weight (14GB at 16). VSF maintains 3.5GB while embedding model signatures.

### Scientific Data Archival
- Particle physics experiments produce petabytes with heterogeneous precision: detector IDs (16-bit), energies (32-bit), timestamps (64-bit). VSF selects optimal encoding per field. Spirix prevents IEEE-754 underflow in long-running cumulative calculations.

### Web3 & Decentralized Identity
- Blockchain transactions contain signatures, typically stored as untyped byte arrays. VSF signatures are first-class types - the compiler enforces verification before payload access. "Forgot to verify signature" becomes a compile error.

### Embedded Systems & IoT Telemetry
- Satellite sensors transmit over power/bandwidth-constrained RF links. Temperature sensors: 12-bit, accelerometers: 10-bit. Storing as 16-bit wastes 25-37.5% per reading. VSF optimizes automatically. Eagle Time anchors timestamps to locality, eliminating clock drift.

---

## Technical Details

### Variable-Length Integer Encoding

See "Core Innovation: Exponential-Width Integer Encoding" section above for complete details.

**Quick summary:**
- Type marker (`u`, `i`, etc.) + size marker (ASCII '3'-'Z')
- 2 bytes overhead, O(1) skip
- Extends from 8 bits to 8 GB (2^36 bits max)

### Bitpacked Tensor Format

```
'p' marker (1 byte)
ndim (variable-length)
bit_depth (1 byte: 0x0C for 12-bit, 0x00 for 256-bit)
shape dimensions (each variable-length encoded)
packed data (bits packed into bytes, MSB-first)
```

Efficient for non-standard bit depths common in sensors and quantized ML models.
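
The "packed data" line of the layout can be sketched as follows. This covers only the payload, not the `'p'` header fields, and makes a few assumptions: samples arrive as `u64` (as in the Quick Start examples), bit depths are limited to 1-64 here, and the function names are hypothetical rather than the crate's `BitPackedTensor::pack`.

```rust
/// Pack the low `bit_depth` bits of each sample, MSB-first, with no padding
/// between samples. Bits above `bit_depth` are ignored.
fn pack_msb_first(samples: &[u64], bit_depth: u32) -> Vec<u8> {
    assert!((1..=64).contains(&bit_depth), "this sketch handles 1 to 64 bits");
    let total_bits = samples.len() * bit_depth as usize;
    let mut out = vec![0u8; (total_bits + 7) / 8];
    let mut bit_pos = 0usize;
    for &sample in samples {
        for i in (0..bit_depth).rev() {
            if (sample >> i) & 1 == 1 {
                out[bit_pos / 8] |= 0x80 >> (bit_pos % 8);
            }
            bit_pos += 1;
        }
    }
    out
}

/// Inverse of `pack_msb_first` for `count` samples.
fn unpack_msb_first(bytes: &[u8], bit_depth: u32, count: usize) -> Vec<u64> {
    let mut bit_pos = 0usize;
    (0..count)
        .map(|_| {
            let mut value = 0u64;
            for _ in 0..bit_depth {
                let bit = (bytes[bit_pos / 8] >> (7 - bit_pos % 8)) & 1;
                value = (value << 1) | u64::from(bit);
                bit_pos += 1;
            }
            value
        })
        .collect()
}
```

Two 12-bit samples `0xABC` and `0x123` pack into the three bytes `AB C1 23`, and 1,000 such samples take 1,500 bytes instead of 2,000 as 16-bit values.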

---

## Context

VSF is part of a broader computational foundation:

- **Spirix** - Better floating point arithmetic
- **TOKEN** - Unfakeable cryptographic identity
- **VSF** - Optimal serialization
- **Eagle Time** - Physics-bounded consensus timestamps
- **Dymaxion Encoding** - Global precision of 2.14mm avg, 5.07mm max in 64 bits.

Each component addresses fundamental problems that have irritated me for a minute now.

---

## Contributing

VSF is in active development. Core encoding/decoding is stable.

---

## License

Custom open-source:
- ✅ Free for any purpose (including commercial)
- ✅ Modify and distribute freely
- ✅ Patent grant included
- ❌ Cannot sell VSF itself as a standalone product

See LICENSE for full terms.

---

## Summary

VSF solves the universal integer encoding problem thru exponential-width encoding with explicit size markers. This enables:

- **Optimal space usage** - 40-70% savings on typical data
- **Literally no size limits** - Can encode arbitrarily large values
- **O(1) skip** - Fast random access without parsing
- **Type safety** - Compiler-verified exhaustive handling

If you need efficient encoding of varied-size integers, bitpacked tensors, or cryptographic primitives with perfect type safety, VSF is your only option!

---

*Written in Rust with ZERO wildcards.*