casq
A content-addressed file store CLI with compression and chunking (v0.4.0).
This is alpha-level software.
Overview
casq is a command-line tool for managing content-addressed storage. It stores files and directories by their cryptographic hash, providing automatic deduplication, transparent compression, content-defined chunking, garbage collection, and named references.
This is the CLI binary that uses the casq_core library.
Installation
# Build from source
# The binary will be at target/release/casq
# Install it by
# Optionally, copy to your PATH
Quick Start
# Initialize a new store
# Add files or directories
# Add content from stdin (pipe data directly) - in most cases you want to use --reference
# Add with a named reference
# Discover what content you have
# List tree contents (requires hash)
# Output blob content
# Show object metadata
# Materialize (restore) to filesystem
# Garbage collect unreferenced objects
Commands
casq initialize
Initialize a new content-addressed store.
Creates the store directory structure at the configured root (default: ./casq-store).
casq put <PATH> or casq put -
Add files, directories, or stdin content to the store.
<PATH> Path to a file or directory, or - to read from stdin
Examples:
# Add a single file
# Add a directory
# Add with a reference
# Add from stdin
The command outputs the hash of the added object. Directories are added recursively and stored as tree objects (and the returned hash is that of the tree itself). Stdin content is stored as a blob.
Important notes:
- Output format on stdout is:
<hash>
casq materialize <HASH> <DEST>
Materialize (restore) an object from the store to the filesystem.
<HASH> Hash
<DEST> Destination path (must not already exist)
Examples:
# Restore a directory
# Restore a file
casq get <HASH>
Output blob content to stdout.
<HASH> Hash
Examples:
# View a text file
# Pipe to another command
# Save to a file
casq list <HASH>
List tree contents or show blob info.
<HASH> Hash of a tree or blob
--long Show type codes, modes, and hashes
Examples:
# List directory contents
# Show detailed listing with modes and hashes
# Output format (short):
# filename.txt
# subdir
# Output format (--long):
# b 100644 <hash> filename.txt
# t 040755 <hash> subdir
Type codes: b = blob (file), t = tree (directory)
Tip: Use casq references list to discover content, then casq list <hash> to explore it.
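In text mode casq writes its data to stderr (see Output Streams). A script that wants the `--long` fields therefore has to capture that stream and split each line itself. The following is a minimal sketch. It assumes the `<type> <mode> <hash> <name>` layout shown above stays stable. `parse_long_listing` is a hypothetical helper, not part of casq:

```bash
#!/usr/bin/env bash
# parse_long_listing: hypothetical helper, not part of casq.
# Reads lines in the documented `casq list --long` layout
#   <type> <mode> <hash> <name>
# and prints tab-separated "<kind> <mode> <name> <hash>" records.
parse_long_listing() {
  local type mode hash name kind
  # read assigns the rest of the line to the last variable,
  # so file names containing spaces stay intact.
  while read -r type mode hash name; do
    case "$type" in
      b) kind="file" ;;    # blob
      t) kind="dir" ;;     # tree
      *) kind="unknown" ;;
    esac
    printf '%s\t%s\t%s\t%s\n' "$kind" "$mode" "$name" "$hash"
  done
}
```

To feed it, merge stderr into the pipe with something like `casq list --long "$HASH" 2>&1 | parse_long_listing`. Note that this also merges any error messages into the output.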
casq metadata <HASH>
Show detailed metadata about an object.
<HASH> Hash
Example output:
Hash: abc123...
Type: tree
Entries: 5
Size: 320 bytes (on disk)
Path: ./casq-store/objects/blake3-256/ab/c123...
casq collect-garbage
Garbage collect unreferenced objects.
Examples:
# Preview what would be deleted
# Actually delete unreferenced objects
Walks from all named references and deletes objects that are no longer reachable.
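The reachability walk described above can be modeled as a mark-and-sweep over the object graph. The sketch below runs it on a toy store that is NOT casq's on-disk format. In the toy store:

- each object is a file under objects/
- a tree lists its children as `child <id>` lines
- each file under refs/ holds one root id

`toy_gc` is hypothetical, and its `preview` argument is a stand-in rather than a casq flag:

```bash
#!/usr/bin/env bash
# toy_gc STORE [preview]: hypothetical mark-and-sweep over a TOY layout:
#   <store>/refs/<name>    one object id per file (the GC roots)
#   <store>/objects/<id>   object file; "child <id>" lines point to children
# With "preview", unreachable ids are listed but not deleted.
# Requires bash 4+ for associative arrays.
toy_gc() {
  local store="$1" mode="${2:-}"
  local -A reachable=()
  local -a queue=()
  local ref id word child obj

  # Mark phase: seed the queue with every reference, then walk children.
  for ref in "$store"/refs/*; do
    [[ -f "$ref" ]] && queue+=("$(<"$ref")")
  done
  while (( ${#queue[@]} > 0 )); do
    id="${queue[0]}"
    queue=("${queue[@]:1}")
    [[ -n "${reachable[$id]:-}" ]] && continue   # already marked
    [[ -f "$store/objects/$id" ]] || continue     # dangling id: skip
    reachable[$id]=1
    while read -r word child || [[ -n "$word" ]]; do
      [[ "$word" == "child" ]] && queue+=("$child")
    done < "$store/objects/$id"
  done

  # Sweep phase: anything not marked is unreachable.
  for obj in "$store"/objects/*; do
    [[ -f "$obj" ]] || continue
    id="${obj##*/}"
    if [[ -z "${reachable[$id]:-}" ]]; then
      echo "unreachable: $id"
      [[ "$mode" == "preview" ]] || rm -- "$obj"
    fi
  done
}
```

Because marking starts only from refs/, removing a reference is what makes its objects eligible for deletion.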
casq references add <NAME> <HASH>
Add a named reference to an object.
<NAME> Reference
<HASH> Hash
Examples:
References act as GC roots - objects reachable from references won't be deleted by collect-garbage.
casq references list
List all references.
Example output:
backup-2024 -> abc123...
important -> def456...
casq references remove <NAME>
Remove a reference.
<NAME> Reference
Example:
Global Options
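All commands support these global options:
- --root <DIR> - Store root directory (overrides CASQ_ROOT)
- --json - Machine-readable JSON output on stdout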
Store Root Priority
The store root is determined in this order:
1. --root CLI argument
2. CASQ_ROOT environment variable
3. ./casq-store (default)
Examples:
# Use explicit root
# Use environment variable
# Use default (./casq-store)
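The lookup order can be written as a small shell function. This is useful when a script needs to know which store a casq invocation will touch, for example to check that it exists before running casq initialize. The following is a minimal sketch. `resolve_casq_root` is a hypothetical name. Treating an empty CASQ_ROOT as unset is an assumption about casq's behavior:

```bash
#!/usr/bin/env bash
# resolve_casq_root [ROOT]: hypothetical helper mirroring the priority above
#   1. --root value (passed here as $1)
#   2. CASQ_ROOT environment variable
#   3. ./casq-store
# ASSUMPTION: an empty CASQ_ROOT is treated as unset.
resolve_casq_root() {
  local cli_root="${1:-}"
  if [[ -n "$cli_root" ]]; then
    printf '%s\n' "$cli_root"
  elif [[ -n "${CASQ_ROOT:-}" ]]; then
    printf '%s\n' "$CASQ_ROOT"
  else
    printf '%s\n' "./casq-store"
  fi
}
```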
Output Streams
casq follows Unix conventions for output streams to enable proper pipeline usage:
Text mode (default):
- All informational messages, confirmations, and data → stderr
- stdout is empty (except for casq get, which writes blob content to stdout)
- This allows reliable pipeline usage and scripting
JSON mode (--json flag):
- All structured output → stdout
- Errors → stderr (as JSON)
- Designed for parsing and automation
Examples:
# Text mode - output on stderr
# Suppress informational messages
# JSON mode - output on stdout (recommended for scripts)
For scripting: Use --json flag for reliable, machine-readable output.
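Text mode writes its messages to stderr, so capturing them in a variable needs a redirect. The sketch below is a generic shell idiom rather than a casq feature. `stderr_of` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# stderr_of CMD [ARGS...]: run a command and print only what it wrote to
# stderr. Order matters: 2>&1 first points stderr at the current stdout
# (e.g. a $(...) capture), then >/dev/null discards the original stdout.
# The command's exit status is returned unchanged.
stderr_of() {
  "$@" 2>&1 >/dev/null
}
```

For example, `msg=$(stderr_of casq put notes.txt)` captures the text-mode message, and `$?` still reflects casq's exit code.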
Typical Workflows
Backup Workflow
# Initialize store
# Create initial backup
# Add more data later
# List all backups
# Restore a backup
# Clean up old backups
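For the cleanup step, a script has to decide which references to drop before running casq collect-garbage. The following is a minimal sketch, under two assumptions:

- references are named backup-YYYY-MM-DD, so lexical order is date order
- input uses the `name -> hash` format shown under casq references list

`old_backups` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# old_backups KEEP: read "name -> hash" lines on stdin and print the
# backup-* reference names to delete, keeping the KEEP newest.
# ASSUMES names look like backup-YYYY-MM-DD, so sorting is chronological.
old_backups() {
  local keep="$1"
  awk '{ print $1 }' |        # keep only the reference name
    grep '^backup-' |         # ignore non-backup references
    sort -r |                 # newest first
    tail -n "+$((keep + 1))"  # skip the KEEP newest
}
```

Something like `casq references list 2>&1 | old_backups 3 | xargs -n1 casq references remove` followed by `casq collect-garbage` would then drop the older references and reclaim the space they held.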
Deduplication Example
# Add the same file twice
# Output: abc123...
# Output: abc123... (same hash - deduplicated!)
# Only one copy stored internally
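Deduplication falls out of content addressing. An object's location is derived from a hash of its bytes, so a second copy of the same content maps to a path that already exists. The toy model below illustrates the idea and is NOT casq's implementation:

- it uses SHA-256 from GNU coreutils (`sha256sum`) as a stand-in for BLAKE3
- it skips compression and chunking

`toy_put` is hypothetical:

```bash
#!/usr/bin/env bash
# toy_put STORE FILE: store FILE under a path derived from its hash and
# print the hash. Toy model only: SHA-256 (coreutils) stands in for BLAKE3,
# and there is no compression or chunking.
toy_put() {
  local store="$1" file="$2" hash dir
  # Hash via stdin so the output is always "<hash>  -".
  hash=$(sha256sum < "$file" | cut -d' ' -f1)
  dir="$store/objects/${hash:0:2}"      # shard by the first 2 hex chars
  mkdir -p "$dir"
  # Identical content hashes to the same path, so the copy is skipped.
  if [[ ! -e "$dir/${hash:2}" ]]; then
    cp -- "$file" "$dir/${hash:2}"
  fi
  printf '%s\n' "$hash"
}
```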
Exploring Content
# Add a directory with a reference
# Discover what's in your store
# Output: current-work -> abc123...
# Explore the tree
# Look at a specific file
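To go from a reference name to its hash in a text-mode script, you can match the `name -> hash` lines printed by casq references list directly. The following is a minimal sketch. `ref_hash` is a hypothetical helper. For anything beyond quick scripts, the --json output is the more robust choice:

```bash
#!/usr/bin/env bash
# ref_hash NAME: read "name -> hash" lines on stdin and print the hash
# for NAME. Returns 1 if no reference with that name is found.
ref_hash() {
  local want="$1" name arrow hash
  while read -r name arrow hash; do
    if [[ "$name" == "$want" && "$arrow" == "->" ]]; then
      printf '%s\n' "$hash"
      return 0
    fi
  done
  return 1
}
```

For example: `HASH=$(casq references list 2>&1 | ref_hash current-work) && casq list "$HASH"`.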
Store Structure
casq-store/
├── config # Store configuration
├── objects/
│ └── blake3-256/ # Algorithm-specific directory
│ ├── ab/ # Shard directory (first 2 hex chars)
│ │ └── cd...ef # Object file (remaining 62 hex chars)
│ └── ...
└── refs/ # Named references
├── backup-2024
└── important
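The shard layout maps a hash to a file path by splitting off the first two hex characters. The sketch below shows that mapping, assuming lowercase 64-character BLAKE3-256 hex hashes as in the tree above. `object_path` is a hypothetical helper. casq metadata already reports this path, so the helper mainly shows how the layout works:

```bash
#!/usr/bin/env bash
# object_path ROOT HASH: print where a blake3-256 object would live,
# following the layout above: objects/blake3-256/<first 2>/<remaining 62>.
# ASSUMES lowercase hex; anything else is rejected.
object_path() {
  local root="$1" hash="$2"
  if [[ ! "$hash" =~ ^[0-9a-f]{64}$ ]]; then
    echo "object_path: expected 64 lowercase hex chars" >&2
    return 1
  fi
  printf '%s/objects/blake3-256/%s/%s\n' "$root" "${hash:0:2}" "${hash:2}"
}
```

Sharding on two hex characters spreads objects across at most 256 subdirectories, which keeps any single directory from growing very large.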
Object Types
- Blob - Raw file content (automatically compressed if ≥ 4KB)
- Tree - Directory listing (sorted entries)
- ChunkList - Large file split into chunks (files ≥ 1MB, enables incremental backups)
Trees reference other blobs and trees, forming a hierarchical structure similar to git. Large files are split into chunks for efficient incremental backups and cross-file deduplication.
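The size thresholds above can be summarized as a small decision function. This is a sketch only. It assumes 4KB means 4096 bytes and 1MB means 1048576 bytes, which the README does not pin down. `expected_object_kind` is hypothetical, and casq's exact boundary handling may differ:

```bash
#!/usr/bin/env bash
# expected_object_kind SIZE: which object type the thresholds above suggest
# for a file of SIZE bytes.
# ASSUMPTIONS: 4KB = 4096 bytes, 1MB = 1048576 bytes.
expected_object_kind() {
  local size="$1"
  if (( size >= 1048576 )); then
    echo "chunklist"            # split into content-defined chunks
  elif (( size >= 4096 )); then
    echo "blob (compressed)"    # zstd-compressed blob
  else
    echo "blob"                 # stored as-is
  fi
}
```

For a real file, `expected_object_kind "$(wc -c < big.iso)"` passes its byte count.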
Exit Codes
- 0 - Success
- 1 - Error (with a descriptive message on stderr)
Environment Variables
- CASQ_ROOT - Default store root directory
Error Handling
All commands print a clear, descriptive error message to stderr and exit with code 1 on failure.
Performance Tips
- Large files - Content is streamed, not buffered in memory
- Many small files - Use directories to group them
- Deduplication - Identical content is stored only once (including chunk-level deduplication)
- Compression - Files ≥ 4KB are automatically compressed with zstd (typically 3-5x smaller for text, 2-3x for mixed data)
- Chunking - Files ≥ 1MB split into chunks for incremental backups (change 1 byte → store ~512KB)
- GC frequency - Run casq collect-garbage periodically to reclaim space from unreferenced objects
Storage Efficiency (v0.4.0+)
- Compression: 3-5x reduction for text files, 2-3x for mixed data
- Chunking: Changing 1 byte in a 1GB file stores only ~512KB (the changed chunk)
- Cross-file deduplication: Shared content across files stored only once
- Example: 10 files with identical 5MB section = 5MB stored (not 50MB)
JSON Output
All commands support the --json flag for machine-readable output, enabling scripting and automation.
Basic Usage
# Get JSON output from any command
# Pipe through jq for processing
Standard Response Format
All JSON responses include these standard fields:
- success (boolean) - Whether the operation succeeded
- result_code (number) - Exit code (0 for success, non-zero for errors)
Command-Specific Outputs
initialize
put
references list
list <hash> (tree contents)
metadata <hash> (blob)
collect-garbage
find-orphans
Error Response (stderr)
Scripting Examples
# Extract hash from put operation
# Count orphaned objects
# List all reference names
# Get GC stats
# Check if operation succeeded
Exit Codes
Program exit codes match the result_code field in JSON output:
- 0 - Success
- 1 - Error (details in the error field for JSON, or on stderr for text)
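The exit code mirrors result_code, so a script can rely on the exit status alone when it only needs pass/fail. The following is a minimal sketch of a wrapper that reports failures. `run_or_report` is hypothetical and works with any command, casq included:

```bash
#!/usr/bin/env bash
# run_or_report CMD [ARGS...]: run a command, print a short note on stderr
# if it fails, and return the command's own exit code unchanged.
run_or_report() {
  local status=0
  "$@" || status=$?
  if (( status != 0 )); then
    echo "failed (exit $status): $*" >&2
  fi
  return "$status"
}
```

For example: `run_or_report casq materialize "$HASH" ./restore || exit 1`.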
Binary Data Limitation
The get command outputs binary data to stdout and cannot be used with --json. Use materialize or metadata instead:
# This will error with JSON
# Use these alternatives
Limitations
- No encryption - Stores plaintext only
- No network - Local-only storage
- No parallel operations - Single-threaded (may be added in the future)
- POSIX only - Full permission preservation only on Unix-like systems
Comparison to Git
| Feature | casq | Git |
|---|---|---|
| Content addressing | ✓ | ✓ |
| Deduplication | ✓ | ✓ |
| Trees/Blobs | ✓ | ✓ |
| Hash algorithm | BLAKE3 | SHA-1/SHA-256 |
| Commits | ✗ | ✓ |
| Branches | ✗ | ✓ |
| Diffs | ✗ | ✓ |
| Network | ✗ | ✓ |
| Use case | File storage | Version control |
casq is simpler than Git: it is content-addressed storage without the version control features.
Troubleshooting
Store not found
# Solution: Initialize the store first
Object not found
# Solution: Verify the hash is correct
Path already exists
# Solution: Remove the destination first or use a different path
Development
# Run from source
# Build optimized binary
# Run tests
# Format code
# Lint
License
Apache-2.0
See Also
- casq_core - The library powering this CLI
- NOTES.md - Design and specification
- CLAUDE.md - Development guidelines