# Hashing Support
base-d has built-in support for several families of hash algorithms: cryptographic hashes (RustCrypto), CRC checksums, and xxHash for non-cryptographic use cases. All algorithms are implemented in pure Rust, with no C dependencies and no OpenSSL linking.
## Supported Algorithms
### Cryptographic Hashes
| Algorithm | Flag | Output Size | Use Case |
|-----------|------|-------------|----------|
| **MD5** | `--hash md5` | 128 bits (16 bytes) | Legacy compatibility (NOT secure) |
| **SHA-224** | `--hash sha224` | 224 bits (28 bytes) | Truncated SHA-256 |
| **SHA-256** | `--hash sha256` | 256 bits (32 bytes) | General purpose, widely supported |
| **SHA-384** | `--hash sha384` | 384 bits (48 bytes) | Truncated SHA-512 |
| **SHA-512** | `--hash sha512` | 512 bits (64 bytes) | High security, large digests |
| **SHA3-224** | `--hash sha3-224` | 224 bits (28 bytes) | Modern NIST standard |
| **SHA3-256** | `--hash sha3-256` | 256 bits (32 bytes) | Modern NIST standard |
| **SHA3-384** | `--hash sha3-384` | 384 bits (48 bytes) | Modern NIST standard |
| **SHA3-512** | `--hash sha3-512` | 512 bits (64 bytes) | Modern NIST standard |
| **Keccak-224** | `--hash keccak224` | 224 bits (28 bytes) | Ethereum (pre-standardization) |
| **Keccak-256** | `--hash keccak256` | 256 bits (32 bytes) | Ethereum, blockchain |
| **Keccak-384** | `--hash keccak384` | 384 bits (48 bytes) | Ethereum variant |
| **Keccak-512** | `--hash keccak512` | 512 bits (64 bytes) | Ethereum variant |
| **BLAKE2b** | `--hash blake2b` | 512 bits (64 bytes) | Fast on 64-bit platforms, high security |
| **BLAKE2s** | `--hash blake2s` | 256 bits (32 bytes) | Fast, optimized for 32-bit |
| **BLAKE3** | `--hash blake3` | 256 bits (32 bytes) | Fastest modern hash |
| **Ascon** | `--hash ascon` | 256 bits (32 bytes) | Lightweight, IoT/embedded |
| **KangarooTwelve** | `--hash k12` | 256 bits (32 bytes) | High-speed, XOF capability |
### CRC Checksums (Non-Cryptographic)
| Algorithm | Flag | Output Size | Use Case |
|-----------|------|-------------|----------|
| **CRC-16** | `--hash crc16` | 16 bits (2 bytes) | Simple error detection |
| **CRC-32** | `--hash crc32` | 32 bits (4 bytes) | ZIP, Ethernet, PNG files |
| **CRC-32C** | `--hash crc32c` | 32 bits (4 bytes) | iSCSI, Btrfs (Castagnoli) |
| **CRC-64** | `--hash crc64` | 64 bits (8 bytes) | Large data integrity |
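
CRC-32 and CRC-32C share the same structure and differ only in the generator polynomial. The sketch below is a deliberately slow, bit-at-a-time version meant to show that relationship; it is not how base-d or the `crc` crate computes checksums, and the function and constant names are made up for this example.

```rust
/// Bit-at-a-time reflected CRC-32 with an initial value and final XOR of 0xFFFF_FFFF.
/// `poly` is the bit-reversed generator polynomial.
fn crc32_reflected(data: &[u8], poly: u32) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // `mask` is all ones when the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (poly & mask);
        }
    }
    !crc
}

/// Polynomial used by ZIP, Ethernet, and PNG (CRC-32).
const CRC32_IEEE: u32 = 0xEDB8_8320;
/// Castagnoli polynomial used by iSCSI and Btrfs (CRC-32C).
const CRC32_CASTAGNOLI: u32 = 0x82F6_3B78;
```

Calling `crc32_reflected(b"123456789", CRC32_IEEE)` and `crc32_reflected(b"123456789", CRC32_CASTAGNOLI)` gives the conventional check values for the two variants, which is a quick way to confirm which one a tool implements.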
### xxHash (Ultra-Fast Non-Cryptographic)
| Algorithm | Flag | Output Size | Use Case |
|-----------|------|-------------|----------|
| **xxHash32** | `--hash xxhash32` | 32 bits (4 bytes) | Hash tables, cache keys |
| **xxHash64** | `--hash xxhash64` | 64 bits (8 bytes) | Fast checksums, deduplication |
| **xxHash3-64** | `--hash xxhash3` | 64 bits (8 bytes) | Newest, fastest 64-bit hash |
| **xxHash3-128** | `--hash xxhash3-128` | 128 bits (16 bytes) | Newest, 128-bit output |
## Basic Usage
### Compute Hash (Hex Output)
```bash
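# SHA-256 hash (default hex output)
echo "data" | base-d --hash sha256
# MD5 hash
echo "data" | base-d --hash md5
# BLAKE3 hash (fastest)
echo "data" | base-d --hash blake3
# CRC32 checksum
echo "data" | base-d --hash crc32
# xxHash64 (ultra-fast non-cryptographic)
echo "data" | base-d --hash xxhash64
# xxHash3 (newest, fastest)
echo "data" | base-d --hash xxhash3
# Ascon (lightweight for IoT/embedded)
echo "data" | base-d --hash ascon
# KangarooTwelve (high-speed with XOF)
echo "data" | base-d --hash k12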
```
### xxHash Configuration
xxHash algorithms support customization through seed values and secrets (for XXH3 variants). These options allow you to generate different hash outputs from the same input, which is useful for hash tables, distributed systems, and avoiding hash flooding attacks.
#### Seed Configuration
All xxHash algorithms (xxHash32, xxHash64, XXH3-64, XXH3-128) accept a seed, given as a 64-bit unsigned integer, that changes the hash output. A seed lets you:
- Generate different hash values for the same data
- Decorrelate hash values across nodes or partitions in distributed systems
- Make hash flooding attacks harder, provided the seed is kept private
- Create domain-specific hash functions
**CLI Usage:**
```bash
# Default seed (0) - backward compatible
echo "data" | base-d --hash xxhash64
# Custom seed via CLI flag
echo "data" | base-d --hash xxhash64 --hash-seed 42
# Works with all xxHash variants
echo "data" | base-d --hash xxhash3-128 --hash-seed 999
```
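
To make the effect of a seed concrete, here is a from-scratch sketch of the 32-bit xxHash algorithm (XXH32), written from the published xxHash specification. It is for illustration only: base-d relies on the `twox-hash` crate rather than code like this, and XXH32 natively takes a 32-bit seed, so how base-d maps a 64-bit `--hash-seed` onto xxHash32 is not shown here.

```rust
const P1: u32 = 0x9E37_79B1;
const P2: u32 = 0x85EB_CA77;
const P3: u32 = 0xC2B2_AE3D;
const P4: u32 = 0x27D4_EB2F;
const P5: u32 = 0x1656_67B1;

/// Read the first four bytes of `b` as a little-endian u32.
fn read_u32_le(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// One accumulator round over a 4-byte lane.
fn round(acc: u32, lane: u32) -> u32 {
    acc.wrapping_add(lane.wrapping_mul(P2))
        .rotate_left(13)
        .wrapping_mul(P1)
}

/// XXH32 of `data` with the given seed (illustrative, unoptimized).
fn xxh32(data: &[u8], seed: u32) -> u32 {
    let mut rest = data;
    let mut h = if data.len() >= 16 {
        // The seed enters through the four initial accumulators.
        let mut v = [
            seed.wrapping_add(P1).wrapping_add(P2),
            seed.wrapping_add(P2),
            seed,
            seed.wrapping_sub(P1),
        ];
        while rest.len() >= 16 {
            for (i, lane) in v.iter_mut().enumerate() {
                *lane = round(*lane, read_u32_le(&rest[i * 4..]));
            }
            rest = &rest[16..];
        }
        v[0].rotate_left(1)
            .wrapping_add(v[1].rotate_left(7))
            .wrapping_add(v[2].rotate_left(12))
            .wrapping_add(v[3].rotate_left(18))
    } else {
        // Short inputs skip the stripe loop; the seed is mixed in directly.
        seed.wrapping_add(P5)
    };
    h = h.wrapping_add(data.len() as u32);
    // Consume remaining 4-byte words, then single bytes.
    while rest.len() >= 4 {
        h = h
            .wrapping_add(read_u32_le(rest).wrapping_mul(P3))
            .rotate_left(17)
            .wrapping_mul(P4);
        rest = &rest[4..];
    }
    for &b in rest {
        h = h
            .wrapping_add(u32::from(b).wrapping_mul(P5))
            .rotate_left(11)
            .wrapping_mul(P1);
    }
    // Final avalanche.
    h ^= h >> 15;
    h = h.wrapping_mul(P2);
    h ^= h >> 13;
    h = h.wrapping_mul(P3);
    h ^= h >> 16;
    h
}
```

Because the seed is folded into the initial state, `xxh32(b"data", 0)` and `xxh32(b"data", 999)` produce unrelated values even though the input bytes are identical.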
**Config File:**
You can set a default seed in your `dictionaries.toml`:
```toml
[settings.xxhash]
default_seed = 42
```
CLI flags always override config file settings.
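
The precedence rule can be sketched in a few lines. The helper below is hypothetical (base-d's real option handling is not shown in this document): `cli_seed` stands for a parsed `--hash-seed` value and `config_seed` for `default_seed` from the config file, with 0 as the fallback described above.

```rust
/// Pick the seed to use: CLI flag first, then config file, then the default of 0.
/// Hypothetical helper for illustration; not part of base-d's public API.
fn effective_seed(cli_seed: Option<u64>, config_seed: Option<u64>) -> u64 {
    cli_seed.or(config_seed).unwrap_or(0)
}
```

With `default_seed = 42` in the config, passing `--hash-seed 999` would correspond to `effective_seed(Some(999), Some(42))`, which selects the CLI value.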
#### XXH3 Secret Configuration
XXH3-64 and XXH3-128 support an additional secret - a buffer of at least 136 bytes that acts as a key for the hash function. This provides an extra layer of customization beyond the seed.
**Why use secrets?**
- Stronger protection against hash flooding attacks
- Create independent hash spaces per key (xxHash itself remains non-cryptographic)
- Domain-specific hashing with large key space
**CLI Usage:**
```bash
# Generate a secret file (must be >= 136 bytes)
head -c 200 /dev/urandom > ~/.config/base-d/xxh3-secret.bin
# Hash with secret from stdin
cat ~/.config/base-d/xxh3-secret.bin | base-d --hash xxhash3-128 --hash-seed 42 --hash-secret-stdin input.txt
```
**Config File:**
```toml
[settings.xxhash]
default_seed = 0
default_secret_file = "~/.config/base-d/xxh3-secret.bin"
```
**Important Notes:**
- Secrets only work with XXH3-64 and XXH3-128
- Secret must be at least 136 bytes
- Using `--hash-secret-stdin` with xxHash32/xxHash64 will show a warning and ignore the secret
- Tilde expansion (`~`) is supported in config file paths
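
The rules in these notes can be expressed as a small validation step. This is a sketch under assumptions, not base-d's actual code: `select_secret` and `MIN_SECRET_LEN` are hypothetical names, the algorithm strings are the `--hash` values from the tables above, and treating every non-XXH3 algorithm the same way as xxHash32/xxHash64 is a guess.

```rust
/// Minimum XXH3 secret length stated in the notes above.
const MIN_SECRET_LEN: usize = 136;

/// Decide whether a supplied secret should be used for `algo`.
/// Ok(Some(..)) means use it, Ok(None) means no secret applies,
/// and Err(..) reports a secret that is too short.
fn select_secret<'a>(algo: &str, secret: Option<&'a [u8]>) -> Result<Option<&'a [u8]>, String> {
    let Some(bytes) = secret else {
        return Ok(None);
    };
    match algo {
        // Only the XXH3 variants accept a secret.
        "xxhash3" | "xxhash3-128" => {
            if bytes.len() < MIN_SECRET_LEN {
                Err(format!(
                    "secret must be at least {MIN_SECRET_LEN} bytes, got {}",
                    bytes.len()
                ))
            } else {
                Ok(Some(bytes))
            }
        }
        // Assumption: any other algorithm warns and ignores the secret.
        _ => {
            eprintln!("warning: {algo} does not support secrets; ignoring");
            Ok(None)
        }
    }
}
```

Checking the length before hashing means a truncated secret file fails loudly instead of silently weakening the key.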
### Hash with Custom Encoding
```bash
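# SHA-256 encoded as base64
echo "data" | base-d --hash sha256 -e base64
# BLAKE3 encoded as base85
echo "data" | base-d --hash blake3 -e base85
# SHA-512 encoded as emoji (pass an emoji dictionary name to -e)
# CRC32C encoded as base64
echo "data" | base-d --hash crc32c -e base64
# xxHash3-128 encoded as base64
echo "data" | base-d --hash xxhash3-128 -e base64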
```
### Hash Files
```bash
# Hash a file
base-d --hash sha256 document.txt
# Hash large files efficiently
base-d --hash blake3 large_file.iso
# Hash and encode with custom dictionary
base-d --hash sha256 -e base64 myfile.bin
```
### Pipeline Integration
```bash
# Decode, then hash
# Hash and then compress result (hash then encode compressed hash)
```
## Algorithm Comparison
### Security Recommendations
- ✅ **Recommended**: SHA-256, SHA-512, SHA3-*, BLAKE2*, BLAKE3, Ascon, K12
- ⚠️ **Legacy/Specific Use**: SHA-224, SHA-384, Keccak-* (Ethereum)
- ❌ **NOT Secure**: MD5 (collisions are known; use only for non-security checksums)
### Performance Characteristics
**Cryptographic Hashes** (approximate throughput on modern hardware; actual figures depend on CPU, SIMD support, and input size):
1. **BLAKE3**: ~1000 MB/s (fastest, parallelized)
2. **K12** (KangarooTwelve): ~800 MB/s (high-speed XOF)
3. **BLAKE2b**: ~800 MB/s
4. **BLAKE2s**: ~700 MB/s
5. **MD5**: ~600 MB/s (not secure)
6. **SHA-512**: ~500 MB/s (faster than SHA-256 on 64-bit)
7. **SHA-256**: ~300 MB/s
8. **Ascon**: ~150 MB/s (optimized for constrained devices)
9. **SHA3-256**: ~150 MB/s
10. **Keccak-256**: ~150 MB/s
**Non-Cryptographic** (Much faster):
1. **xxHash3-64**: ~30 GB/s (newest, fastest)
2. **xxHash64**: ~15 GB/s (ultra-fast)
3. **xxHash32**: ~12 GB/s
4. **CRC32C** (hardware): ~10 GB/s
5. **CRC32**: ~1 GB/s
### Use Case Guide
| Use Case | Recommended Algorithms |
|----------|------------------------|
| **Cryptographic** | |
| General checksums | SHA-256, BLAKE3 |
| High-speed checksums | BLAKE3, K12 |
| Cryptographic signatures | SHA-256, SHA-512 |
| File integrity (secure) | SHA-256, BLAKE2b |
| IoT/Embedded devices | Ascon |
| High-speed with XOF | K12 (KangarooTwelve) |
| Ethereum/blockchain | Keccak-256 |
| **Non-Cryptographic** | |
| File integrity (fast) | CRC32, CRC32C |
| ZIP/PNG compatibility | CRC32 |
| Hash tables | xxHash3, xxHash32, xxHash64 |
| Data deduplication | xxHash3, xxHash64, CRC64 |
| Cache keys | xxHash3, xxHash32, xxHash64 |
| Legacy compatibility | MD5, CRC16 |
### When to Use What
- **Use cryptographic hashes** when you need security (tamper resistance, collision resistance)
- **Use CRC** for error detection in files/networks (ZIP, Ethernet)
- **Use xxHash** for maximum speed when security isn't needed (caching, deduplication)
## Pure Rust Implementation
All hash algorithms are implemented in **pure Rust** with:
- ✅ No C dependencies
- ✅ No OpenSSL linking
- ✅ Cross-platform compilation
- ✅ Memory-safe by design
- ✅ Constant-time operations where applicable (cryptographic)
### Libraries Used
**Cryptographic**:
- `sha2` - SHA-224, SHA-256, SHA-384, SHA-512
- `sha3` - SHA3 family and Keccak variants
- `blake2` - BLAKE2b and BLAKE2s
- `blake3` - BLAKE3 (Rust-first design)
- `md-5` - MD5 (legacy)
- `ascon-hash` - Ascon (lightweight hashing)
- `k12` - KangarooTwelve (high-speed XOF)
**Non-Cryptographic**:
- `crc` - CRC16, CRC32, CRC32C, CRC64
- `twox-hash` - xxHash32, xxHash64, xxHash3-64, xxHash3-128
## Library API
```rust
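use base_d::{hash, HashAlgorithm};
use std::str::FromStr; // needed if `from_str` comes from the `FromStr` trait

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Compute hash
    let data = b"hello world";
    let hash_output = hash(data, HashAlgorithm::Sha256);
    // Output size
    let size = HashAlgorithm::Sha256.output_size(); // 32 bytes
    // Parse from string
    let algo = HashAlgorithm::from_str("sha256")?;
    Ok(())
}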
```
## Advanced Examples
### Verify File Integrity
```bash
# Create checksum
base-d --hash sha256 file.zip > file.zip.sha256
# Verify later
```
### Multi-Algorithm Verification
```bash
# Generate multiple checksums
echo "data" | base-d --hash blake3 > file.blake3
```
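
Verifying a stored checksum comes down to comparing two digests. The helper below is a hypothetical sketch, assuming the stored file holds a hex digest that may be followed by a newline or a filename; it is not a base-d feature and would not suit case-sensitive encodings such as base64.

```rust
/// Compare a stored hex checksum (for example the contents of a `.sha256` file)
/// with a freshly computed one. Hypothetical helper for illustration.
fn checksum_matches(stored: &str, computed: &str) -> bool {
    // Use the first whitespace-separated token so that a trailing newline or a
    // "digest  filename" layout both work. Hex case is ignored.
    match (
        stored.split_whitespace().next(),
        computed.split_whitespace().next(),
    ) {
        (Some(a), Some(b)) => a.eq_ignore_ascii_case(b),
        _ => false,
    }
}
```

An empty or whitespace-only checksum file never matches, so a truncated `.sha256` file is reported as a failure rather than a pass.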
### Encoded Hash Storage
```bash
# Store hash in base64 (more compact)
base-d --hash sha512 document.pdf -e base64 > document.pdf.hash
# Store in base85 (even more compact)
base-d --hash sha256 file.bin -e base85
```
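
The size savings can be estimated from the digest length alone. The functions below are a rough sketch, assuming a standard padded base64 alphabet and a base85 scheme that maps each 4-byte group to 5 characters; base-d's own dictionaries may differ in padding or framing, and the function names are made up for this example.

```rust
/// Hex uses two characters per byte.
fn hex_len(n: usize) -> usize {
    n * 2
}

/// Padded base64 emits four characters per 3-byte group.
fn base64_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}

/// Base85 emits five characters per 4-byte group
/// (exact when `n` is a multiple of 4, an upper bound otherwise).
fn base85_len(n: usize) -> usize {
    (n + 3) / 4 * 5
}
```

Under these assumptions a 64-byte SHA-512 digest takes 128 characters in hex, 88 in base64, and 80 in base85.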
## Implementation Notes
### Memory Usage
- All algorithms use constant memory regardless of input size
- Stream processing for large files
- No buffering of entire input
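
Constant-memory hashing generally means reading the input in fixed-size chunks and folding each chunk into a running state. The sketch below shows that pattern with a simple bitwise CRC-32 as a stand-in; it is not base-d's implementation, and the function names and the 8 KiB buffer size are arbitrary choices for this example.

```rust
use std::io::{self, Read};

/// Fold one chunk into a running reflected CRC-32 state (IEEE polynomial).
fn crc32_update(mut crc: u32, chunk: &[u8]) -> u32 {
    for &byte in chunk {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    crc
}

/// Hash everything from `reader` using a fixed 8 KiB buffer,
/// so memory use does not grow with input size.
fn crc32_of_reader<R: Read>(mut reader: R) -> io::Result<u32> {
    let mut buf = [0u8; 8192];
    let mut crc = 0xFFFF_FFFFu32;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        crc = crc32_update(crc, &buf[..n]);
    }
    Ok(!crc)
}
```

Passing a `std::fs::File` (or any other `Read` implementation) to `crc32_of_reader` gives the same result as hashing the whole contents at once, while only ever holding one buffer in memory.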
### Thread Safety
- All hash functions are thread-safe
- Can be called concurrently
- No shared mutable state
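
A hash function that takes its input by reference and keeps no global state can be called from several threads at once without locking. The sketch below illustrates this with scoped threads and a stand-in bitwise CRC-32; `hash_all_concurrently` is a hypothetical helper, not part of base-d.

```rust
use std::thread;

/// Stand-in pure hash function (bitwise CRC-32, IEEE polynomial).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hash each input on its own scoped thread and return results in input order.
fn hash_all_concurrently(inputs: &[&[u8]]) -> Vec<u32> {
    thread::scope(|s| {
        // Spawn every thread first so the hashes run in parallel.
        let handles: Vec<_> = inputs
            .iter()
            .map(|&input| s.spawn(move || crc32(input)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("hash thread panicked"))
            .collect()
    })
}
```

Each thread only reads its own input slice and returns a value, so no mutex or atomic is needed.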
### Platform Support
- Works on all Rust-supported platforms
- No architecture-specific requirements
- ARM, x86, x86_64, RISC-V, etc.
## Error Handling
```bash
# Invalid algorithm name
base-d --hash invalid
# Error: Unknown hash algorithm: invalid
# Works with any input encoding
```
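
On the library side, the same kind of error can be produced by a `FromStr` implementation. The enum below is a hypothetical three-variant subset written for illustration, not base-d's real `HashAlgorithm` type; only the error message text is taken from the CLI output above.

```rust
use std::str::FromStr;

/// Hypothetical subset of algorithms, for illustration only.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Algo {
    Sha256,
    Blake3,
    Crc32,
}

impl FromStr for Algo {
    type Err = String;

    /// Map a `--hash` style name to a variant, or report an unknown name.
    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name {
            "sha256" => Ok(Algo::Sha256),
            "blake3" => Ok(Algo::Blake3),
            "crc32" => Ok(Algo::Crc32),
            other => Err(format!("Unknown hash algorithm: {other}")),
        }
    }
}
```

Implementing `FromStr` lets callers write `"sha256".parse::<Algo>()` and handle the `Err` case with `?` or a `match`.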
## Future Enhancements
Potential additions (see ROADMAP.md):
- HMAC support with keyed hashing
- Incremental hashing for streaming
- Hash comparison utilities
- Multi-threaded parallel hashing for large files