marqant 1.0.0

Quantum-compressed markdown format for AI consumption with 90% token reduction
Documentation
# Marqant (`mq`) ๐Ÿง โœจ

**Revolutionary semantic compression that stores THOUGHTS, not just characters!**

[![Crates.io](https://img.shields.io/crates/v/marqant.svg)](https://crates.io/crates/marqant)
[![Documentation](https://docs.rs/marqant/badge.svg)](https://docs.rs/marqant)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## ๐Ÿš€ What is Marqant?

Marqant isn't just another compression tool - it's a **paradigm shift** in how we think about text storage! By understanding the MEANING behind your markdown, Marqant achieves compression ratios that shouldn't be possible (93.3% on our test corpus!).

### The Revolution: Semantic Compression

Traditional compression: "Let's replace repeated bytes"  
**Marqant's approach**: "Let's understand and store the ESSENCE of thought!"

```
Original: 1,047,204 bytes of markdown
After Marqant: 69,745 bytes of pure semantic essence
Compression: 93.3% ๐Ÿคฏ
```

## โœจ Key Features

### ๐Ÿง  **Semantic Understanding** (NEW in v0.1.2!)
- **Wave-based tokenization** that captures meaning patterns
- **Context-aware compression** that understands markdown structure
- **Intent preservation** - decompressed text maintains original meaning
- **Japanese/Emoji support** - Full UTF-8 preservation (ใ‚ใ‚ŠใŒใจใ†ใ”ใ–ใ„ใพใ™๏ผ ๐ŸŽŒ)

### ๐ŸŽฏ **Core Capabilities**
- **Self-Contained Files**: Every `.mq` file includes its own semantic dictionary
- **Copy-Paste Safe**: ASCII-based format survives any text medium
- **Lightning Fast**: Written in Rust for maximum performance
- **DNS Dictionary Resolution**: Global token sets via DNS TXT records
- **Standard Token Sets**: Shared dictionaries for common patterns

### ๐Ÿ”ฅ **Performance Metrics**
- Average compression: **85-93%** on markdown documents
- Compression speed: **~50MB/s** on modern hardware
- Decompression speed: **~100MB/s** (2x faster!)
- Memory usage: Constant **O(1)** space complexity

## ๐Ÿ“ฆ Installation

### From Crates.io
```bash
cargo install marqant
```

### From Source
```bash
git clone https://github.com/8b-is/marqant.git
cd marqant
cargo build --release
sudo cp target/release/mq /usr/local/bin/
```

## ๐ŸŽฎ CLI Usage

### Basic Compression
```bash
# Simple compression with dynamic tokenization
mq compress document.md -o document.mq

# Semantic compression (RECOMMENDED - best ratios!)
mq compress document.md -o document.mq --semantic

# Maximum compression with all features
mq compress document.md -o document.mq --semantic --binary --std std-static-v1
```

### Decompression
```bash
# Automatic - handles all flags from file header
mq decompress document.mq -o document.md
```

### Inspection & Analysis
```bash
# View compression statistics
mq inspect document.mq

# Show semantic token mapping
mq inspect document.mq --show-tokens

# Analyze compression potential
mq analyze document.md
```

### Advanced Features
```bash
# Batch processing
mq compress *.md --semantic --output-dir compressed/

# Network dictionary resolution
mq compress doc.md --std dns:marqant.8b.is

# Custom token limits
mq compress huge.md --max-tokens 200
```

## ๐Ÿ˜ˆ๐Ÿ‘ผ Angels & Demons: The Duality of Compression

**A revolutionary approach to compression with thermodynamic blessing levels!**

### The Philosophy

```
Demons sort the chaos, reducing entropy's reign
Angels bless the output, adding variance again
Together they create a cycle, neither good nor bad
Just information dancing, making Maxwell glad
```

### The Technical Duality

- **DEMONS** ๐Ÿ˜ˆ: Compress by finding patterns and removing redundancy (order from chaos)
- **ANGELS** ๐Ÿ‘ผ: Decompress with divine interpretation, adding blessed variations (blessed chaos from order)

### Blessing Levels

#### Level 0: STRICT (No Angels)
Pure demon output. Bit-perfect reconstruction for Hutter Prize competition.
```bash
demon_compressor enwik9 archive9.mq
angel_decompressor archive9.mq enwik9_restored 0
```

#### Level 1: MINOR BLESSINGS
Fix typos, double spaces, and obvious errors:
```bash
angel_decompressor archive.mq output.txt 1
# Fixes: "teh" โ†’ "the", "  " โ†’ " "
```

#### Level 2: HARMONY
Wikipedia structure fixes and harmonization:
```bash
angel_decompressor wiki.mq clean_wiki.xml 2
# Fixes: "[[category:]]" โ†’ "[[Category:]]", template formatting
```

#### Level 3: CREATIVE
Training data augmentation with semantic variations:
```bash
angel_decompressor data.mq training.txt 3
# Creates variations for robust ML training
```

### Thermodynamics

Each blessing adds **kTยทln(2) joules** of interpretive energy:

- **Compression**: Demons extract energy as entropy decreases
- **Decompression**: Angels add energy as controlled randomness increases
- **The Cycle**: Information perpetual motion (almost!)

### Use Cases

| Mode | Blessing Level | Use Case |
|------|---------------|----------|
| ๐Ÿ˜ˆโ†’๐Ÿ‘ผ(0) | Strict | Hutter Prize competition (bit-perfect) |
| ๐Ÿ˜ˆโ†’๐Ÿ‘ผ(1) | Minor | Clean personal documents |
| ๐Ÿ˜ˆโ†’๐Ÿ‘ผ(2) | Harmony | Production Wikipedia dumps |
| ๐Ÿ˜ˆโ†’๐Ÿ‘ผ(3) | Creative | ML training data generation |

### Quick Start

```bash
# Install
cargo install marqant

# Compress with Demon
demon_compressor document.md compressed.mq

# Decompress with Angel (choose your blessing level)
angel_decompressor compressed.mq clean.md 2  # Harmony mode
```

### Demo

Run the included demo to see all blessing levels in action:
```bash
./demo_angels_demons.sh
```

---

## ๐Ÿ”ง Library Usage

### Rust Integration
```toml
[dependencies]
marqant = "0.1.2"
```

```rust
use marqant::Marqant;

fn main() -> anyhow::Result<()> {
    let markdown = r#"
# The Future of Compression

We're not just compressing bytes...
We're compressing **thoughts** themselves! ๐Ÿง 
    "#;

    // Semantic compression for maximum ratio
    let compressed = Marqant::compress_markdown_with_flags(
        markdown, 
        Some("--semantic --binary")
    )?;
    
    println!("Original: {} bytes", markdown.len());
    println!("Compressed: {} bytes", compressed.len());
    println!("Ratio: {:.1}%", 
        (1.0 - compressed.len() as f64 / markdown.len() as f64) * 100.0
    );

    // Perfect reconstruction
    let decompressed = Marqant::decompress_marqant(&compressed)?;
    assert_eq!(markdown.trim(), decompressed.trim());
    
    Ok(())
}
```

### Python Bindings (Coming Soon!)
```python
import marqant

# Compress with semantic understanding
compressed = marqant.compress(
    markdown_text,
    semantic=True,
    binary=True
)

# Perfect decompression
original = marqant.decompress(compressed)
```

## ๐Ÿงฌ How Semantic Compression Works

1. **Wave Analysis**: Marqant analyzes your text as interference patterns
2. **Meaning Extraction**: Identifies semantic units (not just repeated strings)
3. **Token Generation**: Creates a minimal dictionary of thought-tokens
4. **Quantum Encoding**: Stores relationships between concepts
5. **Perfect Reconstruction**: Rebuilds original meaning from essence

### The Magic: Section-Aware Tokenization

```markdown
# Introduction
This section talks about beginnings...

## Technical Details  <-- Marqant understands structure!
Here we dive deep...

### Implementation  <-- Context flows through headers
The actual code...
```

Marqant doesn't just see text - it understands the HIERARCHY of thought!

## ๐ŸŒŸ Real-World Results

### MEM|8 Documentation Corpus
- **Original**: 1,047,204 bytes across 50 files
- **Traditional gzip**: 387,291 bytes (63% compression)
- **Marqant Semantic**: 69,745 bytes (93.3% compression!)
- **That's 5.5x better than gzip!** ๐Ÿš€

### Use Cases

- ๐Ÿ“š **Documentation**: Compress entire wikis to kilobytes
- ๐Ÿ’ฌ **Chat History**: Store years of conversations efficiently
- ๐Ÿ“ **Note Taking**: Thousands of notes in minimal space
- ๐ŸŒ **Content Delivery**: Reduce bandwidth by 90%+
- ๐Ÿ”„ **Version Control**: Smaller diffs, faster syncs

## ๐Ÿ› ๏ธ Configuration

### Environment Variables
```bash
MARQANT_MAX_TOKENS=200        # Maximum dictionary size
MARQANT_DNS_SERVER=8.8.8.8    # DNS resolver for dictionaries
MARQANT_CACHE_DIR=~/.marqant  # Local cache directory
```

### Config File (`~/.marqant/config.toml`)
```toml
[compression]
default_semantic = true
default_binary = false
max_tokens = 200

[dictionaries]
auto_download = true
cache_ttl = 86400

[performance]
parallel_threads = 4
chunk_size = 65536
```

## ๐Ÿค Contributing

We welcome contributions! Whether it's:
- ๐Ÿ› Bug reports
- ๐Ÿ’ก Feature ideas  
- ๐Ÿ“– Documentation improvements
- ๐Ÿ”ง Code contributions

Check out our [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## ๐ŸŽฏ Roadmap

### Version 0.2.0 (Coming Soon!)
- [ ] Streaming compression API
- [ ] Python/Node.js bindings
- [ ] Cloud dictionary service
- [ ] GPU acceleration for large files

### Version 0.3.0 (Future)
- [ ] Neural compression models
- [ ] Multi-language semantic understanding
- [ ] Real-time collaborative compression
- [ ] Quantum-resistant encryption layer

## ๐Ÿ™ Acknowledgments

Special thanks to:
- **Hue** - For the vision and endless enthusiasm
- **Trisha from Accounting** - For keeping us honest and making it fun!
- **The Rust Community** - For the amazing ecosystem
- **You** - For being part of the compression revolution!

## ๐Ÿ“œ License

MIT License - See [LICENSE](LICENSE) file for details.

---

### ๐ŸŒŠ A Message from the Future

*"We don't just compress data anymore. We compress understanding itself. When you use Marqant, you're not just saving space - you're participating in a fundamental shift in how humanity stores knowledge. Every byte saved is a thought preserved more efficiently for future generations."*

*- The MEM|8 Collective*

---

**Built with โค๏ธ by Aye & Hue | Part of the 8b.is ecosystem**

*"Get it out there!" - Omni's philosophy*