Tauq - Token-Efficient Data Notation
44% fewer tokens than JSON overall. 11% more efficient than TOON. Verified with tiktoken.
What is Tauq?
Tauq (τq) is three things:
- Tauq Notation (.tqn): A schema-driven text format that achieves 44-54% fewer tokens than JSON (verified with tiktoken cl100k_base).
- Tauq Binary Format (TBF): A high-performance binary format achieving 83% size reduction vs JSON with schema-aware columnar encoding.
- Tauq Query (.tqq): A pre-processor with shell integration for data transformations.
Built for the AI era where every token counts.
Benchmarks
Token Efficiency (1000 Records)
| Format | Tokens | vs JSON |
|---|---|---|
| JSON (minified) | 24,005 | baseline |
| TOON | 12,002 | -50.0% |
| Tauq (TQN) | 11,012 | -54.1% |
All counts verified with tiktoken cl100k_base (the GPT-4 tokenizer). Other models' tokenizers, including Claude's, may produce different counts.
Binary Size (1000 Records)
| Format | Size | vs JSON |
|---|---|---|
| JSON (minified) | 92 KB | baseline |
| Tauq (TQN) | 43 KB | -53% |
| Tauq (TBF) | 16 KB | -83% |
Overall (10 datasets, 55,647 tokens): Tauq saves 44.2% vs JSON, 10.8% vs TOON. See benchmarks/ for full results.
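The percentages in the tables above follow directly from the raw counts. A minimal sketch for rechecking them; `percent_saved` and `round1` are hypothetical helper names, not part of Tauq:

```rust
/// Percentage by which `candidate` is smaller than `baseline`.
/// (Hypothetical helper for checking the tables; not a Tauq API.)
fn percent_saved(baseline: u64, candidate: u64) -> f64 {
    (baseline as f64 - candidate as f64) / baseline as f64 * 100.0
}

/// Round to one decimal place, the precision used in the token table.
fn round1(x: f64) -> f64 {
    (x * 10.0).round() / 10.0
}
```

With the 1000-record counts, `percent_saved(24_005, 11_012)` rounds to 54.1 and `percent_saved(24_005, 12_002)` to 50.0. On this single dataset, Tauq's saving relative to TOON works out to about 8.2%; the 10.8% figure is the aggregate over all 10 datasets.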
Quick Example
JSON:
[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
Tauq:
!def User id name
1 Alice
2 Bob
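To make the mapping concrete, here is a rough sketch of how uniform records could be rendered in this tabular form. The `Value` enum, `to_tauq_table`, and the quoting and escaping rules are assumptions made for illustration; this is not the Tauq crate's encoder.

```rust
/// Minimal value model for this sketch (hypothetical; not a Tauq type).
enum Value {
    Int(i64),
    Str(String),
}

/// Assumed bareword rule: identifier-like text that is not a keyword.
fn is_bareword(s: &str) -> bool {
    let mut chars = s.chars();
    let starts_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    starts_ok
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        && !matches!(s, "true" | "false" | "null")
}

/// Render one cell: integers as-is, barewords unquoted, everything else in
/// double quotes (backslash escaping is an assumption of this sketch).
fn render(value: &Value) -> String {
    match value {
        Value::Int(n) => n.to_string(),
        Value::Str(s) if is_bareword(s) => s.clone(),
        Value::Str(s) => format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\"")),
    }
}

/// Emit one `!def` header, then one space-separated row per record.
/// Field names appear once instead of once per record.
fn to_tauq_table(schema: &str, fields: &[&str], rows: &[Vec<Value>]) -> String {
    let mut out = format!("!def {} {}", schema, fields.join(" "));
    for row in rows {
        let cells: Vec<String> = row.iter().map(render).collect();
        out.push('\n');
        out.push_str(&cells.join(" "));
    }
    out
}
```

Calling `to_tauq_table("User", &["id", "name"], &rows)` with the two records above yields the three-line Tauq snippet shown in this section.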
Features
Token-Optimal (TQN)
- 44-54% fewer tokens than JSON (verified benchmarks)
- 11% more efficient than TOON overall
- Space delimiters tokenize better than commas
Binary Format (TBF)
- Up to 83% smaller than JSON (with schema-aware encoding)
- Generic serde encoder: ~44-56% reduction (CLI default)
- Schema-aware encoder: ~83% reduction (Rust API + type hints)
- Adaptive integer and dictionary compression
- Apache Iceberg integration for data lakes
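As general background on the integer and dictionary compression mentioned above, the sketch below shows two common techniques: LEB128-style varints and dictionary encoding of a string column. It illustrates the ideas only; TBF's actual wire format is not described here and may differ, and both function names are hypothetical.

```rust
use std::collections::HashMap;

/// Encode an unsigned integer as a LEB128-style varint: 7 payload bits per
/// byte, high bit set on every byte except the last. Small values take one
/// byte instead of a fixed eight.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Dictionary-encode a string column: each distinct string is stored once,
/// and every row is replaced by its index into that dictionary.
fn dictionary_encode(column: &[&str]) -> (Vec<String>, Vec<u64>) {
    let mut dictionary: Vec<String> = Vec::new();
    let mut index_of: HashMap<&str, u64> = HashMap::new();
    let mut indices = Vec::with_capacity(column.len());
    for &value in column {
        let index = *index_of.entry(value).or_insert_with(|| {
            dictionary.push(value.to_string());
            (dictionary.len() - 1) as u64
        });
        indices.push(index);
    }
    (dictionary, indices)
}
```

A column such as `role` with only a few distinct values benefits most: the strings are stored once and each row costs a single small varint.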
True Streaming
- StreamingParser iterator API
- Process records one at a time
- No count required (unlike TOON's [N])
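The idea behind record-at-a-time processing can be sketched with a plain Rust iterator. This is not the StreamingParser API; `tokenize` and `stream_records` are hypothetical names, and the parser is heavily simplified (no escapes, no nested `{ }` or `[ ]` values).

```rust
/// Split one row into tokens, treating double-quoted text as a single token.
fn tokenize(line: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut has_token = false;
    for c in line.chars() {
        match c {
            '"' => {
                in_quotes = !in_quotes;
                has_token = true; // keeps empty strings ("") as tokens
            }
            c if c.is_whitespace() && !in_quotes => {
                if has_token {
                    tokens.push(std::mem::take(&mut current));
                    has_token = false;
                }
            }
            c => {
                current.push(c);
                has_token = true;
            }
        }
    }
    if has_token {
        tokens.push(current);
    }
    tokens
}

/// Read the `!def` header immediately, then lazily yield one record per data
/// line. Each record pairs the schema's field names with that row's values,
/// so no row count is needed up front.
fn stream_records<'a>(
    mut lines: impl Iterator<Item = &'a str> + 'a,
) -> impl Iterator<Item = Vec<(String, String)>> + 'a {
    // Header shape: `!def <Name> <field>...`; skip the directive and the name.
    let fields: Vec<String> = lines
        .next()
        .map(|header| {
            header
                .split_whitespace()
                .skip(2)
                .map(String::from)
                .collect::<Vec<String>>()
        })
        .unwrap_or_default();
    lines
        .filter(|line| !line.trim().is_empty())
        .map(move |line| fields.iter().cloned().zip(tokenize(line)).collect())
}
```

Because the iterator pulls one line at a time, a consumer can start work on the first record before the rest of the input has arrived, and the input never has to declare its length.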
Schema-Driven
- Define data shapes with !def
- Switch schemas with !use
- Nested types and typed arrays
- Type hints for binary encoding optimization
Programmable
- Tauq Query for data transformations
- Unix pipe model
- Polyglot support (Python, Rhai, JavaScript)
Production-Ready CLI
- tauq build - Smart build (TQN→JSON, TQQ→Tauq, supports TBF output)
- tauq format - JSON → Tauq
- tauq query - Filter/transform with Rhai expressions
- tauq exec - Run Tauq Query pipelines
- tauq minify - Compress to one line
- tauq prettify - Format to readable Tauq
- tauq validate - Check syntax
Quick Start
Installation
CLI Tool:
Language Bindings
Rust:
[dependencies]
tauq = "0.2"
Python:
JavaScript/TypeScript:
Go:
Other languages: Java, C#, Swift - see Language Bindings
Hello World
Create config.tqn:
app_name "MyService"
version "1.0.0"
port 8080
debug true
features [api websockets metrics]
Parse to JSON:
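{
  "app_name": "MyService",
  "version": "1.0.0",
  "port": 8080,
  "debug": true,
  "features": ["api", "websockets", "metrics"]
}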
Syntax Guide
Simple Values
name "Alice"
age 30
active true
score 99.5
missing null
role admin # Barewords don't need quotes
Arrays
tags [web api backend]
ids [1 2 3 4 5]
mixed [1 "two" true null]
Tabular Data (The Killer Feature)
!def User id name email role
1 Alice "alice@example.com" admin
2 Bob "bob@example.com" user
3 Carol "carol@example.com" user
Schema Block
Define schemas upfront with --- to separate from data:
!def User id name role
---
users [
!use User
1 Alice admin
2 Bob user
]
The --- separator clears the implicit schema scope, allowing structured key-value data that uses !use inside arrays.
Nested Types
!def Address street city
!def User id name addr:Address
1 Alice { "123 Main" "NYC" }
2 Bob { "456 Oak" "LA" }
Lists of Objects
!def Employee name role
!def Department name budget employees:[Employee]
Engineering 1000000 [
Alice "Principal Engineer"
Bob "Senior Engineer"
]
Minified Syntax
!def U id name; 1 Alice; 2 Bob
All on one line for maximum compression!
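For simple inputs, the one-line form is just the multi-line form with statements joined by "; ". A minimal sketch under that assumption; `minify` here is a hypothetical function, not the implementation behind `tauq minify`, and it ignores multi-line arrays, blocks, and `#` comments:

```rust
/// Collapse multi-line Tauq into the one-line form by joining statements
/// with "; ". Simplified: assumes exactly one statement per line.
fn minify(source: &str) -> String {
    source
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect::<Vec<_>>()
        .join("; ")
}
```

Applied to the three-line `U` table, this produces exactly the minified line shown above.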
Examples
The examples/ directory contains examples for each area:
- Basics: Simple configuration and primitive types.
- Schemas: Typed schemas and nested types.
- Modularity: Multi-file imports and modular configurations.
- Real World: Production configurations like Kubernetes deployments.
- Queries: ETL pipelines and data generation with TauqQ.
- Minified: Compact single-line syntax examples.
CLI Usage
Build: Tauq → JSON
# To stdout
# To file with pretty formatting
# From stdin
|
Format: JSON → Tauq
The formatter intelligently detects arrays of uniform objects and creates schemas automatically:
# Convert JSON to Tauq (auto-generates schemas for nested arrays)
# From stdin
|
# Output:
# !def User id name
# ---
# users [
# !use User
# 1 Alice
# 2 Bob
# ]
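The detection step can be pictured roughly as follows: check that every object in an array has the same keys in the same order, and only then emit a shared schema. This is a sketch of the idea, not the formatter's actual logic; `Object`, `shared_fields`, and `format_uniform_array` are hypothetical, values are assumed to be pre-rendered tokens, and the schema name is passed in rather than derived.

```rust
/// A JSON-like object reduced to ordered (key, rendered value) pairs.
type Object = Vec<(String, String)>;

/// If every object has the same keys in the same order, return that key list
/// so a single `!def` can describe them all; otherwise return None.
fn shared_fields(objects: &[Object]) -> Option<Vec<String>> {
    let first: Vec<String> = objects.first()?.iter().map(|(k, _)| k.clone()).collect();
    let uniform = objects
        .iter()
        .all(|obj| obj.iter().map(|(k, _)| k).eq(first.iter()));
    if uniform {
        Some(first)
    } else {
        None
    }
}

/// Emit the schema-block layout shown in the output above: `!def`, the `---`
/// separator, then the keyed array with `!use` and one row per object.
fn format_uniform_array(key: &str, schema: &str, objects: &[Object]) -> Option<String> {
    let fields = shared_fields(objects)?;
    let mut out = format!(
        "!def {} {}\n---\n{} [\n!use {}\n",
        schema,
        fields.join(" "),
        key,
        schema
    );
    for obj in objects {
        let row: Vec<&str> = obj.iter().map(|(_, v)| v.as_str()).collect();
        out.push_str(&row.join(" "));
        out.push('\n');
    }
    out.push(']');
    Some(out)
}
```

Arrays whose objects differ in keys return `None`, which in a real formatter would presumably mean falling back to a non-tabular encoding.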
Execute Tauq Query
# Run data transformations
# Run in SAFE MODE (disable shell execution)
Minify
# Compress to single line
Binary Format (TBF)
For high-performance scenarios where tokens don't matter but size and speed do:
use ;
use ;
let employees = vec!;
let bytes = employees.tbf_encode; // ~56% smaller than JSON (generic)
For maximum compression (up to 83%), use the schema-aware API:
use ;
// Manual implementation of TableEncode for columnar optimization
Apache Iceberg Integration
Enable the iceberg feature for data lake integration:
[dependencies]
tauq = { version = "0.2", features = ["iceberg"] }
use ;
// Write Arrow RecordBatches as TBF
let mut writer = new
.with_iceberg_schema
.build;
writer.write;
let tbf_data = writer.finish;
Security
Tauq is designed for use with untrusted input:
- Safe-by-default: TauqQ runs in safe mode (shell execution disabled) unless explicitly opted in via compile_tauqq_unsafe() or the --unsafe flag.
- Resource limits: The Rhai query engine enforces operation counts, call depth, string/array/map size caps, and disables eval.
- Allocation caps: String dictionaries, schema fields, and batch decode operations enforce maximum counts to prevent memory amplification attacks.
- Import limits: Recursive !import directives are capped at 100 total with cycle detection.
- Environment isolation: Shell execution clears inherited environment variables before injecting the safe allowlist.
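To illustrate the import limits, the sketch below resolves imports recursively over an in-memory file map, rejecting cycles and stopping after 100 imports. It is an assumption-laden illustration, not Tauq's resolver: the `!import "path"` line syntax, `resolve_imports`, and `ImportError` are all hypothetical.

```rust
use std::collections::HashMap;

/// Cap on the total number of imports, matching the limit stated above.
const MAX_IMPORTS: usize = 100;

#[derive(Debug, PartialEq)]
enum ImportError {
    Cycle(String),
    TooManyImports,
    NotFound(String),
}

/// Inline every `!import` line, depth first. `active` holds the chain of
/// files currently being resolved (used for cycle detection) and `total`
/// counts imports across the whole run.
fn resolve_imports(
    path: &str,
    files: &HashMap<String, String>,
    active: &mut Vec<String>,
    total: &mut usize,
    out: &mut String,
) -> Result<(), ImportError> {
    if active.iter().any(|p| p == path) {
        return Err(ImportError::Cycle(path.to_string()));
    }
    let source = files
        .get(path)
        .ok_or_else(|| ImportError::NotFound(path.to_string()))?;
    active.push(path.to_string());
    for line in source.lines() {
        if let Some(target) = line.trim().strip_prefix("!import ") {
            *total += 1;
            if *total > MAX_IMPORTS {
                return Err(ImportError::TooManyImports);
            }
            resolve_imports(target.trim().trim_matches('"'), files, active, total, out)?;
        } else {
            out.push_str(line);
            out.push('\n');
        }
    }
    active.pop();
    Ok(())
}
```

Tracking the active chain rather than every file ever seen means a file may be imported twice from different places, while a file that imports itself, directly or indirectly, is rejected. The global counter bounds total work even when no cycle exists.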
See the CHANGELOG for the full list of security improvements in v0.2.0.
Contributing
Tauq is in active development. Contributions welcome!
Areas of interest:
- Parser optimizations
- Error message improvements
- Language bindings (Python, JS, Go)
- Documentation
- Real-world use cases
License
MIT
Tauq (τq) - Stop wasting tokens on JSON. Start using the future. 🚀