Awful Jade (aj)
Awful Jade (aka aj) is your command-line sidekick for working with Large Language Models (LLMs).
Think of it as an LLM Swiss Army knife with the best intentions.
Ask questions, run interactive sessions, sanitize messy OCR book dumps, synthesize exam questions, all without leaving your terminal.
It's built in Rust for speed, safety, and peace of mind. 🦀
```
λ aj --help
Awful Jade - a CLI for local LLM tinkering with memories, templates, and vibes.

Usage: aj <COMMAND>

Commands:
  ask          Ask a single question and print the assistant's response
  interactive  Start an interactive REPL-style conversation
  init         Initialize configuration and default templates in the platform config directory
  reset        Reset the database to a pristine state
  help         Print this message or the help of the given subcommand(s)

Options:
  -h, --help     Print help
  -V, --version  Print version
```

✨ Features
- Ask the AI: Run `aj ask "question"` and get answers powered by your configured model.
- Interactive Mode: A REPL-style conversation with memory & vector search (your AI "remembers" past context).
- RAG Support: Load documents for context-aware responses with the `--rag` flag. Automatic chunking, caching, and retrieval.
- Pretty Printing: Beautiful markdown rendering with syntax highlighting for code blocks (`--pretty` flag).
- Progress Indicators: Real-time feedback with spinners for API calls, memory search, and model loading.
- Vector Store: Uses HNSW + sentence embeddings to remember what you've said before. Basically, your AI gets a brain. 🧠
- Brains with Limits: Keeps only as many tokens as you allow. When full, it forgets the oldest stuff. (Like you after 3 AM pizza.)
- Config & Templates: YAML-driven configs and prompt templates. Customize everything, break nothing.
- Auto-downloads embeddings model: Uses Candle (pure Rust ML framework) to automatically download the `all-MiniLM-L6-v2` BERT model from HuggingFace Hub when needed.
- Pure Rust: No Python dependencies! Everything runs in pure Rust using Candle for ML inference.
📦 Installation
From crates.io, run `cargo install awful_aj`. This gives you the `aj` binary.
Requirements:

- Rust (use `rustup` if you don't have it).

The embeddings model (`all-MiniLM-L6-v2`) will be downloaded automatically from HuggingFace Hub to your system's cache directory when first needed. Models are cached using the `hf-hub` crate, typically at:

- macOS: `~/.cache/huggingface/hub/`
- Linux: `~/.cache/huggingface/hub/`
- Windows: `C:\Users\YOU\AppData\Local\huggingface\hub\`
👷🏽‍♀️ Setup
No special setup required! Just install with `cargo install awful_aj` and you're ready to go.
The embeddings model will be downloaded automatically from HuggingFace Hub the first time you use a feature that requires it.
Usage
1. Initialize
Create default configs, templates, and database:
```shell
aj init
```
This will generate:

- `config.yaml` with sensible defaults
- `templates/default.yaml` and `templates/simple_question.yaml`
- A SQLite database (`aj.db`) for sessions in your config directory
Options:

- `--overwrite`: Force overwrite existing config, templates, and database files
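For example, to rebuild everything from scratch (note: this overwrites any files you have customized):

```shell
aj init --overwrite
```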
2. Ask a Question
```shell
aj ask "Is Bibi really from Philly?"
```

You'll get a colorful, model-dependent answer in yellow (or dark gray if the model uses `<think>` tags for reasoning).
Options:

- `-t, --template <NAME>`: Use a specific template (e.g., `simple_question`)
- `-s, --session <NAME>`: Save to a named session for context retention
- `-o, --one-shot`: Ignore any session configured in `config.yaml` (force a standalone prompt)
- `-r, --rag <FILES>`: Comma-separated list of files to use as RAG context
- `-k, --rag-top-k <N>`: Number of RAG chunks to retrieve (default: 3)
- `-p, --pretty`: Enable markdown rendering with syntax highlighting
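A few illustrative invocations combining the flags above (the session and file names are placeholders):

```shell
# Use a specific template
aj ask -t simple_question "What is the capital of France?"

# Keep context across questions in a named session
aj ask -s research "Summarize what we know about HNSW so far"

# Pull RAG context from local files and pretty-print the answer
aj ask -r notes.md,paper.txt -k 5 -p "What does the paper say about chunking?"
```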
3. Interactive Mode
Talk with the AI in an interactive REPL:
```shell
aj interactive
```
Supports memory via the vector store, so it won't immediately forget your name. (Unlike your barista.)
Colors:

- Your input appears in blue
- Assistant responses appear in yellow
- Model reasoning (in `<think>` tags) appears in dark gray
Options:

- `-t, --template <NAME>`: Use a specific template
- `-s, --session <NAME>`: Use a named session
- `-r, --rag <FILES>`: Comma-separated list of files for RAG context (loaded once for the entire session)
- `-k, --rag-top-k <N>`: Number of RAG chunks to retrieve (default: 3)
- `-p, --pretty`: Enable markdown rendering with syntax highlighting for all responses
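For instance (the session and file names are placeholders):

```shell
# Plain interactive session
aj interactive

# Named session with RAG context and pretty-printed responses
aj interactive -s book_club -r chapter1.txt,chapter2.txt -p
```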
4. Reset Database
Start fresh by resetting the database to a pristine state:
```shell
aj reset
```
This drops all sessions and messages, then recreates the schema. Useful when you want a clean slate.

Alias: `aj r`
5. Configuration
Edit your config at:
```
~/.config/aj/config.yaml                                    # Linux
~/Library/Application Support/com.awful-sec.aj/config.yaml  # macOS
```
Example:

```yaml
api_base: "http://localhost:1234/v1"
api_key: "CHANGEME"
model: "jade_qwen3_4b_mlx"
context_max_tokens: 8192
assistant_minimum_context_tokens: 2048
stop_words:
  - "<|im_end|>\\n<|im_start|>"
  - "<|im_start|>\n"
session_db_url: "/Users/you/Library/Application Support/com.awful-sec.aj/aj.db"
session_name: "default"   # Set to null for no session persistence
should_stream: true       # Enable streaming responses
```
6. RAG (Retrieval-Augmented Generation)
Load documents as context for your queries:
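For example, assuming a local `docs.md` you want the model to draw on:

```shell
aj ask --rag docs.md --rag-top-k 3 "How do I enable streaming responses?"
```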
How it works:

- Documents are automatically chunked (512 tokens, 128 overlap)
- Chunks are embedded and cached in `~/.config/aj/rag_cache/` (or platform equivalent)
- Top-k most relevant chunks are retrieved and injected into context
- Cache is reused for faster subsequent queries on the same files
Cache location:

- macOS: `~/Library/Application Support/com.awful-sec.aj/rag_cache/`
- Linux: `~/.config/aj/rag_cache/`
- Windows: `%APPDATA%\com.awful-sec\aj\rag_cache\`
7. Templates
Templates are YAML files in your config directory. Here's a baby template:
```yaml
system_prompt: "You are Awful Jade, a helpful AI assistant programmed by Awful Security."
messages:
response_format: null
pre_user_message_content: null
post_user_message_content: null
```
Add more, then swap them in with `-t <name>` or `--template <name>`.
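As an illustration, a hypothetical `templates/code_review.yaml` could use the pre/post fields to wrap every user message:

```yaml
system_prompt: "You are Awful Jade, a meticulous code reviewer."
messages:
response_format: null
pre_user_message_content: "Review the following code:"
post_user_message_content: "List bugs first, then style nits."
```

You would then invoke it with `aj ask -t code_review "<your code>"`.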
🧠 How it Works
- Brain: Token-budgeted working memory with FIFO eviction. Keeps memories in a deque and trims when it gets too wordy.
- VectorStore: Embeds your inputs using `all-MiniLM-L6-v2` via Candle (pure Rust ML) and saves them to an HNSW index for semantic search.
- RAG System: Intelligent document chunking (512 tokens, 128 overlap), embedding caching, and k-nearest-neighbor retrieval.
- Pretty Printing: Markdown rendering with syntax highlighting for 100+ languages using Syntect.
- Progress UI: Real-time spinners and feedback using Indicatif (API calls, memory search, model loading).
- Candle: Pure Rust ML framework from HuggingFace. Automatically downloads and caches models from HuggingFace Hub.
- Config: YAML-based, sane defaults, easy to tweak.
- Templates: Prompt engineering without copy-pasting into your terminal like a caveman.
- No Python: Everything runs in pure Rust with no external ML runtime dependencies.
🧑‍💻 Development
Clone, hack, repeat:
```shell
git clone https://github.com/graves/awful_aj.git
cd awful_aj
cargo build
```
Run tests:
```shell
cargo test
```
🤝 Contributing
PRs welcome! Bugs, docs, new templates, vector hacks: bring it on. But remember: with great power comes great YAML.
License
CC-BY-SA-4.0 (Creative Commons Attribution-ShareAlike 4.0 International)
Share and adapt freely, but give credit and share alike. Don't blame us when your AI remembers your browser history.
Awful Jade: bad name, good brain.