shimmyjinja
Pure-Rust Jinja2 engine for Hugging Face chat_template strings — the layer
that turns a raw GGUF file's template field into a correctly-formatted LLM prompt,
no Python required.
Part of the shimmy inference ecosystem:
| Crate | Role |
|---|---|
| shimmyjinja ← you are here | Jinja2 template engine for chat_template strings |
| shimmytok | GGUF-native tokenizer (BPE/SentencePiece) |
| airframe | WebGPU inference server — uses both shimmytok and shimmyjinja |
| shimmy | OpenAI-compatible server powered by Airframe |
What it does
Every GGUF model file ships a tokenizer.chat_template key — a Jinja2 template
string that controls how a list of chat messages gets formatted into the single
prompt string the model expects. Example (TinyLlama):
{% for message in messages %}
{% if message['role'] == 'user' %}
<|user|>
{{ message['content'] }}{{ eos_token }}
{% endif %}
{% endfor %}
{% if add_generation_prompt %}<|assistant|>
{% endif %}
shimmyjinja evaluates templates like this in pure Rust. No Python process, no
jinja2 dependency, no subprocess call to HuggingFace transformers.
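As a rough illustration of the first step any such engine has to perform, the sketch below splits a template into literal text, `{{ ... }}` expressions, and `{% ... %}` statements. It is not shimmyjinja's actual lexer: the `Segment` type and `split_segments` function are hypothetical names invented for this example, and the sketch ignores comments, whitespace-control markers such as `{%-`, and delimiters that appear inside string literals.

```rust
/// One lexical segment of a Jinja2 template (hypothetical type, illustration only).
#[derive(Debug, PartialEq)]
enum Segment<'a> {
    /// Literal output, copied through unchanged.
    Text(&'a str),
    /// Trimmed contents of a `{{ ... }}` expression tag.
    Expr(&'a str),
    /// Trimmed contents of a `{% ... %}` statement tag.
    Stmt(&'a str),
}

/// Split a template into text, expression, and statement segments.
/// Returns `Err` if a tag is opened but never closed.
fn split_segments(src: &str) -> Result<Vec<Segment<'_>>, String> {
    let mut out = Vec::new();
    let mut rest = src;
    while !rest.is_empty() {
        // Find the nearest opening delimiter of either kind.
        let next = match (rest.find("{{"), rest.find("{%")) {
            (Some(a), Some(b)) => Some(a.min(b)),
            (a, b) => a.or(b),
        };
        let Some(start) = next else {
            // No more tags: everything left is literal text.
            out.push(Segment::Text(rest));
            break;
        };
        if start > 0 {
            out.push(Segment::Text(&rest[..start]));
        }
        let is_expr = rest[start..].starts_with("{{");
        let close = if is_expr { "}}" } else { "%}" };
        let body_start = start + 2;
        let len = rest[body_start..].find(close).ok_or_else(|| {
            format!("unclosed tag at byte {}", src.len() - rest.len() + start)
        })?;
        let body = rest[body_start..body_start + len].trim();
        out.push(if is_expr { Segment::Expr(body) } else { Segment::Stmt(body) });
        // Continue after the closing delimiter.
        rest = &rest[body_start + len + 2..];
    }
    Ok(out)
}
```

Calling `split_segments` on the last two lines of the TinyLlama template yields a statement (`if add_generation_prompt`), a text segment, and a closing statement (`endif`). A parser and evaluator would then work on that sequence.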
Supported Jinja2 subset
Everything used by real production chat_template strings today:
| Feature | Example |
|---|---|
| for loops | {% for message in messages %}...{% endfor %} |
| if / elif / else | {% if message['role'] == 'user' %} |
| String concatenation | '<s>' + message['content'] |
| Equality / comparison | ==, !=, <, >, <=, >= |
| Boolean logic | and, or, not |
| Membership test | in, not in |
| Inline ternary | 'yes' if flag else 'no' |
| namespace() | {% set ns = namespace(found=false) %} |
| set / dotted set | {% set ns.found = true %} |
| raise_exception() | Raises on invalid usage |
| Method calls | message.get('content', '') |
| Context variables | bos_token, eos_token, add_generation_prompt |
| Bracket access | message['role'] |
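
Two rows of the table are less obvious than the rest. In Jinja2, a plain `set` inside a `for` loop does not survive the loop body, so templates use `namespace()` to carry state across iterations, and `raise_exception()` is how a template rejects input it cannot format. The sketch below shows a plain-Rust analogue of that pattern. It does not call shimmyjinja; `Namespace` and `scan_roles` are hypothetical names used only for illustration.

```rust
/// Rust stand-in for `namespace(found=false)`: mutable state that outlives
/// a single loop iteration (hypothetical type for this sketch).
struct Namespace {
    found: bool,
}

/// Analogue of this template fragment:
///
///   {% set ns = namespace(found=false) %}
///   {% for message in messages %}
///     {% if message['role'] == 'system' %}{% set ns.found = true %}{% endif %}
///     {% if message['role'] != 'system' and message['role'] != 'user'
///           and message['role'] != 'assistant' %}
///       {{ raise_exception('unknown role: ' + message['role']) }}
///     {% endif %}
///   {% endfor %}
///
/// Returns whether a system message was seen, or `Err` where the template
/// would call `raise_exception`.
fn scan_roles(roles: &[&str]) -> Result<bool, String> {
    let mut ns = Namespace { found: false };
    for &role in roles {
        if role == "system" {
            // Dotted set: the write is visible after the loop ends.
            ns.found = true;
        }
        if role != "system" && role != "user" && role != "assistant" {
            // raise_exception aborts rendering with a message.
            return Err(format!("unknown role: {role}"));
        }
    }
    Ok(ns.found)
}
```

With this sketch, `scan_roles(&["system", "user"])` returns `Ok(true)`, while a list containing `"tool"` returns an `Err` carrying the role name.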
Supported model families (tested with real GGUF files)
TinyLlama · Llama 3.2 · Mistral · Gemma 2 · Phi-3 / Phi-3.5 · Qwen 2 · Qwen 3 · DeepSeek-LLM
Quick start
[dependencies]
shimmyjinja = "0.5"
use ;
let template = r#"{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{'<|im_start|>assistant\n'}}{% endif %}"#;
let messages = vec!;
let mut ctx = new;
ctx.set_var;
ctx.set_var;
ctx.set_flag;
let prompt = render_chat_template_with_context;
// "<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n"
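To make the expected output above concrete, here is a hand-written Rust function that produces the same string as the ChatML-style template in this example. It deliberately does not use shimmyjinja, and `render_chatml` is a hypothetical helper name; the point is only to show which string operations the template describes.

```rust
/// Hand-written equivalent of the ChatML-style template from the quick start.
/// Each message is a `(role, content)` pair.
fn render_chatml(messages: &[(&str, &str)], add_generation_prompt: bool) -> String {
    let mut out = String::new();
    // {% for message in messages %} ... {% endfor %}
    for (role, content) in messages {
        out.push_str("<|im_start|>");
        out.push_str(role);
        out.push('\n');
        out.push_str(content);
        out.push_str("<|im_end|>");
        out.push('\n');
    }
    // {% if add_generation_prompt %} ... {% endif %}
    if add_generation_prompt {
        out.push_str("<|im_start|>assistant\n");
    }
    out
}
```

For a single user message `("user", "Hello!")` with `add_generation_prompt` set to `true`, this returns the prompt string shown in the comment above. A template engine earns its keep because the same logic differs for every model family and ships inside the model file rather than in application code.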
For a non-panicking variant that returns Result:
use shimmyjinja::try_render_chat_template_with_context;
match try_render_chat_template_with_context
Using with a GGUF file
Pair with shimmytok to extract both the template and token strings directly from the model file:
use shimmytok::Tokenizer;
let tok = from_gguf_file?;
let template = tok.chat_template.unwrap;
let bos = tok.bos_token;
let eos = tok.eos_token;
// then render with shimmyjinja as above
Testing
The test suite covers:
- Unit tests — lexer, parser, evaluator edge cases (16 tests)
- Real model templates — embedded verbatim chat_template strings from 6 model families, disk-free (21 tests)
- GGUF extraction tests — end-to-end with real GGUF files on disk, skip-if-missing (9 tests)
- Property-based tests — 13 proptest properties: determinism, no content loss, no unresolved tags, token literal pass-through, generation-prompt suppression, long content, empty input
- Doc-tests — 2 tests embedded in the API documentation
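Three of the properties named above (determinism, no content loss, no unresolved tags) can be sketched in plain Rust as follows. This is not the crate's test code: it swaps proptest's generated inputs for a fixed slice of strings, takes the renderer as a closure so it stays independent of any particular API, and assumes the message contents themselves contain no Jinja2 delimiters. `has_unresolved_tags` and `check_properties` are hypothetical names.

```rust
/// True if rendered output still contains any Jinja2 delimiter.
fn has_unresolved_tags(output: &str) -> bool {
    ["{{", "}}", "{%", "%}"].iter().any(|d| output.contains(*d))
}

/// Check three properties for every input string:
/// determinism, no content loss, and no unresolved tags.
fn check_properties<F>(render: F, contents: &[&str]) -> Result<(), String>
where
    F: Fn(&str) -> String,
{
    for &content in contents {
        let first = render(content);
        // Determinism: rendering twice gives the same prompt.
        if first != render(content) {
            return Err(format!("non-deterministic output for {content:?}"));
        }
        // No content loss: the message text appears verbatim in the prompt.
        if !first.contains(content) {
            return Err(format!("content lost for {content:?}"));
        }
        // No unresolved tags: no template syntax leaks into the prompt.
        if has_unresolved_tags(&first) {
            return Err(format!("unresolved tag in output for {content:?}"));
        }
    }
    Ok(())
}
```

A caller would pass a closure that renders one user message through the engine under test. In a real property-based suite, the fixed slice would be replaced by a generator of arbitrary strings.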
Design goals
- Zero dependencies at runtime — no proc-macro, no heavy crates. The [dependencies] section of Cargo.toml is empty.
- cargo publish clean — no build.rs, no C/C++ compilation, no bindgen.
- Explicit newline semantics — no newlines are invented by the engine; all whitespace comes from the template string after JSON decoding.
- Fail loudly on bad templates — parse() returns Err rather than silently producing wrong output.
Governance & contributions
Maintainer: Michael Kuykendall (michaelallenkuykendall@gmail.com).
Significant behavioral changes affecting chat_template compatibility should
be discussed in an issue first. Small focused fixes welcome as PRs.
License
MIT — see LICENSE.