shimmyjinja 0.5.0

Minimal Jinja-like engine for Hugging Face chat_template strings
Documentation
  • Coverage
  • 66.67%
    10 out of 15 items documented2 out of 9 items with examples
  • Size
  • Source code size: 181.3 kB This is the summed size of all the files inside the crates.io package for this release.
  • Documentation size: 844.55 kB This is the summed size of all files generated by rustdoc for all configured targets
  • Ø build duration
  • this release: 2s Average build duration of successful builds.
  • all releases: 8s Average build duration of successful builds in releases after 2024-10-23.
  • Links
  • Homepage
  • Michael-A-Kuykendall/shimmyjinja
    0 1 0
  • crates.io
  • Dependencies
  • Versions
  • Owners
  • Michael-A-Kuykendall

shimmyjinja

crates.io docs.rs CI License: MIT

Pure-Rust Jinja2 engine for Hugging Face chat_template strings — the layer that turns a raw GGUF file's template field into a correctly-formatted LLM prompt, no Python required.

Part of the shimmy inference ecosystem:

Crate Role
shimmyjinja ← you are here Jinja2 template engine for chat_template strings
shimmytok GGUF-native tokenizer (BPE/SentencePiece)
airframe WebGPU inference server — uses both shimmytok and shimmyjinja
shimmy OpenAI-compatible server powered by Airframe

What it does

Every GGUF model file ships a tokenizer.chat_template key — a Jinja2 template string that controls how a list of chat messages gets formatted into the single prompt string the model expects. Example (TinyLlama):

{% for message in messages %}
{% if message['role'] == 'user' %}
<|user|>
{{ message['content'] }}{{ eos_token }}
{% endif %}
{% endfor %}
{% if add_generation_prompt %}<|assistant|>
{% endif %}

shimmyjinja evaluates templates like this in pure Rust. No Python process, no jinja2 dependency, no subprocess call to HuggingFace transformers.


Supported Jinja2 subset

Everything used by real production chat_template strings today:

Feature Example
for loops {% for message in messages %}...{% endfor %}
if / elif / else {% if message['role'] == 'user' %}
String concatenation '<s>' + message['content']
Equality / comparison ==, !=, <, >, <=, >=
Boolean logic and, or, not
Membership test in, not in
Inline ternary 'yes' if flag else 'no'
namespace() {% set ns = namespace(found=false) %}
set / dotted set {% set ns.found = true %}
raise_exception() Raises on invalid usage
Method calls message.get('content', '')
Context variables bos_token, eos_token, add_generation_prompt
Bracket access message['role']

Supported model families (tested with real GGUF files)

TinyLlama · Llama 3.2 · Mistral · Gemma 2 · Phi-3 / Phi-3.5 · Qwen 2 · Qwen 3 · DeepSeek-LLM


Quick start

[dependencies]

shimmyjinja = "0.5"

use shimmyjinja::{ChatMessage, RenderContext, render_chat_template_with_context};

let template = r#"{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{'<|im_start|>assistant\n'}}{% endif %}"#;

let messages = vec![
    ChatMessage { role: "user".into(), content: "Hello!".into() },
];

let mut ctx = RenderContext::new();
ctx.set_var("bos_token", "<s>");
ctx.set_var("eos_token", "</s>");
ctx.set_flag("add_generation_prompt", true);

let prompt = render_chat_template_with_context(template, &messages, &ctx);
// "<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n"

For a non-panicking variant that returns Result:

use shimmyjinja::try_render_chat_template_with_context;

match try_render_chat_template_with_context(template, &messages, &ctx) {
    Ok(prompt) => println!("{}", prompt),
    Err(e) => eprintln!("Template error: {}", e), // fall back gracefully
}

Using with a GGUF file

Pair with shimmytok to extract both the template and token strings directly from the model file:

use shimmytok::Tokenizer;

let tok = Tokenizer::from_gguf_file("model.gguf")?;
let template = tok.chat_template().unwrap();
let bos = tok.bos_token();
let eos = tok.eos_token();
// then render with shimmyjinja as above

Testing

cargo test          # 87 tests: unit + integration + property-based + doc-tests

The test suite covers:

  • Unit tests — lexer, parser, evaluator edge cases (16 tests)
  • Real model templates — embedded verbatim chat_template strings from 6 model families, disk-free (21 tests)
  • GGUF extraction tests — end-to-end with real GGUF files on disk, skip-if-missing (9 tests)
  • Property-based tests — 13 proptest properties: determinism, no content loss, no unresolved tags, token literal pass-through, generation-prompt suppression, long content, empty input
  • Doc-tests — 2 tests embedded in the API documentation

Design goals

  • Zero dependencies at runtime — no proc-macro, no heavy crates. The [dependencies] section of Cargo.toml is empty.
  • cargo publish clean — no build.rs, no C/C++ compilation, no bindgen.
  • Explicit newline semantics — no newlines are invented by the engine; all whitespace comes from the template string after JSON decoding.
  • Fail loudly on bad templatesparse() returns Err rather than silently producing wrong output.

Governance & contributions

Maintainer: Michael Kuykendall (michaelallenkuykendall@gmail.com).

Significant behavioral changes affecting chat_template compatibility should be discussed in an issue first. Small focused fixes welcome as PRs.

License

MIT — see LICENSE.