# Outlines-core
This package provides the core functionality for structured generation, formerly implemented in Outlines, with a focus on performance and portability. It offers a convenient way to:
- build regular expressions from JSON schemas
- construct an `Index` object by combining a `Vocabulary` and a regular expression, to efficiently map tokens from a given vocabulary to state transitions in a finite-state automaton
## Example
Basic example of how it all fits together.
```rust
use outlines_core::prelude::*;

// Define a JSON schema
let schema = r#"{
    "type": "object",
    "properties": {
        "name": { "type": "string" },
        "age": { "type": "integer" }
    },
    "required": ["name", "age"]
}"#;

// Generate a regular expression from it
let regex = regex_from_str(&schema, None)?;

// Create `Vocabulary` from pretrained large language model (but manually is also possible)
let vocabulary = Vocabulary::from_pretrained("openai-community/gpt2", None)?;

// Create new `Index` from regex and a given `Vocabulary`
let index = Index::new(&regex, &vocabulary)?;

let initial_state = index.initial_state();
let allowed_tokens = index.allowed_tokens(&initial_state).expect("Some allowed tokens");
let token_id = allowed_tokens.first().expect("First token");
let next_state = index.next_state(&initial_state, token_id);
let final_states = index.final_states();
```
## Vocabulary
You can create a `Vocabulary` in three ways:
- `Vocabulary::from_pretrained(model, parameters)` - loads the vocabulary from a pretrained model (as in the example above)
- Manual creation - you can create a vocabulary from token mappings:
  - `Vocabulary::new(eos_token_id)` - creates an empty vocabulary, to which tokens can then be added with `try_insert()`:

    ```rust
    let eos_token_id: u32 = 50256;
    let mut vocabulary = Vocabulary::new(eos_token_id);
    vocabulary.try_insert("hello", 1)?;
    vocabulary.try_insert(" world", 2)?;
    ```

  - `Vocabulary::try_from((eos_token_id, tokens))` - creates a vocabulary by directly providing the token mappings. This can be done either with the tokens as strings:

    ```rust
    use rustc_hash::FxHashMap as HashMap;

    let eos_token_id: u32 = 50256;
    let mut tokens: HashMap<String, Vec<u32>> = HashMap::default();
    tokens.insert("hello".to_string(), vec![1]);
    tokens.insert(" world".to_string(), vec![2]);
    let vocabulary = Vocabulary::try_from((eos_token_id, tokens))?;
    ```

    Or with the tokens as byte-vector keys:

    ```rust
    use rustc_hash::FxHashMap as HashMap;

    let eos_token_id: u32 = 50256;
    let mut tokens: HashMap<Vec<u8>, Vec<u32>> = HashMap::default();
    tokens.insert(b"hello".to_vec(), vec![1]);
    tokens.insert(b" world".to_vec(), vec![2]);
    let vocabulary = Vocabulary::try_from((eos_token_id, tokens))?;
    ```
**Important:** When creating a `Vocabulary` manually from tokenizer data, ensure tokens are converted to their string representations, replacing special tokens that wouldn't be recognized by the DFA.
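For illustration, a minimal sketch of such a conversion for GPT-2-style byte-level BPE, where `Ġ` encodes a leading space and `Ċ` a newline (`to_dfa_string` is a hypothetical helper, not part of this crate):

```rust
// Hypothetical helper (not provided by outlines-core): map a byte-level BPE
// token to the plain string that the DFA actually matches against.
fn to_dfa_string(token: &str) -> String {
    token.replace('Ġ', " ").replace('Ċ', "\n")
}

assert_eq!(to_dfa_string("Ġworld"), " world");
```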
## Python Bindings
Additionally, the project provides interfaces to integrate the crate's functionality with Python.
```python
from outlines_core import Guide, Index, Vocabulary
from outlines_core.json_schema import build_regex_from_schema

schema = r'''{
    "type": "object",
    "properties": {
        "name": { "type": "string" },
        "age": { "type": "integer" }
    },
    "required": ["name", "age"]
}'''
regex = build_regex_from_schema(schema)
vocabulary = Vocabulary.from_pretrained("openai-community/gpt2")
index = Index(regex, vocabulary)
guide = Guide(index)

# Get current state of the Guide:
current_state = guide.get_state()

# Get allowed tokens for the current state of the Guide:
allowed_tokens = guide.get_tokens()

# Advance Guide to the next state via some token_id and return allowed tokens for that new state:
next_allowed_tokens = guide.advance(allowed_tokens[-1])

# To check if Guide is finished:
guide.is_finished()
# If it's finished then this assertion holds:
assert guide.get_tokens() == [vocabulary.get_eos_token_id()]
```
## How to contribute?
### Setup
Fork the repository on GitHub and clone the fork locally:
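```bash
git clone git@github.com:YourUserName/outlines-core.git
cd outlines-core
```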
Create a new virtual environment and install the dependencies in editable mode:
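A typical setup (the `test` extra is an assumption; check `pyproject.toml` for the exact extras):

```bash
python -m venv .venv
source .venv/bin/activate
# The "test" extra is an assumption; see pyproject.toml for the exact name.
pip install -e ".[test]"
```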
### Before pushing your code
If you're working with the Python bindings, don't forget to build the Rust extension before testing, for example in debug mode:
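```bash
# Assumed Makefile target; check the Makefile for the exact name.
make build-extension-debug
```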
Run Python tests:
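```bash
pytest
```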
Run Rust tests:
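```bash
cargo test
```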
Or alternatively, use the Makefile to run both:
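```bash
# Assumed combined target; check the Makefile.
make test
```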
Finally, run the code style checks:
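```bash
pre-commit run --all-files
```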
Or using the Makefile:
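```bash
# Assumed style-check target; check the Makefile.
make pcc
```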
If necessary, you can run the benchmarks locally:
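```bash
# Assumed benchmark target; check the Makefile.
make pybench
```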