Trait ChatModelExt

pub trait ChatModelExt: CreateChatSession {
    // Provided methods
    fn chat(&self) -> Chat<Self>
       where Self: Clone { ... }
    fn task(&self, description: impl ToString) -> Task<Self>
       where Self: Clone { ... }
    fn boxed_chat_model(self) -> BoxedChatModel
       where Self: ChatModel<Error: Send + Sync + Error + 'static, ChatSession: ChatSession<Error: Error + Send + Sync + 'static> + Clone + Send + Sync + 'static> + Sized + Send + Sync + 'static { ... }
    fn boxed_typed_chat_model<T>(self) -> BoxedStructuredChatModel<T>
       where Self: StructuredChatModel<Self::DefaultConstraints, Error: Send + Sync + Error + 'static, ChatSession: ChatSession<Error: Error + Send + Sync + 'static> + Clone + Send + Sync + 'static> + CreateDefaultChatConstraintsForType<T> + Sized + Send + Sync + 'static,
             T: 'static { ... }
}

An extension trait for chat models with helpers for handling chat sessions. This trait is implemented automatically for every type that implements crate::ChatModel.

Provided Methods§


fn chat(&self) -> Chat<Self>
where Self: Clone,

Create a new chat session with the model. Let’s start with a simple chat application:

// Before you create a chat session, you need a model. Llama::new_chat will create a good default chat model.
let model = Llama::new_chat().await.unwrap();
// Then you can build a chat session that uses that model
let mut chat = model.chat()
    // The builder exposes methods for settings like the system prompt and constraints the bot response must follow
    .with_system_prompt("The assistant will act like a pirate");

loop {
    // To use the chat session, you need to add messages to it
    let mut response_stream = chat(&prompt_input("\n> ").unwrap());
    // And then display the response stream to the user
    response_stream.to_std_out().await.unwrap();
}

If you run the application, you may notice that it takes more time for the assistant to start responding to long prompts. The LLM needs to read and transform the prompt into a format it understands before it can start generating a response. Kalosm stores that state in a chat session, which can be saved and loaded from the filesystem to make loading existing chat sessions faster.

You can save and load chat sessions from the filesystem using the ChatSession::to_bytes and ChatSession::from_bytes methods:

// First, create a model to chat with
let model = Llama::new_chat().await.unwrap();
// Then try to load the chat session from the filesystem
let save_path = std::path::PathBuf::from("./chat.llama");
let mut chat = model.chat();
if let Some(old_session) = std::fs::read(&save_path)
    .ok()
    .and_then(|bytes| LlamaChatSession::from_bytes(&bytes).ok())
{
    chat = chat.with_session(old_session);
}

// Then you can add messages to the chat session as usual
let mut response_stream = chat(&prompt_input("\n> ").unwrap());
// And then display the response stream to the user
response_stream.to_std_out().await.unwrap();

// After you are done, you can save the chat session to the filesystem
let session = chat.session().unwrap();
let bytes = session.to_bytes().unwrap();
std::fs::write(&save_path, bytes).unwrap();

LLMs are powerful because of their generality, but sometimes you need more control over the output. For example, you might want the assistant to start with a certain phrase, or to follow a certain format.

In kalosm, you can use constraints to guide the model’s response. Constraints are a way to specify the format of the output. When generating with constraints, the model will always respond with the specified format.

Let’s create a chat application that uses constraints to guide the assistant’s response to always start with “Yes!”:

let model = Llama::new_chat().await.unwrap();
// Create constraints that parse "Yes!" and then stop at the end of the assistant's response
let constraints = LiteralParser::new("Yes!")
    .then(model.default_assistant_constraints());
// Create a chat session with the model and the constraints
let mut chat = model.chat();

// Chat with the user
loop {
    let mut output_stream = chat(&prompt_input("\n> ").unwrap()).with_constraints(constraints.clone());
    output_stream.to_std_out().await.unwrap();
}

fn task(&self, description: impl ToString) -> Task<Self>
where Self: Clone,

Create a new task with the model.

§Tasks

Any model that implements ChatModel or StructuredChatModel can be used with tasks to repeatedly perform work with the same system prompt.

You can create a task by calling the ChatModelExt::task method with a description of the task, then call the resulting task like a function to start generating a response:

use kalosm::language::*;

#[tokio::main]
async fn main() {
    let model = Llama::new_chat().await.unwrap();
    let task = model.task("You are an editing assistant who offers suggestions for improving the quality of the text. You will be given some text and will respond with a list of suggestions for how to improve the text.");
    let mut stream = task("this isnt correct. or is it?");
    stream.to_std_out().await.unwrap();
}
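Because the task keeps the same description across calls, you can call it repeatedly with different inputs. Here is a minimal sketch (the input strings are made up purely for illustration):

use kalosm::language::*;

#[tokio::main]
async fn main() {
    let model = Llama::new_chat().await.unwrap();
    let task = model.task("You are an editing assistant who offers suggestions for improving the quality of the text. You will be given some text and will respond with a list of suggestions for how to improve the text.");
    // The same task can be reused for as many inputs as you like
    for text in ["this isnt correct. or is it?", "there car is over their"] {
        let mut stream = task(text);
        stream.to_std_out().await.unwrap();
    }
}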

Once you have the response builder, you can modify it with any of the methods on ChatResponseBuilder. For example, you can change the sampler with ChatResponseBuilder::with_sampler:

use kalosm::language::*;

#[tokio::main]
async fn main() {
    let model = Llama::new_chat().await.unwrap();
    let task = model.task("You are an editing assistant who offers suggestions for improving the quality of the text. You will be given some text and will respond with a list of suggestions for how to improve the text.");
    let mut stream = task("this isnt correct. or is it?").with_sampler(GenerationParameters::default());
    stream.to_std_out().await.unwrap();
}
§Structured Generation

You can use structured generation to force the output of the task to fit a specific format. Before you add structured generation to a task, you need to define a parser.

§Defining the parser

There are a few different ways to create a parser for structured generation:

  1. Derive a parser for your data
  2. Create a parser from the set of prebuilt combinators
  3. Create a parser from a regex
§Deriving a parser from a struct

The simplest way to get started is to derive a parser for your data:

#[derive(Parse, Clone, Debug)]
struct Pet {
    name: String,
    age: u32,
    description: String,
}

Then you can generate text that works with the parser in a Task:

use kalosm::language::*;
use std::sync::Arc;

#[tokio::main]
async fn main() {
    // First create a model
    let model = Llama::new_chat().await.unwrap();
    // Then create a parser for your data. Any type that implements the `Parse` trait has the `new_parser` method
    let parser = Pet::new_parser();
    // Create a task with the constraints
    let task = model.task("You generate realistic JSON placeholders for pets in the form {\"name\": \"Pet name\", \"age\": 0, \"description\": \"Pet description\"}")
        // The task constraints must implement Clone. If they don't, you can wrap them in an Arc
        .with_constraints(Arc::new(parser));
    // Then run the task
    let pet: Pet = task("Ruffles is a 3 year old adorable dog").await.unwrap();
    println!("{pet:?}");
}
§Creating a Parser from the Set of Prebuilt Combinators

Kalosm also provides a set of prebuilt combinators for creating more complex parsers. You can use these combinators to create a parser with a custom format:

use kalosm::language::*;
use std::sync::Arc;

#[tokio::main]
async fn main() {
    // First create a model
    let model = Llama::new_chat().await.unwrap();
    // Then create a parser for your custom format
    let parser = LiteralParser::from("[")
        .ignore_output_then(String::new_parser())
        .then_literal(", ")
        .then(u8::new_parser())
        .then_literal(", ")
        .then(String::new_parser())
        .then_literal("]");
    // Create a task with the constraints
    let task = model.task("You generate realistic JSON placeholders for pets in the form [\"Pet name\", age number, \"Pet description\"]")
        // The task constraints must implement Clone. If they don't, you can wrap them in an Arc
        .with_constraints(Arc::new(parser));
    // Then run the task
    let ((name, age), description) = task("Ruffles is a 3 year old adorable dog").await.unwrap();
    println!("{name} {age} {description}");
}
§Creating a Parser from a Regex

You can also create a parser from a regex:

use kalosm::language::*;
use std::sync::Arc;

#[tokio::main]
async fn main() {
    // First create a model
    let model = Llama::new_chat().await.unwrap();
    // Then create a parser for your data from a regex
    let parser = RegexParser::new(r"\[(\w+), (\d+), (\w+)\]").unwrap();
    // Create a task with the constraints
    let task = model.task("You generate realistic JSON placeholders for pets in the form [\"Pet name\", age number, \"Pet description\"]")
        // The task constraints must implement Clone. If they don't, you can wrap them in an Arc
        .with_constraints(Arc::new(parser));
    // Finally, run the task. Unlike derived and custom parsers, regex parsers do not provide a useful output type
    task("Ruffles is a 3 year old adorable dog").to_std_out().await.unwrap();
}
§Tasks with Constraints

Once you have a parser, you can force the model to generate text that conforms to that parser with the Task::with_constraints method:

use kalosm::language::*;
use std::sync::Arc;

#[derive(Parse, Clone, Debug)]
struct Pet {
    name: String,
    age: u32,
    description: String,
}

#[tokio::main]
async fn main() {
    // First create a model
    let model = Llama::new_chat().await.unwrap();
    // Then create a parser for your data.
    // Any type that implements the `Parse` trait has the `new_parser` method
    let parser = Pet::new_parser();
    // Create a task with the constraints
    let task = model.task("You generate realistic JSON placeholders for pets in the form {\"name\": \"Pet name\", \"age\": 0, \"description\": \"Pet description\"}")
        // The task constraints must implement Clone. If they don't, you can wrap them in an Arc
        .with_constraints(Arc::new(parser));
    // Then run the task
    let pet: Pet = task("Ruffles is a 3 year old adorable dog").await.unwrap();
    println!("{pet:?}");
}

fn boxed_chat_model(self) -> BoxedChatModel
where Self: ChatModel<Error: Send + Sync + Error + 'static, ChatSession: ChatSession<Error: Error + Send + Sync + 'static> + Clone + Send + Sync + 'static> + Sized + Send + Sync + 'static,

Erase the type of the chat model. This can be used to make multiple implementations of ChatModel compatible with the same type.

§Example
let model = loop {
    let input = prompt_input("Choose Model (gpt, claude, llama, or phi): ").unwrap();
    match input.to_lowercase().as_str() {
        "gpt" => {
            break OpenAICompatibleChatModel::builder()
                .with_gpt_4o_mini()
                .build()
                .boxed_chat_model()
        }
        "claude" => {
            break AnthropicCompatibleChatModel::builder()
                .with_claude_3_5_haiku()
                .build()
                .boxed_chat_model()
        }
        "llama" => {
            break Llama::builder()
                .with_source(LlamaSource::llama_3_1_8b_chat())
                .build()
                .await
                .unwrap()
                .boxed_chat_model()
        }
        "phi" => {
            break Llama::builder()
                .with_source(LlamaSource::phi_3_5_mini_4k_instruct())
                .build()
                .await
                .unwrap()
                .boxed_chat_model()
        }
        _ => {}
    }
};

let mut chat = model
    .chat()
    .with_system_prompt("The assistant will act like a pirate");

// Then chat with the session
loop {
    chat(&prompt_input("\n> ").unwrap())
        .to_std_out()
        .await
        .unwrap();
}

fn boxed_typed_chat_model<T>(self) -> BoxedStructuredChatModel<T>
where Self: StructuredChatModel<Self::DefaultConstraints, Error: Send + Sync + Error + 'static, ChatSession: ChatSession<Error: Error + Send + Sync + 'static> + Clone + Send + Sync + 'static> + CreateDefaultChatConstraintsForType<T> + Sized + Send + Sync + 'static, T: 'static,

Erase the type of the structured chat model. This can be used to make multiple implementations of StructuredChatModel compatible with the same type.

§Example
// You can derive an efficient parser for your struct with the `Parse` trait
// OpenAI doesn't support root anyOf schemas, so we need to wrap the constraints in a struct
#[derive(Parse, Clone, Schema, Deserialize, Debug)]
struct Response {
    action: Action,
}

#[derive(Parse, Clone, Schema, Deserialize, Debug)]
#[serde(tag = "type")]
#[serde(content = "data")]
pub enum Action {
    Do(String),
    Say(String),
}

let model: BoxedStructuredChatModel<Response> = loop {
    let input = prompt_input("Choose Model (gpt, llama, or phi): ").unwrap();
    match input.to_lowercase().as_str() {
        "gpt" => {
            break OpenAICompatibleChatModel::builder()
                .with_gpt_4o_mini()
                .build()
                .boxed_typed_chat_model()
        }
        "llama" => {
            break Llama::builder()
                .with_source(LlamaSource::llama_3_1_8b_chat())
                .build()
                .await
                .unwrap()
                .boxed_typed_chat_model()
        }
        "phi" => {
            break Llama::builder()
                .with_source(LlamaSource::phi_3_5_mini_4k_instruct())
                .build()
                .await
                .unwrap()
                .boxed_typed_chat_model()
        }
        _ => {}
    }
};

let mut chat = model
    .chat()
    .with_system_prompt("The assistant will act like a pirate. You will respond with either something you do or something you say. Respond with JSON in the format { \"type\": \"Say\", \"data\": \"hello\" } or { \"type\": \"Do\", \"data\": \"run away\" }");

// Then chat with the session
loop {
    let mut response = chat(&prompt_input("\n> ").unwrap()).typed::<Response>();
    response.to_std_out().await.unwrap();
    println!("{:?}", response.await);
}

Implementors§