Crate llmsim


§LLMSim - LLM Traffic Simulator

A lightweight, high-performance LLM API simulator that replicates the traffic shape of real LLM APIs without running actual models.

§Features

  • Realistic latency simulation (time-to-first-token, inter-token delays)
  • Streaming support (Server-Sent Events)
  • Accurate token counting using tiktoken-rs
  • Error injection for testing error handling
  • Multiple response generators (lorem, echo, fixed, random)
  • Model profiles sourced from models.dev, including context window sizes
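The latency and streaming features above can be pictured with a small standalone sketch. It does not use llmsim's own API: `SketchProfile`, `schedule`, and the timing values are hypothetical names and numbers chosen for illustration, and the `[DONE]` sentinel is assumed from the OpenAI streaming convention rather than confirmed for this crate. The sketch computes when each token would be emitted (time-to-first-token plus a fixed inter-token delay) and frames each token as a Server-Sent Events `data:` message.

```rust
use std::time::Duration;

/// Hypothetical latency parameters; NOT llmsim's `LatencyProfile`.
struct SketchProfile {
    /// Delay before the first token is emitted (time-to-first-token).
    ttft: Duration,
    /// Fixed delay between consecutive tokens.
    inter_token: Duration,
}

/// Returns `(offset from request start, SSE frame)` for every token,
/// followed by a terminating `[DONE]` frame (an assumed convention).
/// Tokens are assumed to contain no newlines, which SSE would require
/// splitting across several `data:` lines.
fn schedule(profile: &SketchProfile, tokens: &[&str]) -> Vec<(Duration, String)> {
    let mut frames = Vec::with_capacity(tokens.len() + 1);
    for (i, tok) in tokens.iter().enumerate() {
        // Token i is emitted at ttft + i * inter_token.
        let at = profile.ttft + profile.inter_token * i as u32;
        // An SSE message is a `data:` line terminated by a blank line.
        frames.push((at, format!("data: {tok}\n\n")));
    }
    // The sentinel is sent together with the last token (or at ttft if empty).
    let last = tokens.len().saturating_sub(1) as u32;
    let end = profile.ttft + profile.inter_token * last;
    frames.push((end, "data: [DONE]\n\n".to_string()));
    frames
}

fn main() {
    let profile = SketchProfile {
        ttft: Duration::from_millis(200),
        inter_token: Duration::from_millis(20),
    };
    for (at, frame) in schedule(&profile, &["Hello", ",", " world"]) {
        // `{:?}` escapes the newlines so each frame prints on one line.
        println!("{:>4} ms  {:?}", at.as_millis(), frame);
    }
}
```

A real simulator would likely sleep until each offset and add jitter to the delays instead of precomputing a fixed schedule; the fixed version is used here only because it is easy to reason about.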

§Usage

§As a CLI

# Start the server
llmsim serve --port 8080

§As a Library

use llmsim::{
    openai::{ChatCompletionRequest, Message},
    generator::LoremGenerator,
    latency::LatencyProfile,
};

// Create a generator
let generator = LoremGenerator::new(100);

// Create a latency profile
let latency = LatencyProfile::gpt5();

// Count the tokens in a string for the given model
// (`unwrap()` panics if counting fails)
let tokens = llmsim::tokens::count_tokens("Hello, world!", "gpt-5").unwrap();

Re-exports§

pub use errors::ErrorConfig;
pub use errors::ErrorInjector;
pub use errors::SimulatedError;
pub use generator::create_generator;
pub use generator::EchoGenerator;
pub use generator::FixedGenerator;
pub use generator::LoremGenerator;
pub use generator::RandomWordGenerator;
pub use generator::ResponseGenerator;
pub use generator::SequenceGenerator;
pub use latency::LatencyProfile;
pub use responses_stream::ResponsesTokenStream;
pub use responses_stream::ResponsesTokenStreamBuilder;
pub use script::OnExhausted;
pub use script::Script;
pub use script::ScriptError;
pub use script::ScriptSpec;
pub use script::ScriptedResponse;
pub use script::SimError;
pub use script::SimToolCall;
pub use script::SimTurn;
pub use stats::new_shared_stats;
pub use stats::EndpointType;
pub use stats::SharedStats;
pub use stats::Stats;
pub use stats::StatsSnapshot;
pub use stream::TokenStream;
pub use stream::TokenStreamBuilder;
pub use tokens::count_tokens;
pub use tokens::count_tokens_default;
pub use tokens::TokenCounter;
pub use tokens::TokenError;

Modules§

cli
    CLI module for LLMSim server functionality.
errors
generator
latency
openai
openresponses
responses_stream
script
script_stream
stats
    Statistics module for tracking real-time metrics.
stream
tokens