Struct AppState 

pub struct AppState {
    pub queue: Sender<BatchRequest>,
    pub model_id: String,
    pub loaded_at: u64,
    pub default_sampler: SamplerConfig,
    pub vocab_bytes: Option<VocabBytes>,
    pub hidden_size: usize,
    pub metrics: Arc<Metrics>,
    pub batch_store: BatchStore,
    pub batch_disk_store: Arc<BatchStore>,
    pub batch_queue_tx: BatchQueueSender,
    pub model_pool: Mutex<ModelPool>,
    pub prefix_cache: Arc<Mutex<PrefixKvCache>>,
    pub loras: Arc<RwLock<HashMap<String, Arc<LoadedLora>>>>,
    pub threads_store: Option<Arc<ThreadStore>>,
    pub run_queue_tx: Option<RunQueueSender>,
    pub files_store: Option<Arc<FilesStore>>,
    pub run_event_tx_broadcast: Option<RunEventSender>,
    pub responses_store: Option<Arc<ResponseStore>>,
    pub per_key_rate_limiter: Option<Arc<PerKeyRateLimiter>>,
}
Shared application state accessible by all route handlers.

All inference is delegated to a single background worker through the queue channel. Read-only metadata (model ID, default sampler, vocabulary, hidden size) is cached here so that handlers never need to reach into the engine.
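
As a rough illustration of this delegation, the sketch below has a handler enqueue a request together with a reply channel and wait for the single worker's answer. It assumes std::sync::mpsc and a blocking wait; the channel type actually used, the fields of BatchRequest::Generate, and the helper names spawn_worker and handle_generate are not shown on this page and are hypothetical here.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Hypothetical stand-in for the crate's `BatchRequest`; the real fields are not shown.
pub enum BatchRequest {
    Generate { prompt: String, reply: Sender<String> },
}

/// Spawn one worker thread that owns the (mock) engine and drains the queue.
pub fn spawn_worker() -> Sender<BatchRequest> {
    let (tx, rx) = mpsc::channel::<BatchRequest>();
    thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for req in rx {
            match req {
                BatchRequest::Generate { prompt, reply } => {
                    // Placeholder for real inference.
                    let _ = reply.send(format!("echo: {}", prompt));
                }
            }
        }
    });
    tx
}

/// What a route handler might do: enqueue a request, then wait for the reply.
/// Returns `None` if the worker is gone.
pub fn handle_generate(queue: &Sender<BatchRequest>, prompt: &str) -> Option<String> {
    let (reply_tx, reply_rx) = mpsc::channel();
    queue
        .send(BatchRequest::Generate { prompt: prompt.to_owned(), reply: reply_tx })
        .ok()?;
    reply_rx.recv().ok()
}
```

Because only the worker touches the engine, handlers need nothing more than a clone of the sender.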

Fields§

§queue: Sender<BatchRequest>

Channel to send inference requests to the worker.

§model_id: String

The model name/identifier for API responses.

§loaded_at: u64

Unix timestamp (seconds) when the model was loaded.
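
A value of this kind can be obtained from the standard library as sketched below. The helper name unix_now_secs is hypothetical, and whether the server computes loaded_at this way is an assumption.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Seconds since the Unix epoch, or 0 if the system clock is set before 1970.
pub fn unix_now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```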

§default_sampler: SamplerConfig

Default sampler configuration read from EngineConfig at startup.

Route handlers clone this and apply per-request overrides on top.
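
The clone-then-override step might look like the sketch below. The fields of SamplerConfig are not listed on this page, so temperature, top_p, and max_tokens are assumptions, as are the SamplerOverrides type and the effective_sampler helper.

```rust
/// Hypothetical stand-in for `SamplerConfig`; the real field set is not shown here.
#[derive(Clone, Debug, PartialEq)]
pub struct SamplerConfig {
    pub temperature: f32,
    pub top_p: f32,
    pub max_tokens: usize,
}

/// Hypothetical per-request overrides; `None` means "keep the server default".
#[derive(Default)]
pub struct SamplerOverrides {
    pub temperature: Option<f32>,
    pub top_p: Option<f32>,
    pub max_tokens: Option<usize>,
}

/// Clone the shared default and apply only the fields the request specified.
pub fn effective_sampler(default: &SamplerConfig, req: &SamplerOverrides) -> SamplerConfig {
    let mut cfg = default.clone();
    if let Some(t) = req.temperature {
        cfg.temperature = t;
    }
    if let Some(p) = req.top_p {
        cfg.top_p = p;
    }
    if let Some(n) = req.max_tokens {
        cfg.max_tokens = n;
    }
    cfg
}
```

Cloning keeps the shared default untouched, so one request's overrides cannot leak into another.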

§vocab_bytes: Option<VocabBytes>

Vocabulary byte table used for grammar-constrained sampling.

None when the model has no tokenizer (should not happen at serve time).
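
To show why a byte table is useful for constrained sampling, the sketch below builds a per-token mask from a toy "digits only" grammar. The layout of VocabBytes is not documented here; representing it as one byte vector per token id is an assumption, and allowed_mask and digits_only are hypothetical helpers.

```rust
/// Hypothetical stand-in for `VocabBytes`: entry `i` holds the raw bytes of token id `i`.
pub type VocabBytes = Vec<Vec<u8>>;

/// Build a mask that is `true` for every non-empty token whose bytes the grammar accepts.
pub fn allowed_mask(vocab: &VocabBytes, accepts: impl Fn(&[u8]) -> bool) -> Vec<bool> {
    vocab
        .iter()
        .map(|bytes| !bytes.is_empty() && accepts(bytes))
        .collect()
}

/// Toy grammar: a token is acceptable only if it consists of ASCII digits.
pub fn digits_only(bytes: &[u8]) -> bool {
    bytes.iter().all(|b| b.is_ascii_digit())
}
```

A sampler could then set the logits of masked-out tokens to negative infinity before sampling.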

§hidden_size: usize

Hidden-state dimension for the /v1/embeddings endpoint.

§metrics: Arc<Metrics>

Shared metrics store.

§batch_store: BatchStore

In-memory batch job registry (legacy OpenAI batch compat layer).

§batch_disk_store: Arc<BatchStore>

Disk-backed batch job store (C3: disk-spool backend).

§batch_queue_tx: BatchQueueSender

Sender into the disk-backed batch processing queue (C3).

§model_pool: Mutex<ModelPool>

Multi-model LRU warm-pool (C1).

Wrapped in a Mutex so that admin routes can mutate it without blocking the inference worker. In the current single-worker design the worker also holds the pool, so admin mutations use try_lock to avoid deadlocks.
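
A minimal sketch of the try_lock pattern follows, assuming a std::sync::Mutex. The contents of ModelPool, the AdminError type, and the evict_model helper are hypothetical; the point is only that a busy lock is reported to the caller instead of being waited on.

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical stand-in for `ModelPool`: just a list of warm model names.
#[derive(Default)]
pub struct ModelPool {
    pub warm: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum AdminError {
    /// The pool is currently locked (for example by the worker); retry later.
    Busy,
    /// A previous holder panicked while holding the lock.
    Poisoned,
}

/// Remove `name` from the pool without ever blocking.
/// Returns Ok(true) if the model was present and has been evicted.
pub fn evict_model(pool: &Mutex<ModelPool>, name: &str) -> Result<bool, AdminError> {
    match pool.try_lock() {
        Ok(mut guard) => {
            let before = guard.warm.len();
            guard.warm.retain(|m| m.as_str() != name);
            Ok(guard.warm.len() != before)
        }
        Err(TryLockError::WouldBlock) => Err(AdminError::Busy),
        Err(TryLockError::Poisoned(_)) => Err(AdminError::Poisoned),
    }
}
```

An admin route could map AdminError::Busy to a retryable HTTP status rather than stalling the request.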

§prefix_cache: Arc<Mutex<PrefixKvCache>>

Prefix KV cache for system-prompt reuse across requests.

When a new request shares a long prefix with a previously cached sequence (e.g. a fixed system prompt), the matching KV state is restored and only the suffix tokens need a fresh prefill pass.
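
The lookup can be pictured as a longest-common-prefix search over token ids, as in the simplified sketch below. This is not the actual PrefixKvCache: the entry layout, the opaque u64 KV handle, and the longest_match method are all assumptions made for illustration.

```rust
/// Hypothetical, simplified prefix cache: cached token ids paired with an opaque KV handle.
pub struct PrefixKvCache {
    entries: Vec<(Vec<u32>, u64)>,
}

impl PrefixKvCache {
    pub fn new() -> Self {
        Self { entries: Vec::new() }
    }

    pub fn insert(&mut self, tokens: Vec<u32>, kv_handle: u64) {
        self.entries.push((tokens, kv_handle));
    }

    /// Find the cached sequence sharing the longest prefix with `prompt`.
    /// Returns (kv_handle, matched_len); tokens from `matched_len` onward
    /// would still need a prefill pass. Returns None if nothing matches.
    pub fn longest_match(&self, prompt: &[u32]) -> Option<(u64, usize)> {
        self.entries
            .iter()
            .map(|(toks, handle)| {
                let n = toks.iter().zip(prompt).take_while(|(a, b)| a == b).count();
                (*handle, n)
            })
            .filter(|&(_, n)| n > 0)
            .max_by_key(|&(_, n)| n)
    }
}
```

A linear scan is enough for a sketch; a production cache would more likely use a trie or hashed prefix blocks.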

§loras: Arc<RwLock<HashMap<String, Arc<LoadedLora>>>>

Loaded LoRA adapter registry: stable name → Arc<LoadedLora>.

Populated via POST /admin/loras. Request handlers look up adapters by name and pass them to the worker via BatchRequest::Generate.
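
The registry type is spelled out in the field signature, so registration and lookup can be sketched with the standard library alone. LoadedLora's contents and the helper names register_lora and lookup_lora are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for `LoadedLora`; the real type holds adapter weights.
pub struct LoadedLora {
    pub rank: usize,
}

/// Same shape as the `loras` field.
pub type LoraRegistry = Arc<RwLock<HashMap<String, Arc<LoadedLora>>>>;

/// What an admin route might do: take the write lock briefly and insert.
/// Returns false if the lock was poisoned.
pub fn register_lora(reg: &LoraRegistry, name: &str, lora: LoadedLora) -> bool {
    match reg.write() {
        Ok(mut map) => {
            map.insert(name.to_owned(), Arc::new(lora));
            true
        }
        Err(_) => false,
    }
}

/// What a request handler might do: take the read lock, clone the Arc, release.
pub fn lookup_lora(reg: &LoraRegistry, name: &str) -> Option<Arc<LoadedLora>> {
    reg.read().ok()?.get(name).cloned()
}
```

Cloning the Arc lets the handler release the read lock immediately, and the adapter stays alive for the request even if it is later replaced in the map.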

§threads_store: Option<Arc<ThreadStore>>

Persistent thread/message/run store for the Assistants API.

None when the Assistants API has not been configured (no --threads-dir flag was passed at startup). Route handlers return 503 in this case.
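
The None-means-503 convention can be expressed as a small guard, sketched here with a trimmed-down AppState and a bare status code. The require_threads helper and the use of a u16 as the error type are assumptions; the real handlers presumably return the crate's own error type.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `ThreadStore`.
pub struct ThreadStore;

/// Trimmed-down state with only the field relevant to this sketch.
pub struct AppState {
    pub threads_store: Option<Arc<ThreadStore>>,
}

pub const SERVICE_UNAVAILABLE: u16 = 503;

/// Return the store if the Assistants API is configured, otherwise a 503 status.
pub fn require_threads(state: &AppState) -> Result<&Arc<ThreadStore>, u16> {
    state.threads_store.as_ref().ok_or(SERVICE_UNAVAILABLE)
}
```

A handler can then start with `let store = require_threads(&state)?;` and proceed knowing the store exists.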

§run_queue_tx: Option<RunQueueSender>

Sender into the run processing queue for the Assistants API.

None when threads_store is None.

§files_store: Option<Arc<FilesStore>>

Persistent files store for the Files API (/v1/files).

None when the Files API has not been configured.

§run_event_tx_broadcast: Option<RunEventSender>

Broadcast sender for run lifecycle events (SSE streaming).

None when SSE streaming is not enabled.

§responses_store: Option<Arc<ResponseStore>>

In-memory store for Responses API objects.

None when the Responses API has not been enabled. Route handlers return 503 (ModelNotReady) in this case.

§per_key_rate_limiter: Option<Arc<PerKeyRateLimiter>>

Per-API-key token-bucket rate limiter.

None when per-key rate limiting has not been configured.
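
For readers unfamiliar with token buckets, here is a minimal per-key sketch. It is not the PerKeyRateLimiter implementation: the TokenBucket and PerKeyLimiter types, their parameters, and the explicit `now` argument (used to keep the example deterministic) are all assumptions.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::Instant;

/// One bucket: holds up to `capacity` tokens, refilled at `refill_per_sec`.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    /// A new bucket starts full.
    pub fn new(capacity: f64, refill_per_sec: f64, now: Instant) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last: now }
    }

    /// Refill according to elapsed time, then try to take one token.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// Hypothetical per-key wrapper: one bucket per API key, created on first use.
pub struct PerKeyLimiter {
    buckets: Mutex<HashMap<String, TokenBucket>>,
    capacity: f64,
    refill_per_sec: f64,
}

impl PerKeyLimiter {
    pub fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { buckets: Mutex::new(HashMap::new()), capacity, refill_per_sec }
    }

    /// Returns true if a request for `api_key` is allowed at time `now`.
    pub fn check(&self, api_key: &str, now: Instant) -> bool {
        let mut buckets = self.buckets.lock().unwrap();
        let (cap, rate) = (self.capacity, self.refill_per_sec);
        buckets
            .entry(api_key.to_owned())
            .or_insert_with(|| TokenBucket::new(cap, rate, now))
            .try_acquire(now)
    }
}
```

With a capacity of 2 and a refill rate of 1 token per second, a key may burst two requests and then sustain one per second, independently of other keys.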

Implementations§

impl AppState

pub fn new(
    queue: Sender<BatchRequest>,
    model_id: String,
    default_sampler: SamplerConfig,
    vocab_bytes: Option<VocabBytes>,
    hidden_size: usize,
) -> Self

Create a new AppState from the required fields.

queue must be connected to a live inference worker.

pub fn with_threads(self, store: Arc<ThreadStore>, tx: RunQueueSender) -> Self

Attach a threads store and run queue to this AppState.

Returns self with the threads_store and run_queue_tx fields populated. Designed for use in a builder chain:

let state = AppState::new(...).with_threads(store, tx);
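
The with_* methods take self by value and return Self, which is what makes the chain work. The sketch below reproduces that shape on a reduced state type; the constructor here takes only a model id, and every type in it is a hypothetical stand-in for the real one.

```rust
use std::sync::Arc;

/// Hypothetical stand-ins for the real store and sender types.
pub struct ThreadStore;
pub struct FilesStore;
pub struct RunQueueSender;

/// Reduced state: optional subsystems default to None.
#[derive(Default)]
pub struct AppState {
    pub model_id: String,
    pub threads_store: Option<Arc<ThreadStore>>,
    pub run_queue_tx: Option<RunQueueSender>,
    pub files_store: Option<Arc<FilesStore>>,
}

impl AppState {
    pub fn new(model_id: String) -> Self {
        Self { model_id, ..Default::default() }
    }

    /// Consume self, populate both Assistants fields together, and hand self back.
    pub fn with_threads(mut self, store: Arc<ThreadStore>, tx: RunQueueSender) -> Self {
        self.threads_store = Some(store);
        self.run_queue_tx = Some(tx);
        self
    }

    pub fn with_files(mut self, store: Arc<FilesStore>) -> Self {
        self.files_store = Some(store);
        self
    }
}
```

Setting threads_store and run_queue_tx in one method keeps the two fields consistent, matching the note that run_queue_tx is None whenever threads_store is None.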

pub fn with_files(self, store: Arc<FilesStore>) -> Self

Attach a files store to this AppState.

pub fn with_run_event_sender(self, tx: RunEventSender) -> Self

Attach a run-event broadcast sender to this AppState.

When set, the run worker broadcasts lifecycle events that SSE handlers can subscribe to.

pub fn with_responses_store(self, store: Arc<ResponseStore>) -> Self

Attach a Responses API store to this AppState.

When set, the /v1/responses routes are fully operational.

pub fn with_per_key_rate_limiter(self, limiter: Arc<PerKeyRateLimiter>) -> Self

Attach a per-API-key rate limiter to this AppState.

When set, the per_key_rate_limit_middleware is applied to all routes in build_app_with_config.

pub fn with_batch_pipeline(
    queue: Sender<BatchRequest>,
    model_id: String,
    default_sampler: SamplerConfig,
    vocab_bytes: Option<VocabBytes>,
    hidden_size: usize,
    batch_disk_store: Arc<DiskBatchStore>,
    batch_queue_tx: BatchQueueSender,
) -> Self

Create app state with an explicit disk batch store and queue sender.

Used by the server startup code to wire up the full batch pipeline.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<A, B, T> HttpServerConnExec<A, B> for T
where B: Body,