Crate ferrum_interfaces

Core interface definitions for the Ferrum inference framework

This crate carries the stable, GPU-free trait contracts shared across the workspace: model execution, scheduling, KV cache management, tokenization, sampling, and the lifecycle/modality engine traits. Hardware backends live in ferrum-kernels (the Backend<B> trait and its supertraits); only types that compile without GPU features belong here.

Re-exports

pub use engine::InferenceEngine;
pub use kv_cache::AllocationRequest;
pub use kv_cache::BlockTable;
pub use kv_cache::CacheHandleStats;
pub use kv_cache::KvCacheHandle;
pub use kv_cache::KvCacheManager;
pub use kv_dtype::KvBf16;
pub use kv_dtype::KvDtypeKind;
pub use kv_dtype::KvFp16;
pub use kv_dtype::KvFp8;
pub use kv_dtype::KvInt8;
pub use model_executor::DecodeInput;
pub use model_executor::DecodeOutput;
pub use model_executor::ModelExecutor;
pub use model_executor::PrefillInput;
pub use model_executor::PrefillOutput;
pub use sampler::LogitsProcessor;
pub use sampler::Sampler;
pub use sampler::SamplingConfig;
pub use sampler::SamplingContext;
pub use scheduler::BatchHint;
pub use scheduler::BatchPlan;
pub use scheduler::Scheduler as SchedulerInterface;
pub use tensor::TensorFactory;
pub use tensor::TensorLike;
pub use tensor::TensorOps;
pub use tensor::TensorRef;
pub use tokenizer::IncrementalTokenizer;
pub use tokenizer::Tokenizer;
pub use tokenizer::TokenizerFactory;
pub use tokenizer::TokenizerInfo;
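
The `Scheduler as SchedulerInterface` line above gives the trait a second name at the crate root, presumably to avoid clashing with concrete scheduler types elsewhere in the workspace. The following is a minimal sketch of how such an aliased re-export behaves. The `scheduler` module here is a local stand-in, and the `pending` method, `FifoScheduler`, and `report` are hypothetical names invented for illustration; the real trait's methods are not shown on this page.

```rust
// Local stand-in for the crate's `scheduler` module (hypothetical contents).
mod scheduler {
    /// Placeholder trait; `pending` is an invented method, not the real API.
    pub trait Scheduler {
        fn pending(&self) -> usize;
    }
}

// Same shape as the re-export listed above: one trait, two names.
pub use scheduler::Scheduler as SchedulerInterface;

/// Toy scheduler that only tracks a queue length (hypothetical type).
pub struct FifoScheduler {
    pub queued: usize,
}

// Implementing via the alias implements the original trait: they are the same item.
impl SchedulerInterface for FifoScheduler {
    fn pending(&self) -> usize {
        self.queued
    }
}

/// Accepts any implementor, naming the trait by its original module path.
pub fn report(s: &dyn scheduler::Scheduler) -> usize {
    s.pending()
}
```

Because the alias and the original path refer to the same trait, a type implemented against `SchedulerInterface` can be passed wherever `scheduler::Scheduler` is expected.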

Modules

engine
Inference engine interfaces — split per modality.
kv_cache
KV-Cache abstraction with handle semantics and block management
kv_dtype
KV cache element-type markers (Dim 5 of the 5-dimension architecture).
model_executor
Model execution interface with clear prefill/decode separation
sampler
Sampling and logits processing interfaces
scheduler
Unified scheduler interface with resource awareness and SLA support
tensor
Tensor abstraction with zero-copy and device-aware semantics
tokenizer
Tokenizer interface for text encoding/decoding

Structs

BackendConfig
Backend configuration
BatchId
Batch identifier
ClientId
Client identifier for multi-tenancy
ComponentHealth
Individual component health snapshot
ComponentStatus
Aggregated component health map
EngineConfig
Engine configuration
EngineMetrics
Aggregated engine metrics
EngineStatus
Engine status information
HealthStatus
Health check status
InferenceRequest
Inference request
InferenceResponse
Inference response
MemoryUsage
Memory usage statistics
ModelId
Model identifier
ModelInfo
Model information and metadata
RequestId
Request identifier
SamplingParams
Sampling parameters for generation
SchedulerConfig
Scheduler configuration
SchedulerStats
Scheduler statistics
SessionId
Session identifier for stateful interactions
SpecialTokens
Special tokens configuration
StreamChunk
Streaming response chunk
TaskId
Task identifier for execution tasks
TokenId
Token identifier used across the inference pipeline.
TokenizerConfig
Tokenizer configuration

Enums

DataType
Data type for tensors
Device
Device type for computation
FerrumError
Main error type for Ferrum operations
FinishReason
Reason for completion
ModelSource
Model loading source specification
ModelType
Model type enumeration
Priority
Request priority levels

Type Aliases

BlockId
Block identifier type
Result
Result type used throughout Ferrum
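
A crate-wide `Result` alias usually fixes the error parameter to the crate's main error type so that signatures only name the success type. The sketch below shows that common pattern under stated assumptions: this page does not show the alias definition, so the assumption is that it is generic over the success type only, and `DemoError` is a hypothetical stand-in for `FerrumError`, whose variants are not listed here. `parse_max_tokens` is likewise an invented helper.

```rust
use std::fmt;

// Hypothetical stand-in for the crate's main error type.
#[derive(Debug, PartialEq)]
pub enum DemoError {
    InvalidRequest(String),
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::InvalidRequest(msg) => write!(f, "invalid request: {msg}"),
        }
    }
}

impl std::error::Error for DemoError {}

// Assumed shape of the alias: the error type is fixed, only T varies.
pub type Result<T> = std::result::Result<T, DemoError>;

/// Invented helper: parses a token limit, mapping parse failures into the crate error.
pub fn parse_max_tokens(raw: &str) -> Result<u32> {
    raw.trim()
        .parse::<u32>()
        .map_err(|e| DemoError::InvalidRequest(format!("max_tokens: {e}")))
}
```

With an alias of this shape, callers can use `?` across functions that share the error type without repeating it in every signature.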