Crate ferrum_interfaces


Core interface definitions for the Ferrum inference framework.

This crate carries the stable, GPU-free trait contracts shared across the workspace: model execution, scheduling, KV cache management, tokenization, sampling, and the lifecycle/modality engine traits. Hardware backends live in ferrum-kernels (the Backend<B> trait and its supertraits); only types that compile without GPU features belong here.
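Because these contracts compile without GPU features, code written against them can be exercised with a CPU-only mock. The sketch below assumes a simplified, hypothetical `TextTokenizer` trait. It is not the crate's actual `Tokenizer` signature and only illustrates the contract/implementation split.

```rust
// Hypothetical, simplified contract; the real `Tokenizer` trait in
// ferrum_interfaces may have different methods and signatures.
pub trait TextTokenizer {
    /// Encode text into token ids.
    fn encode(&self, text: &str) -> Vec<u32>;
    /// Decode token ids back into text; returns None on invalid input.
    fn decode(&self, ids: &[u32]) -> Option<String>;
}

/// CPU-only mock: one token per UTF-8 byte. It needs no GPU features,
/// so it can back unit tests of code written against the trait.
pub struct ByteTokenizer;

impl TextTokenizer for ByteTokenizer {
    fn encode(&self, text: &str) -> Vec<u32> {
        text.bytes().map(u32::from).collect()
    }

    fn decode(&self, ids: &[u32]) -> Option<String> {
        // Reject ids that do not fit in a byte, then require valid UTF-8.
        let bytes = ids
            .iter()
            .map(|&id| if id <= u32::from(u8::MAX) { Some(id as u8) } else { None })
            .collect::<Option<Vec<u8>>>()?;
        String::from_utf8(bytes).ok()
    }
}

/// Generic consumer: depends only on the contract, never on a backend.
pub fn count_tokens<T: TextTokenizer>(tok: &T, text: &str) -> usize {
    tok.encode(text).len()
}
```

A GPU-backed implementation would live in a separate crate and satisfy the same trait, so `count_tokens` would not change.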

Re-exports

pub use engine::InferenceEngine;
pub use kv_cache::AllocationRequest;
pub use kv_cache::BlockTable;
pub use kv_cache::CacheHandleStats;
pub use kv_cache::KvCacheHandle;
pub use kv_cache::KvCacheManager;
pub use kv_dtype::KvBf16;
pub use kv_dtype::KvDtypeKind;
pub use kv_dtype::KvFp16;
pub use kv_dtype::KvFp8;
pub use kv_dtype::KvInt8;
pub use model_executor::DecodeInput;
pub use model_executor::DecodeOutput;
pub use model_executor::ModelExecutor;
pub use model_executor::PrefillInput;
pub use model_executor::PrefillOutput;
pub use sampler::LogitsProcessor;
pub use sampler::Sampler;
pub use sampler::SamplingConfig;
pub use sampler::SamplingContext;
pub use scheduler::BatchHint;
pub use scheduler::BatchPlan;
pub use scheduler::Scheduler as SchedulerInterface;
pub use tensor::TensorFactory;
pub use tensor::TensorLike;
pub use tensor::TensorOps;
pub use tensor::TensorRef;
pub use tokenizer::IncrementalTokenizer;
pub use tokenizer::Tokenizer;
pub use tokenizer::TokenizerFactory;
pub use tokenizer::TokenizerInfo;
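The `Scheduler as SchedulerInterface` re-export uses Rust's `pub use ... as` renaming. The docs do not state why. One plausible motivation, which is an assumption here, is to let the trait coexist with a concrete type that keeps the short name. The modules and trait bodies below are hypothetical placeholders, not the real definitions.

```rust
// Hypothetical miniature of the re-export pattern; bodies are placeholders.
pub mod scheduler {
    /// Placeholder for the scheduler contract.
    pub trait Scheduler {
        fn queued(&self) -> usize;
    }
}

// Re-export the trait under a distinct name at the crate root.
pub use scheduler::Scheduler as SchedulerInterface;

/// A concrete type that is free to use the short name at this level.
pub struct Scheduler {
    pub pending: Vec<u64>,
}

impl SchedulerInterface for Scheduler {
    fn queued(&self) -> usize {
        self.pending.len()
    }
}
```

Downstream code can then write `use ferrum_interfaces::SchedulerInterface;` without colliding with its own `Scheduler` type.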

Modules

engine: Inference engine interfaces, split per modality.
kv_cache: KV cache abstraction with handle semantics and block management.
kv_dtype: KV cache element-type markers (Dim 5 of the 5-dimension architecture).
model_executor: Model execution interface with clear prefill/decode separation.
sampler: Sampling and logits processing interfaces.
scheduler: Unified scheduler interface with resource awareness and SLA support.
tensor: Tensor abstraction with zero-copy and device-aware semantics.
tokenizer: Tokenizer interface for text encoding/decoding.
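The prefill/decode separation in `model_executor` reflects the two phases of autoregressive generation. Prefill processes the whole prompt once and populates the KV cache. Each decode step then feeds exactly one new token. The trait and types below (`Executor`, `PrefillIn`, `DecodeIn`, `StepOut`, `ToyExecutor`) are hypothetical stand-ins for illustration, not the crate's `ModelExecutor`, `PrefillInput`, or `DecodeInput`.

```rust
// Hypothetical sketch: names and signatures are illustrative assumptions,
// not the real model_executor::ModelExecutor API.
pub struct PrefillIn { pub prompt: Vec<u32> }
pub struct DecodeIn { pub last_token: u32, pub position: usize }
pub struct StepOut { pub next_token: u32, pub kv_len: usize }

pub trait Executor {
    /// Process the whole prompt in one pass and populate the KV cache.
    fn prefill(&mut self, input: &PrefillIn) -> StepOut;
    /// Process exactly one new token, reusing cached keys/values.
    fn decode(&mut self, input: &DecodeIn) -> StepOut;
}

/// Toy executor: "predicts" the previous token + 1 and tracks KV length.
pub struct ToyExecutor { pub kv_len: usize }

impl Executor for ToyExecutor {
    fn prefill(&mut self, input: &PrefillIn) -> StepOut {
        self.kv_len = input.prompt.len();
        let last = *input.prompt.last().unwrap_or(&0);
        StepOut { next_token: last + 1, kv_len: self.kv_len }
    }

    fn decode(&mut self, input: &DecodeIn) -> StepOut {
        self.kv_len += 1; // one new cache entry per decoded token
        StepOut { next_token: input.last_token + 1, kv_len: self.kv_len }
    }
}

/// Generation loop: one prefill, then repeated single-token decodes.
pub fn generate<E: Executor>(exec: &mut E, prompt: Vec<u32>, max_new: usize) -> Vec<u32> {
    let mut out = Vec::new();
    if max_new == 0 || prompt.is_empty() {
        return out;
    }
    let mut token = exec.prefill(&PrefillIn { prompt: prompt.clone() }).next_token;
    out.push(token);
    while out.len() < max_new {
        // The token being fed sits right after the prompt and prior outputs.
        let position = prompt.len() + out.len() - 1;
        token = exec.decode(&DecodeIn { last_token: token, position }).next_token;
        out.push(token);
    }
    out
}
```

Keeping the two phases as separate methods lets a scheduler batch compute-heavy prefills differently from latency-sensitive decodes.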

Structs

BackendConfig: Backend configuration.
BatchId: Batch identifier.
ClientId: Client identifier for multi-tenancy.
ComponentHealth: Individual component health snapshot.
ComponentStatus: Aggregated component health map.
EngineConfig: Engine configuration.
EngineMetrics: Aggregated engine metrics.
EngineStatus: Engine status information.
HealthStatus: Health check status.
InferenceRequest: Inference request.
InferenceResponse: Inference response.
MemoryUsage: Memory usage statistics.
ModelId: Model identifier.
ModelInfo: Model information and metadata.
RequestId: Request identifier.
SamplingParams: Sampling parameters for generation.
SchedulerConfig: Scheduler configuration.
SchedulerStats: Scheduler statistics.
SessionId: Session identifier for stateful interactions.
SpecialTokens: Special tokens configuration.
StreamChunk: Streaming response chunk.
TaskId: Task identifier for execution tasks.
TokenId: Token identifier used across the inference pipeline.
TokenizerConfig: Tokenizer configuration.
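Several structs above (`RequestId`, `TokenId`, `BatchId`, `SessionId`, and others) are distinct identifier types rather than bare integers. The sketch below shows the newtype pattern that such identifiers typically follow. The wrapped types (`u64`, `u32`), the derived traits, and the `Outputs` helper are assumptions. The real types may wrap something else, such as a UUID.

```rust
// Hypothetical newtypes; the real RequestId/TokenId may wrap different
// types and derive a different set of traits.
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct RequestId(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TokenId(pub u32);

/// Hypothetical per-request output buffers keyed by RequestId. Passing a
/// TokenId (or a bare u64) as the key is a compile-time error.
#[derive(Default)]
pub struct Outputs {
    by_request: HashMap<RequestId, Vec<TokenId>>,
}

impl Outputs {
    pub fn push(&mut self, req: RequestId, tok: TokenId) {
        self.by_request.entry(req).or_default().push(tok);
    }

    pub fn len_of(&self, req: RequestId) -> usize {
        self.by_request.get(&req).map_or(0, Vec::len)
    }
}
```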

Enums

DataType: Data type for tensors.
Device: Device type for computation.
FerrumError: Main error type for Ferrum operations.
FinishReason: Reason for completion.
ModelSource: Model loading source specification.
ModelType: Model type enumeration.
Priority: Request priority levels.
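A `Priority` enum is commonly consumed by a scheduler queue. The sketch below assumes three hypothetical variants (`Low`, `Normal`, `High`), which are not taken from the crate, and relies on the derived `Ord`, which follows declaration order. A `BinaryHeap` then pops the highest priority first, with FIFO order among equal priorities.

```rust
// Hypothetical variants; the real Priority enum may define different
// levels. Declaration order drives the derived Ord: Low < Normal < High.
use std::cmp::Reverse;
use std::collections::BinaryHeap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    Low,
    Normal,
    High,
}

/// Returns arrival sequence numbers in service order: highest priority
/// first, and among equal priorities the earliest arrival (via Reverse).
pub fn drain_in_order(requests: &[(Priority, u64)]) -> Vec<u64> {
    let mut heap: BinaryHeap<(Priority, Reverse<u64>)> =
        requests.iter().map(|&(p, seq)| (p, Reverse(seq))).collect();
    let mut order = Vec::with_capacity(heap.len());
    while let Some((_, Reverse(seq))) = heap.pop() {
        order.push(seq);
    }
    order
}
```

Wrapping the sequence number in `Reverse` turns the max-heap into FIFO within a priority level without a custom `Ord` implementation.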

Type Aliases

BlockId: Block identifier type.
Result: Result type used throughout Ferrum.
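The two aliases usually work together in KV cache code. `BlockId` names physical cache blocks, and a crate-wide `Result<T>` fixes the error type so signatures stay short. The sketch below is a paged-attention-style block table that maps a logical token position to a (block, offset) pair. The concrete choice `BlockId = u32`, the single-variant `FerrumError`, and this `BlockTable` are simplified, hypothetical stand-ins, not the crate's `kv_cache::BlockTable` or its real error enum.

```rust
// Hypothetical sketch: `BlockId = u32`, the error enum, and the block
// layout are assumptions for illustration, not the crate's definitions.
use std::fmt;

pub type BlockId = u32;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FerrumError {
    OutOfRange { position: usize, capacity: usize },
}

impl fmt::Display for FerrumError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FerrumError::OutOfRange { position, capacity } => {
                write!(f, "position {} exceeds capacity {}", position, capacity)
            }
        }
    }
}

impl std::error::Error for FerrumError {}

/// Crate-wide alias: signatures read `Result<T>` instead of naming the error.
pub type Result<T> = std::result::Result<T, FerrumError>;

/// Logical token positions laid out over fixed-size physical blocks.
pub struct BlockTable {
    pub block_size: usize,
    pub blocks: Vec<BlockId>,
}

impl BlockTable {
    /// Map a token position to (physical block id, offset within block).
    pub fn locate(&self, position: usize) -> Result<(BlockId, usize)> {
        let capacity = self.blocks.len() * self.block_size;
        if position >= capacity {
            // Also covers block_size == 0, so the division below is safe.
            return Err(FerrumError::OutOfRange { position, capacity });
        }
        Ok((self.blocks[position / self.block_size], position % self.block_size))
    }
}
```

Because blocks need not be contiguous (here `[7, 3, 9]` in the usage), the cache can grow a sequence one block at a time without reallocating earlier entries.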
