Core interface definitions for the Ferrum inference framework
This crate defines all the stable trait interfaces that different components of Ferrum implement. It provides a clean abstraction layer that allows for pluggable implementations of tokenizers, model executors, schedulers, cache managers, and other core components.
The interfaces are designed following the principles outlined in the refactoring documentation:
- Single responsibility with stable boundaries
- Zero-copy and handle semantics
- Capability-discovery-driven design
- Performance-first API design
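Capability discovery means callers ask a component what it supports and branch on the answer, rather than downcasting to concrete types. A minimal sketch of the idea, using illustrative types (`Capability` and `Backend` here are simplified stand-ins, not this crate's `BackendCapabilities`/`ComputeBackend` definitions):

```rust
// Illustrative capability-discovery pattern: a backend reports what it
// supports; callers query capabilities instead of downcasting.
#[derive(Debug, PartialEq)]
enum Capability {
    FlashAttention,
    PagedKvCache,
    Int8Quantization,
}

trait Backend {
    fn name(&self) -> &str;
    fn capabilities(&self) -> Vec<Capability>;
}

struct CpuBackend;

impl Backend for CpuBackend {
    fn name(&self) -> &str {
        "cpu"
    }
    fn capabilities(&self) -> Vec<Capability> {
        vec![Capability::Int8Quantization]
    }
}

fn supports(b: &dyn Backend, cap: Capability) -> bool {
    b.capabilities().contains(&cap)
}

fn main() {
    let b = CpuBackend;
    assert!(supports(&b, Capability::Int8Quantization));
    assert!(!supports(&b, Capability::FlashAttention));
    println!("{} backend probed ok", b.name());
}
```

The trade-off versus trait downcasting is that capability sets can be inspected at startup, so unsupported configurations fail fast instead of at first use.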
Re-exports

pub use backend::BackendCapabilities;
pub use backend::ComputeBackend;
pub use backend::WeightLoader;
pub use decode_backend::DecodeBackend;
pub use engine::InferenceEngine;
pub use kv_cache::AllocationRequest;
pub use kv_cache::BlockTable;
pub use kv_cache::CacheHandleStats;
pub use kv_cache::KvCacheHandle;
pub use kv_cache::KvCacheManager;
pub use memory::DeviceMemoryManager;
pub use memory::MemoryHandle;
pub use memory::StreamHandle;
pub use model_builder::BuildOptions;
pub use model_builder::ModelBuilder;
pub use model_executor::DecodeInput;
pub use model_executor::DecodeOutput;
pub use model_executor::ModelExecutor;
pub use model_executor::PrefillInput;
pub use model_executor::PrefillOutput;
pub use sampler::LogitsProcessor;
pub use sampler::Sampler;
pub use sampler::SamplingConfig;
pub use sampler::SamplingContext;
pub use scheduler::BatchHint;
pub use scheduler::BatchPlan;
pub use scheduler::Scheduler as SchedulerInterface;
pub use tensor::TensorFactory;
pub use tensor::TensorLike;
pub use tensor::TensorOps;
pub use tensor::TensorRef;
pub use tokenizer::IncrementalTokenizer;
pub use tokenizer::Tokenizer;
pub use tokenizer::TokenizerFactory;
pub use tokenizer::TokenizerInfo;
pub use transformer::TransformerConfig;
pub use transformer::TransformerWeights;
pub use kernel_ops::ActivationOps;
pub use kernel_ops::AttentionOps;
pub use kernel_ops::AttentionParams;
pub use kernel_ops::KernelOps;
pub use kernel_ops::KernelOpsDispatch;
pub use kernel_ops::LinearOps;
pub use kernel_ops::NormOps;
pub use kernel_ops::PositionOps;
pub use kernel_ops::QuantScheme;
pub use kernel_ops::RoPEConfig;
pub use kernel_ops::SamplingOps;
pub use kernel_ops::SamplingParams as KernelSamplingParams;
Modules

- backend - Backend abstraction split into compute and weight-loading concerns
- decode_backend - Decode backend abstraction
- engine - Inference engine interface with streaming and batch support
- kernel_ops - Kernel backend abstraction layer for LLM-specific fused operations
- kv_cache - KV-cache abstraction with handle semantics and block management
- memory - Memory management interfaces for device memory operations
- model_builder - Model builder interface for constructing model executors
- model_executor - Model execution interface with clear prefill/decode separation
- sampler - Sampling and logits-processing interfaces
- scheduler - Unified scheduler interface with resource awareness and SLA support
- tensor - Tensor abstraction with zero-copy and device-aware semantics
- tokenizer - Tokenizer interface for text encoding/decoding
- transformer - Transformer model weight abstraction
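The handle semantics promised by the `kv_cache` module mean that callers hold an opaque handle plus a block table, never raw device pointers. A minimal sketch of how a paged block table translates token positions into physical blocks (the `BlockId`/`KvHandle` types below are simplified stand-ins, not the crate's actual `KvCacheHandle`/`BlockTable` API):

```rust
// Simplified paged KV-cache handle: a logical-to-physical block mapping
// plus a fixed block size, as in paged-attention-style cache managers.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BlockId(u32);

struct KvHandle {
    blocks: Vec<BlockId>, // logical block index -> physical block id
    block_size: usize,    // tokens stored per block
}

impl KvHandle {
    /// Translate a token position into (physical block, offset within block).
    /// Returns None if the position falls past the allocated blocks.
    fn locate(&self, token_pos: usize) -> Option<(BlockId, usize)> {
        let block = self.blocks.get(token_pos / self.block_size)?;
        Some((*block, token_pos % self.block_size))
    }
}

fn main() {
    let h = KvHandle {
        blocks: vec![BlockId(7), BlockId(3)],
        block_size: 16,
    };
    assert_eq!(h.locate(5), Some((BlockId(7), 5)));  // first block
    assert_eq!(h.locate(20), Some((BlockId(3), 4))); // second block
    assert_eq!(h.locate(32), None);                  // out of range
}
```

Because the handle owns only indices, the cache manager can move or swap the underlying device memory without invalidating callers, which is the point of the handle boundary.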
Structs

- BackendConfig - Backend configuration
- BatchId - Batch identifier
- ClientId - Client identifier for multi-tenancy
- ComponentHealth - Individual component health snapshot
- ComponentStatus - Aggregated component health map
- EngineConfig - Engine configuration
- EngineMetrics - Aggregated engine metrics
- EngineStatus - Engine status information
- HealthStatus - Health check status
- InferenceRequest - Inference request
- InferenceResponse - Inference response
- MemoryUsage - Memory usage statistics
- ModelId - Model identifier
- ModelInfo - Model information and metadata
- RequestId - Request identifier
- SamplingParams - Sampling parameters for generation
- SchedulerConfig - Scheduler configuration
- SchedulerStats - Scheduler statistics
- SessionId - Session identifier for stateful interactions
- SpecialTokens - Special tokens configuration
- StreamChunk - Streaming response chunk
- TaskId - Task identifier for execution tasks
- TokenId - Token identifier used across the inference pipeline
- TokenizerConfig - Tokenizer configuration
Enums

- DataType - Data type for tensors
- Device - Device type for computation
- FerrumError - Main error type for Ferrum operations
- FinishReason - Reason for completion
- ModelSource - Model loading source specification
- ModelType - Model type enumeration
- Priority - Request priority levels
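An enum like `FinishReason` is typically consumed by exhaustive matching at the end of a generation loop. A sketch of that pattern with guessed variant names (the crate's actual variants may differ):

```rust
// Illustrative FinishReason-style enum and exhaustive handling; variant
// names here are assumptions, not the crate's real definition.
#[derive(Debug, PartialEq)]
enum FinishReason {
    MaxTokens, // hit the generation length limit
    StopToken, // model emitted a stop/EOS token
    Cancelled, // request was cancelled mid-stream
}

fn describe(reason: &FinishReason) -> &'static str {
    // Exhaustive match: adding a new variant forces every caller to
    // handle it, which is why completion reasons are an enum here.
    match reason {
        FinishReason::MaxTokens => "length limit reached",
        FinishReason::StopToken => "natural stop",
        FinishReason::Cancelled => "cancelled by client",
    }
}

fn main() {
    assert_eq!(describe(&FinishReason::StopToken), "natural stop");
    assert_eq!(describe(&FinishReason::MaxTokens), "length limit reached");
}
```

Modeling completion reasons as a closed enum (rather than strings) keeps downstream logic such as retry or billing decisions exhaustive and typo-proof.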