Crate ferrum_interfaces

Core interface definitions for the Ferrum inference framework

This crate defines all the stable trait interfaces that different components of Ferrum implement. It provides a clean abstraction layer that allows for pluggable implementations of tokenizers, model executors, schedulers, cache managers, and other core components.

The interfaces are designed following the principles outlined in the refactoring documentation:

  • Single responsibility with stable boundaries
  • Zero-copy and handle semantics
  • Capability-driven feature discovery
  • Performance-first API design
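
The design principles above can be pictured with a minimal sketch. All names, types, and signatures below are illustrative simplifications invented for this example, not the crate's actual definitions:

```rust
// Hypothetical sketch of capability discovery and handle semantics.
// These definitions are NOT the crate's real API; they only
// illustrate the design principles listed above.

/// Capabilities a backend can advertise (capability-driven discovery).
#[derive(Debug, Clone, Default)]
struct Capabilities {
    supports_fp16: bool,
    supports_paged_attention: bool,
}

/// Opaque handle to device-resident data (handle semantics: callers
/// pass cheap handles around instead of copying buffers).
#[derive(Debug, Clone, Copy, PartialEq)]
struct CacheHandle(u64);

/// A backend exposes its capabilities; callers branch on them at
/// runtime instead of hard-coding a particular implementation.
trait Backend {
    fn capabilities(&self) -> Capabilities;
    fn allocate_cache(&mut self, blocks: usize) -> CacheHandle;
}

struct CpuBackend {
    next_handle: u64,
}

impl Backend for CpuBackend {
    fn capabilities(&self) -> Capabilities {
        Capabilities { supports_fp16: false, supports_paged_attention: true }
    }
    fn allocate_cache(&mut self, _blocks: usize) -> CacheHandle {
        self.next_handle += 1;
        CacheHandle(self.next_handle)
    }
}

fn main() {
    let mut backend = CpuBackend { next_handle: 0 };
    // Capability discovery: feature-gate behavior at runtime.
    if backend.capabilities().supports_paged_attention {
        let handle = backend.allocate_cache(16);
        println!("allocated cache handle {handle:?}");
    }
}
```

The stable boundary here is the `Backend` trait: callers depend only on the trait and the advertised capabilities, so implementations remain pluggable.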

Re-exports

pub use backend::BackendCapabilities;
pub use backend::ComputeBackend;
pub use backend::WeightLoader;
pub use decode_backend::DecodeBackend;
pub use engine::InferenceEngine;
pub use kv_cache::AllocationRequest;
pub use kv_cache::BlockTable;
pub use kv_cache::CacheHandleStats;
pub use kv_cache::KvCacheHandle;
pub use kv_cache::KvCacheManager;
pub use memory::DeviceMemoryManager;
pub use memory::MemoryHandle;
pub use memory::StreamHandle;
pub use model_builder::BuildOptions;
pub use model_builder::ModelBuilder;
pub use model_executor::DecodeInput;
pub use model_executor::DecodeOutput;
pub use model_executor::ModelExecutor;
pub use model_executor::PrefillInput;
pub use model_executor::PrefillOutput;
pub use sampler::LogitsProcessor;
pub use sampler::Sampler;
pub use sampler::SamplingConfig;
pub use sampler::SamplingContext;
pub use scheduler::BatchHint;
pub use scheduler::BatchPlan;
pub use scheduler::Scheduler as SchedulerInterface;
pub use tensor::TensorFactory;
pub use tensor::TensorLike;
pub use tensor::TensorOps;
pub use tensor::TensorRef;
pub use tokenizer::IncrementalTokenizer;
pub use tokenizer::Tokenizer;
pub use tokenizer::TokenizerFactory;
pub use tokenizer::TokenizerInfo;
pub use transformer::TransformerConfig;
pub use transformer::TransformerWeights;
pub use kernel_ops::ActivationOps;
pub use kernel_ops::AttentionOps;
pub use kernel_ops::AttentionParams;
pub use kernel_ops::KernelOps;
pub use kernel_ops::KernelOpsDispatch;
pub use kernel_ops::LinearOps;
pub use kernel_ops::NormOps;
pub use kernel_ops::PositionOps;
pub use kernel_ops::QuantScheme;
pub use kernel_ops::RoPEConfig;
pub use kernel_ops::SamplingOps;
pub use kernel_ops::SamplingParams as KernelSamplingParams;

Modules

backend
Backend abstraction split into compute and weight loading concerns
decode_backend
Decode backend abstraction.
engine
Inference engine interface with streaming and batch support
kernel_ops
Kernel backend abstraction layer for LLM-specific fused operations.
kv_cache
KV-Cache abstraction with handle semantics and block management
memory
Memory management interfaces for device memory operations
model_builder
Model builder interface for constructing model executors
model_executor
Model execution interface with clear prefill/decode separation
sampler
Sampling and logits processing interfaces
scheduler
Unified scheduler interface with resource awareness and SLA support
tensor
Tensor abstraction with zero-copy and device-aware semantics
tokenizer
Tokenizer interface for text encoding/decoding
transformer
Transformer model weight abstraction.
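
The prefill/decode separation that `model_executor` describes can be sketched in simplified form. All types and method signatures here are assumptions for illustration, not the crate's real interface:

```rust
// Illustrative sketch of a prefill/decode split, loosely modeled on
// the ModelExecutor interface listed above. Every type and signature
// here is an invented simplification, not the actual definition.

/// Input for the prefill phase: the full prompt token ids.
struct PrefillInput {
    tokens: Vec<u32>,
}

/// Input for one decode step: only the most recent token.
struct DecodeInput {
    last_token: u32,
}

/// Output of either phase: logits over the vocabulary (simplified).
struct StepOutput {
    logits: Vec<f32>,
}

trait Executor {
    /// Process the whole prompt once, populating cached state.
    fn prefill(&mut self, input: PrefillInput) -> StepOutput;
    /// Generate one token at a time, reusing the cached state.
    fn decode(&mut self, input: DecodeInput) -> StepOutput;
}

/// Toy executor that just tracks how many positions are cached.
struct ToyExecutor {
    cached_positions: usize,
    vocab: usize,
}

impl Executor for ToyExecutor {
    fn prefill(&mut self, input: PrefillInput) -> StepOutput {
        self.cached_positions = input.tokens.len();
        StepOutput { logits: vec![0.0; self.vocab] }
    }
    fn decode(&mut self, _input: DecodeInput) -> StepOutput {
        self.cached_positions += 1;
        StepOutput { logits: vec![0.0; self.vocab] }
    }
}

fn main() {
    let mut exec = ToyExecutor { cached_positions: 0, vocab: 8 };
    exec.prefill(PrefillInput { tokens: vec![1, 2, 3] });
    exec.decode(DecodeInput { last_token: 3 });
    println!("cached positions: {}", exec.cached_positions);
}
```

Keeping prefill and decode as distinct methods makes the differing cost profiles explicit: prefill is a one-shot batch over the prompt, decode is an incremental per-token step against cached state.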

Structs

BackendConfig
Backend configuration
BatchId
Batch identifier
ClientId
Client identifier for multi-tenancy
ComponentHealth
Individual component health snapshot
ComponentStatus
Aggregated component health map
EngineConfig
Engine configuration
EngineMetrics
Aggregated engine metrics
EngineStatus
Engine status information
HealthStatus
Health check status
InferenceRequest
Inference request
InferenceResponse
Inference response
MemoryUsage
Memory usage statistics
ModelId
Model identifier
ModelInfo
Model information and metadata
RequestId
Request identifier
SamplingParams
Sampling parameters for generation
SchedulerConfig
Scheduler configuration
SchedulerStats
Scheduler statistics
SessionId
Session identifier for stateful interactions
SpecialTokens
Special tokens configuration
StreamChunk
Streaming response chunk
TaskId
Task identifier for execution tasks
TokenId
Token identifier used across the inference pipeline.
TokenizerConfig
Tokenizer configuration

Enums

DataType
Data type for tensors
Device
Device type for computation
FerrumError
Main error type for Ferrum operations
FinishReason
Reason for completion
ModelSource
Model loading source specification
ModelType
Model type enumeration
Priority
Request priority levels

Type Aliases

BlockId
Block identifier type
Result
Result type used throughout Ferrum
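
The `FerrumError` / `Result` pair suggests the common crate-wide error-alias pattern. A minimal sketch of that pattern, with error variants invented purely for illustration:

```rust
// Sketch of a crate-wide Result alias over a single error enum, the
// pattern the FerrumError / Result pair above suggests. The variants
// and the load_model function are invented examples, not the crate's
// actual definitions.

use std::fmt;

#[derive(Debug)]
enum FerrumError {
    ModelNotFound(String),
    OutOfMemory { requested_bytes: usize },
}

impl fmt::Display for FerrumError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FerrumError::ModelNotFound(id) => write!(f, "model not found: {id}"),
            FerrumError::OutOfMemory { requested_bytes } => {
                write!(f, "out of memory: requested {requested_bytes} bytes")
            }
        }
    }
}

impl std::error::Error for FerrumError {}

/// Crate-wide result alias, so signatures read `-> Result<T>`.
type Result<T> = std::result::Result<T, FerrumError>;

/// Hypothetical fallible operation using the alias.
fn load_model(id: &str) -> Result<usize> {
    if id.is_empty() {
        return Err(FerrumError::ModelNotFound(id.to_string()));
    }
    Ok(42) // pretend: number of bytes loaded
}

fn main() {
    match load_model("") {
        Ok(n) => println!("loaded {n} bytes"),
        Err(e) => println!("error: {e}"),
    }
}
```

A single error enum with a crate-wide alias keeps `?` propagation uniform across all the interfaces listed above.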