pub struct GenericBackend<D: Driver, A: ModelArch<D>> { /* private fields */ }
Generic backend that pairs a Driver with a ModelArch.
Implements EmbedBackend by calling arch.forward(driver, encodings).
The driver provides hardware-specific compute primitives; the architecture
orchestrates them into a full forward pass.
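The pairing described above can be sketched with stand-in traits. Everything in this block is hypothetical: the `Driver` and `ModelArch` traits here are simplified local definitions with a single made-up primitive (`scale`), and `CpuDriver`, `Doubler`, `Backend`, and `demo` are illustrative names, not the crate's types. It only shows the shape of the delegation, where the backend forwards `embed_batch` to `arch.forward(driver, ...)`.

```rust
/// Hypothetical stand-in for the crate's `Driver`: one compute primitive.
trait Driver {
    fn scale(&self, xs: &[f32], k: f32) -> Vec<f32>;
}

/// Hypothetical stand-in for `ModelArch<D>`: composes driver primitives
/// into a full forward pass.
trait ModelArch<D: Driver> {
    fn forward(&self, driver: &D, inputs: &[Vec<f32>]) -> Vec<Vec<f32>>;
}

/// Toy driver that does its work on the CPU.
struct CpuDriver;

impl Driver for CpuDriver {
    fn scale(&self, xs: &[f32], k: f32) -> Vec<f32> {
        xs.iter().map(|x| x * k).collect()
    }
}

/// Toy architecture whose "forward pass" doubles every value.
struct Doubler;

impl<D: Driver> ModelArch<D> for Doubler {
    fn forward(&self, driver: &D, inputs: &[Vec<f32>]) -> Vec<Vec<f32>> {
        inputs.iter().map(|row| driver.scale(row, 2.0)).collect()
    }
}

/// Simplified backend: owns a driver and an architecture.
struct Backend<D: Driver, A: ModelArch<D>> {
    driver: D,
    arch: A,
}

impl<D: Driver, A: ModelArch<D>> Backend<D, A> {
    /// Delegates to the architecture, handing it the driver.
    fn embed_batch(&self, inputs: &[Vec<f32>]) -> Vec<Vec<f32>> {
        self.arch.forward(&self.driver, inputs)
    }
}

/// Runs one batch through the toy backend.
fn demo() -> Vec<Vec<f32>> {
    let backend = Backend { driver: CpuDriver, arch: Doubler };
    backend.embed_batch(&[vec![1.0, 2.0]])
}

fn main() {
    println!("{:?}", demo());
}
```

Because the architecture is generic over the driver, the same `Doubler` would work unchanged with any other type that implements the stand-in `Driver` trait.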
§Lifetime invariant
_mmap must be declared after arch so it is dropped last. The
architecture’s weight tensors reference pages in the memory-mapped file
via zero-copy Metal buffers; dropping the mmap first would invalidate them.
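The invariant relies on Rust dropping struct fields in declaration order. The following is a minimal sketch of that rule only; `Tracked`, `Backend`, and `drop_order` are hypothetical names used for illustration, and no real mmap or Metal buffer is involved.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Records its name in a shared log when dropped.
struct Tracked {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Hypothetical stand-in for the backend: `arch` is declared before `_mmap`.
#[allow(dead_code)]
struct Backend {
    arch: Tracked,
    _mmap: Tracked,
}

/// Builds a `Backend`, drops it, and returns the order in which its
/// fields were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let backend = Backend {
        arch: Tracked { name: "arch", log: Rc::clone(&log) },
        _mmap: Tracked { name: "_mmap", log: Rc::clone(&log) },
    };
    drop(backend);
    let order = log.borrow().clone();
    order
}

fn main() {
    // Fields are dropped in declaration order, so `arch` goes first.
    println!("{:?}", drop_order()); // prints ["arch", "_mmap"]
}
```

Swapping the two field declarations in `Backend` would reverse the logged order, which is the situation the invariant is meant to rule out.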
Implementations§
impl<D: Driver, A: ModelArch<D>> GenericBackend<D, A>
pub fn new(
    driver: D,
    arch: A,
    max_tokens: usize,
    is_gpu: bool,
    mmap: Mmap,
) -> Self
Create a new generic backend from a driver, architecture, and mmap.
The mmap must be the memory-mapped safetensors file whose pages back
the weight tensors stored in arch.
For GPU backends, new runs a warm-up forward pass to prime the buffer pool. The warm-up is skipped for large models (max_tokens > 1024), where its cost exceeds the benefit.
max_batch controls how many encodings are sent in each forward pass.
Metal: 32 (optimal for M2 Max AMX). CUDA: 128+ (needs more work to
saturate 128 SMs on RTX 4090).
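The warm-up rule and the batching behaviour described above can be sketched as two small helpers. Both `should_warm_up` and `batch_sizes` are hypothetical names written for illustration; they restate the documented behaviour and are not taken from the crate's source.

```rust
/// Hypothetical helper: the documented warm-up rule as a predicate.
/// Warm-up runs only for GPU backends and is skipped when
/// `max_tokens` exceeds 1024.
fn should_warm_up(is_gpu: bool, max_tokens: usize) -> bool {
    is_gpu && max_tokens <= 1024
}

/// Hypothetical helper: sizes of the forward passes needed for `total`
/// encodings when each pass takes at most `max_batch` of them.
fn batch_sizes(total: usize, max_batch: usize) -> Vec<usize> {
    assert!(max_batch > 0, "max_batch must be non-zero");
    let indices: Vec<usize> = (0..total).collect();
    indices.chunks(max_batch).map(|chunk| chunk.len()).collect()
}

fn main() {
    println!("{}", should_warm_up(true, 512)); // prints true
    println!("{}", should_warm_up(true, 2048)); // prints false
    // With the Metal value of 32, 70 encodings need three passes.
    println!("{:?}", batch_sizes(70, 32)); // prints [32, 32, 6]
}
```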
Trait Implementations§
impl<D, A> EmbedBackend for GenericBackend<D, A>
fn embed_batch(&self, encodings: &[Encoding]) -> Result<Vec<Vec<f32>>>
fn supports_clone(&self) -> bool
fn clone_backend(&self) -> Box<dyn EmbedBackend>
fn max_tokens(&self) -> usize
Auto Trait Implementations§
impl<D, A> Freeze for GenericBackend<D, A>
impl<D, A> RefUnwindSafe for GenericBackend<D, A> where
D: RefUnwindSafe,
A: RefUnwindSafe,
impl<D, A> Send for GenericBackend<D, A> where
A: Send,
impl<D, A> Sync for GenericBackend<D, A> where
A: Sync,
impl<D, A> Unpin for GenericBackend<D, A>
impl<D, A> UnsafeUnpin for GenericBackend<D, A> where
D: UnsafeUnpin,
A: UnsafeUnpin,
impl<D, A> UnwindSafe for GenericBackend<D, A> where
D: UnwindSafe,
A: UnwindSafe,
Blanket Implementations§
impl<T> ArchivePointee for T
type ArchivedMetadata = ()
fn pointer_metadata( _: &<T as ArchivePointee>::ArchivedMetadata, ) -> <T as Pointee>::Metadata
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Downcast for T where
T: Any,
fn into_any(self: Box<T>) -> Box<dyn Any>
Converts Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>, which can then be
downcast into Box<ConcreteType> where ConcreteType implements Trait.
fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>
Converts Rc<Trait> (where Trait: Downcast) to Rc<Any>, which can then be further
downcast into Rc<ConcreteType> where ConcreteType implements Trait.
fn as_any(&self) -> &(dyn Any + 'static)
Converts &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot
generate &Any’s vtable from &Trait’s.
fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)
Converts &mut Trait (where Trait: Downcast) to &mut Any. This is needed since Rust cannot
generate &mut Any’s vtable from &mut Trait’s.
impl<T> DowncastSend for T
impl<T> DowncastSync for T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
impl<T> LayoutRaw for T
fn layout_raw(_: <T as Pointee>::Metadata) -> Result<Layout, LayoutError>
impl<T, N1, N2> Niching<NichedOption<T, N1>> for N2
unsafe fn is_niched(niched: *const NichedOption<T, N1>) -> bool
fn resolve_niched(out: Place<NichedOption<T, N1>>)
Writes data to out indicating that a T is niched.