pub struct CrossModalTokenizer { /* private fields */ }
Unified cross-modal tokenizer.
Manages one ModalityTokenizer per registered modality and applies:
- Per-modality linear projection (input → shared_dim)
- Modality-type embedding offset (for identity disambiguation)
- Shared alignment projection (shared_dim → shared_dim)
- Nearest-neighbour codebook quantization
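The last stage in the list above, nearest-neighbour codebook quantization, can be sketched with the standard library alone. This is an illustrative sketch, not the crate's implementation: the function name `nearest_code`, the `Vec<f32>` codebook layout, and the choice of squared Euclidean distance are all assumptions.

```rust
/// Return the index of the codebook entry closest to `embedding`
/// under squared Euclidean distance, or `None` for an empty codebook.
/// (Hypothetical helper; the distance metric is an assumption.)
fn nearest_code(codebook: &[Vec<f32>], embedding: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (index, entry) in codebook.iter().enumerate() {
        // Squared distance preserves the ordering of true distances,
        // so the square root can be skipped.
        let dist: f32 = entry
            .iter()
            .zip(embedding)
            .map(|(c, e)| (c - e) * (c - e))
            .sum();
        // Strict `<` keeps the earliest entry when distances tie.
        if best.map_or(true, |(_, d)| dist < d) {
            best = Some((index, dist));
        }
    }
    best.map(|(index, _)| index)
}
```

The returned index plays the role of the discrete token; the codebook entry at that index is the quantized embedding.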
Implementations
impl CrossModalTokenizer
pub fn new(shared_dim: usize) -> TokenizerResult<Self>
Create a new cross-modal tokenizer with the given shared embedding dimension.
pub fn add_modality(&mut self, config: ModalityTokenizerConfig) -> TokenizerResult<()>
Register a new modality.
The config.token_dim must equal self.shared_dim.
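A minimal sketch of the dimension check described above, using stand-in types. `Config` and `Registry` are hypothetical; only the names `token_dim` and `shared_dim` come from the documentation, and the error type here is a plain `String` rather than the crate's `TokenizerResult`.

```rust
/// Hypothetical stand-in for a modality configuration.
struct Config {
    name: String,
    token_dim: usize,
}

/// Hypothetical stand-in for the tokenizer's registration state.
struct Registry {
    shared_dim: usize,
    names: Vec<String>,
}

impl Registry {
    /// Reject a configuration whose token dimension differs from the
    /// shared embedding dimension; otherwise record the modality.
    fn add_modality(&mut self, config: Config) -> Result<(), String> {
        if config.token_dim != self.shared_dim {
            return Err(format!(
                "modality `{}`: token_dim {} does not match shared_dim {}",
                config.name, config.token_dim, self.shared_dim
            ));
        }
        self.names.push(config.name);
        Ok(())
    }
}
```

Checking at registration time means a mismatched modality is reported once, up front, instead of surfacing later as a shape error during tokenization.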
pub fn tokenize(&self, modality: &ModalityKind, input: &Array1<f32>) -> TokenizerResult<CrossModalToken>
Tokenize a single-modality input.
Steps:
- Encode raw input → shared_dim embedding (per-modality encoder + GELU)
- Add modality-type embedding offset
- Apply shared alignment projection
- Quantize against per-modality codebook
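The first three steps above can be sketched as plain matrix-vector arithmetic. This is a sketch under stated assumptions: the tanh approximation of GELU, the absence of bias terms, the row-major `Vec<Vec<f32>>` weight layout, and the names `gelu`, `matvec`, and `embed` are all illustrative choices not confirmed by the documentation.

```rust
/// GELU activation, tanh approximation (assumed variant).
fn gelu(x: f32) -> f32 {
    let k = (2.0_f32 / std::f32::consts::PI).sqrt();
    0.5 * x * (1.0 + (k * (x + 0.044715 * x * x * x)).tanh())
}

/// Dense matrix-vector product; `matrix` is a list of rows.
fn matvec(matrix: &[Vec<f32>], v: &[f32]) -> Vec<f32> {
    matrix
        .iter()
        .map(|row| row.iter().zip(v).map(|(w, x)| w * x).sum::<f32>())
        .collect()
}

/// Steps 1-3 of the pipeline: encode, offset, align.
fn embed(
    input: &[f32],
    encoder: &[Vec<f32>],   // shared_dim x input_dim
    offset: &[f32],         // modality-type embedding, length shared_dim
    alignment: &[Vec<f32>], // shared_dim x shared_dim
) -> Vec<f32> {
    // 1. Per-modality linear projection followed by GELU.
    let hidden: Vec<f32> = matvec(encoder, input).into_iter().map(gelu).collect();
    // 2. Add the modality-type offset so modalities stay distinguishable.
    let shifted: Vec<f32> = hidden.iter().zip(offset).map(|(h, o)| h + o).collect();
    // 3. Shared alignment projection within the shared space.
    matvec(alignment, &shifted)
}
```

The fourth step then searches the modality's codebook for the entry nearest to the vector returned by `embed`.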
pub fn tokenize_batch(&self, inputs: &[(ModalityKind, Array1<f32>)]) -> TokenizerResult<CrossModalSequence>
Tokenize a batch of (modality, signal) pairs and return them as a CrossModalSequence.
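One common way to write this kind of batch wrapper in Rust is to map a fallible per-item function over the pairs and collect into a `Result`, which stops at the first error. Whether `tokenize_batch` itself short-circuits this way is an assumption; `tokenize_one` below is a hypothetical stand-in that returns the signal length as a fake token id.

```rust
/// Hypothetical per-item tokenizer: accepts two known modality keys and
/// returns the signal length as a placeholder token id.
fn tokenize_one(modality: &str, signal: &[f32]) -> Result<usize, String> {
    match modality {
        "audio" | "control" => Ok(signal.len()),
        other => Err(format!("unregistered modality: {other}")),
    }
}

/// Tokenize every (modality, signal) pair in order. Collecting into
/// `Result<Vec<_>, _>` returns the first error and drops partial output.
fn tokenize_all(inputs: &[(String, Vec<f32>)]) -> Result<Vec<usize>, String> {
    inputs
        .iter()
        .map(|(modality, signal)| tokenize_one(modality, signal))
        .collect()
}
```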
pub fn decode(&self, token: &CrossModalToken) -> TokenizerResult<Array1<f32>>
Decode a CrossModalToken back to the raw input space.
Uses the per-modality codebook entry as the quantized embedding, inverts the shared projection, removes the modality offset, and applies the pseudo-inverse decoder.
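The decode chain described above can be sketched as two matrix-vector products with a subtraction in between. This sketch assumes the inverse alignment matrix and the pseudo-inverse decoder matrix are already available as inputs; the names `decode_entry`, `alignment_inv`, and `decoder` are hypothetical, and no attempt is made to invert the GELU applied during encoding, since the documentation lists only the steps shown.

```rust
/// Dense matrix-vector product; `matrix` is a list of rows.
fn matvec(matrix: &[Vec<f32>], v: &[f32]) -> Vec<f32> {
    matrix
        .iter()
        .map(|row| row.iter().zip(v).map(|(w, x)| w * x).sum::<f32>())
        .collect()
}

/// Map a quantized embedding back toward the raw input space.
fn decode_entry(
    entry: &[f32],              // codebook entry, length shared_dim
    alignment_inv: &[Vec<f32>], // inverse of the shared projection
    offset: &[f32],             // modality-type embedding offset
    decoder: &[Vec<f32>],       // pseudo-inverse decoder, input_dim x shared_dim
) -> Vec<f32> {
    // Undo the shared alignment projection.
    let unaligned = matvec(alignment_inv, entry);
    // Remove the modality-type offset.
    let centered: Vec<f32> = unaligned.iter().zip(offset).map(|(u, o)| u - o).collect();
    // Project back to the raw input dimension.
    matvec(decoder, &centered)
}
```

Because quantization discards information, the result is a reconstruction of the input and not the original signal.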
The shared embedding dimension.
pub fn num_modalities(&self) -> usize
Number of registered modalities.
pub fn modality_names(&self) -> Vec<String>
Sorted list of registered modality keys.
pub fn robotics_preset() -> TokenizerResult<Self>
Robotics preset: audio (16-dim), control (6-dim), sensor (9-dim) → shared_dim 64.
pub fn audio_video_preset() -> TokenizerResult<Self>
Audio-video preset: audio (80-dim), video (512-dim) → shared_dim 256.
Trait Implementations
impl SignalTokenizer for CrossModalTokenizer
SignalTokenizer implementation for CrossModalTokenizer.
Treats the input as a concatenation of the registered modalities' raw signals, in the sorted key order that encode requires. Each slice is encoded, and the resulting embeddings are concatenated into the full multi-modal embedding vector.
For decode, the embedding is split back into per-modality chunks,
decoded, and concatenated.
fn encode(&self, signal: &Array1<f32>) -> TokenizerResult<Array1<f32>>
Encode a concatenated multi-modal signal.
The input must be the concatenation of all registered modalities’
raw signals (in sorted key order). Each modality’s token embedding
is concatenated into a single output vector of length
num_modalities * shared_dim.
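The input layout described above can be sketched as a slicing routine over a `BTreeMap`, which iterates in ascending key order. The function name `split_by_modality` and the use of string keys with per-modality input dimensions are assumptions made for illustration; the crate's internal bookkeeping may differ.

```rust
use std::collections::BTreeMap;

/// Split a concatenated signal into per-modality slices, visiting the
/// modality keys in sorted order. Fails if the total length is wrong.
fn split_by_modality<'a>(
    signal: &'a [f32],
    input_dims: &BTreeMap<String, usize>,
) -> Result<Vec<(String, &'a [f32])>, String> {
    let expected: usize = input_dims.values().sum();
    if signal.len() != expected {
        return Err(format!("expected {expected} values, got {}", signal.len()));
    }
    let mut start = 0;
    let mut slices = Vec::with_capacity(input_dims.len());
    // BTreeMap iteration is ordered by key, which gives the sorted layout.
    for (name, &dim) in input_dims {
        slices.push((name.clone(), &signal[start..start + dim]));
        start += dim;
    }
    Ok(slices)
}
```

With the robotics preset dimensions (audio 16, control 6, sensor 9) and assuming keys named after those modalities, the input has 31 values and the encoded output has 3 * 64 = 192.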
fn decode(&self, tokens: &Array1<f32>) -> TokenizerResult<Array1<f32>>
Decode a concatenated embedding vector back to the raw input space.
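Splitting the concatenated embedding back into per-modality chunks can be sketched with `chunks_exact`. The function name `split_embeddings` and the length check are illustrative assumptions; each resulting chunk would then be decoded by its own modality and the outputs concatenated.

```rust
/// Split a concatenated embedding vector into chunks of `shared_dim`
/// values, one per modality. Fails if the length is not a multiple
/// of `shared_dim` or if `shared_dim` is zero.
fn split_embeddings(tokens: &[f32], shared_dim: usize) -> Result<Vec<&[f32]>, String> {
    if shared_dim == 0 || tokens.len() % shared_dim != 0 {
        return Err(format!(
            "length {} is not a positive multiple of shared_dim {}",
            tokens.len(),
            shared_dim
        ));
    }
    // `chunks_exact` yields only full-size chunks; the check above
    // guarantees there is no remainder to lose.
    Ok(tokens.chunks_exact(shared_dim).collect())
}
```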
fn vocab_size(&self) -> usize
Returns 0. At the trait level the output is a continuous embedding vector, and the discrete codebooks are per-modality, so there is no single vocabulary size to report.
Auto Trait Implementations
impl Freeze for CrossModalTokenizer
impl RefUnwindSafe for CrossModalTokenizer
impl Send for CrossModalTokenizer
impl Sync for CrossModalTokenizer
impl Unpin for CrossModalTokenizer
impl UnsafeUnpin for CrossModalTokenizer
impl UnwindSafe for CrossModalTokenizer
Blanket Implementations
impl<T> BatchTokenizer for T
where
    T: SignalTokenizer,
fn encode_batch(&self, signals: &Array2<f32>) -> TokenizerResult<Array2<f32>>
fn decode_batch(&self, tokens: &Array2<f32>) -> TokenizerResult<Array2<f32>>
fn encode_batch_padded_to(&self, signals: &[Array1<f32>], target_len: usize) -> TokenizerResult<Array2<f32>>
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.