symproj
Symbolic projection (embeddings). Maps discrete symbols to continuous vectors using a Codebook and pooling.
Intuition
Imagine a library where every book has a call number. The call number isn't just a label; it tells you where the book sits in a 3D space. symproj is the system that maps "book names" (tokens) to "library coordinates" (vectors).
Usage
use symproj::{Codebook, Projection};
use BpeTokenizer;
// 1. Load a Codebook (matrix + dimension)
let matrix = vec![/* ... */]; // flattened [vocab_size * dim]
let codebook = Codebook::new(matrix, dim).unwrap();
// 2. Create a Projection (tokenizer + codebook)
let tokenizer = BpeTokenizer::from_file(/* ... */)?;
let proj = Projection::new(tokenizer, codebook);
// 3. Encode text -> vector (mean pooling)
let vec = proj.encode(/* ... */).unwrap();
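To make step 3 concrete, here is a minimal sketch of what "lookup, then mean pooling" could look like over a flattened row-major matrix. The function `mean_pool`, its signature, and the use of `f32` and `usize` token ids are assumptions made for illustration; they are not taken from symproj's API, and the crate's own implementation may differ.

```rust
/// Hypothetical helper (not part of symproj): look up each token id in a
/// flattened row-major [vocab_size * dim] matrix and average the rows.
/// Returns None for an empty input, a malformed matrix, or an unknown id.
fn mean_pool(matrix: &[f32], dim: usize, token_ids: &[usize]) -> Option<Vec<f32>> {
    if dim == 0 || token_ids.is_empty() || matrix.len() % dim != 0 {
        return None;
    }
    let vocab_size = matrix.len() / dim;
    let mut sum = vec![0.0f32; dim];
    for &id in token_ids {
        if id >= vocab_size {
            return None; // token id outside the codebook
        }
        // Row `id` occupies the slice [id * dim, (id + 1) * dim).
        let row = &matrix[id * dim..(id + 1) * dim];
        for (acc, &x) in sum.iter_mut().zip(row) {
            *acc += x;
        }
    }
    // Divide the accumulated sum by the number of tokens.
    let n = token_ids.len() as f32;
    for acc in sum.iter_mut() {
        *acc /= n;
    }
    Some(sum)
}
```

With a three-token vocabulary of dimension 2, pooling ids 0 and 2 averages the first and third rows of the matrix. Returning `Option` instead of panicking is a design choice of this sketch, not a statement about how symproj reports errors.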
Features
- Codebook: Dense embedding matrix lookup.
- Pooling: Mean, weighted mean (SIF), and sequence output.
- Normalization: L2 normalization and component removal (PCA-based denoising).
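The pooling and normalization features listed above can be sketched in a few small functions. Everything here is illustrative: the names `sif_weight`, `weighted_mean`, `l2_normalize`, and `remove_component` are hypothetical and not symproj's API. The sketch assumes the usual SIF weight a / (a + p(w)) for a token with unigram probability p(w), divides the weighted sum by the total weight (some formulations divide by the token count instead), and assumes the component to remove is already a unit vector, for example a first principal component computed elsewhere.

```rust
/// SIF weight for a token with unigram probability `p` and smoothing `a`.
fn sif_weight(p: f32, a: f32) -> f32 {
    a / (a + p)
}

/// Weighted mean of token vectors, normalized by the sum of the weights.
/// Returns None on empty input, mismatched lengths, or zero total weight.
fn weighted_mean(vectors: &[Vec<f32>], weights: &[f32]) -> Option<Vec<f32>> {
    if vectors.is_empty() || vectors.len() != weights.len() {
        return None;
    }
    let dim = vectors[0].len();
    let total: f32 = weights.iter().sum();
    if total == 0.0 || vectors.iter().any(|v| v.len() != dim) {
        return None;
    }
    let mut out = vec![0.0f32; dim];
    for (v, &w) in vectors.iter().zip(weights) {
        for (acc, &x) in out.iter_mut().zip(v) {
            *acc += w * x;
        }
    }
    for acc in out.iter_mut() {
        *acc /= total;
    }
    Some(out)
}

/// Scale `v` in place to unit L2 norm; a zero vector is left unchanged.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Remove the projection of `v` onto the unit vector `u`: v <- v - (v . u) u.
fn remove_component(v: &mut [f32], u: &[f32]) {
    let dot: f32 = v.iter().zip(u).map(|(a, b)| a * b).sum();
    for (x, &c) in v.iter_mut().zip(u) {
        *x -= dot * c;
    }
}
```

A typical order would be to pool with `weighted_mean`, subtract the common component with `remove_component`, and finish with `l2_normalize`, though whether symproj applies the steps in that order is not stated here.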
License
MIT OR Apache-2.0