symproj 0.1.0

Symbolic projection and embeddings for Tekne
Documentation

symproj

Symbolic projection (embeddings) for Tekne.

Maps discrete symbols to continuous vectors using a Codebook.

Naming note: this crate was previously named proj, but proj is already taken on crates.io by GeoRust's PROJ bindings (geospatial). We publish this crate as symproj.

Intuition First

Imagine a library where every book has a call number. The call number isn't just a label; it tells you where the book sits in a 3D space. symproj is the system that maps "book names" (tokens) to "library coordinates" (vectors).

Provenance (minimal citations)

What this crate implements is the long-lived primitive: [ (t_1,\dots,t_n)\mapsto \mathbb{R}^d ] via (1) embedding lookup (a codebook) and (2) pooling (mean).

  • Word embeddings / lookup tables: Mikolov et al. (word2vec), 2013. arXiv:1301.3781
  • Subword tokenization:
  • Sentence embeddings baseline: Arora et al. (SIF), 2017. ICLR OpenReview
  • Modern sentence embedding fine-tuning:
  • Retrieval context (token vectors + pooling/compression):

Nearby Rust ecosystem crates (context, not dependencies)