tokitai-operator 0.1.0

Verified DL kernel compiler: formally-checked GEMM, p-adic, sheaf, contract-carrying ops. Paper-artifact grade.
//! Phase 2.5: lightweight bridge from the tokitai-search SQLite ledgers
//! to the 0.7B MoE training loop.
//!
//! The real `tokitai-search::crates::training` crate depends on
//! `tokitai-search-core` (which we explicitly do NOT want to pull into
//! `tokitai-operator`). This module is a self-contained replacement with
//! **no dependency on `tokitai-search-core`** that:
//!
//! 1. Opens `quality_decisions.db` and `quality_outcomes.db` with
//!    `rusqlite` directly (see [`sqlite_reader`]).
//! 2. Joins the two ledgers on `decision_group_id` and yields
//!    [`LocalSample`] values (`features: Vec<f32>` with 96 dims,
//!    `labels: Vec<f32>` with 20 dims) via [`local_dataset::LocalDataset`].
//! 3. Converts each `LocalSample` into the [`Tensor<f32>`] shape used by
//!    the model layer / training driver via [`adapter`].
//!
//! **No dependency on `tokitai-search-core`.** Only `rusqlite`, `rand`,
//! and the in-tree IR object layer.
//!
//! ## Join semantics
//!
//! The reader performs an **inner join** on `decision_group_id`. A
//! decision with no matching outcome is dropped from the training stream.
//! Rationale: the downstream task is *supervised*, so a sample without a
//! label cannot contribute to the loss. To preserve the option for
//! self-supervised pre-training, the per-decision row is also exposed
//! as [`sqlite_reader::DecisionRow`], so callers can build their own
//! outer-join iterator if needed.
//!
//! ## Feature / label encoding
//!
//! 74 categorical + 22 numerical features are concatenated into a
//! 96-dim `Vec<f32>`. Categoricals are 0/1 one-hot; numericals are
//! milli-units clamped to `[0, 1000]` and stored as `f32`. The 20-dim
//! label is a concatenation of a 12-way outcome one-hot and 8 aux metric
//! scalars (in `[0, 1]`). See [`adapter`] for the precise contract.

pub mod adapter;
pub mod local_dataset;
pub mod sqlite_reader;

pub use adapter::{to_input_tensor, to_target_tensor};
pub use local_dataset::{LocalDataset, LocalSample};
pub use sqlite_reader::{DecisionRow, OutcomeRow, SqliteDatasetReader};

/// Dimensionality of the input feature vector (74 categorical one-hot
/// + 22 numerical milli-units). Matches
/// `tokitai-search::crates::training::CATEGORICAL_DIMS + NUMERICAL_DIMS`.
pub const FEATURE_DIM: usize = 96;
/// Dimensionality of the target label vector (12 outcome one-hot + 8
/// aux metric scalars). Matches
/// `tokitai-search::crates::training::OUTCOME_KIND_DIMS + AUX_METRIC_DIMS`.
pub const LABEL_DIM: usize = 20;
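
The join semantics and the feature / label layout described in the module docs can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `encode_features`, `encode_label`, and `inner_join` are hypothetical helpers, the local constants only mirror the dimensions quoted above, and three things are assumptions: that the categorical slots come before the numericals, that aux metrics are clamped rather than rejected when out of range, and that each `decision_group_id` has at most one outcome.

```rust
use std::collections::HashMap;

// Dimensions quoted from the module docs; redefined locally so the sketch
// is self-contained.
const CATEGORICAL_DIMS: usize = 74;
const NUMERICAL_DIMS: usize = 22;
const FEATURE_DIM: usize = CATEGORICAL_DIMS + NUMERICAL_DIMS; // 96
const OUTCOME_KIND_DIMS: usize = 12;
const AUX_METRIC_DIMS: usize = 8;
const LABEL_DIM: usize = OUTCOME_KIND_DIMS + AUX_METRIC_DIMS; // 20

/// Hypothetical feature encoder. Assumes the 74 one-hot slots come first,
/// followed by the 22 numericals in milli-units.
fn encode_features(active_slots: &[usize], numericals_milli: &[i64; NUMERICAL_DIMS]) -> Vec<f32> {
    let mut features = vec![0.0_f32; FEATURE_DIM];
    for &slot in active_slots {
        assert!(slot < CATEGORICAL_DIMS, "categorical slot out of range");
        features[slot] = 1.0;
    }
    for (i, &milli) in numericals_milli.iter().enumerate() {
        // Clamp to [0, 1000] first, then store the result as f32.
        features[CATEGORICAL_DIMS + i] = milli.clamp(0, 1000) as f32;
    }
    features
}

/// Hypothetical label encoder: a 12-way outcome one-hot followed by 8 aux
/// metric scalars. Clamping the metrics to [0, 1] is an assumption.
fn encode_label(outcome_kind: usize, aux_metrics: &[f32; AUX_METRIC_DIMS]) -> Vec<f32> {
    assert!(outcome_kind < OUTCOME_KIND_DIMS, "outcome kind out of range");
    let mut label = vec![0.0_f32; LABEL_DIM];
    label[outcome_kind] = 1.0;
    for (i, &metric) in aux_metrics.iter().enumerate() {
        label[OUTCOME_KIND_DIMS + i] = metric.clamp(0.0, 1.0);
    }
    label
}

/// Inner join keyed on the group id: a decision with no matching outcome is
/// dropped. Assumes at most one outcome per id (a later duplicate would win).
fn inner_join<D: Clone, O: Clone>(
    decisions: &[(i64, D)],
    outcomes: &[(i64, O)],
) -> Vec<(i64, D, O)> {
    let by_id: HashMap<i64, &O> = outcomes.iter().map(|(id, o)| (*id, o)).collect();
    decisions
        .iter()
        .filter_map(|(id, d)| by_id.get(id).map(|&o| (*id, d.clone(), o.clone())))
        .collect()
}
```

Under these assumptions, `encode_features(&[3], &[0; 22])` returns a 96-element vector whose only non-zero entry is `1.0` at index 3, and `inner_join` preserves the order of the decision ledger while skipping unlabeled decisions.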