sphereql-embed 0.3.0

Vector embedding projection engine for sphereQL

Projects high-dimensional embeddings onto S² (the unit 2-sphere) via one of four projection families: PCA, Kernel PCA (Gaussian/RBF kernel), Laplacian eigenmap (connectivity-preserving), or UMAP-on-sphere (an Adam optimizer on the tangent bundle, with kNN attraction, repulsion from uniformly sampled negatives, and optional category supervision). All four are unified behind a ConfiguredProjection enum, so the pipeline can switch families without touching generics.
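To make the PCA family concrete, here is a minimal sketch of one way to project embeddings onto S²: center the data, find the top three principal axes by power iteration (re-orthogonalizing against earlier axes on every step), project, and scale each 3-D point to unit length. It is an illustrative re-implementation under our own assumptions (population covariance, a fixed pole for points sitting exactly at the centroid, all function names hypothetical), not sphereql-embed's code; the crate may compute components and handle degenerate cases differently.

```rust
// Sketch: PCA projection of embeddings onto the unit sphere S².
// Illustrative only; not sphereql-embed's implementation. All names are hypothetical.

/// Dot product of two equal-length slices.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Remove the components of `v` along each unit axis in `axes`, then normalize.
/// Returns None when nothing (numerically) remains.
fn orthonormalize(mut v: Vec<f64>, axes: &[Vec<f64>]) -> Option<Vec<f64>> {
    for a in axes {
        let p = dot(&v, a);
        for (vi, ai) in v.iter_mut().zip(a) {
            *vi -= p * ai;
        }
    }
    let norm = dot(&v, &v).sqrt();
    if norm < 1e-12 {
        None
    } else {
        Some(v.into_iter().map(|x| x / norm).collect())
    }
}

/// Top `k` eigenvectors of a symmetric covariance matrix by power iteration,
/// re-orthogonalizing against the axes already found on every step.
fn top_axes(cov: &[Vec<f64>], k: usize) -> Vec<Vec<f64>> {
    let d = cov.len();
    let mut axes: Vec<Vec<f64>> = Vec::new();
    for c in 0..k {
        // Deterministic start vector; any vector not orthogonal to the target axis works.
        let start: Vec<f64> = (0..d).map(|i| 1.0 + ((i * 7 + c * 3) % 5) as f64).collect();
        let mut v = orthonormalize(start, &axes).expect("degenerate start vector");
        for _ in 0..500 {
            let w: Vec<f64> = cov.iter().map(|row| dot(row, &v)).collect();
            match orthonormalize(w, &axes) {
                Some(next) => v = next,
                None => break, // no variance left outside the earlier axes
            }
        }
        axes.push(v);
    }
    axes
}

/// Center the data, project onto the top three principal axes, and scale each point to unit length.
fn pca_to_sphere(data: &[Vec<f64>]) -> Vec<[f64; 3]> {
    let n = data.len() as f64;
    let d = data[0].len();
    assert!(d >= 3, "need at least 3 input dimensions");
    let mean: Vec<f64> = (0..d)
        .map(|j| data.iter().map(|r| r[j]).sum::<f64>() / n)
        .collect();
    let centered: Vec<Vec<f64>> = data
        .iter()
        .map(|r| r.iter().zip(&mean).map(|(x, m)| x - m).collect())
        .collect();
    // Population covariance matrix (d x d).
    let mut cov = vec![vec![0.0; d]; d];
    for r in &centered {
        for i in 0..d {
            for j in 0..d {
                cov[i][j] += r[i] * r[j] / n;
            }
        }
    }
    let axes = top_axes(&cov, 3);
    centered
        .iter()
        .map(|r| {
            let p = [dot(r, &axes[0]), dot(r, &axes[1]), dot(r, &axes[2])];
            let norm = dot(&p, &p).sqrt();
            // A point at the centroid has no direction; send it to a fixed pole (sketch choice).
            if norm < 1e-12 {
                [0.0, 0.0, 1.0]
            } else {
                [p[0] / norm, p[1] / norm, p[2] / norm]
            }
        })
        .collect()
}
```

Power iteration keeps the sketch dependency-free; a production implementation would more likely call a linear-algebra library's symmetric eigensolver or SVD.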

Provides a query pipeline (SphereQLPipeline) with k-NN search, similarity thresholds, concept paths, glob detection, and local manifold fitting. It also includes a Category Enrichment Layer: an inter-category graph, bridge detection with Genuine / OverlapArtifact / Weak classification, automatic inner spheres, drill-down, and hierarchical domain-group routing (hierarchical_nearest) for regimes with a low explained variance ratio (EVR).
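The Example section reports distances in radians, which suggests great-circle distance on S². Under that assumption, here is a sketch of brute-force versions of a k-NN query and a similarity-threshold query over unit vectors. nearest_k and within_angle are hypothetical names; the crate's pipeline may index points and define similarity differently.

```rust
// Sketch: brute-force nearest-neighbour and threshold queries on S² by great-circle distance.
// Function names are illustrative, not the crate's API.

/// Great-circle (angular) distance in radians between two unit vectors.
fn angular_distance(a: [f64; 3], b: [f64; 3]) -> f64 {
    let dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    // Clamp so rounding just outside [-1, 1] cannot make acos return NaN.
    dot.clamp(-1.0, 1.0).acos()
}

/// The k closest points to `query`, as (index, distance) pairs sorted nearest first.
fn nearest_k(points: &[[f64; 3]], query: [f64; 3], k: usize) -> Vec<(usize, f64)> {
    let mut scored: Vec<(usize, f64)> = points
        .iter()
        .enumerate()
        .map(|(i, &p)| (i, angular_distance(p, query)))
        .collect();
    scored.sort_by(|x, y| x.1.total_cmp(&y.1));
    scored.truncate(k);
    scored
}

/// Similarity-threshold query: indices of all points within `max_angle` radians of `query`.
fn within_angle(points: &[[f64; 3]], query: [f64; 3], max_angle: f64) -> Vec<usize> {
    points
        .iter()
        .enumerate()
        .filter(|&(_, p)| angular_distance(*p, query) <= max_angle)
        .map(|(i, _)| i)
        .collect()
}
```

A linear scan costs O(n) per query, which is fine for small corpora; larger ones would typically use a spatial index.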

Ships a meta-learning framework on top. A PipelineConfig hierarchy exposes every tunable constant, with #[serde(default)] so partial overrides work. A QualityMetric trait comes with four concrete metrics (territorial health, bridge coherence, cluster silhouette, graph modularity) and composite presets. An auto_tune sweep runs over a discrete SearchSpace using Grid, Random, or Bayesian (TPE-lite) search. A MetaModel layer (NearestNeighbor, DistanceWeighted) comes with an on-disk store at ~/.sphereql/meta_records.json, and FeedbackEvent / FeedbackAggregator primitives blend user satisfaction into the training record.
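For intuition about what a Grid sweep over a discrete search space does, here is a minimal exhaustive-search sketch. Axis and grid_sweep are hypothetical, not the crate's SearchSpace or auto_tune API, and the score closure stands in for a composite QualityMetric under the assumption that higher scores are better.

```rust
use std::collections::BTreeMap;

/// One tunable axis (hypothetical): a parameter name and the discrete values to try.
struct Axis {
    name: &'static str,
    values: Vec<f64>,
}

/// Exhaustive grid sweep: evaluate `score` on every combination of axis values
/// and return the best assignment with its score (higher is better).
fn grid_sweep<F>(axes: &[Axis], mut score: F) -> Option<(BTreeMap<&'static str, f64>, f64)>
where
    F: FnMut(&BTreeMap<&'static str, f64>) -> f64,
{
    if axes.iter().any(|a| a.values.is_empty()) {
        return None;
    }
    // Mixed-radix counter: idx[p] is the current value index on axis p.
    let mut idx = vec![0usize; axes.len()];
    let mut best: Option<(BTreeMap<&'static str, f64>, f64)> = None;
    loop {
        let trial: BTreeMap<_, _> = axes
            .iter()
            .zip(&idx)
            .map(|(a, &i)| (a.name, a.values[i]))
            .collect();
        let s = score(&trial);
        if best.as_ref().map_or(true, |(_, b)| s > *b) {
            best = Some((trial, s));
        }
        // Advance the counter; once every axis has wrapped, all combinations are done.
        let mut pos = 0;
        loop {
            if pos == axes.len() {
                return best;
            }
            idx[pos] += 1;
            if idx[pos] < axes[pos].values.len() {
                break;
            }
            idx[pos] = 0;
            pos += 1;
        }
    }
}
```

Random and TPE-style strategies would replace the exhaustive counter with sampled trials, trading coverage for far fewer evaluations on large grids.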

Includes a TextEmbedder trait (plus a NoEmbedder default and an FnEmbedder closure wrapper) so that downstream crates (GraphQL, REPLs, custom harnesses) can accept natural-language queries without sphereql-embed depending on any specific embedder backend.
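The decoupling described above follows the usual trait-object pattern: consumers hold a `&dyn` trait and never name a backend. A minimal sketch, with hypothetical names (suffixed Sketch) and an assumed String error type; the crate's actual TextEmbedder, NoEmbedder, and FnEmbedder signatures may differ:

```rust
/// Hypothetical stand-in for a text-embedding trait: text in, vector out.
/// The real TextEmbedder's method name, error type, and bounds may differ.
trait TextEmbedderSketch {
    fn embed(&self, text: &str) -> Result<Vec<f64>, String>;
}

/// Default that refuses to embed, so callers must plug in a real backend.
struct NoEmbedderSketch;

impl TextEmbedderSketch for NoEmbedderSketch {
    fn embed(&self, _text: &str) -> Result<Vec<f64>, String> {
        Err("no text embedder configured".to_string())
    }
}

/// Wraps any closure as an embedder, so callers need no new type per backend.
struct FnEmbedderSketch<F>(F);

impl<F> TextEmbedderSketch for FnEmbedderSketch<F>
where
    F: Fn(&str) -> Vec<f64>,
{
    fn embed(&self, text: &str) -> Result<Vec<f64>, String> {
        Ok((self.0)(text))
    }
}

/// A downstream consumer depends only on the trait object, not on any backend.
fn embed_query(embedder: &dyn TextEmbedderSketch, text: &str) -> Result<Vec<f64>, String> {
    embedder.embed(text)
}
```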

Example

use sphereql_embed::{
    PipelineInput, PipelineQuery, SphereQLOutput, SphereQLPipeline, SphereQLQuery,
};

let pipeline = SphereQLPipeline::new(PipelineInput {
    categories: vec![
        "science".into(), "science".into(),
        "cooking".into(), "cooking".into(),
    ],
    embeddings: vec![
        vec![0.1, 0.9, 0.3, 0.0],
        vec![0.2, 0.8, 0.4, 0.1],
        vec![0.9, 0.1, 0.0, 0.5],
        vec![0.8, 0.2, 0.1, 0.4],
    ],
})?;

let out = pipeline.query(
    SphereQLQuery::Nearest { k: 3 },
    &PipelineQuery { embedding: vec![0.15, 0.85, 0.35, 0.05] },
)?;

if let SphereQLOutput::Nearest(results) = out {
    for r in results {
        println!("{} ({}) at {:.3} rad", r.id, r.category, r.distance);
    }
}

To pick a non-default projection family, build with SphereQLPipeline::new_with_config and set PipelineConfig::projection_kind to Pca, KernelPca, LaplacianEigenmap, or UmapSphere. Alternatively, let auto_tune sweep the kind as a tuner axis.
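Rust's struct-update syntax is the in-code counterpart of the partial overrides that #[serde(default)] allows in serialized configs. The sketch sets only the projection family and inherits every other default. All names are hypothetical mirrors rather than the crate's types; in particular, treating Pca as the default, and the knn_k and bridge_threshold fields with their values, are assumptions for illustration.

```rust
/// Hypothetical mirror of the four projection families. Treating Pca as the default is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum ProjectionKindSketch {
    #[default]
    Pca,
    KernelPca,
    LaplacianEigenmap,
    UmapSphere,
}

/// Every family, e.g. as the values of a tuner axis.
const ALL_KINDS: [ProjectionKindSketch; 4] = [
    ProjectionKindSketch::Pca,
    ProjectionKindSketch::KernelPca,
    ProjectionKindSketch::LaplacianEigenmap,
    ProjectionKindSketch::UmapSphere,
];

/// Hypothetical config: knn_k, bridge_threshold, and their defaults are invented for illustration.
#[derive(Debug, Clone, PartialEq)]
struct PipelineConfigSketch {
    projection_kind: ProjectionKindSketch,
    knn_k: usize,
    bridge_threshold: f64,
}

impl Default for PipelineConfigSketch {
    fn default() -> Self {
        Self {
            projection_kind: ProjectionKindSketch::default(),
            knn_k: 15,
            bridge_threshold: 0.5,
        }
    }
}

/// Partial override: change only the projection family and keep every other default.
fn config_for(kind: ProjectionKindSketch) -> PipelineConfigSketch {
    PipelineConfigSketch {
        projection_kind: kind,
        ..Default::default()
    }
}
```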

Versioning

Part of the sphereQL workspace, currently at 0.3.0; the API may change before 1.0. Notable recent changes (see the workspace CHANGELOG): run_self_tune and SphereQLPipeline::to_json now return Result; MetaModel gained is_fitted; auto_tune warm-starts trial 0 from the meta-model prediction; and for UMAP projections, explained_variance_ratio now reports kNN-recall trustworthiness rather than the old variance proxy, so records stored under the old proxy are not score-comparable with new ones.
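As a rough guide to what a kNN-recall trustworthiness score measures, here is one common formulation: for each item, the fraction of its k nearest neighbours in the original space that remain among its k nearest neighbours on the sphere, averaged over all items. The metrics (Euclidean before projection, great-circle after), the fixed k, and the function names are assumptions; the crate's exact definition may differ.

```rust
use std::collections::HashSet;

/// Indices of the `k` nearest neighbours of point `i` (excluding `i` itself) under `dist`.
fn knn<T>(points: &[T], i: usize, k: usize, dist: impl Fn(&T, &T) -> f64) -> HashSet<usize> {
    let mut others: Vec<(usize, f64)> = (0..points.len())
        .filter(|&j| j != i)
        .map(|j| (j, dist(&points[i], &points[j])))
        .collect();
    others.sort_by(|a, b| a.1.total_cmp(&b.1));
    others.into_iter().take(k).map(|(j, _)| j).collect()
}

/// Euclidean distance in the original embedding space (assumed metric).
fn euclidean(a: &Vec<f64>, b: &Vec<f64>) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Great-circle distance between unit vectors on S².
fn angular(a: &[f64; 3], b: &[f64; 3]) -> f64 {
    (a[0] * b[0] + a[1] * b[1] + a[2] * b[2]).clamp(-1.0, 1.0).acos()
}

/// Mean fraction of each point's original k-NN that are also its k-NN on the sphere.
/// 1.0 means every neighbourhood survived the projection; 0.0 means none did.
fn knn_recall(original: &[Vec<f64>], projected: &[[f64; 3]], k: usize) -> f64 {
    let n = original.len();
    assert_eq!(n, projected.len(), "one projected point per input");
    assert!(k >= 1 && k < n, "k must be in 1..n");
    let total: f64 = (0..n)
        .map(|i| {
            let before = knn(original, i, k, euclidean);
            let after = knn(projected, i, k, angular);
            before.intersection(&after).count() as f64 / k as f64
        })
        .sum();
    total / n as f64
}
```

Because this is a neighbourhood-overlap score in [0, 1] rather than a fraction of variance, a value recorded under one definition says little about the other, which is why old and new records are not score-comparable.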

See the main repository for full documentation, examples (auto_tune, meta_learn, meta_warm_start, meta_feedback, spatial_analysis, category_enrichment), and an architecture overview.