Struct SphereQLPipeline 

pub struct SphereQLPipeline { /* private fields */ }

The main SphereQL pipeline: fitted projection + spatial index + category enrichment layer + optional tunable config.

Build one with Self::new for defaults, Self::new_with_config for an explicit PipelineConfig, or Self::new_from_metamodel / Self::new_from_metamodel_tuned to consult a trained meta-model on past tuner runs.

Implementations

impl SphereQLPipeline

pub fn new(input: PipelineInput) -> Result<Self, PipelineError>

Build a pipeline from raw inputs with PipelineConfig::default.

  • input.categories[i] is the category for sentence i
  • input.embeddings[i] is the embedding vector for sentence i
  • All embedding vectors must have the same dimensionality (>= 3).
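
The input contract above can be checked before construction. The following is a minimal sketch, not the crate's own validation: `check_input` is a hypothetical helper, plain `Vec<f64>` stands in for the embedding type, and rejecting empty input is an assumption.

```rust
/// Hypothetical pre-flight check mirroring the documented input contract.
/// Returns the shared embedding dimensionality on success.
pub fn check_input(categories: &[String], embeddings: &[Vec<f64>]) -> Result<usize, String> {
    // categories[i] and embeddings[i] describe the same sentence, so lengths must match.
    if categories.len() != embeddings.len() {
        return Err(format!(
            "{} categories but {} embeddings",
            categories.len(),
            embeddings.len()
        ));
    }
    // Assumption: an empty corpus is treated as an error.
    let dim = match embeddings.first() {
        Some(first) => first.len(),
        None => return Err("empty input".to_string()),
    };
    if dim < 3 {
        return Err(format!("dimensionality {} is below the minimum of 3", dim));
    }
    for (i, e) in embeddings.iter().enumerate() {
        if e.len() != dim {
            return Err(format!("embedding {} has length {}, expected {}", i, e.len(), dim));
        }
    }
    Ok(dim)
}
```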

pub fn new_with_config( input: PipelineInput, config: PipelineConfig, ) -> Result<Self, PipelineError>

Build a pipeline with an explicit configuration. Fits the projection internally using PipelineConfig::projection_kind and any relevant sub-config (e.g. LaplacianConfig).

pub fn new_from_metamodel<M: MetaModel>( input: PipelineInput, model: &M, ) -> Result<(Self, CorpusFeatures, PipelineConfig), PipelineError>

Build a pipeline using a config predicted by a MetaModel.

Extracts CorpusFeatures from the input, asks the model for a predicted PipelineConfig, then builds the pipeline with it. Returns the pipeline alongside the extracted features and the predicted config so the caller can log, audit, or save them as a new MetaTrainingRecord.

This is the “tune-or-recall” entry point: once you’ve accumulated a handful of training records, call this instead of crate::tuner::auto_tune when you want to skip search entirely. For a warm-start hybrid that does some tuning on top of the prediction, use Self::new_from_metamodel_tuned.

pub fn new_from_metamodel_tuned<M, Q>( input: PipelineInput, model: &M, space: &SearchSpace, metric: &Q, strategy: SearchStrategy, ) -> Result<(Self, CorpusFeatures, TuneReport), PipelineError>
where M: MetaModel, Q: QualityMetric,

Warm-started hybrid: predict a config with model, then run a small-budget tuner pass using that prediction as base_config.

The prediction supplies values only for knobs NOT enumerated by space — any knob the space lists is searched cold across its axes, and the predicted value for it is ignored. Under SearchStrategy::Random and SearchStrategy::Bayesian the predicted config itself is additionally evaluated as trial 0 (counted against the budget), so it competes directly with the searched candidates; SearchStrategy::Grid skips that seed trial to keep its trial set the exact Cartesian enumeration.
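
The precedence rule can be sketched as follows. This is an illustration under stated assumptions, not the tuner's implementation: `Knobs`, `Strategy`, `candidate`, and `seeds_with_prediction` are hypothetical names, and a config is modelled as a flat map from knob name to value.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of the three search strategies named above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Strategy {
    Grid,
    Random,
    Bayesian,
}

/// Hypothetical flat view of a config: knob name -> value.
pub type Knobs = BTreeMap<&'static str, f64>;

/// Build one searched candidate. Knobs enumerated by the space take the
/// sampled value; every other knob keeps the predicted value.
pub fn candidate(predicted: &Knobs, sampled: &Knobs) -> Knobs {
    let mut out = predicted.clone();
    for (name, value) in sampled {
        out.insert(*name, *value); // the prediction for this knob is ignored
    }
    out
}

/// Whether the unmodified prediction is evaluated as trial 0.
pub fn seeds_with_prediction(strategy: Strategy) -> bool {
    matches!(strategy, Strategy::Random | Strategy::Bayesian)
}
```

With a prediction of `alpha = 0.3, beta = 2.0` and a space that only lists `alpha`, every searched candidate keeps `beta = 2.0` while `alpha` comes from the space.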

Returns the winning pipeline, the extracted corpus features, and the full TuneReport. Callers can feed the report back into MetaTrainingRecord::from_tune_result to accumulate more training data for the next recall.

pub fn with_projection( categories: Vec<String>, embeddings: Vec<Embedding>, pca: PcaProjection, ) -> Result<Self, PipelineError>

Build a pipeline from pre-computed embeddings and an existing PCA projection, with PipelineConfig::default.

This is the legacy entry point — use Self::with_configured_projection_and_config directly when you have a non-PCA ConfiguredProjection.

pub fn with_projection_and_config( categories: Vec<String>, embeddings: Vec<Embedding>, pca: PcaProjection, config: PipelineConfig, ) -> Result<Self, PipelineError>

Legacy configurable PCA entry point. Prefer Self::with_configured_projection_and_config for new code.

pub fn with_configured_projection_and_config( categories: Vec<String>, embeddings: Vec<Embedding>, projection: ConfiguredProjection, config: PipelineConfig, ) -> Result<Self, PipelineError>

Core pipeline constructor: accepts any ConfiguredProjection and a PipelineConfig.

pub fn has_category(&self, name: &str) -> bool

True if name is a known category in this pipeline. Pair with Self::query to disambiguate “unknown category” from “category exists but is disconnected on the graph” without pattern-matching on PipelineError::UnknownCategory.

pub fn has_id(&self, id: &str) -> bool

True if id is an indexed item in this pipeline.

pub fn ids(&self) -> &[String]

All indexed item ids, in the order they were inserted (i.e. parallel to the input embeddings/categories). Currently auto-generated as s-{i:04} strings; callers that need stable mapping back to their own ids should keep their own parallel array.
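
A minimal sketch of the id scheme and the parallel-array mapping suggested above. `auto_id` and `caller_id` are hypothetical helpers, and the reading of `s-{i:04}` as a zero-padded decimal index is an assumption taken from the format string.

```rust
/// Reproduce the `s-{i:04}` pattern: zero-padded to at least four digits.
pub fn auto_id(i: usize) -> String {
    format!("s-{:04}", i)
}

/// Hypothetical lookup: map a pipeline id back to the caller's own id
/// through a parallel array that the caller maintains in insertion order.
pub fn caller_id<'a>(pipeline_ids: &[String], own_ids: &'a [String], id: &str) -> Option<&'a str> {
    let pos = pipeline_ids.iter().position(|p| p.as_str() == id)?;
    own_ids.get(pos).map(|s| s.as_str())
}
```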

pub fn query( &self, q: SphereQLQuery<'_>, query_embedding: &PipelineQuery, ) -> Result<SphereQLOutput, PipelineError>

Execute a typed query against the pipeline.

Returns PipelineError::UnknownCategory when a category query references a name not in the pipeline, and PipelineError::UnknownId when a concept-path query references an id not in the index. Previously those paths collapsed into empty results / None, which callers couldn’t distinguish from legitimate “found nothing” outcomes.
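
The distinction between an unknown name and a legitimate empty outcome can be illustrated with a toy model. Everything here is a hypothetical stand-in (`ToyGraph`, `QueryError`, `reachable`); it only shows the three-way outcome, not how the pipeline stores its category graph.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical stand-in for the error variant named above.
#[derive(Debug, PartialEq)]
pub enum QueryError {
    UnknownCategory(String),
}

/// Hypothetical toy graph: category name -> neighbouring categories.
pub struct ToyGraph {
    pub edges: BTreeMap<String, Vec<String>>,
}

impl ToyGraph {
    pub fn has_category(&self, name: &str) -> bool {
        self.edges.contains_key(name)
    }

    /// Err for an unknown name, Ok(false) for known but disconnected
    /// categories, Ok(true) when a path exists (breadth-first search).
    pub fn reachable(&self, from: &str, to: &str) -> Result<bool, QueryError> {
        for name in &[from, to] {
            if !self.has_category(name) {
                return Err(QueryError::UnknownCategory(name.to_string()));
            }
        }
        let mut seen = BTreeSet::new();
        let mut queue = VecDeque::new();
        seen.insert(from);
        queue.push_back(from);
        while let Some(cur) = queue.pop_front() {
            if cur == to {
                return Ok(true);
            }
            for next in self.edges.get(cur).into_iter().flatten() {
                if seen.insert(next.as_str()) {
                    queue.push_back(next.as_str());
                }
            }
        }
        Ok(false)
    }
}
```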

pub fn num_items(&self) -> usize

Total number of indexed items.

pub fn categories(&self) -> &[String]

Slice of per-item category labels (index-aligned with insertion order).

pub fn projected_points(&self) -> Vec<(&str, &str, [f64; 3])>

Export (id, category, cartesian [x, y, z]) triples for every indexed item.

pub fn projection(&self) -> &ConfiguredProjection

Borrow the fitted projection regardless of kind.

Returns a &ConfiguredProjection, which implements the crate::projection::Projection trait — so most callers never need to pattern-match on the enum. The old .pca() accessor was removed because it panicked under any non-PCA config and every caller already worked through this method or its trait impl.

pub fn projection_kind(&self) -> ProjectionKind

Active outer-sphere projection kind.

pub fn exported_points(&self) -> Vec<ExportedPoint>

Export all projected points with their Cartesian and spherical coordinates.

Returns one ExportedPoint per indexed item, in insertion order.
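
For readers who need to move between the two coordinate forms themselves, here is a minimal conversion sketch. The angle convention (azimuth measured in the x-y plane, polar angle measured from +z) is an assumption; the fields and convention of ExportedPoint are not stated here and may differ.

```rust
/// Cartesian [x, y, z] -> (r, azimuth, polar).
/// Assumed convention: azimuth in (-pi, pi] from +x, polar in [0, pi] from +z.
pub fn to_spherical(p: [f64; 3]) -> (f64, f64, f64) {
    let r = (p[0] * p[0] + p[1] * p[1] + p[2] * p[2]).sqrt();
    if r == 0.0 {
        return (0.0, 0.0, 0.0); // angles are undefined at the origin
    }
    let azimuth = p[1].atan2(p[0]);
    // Clamp guards acos against rounding slightly outside [-1, 1].
    let polar = (p[2] / r).clamp(-1.0, 1.0).acos();
    (r, azimuth, polar)
}

/// Inverse of `to_spherical` under the same assumed convention.
pub fn to_cartesian(r: f64, azimuth: f64, polar: f64) -> [f64; 3] {
    [
        r * polar.sin() * azimuth.cos(),
        r * polar.sin() * azimuth.sin(),
        r * polar.cos(),
    ]
}
```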

pub fn explained_variance_ratio(&self) -> f64

The active projection’s explained-variance-ratio-equivalent quality score, in [0, 1]. PCA returns the classical EVR; kernel PCA returns its kernel-space EVR; Laplacian eigenmap returns a compatible connectivity ratio (see LaplacianEigenmapProjection::connectivity_ratio); UMAP returns its kNN-recall — the fraction of each point’s high-dimensional neighbors preserved on the sphere. All four feed the EVR-adaptive thresholds downstream.
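
For the PCA case, the classical EVR is the share of total variance captured by the retained components. The sketch below computes it from a list of covariance eigenvalues; the function is hypothetical, and the choice of how many components are kept (three would match the [x, y, z] output) is left to the caller as an assumption.

```rust
/// Classical explained-variance ratio: sum of the `kept` largest
/// eigenvalues divided by the sum of all eigenvalues.
pub fn classical_evr(eigenvalues: &[f64], kept: usize) -> f64 {
    // Tiny negative eigenvalues from rounding are clamped to zero.
    let mut sorted: Vec<f64> = eigenvalues.iter().map(|v| v.max(0.0)).collect();
    sorted.sort_by(|a, b| b.total_cmp(a)); // descending
    let total: f64 = sorted.iter().sum();
    if total == 0.0 {
        return 0.0; // no variance at all; avoid dividing by zero
    }
    sorted.iter().take(kept).sum::<f64>() / total
}
```

With eigenvalues 4, 3, 2, 1 and three components kept, the ratio is 9 / 10.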

pub fn num_categories(&self) -> usize

Number of unique categories in the corpus.

pub fn unique_categories(&self) -> Vec<String>

Unique category names in insertion order.

pub fn category_layer(&self) -> &CategoryLayer

Access the category enrichment layer directly.

pub fn category_path(&self, source: &str, target: &str) -> Option<CategoryPath>

Shortcut: find the shortest path between two categories.

pub fn bridge_items( &self, source: &str, target: &str, max: usize, ) -> Vec<&BridgeItem>

Shortcut: get bridge items between two categories.

pub fn has_inner_sphere(&self, category: &str) -> bool

Shortcut: check if a category has an inner sphere.

pub fn num_inner_spheres(&self) -> usize

Shortcut: number of categories with inner spheres.

pub fn inner_sphere_stats(&self) -> Vec<InnerSphereReport>

Shortcut: inner sphere statistics for all categories.

pub fn projection_warnings(&self) -> &[ProjectionWarning]

Projection quality warnings. Empty if EVR is above threshold.

pub fn raw_embeddings(&self) -> Option<&[Vec<f64>]>

Returns the original high-dimensional embeddings if the retain-embeddings feature was active at construction time. The returned slice is aligned with ids(), categories(), and projected_points().

Returns None if the feature was not active or embeddings were not retained.

pub fn embedding_dim(&self) -> usize

Embedding dimensionality (length of each embedding vector), or 0 if embeddings are not retained.

pub fn pairwise_similarities(&self) -> Option<Result<Vec<f64>, SphereQlError>>

Compute the pairwise cosine similarity matrix from the retained raw embeddings. Returns the upper triangle as a flat vector aligned with ids() ordering.

Returns None if embeddings were not retained (feature retain-embeddings not active at construction).

Returns Err if the stored embeddings have mismatched dimensions (should not happen if the pipeline was constructed correctly).
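
A flat upper-triangle layout can be sketched as below. Two assumptions are made that the text above does not confirm: the diagonal is excluded (only pairs with i < j are stored) and pairs are emitted row by row. `pairwise_upper` and `flat_index` are hypothetical names, and `String` stands in for the crate's error type.

```rust
/// Cosine similarity; Err on mismatched lengths, 0.0 for zero vectors.
pub fn cosine(a: &[f64], b: &[f64]) -> Result<f64, String> {
    if a.len() != b.len() {
        return Err(format!("dimension mismatch: {} vs {}", a.len(), b.len()));
    }
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 {
        return Ok(0.0);
    }
    Ok(dot / (na * nb))
}

/// Strict upper triangle, row by row: (0,1), (0,2), ..., (1,2), ...
pub fn pairwise_upper(embeddings: &[Vec<f64>]) -> Result<Vec<f64>, String> {
    let n = embeddings.len();
    let mut out = Vec::with_capacity(n * n.saturating_sub(1) / 2);
    for i in 0..n {
        for j in (i + 1)..n {
            out.push(cosine(&embeddings[i], &embeddings[j])?);
        }
    }
    Ok(out)
}

/// Position of pair (i, j) with i < j < n in the flat vector.
pub fn flat_index(n: usize, i: usize, j: usize) -> usize {
    i * n - i * (i + 1) / 2 + (j - i - 1)
}
```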

pub fn nearest_by_embedding( &self, query_embedding: &[f64], k: usize, ) -> Option<Result<Vec<(usize, f64)>, SphereQlError>>

Find the k concepts most similar to query_embedding by cosine similarity in the original embedding space (not the projected space).

Returns (index, similarity) pairs sorted by descending similarity, where index aligns with ids(), categories(), and projected_points().

Returns None if embeddings were not retained. Returns Err(DimensionMismatch) if query_embedding.len() differs from the stored embedding dimensionality.

Cost: scans every retained embedding, computing N cosine similarities at O(D) each for O(N·D) total work. A size-k heap keeps selection at O(N log k) time and O(k) extra allocation.
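
The bounded-heap selection described above can be sketched with the standard library alone. This is an illustration of the technique, not the crate's code: `top_k` and `Scored` are hypothetical, `String` stands in for the error type, and the tie-break (lower index wins) is an assumption.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// (similarity, index) with a total order so it can sit in a heap.
struct Scored(f64, usize);

impl Ord for Scored {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher similarity ranks higher; on ties the lower index ranks higher.
        self.0.total_cmp(&other.0).then(other.1.cmp(&self.1))
    }
}
impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Scored {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Scored {}

fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return up to k (index, similarity) pairs, best first.
pub fn top_k(embeddings: &[Vec<f64>], query: &[f64], k: usize) -> Result<Vec<(usize, f64)>, String> {
    // Min-heap via Reverse: the root is the weakest candidate kept so far.
    let mut heap: BinaryHeap<Reverse<Scored>> = BinaryHeap::new();
    for (i, e) in embeddings.iter().enumerate() {
        if e.len() != query.len() {
            return Err(format!("dimension mismatch at item {}: {} vs {}", i, e.len(), query.len()));
        }
        heap.push(Reverse(Scored(cosine(e, query), i)));
        if heap.len() > k {
            heap.pop(); // evict the weakest, keeping at most k entries
        }
    }
    // Ascending order of Reverse<Scored> is descending similarity.
    Ok(heap
        .into_sorted_vec()
        .into_iter()
        .map(|Reverse(Scored(s, i))| (i, s))
        .collect())
}
```

The heap never holds more than k + 1 entries, which is where the O(k) extra allocation and the O(log k) cost per item come from.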

pub fn domain_groups(&self) -> &[DomainGroup]

Coarse-grained domain groups detected from Voronoi adjacency + cap overlap. Single source of truth: the same vector used by default_nearest’s inner-sphere routing and hierarchical_nearest’s drill-down.

pub fn route_to_group(&self, embedding: &Embedding) -> Option<&DomainGroup>

Coarse routing: find the domain group whose centroid is angularly nearest to the query’s projected position.
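
Angular-nearest routing amounts to an argmin over centroid angles. A minimal sketch follows, assuming centroids and the projected query are plain 3-vectors; `nearest_centroid` is a hypothetical helper, and returning None for an empty centroid list is an assumption about why the real method returns an Option.

```rust
/// Angle in radians between two non-zero 3-vectors.
pub fn angular_distance(a: [f64; 3], b: [f64; 3]) -> f64 {
    let dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    let na = (a[0] * a[0] + a[1] * a[1] + a[2] * a[2]).sqrt();
    let nb = (b[0] * b[0] + b[1] * b[1] + b[2] * b[2]).sqrt();
    // Clamp guards acos against rounding slightly outside [-1, 1].
    (dot / (na * nb)).clamp(-1.0, 1.0).acos()
}

/// Index of the centroid with the smallest angle to `query`,
/// or None when there are no centroids.
pub fn nearest_centroid(centroids: &[[f64; 3]], query: [f64; 3]) -> Option<usize> {
    centroids
        .iter()
        .enumerate()
        .map(|(i, c)| (i, angular_distance(*c, query)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```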

pub fn hierarchical_nearest( &self, embedding: &Embedding, k: usize, ) -> Vec<NearestResult>

Hierarchical nearest-neighbor search: group → category → items.

When EVR is at or above RoutingConfig::low_evr_threshold, this is a plain outer-sphere k-NN (identical to SphereQLQuery::Nearest).

Below that threshold the outer sphere is unreliable, so we:

  1. Route the query to its nearest domain group.
  2. Drill down into each member category using its inner sphere (or the outer sphere if none exists).
  3. Merge the per-category results, sort by distance, truncate to k.
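
Step 3 above is a plain merge, sort, and truncate. A minimal sketch, with a hypothetical `Hit` type standing in for NearestResult (whose fields are not shown here):

```rust
/// Hypothetical stand-in for a nearest-neighbour result.
#[derive(Clone, Debug, PartialEq)]
pub struct Hit {
    pub id: String,
    pub distance: f64,
}

/// Merge per-category result lists, sort by ascending distance, keep k.
pub fn merge_hits(per_category: Vec<Vec<Hit>>, k: usize) -> Vec<Hit> {
    let mut all: Vec<Hit> = per_category.into_iter().flatten().collect();
    all.sort_by(|a, b| a.distance.total_cmp(&b.distance));
    all.truncate(k);
    all
}
```

One caveat worth keeping in mind: distances measured on different inner spheres are only comparable if they share a scale, which this sketch simply assumes.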

pub fn default_nearest( &self, embedding: &Embedding, k: usize, ) -> Vec<NearestResult>

Default nearest path (v2 routing).

Routes the query to its closest domain group when all three conditions hold: the outer projection’s EVR is below HIGH_EVR_ROUTING_BYPASS, the choice is unambiguous (d_nearest / d_second_nearest < group_routing_alpha), and the group has an inner sphere. Otherwise it falls back to plain outer-sphere k-NN. At EVR ≥ HIGH_EVR_ROUTING_BYPASS routing is bypassed entirely, because the outer angular distances are already more accurate than any inner-sphere re-projection.
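
The routing decision can be written as a single predicate. The sketch below is an assumption-laden illustration: `RoutingParams` and `should_route` are hypothetical, the threshold values used in any call are placeholders rather than the crate's constants, and a zero second-nearest distance is treated as "do not route" to avoid dividing by zero.

```rust
/// Hypothetical bundle of the two thresholds named above.
pub struct RoutingParams {
    pub high_evr_bypass: f64,
    pub group_routing_alpha: f64,
}

/// True when the query should be routed into a domain group's inner
/// sphere; false means plain outer-sphere k-NN.
pub fn should_route(
    evr: f64,
    d_nearest: f64,
    d_second_nearest: f64,
    group_has_inner_sphere: bool,
    params: &RoutingParams,
) -> bool {
    evr < params.high_evr_bypass
        && d_second_nearest > 0.0
        && d_nearest / d_second_nearest < params.group_routing_alpha
        && group_has_inner_sphere
}
```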

pub fn quality_config(&self) -> &QualityConfig

Current quality configuration.

pub fn set_quality_config(&mut self, config: QualityConfig)

Update the quality configuration (e.g., to enable filtering).

pub fn annotate_relations(&mut self, labels: &[String])

Annotate every bridge in the category layer with an inferred RelationType.

labels must be index-aligned with the pipeline’s item list: labels[i] describes the item whose BridgeItem::item_index is i.

pub fn config(&self) -> &PipelineConfig

Full tunable configuration this pipeline was built with.

pub fn to_json(&self) -> Result<String, Error>

Serialize all projected points as a JSON array string.

Returns Err when serialization fails — degenerate projections can produce non-finite coordinates, which JSON cannot represent.
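
JSON has no literal for NaN or infinity, so a caller may want to locate the offending point before or after a failed export. A minimal sketch, assuming the coordinates have been copied into plain `[f64; 3]` arrays; `first_non_finite` is a hypothetical helper, not part of the crate.

```rust
/// Index of the first point with a NaN or infinite coordinate, if any.
pub fn first_non_finite(points: &[[f64; 3]]) -> Option<usize> {
    points
        .iter()
        .position(|p| p.iter().any(|c| !c.is_finite()))
}
```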

pub fn to_csv(&self) -> String

Serialize all projected points as RFC 4180-compliant CSV with a header row.

String fields (id, category) are quoted to handle embedded commas and special characters safely.
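
RFC 4180 quoting wraps a field in double quotes and doubles any embedded double quote. The sketch below shows that rule only; `csv_field` and `csv_record` are hypothetical helpers, and the column order and number formatting are assumptions rather than the actual output of to_csv.

```rust
/// Quote a string field per RFC 4180: wrap in quotes, double inner quotes.
pub fn csv_field(s: &str) -> String {
    format!("\"{}\"", s.replace('"', "\"\""))
}

/// One record with assumed columns id, category, x, y, z.
pub fn csv_record(id: &str, category: &str, xyz: [f64; 3]) -> String {
    format!(
        "{},{},{},{},{}",
        csv_field(id),
        csv_field(category),
        xyz[0],
        xyz[1],
        xyz[2]
    )
}
```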

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.