Crate text_retrieval

§text-retrieval

Library-first semantic and hybrid retrieval for moritzbrantner-video-analysis.

Default builds are deterministic and local-first. Transcript integration is feature-gated, and native model execution stays outside the default dependency closure.

For the high-level text workflow, see docs/TEXT_WORKSPACE_GUIDE.md. For a lower-level walkthrough of how RetrievalIndex relates to TextCorpus, lexical scoring, hashed semantic search, and corpus analysis reports, see docs/TEXT_CORPUS_GUIDE.md.

§Highlights

  • Deterministic text chunking with token overlap
  • Exact semantic retrieval over moritzbrantner-vector-analysis-index
  • BM25 lexical retrieval over moritzbrantner-text-lexical
  • Hybrid weighted ranking with metadata filters
  • Related-content lookup and persistence-friendly export helpers
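
To make "deterministic text chunking with token overlap" concrete, here is a minimal standalone sketch. It does not use this crate's API: `chunk_with_overlap` is a hypothetical helper, and whitespace tokenization is an assumption made for illustration (the crate's ChunkingOptions and ChunkingStrategy may tokenize and window differently).

```rust
/// Hypothetical helper (not part of text_retrieval): splits `text` into
/// windows of at most `size` whitespace-separated tokens, where each window
/// repeats the last `overlap` tokens of the previous one.
fn chunk_with_overlap(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0, "chunk size must be positive");
    assert!(overlap < size, "overlap must be smaller than chunk size");

    let tokens: Vec<&str> = text.split_whitespace().collect();
    // Each new window starts `size - overlap` tokens after the previous one.
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < tokens.len() {
        let end = (start + size).min(tokens.len());
        chunks.push(tokens[start..end].join(" "));
        if end == tokens.len() {
            // The final window already reaches the end of the input.
            break;
        }
        start += step;
    }
    chunks
}
```

The same input and options always yield the same chunks, which is the property the first highlight describes. With `size = 3` and `overlap = 1`, the text `a b c d e` becomes the chunks `a b c` and `c d e`.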

§Stable contract

The stable surface is chunk construction, SearchDocument, TextDocumentContract/TextSegmentContract ingestion, retrieval request/result types, metadata filters, snapshot planning, and persistence DTOs.

§Quality and limits

Hybrid score calibration and ranking quality are best-effort. Persistence helper types are stable DTOs, but default package-surface operations plan or build in-memory indexes and do not write files.
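
One reason hybrid calibration is only best-effort is that semantic and lexical scores live on different scales. The sketch below shows one common way to blend them: min-max normalization followed by a weighted sum. It is an illustration under stated assumptions, not this crate's implementation; `min_max_normalize`, `hybrid_rank`, and the `semantic_weight` parameter are hypothetical names, and HybridConfig may expose different knobs.

```rust
use std::collections::HashMap;

/// Hypothetical helper: rescales scores into [0, 1]. If all scores are equal,
/// every entry maps to 1.0.
fn min_max_normalize(scores: &HashMap<String, f32>) -> HashMap<String, f32> {
    let min = scores.values().copied().fold(f32::INFINITY, f32::min);
    let max = scores.values().copied().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    scores
        .iter()
        .map(|(id, s)| {
            let normalized = if range > 0.0 { (s - min) / range } else { 1.0 };
            (id.clone(), normalized)
        })
        .collect()
}

/// Hypothetical helper: blends normalized semantic and lexical scores.
/// `semantic_weight` is clamped to [0, 1]; the lexical weight is the remainder.
/// A document missing from one list contributes 0.0 for that list.
fn hybrid_rank(
    semantic: &HashMap<String, f32>,
    lexical: &HashMap<String, f32>,
    semantic_weight: f32,
) -> Vec<(String, f32)> {
    let w = semantic_weight.clamp(0.0, 1.0);
    let mut combined: HashMap<String, f32> = HashMap::new();

    for (id, s) in min_max_normalize(semantic) {
        *combined.entry(id).or_insert(0.0) += w * s;
    }
    for (id, s) in min_max_normalize(lexical) {
        *combined.entry(id).or_insert(0.0) += (1.0 - w) * s;
    }

    let mut ranked: Vec<(String, f32)> = combined.into_iter().collect();
    // Highest score first; ties are broken by id so the order is deterministic.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

Min-max normalization is sensitive to outliers and to the size of the candidate set, so the blended scores are comparable within one query but not across queries.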

§Package surface

  • Primary workflow: retrieval.search builds a transient in-memory retrieval index and searches it.
  • Workflow operations: retrieval.chunk, retrieval.search, retrieval.rerank, and retrieval.snapshotPlan.
  • Debug operations: describe inspects package metadata and operation support.
  • Runtime support: pure Rust, available through library, CLI, server, and WASM wrappers.
  • Sample output includes title, message, summary, result, and operation-specific fields such as chunks, report, mode, results, or snapshot planning details.
  • Package-surface operations never write persistence artifacts or run native model inference; retrieval.snapshotPlan only plans persistence work in memory.
  • Related crates: moritzbrantner-text-embeddings, moritzbrantner-text-lexical, and moritzbrantner-vector-analysis-index.

Modules§

surface
Library-owned runtime surface for text-retrieval.

Structs§

ChunkingOptions
Options for explicit chunk construction.
DocumentChunk
Data type for document chunk.
HybridConfig
Data type for hybrid config.
IngestReport
Data type for ingest report.
IngestionOptions
Data type for ingestion options.
PersistedChunkRecord
Data type for persisted chunk record.
PersistedCorpusMetadata
Data type for persisted corpus metadata.
PersistedSearchIndex
Data type for persisted search index.
RerankExecutionContext
Caller-supplied runtime context for reranking.
RerankRequest
Request for query/document reranking.
RerankResponse
Response for query/document reranking.
RerankResult
One reranked document.
RetrievalFile
Data type for retrieval file.
RetrievalIndex
Data type for retrieval index.
RetrievalManifest
Data type for retrieval manifest.
SearchDocument
Data type for search document.
SearchFilter
Data type for search filter.
SearchQuery
Data type for search query.
SearchResult
Data type for search result.

Enums§

ChunkingStrategy
Strategy for constructing retrieval chunks.
RetrievalMode
Variants describing retrieval mode.
StorageError
Variants describing storage error.

Traits§

IntoSearchDocument

Functions§

chunk_search_document
Chunks one search document with explicit strategy options.
rerank_documents
Reranks documents from imported scores or deterministic lexical overlap.
rerank_documents_with_context
Reranks documents using a runtime backend when supplied, otherwise falls back.
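
The "deterministic lexical overlap" fallback mentioned for rerank_documents can be pictured with the following standalone sketch. It is an assumption about the general technique, not the crate's code: `tokens`, `overlap_score`, and `rerank_by_overlap` are hypothetical, and the real function works on RerankRequest and RerankResponse rather than plain string slices.

```rust
use std::collections::HashSet;

/// Hypothetical helper: lowercased alphanumeric tokens of `text`.
fn tokens(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Hypothetical helper: fraction of distinct query tokens that also occur in
/// the document. Returns 0.0 for an empty query.
fn overlap_score(query: &str, doc: &str) -> f32 {
    let query_tokens = tokens(query);
    if query_tokens.is_empty() {
        return 0.0;
    }
    let doc_tokens = tokens(doc);
    query_tokens.intersection(&doc_tokens).count() as f32 / query_tokens.len() as f32
}

/// Hypothetical helper: returns (original index, score) pairs, best first.
/// Ties keep the original document order, so the result is deterministic.
fn rerank_by_overlap(query: &str, docs: &[&str]) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(index, doc)| (index, overlap_score(query, doc)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    scored
}
```

For the query `hybrid retrieval`, a document containing both words scores 1.0, a document containing only `retrieval` scores 0.5, and a document containing neither scores 0.0.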

Type Aliases§

Result
Type alias for result.