Crate cognis_rag

cognis-rag

v2-beta RAG primitives: embeddings, vector stores, document loaders, text splitters, retrievers, and an indexing pipeline.

Top-level modules:

  • document — the universal Document type.
  • embeddings — Embeddings trait + Fake/OpenAI/Ollama impls.
  • vectorstore — VectorStore trait + InMemoryVectorStore.
  • loaders — text/markdown/json/directory/csv/html loaders.
  • splitters — recursive-char + markdown-aware splitters.
  • retrievers — vector / BM25 / ensemble retrievers (each is a Runnable).
  • indexing — wire load → split → embed → store with one call.
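The load → split → embed → store flow named in the indexing entry can be sketched without the crate. This is a minimal standalone illustration, not the cognis_rag API: `Doc`, `split`, `embed`, `MemoryStore`, and `index` are all hypothetical names, the splitter is a plain fixed-width character chunker, and the embedding is a toy three-element count vector.

```rust
// Standalone sketch: none of these items come from cognis_rag.

/// Hypothetical document type: an id plus its text.
#[derive(Debug, Clone, PartialEq)]
struct Doc {
    id: String,
    text: String,
}

/// Split: fixed-width character chunks (a stand-in for a real splitter).
fn split(doc: &Doc, chunk_chars: usize) -> Vec<Doc> {
    assert!(chunk_chars > 0, "chunk size must be positive");
    let chars: Vec<char> = doc.text.chars().collect();
    chars
        .chunks(chunk_chars)
        .enumerate()
        .map(|(i, c)| Doc {
            id: format!("{}#{}", doc.id, i),
            text: c.iter().collect(),
        })
        .collect()
}

/// Embed: a toy deterministic vector of [letters, digits, everything else].
fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; 3];
    for ch in text.chars() {
        if ch.is_alphabetic() {
            v[0] += 1.0;
        } else if ch.is_numeric() {
            v[1] += 1.0;
        } else {
            v[2] += 1.0;
        }
    }
    v
}

/// Store: an in-memory list of (chunk, vector) rows.
#[derive(Default)]
struct MemoryStore {
    rows: Vec<(Doc, Vec<f32>)>,
}

impl MemoryStore {
    fn add(&mut self, doc: Doc, vector: Vec<f32>) {
        self.rows.push((doc, vector));
    }
}

/// Runs load → split → embed → store for every (id, text) source and
/// returns the number of chunks written to the store.
fn index(sources: &[(&str, &str)], chunk_chars: usize, store: &mut MemoryStore) -> usize {
    let mut stored = 0;
    for (id, text) in sources {
        // Load: wrap the raw source in a document.
        let doc = Doc { id: id.to_string(), text: text.to_string() };
        for chunk in split(&doc, chunk_chars) {
            let vector = embed(&chunk.text);
            store.add(chunk, vector);
            stored += 1;
        }
    }
    stored
}
```

Calling `index(&[("a", "hello world")], 5, &mut store)` on an empty `MemoryStore` writes three chunks with ids `a#0`, `a#1`, and `a#2`. Keeping each stage a separate function is what lets a real pipeline swap in a different loader, splitter, embedding backend, or store.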

Re-exports

pub use cross_encoder::CrossEncoder;
pub use cross_encoder::CrossEncoderReranker;
pub use cross_encoder::FnCrossEncoder;
pub use distance::Distance;
pub use docstore::Docstore;
pub use docstore::InMemoryDocstore;
pub use document::Document;
pub use embeddings::OllamaEmbeddings;
pub use embeddings::OpenAIEmbeddings;
pub use embeddings::BatchedEmbeddings;
pub use embeddings::CachedEmbeddings;
pub use embeddings::EmbeddingRouter;
pub use embeddings::Embeddings;
pub use embeddings::EmbeddingsRouter;
pub use embeddings::FakeEmbeddings;
pub use embeddings::FnRouter;
pub use embeddings::LengthRouter;
pub use example_selectors::AsyncExampleSelector;
pub use example_selectors::EmbedMode;
pub use example_selectors::MmrExampleSelector;
pub use example_selectors::SemanticSimilarityExampleSelector;
pub use indexing::IncrementalReport;
pub use indexing::IndexingPipeline;
pub use loaders::DirectoryLoader;
pub use loaders::DocumentLoader;
pub use loaders::DocumentStream;
pub use loaders::JsonLoader;
pub use loaders::MarkdownLoader;
pub use loaders::TextLoader;
pub use multi_vector::MultiVectorIndexer;
pub use record_manager::fingerprint;
pub use record_manager::InMemoryRecordManager;
pub use record_manager::RecordManager;
pub use retrievers::BM25Retriever;
pub use retrievers::CachingRetriever;
pub use retrievers::CompressorPipeline;
pub use retrievers::EnsembleRetriever;
pub use retrievers::MultiVectorRetriever;
pub use retrievers::ParentDocumentRetriever;
pub use retrievers::QueryTranslatorRetriever;
pub use retrievers::VectorRetriever;
pub use splitters::CharacterSplitter;
pub use splitters::CodeLanguage;
pub use splitters::CodeSplitter;
pub use splitters::HtmlSplitter;
pub use splitters::JsonSplitter;
pub use splitters::MarkdownSplitter;
pub use splitters::RecursiveCharSplitter;
pub use splitters::SentenceSplitter;
pub use splitters::TextSplitter;
pub use splitters::TokenAwareSplitter;
pub use transformers::Dedup;
pub use transformers::Enrichment;
pub use transformers::LongContextReorder;
pub use transformers::MetadataTransformer;
pub use vectorstore::Filter;
pub use vectorstore::InMemoryVectorStore;
pub use vectorstore::SearchResult;
pub use vectorstore::VectorStore;

Modules

cross_encoder
Cross-encoder scoring trait + cross-encoder-based reranker.
distance
Distance metrics for vector similarity.
docstore
Docstore — keyed Document storage by stable id.
document
Document — the unit of RAG: a piece of text plus typed metadata.
embeddings
Embeddings trait + implementations.
example_selectors
Embedding-driven example selectors for few-shot prompts.
indexing
Indexing pipeline — load → split → embed → store.
loaders
Document loaders — read sources into Documents.
multi_vector
MultiVectorIndexer — index many representations of one document under a shared parent id.
prelude
Common imports for v2 RAG user code.
record_manager
Incremental indexing — track per-document fingerprints so re-indexing only re-embeds new or changed documents and removes deleted ones.
retrievers
Retrievers — Runnable<String, Vec<Document>>.
splitters
Text splitters — chunk a Document into smaller Documents suitable for embedding.
transformers
Document-list transformers — Runnable<Vec<Document>, Vec<Document>>.
vectorstore
Vector store trait + SearchResult + Filter.
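The record_manager entry describes fingerprint-based incremental indexing: re-embed only new or changed documents and remove deleted ones. The sketch below shows one way that bookkeeping can work. It is an assumption-laden illustration, not the crate's implementation: `content_fingerprint`, `SyncPlan`, and `plan_sync` are hypothetical names, and the fingerprint is simply a hash of the document text.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical content fingerprint: a hash of the document text.
/// DefaultHasher output is not guaranteed stable across Rust releases,
/// so a persisted record store would want a fixed hash function instead.
fn content_fingerprint(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// What one indexing run has to do.
#[derive(Debug, Default, PartialEq)]
struct SyncPlan {
    to_embed: Vec<String>,  // ids that are new or whose text changed
    to_delete: Vec<String>, // ids recorded earlier but missing now
    unchanged: usize,       // ids skipped because the fingerprint matched
}

/// Compares the current (id, text) pairs with the fingerprints recorded by
/// the previous run, then replaces `records` with the current state.
fn plan_sync(records: &mut HashMap<String, u64>, current: &[(&str, &str)]) -> SyncPlan {
    let mut plan = SyncPlan::default();
    let mut seen: HashMap<String, u64> = HashMap::new();

    for (id, text) in current {
        let fp = content_fingerprint(text);
        match records.get(*id) {
            Some(old) if *old == fp => plan.unchanged += 1,
            _ => plan.to_embed.push(id.to_string()),
        }
        seen.insert(id.to_string(), fp);
    }

    // Anything recorded before but absent from this run was deleted upstream.
    plan.to_delete = records
        .keys()
        .filter(|k| !seen.contains_key(*k))
        .cloned()
        .collect();
    plan.to_delete.sort();

    *records = seen;
    plan
}
```

On a first run every id lands in `to_embed`. A second run over the same input reports everything as `unchanged`, so no embedding calls are needed. Only ids in `to_embed` need to reach the embedding backend, and ids in `to_delete` are the ones to drop from the vector store.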

Structs

CharTokenizer
Trivial tokenizer that counts each character as one token. This gives a conservative upper bound on real tokenizer counts, which makes it a useful default for budgeting.
FnTokenizer
Closure-backed tokenizer.

Traits

Tokenizer
Counts tokens in a piece of text.
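To illustrate how a token-counting trait with a character-based default and a closure-backed variant can drive budgeting, here is a standalone sketch. The names `TokenCounter`, `CharCounter`, `FnCounter`, and `take_within_budget` are hypothetical and deliberately differ from the crate's own items, whose exact signatures are not shown on this page.

```rust
/// Hypothetical trait: counts tokens in a piece of text.
trait TokenCounter {
    fn count(&self, text: &str) -> usize;
}

/// One token per Unicode scalar value (char).
struct CharCounter;

impl TokenCounter for CharCounter {
    fn count(&self, text: &str) -> usize {
        text.chars().count()
    }
}

/// Wraps any closure that maps text to a token count.
struct FnCounter<F: Fn(&str) -> usize>(F);

impl<F: Fn(&str) -> usize> TokenCounter for FnCounter<F> {
    fn count(&self, text: &str) -> usize {
        (self.0)(text)
    }
}

/// Keeps leading chunks, in order, while the running total stays within
/// `budget`; stops at the first chunk that would exceed it.
fn take_within_budget<'a>(
    counter: &dyn TokenCounter,
    chunks: &[&'a str],
    budget: usize,
) -> Vec<&'a str> {
    let mut used = 0;
    let mut kept = Vec::new();
    for chunk in chunks {
        let cost = counter.count(chunk);
        if used + cost > budget {
            break;
        }
        used += cost;
        kept.push(*chunk);
    }
    kept
}
```

With a whitespace-word counter such as `FnCounter(|t: &str| t.split_whitespace().count())`, the same budgeting function works unchanged, since it only depends on the trait. The character-based counter overestimates relative to the word counter, so it admits fewer chunks for the same budget.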