§Swiftide
 
Swiftide is a data indexing and processing library, tailored for Retrieval Augmented Generation (RAG). When building applications with large language models (LLMs), these LLMs need access to external resources. Data needs to be transformed, enriched, split up, embedded, and persisted. It is built in Rust, using parallel, asynchronous streams, and is blazingly fast.
Part of the bosun.ai project, an upcoming platform for autonomous code improvement.
We <3 feedback: project ideas, suggestions, and complaints are very welcome. Feel free to open an issue.
Read more about the project on the swiftide website
§Features
- Extremely fast streaming indexing pipeline with async, parallel processing
- Integrations with OpenAI, Redis, Qdrant, FastEmbed, Tree-sitter, and more
- A variety of loaders, transformers, embedders, and other common, generic tools
- Bring your own transformers by extending straightforward traits (see the sketch after this list)
- Splitting and merging pipelines
- Jinja-like templating for prompts
- Store into multiple backends
- tracing supported for logging and tracing, see /examples and the tracing crate for more information.
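A minimal sketch of such a custom transformer, assuming the Transformer trait (re-exported through the traits module) exposes an async transform_node hook and that Node carries its text in a chunk field; ShoutLoudly is a hypothetical example, so check the trait docs for the exact signature in your version.

use anyhow::Result;
use async_trait::async_trait;
use swiftide::indexing::Node;
use swiftide::traits::Transformer; // assumed re-export; see the traits module

/// Hypothetical transformer that uppercases each chunk before it is embedded.
#[derive(Debug, Clone)]
struct ShoutLoudly;

#[async_trait]
impl Transformer for ShoutLoudly {
    // Called once per node flowing through the indexing pipeline.
    async fn transform_node(&self, mut node: Node) -> Result<Node> {
        node.chunk = node.chunk.to_uppercase();
        Ok(node)
    }
}

// It can then be plugged in like any built-in step, e.g. `.then(ShoutLoudly)`.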
§Querying
After running an indexing pipeline, you can use the query module to query the indexed data.
§Examples
§Indexing markdown
Pipeline::from_loader(FileLoader::new(".").with_extensions(&["md"]))
    .then_chunk(ChunkMarkdown::from_chunk_range(10..512))
    .then(MetadataQAText::new(openai_client.clone()))
    .then_in_batch(Embed::new(openai_client.clone()).with_batch_size(10))
    .then_store_with(
        Qdrant::try_from_url(qdrant_url)?
            .batch_size(50)
            .vector_size(1536)
            .collection_name("swiftide-examples".to_string())
            .build()?,
    )
    .run()
    .await?;
§Querying
query::Pipeline::default()
    .then_transform_query(query_transformers::GenerateSubquestions::from_client(
        openai_client.clone(),
    ))
    .then_transform_query(query_transformers::Embed::from_client(
        openai_client.clone(),
    ))
    .then_retrieve(qdrant.clone())
    .then_transform_response(response_transformers::Summary::from_client(
        openai_client.clone(),
    ))
    .then_answer(answers::Simple::from_client(openai_client.clone()))
    .query("What is swiftide?")
    .await?;
§Feature flags
Swiftide has few features enabled by default, as some integrations are dependency heavy. You need to cherry-pick the tools and integrations you want to use.
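For example, a hypothetical Cargo.toml entry that compiles in only the OpenAI, Qdrant, and Redis integrations (the version is a placeholder):

[dependencies]
# Only the listed integrations are built; everything else stays off.
swiftide = { version = "x.y", features = ["openai", "qdrant", "redis"] }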
§Integrations
- qdrant — Enables Qdrant for storage and retrieval
- pgvector — Enables PgVector for storage and retrieval
- redis — Enables Redis as an indexing cache and storage
- tree-sitter — Tree-sitter for various code transformers
- openai — OpenAI for embedding and prompting
- groq — Groq prompting
- dashscope — Dashscope prompting
- open-router — OpenRouter prompting
- ollama — Ollama prompting
- fastembed — FastEmbed (by Qdrant) for fast, local, sparse and dense embeddings
- scraping — Scraping via spider as loader and an HTML to markdown transformer
- aws-bedrock — AWS Bedrock for prompting
- lancedb — LanceDB for persistence and querying
- fluvio — Fluvio loader
- parquet — Parquet loader
- redb — Redb embeddable node cache
§Other features
- test-utils — Various testing utilities
§Experimental
§Modules
- agents — Swiftide agents are a flexible way to build fast and reliable AI agents.
- chat_completion
- indexing — This module serves as the main entry point for indexing in Swiftide.
- integrations — Integrations with various platforms and external services.
- prompt — Prompt templating and management
- query — Querying pipelines
- template
- traits — Common traits for common behaviour, re-exported from indexing and query