Swiftide is a data indexing and processing library, tailored for Retrieval Augmented Generation (RAG). When building applications with large language models (LLMs), these LLMs need access to external resources. Data needs to be transformed, enriched, split up, embedded, and persisted. Swiftide is built in Rust, using parallel, asynchronous streams, and is blazingly fast.
Part of the bosun.ai project, an upcoming platform for autonomous code improvement.
We <3 feedback: project ideas, suggestions, and complaints are very welcome. Feel free to open an issue.
Read more about the project on the swiftide website.
§Features
- Extremely fast streaming indexing pipeline with async, parallel processing
- Integrations with OpenAI, Redis, Qdrant, FastEmbed, Treesitter and more
- A variety of loaders, transformers, embedders, and other common, generic tools
- Bring your own transformers by extending straightforward traits
- Splitting and merging pipelines
- Jinja-like templating for prompts
- Store into multiple backends
- tracing supported for logging and tracing; see /examples and the tracing crate for more information.
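To make the "bring your own transformers" idea concrete, here is a minimal sketch in plain Rust of a trait-based pipeline step. The `Node` struct, the `Transformer` trait, `WordCount`, and `run_steps` are all hypothetical names invented for illustration; they are not Swiftide's own types or signatures (which are async), so check the indexing and traits modules for the real ones.

```rust
/// Hypothetical stand-in for a unit of indexed data; not Swiftide's node type.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    chunk: String,
    metadata: Vec<(String, String)>,
}

/// Hypothetical transformer trait: takes a node, returns an enriched node or an error.
trait Transformer {
    fn transform(&self, node: Node) -> Result<Node, String>;
}

/// A custom step that records the word count of each chunk as metadata.
struct WordCount;

impl Transformer for WordCount {
    fn transform(&self, mut node: Node) -> Result<Node, String> {
        let count = node.chunk.split_whitespace().count();
        node.metadata
            .push(("word_count".to_string(), count.to_string()));
        Ok(node)
    }
}

/// Runs every node through each step in order, stopping at the first error.
fn run_steps(nodes: Vec<Node>, steps: &[Box<dyn Transformer>]) -> Result<Vec<Node>, String> {
    nodes
        .into_iter()
        .map(|node| steps.iter().try_fold(node, |n, step| step.transform(n)))
        .collect()
}

fn main() {
    let nodes = vec![Node {
        chunk: "Swiftide indexes data".to_string(),
        metadata: vec![],
    }];
    let steps: Vec<Box<dyn Transformer>> = vec![Box::new(WordCount)];
    let out = run_steps(nodes, &steps).expect("transform failed");
    println!("{:?}", out[0].metadata);
}
```

Keeping each step behind a small trait means new behaviour can be added without touching the code that drives the pipeline, which is the design the feature list hints at.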
§Querying
After running an indexing pipeline, you can use the query module to query the indexed data.
§Examples
§Indexing markdown
Pipeline::from_loader(FileLoader::new(".").with_extensions(&["md"]))
    .then_chunk(ChunkMarkdown::from_chunk_range(10..512))
    .then(MetadataQAText::new(openai_client.clone()))
    .then_in_batch(Embed::new(openai_client.clone()).with_batch_size(10))
    .then_store_with(
        Qdrant::try_from_url(qdrant_url)?
            .batch_size(50)
            .vector_size(1536)
            .collection_name("swiftide-examples".to_string())
            .build()?,
    )
    .run()
    .await
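The `with_batch_size(10)` call above suggests that nodes are grouped before being sent to the embedding model. The general idea of batching can be sketched in plain Rust as follows. This is an illustration only: `embed_batch` is a hypothetical placeholder that fabricates one-element vectors, and `embed_in_batches` is not part of Swiftide.

```rust
/// Hypothetical embedding call: one request per batch, one vector per input text.
/// A real embedder would call a model; this placeholder returns the text length.
fn embed_batch(texts: &[String]) -> Vec<Vec<f32>> {
    texts.iter().map(|t| vec![t.len() as f32]).collect()
}

/// Embeds all texts in groups of at most `batch_size`, returning the vectors
/// in input order together with the number of requests that were made.
fn embed_in_batches(texts: &[String], batch_size: usize) -> (Vec<Vec<f32>>, usize) {
    assert!(batch_size > 0, "batch_size must be at least 1");
    let mut vectors = Vec::with_capacity(texts.len());
    let mut requests = 0;
    for batch in texts.chunks(batch_size) {
        vectors.extend(embed_batch(batch));
        requests += 1;
    }
    (vectors, requests)
}

fn main() {
    let texts: Vec<String> = (0..25).map(|i| format!("chunk {i}")).collect();
    let (vectors, requests) = embed_in_batches(&texts, 10);
    println!("{} vectors from {} requests", vectors.len(), requests);
}
```

Larger batches mean fewer round trips to the embedding service, at the cost of more data per request.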
§Querying
query::Pipeline::default()
    .then_transform_query(query_transformers::GenerateSubquestions::from_client(
        openai_client.clone(),
    ))
    .then_transform_query(query_transformers::Embed::from_client(
        openai_client.clone(),
    ))
    .then_retrieve(qdrant.clone())
    .then_transform_response(response_transformers::Summary::from_client(
        openai_client.clone(),
    ))
    .then_answer(answers::Simple::from_client(openai_client.clone()))
    .query("What is swiftide?")
    .await?;
§Feature flags
Swiftide has few features enabled by default, as some integrations are dependency heavy. You need to cherry-pick the tools and integrations you want to use.
§Integrations
- qdrant: Enables Qdrant for storage and retrieval
- pgvector: Enables PgVector for storage and retrieval
- redis: Enables Redis as an indexing cache and storage
- tree-sitter: Tree-sitter for various code transformers
- openai: OpenAI for embedding and prompting
- groq: Groq prompting
- dashscope: Dashscope prompting
- open-router: OpenRouter prompting
- ollama: Ollama prompting
- fastembed: FastEmbed (by qdrant) for fast, local, sparse and dense embeddings
- scraping: Scraping via spider as loader and an HTML to markdown transformer
- aws-bedrock: AWS Bedrock for prompting
- lancedb: LanceDB for persistence and querying
- fluvio: Fluvio loader
- parquet: Parquet loader
- redb: Redb embeddable node cache
- duckdb: Duckdb; sqlite fork, supports Persist, Retrieve and NodeCache
- mcp: MCP tool support for agents (tools only)
- test-utils: Various mocking and testing utilities
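As a sketch of what cherry-picking looks like in practice, a Cargo.toml entry could enable a handful of the flags listed above. The version shown is a placeholder, not a recommendation, and the selection of flags is only an example.

```toml
[dependencies]
# "*" is a placeholder; pin the release you actually depend on.
swiftide = { version = "*", features = ["openai", "qdrant", "redis"] }
```

Only the listed integrations and their dependencies are then compiled, which keeps build times down.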
§Experimental
Modules§
- agents (swiftide-agents): Swiftide agents are a flexible way to build fast and reliable AI agents.
- chat_completion
- indexing: This module serves as the main entry point for indexing in Swiftide.
- integrations: Integrations with various platforms and external services.
- prompt: Prompts templating and management
- query: Querying pipelines
- traits: Common traits for common behaviour, re-exported from indexing and query
Structs§
Type Aliases§
Attribute Macros§
- indexing_transformer (macros): Generates boilerplate for an indexing transformer.
- tool (macros): Creates a Tool from an async function.
Derive Macros§
- Tool (macros): Derive Tool on a struct.