airust
airust is a modular, trainable AI library written in Rust.
It can embed knowledge from JSON files at compile time and provides several prediction engines (exact, fuzzy, and BM25-based matching) for natural language input.
AiRust Capabilities
What You Can Concretely Do:
1. Build Your Own AI Agents
- Train agents with examples (question → answer pairs)
- Supported agent types:
  - Exact Match: precise matching
  - Fuzzy Match: tolerant to typos (Levenshtein distance)
  - TF-IDF/BM25: relevance ranking by term weighting
  - ContextAgent: remembers previous dialogue turns
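The sketch below shows the difference between the exact and fuzzy agent types. It is dependency-free and is not airust's API: Strategy, predict and levenshtein are hypothetical names chosen for illustration.

- Exact matching compares trimmed, lowercased questions.
- Fuzzy matching accepts the closest stored question within a maximum Levenshtein distance.

```rust
/// Hypothetical matching strategy mirroring the "Exact Match" and
/// "Fuzzy Match" agent types (illustrative, not airust's actual API).
#[derive(Debug, Clone, Copy)]
pub enum Strategy {
    /// Case-insensitive comparison after trimming whitespace.
    Exact,
    /// Accept the closest stored question within `max_distance` edits.
    Fuzzy { max_distance: usize },
}

/// Dynamic-programming Levenshtein distance over Unicode chars.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Return the stored answer for `query`, or None if no question qualifies.
pub fn predict<'a>(examples: &[(&'a str, &'a str)], query: &str, strategy: Strategy) -> Option<&'a str> {
    let q = query.trim().to_lowercase();
    match strategy {
        Strategy::Exact => examples
            .iter()
            .find(|(question, _)| question.trim().to_lowercase() == q)
            .map(|(_, answer)| *answer),
        Strategy::Fuzzy { max_distance } => examples
            .iter()
            .map(|(question, answer)| (levenshtein(&question.trim().to_lowercase(), &q), *answer))
            .filter(|(d, _)| *d <= max_distance)
            .min_by_key(|(d, _)| *d) // first of the closest matches wins ties
            .map(|(_, answer)| answer),
    }
}
```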
2. Manage Your Own Knowledge Database
- Save/load training data (train.json)
- Weighting and metadata per entry
- Import data in the legacy format
3. PDF Knowledge Extraction
- Convert PDF documents into structured knowledge bases
- Intelligent text chunking with configurable parameters
- Automatic metadata generation for search context
- Merge multiple PDF sources into unified knowledge
- Command-line tools for batch processing
4. Text Analysis
- Tokenization, stop words, N-grams
- Similarity measures: Levenshtein, Jaccard
- Text normalization
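Jaccard similarity, one of the measures listed above, is the size of the intersection of two token sets divided by the size of their union. The following is a minimal sketch of that formula. The jaccard function name, the whitespace tokenization, and the lowercasing are assumptions of this example, not airust's implementation.

```rust
use std::collections::HashSet;

/// Jaccard similarity |A ∩ B| / |A ∪ B| over lowercase whitespace tokens.
pub fn jaccard(a: &str, b: &str) -> f64 {
    let set_a: HashSet<String> = a.split_whitespace().map(|t| t.to_lowercase()).collect();
    let set_b: HashSet<String> = b.split_whitespace().map(|t| t.to_lowercase()).collect();
    if set_a.is_empty() && set_b.is_empty() {
        return 1.0; // convention chosen here: two empty texts count as identical
    }
    let inter = set_a.intersection(&set_b).count() as f64;
    let union = set_a.union(&set_b).count() as f64;
    inter / union
}
```

For example, "the quick fox" and "the lazy fox" share 2 of 4 distinct tokens, giving 0.5.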
5. Custom CLI Tools
- Launch the airust CLI for:
  - Interactive sessions with an agent
  - Knowledge base management
  - Quick data testing
  - PDF conversion and import
6. Integration into Other Projects
- Use airust as a Rust library in your own applications (web, CLI, desktop, IoT)
Example Application Ideas:
- FAQ Bot for your website
- Intelligent document search
- Customer support via terminal
- Voice assistant with context understanding
- Similarity search for text databases
- Local assistance tool for developer documentation
- Smart PDF document analyzer and query system
Advanced Features
- Modular Architecture with Unified Traits:
  - Agent: base trait for all agents with enhanced prediction capabilities
  - TrainableAgent: for agents that can be trained with examples
  - ContextualAgent: for context-aware conversational agents
  - ConfidenceAgent: new trait for agents that can provide prediction confidence
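The four traits can be pictured as a small hierarchy in which each specialized trait extends Agent. The sketch below assumes that shape; it does not copy airust's definitions. The method names and signatures (predict, train, add_context, clear_context, confidence) are illustrative, and LookupAgent is a hypothetical toy type.

```rust
/// Hypothetical base trait: every agent can answer a question.
pub trait Agent {
    fn predict(&self, input: &str) -> String;
}

/// Agents that learn from (question, answer) examples.
pub trait TrainableAgent: Agent {
    fn train(&mut self, examples: &[(String, String)]);
}

/// Agents that keep conversational context between calls.
pub trait ContextualAgent: Agent {
    fn add_context(&mut self, question: String, answer: String);
    fn clear_context(&mut self);
}

/// Agents that can report how sure they are, in [0.0, 1.0].
pub trait ConfidenceAgent: Agent {
    fn confidence(&self, input: &str) -> f32;
}

/// Toy agent that returns stored answers for exact questions.
#[derive(Default)]
pub struct LookupAgent {
    pairs: Vec<(String, String)>,
}

impl Agent for LookupAgent {
    fn predict(&self, input: &str) -> String {
        self.pairs
            .iter()
            .find(|(q, _)| q == input)
            .map(|(_, a)| a.clone())
            .unwrap_or_else(|| "I don't know.".to_string())
    }
}

impl TrainableAgent for LookupAgent {
    fn train(&mut self, examples: &[(String, String)]) {
        self.pairs.extend_from_slice(examples);
    }
}

impl ConfidenceAgent for LookupAgent {
    fn confidence(&self, input: &str) -> f32 {
        // Binary confidence: 1.0 for a known question, 0.0 otherwise.
        if self.pairs.iter().any(|(q, _)| q == input) { 1.0 } else { 0.0 }
    }
}
```

Using supertraits keeps predict available on every specialized agent. Code that only needs answers can then accept any &dyn Agent.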
- Intelligent Agent Implementations:
  - MatchAgent: advanced matching with configurable strategies
    - Exact matching
    - Fuzzy matching with dynamic thresholds
    - Configurable Levenshtein distance options
  - TfidfAgent: similarity detection using the BM25 algorithm
    - Customizable term frequency scaling
    - Document length normalization
  - ContextAgent<A>: flexible context-aware wrapper
    - Multiple context formatting strategies
    - Configurable context history size
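TfidfAgent is described as using BM25. For reference, a minimal BM25 scorer follows. Several parts are assumptions of this sketch, not details taken from airust:

- the Bm25 type;
- the whitespace tokenizer;
- the non-negative IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)).

The parameter k1 controls term-frequency saturation ("term frequency scaling"), and b controls document length normalization. The sketch assumes at least one indexed document is non-empty, so the average length is not zero.

```rust
use std::collections::HashMap;

/// Minimal BM25 index over whitespace-tokenized documents (a sketch).
pub struct Bm25 {
    docs: Vec<Vec<String>>,
    doc_freq: HashMap<String, usize>, // number of documents containing each term
    avg_len: f64,
    k1: f64, // term-frequency saturation
    b: f64,  // strength of document-length normalization (0.0 = none)
}

impl Bm25 {
    pub fn new(texts: &[&str], k1: f64, b: f64) -> Self {
        let docs: Vec<Vec<String>> = texts
            .iter()
            .map(|t| t.split_whitespace().map(|w| w.to_lowercase()).collect())
            .collect();
        let mut doc_freq = HashMap::new();
        for doc in &docs {
            // Count each term at most once per document.
            let mut seen: Vec<&String> = doc.iter().collect();
            seen.sort();
            seen.dedup();
            for term in seen {
                *doc_freq.entry(term.clone()).or_insert(0) += 1;
            }
        }
        let total: usize = docs.iter().map(|d| d.len()).sum();
        let avg_len = if docs.is_empty() { 0.0 } else { total as f64 / docs.len() as f64 };
        Bm25 { docs, doc_freq, avg_len, k1, b }
    }

    /// BM25 score of document `i` for `query`.
    pub fn score(&self, i: usize, query: &str) -> f64 {
        let doc = &self.docs[i];
        let n_docs = self.docs.len() as f64;
        let len_norm = 1.0 - self.b + self.b * doc.len() as f64 / self.avg_len;
        query
            .split_whitespace()
            .map(|w| w.to_lowercase())
            .map(|term| {
                let tf = doc.iter().filter(|t| **t == term).count() as f64;
                let n = *self.doc_freq.get(&term).unwrap_or(&0) as f64;
                let idf = (1.0 + (n_docs - n + 0.5) / (n + 0.5)).ln();
                idf * tf * (self.k1 + 1.0) / (tf + self.k1 * len_norm)
            })
            .sum()
    }

    /// Index of the highest-scoring document, if any documents exist.
    pub fn best(&self, query: &str) -> Option<usize> {
        (0..self.docs.len()).max_by(|&x, &y| {
            self.score(x, query).partial_cmp(&self.score(y, query)).unwrap()
        })
    }
}
```

The values k1 = 1.2 and b = 0.75 are common starting points in the BM25 literature. Setting b to 0.0 disables length normalization entirely.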
- Enhanced Response Handling:
  - ResponseFormat with support for:
    - Plain text
    - Markdown
    - JSON
  - Metadata and confidence tracking
  - Seamless type conversions
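A response type covering plain text, Markdown and JSON, with confidence, metadata and conversions, could look like the dependency-free sketch below. The variant names, the Prediction struct, and the particular From conversions are assumptions for illustration. airust's real ResponseFormat may be defined differently.

```rust
use std::fmt;

/// Hypothetical response type for the three formats listed above.
#[derive(Debug, Clone, PartialEq)]
pub enum ResponseFormat {
    Text(String),
    Markdown(String),
    Json(String), // raw JSON kept as a string so the sketch needs no JSON crate
}

/// A prediction together with its confidence and optional metadata.
#[derive(Debug, Clone)]
pub struct Prediction {
    pub response: ResponseFormat,
    pub confidence: f32,
    pub metadata: Option<String>,
}

impl fmt::Display for ResponseFormat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ResponseFormat::Text(s) | ResponseFormat::Markdown(s) | ResponseFormat::Json(s) => {
                write!(f, "{s}")
            }
        }
    }
}

// Conversions: plain &str becomes Text, and any variant converts back to String.
impl From<&str> for ResponseFormat {
    fn from(s: &str) -> Self {
        ResponseFormat::Text(s.to_string())
    }
}

impl From<ResponseFormat> for String {
    fn from(r: ResponseFormat) -> Self {
        r.to_string()
    }
}
```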
- Intelligent Knowledge Base:
  - Compile-time knowledge via train.json
  - Runtime knowledge expansion
  - Backward compatibility with legacy formats
  - Weighted training examples
  - Optional metadata support
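Weighted examples with optional metadata, plus backward compatibility with entries that carry only a question and an answer, can be modeled as below. The following names are hypothetical for this sketch: TrainingExample, KnowledgeBase, legacy, add_example and best_answer. The real airust structs may have different fields and signatures. Loading train.json is left out so the example needs no JSON crate.

```rust
/// Hypothetical training example; weight and metadata have defaults so that
/// legacy (question, answer)-only data still fits.
#[derive(Debug, Clone, PartialEq)]
pub struct TrainingExample {
    pub input: String,
    pub output: String,
    pub weight: f32,
    pub metadata: Option<String>,
}

impl TrainingExample {
    /// Legacy-style entry: default weight 1.0 and no metadata.
    pub fn legacy(input: &str, output: &str) -> Self {
        TrainingExample { input: input.into(), output: output.into(), weight: 1.0, metadata: None }
    }
}

#[derive(Debug, Default)]
pub struct KnowledgeBase {
    examples: Vec<TrainingExample>,
}

impl KnowledgeBase {
    /// Runtime expansion: add an example after the initial data is loaded.
    pub fn add_example(&mut self, example: TrainingExample) {
        self.examples.push(example);
    }

    /// Among examples whose input matches exactly, prefer the highest weight.
    pub fn best_answer(&self, input: &str) -> Option<&str> {
        self.examples
            .iter()
            .filter(|e| e.input == input)
            .max_by(|a, b| a.weight.total_cmp(&b.weight))
            .map(|e| e.output.as_str())
    }
}
```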
- PDF Processing and Knowledge Extraction:
  - PdfLoader with configurable extraction parameters:
    - Min/max chunk sizes for optimal text segmentation
    - Chunk overlap for context preservation
    - Sentence-aware splitting for natural text boundaries
  - Intelligent PDF text extraction
  - Automatic training example generation from PDF content
  - PDF metadata preservation
  - Command-line tools for batch processing
  - Multi-document knowledge base merging
- Advanced Text Processing:
  - Tokenization with Unicode support
  - Stopword removal
  - Text normalization
  - N-gram generation
  - Advanced string similarity metrics:
    - Levenshtein distance
    - Jaccard similarity
- Unified CLI Tool:
  - Interactive mode
  - Multiple agent type selection
  - Knowledge base management
  - Flexible querying
  - PDF import and conversion
Usage
Integration in other projects
[dependencies]
airust = "0.1.5"
Sample Code (Updated)
use ;
Training Data Format
The knowledge/train.json format has been extended to support both the old and the new format:
Legacy format is still supported for backward compatibility.
CLI Usage
# Simple query
# Interactive mode
# Knowledge base management
PDF Conversion and Import
AIRust includes tools for converting PDF documents into structured knowledge bases:
Using the PDF2KB Tool
# Convert a PDF file to a knowledge base with default settings
# Specify custom output location
# With custom chunk parameters
# Additional options
Using AIRust's PDF Import Feature
# Import PDF directly through AIRust
Merging Multiple Knowledge Bases
After converting multiple PDFs to knowledge bases, merge them into a unified knowledge source:
# Merge all JSON files in the knowledge/ directory
PDF Processing Configuration Options
- --min-chunk <size>: Minimum chunk size in characters (default: 50)
- --max-chunk <size>: Maximum chunk size in characters (default: 1000)
- --overlap <size>: Overlap between chunks in characters (default: 200)
- --weight <value>: Weight for generated training examples (default: 1.0)
- --no-metadata: Disable inclusion of metadata in training examples
- --no-sentence-split: Disable sentence boundary detection for chunking
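The chunking options above can be read as follows:

- windows of at most --max-chunk characters;
- consecutive windows that share --overlap characters;
- an optional cut at the last sentence end in each window;
- fragments shorter than --min-chunk are dropped.

This is a sketch of that reading, not AIRust's implementation. ChunkConfig and chunk_text are hypothetical names, and --weight and --no-metadata are not modeled.

```rust
/// Hypothetical chunking parameters mirroring the CLI options above.
#[derive(Debug, Clone, Copy)]
pub struct ChunkConfig {
    pub min_chunk: usize,     // --min-chunk
    pub max_chunk: usize,     // --max-chunk
    pub overlap: usize,       // --overlap
    pub sentence_split: bool, // false corresponds to --no-sentence-split
}

impl Default for ChunkConfig {
    fn default() -> Self {
        ChunkConfig { min_chunk: 50, max_chunk: 1000, overlap: 200, sentence_split: true }
    }
}

/// Split `text` into overlapping chunks of at most `max_chunk` characters.
/// With sentence splitting on, a chunk is cut after the last '.', '!' or '?'
/// found past `min_chunk` characters. Trimmed chunks shorter than
/// `min_chunk` characters are dropped.
pub fn chunk_text(text: &str, cfg: &ChunkConfig) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let max = cfg.max_chunk.max(1); // guard against a zero-length window
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let mut end = (start + max).min(chars.len());
        if cfg.sentence_split && end < chars.len() {
            // Search backwards for a sentence end within [start + min_chunk, end).
            let lower = (start + cfg.min_chunk).min(end);
            if let Some(pos) = (lower..end).rev().find(|&i| matches!(chars[i], '.' | '!' | '?')) {
                end = pos + 1;
            }
        }
        let chunk: String = chars[start..end].iter().collect();
        let trimmed = chunk.trim();
        if trimmed.chars().count() >= cfg.min_chunk {
            chunks.push(trimmed.to_string());
        }
        if end == chars.len() {
            break;
        }
        // Step back by `overlap` characters, but always move forward.
        start = if end - start > cfg.overlap { end - cfg.overlap } else { end };
    }
    chunks
}
```

Working in chars rather than bytes keeps every cut on a valid UTF-8 boundary. The overlap step is clamped so that an overlap at least as large as the chunk cannot stall the loop.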
Advanced Usage: Context Agent
use ;
PDF Knowledge Extraction Example
use ;
New in Version 0.1.5
Matching Strategies
// Configurable fuzzy matching
let agent = new;
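The arguments of the fuzzy-matching constructor are not shown above. One common way to get "dynamic thresholds" is to scale the allowed edit distance with the query length and then cap it. The FuzzyOptions type and its fields below are hypothetical. The distance itself would come from an edit-distance function such as Levenshtein, computed separately.

```rust
/// Hypothetical fuzzy-matching options (names are illustrative, not airust's).
#[derive(Debug, Clone, Copy)]
pub struct FuzzyOptions {
    /// Hard upper bound on edits, regardless of query length.
    pub max_distance: Option<usize>,
    /// Fraction of the query length (in chars) tolerated as edits.
    pub threshold_factor: f32,
}

impl FuzzyOptions {
    /// Dynamic threshold: longer queries tolerate more typos, up to the cap.
    pub fn threshold(&self, query: &str) -> usize {
        let scaled = (query.chars().count() as f32 * self.threshold_factor).floor() as usize;
        match self.max_distance {
            Some(cap) => scaled.min(cap),
            None => scaled,
        }
    }

    /// Accept a candidate whose edit distance to `query` is within the threshold.
    pub fn accepts(&self, query: &str, distance: usize) -> bool {
        distance <= self.threshold(query)
    }
}
```

With a factor of 0.25, very short queries (under four characters) fall back to exact matching. This avoids matching "cat" to "car".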
Context Formatting
// Multiple context representation strategies
let context_agent = new
.with_context_format;
// Other formats: QAPairs, Sentence, Custom
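The context formats named above (QAPairs, Sentence, Custom) are not shown in detail. The sketch below assumes each one renders the stored (question, answer) turns as text and keeps only the newest few turns. ContextFormat, History, push and render are hypothetical names, and the exact output strings are invented for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical context formats echoing the names listed above.
pub enum ContextFormat {
    QAPairs,
    Sentence,
    /// Caller-supplied formatter for a single (question, answer) turn.
    Custom(fn(&str, &str) -> String),
}

/// Bounded dialogue history holding at most `max_turns` exchanges.
pub struct History {
    turns: VecDeque<(String, String)>,
    max_turns: usize,
}

impl History {
    pub fn new(max_turns: usize) -> Self {
        History { turns: VecDeque::new(), max_turns }
    }

    pub fn push(&mut self, q: &str, a: &str) {
        if self.max_turns == 0 {
            return;
        }
        if self.turns.len() == self.max_turns {
            self.turns.pop_front(); // drop the oldest turn
        }
        self.turns.push_back((q.to_string(), a.to_string()));
    }

    /// Render the stored turns, oldest first, one turn per line group.
    pub fn render(&self, format: &ContextFormat) -> String {
        let parts: Vec<String> = self
            .turns
            .iter()
            .map(|(q, a)| match format {
                ContextFormat::QAPairs => format!("Q: {q}\nA: {a}"),
                ContextFormat::Sentence => format!("Previously asked '{q}', answered '{a}'."),
                ContextFormat::Custom(f) => f(q.as_str(), a.as_str()),
            })
            .collect();
        parts.join("\n")
    }
}
```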
Advanced Text Utilities
// Text processing capabilities
let tokens = tokenize;
let unique_terms = unique_terms;
let ngrams = create_ngrams;
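The arguments of tokenize, unique_terms and create_ngrams were not preserved above. The versions below are hypothetical, dependency-free stand-ins that show what each step typically does, plus a small stopword filter. They are not airust's signatures, and the stopword list is a tiny illustrative sample.

```rust
use std::collections::BTreeSet;

/// Lowercase and split on any non-alphanumeric char (Unicode-aware).
pub fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Distinct terms in sorted order (BTreeSet gives deterministic output).
pub fn unique_terms(text: &str) -> Vec<String> {
    tokenize(text).into_iter().collect::<BTreeSet<_>>().into_iter().collect()
}

/// Contiguous word n-grams joined by single spaces.
pub fn create_ngrams(tokens: &[String], n: usize) -> Vec<String> {
    if n == 0 || tokens.len() < n {
        return Vec::new(); // windows(0) would panic; too few tokens yields nothing
    }
    tokens.windows(n).map(|w| w.join(" ")).collect()
}

/// Remove tokens found in a small illustrative English stopword list.
pub fn remove_stopwords(tokens: Vec<String>) -> Vec<String> {
    const STOPWORDS: [&str; 6] = ["a", "an", "the", "is", "of", "and"];
    tokens.into_iter().filter(|t| !STOPWORDS.contains(&t.as_str())).collect()
}
```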
PDF Processing
// Advanced PDF configuration
let config = PdfLoaderConfig ;
let loader = with_config;
// Convert PDF to knowledge base
let kb = loader.pdf_to_knowledge_base?;
License
MIT
Built with ❤️ in Rust.
Contributions and extensions are welcome!
Migration Guide for airust 0.1.5
This guide helps you migrate from earlier airust 0.1.x releases to 0.1.5.
1. Trait and Type Changes
New Trait Hierarchy
New Response Format
let answer: ResponseFormat = agent.predict;
let answer_string: String = String::from(answer);
Updated TrainingExample Struct
2. Agent Replacements
SimpleAgent and FuzzyAgent → MatchAgent
let mut agent = new_exact;
let mut agent = new_fuzzy;
With options:
let mut agent = new;
ContextAgent is Now Generic
let mut base_agent = new;
base_agent.train;
let mut agent = new;
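ContextAgent is now generic, so it can wrap any base agent. A generic wrapper of that kind can be sketched as follows. The Agent trait, the ask and history methods, and the Echo agent are hypothetical stand-ins, not airust's definitions.

```rust
/// Hypothetical minimal agent trait so the wrapper is self-contained.
pub trait Agent {
    fn predict(&self, input: &str) -> String;
}

/// Generic context wrapper around any inner agent type `A`.
pub struct ContextAgent<A: Agent> {
    inner: A,
    history: Vec<(String, String)>,
    max_history: usize,
}

impl<A: Agent> ContextAgent<A> {
    pub fn new(inner: A, max_history: usize) -> Self {
        ContextAgent { inner, history: Vec::new(), max_history }
    }

    /// Answer via the inner agent, then remember the exchange.
    pub fn ask(&mut self, input: &str) -> String {
        let answer = self.inner.predict(input);
        self.history.push((input.to_string(), answer.clone()));
        if self.history.len() > self.max_history {
            self.history.remove(0); // keep only the newest `max_history` turns
        }
        answer
    }

    pub fn history(&self) -> &[(String, String)] {
        &self.history
    }
}

/// Toy inner agent used for demonstration.
pub struct Echo;

impl Agent for Echo {
    fn predict(&self, input: &str) -> String {
        format!("echo: {input}")
    }
}
```

Because the wrapper is generic over A rather than boxing a trait object, each wrapped agent type gets its own statically dispatched ContextAgent.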
StructuredAgent Removed (use ResponseFormat)
3. Knowledge Base Changes
let kb = from_embedded;
let data = kb.get_examples;
let mut kb = new;
kb.add_example;
4. CLI Tool Migration
5. New PDF Processing Tools
# Convert PDFs to knowledge bases
# Import PDF directly in AIRust
# Merge PDF-derived knowledge bases
6. Recommendations
- Upgrade your dependencies
- Use the new lib.rs re-exports
- Test thoroughly
- Explore the new context formatting options
- Try PDF knowledge extraction for document analysis