§LangExtract
A Rust library for extracting structured and grounded information from text using LLMs.
This library provides a clean, async API for working with various language model providers to extract structured data from unstructured text.
§Features
- Support for multiple LLM providers (Gemini, OpenAI, Ollama)
- Async/await API for concurrent processing
- Schema-driven extraction with validation
- Text chunking and tokenization
- Flexible output formats (JSON, YAML)
- Built-in visualization and progress tracking
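To make the text chunking feature concrete, here is a minimal, standard-library-only sketch of splitting a long document into word-aligned chunks before sending each one to a model. The function `chunk_by_words` and its `max_chars` parameter are hypothetical names chosen for illustration; this is not the crate's `chunking` API, whose actual strategy and signatures may differ.

```rust
/// Hypothetical helper (not part of langextract_rust): splits `text` into
/// chunks of at most `max_chars` characters, breaking only at whitespace.
/// A single word longer than `max_chars` is kept whole in its own chunk.
fn chunk_by_words(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_len = 0; // length of `current` in characters, not bytes

    for word in text.split_whitespace() {
        let word_len = word.chars().count();
        // Close the current chunk if this word (plus a joining space) would overflow it.
        if current_len > 0 && current_len + 1 + word_len > max_chars {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }
        if current_len > 0 {
            current.push(' ');
            current_len += 1;
        }
        current.push_str(word);
        current_len += word_len;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Breaking at whitespace keeps words intact, so an entity such as a name is less likely to be cut in half at a chunk boundary.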
§Quick Start
use langextract_rust::{extract, ExampleData, Extraction, FormatType};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let examples = vec![
        ExampleData {
            text: "John Doe is 30 years old".to_string(),
            extractions: vec![
                Extraction::new("person".to_string(), "John Doe".to_string()),
                Extraction::new("age".to_string(), "30".to_string()),
            ],
        }
    ];

    let result = extract(
        "Alice Smith is 25 years old and works as a doctor",
        Some("Extract person names and ages from the text"),
        &examples,
        Default::default(),
    ).await?;

    println!("{:?}", result);
    Ok(())
}
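"Grounded" extraction means each extracted value can be traced back to a span of the source text. As a rough illustration of that idea, the sketch below locates an extracted string in its source and reports the span as character offsets. The function `find_char_interval` is a hypothetical helper using only the standard library; it is not the crate's `alignment` module, and it handles exact matches only.

```rust
/// Hypothetical helper (not part of langextract_rust): finds the first exact
/// occurrence of `needle` in `source` and returns its half-open interval
/// `(start, end)` measured in characters (Unicode scalar values), not bytes.
fn find_char_interval(source: &str, needle: &str) -> Option<(usize, usize)> {
    if needle.is_empty() {
        return None;
    }
    // `str::find` yields a byte offset; convert it to a character offset so
    // the interval stays correct for non-ASCII text.
    let byte_start = source.find(needle)?;
    let start = source[..byte_start].chars().count();
    let end = start + needle.chars().count();
    Some((start, end))
}
```

For the Quick Start input, `find_char_interval("Alice Smith is 25 years old and works as a doctor", "25")` returns `Some((15, 17))`. An exact-match search like this returns `None` whenever the model paraphrases the source text.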
Re-exports§
pub use data::AlignmentStatus;
pub use data::AnnotatedDocument;
pub use data::CharInterval;
pub use data::Document;
pub use data::ExampleData;
pub use data::Extraction;
pub use data::FormatType;
pub use exceptions::LangExtractError;
pub use exceptions::LangExtractResult;
pub use inference::BaseLanguageModel;
pub use inference::ScoredOutput;
pub use providers::ProviderConfig;
pub use providers::ProviderType;
pub use providers::UniversalProvider;
pub use resolver::ValidationConfig;
pub use resolver::ValidationResult;
pub use resolver::ValidationError;
pub use resolver::ValidationWarning;
pub use resolver::CoercionSummary;
pub use resolver::CoercionDetail;
pub use resolver::CoercionTargetType;
pub use visualization::ExportFormat;
pub use visualization::ExportConfig;
pub use visualization::export_document;
Modules§
- alignment: Text alignment functionality for mapping extractions to source text positions.
- annotation: Text annotation functionality.
- chunking: Text chunking functionality for processing large documents.
- data: Core data types for the annotation pipeline.
- exceptions: Error types and result definitions for LangExtract.
- factory: Factory for creating language model instances.
- inference: Language model inference abstractions and implementations.
- io: I/O utilities for loading text from various sources.
- multipass: Multi-pass extraction system for improved recall and quality.
- progress: Progress tracking functionality.
- prompting: Advanced prompt template system with dynamic variables and provider adaptation.
- providers: Language model provider implementations.
- resolver: Output resolution and parsing functionality.
- schema: Schema definitions and abstractions for structured prompt outputs.
- tokenizer: Text tokenization functionality.
- visualization: Visualization utilities for annotated documents.
Structs§
- ExtractConfig: Configuration for the extract function.