§cvxtract
LLM-powered structured extraction from CVs and resumes.
cvxtract loads a CV/resume in any common format (PDF, DOCX, HTML, plain text), sends the text to an LLM, and deserialises the response directly into typed Rust structs — no regex, no hand-written parsers.
§Quick start
use cvxtract::{Extractor, Model};

#[tokio::main]
async fn main() {
    // Use any provider; here we use a local quantised model (no API key required).
    let mut extractor = Extractor::new(Some(Model::from_local()));
    match extractor.extract_resume("resume.pdf".into()).await {
        Ok(resume) => println!("{:#?}", resume),
        Err(e) => eprintln!("Extraction failed: {e}"),
    }
}
§Providers
| Constructor | Backend | Requires |
|---|---|---|
| Model::from_local() | llama-cpp-2 on-device (Qwen3.5-2B) | nothing (model is downloaded automatically) |
| Model::from_openai() | OpenAI API | OPENAI_API_KEY env var |
| Model::from_openrouter() | OpenRouter | OPENROUTER_API_KEY env var |
| Model::from_ollama() | Local Ollama | Ollama running on localhost:11434 |
| Model::from_openai_compatible() | Any OpenAI-compatible endpoint | explicit key + URL |
| Model::from_copilot() | GitHub Copilot | COPILOT_TOKEN env var |
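One way to decide between these constructors at runtime is to check which credentials are available and fall back to the local model otherwise. The sketch below is standard-library only and does not call cvxtract; `ProviderChoice` and `choose_provider` are hypothetical names, and the priority order is an assumption made for illustration. Each variant's comment names the constructor from the table that it would map to.

```rust
/// Hypothetical label for the provider to use (not part of cvxtract).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProviderChoice {
    OpenAi,     // would map to Model::from_openai()
    OpenRouter, // would map to Model::from_openrouter()
    Copilot,    // would map to Model::from_copilot()
    Local,      // would map to Model::from_local()
}

/// Picks a provider from whichever credential is present and non-empty.
/// `lookup` abstracts the environment so the logic can be tested.
/// The priority order below is an assumption, not documented behaviour.
fn choose_provider(lookup: impl Fn(&str) -> Option<String>) -> ProviderChoice {
    let has = |key: &str| lookup(key).map_or(false, |v| !v.trim().is_empty());
    if has("OPENAI_API_KEY") {
        ProviderChoice::OpenAi
    } else if has("OPENROUTER_API_KEY") {
        ProviderChoice::OpenRouter
    } else if has("COPILOT_TOKEN") {
        ProviderChoice::Copilot
    } else {
        // No credentials found: the local model needs none.
        ProviderChoice::Local
    }
}

fn main() {
    // Read from the real process environment.
    let choice = choose_provider(|key| std::env::var(key).ok());
    println!("selected provider: {:?}", choice);
}
```

Passing the lookup as a closure keeps the decision separate from the process environment, so the same function can be exercised with fixed values in tests.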
§GPU acceleration
Compile with a feature flag to offload the local model to your GPU:
# NVIDIA CUDA
cargo build --release --features cuda
# Apple Silicon (Metal)
cargo build --release --features metal
# AMD / Intel / Vulkan
cargo build --release --features vulkan
§Custom types
Implement serde::Deserialize and schemars::JsonSchema on any struct to
extract arbitrary shapes from a CV:
use cvxtract::{Extractor, Model};
use schemars::JsonSchema;
use serde::Deserialize;

#[derive(Debug, Deserialize, JsonSchema)]
struct ContactInfo {
    name: String,
    email: Option<String>,
    phone: Option<String>,
}

#[tokio::main]
async fn main() {
    let mut extractor = Extractor::new(Some(Model::from_local()));
    let info: ContactInfo = extractor.extract("resume.pdf".into()).await.unwrap();
    println!("{:#?}", info);
}
Structs§
- Award
- Certification
- DateRange: Half-open date range. start and end are both null when dates are unknown.
- Document: Represents extracted document content with metadata.
- DocumentElement: Document elements (structured content).
- DocumentMetadata: Document metadata.
- Education
- Experience
- Extractor: Orchestrates document loading and LLM-powered structured extraction.
- Language
- Model: An LLM provider that can generate text from a prompt.
- PartialDate: Year-always-present, month/day optional.
- Project
- Resume: Top-level resume, covering the vast majority of real-world CVs.
- SkillGroup: Skills can be flat or grouped by category.
- UnstructuredLoader: Loads CV/resume documents in any supported format, automatically detecting the type.
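To make the PartialDate and DateRange descriptions concrete, here is a minimal standard-library sketch of what such shapes could look like. The `Sketch` type names, all field names, and the rule that a missing month or day compares as 1 are assumptions for illustration; the crate's actual definitions are not shown on this page.

```rust
/// Hypothetical mirror of `PartialDate`: the year is always present,
/// month and day are optional. Field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PartialDateSketch {
    year: i32,
    month: Option<u8>,
    day: Option<u8>,
}

impl PartialDateSketch {
    /// Comparison key that treats a missing month or day as 1 (an assumption).
    fn key(&self) -> (i32, u8, u8) {
        (self.year, self.month.unwrap_or(1), self.day.unwrap_or(1))
    }
}

/// Hypothetical mirror of `DateRange`: half-open, so `start` is included
/// and `end` is excluded. Both are `None` when the dates are unknown.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DateRangeSketch {
    start: Option<PartialDateSketch>,
    end: Option<PartialDateSketch>,
}

impl DateRangeSketch {
    /// Returns `None` if either bound is unknown, otherwise whether
    /// `start <= date < end` holds.
    fn contains(&self, date: PartialDateSketch) -> Option<bool> {
        let (start, end) = (self.start?, self.end?);
        Some(start.key() <= date.key() && date.key() < end.key())
    }
}

/// Convenience constructor for a year-and-month date with no day.
fn ym(year: i32, month: u8) -> PartialDateSketch {
    PartialDateSketch { year, month: Some(month), day: None }
}

fn main() {
    let job = DateRangeSketch { start: Some(ym(2020, 3)), end: Some(ym(2022, 1)) };
    println!("mid-range: {:?}", job.contains(ym(2021, 6)));
    println!("at end bound: {:?}", job.contains(ym(2022, 1)));
}
```

Returning `Option<bool>` rather than `bool` keeps "unknown dates" distinct from "outside the range", which matters for CVs that omit dates entirely.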
Enums§
- ExtractionError: Errors that can occur during structured extraction.
- FileType: Supported file types.
- LoaderError: Error types for document loading.
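As a rough illustration of how a loader might detect a document's type and report an unsupported one, the sketch below maps file extensions to the four formats named in the crate description. It uses only the standard library; `FileKind`, `DetectError`, and `detect_kind` are hypothetical stand-ins, and whether cvxtract itself detects by extension, by content, or both is not stated here.

```rust
use std::path::Path;

/// Hypothetical stand-in for `FileType`; the real variants are not listed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileKind {
    Pdf,
    Docx,
    Html,
    Text,
}

/// Hypothetical stand-in for one `LoaderError` case.
#[derive(Debug, PartialEq, Eq)]
enum DetectError {
    /// Carries the lowercased extension (empty if the path has none).
    Unsupported(String),
}

/// Guesses the file kind from the extension, ignoring ASCII case.
fn detect_kind(path: &Path) -> Result<FileKind, DetectError> {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .unwrap_or_default();
    match ext.as_str() {
        "pdf" => Ok(FileKind::Pdf),
        "docx" => Ok(FileKind::Docx),
        "html" | "htm" => Ok(FileKind::Html),
        "txt" => Ok(FileKind::Text),
        other => Err(DetectError::Unsupported(other.to_string())),
    }
}

fn main() {
    for name in ["resume.pdf", "cv.DOCX", "profile.htm", "notes.odt"] {
        match detect_kind(Path::new(name)) {
            Ok(kind) => println!("{name}: {kind:?}"),
            Err(e) => eprintln!("{name}: {e:?}"),
        }
    }
}
```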