Crate cvxtract

§cvxtract

LLM-powered structured extraction from CVs and resumes.

cvxtract loads a CV/resume in any common format (PDF, DOCX, HTML, plain text), sends the text to an LLM, and deserialises the response directly into typed Rust structs — no regex, no hand-written parsers.

§Quick start

use cvxtract::{Extractor, Model};

#[tokio::main]
async fn main() {
    // Use any provider — here we use a local quantised model (no API key required).
    let mut extractor = Extractor::new(Some(Model::from_local()));

    match extractor.extract_resume("resume.pdf".into()).await {
        Ok(resume) => println!("{:#?}", resume),
        Err(e) => eprintln!("Extraction failed: {e}"),
    }
}

§Providers

Constructor                      Backend                             Requires
Model::from_local()              llama-cpp-2 on-device (Qwen3.5-2B)  nothing (model auto-downloaded)
Model::from_openai()             OpenAI API                          OPENAI_API_KEY env var
Model::from_openrouter()         OpenRouter                          OPENROUTER_API_KEY env var
Model::from_ollama()             Local Ollama                        Ollama running on localhost:11434
Model::from_openai_compatible()  Any OpenAI-compatible endpoint      explicit key + URL
Model::from_copilot()            GitHub Copilot                      COPILOT_TOKEN env var
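
The table above can be turned into a simple fallback order. The following is only a sketch under stated assumptions: `pick_constructor` and `pick_constructor_from_env` are hypothetical helpers, not part of cvxtract, and the priority order is an arbitrary choice. It uses only the environment variable names listed in the table and returns the name of the matching constructor as a string rather than calling it, since the constructors' exact signatures are not shown here.

```rust
use std::env;

/// Hypothetical helper (not part of cvxtract): returns the name of the first
/// constructor whose documented requirement is met. `is_set` reports whether
/// an environment variable is available, which keeps the logic testable.
fn pick_constructor(is_set: impl Fn(&str) -> bool) -> &'static str {
    if is_set("OPENAI_API_KEY") {
        "Model::from_openai()"
    } else if is_set("OPENROUTER_API_KEY") {
        "Model::from_openrouter()"
    } else if is_set("COPILOT_TOKEN") {
        "Model::from_copilot()"
    } else {
        // The local backend needs no key; the model is downloaded automatically.
        "Model::from_local()"
    }
}

/// Hypothetical wrapper that consults the real process environment.
/// An empty variable is treated as unset (an assumption of this sketch).
fn pick_constructor_from_env() -> &'static str {
    pick_constructor(|name| env::var(name).map(|v| !v.is_empty()).unwrap_or(false))
}
```

Falling back to the local model last means the sketch still yields a usable choice on a machine with no API keys configured.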

§GPU acceleration

Compile with a feature flag to offload the local model to your GPU:

# NVIDIA CUDA
cargo build --release --features cuda
# Apple Silicon (Metal)
cargo build --release --features metal
# AMD / Intel GPUs (Vulkan)
cargo build --release --features vulkan

§Custom types

Implement serde::Deserialize and schemars::JsonSchema on any struct to extract arbitrary shapes from a CV:

use cvxtract::{Extractor, Model};
use schemars::JsonSchema;
use serde::Deserialize;

#[derive(Debug, Deserialize, JsonSchema)]
struct ContactInfo {
    name: String,
    email: Option<String>,
    phone: Option<String>,
}

#[tokio::main]
async fn main() {
    let mut extractor = Extractor::new(Some(Model::from_local()));
    let info: ContactInfo = extractor.extract("resume.pdf".into()).await.unwrap();
    println!("{:#?}", info);
}

Structs§

Award
Certification
DateRange
Half-open date range. start and end are both null when dates are unknown.
Document
Represents extracted document content with metadata.
DocumentElement
Document elements (structured content).
DocumentMetadata
Document metadata.
Education
Experience
Extractor
Orchestrates document loading and LLM-powered structured extraction.
Language
Model
An LLM provider that can generate text from a prompt.
PartialDate
Year-always-present, month/day optional.
Project
Resume
Top-level resume — covers the vast majority of real-world CVs.
SkillGroup
Skills can be flat or grouped by category.
UnstructuredLoader
Loads CV/resume documents in any supported format, automatically detecting the type.
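
To make the one-line descriptions of PartialDate ("year always present, month/day optional") and DateRange ("half-open") concrete, here is a minimal sketch. The types `PartialDateSketch` and `DateRangeSketch` are hypothetical stand-ins: the crate's real field names, types, and methods may differ. Treating a missing bound as unbounded is likewise an assumption of this sketch, not documented behaviour.

```rust
/// Hypothetical stand-in for `PartialDate`: the year is always present,
/// month and day are optional.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PartialDateSketch {
    year: i32,
    month: Option<u8>,
    day: Option<u8>,
}

impl PartialDateSketch {
    /// Earliest calendar date consistent with the known fields, as (y, m, d).
    fn earliest(&self) -> (i32, u8, u8) {
        (self.year, self.month.unwrap_or(1), self.day.unwrap_or(1))
    }
}

/// Hypothetical stand-in for `DateRange`: half-open, i.e. [start, end).
/// Both bounds are `None` when the dates are unknown.
#[derive(Debug, Clone, Copy)]
struct DateRangeSketch {
    start: Option<PartialDateSketch>,
    end: Option<PartialDateSketch>,
}

impl DateRangeSketch {
    /// True when `date` lies in [start, end): the start is included and the
    /// end is excluded. A missing bound is treated as unbounded (assumption).
    fn contains(&self, date: PartialDateSketch) -> bool {
        let d = date.earliest();
        let after_start = self.start.map_or(true, |s| s.earliest() <= d);
        let before_end = self.end.map_or(true, |e| d < e.earliest());
        after_start && before_end
    }
}
```

Because the range is half-open, a job that ends in 2022 and another that starts in 2022 do not overlap under this sketch: the shared boundary belongs only to the later range.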

Enums§

ExtractionError
Errors that can occur during structured extraction.
FileType
Supported file types.
LoaderError
Error types for document loading.
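
As an illustration of what detecting a file type can look like, the sketch below maps a path's extension to one of the four formats named in the crate description (PDF, DOCX, HTML, plain text). `FileTypeSketch` and `detect_by_extension` are hypothetical names, not the crate's API, and the real loader may well inspect file contents rather than extensions.

```rust
use std::path::Path;

/// Hypothetical stand-in for `FileType`, covering the formats named in the
/// crate description. The real enum's variants may differ.
#[derive(Debug, PartialEq, Eq)]
enum FileTypeSketch {
    Pdf,
    Docx,
    Html,
    PlainText,
}

/// Guesses the file type from the extension, case-insensitively.
/// Returns `None` when the extension is missing or not recognised.
fn detect_by_extension(path: &Path) -> Option<FileTypeSketch> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pdf" => Some(FileTypeSketch::Pdf),
        "docx" => Some(FileTypeSketch::Docx),
        "html" | "htm" => Some(FileTypeSketch::Html),
        "txt" => Some(FileTypeSketch::PlainText),
        _ => None,
    }
}
```

Returning `Option` rather than defaulting to plain text leaves the caller to decide how to report an unsupported file, for example by mapping `None` to a loading error.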