cvxtract

Crates.io Docs.rs License

LLM-powered structured extraction from CVs and resumes.

cvxtract loads a CV/resume in any common format (PDF, DOCX, HTML, plain text), sends the text to an LLM of your choice, and deserialises the response directly into typed Rust structs — no regex, no hand-written parsers.

Quick start

use cvxtract::{Extractor, Model};

#[tokio::main]
async fn main() {
    // No API key needed — model is downloaded automatically on first run.
    let mut extractor = Extractor::new(Some(Model::from_local()));

    match extractor.extract_resume("resume.pdf".into()).await {
        Ok(resume) => {
            println!("Name:  {}", resume.name);
            println!("Email: {}", resume.email.as_deref().unwrap_or("-"));
            println!("Jobs:  {}", resume.experience.len());
        }
        Err(e) => eprintln!("Extraction failed: {e}"),
    }
}

Installation

[dependencies]
cvxtract = "0.1"
tokio = { version = "1", features = ["full"] }

Providers

Constructor                       Backend                              Auth
Model::from_local()               llama-cpp-2 on-device (Qwen3.5-2B)   none — model auto-downloaded
Model::from_openai()              OpenAI API                           OPENAI_API_KEY
Model::from_openrouter()          OpenRouter                           OPENROUTER_API_KEY
Model::from_ollama()              Local Ollama                         Ollama running on localhost:11434
Model::from_openai_compatible()   Any OpenAI-compatible endpoint       explicit key + URL
Model::from_copilot()             GitHub Copilot                       COPILOT_TOKEN

// OpenAI
let model = Model::from_openai("gpt-4o-mini");

// Ollama (local)
let model = Model::from_ollama("llama3.2");

// Any OpenAI-compatible endpoint
let model = Model::from_openai_compatible(
    "https://api.my-provider.com/v1",
    std::env::var("MY_API_KEY").unwrap(),
    "my-model-name",
);

Supported input formats

Format        Extension
PDF           .pdf
Word          .docx
HTML          .html, .htm
Plain text    .txt
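The input format is chosen from the file extension. As a standalone illustration of that dispatch (not the crate's internal code — the enum and function here are hypothetical), it might look like:

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum InputFormat {
    Pdf,
    Docx,
    Html,
    Text,
}

/// Map a file path to an input format by extension, case-insensitively.
/// Returns None for unsupported extensions.
fn detect_format(path: &str) -> Option<InputFormat> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pdf" => Some(InputFormat::Pdf),
        "docx" => Some(InputFormat::Docx),
        "html" | "htm" => Some(InputFormat::Html),
        "txt" => Some(InputFormat::Text),
        _ => None,
    }
}

fn main() {
    println!("{:?}", detect_format("resume.PDF")); // Some(Pdf)
    println!("{:?}", detect_format("resume.md"));  // None
}
```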

Built-in Resume type

extract_resume() populates a comprehensive Resume struct:

pub struct Resume {
    pub name:           String,
    pub email:          Option<String>,
    pub phone:          Option<String>,
    pub location:       Option<String>,
    pub linkedin:       Option<String>,
    pub github:         Option<String>,
    pub website:        Option<String>,
    pub summary:        Option<String>,
    pub experience:     Vec<Experience>,   // company, role, dates, highlights
    pub education:      Vec<Education>,    // institution, degree, field, dates
    pub skills:         Vec<SkillGroup>,   // grouped or flat skill lists
    pub projects:       Vec<Project>,      // name, tech stack, URL
    pub certifications: Vec<Certification>,
    pub languages:      Vec<Language>,
    pub awards:         Vec<Award>,
}
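Since skills arrive grouped, a common follow-up is flattening them into one list. The sketch below defines its own minimal SkillGroup so it runs standalone — its field names are an assumption and may differ from the crate's real type:

```rust
use std::collections::HashSet;

/// Hypothetical mirror of cvxtract's SkillGroup, for illustration only.
#[derive(Debug)]
struct SkillGroup {
    category: Option<String>,
    skills: Vec<String>,
}

/// Flatten grouped skills into a single deduplicated list,
/// preserving first-seen order (case-insensitive dedup).
fn flatten_skills(groups: &[SkillGroup]) -> Vec<String> {
    let mut seen = HashSet::new();
    groups
        .iter()
        .flat_map(|g| g.skills.iter())
        .filter(|s| seen.insert(s.to_ascii_lowercase()))
        .cloned()
        .collect()
}

fn main() {
    let groups = vec![
        SkillGroup { category: Some("Languages".into()), skills: vec!["Rust".into(), "Python".into()] },
        SkillGroup { category: None, skills: vec!["rust".into(), "SQL".into()] },
    ];
    println!("{:?}", flatten_skills(&groups)); // ["Rust", "Python", "SQL"]
}
```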

Custom types

Extract any shape by deriving serde::Deserialize and schemars::JsonSchema:

use cvxtract::{Extractor, Model};
use schemars::JsonSchema;
use serde::Deserialize;

#[derive(Debug, Deserialize, JsonSchema)]
struct ContactInfo {
    name:  String,
    email: Option<String>,
    phone: Option<String>,
}

#[tokio::main]
async fn main() {
    let mut extractor = Extractor::new(Some(Model::from_openai("gpt-4o-mini")));
    let info: ContactInfo = extractor
        .extract::<ContactInfo>("resume.pdf".into())
        .await
        .unwrap();
    println!("{info:#?}");
}

GPU acceleration (local model)

When using Model::from_local(), compile with a feature flag to offload layers to your GPU. llama.cpp auto-fits what it can into VRAM and falls back to CPU for the remainder — this is safe even on GPUs with limited memory.

# NVIDIA CUDA
cargo build --release --features cuda

# Apple Silicon (Metal)
cargo build --release --features metal

# AMD / Intel (Vulkan)
cargo build --release --features vulkan

# Cargo.toml
[dependencies]
cvxtract = { version = "0.1", features = ["cuda"] }

Error handling

All async methods return Result<T, ExtractionError>:

use cvxtract::ExtractionError;

match extractor.extract_resume(path).await {
    Ok(resume) => { /* use resume */ }
    Err(ExtractionError::LoadError(e))  => eprintln!("Could not load file: {e}"),
    Err(ExtractionError::ModelError(m)) => eprintln!("LLM error: {m}"),
    Err(ExtractionError::ParseError(e)) => eprintln!("JSON parse error: {e}"),
}

Raw document loading

Use UnstructuredLoader to extract plain text from a file without any LLM call:

use cvxtract::UnstructuredLoader;

let loader = UnstructuredLoader::new();
let doc = loader.load("resume.pdf")?;
println!("{} characters extracted", doc.content.len());

Contributing

  1. Write tests for your changes
  2. Submit a pull request

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.