anytomd

A pure-Rust command-line tool and library that converts common document formats into Markdown, designed for LLM consumption.

Why?

MarkItDown is a great Python library for converting documents to Markdown. However, integrating Python into a Rust application means bundling a Python runtime (~50 MB), dealing with cross-platform compatibility issues, and fighting dependency hell.

anytomd solves this with a single `cargo add anytomd`: zero external runtime, no C bindings, no subprocess calls. Just pure Rust.

Supported Formats

Format	Extensions	Notes
DOCX	`.docx`	Headings, tables, lists, bold/italic, hyperlinks, images
PPTX	`.pptx`	Slides, tables, speaker notes, images
XLSX	`.xlsx`	Multi-sheet, date/time handling, images
XLS	`.xls`	Legacy Excel (via calamine)
HTML	`.html`, `.htm`	Full DOM: headings, tables, lists, links, blockquotes, code blocks
CSV	`.csv`	Converted to Markdown tables
Jupyter Notebook	`.ipynb`	Markdown cells preserved, code cells in fenced blocks with language detection
JSON	`.json`	Pretty-printed in fenced code blocks
XML	`.xml`	Pretty-printed in fenced code blocks
Images	`.png`, `.jpg`, `.gif`, `.webp`, `.bmp`, `.tiff`, `.svg`, `.heic`, `.avif`	Optional LLM-based alt text via `ImageDescriber`
Code	`.py`, `.rs`, `.js`, `.ts`, `.c`, `.cpp`, `.go`, `.java`, `.rb`, `.swift`, `.sh`, ...	Fenced code blocks with language identifier
Plain Text	`.txt`, `.md`, `.rst`, `.log`, `.toml`, `.yaml`, `.ini`, etc.	Passthrough with encoding detection (UTF-8, UTF-16, Windows-1252)

Note on PDF: PDF conversion is intentionally out of scope. Gemini, ChatGPT, and Claude already provide native PDF support (with plan/model-specific limits), so anytomd focuses on formats that still benefit from dedicated Markdown conversion.

Format is auto-detected from magic bytes and file extension. ZIP-based formats (DOCX/PPTX/XLSX) are distinguished by inspecting internal archive structure.
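
The detection order described above can be sketched roughly as follows. This is an illustration only, not anytomd's actual implementation: the `Format` enum, `detect_format`, and `classify_zip` are hypothetical names, and the ZIP entry names are passed in as a slice so that the example needs no archive-reading crate. The marker paths are the conventional locations of the main part in each Office Open XML package.

```rust
/// Hypothetical format tag; anytomd's real types may look different.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Format {
    Docx,
    Pptx,
    Xlsx,
    Html,
    Csv,
    Unknown,
}

/// ZIP local-file-header signature ("PK\x03\x04").
const ZIP_MAGIC: &[u8] = b"PK\x03\x04";

/// Classify a ZIP container by the conventional path of its main part.
pub fn classify_zip(entry_names: &[&str]) -> Format {
    if entry_names.contains(&"word/document.xml") {
        Format::Docx
    } else if entry_names.contains(&"ppt/presentation.xml") {
        Format::Pptx
    } else if entry_names.contains(&"xl/workbook.xml") {
        Format::Xlsx
    } else {
        Format::Unknown
    }
}

/// Check magic bytes first, then fall back to the file extension.
/// `zip_entries` is only consulted when the ZIP signature matches.
pub fn detect_format(bytes: &[u8], extension: &str, zip_entries: &[&str]) -> Format {
    if bytes.starts_with(ZIP_MAGIC) {
        return classify_zip(zip_entries);
    }
    match extension.to_ascii_lowercase().as_str() {
        "html" | "htm" => Format::Html,
        "csv" => Format::Csv,
        _ => Format::Unknown,
    }
}
```

Looking inside the archive is what makes the extension unnecessary for ZIP-based inputs: a `.docx` file renamed to `.bin` still carries `word/document.xml`.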

Installation

cargo add anytomd

Feature Flags

Feature	Dependencies	Description
(default)	`async-gemini`	Async API + `AsyncGeminiDescriber` — all async features enabled out of the box
`async`	`futures-util`	Async API (`convert_file_async`, `convert_bytes_async`, `AsyncImageDescriber` trait)
`async-gemini`	`async` + `reqwest`	`AsyncGeminiDescriber` for concurrent image descriptions via Gemini

Async features are included by default. To opt out:

[dependencies]
anytomd = { version = "0.11", default-features = false }

CLI

Install

cargo install anytomd

Usage

# Convert a single file
anytomd document.docx > output.md

# Convert multiple files (separated by <!-- source: path --> comments)
anytomd report.docx data.csv slides.pptx > combined.md

# Write output to a file
anytomd document.docx -o output.md

# Read from stdin (--format is required)
cat data.csv | anytomd --format csv

# Override format detection
anytomd --format html page.dat

# Strict mode: treat recoverable errors as hard errors
anytomd --strict document.docx

# Plain text output (Markdown formatting stripped)
anytomd --plain-text document.docx

# Plain text from stdin
echo "Name,Age" | anytomd --format csv --plain-text

# Auto image descriptions (just set GEMINI_API_KEY)
export GEMINI_API_KEY=your-key
anytomd presentation.pptx
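
When several inputs are converted in one run, the combined output can be split back into per-file sections by scanning for the source comments. A minimal sketch, assuming each marker sits on its own line in the form `<!-- source: path -->` (the exact whitespace the CLI emits is an assumption here, and `split_by_source` is a hypothetical helper, not part of anytomd):

```rust
/// Split combined output into (source path, section text) pairs.
/// Lines that appear before the first marker are ignored.
pub fn split_by_source(combined: &str) -> Vec<(String, String)> {
    let mut sections: Vec<(String, String)> = Vec::new();
    for line in combined.lines() {
        // A marker line looks like `<!-- source: some/path -->`.
        let marker = line
            .trim()
            .strip_prefix("<!-- source:")
            .and_then(|rest| rest.strip_suffix("-->"));
        match marker {
            // A marker starts a new, initially empty section.
            Some(path) => sections.push((path.trim().to_string(), String::new())),
            // Any other line belongs to the most recent section, if one exists.
            None => {
                if let Some((_, body)) = sections.last_mut() {
                    body.push_str(line);
                    body.push('\n');
                }
            }
        }
    }
    sections
}
```

A converted document that itself contains a line in this marker form would be split incorrectly, so this approach suits inputs you control.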

Exit Codes

Code	Meaning
0	Success
1	Conversion failure
2	Invalid arguments

Quick Start (Library)

use anytomd::{convert_file, convert_bytes, ConversionOptions};

// Convert a file (format auto-detected from extension and magic bytes)
let options = ConversionOptions::default();
let result = convert_file("document.docx", &options).unwrap();
println!("{}", result.markdown);

// Convert raw bytes with an explicit format
let csv_data = b"Name,Age\nAlice,30\nBob,25";
let result = convert_bytes(csv_data, "csv", &options).unwrap();
println!("{}", result.markdown);
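
To give a feel for what a CSV-to-table conversion involves, here is a deliberately naive sketch. It is not anytomd's implementation: `csv_to_markdown` is a hypothetical function, it splits on bare commas without handling quoted fields, and the exact table layout anytomd emits may differ.

```rust
/// Render simple CSV text as a Markdown pipe table.
/// The first non-empty row is treated as the header.
/// Quoted fields are NOT handled; cells beyond the header width are dropped.
pub fn csv_to_markdown(csv: &str) -> String {
    let rows: Vec<Vec<&str>> = csv
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.split(',').map(str::trim).collect())
        .collect();

    // The header row determines the number of columns.
    let width = match rows.first() {
        Some(header) => header.len(),
        None => return String::new(),
    };

    let mut out = String::new();
    for (i, row) in rows.iter().enumerate() {
        out.push('|');
        for col in 0..width {
            // Pad short rows with empty cells; escape pipes inside cells.
            let cell = row.get(col).copied().unwrap_or("").replace('|', "\\|");
            out.push(' ');
            out.push_str(&cell);
            out.push_str(" |");
        }
        out.push('\n');
        // The separator row follows the header.
        if i == 0 {
            out.push('|');
            out.push_str(&"---|".repeat(width));
            out.push('\n');
        }
    }
    out
}
```

For the `Name,Age` input used above, this sketch produces a header row, a `|---|---|` separator, and one row each for Alice and Bob.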

Plain Text Output

Every conversion produces both Markdown and plain text output. The plain text is extracted directly from the source document, with no post-processing or Markdown stripping, so source characters like `**kwargs` or `# comment` are preserved exactly.

use anytomd::{convert_file, ConversionOptions};

let result = convert_file("document.docx", &ConversionOptions::default()).unwrap();

// Markdown output
println!("{}", result.markdown);

// Plain text output (no headings, bold, tables, code fences, etc.)
println!("{}", result.plain_text);
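
Direct extraction matters because stripping Markdown after the fact cannot tell formatting apart from content. The sketch below uses a deliberately naive, hypothetical `strip_markdown` function (not part of anytomd) to show how post-processing would damage such text.

```rust
/// Naive Markdown stripper: removes leading heading hashes and all `**` markers.
/// It cannot distinguish Markdown syntax from literal source characters.
pub fn strip_markdown(markdown: &str) -> String {
    markdown
        .lines()
        .map(|line| {
            // Drop leading '#' characters and the space that follows them.
            let unhashed = line.trim_start_matches('#');
            let text = if unhashed.len() < line.len() {
                unhashed.trim_start()
            } else {
                line
            };
            // Drop every bold marker, wherever it appears.
            text.replace("**", "")
        })
        .collect::<Vec<String>>()
        .join("\n")
}
```

With this stripper, `def f(**kwargs):` loses its asterisks and `# comment` loses its hash, which is the failure mode that extracting plain text from the source avoids.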

Extracting Embedded Images

use anytomd::{convert_file, ConversionOptions};

let options = ConversionOptions {
    extract_images: true,
    ..Default::default()
};
let result = convert_file("presentation.pptx", &options).unwrap();

for (filename, bytes) in &result.images {
    std::fs::write(filename, bytes).unwrap();
}

LLM-Based Image Descriptions

anytomd can generate alt text for images using any LLM backend via the `ImageDescriber` trait. A built-in Google Gemini implementation is included.

use std::sync::Arc;
use anytomd::{convert_file, ConversionOptions, ImageDescriber, ConvertError};
use anytomd::gemini::GeminiDescriber;

// Option 1: Use the built-in Gemini describer
let describer = GeminiDescriber::from_env()  // reads GEMINI_API_KEY
    .unwrap()
    .with_model("gemini-3-flash-preview".to_string());

let options = ConversionOptions {
    image_describer: Some(Arc::new(describer)),
    ..Default::default()
};
let result = convert_file("document.docx", &options).unwrap();
// Images now have LLM-generated alt text: ![A chart showing quarterly revenue](chart.png)

// Option 2: Implement your own describer for any backend
struct MyDescriber;

impl ImageDescriber for MyDescriber {
    fn describe(
        &self,
        image_bytes: &[u8],
        mime_type: &str,
        prompt: &str,
    ) -> Result<String, ConvertError> {
        // Call your preferred LLM API here
        Ok("description of the image".to_string())
    }
}

Async Image Descriptions

For documents with many images, the async API resolves all image descriptions concurrently. It has been included by default since v0.11.0.

use std::sync::Arc;
use anytomd::{convert_file_async, AsyncConversionOptions, AsyncImageDescriber, ConvertError};
use anytomd::gemini::AsyncGeminiDescriber;

#[tokio::main]
async fn main() {
    let describer = AsyncGeminiDescriber::from_env().unwrap();

    let options = AsyncConversionOptions {
        async_image_describer: Some(Arc::new(describer)),
        ..Default::default()
    };

    let result = convert_file_async("presentation.pptx", &options).await.unwrap();
    println!("{}", result.markdown);
    // All images described concurrently — significant speedup for multi-image documents
}

The library has no tokio dependency — the caller provides the async runtime. Any runtime (tokio, async-std, etc.) works.

API

`convert_file`

/// Convert a file at the given path to Markdown.
/// Format is auto-detected from magic bytes and file extension.
pub fn convert_file(
    path: impl AsRef<Path>,
    options: &ConversionOptions,
) -> Result<ConversionResult, ConvertError>

`convert_bytes`

/// Convert raw bytes to Markdown with an explicit format extension.
pub fn convert_bytes(
    data: &[u8],
    extension: &str,
    options: &ConversionOptions,
) -> Result<ConversionResult, ConvertError>

`convert_file_async`

Included by default (requires the `async` feature if default features are disabled).

/// Convert a file at the given path to Markdown with async image description.
/// If an async_image_describer is set, all image descriptions are resolved concurrently.
pub async fn convert_file_async(
    path: impl AsRef<Path>,
    options: &AsyncConversionOptions,
) -> Result<ConversionResult, ConvertError>

`convert_bytes_async`

Included by default (requires the `async` feature if default features are disabled).

/// Convert raw bytes to Markdown with async image description.
pub async fn convert_bytes_async(
    data: &[u8],
    extension: &str,
    options: &AsyncConversionOptions,
) -> Result<ConversionResult, ConvertError>

`ConversionOptions`

Field	Type	Default	Description
`extract_images`	`bool`	`false`	Extract embedded images into `result.images`
`max_total_image_bytes`	`usize`	50 MB	Hard cap for total extracted image bytes
`max_input_bytes`	`usize`	100 MB	Maximum input file size
`max_uncompressed_zip_bytes`	`usize`	500 MB	ZIP bomb guard
`strict`	`bool`	`false`	Error on recoverable failures instead of warnings
`image_describer`	`Option<Arc<dyn ImageDescriber>>`	`None`	LLM backend for image alt text generation

`ConversionResult`

pub struct ConversionResult {
    pub markdown: String,                  // The converted Markdown
    pub plain_text: String,                // Plain text (extracted directly, no markdown syntax)
    pub title: Option<String>,             // Document title, if detected
    pub images: Vec<(String, Vec<u8>)>,    // Extracted images (filename, bytes)
    pub warnings: Vec<ConversionWarning>,  // Recoverable issues encountered
}

Error Handling

Conversion is best-effort by default. If a single element fails to parse (e.g., a corrupted table), it is skipped and a warning is added to `result.warnings`. The rest of the document is still converted.

Set `strict: true` in `ConversionOptions` to turn recoverable failures into errors instead.

Warning codes: `SkippedElement`, `UnsupportedFeature`, `ResourceLimitReached`, `MalformedSegment`.
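
The two modes can be pictured with a small self-contained model. Nothing below is anytomd's real code: `Warning`, `convert_elements`, and the closure-based element parser are hypothetical stand-ins that only mirror the documented behavior (skip and warn by default, fail fast when strict).

```rust
/// Hypothetical warning record for a skipped element.
#[derive(Debug, PartialEq)]
pub struct Warning {
    pub index: usize,
    pub message: String,
}

/// Convert each element with `parse`. On failure, either abort (strict)
/// or skip the element and record a warning (best-effort).
pub fn convert_elements<F>(
    elements: &[&str],
    strict: bool,
    parse: F,
) -> Result<(String, Vec<Warning>), String>
where
    F: Fn(&str) -> Result<String, String>,
{
    let mut output = String::new();
    let mut warnings = Vec::new();
    for (index, &element) in elements.iter().enumerate() {
        match parse(element) {
            Ok(markdown) => {
                output.push_str(&markdown);
                output.push('\n');
            }
            // Strict mode: the first recoverable failure becomes a hard error.
            Err(message) if strict => return Err(message),
            // Best-effort mode: skip the element and keep going.
            Err(message) => warnings.push(Warning { index, message }),
        }
    }
    Ok((output, warnings))
}
```

In best-effort mode the caller gets the partial output plus a list of what was skipped. In strict mode the same input yields an error and no output.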

Development

Build and Test

cargo build && cargo test && cargo clippy -- -D warnings

Docker

A Docker environment is available for reproducible Linux builds:

docker compose run --rm verify    # Full loop: fmt + clippy + test + release build
docker compose run --rm test      # Run all tests
docker compose run --rm lint      # clippy + fmt check
docker compose run --rm shell     # Interactive bash

License

Apache-2.0

anytomd 0.11.0