Crate mdkit

§mdkit — get markdown out of any document.

See the README for the full design rationale. In short, mdkit dispatches by file extension to the best backend for each format: Pandoc for DOCX/PPTX/EPUB/RTF/ODT/LaTeX, Pdfium for PDF, OS-native APIs for OCR, and calamine for spreadsheets.
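The dispatch idea can be sketched with the standard library alone. This is a minimal illustration, not mdkit's implementation: the `Backend` enum and `backend_for` function are hypothetical names, the extensions chosen for LaTeX (`tex`) and spreadsheets (`xlsx`, `xls`, `ods`) are assumptions, and case-insensitive matching is assumed as well. OCR is left out because the image formats it covers are not listed here.

```rust
use std::path::Path;

/// Hypothetical backend labels for illustration; not mdkit's real types.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Backend {
    Pandoc,
    Pdfium,
    Calamine,
}

/// Pick a backend from the file extension (assumed case-insensitive).
/// Returns `None` when the extension is missing or unrecognized.
fn backend_for(path: &Path) -> Option<Backend> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        // The extension lists below are assumptions based on the format names.
        "docx" | "pptx" | "epub" | "rtf" | "odt" | "tex" => Some(Backend::Pandoc),
        "pdf" => Some(Backend::Pdfium),
        "xlsx" | "xls" | "ods" => Some(Backend::Calamine),
        _ => None,
    }
}

fn main() {
    for name in ["report.pdf", "slides.PPTX", "budget.xlsx", "notes.xyz"] {
        println!("{} -> {:?}", name, backend_for(Path::new(name)));
    }
}
```

Returning `Option` keeps the "no backend for this file" case explicit, so a caller can turn it into whatever error it prefers.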

§Quick start

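use mdkit::Engine;
use std::path::Path;

fn main() -> mdkit::Result<()> {
    // Build an engine with the default backends for the enabled feature flags.
    let engine = Engine::with_defaults();
    // The ".pdf" extension selects the backend that handles the file.
    let doc = engine.extract(Path::new("report.pdf"))?;
    println!("{}", doc.markdown);
    Ok(())
}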

§Custom extractor

Implement Extractor for your own format and register it on an Engine:

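use mdkit::{Document, Engine, Extractor, Result};
use std::path::Path;

struct MyParser;

impl Extractor for MyParser {
    // File extensions (without the leading dot) that this extractor handles.
    fn extensions(&self) -> &[&'static str] { &["custom"] }

    // Treat the file contents as markdown and wrap them in a Document.
    fn extract(&self, path: &Path) -> Result<Document> {
        Ok(Document::new(std::fs::read_to_string(path)?))
    }
}

fn main() {
    // Start from an empty engine and register the custom extractor.
    let mut engine = Engine::new();
    engine.register(Box::new(MyParser));
}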

Structs§

Document
The result of extracting one document. Markdown is always present; title and metadata are best-effort and may be empty depending on the backend.
Engine
Dispatches extract calls to the registered Extractor for the file’s extension. Construct with Engine::new for an empty engine, or with Engine::with_defaults to register the default extractors for the enabled feature flags.
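To make the register-then-dispatch relationship concrete, here is a self-contained sketch built on local stand-in types. None of these definitions are mdkit's: the `Document`, `Extractor`, and `Engine` below only mirror the names used on this page, errors are plain `String`s, and the rule that a later registration replaces an earlier one for the same extension is an assumption.

```rust
use std::collections::HashMap;
use std::path::Path;

// Local stand-ins for illustration only; these are NOT mdkit's definitions.
struct Document {
    markdown: String,
}

trait Extractor {
    fn extensions(&self) -> &[&'static str];
    fn extract(&self, path: &Path) -> Result<Document, String>;
}

/// Hypothetical registry: maps each extension to an index into `extractors`.
#[derive(Default)]
struct Engine {
    extractors: Vec<Box<dyn Extractor>>,
    by_ext: HashMap<&'static str, usize>,
}

impl Engine {
    fn register(&mut self, extractor: Box<dyn Extractor>) {
        let idx = self.extractors.len();
        for &ext in extractor.extensions() {
            // Assumption: a later registration for the same extension wins.
            self.by_ext.insert(ext, idx);
        }
        self.extractors.push(extractor);
    }

    fn extract(&self, path: &Path) -> Result<Document, String> {
        let ext = path
            .extension()
            .and_then(|e| e.to_str())
            .ok_or_else(|| format!("no extension: {}", path.display()))?;
        let idx = self
            .by_ext
            .get(ext)
            .ok_or_else(|| format!("unsupported extension: {}", ext))?;
        self.extractors[*idx].extract(path)
    }
}

/// Stub backend that never touches the file system.
struct StubExtractor;

impl Extractor for StubExtractor {
    fn extensions(&self) -> &[&'static str] { &["txt"] }
    fn extract(&self, path: &Path) -> Result<Document, String> {
        Ok(Document { markdown: format!("stub for {}", path.display()) })
    }
}

fn main() {
    let mut engine = Engine::default();
    engine.register(Box::new(StubExtractor));
    match engine.extract(Path::new("notes.txt")) {
        Ok(doc) => println!("{}", doc.markdown),
        Err(e) => eprintln!("{}", e),
    }
}
```

Storing extractors as boxed trait objects lets one engine hold backends of different concrete types, which matches the `Box::new(MyParser)` registration shown in the custom extractor example.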

Enums§

Error
Errors that can arise during extraction.

Traits§

Extractor
A backend that knows how to convert one or more file formats to markdown. Implementors register themselves with an Engine.

Type Aliases§

Result
Result alias used across the crate.