Crate unhwp


§unhwp

A high-performance Rust library for converting HWP/HWPX documents (the Korean word processor formats) into structured Markdown, along with their assets.

§Supported Formats

  • HWP 5.0+: Binary format using OLE containers (most common)
  • HWPX: XML-based format using ZIP containers (modern standard)
  • HWP 3.x: Legacy binary format (requires the hwp3 feature)
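
The container types in the list above can usually be told apart from the first few bytes of a file. The following is a minimal, standard-library-only sketch of that idea and not the crate's own detection code: the `Container` enum and `sniff_container` function are hypothetical names introduced here for illustration, and the HWP 3.x signature prefix is an assumption based on the commonly documented file header.

```rust
/// Container families from the format list (hypothetical type, not part of unhwp).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Container {
    /// OLE compound file, the container used by HWP 5.0+.
    Ole,
    /// ZIP archive, the container used by HWPX.
    Zip,
    /// Legacy HWP 3.x plain binary file.
    Hwp3,
    /// None of the known signatures matched.
    Unknown,
}

/// Magic number at the start of every OLE compound file.
const OLE_MAGIC: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];
/// Local file header signature at the start of a ZIP archive.
const ZIP_MAGIC: [u8; 4] = *b"PK\x03\x04";
/// Assumed text prefix of the HWP 3.x file signature.
const HWP3_PREFIX: &[u8] = b"HWP Document File";

/// Guesses the container family from the leading bytes of a file.
fn sniff_container(bytes: &[u8]) -> Container {
    if bytes.starts_with(&OLE_MAGIC) {
        Container::Ole
    } else if bytes.starts_with(&ZIP_MAGIC) {
        Container::Zip
    } else if bytes.starts_with(HWP3_PREFIX) {
        Container::Hwp3
    } else {
        Container::Unknown
    }
}

fn main() {
    // A fake buffer that only carries the ZIP signature.
    let sample = b"PK\x03\x04rest-of-archive";
    println!("{:?}", sniff_container(sample));
}
```

A matching signature identifies only the container: an OLE or ZIP file is not necessarily an HWP or HWPX document, so a real detector would also need to inspect the container's contents. For actual use, the crate's re-exported `detect_format_from_bytes` and `detect_format_from_path` are the intended entry points.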

§Quick Start

use unhwp::{parse_file, RenderOptions};

fn main() -> unhwp::Result<()> {
    // Parse a document
    let document = parse_file("document.hwp")?;

    // Render to Markdown
    let options = RenderOptions::default();
    let markdown = unhwp::render::render_markdown(&document, &options)?;

    println!("{}", markdown);
    Ok(())
}

§Features

  • hwp5 (default): HWP 5.0 binary format support
  • hwpx (default): HWPX XML format support
  • hwp3: Legacy HWP 3.x format support
  • async: Async I/O support with Tokio

Re-exports§

pub use cleanup::cleanup;
pub use cleanup::CleanupOptions;
pub use detect::detect_format;
pub use detect::detect_format_from_bytes;
pub use detect::detect_format_from_path;
pub use detect::FormatType;
pub use error::Error;
pub use error::Result;
pub use model::Document;
pub use parse_options::ErrorMode;
pub use parse_options::ExtractMode;
pub use parse_options::ParseOptions;
pub use render::RenderOptions;
pub use render::TableFallback;

Modules§

cleanup
Cleanup Pipeline
detect
Format detection for HWP/HWPX documents.
equation
Equation script to LaTeX conversion for HWP documents.
error
Error types for the unhwp library.
hwp5
HWP 5.0 binary format parser.
hwpx
HWPX (OWPML) XML format parser.
model
Document model (Intermediate Representation).
parse_options
Parsing options for document extraction.
render
Markdown rendering for documents.

Structs§

ParsedDocument
A parsed document ready for rendering.
Unhwp
Builder for parsing and rendering documents.

Functions§

extract_text
Extracts plain text from a document file.
parse_bytes
Parses a document from bytes.
parse_file
Parses a document from a file path.
parse_reader
Parses a document from a reader.
to_markdown
Converts a document to Markdown with default options.
to_markdown_with_options
Converts a document to Markdown with custom options.