hwpers
A Rust library for parsing Korean Hangul Word Processor (HWP) files with full layout rendering support.
Features
- Complete HWP 5.0 Format Support: Parse all document components including text, formatting, tables, and embedded objects
- Visual Layout Rendering: Reconstruct documents with pixel-perfect accuracy when layout data is available
- Font and Style Preservation: Extract and apply original fonts, sizes, colors, and text formatting
- Advanced Layout Engine: Support for multi-column layouts, line-by-line positioning, and character-level formatting
- SVG Export: Render documents to scalable vector graphics
- Zero-copy Parsing: Efficient parsing with minimal memory allocation
- Safe Rust: Memory-safe implementation with comprehensive error handling
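The zero-copy idea can be illustrated without the crate itself. The sketch below walks HWP 5.0 style data records and returns each payload as a slice borrowed from the input buffer, so no payload bytes are copied. The header layout (10-bit tag, 10-bit level, 12-bit size, with a size of 0xFFF meaning a 32-bit size follows) is taken from the published HWP 5.0 specification. `Record` and `RecordIter` are hypothetical names for illustration, not hwpers types.

```rust
/// One data record whose payload borrows from the input buffer.
/// `Record` and `RecordIter` are illustrative names, not hwpers types.
#[derive(Debug, PartialEq)]
pub struct Record<'a> {
    pub tag_id: u16,
    pub level: u16,
    pub data: &'a [u8],
}

/// Iterator over the records of an already decompressed record stream.
pub struct RecordIter<'a> {
    buf: &'a [u8],
}

impl<'a> RecordIter<'a> {
    pub fn new(buf: &'a [u8]) -> Self {
        RecordIter { buf }
    }
}

impl<'a> Iterator for RecordIter<'a> {
    type Item = Record<'a>;

    fn next(&mut self) -> Option<Record<'a>> {
        let buf: &'a [u8] = self.buf;

        // Each record starts with a 32-bit little-endian header.
        let head = buf.get(..4)?;
        let header = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
        let tag_id = (header & 0x3FF) as u16;
        let level = ((header >> 10) & 0x3FF) as u16;
        let mut size = (header >> 20) as usize;
        let mut offset = 4usize;

        // A size field of 0xFFF means the real size follows as a u32.
        if size == 0xFFF {
            let ext = buf.get(4..8)?;
            size = u32::from_le_bytes([ext[0], ext[1], ext[2], ext[3]]) as usize;
            offset = 8;
        }

        // Stop iterating on a truncated payload instead of panicking.
        let end = offset.checked_add(size)?;
        let data = buf.get(offset..end)?;
        self.buf = &buf[end..];
        Some(Record { tag_id, level, data })
    }
}
```

Calling `RecordIter::new(&bytes)` yields records until the buffer is exhausted or a record is truncated. Because `data` is a sub-slice of `bytes`, the borrow checker keeps the buffer alive for as long as any record is in use.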
Quick Start
Add this to your Cargo.toml:

    [dependencies]
    hwpers = "0.1"
Basic Usage
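    use hwpers::HwpReader;

    // Parse an HWP file ("document.hwp" is a placeholder path)
    let document = HwpReader::from_file("document.hwp")?;

    // Extract text content
    let text = document.extract_text();
    println!("{}", text);

    // Access document properties
    if let Some(props) = document.get_properties() {
        // inspect `props` here
    }

    // Iterate through sections and paragraphs
    for (i, section) in document.sections().enumerate() {
        // inspect section `i` and its paragraphs here
    }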
Visual Layout Rendering
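    // The `hwpers::render` path, the `HwpRenderer` name, and the arguments
    // to `new` are assumptions; check the crate documentation for exact names.
    use hwpers::{HwpReader, render::{HwpRenderer, RenderOptions}};

    let document = HwpReader::from_file("document.hwp")?;

    // Create renderer with custom options
    let options = RenderOptions { /* set the option fields you need */ };

    let renderer = HwpRenderer::new(&document, options);
    let result = renderer.render();

    // Export first page to SVG
    if let Some(svg) = result.to_svg(0) {
        // write `svg` to a file or pass it on
    }

    println!("Rendering finished");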
Advanced Formatting Access
    // Access character and paragraph formatting
    for section in document.sections() {
        // read character and paragraph formatting for each paragraph here
    }
Supported Features
Document Structure
- ✅ File header and version detection
- ✅ Document properties and metadata
- ✅ Section definitions and page layout
- ✅ Paragraph and character formatting
- ✅ Font definitions (FaceName)
- ✅ Styles and templates
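To make "file header and version detection" concrete, here is a minimal sketch that reads the 256-byte `FileHeader` stream described in the HWP 5.0 specification: a NUL-padded signature, a little-endian version word laid out as 0xMMnnPPrr at offset 32, and a flags word at offset 36. `FileHeaderInfo` and `parse_file_header` are hypothetical names and do not reflect how hwpers implements this step.

```rust
/// Fields read from the 256-byte `FileHeader` stream.
/// `FileHeaderInfo` is an illustrative type, not the hwpers API.
#[derive(Debug, PartialEq)]
pub struct FileHeaderInfo {
    /// (major, minor, build, revision)
    pub version: (u8, u8, u8, u8),
    pub compressed: bool,
    pub encrypted: bool,
}

const SIGNATURE: &[u8] = b"HWP Document File";

pub fn parse_file_header(bytes: &[u8]) -> Result<FileHeaderInfo, String> {
    if bytes.len() < 256 {
        return Err(format!("FileHeader too short: {} bytes", bytes.len()));
    }
    // The signature sits at the start of a 32-byte, NUL-padded field.
    if !bytes.starts_with(SIGNATURE) {
        return Err("missing HWP signature".to_string());
    }
    // Version is a little-endian u32 laid out as 0xMMnnPPrr.
    let v = u32::from_le_bytes([bytes[32], bytes[33], bytes[34], bytes[35]]);
    let version = ((v >> 24) as u8, (v >> 16) as u8, (v >> 8) as u8, v as u8);

    // Bit 0 of the flags word marks compression, bit 1 password protection.
    let flags = u32::from_le_bytes([bytes[36], bytes[37], bytes[38], bytes[39]]);
    Ok(FileHeaderInfo {
        version,
        compressed: flags & 0b01 != 0,
        encrypted: flags & 0b10 != 0,
    })
}
```

A caller would typically check `version.0 == 5` before parsing further, and decompress the body streams when `compressed` is set.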
Content Types
- ✅ Text content with full Unicode support
- ✅ Tables and structured data
- ✅ Control objects (images, OLE objects)
- ✅ Numbering and bullet lists
- ✅ Tab stops and alignment
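Text in HWP 5.0 is stored as UTF-16LE, so Unicode support starts with decoding it correctly. The following sketch uses only the standard library and replaces unpaired surrogates with U+FFFD rather than failing. `decode_utf16le` is a hypothetical helper, and the handling of in-line control characters that real paragraph text needs is deliberately left out.

```rust
/// Decode UTF-16LE bytes into a `String`.
/// Unpaired surrogates become U+FFFD; a trailing odd byte is ignored.
/// `decode_utf16le` is an illustrative helper, not part of hwpers.
pub fn decode_utf16le(bytes: &[u8]) -> String {
    let units = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));

    char::decode_utf16(units)
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}
```

For example, the byte sequence produced by encoding "한글" as UTF-16LE decodes back to the same string, and a lone high surrogate decodes to a single replacement character.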
Layout and Rendering
- ✅ Page dimensions and margins
- ✅ Multi-column layouts
- ✅ Line-by-line positioning (when available)
- ✅ Character-level positioning (when available)
- ✅ Borders and fill patterns
- ✅ SVG export with accurate positioning
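Positions and page dimensions in HWP files are expressed in HWPUNIT, which is 1/7200 of an inch, so rendering to SVG involves a unit conversion at a chosen DPI. The sketch below shows that conversion and emits a minimal SVG with positioned text. `PlacedText`, `hwpunit_to_px`, and `page_to_svg` are hypothetical names for illustration and are unrelated to the renderer in hwpers.

```rust
/// Convert HWPUNIT (1/7200 inch) to pixels at the given DPI.
pub fn hwpunit_to_px(units: i32, dpi: f64) -> f64 {
    units as f64 / 7200.0 * dpi
}

/// A run of text with its position in HWPUNIT.
/// Illustrative type, not part of hwpers.
pub struct PlacedText<'a> {
    pub x: i32,
    pub y: i32,
    pub text: &'a str,
}

/// Escape the characters that are special in XML text content.
fn escape_xml(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Build a minimal SVG document for one page of positioned text.
pub fn page_to_svg(width: i32, height: i32, items: &[PlacedText], dpi: f64) -> String {
    let mut svg = format!(
        "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"{:.2}\" height=\"{:.2}\">\n",
        hwpunit_to_px(width, dpi),
        hwpunit_to_px(height, dpi)
    );
    for item in items {
        svg.push_str(&format!(
            "  <text x=\"{:.2}\" y=\"{:.2}\">{}</text>\n",
            hwpunit_to_px(item.x, dpi),
            hwpunit_to_px(item.y, dpi),
            escape_xml(item.text)
        ));
    }
    svg.push_str("</svg>\n");
    svg
}
```

At 96 DPI, 7200 HWPUNIT maps to 96 pixels, so a text run placed at (7200, 3600) lands at x=96, y=48 in the output.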
Advanced Features
- ✅ Compressed document support
- ✅ CFB (Compound File Binary) format handling
- ✅ Text encoding support (UTF-16LE)
- ✅ Error recovery and partial parsing
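Since HWP 5.0 documents are stored in a CFB container, a cheap first check before full parsing is to look for the 8-byte CFB signature at the start of the file. The sketch below does that with the standard library only. `looks_like_cfb` and `file_is_cfb` are hypothetical helpers, and a matching signature only shows that the file is a compound file, not that it is a valid HWP document.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Signature found at the start of every Compound File Binary file.
pub const CFB_MAGIC: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// Return true if the buffer begins with the CFB signature.
/// Illustrative helper, not part of hwpers.
pub fn looks_like_cfb(bytes: &[u8]) -> bool {
    bytes.starts_with(&CFB_MAGIC)
}

/// Read at most the first 8 bytes of a file and check the signature.
/// Files shorter than 8 bytes yield `Ok(false)` rather than an error.
pub fn file_is_cfb(path: &Path) -> io::Result<bool> {
    let mut head = Vec::with_capacity(8);
    File::open(path)?.take(8).read_to_end(&mut head)?;
    Ok(looks_like_cfb(&head))
}
```

A caller can use `file_is_cfb` to reject unrelated files early and report a clearer error than a failure deep inside the container parser.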
Command Line Tool
The library includes a command-line tool for inspecting HWP files:
# Install the tool
# Inspect an HWP file
Format Support
This library supports HWP 5.0 format files. For older HWP formats, consider using format conversion tools first.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
License
This project is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Acknowledgments
- HWP file format specification by Hancom Inc.
- Korean text processing community
- Rust parsing and document processing ecosystem