pptx-to-md
pptx-to-md is a Rust library that parses Microsoft PowerPoint (.pptx) slides and converts them into structured Markdown content and data, so slide content is easy to process or integrate programmatically.
🚀 Features
- 📄 Extract Slide Text: Parses and extracts text elements from slides.
- 📋 Lists & Tables: Recognizes and formats lists (ordered/unordered) and tables into Markdown.
- 🖼️ Embedded Images: Extracts embedded images as base64-encoded inline images.
- 💾 Memory Efficient: Use the streaming API to iterate over one slide at a time, so the whole presentation never has to be held in memory.
- ⏱️ Multithreading: Optional support for multithreaded parsing of PowerPoint slides, with a significant performance increase for larger presentations.
- ⚙️ Robust & Safe APIs: Designed according to Rust best practices with explicit error handling.
- 🪄 Embedding: Provides pptx content and metadata in a form that is suitable for embeddings.
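To make the "base64-encoded inline images" feature concrete, here is a minimal standard-library sketch of the general technique: encode raw image bytes as base64 and wrap them in a Markdown image with a `data:` URI. The names `base64_encode` and `inline_image_markdown` are hypothetical and are not part of this crate's API; the crate's own implementation may differ.

```rust
// Standard base64 alphabet (RFC 4648).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical helper: encodes bytes as padded base64.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        // Emit '=' padding for the positions that have no input byte.
        if chunk.len() > 1 {
            out.push(ALPHABET[((n >> 6) & 63) as usize] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(n & 63) as usize] as char);
        } else {
            out.push('=');
        }
    }
    out
}

/// Hypothetical helper: builds a Markdown image whose source is a data URI.
fn inline_image_markdown(alt: &str, mime: &str, bytes: &[u8]) -> String {
    format!("![{}](data:{};base64,{})", alt, mime, base64_encode(bytes))
}

fn main() {
    // Placeholder bytes stand in for real image data.
    let md = inline_image_markdown("logo", "image/png", b"foo");
    println!("{}", md);
}
```

Inline data URIs keep the Markdown output self-contained, at the cost of larger files, which is why image compression settings matter when extraction is enabled.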
👨‍💻 Example Usage
Here's a simple example that converts a PowerPoint presentation into Markdown*:
use ;
use Path;
*For more usage examples, refer to the examples directory.
Config Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `extract_images` | `bool` | `true` | Whether images are extracted from slides |
| `compress_images` | `bool` | `true` | Whether images are compressed before encoding |
| `image_quality` | `u8` | `80` | Image compression quality (0-100). Higher values mean better quality but larger file sizes. |
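The defaults in the table can be modeled with a small stand-alone struct. This is only an illustrative sketch: `SketchConfig` and its `with_*` methods are hypothetical names, not the crate's actual configuration type or builder; only the three field names, types, and default values come from the table above.

```rust
/// Hypothetical stand-in for the parser configuration described above.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchConfig {
    pub extract_images: bool,
    pub compress_images: bool,
    pub image_quality: u8,
}

impl Default for SketchConfig {
    /// Mirrors the documented defaults: true, true, 80.
    fn default() -> Self {
        Self {
            extract_images: true,
            compress_images: true,
            image_quality: 80,
        }
    }
}

impl SketchConfig {
    /// Enables or disables image extraction.
    pub fn with_extract_images(mut self, enabled: bool) -> Self {
        self.extract_images = enabled;
        self
    }

    /// Enables or disables compression before encoding.
    pub fn with_compress_images(mut self, enabled: bool) -> Self {
        self.compress_images = enabled;
        self
    }

    /// Sets the quality, clamped to the documented 0-100 range
    /// (clamping is an assumption of this sketch).
    pub fn with_image_quality(mut self, quality: u8) -> Self {
        self.image_quality = quality.min(100);
        self
    }
}

fn main() {
    // Text-only parsing: skip image extraction entirely.
    let text_only = SketchConfig::default().with_extract_images(false);
    // Smaller output: keep images but lower the compression quality.
    let compact = SketchConfig::default().with_image_quality(50);
    println!("{:?}\n{:?}", text_only, compact);
}
```

Disabling `extract_images` is the likely choice when only slide text is needed, since the other two parameters only affect how extracted images are encoded.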
🏗 Project Structure
pptx-to-md/
├── Cargo.toml
├── README.md
├── CHANGELOG.md
├── LICENSE-MIT
├── LICENSE-APACHE
├── examples/ # Simple examples to present the usage of this crate
│ ├── basic_usage.rs
│ ├── image_extractions.rs
│ ├── memory_efficient_streaming.rs
│ └── slide_elements.rs
├── src/
│ ├── lib.rs # Public API
│ ├── container.rs # Pptx container handling
│ ├── parser_config.rs # Config and config builder
│ ├── slide.rs # Individual slide representation & markdown conversion
│ ├── parse_xml.rs # XML parsing logic
│ ├── parse_rels.rs # Relationship parsing logic
│ └── types.rs # Common data types used
└── tests/
  ├── test_data/ # XML & MD test data files
  └── slide_tests.rs # Tests for Markdown conversion logic
📦 Installation
Include the following line in your Cargo.toml dependencies section:
[dependencies]
pptx-to-md = "0.1.2" # replace with the current version
📜 License
This project is licensed under the MIT License and the Apache License 2.0.
Feel free to contribute or suggest improvements!