pptx-to-md
pptx-to-md is a library to parse Microsoft PowerPoint (.pptx) slides and convert them into structured Markdown content and data, making it easy to process, use, or integrate slide data programmatically.
🚀 Features
- 📄 Extract Slide Text: Parses and extracts text elements from slides.
- 📋 Lists & Tables: Recognizes and formats lists (ordered/unordered) and tables into Markdown.
- 🖼️ Embedded Images: Extracts embedded images as base64-encoded inline images.
- 💾 Memory Efficient: The streaming API iterates over one slide at a time, so the whole presentation never has to be held in memory at once.
- ⚙️ Robust & Safe APIs: Designed according to Rust best practices with explicit error handling.
- 🧑‍💻 Developer-Friendly: Simple API design, extensive documentation, and examples.
- 🪄 Embedding: Provides pptx content and meta information in a form that is useful for embeddings.
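To make the "Lists & Tables" feature more concrete, here is a minimal sketch of how rows of cell text could be rendered as a Markdown table. It is an illustration only: `table_to_markdown` is a hypothetical helper and not part of the pptx-to-md API, and it assumes the first row is the header row.

```rust
/// Hypothetical helper (NOT part of the pptx-to-md API): renders rows of
/// cell text as a Markdown table. Assumes the first row is the header.
fn table_to_markdown(rows: &[Vec<String>]) -> String {
    let header = match rows.first() {
        Some(h) => h,
        None => return String::new(), // no rows, no table
    };
    let cols = header.len();
    let mut out = String::new();

    for (i, row) in rows.iter().enumerate() {
        out.push('|');
        for c in 0..cols {
            // Short rows are padded with empty cells; pipes are escaped
            // so that cell text cannot split a column.
            let cell = row.get(c).map(String::as_str).unwrap_or("");
            out.push(' ');
            out.push_str(&cell.replace('|', "\\|"));
            out.push_str(" |");
        }
        out.push('\n');

        // Markdown needs a separator line directly after the header row.
        if i == 0 {
            out.push('|');
            for _ in 0..cols {
                out.push_str(" --- |");
            }
            out.push('\n');
        }
    }
    out
}
```

Calling `table_to_markdown` with a header row and two data rows yields a header line, a `| --- | --- |` separator, and one line per data row. Cells beyond the header width are dropped in this sketch; a real converter would also have to decide how to handle merged cells.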
📦 Installation
Include the following line in your Cargo.toml dependencies section:
[dependencies]
pptx-to-md = "0.1" # replace with the current version
👨‍💻 Example Usage
Here's an easy example to convert a PowerPoint slide into Markdown:
use pptx_to_md::PptxContainer;
use std::path::Path;
🏗 Project Structure
pptx-to-md/
├── Cargo.toml
├── README.md
├── src/
│ ├── lib.rs # Public API
│ ├── container.rs # Pptx container handling
│ ├── slide.rs # Individual slide representation & markdown conversion
│ ├── parse_xml.rs # XML parsing logic
│ └── types.rs # Common data types used
└── tests/
    ├── test_data/ # XML & MD test data files
    └── slide_tests.rs # Tests for Markdown conversion logic
📜 License
This project is licensed under the MIT License and the Apache License 2.0.
Feel free to contribute or suggest improvements!