pdf-rs
⚠️ Warning: This project is currently under active development. APIs may change frequently and are not yet stable.
A PDF parsing library written in Rust.
Overview
pdf-rs is a Rust library for parsing PDF files. The project aims to provide parsing functionality for PDF document structures, including:
- PDF version identification
- Cross-reference table (xref) parsing
- Object parsing (dictionaries, arrays, strings, etc.)
- Basic access to PDF structures
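The object types listed above can be modeled as a single Rust enum. The sketch below is only an illustration: PdfObject and its get method are hypothetical names, not the types exported by pdf-rs' objects module, and streams are left out to keep it short.

```rust
use std::collections::HashMap;

/// Hypothetical object model (not pdf-rs' actual objects module) covering the
/// basic PDF object types. Streams are omitted for brevity.
#[derive(Debug, Clone, PartialEq)]
enum PdfObject {
    Null,
    Boolean(bool),
    Integer(i64),
    Real(f64),
    String(Vec<u8>), // PDF strings are byte sequences, not necessarily UTF-8
    Name(String),    // stored without the leading '/'
    Array(Vec<PdfObject>),
    Dictionary(HashMap<String, PdfObject>),
    Reference { number: u32, generation: u16 }, // e.g. "2 0 R"
}

impl PdfObject {
    /// Looks up a key if this object is a dictionary; returns None otherwise.
    fn get(&self, key: &str) -> Option<&PdfObject> {
        match self {
            PdfObject::Dictionary(map) => map.get(key),
            _ => None,
        }
    }
}

fn main() {
    // Equivalent of the PDF dictionary: << /Type /Catalog /Pages 2 0 R >>
    let mut map = HashMap::new();
    map.insert("Type".to_string(), PdfObject::Name("Catalog".to_string()));
    map.insert("Pages".to_string(), PdfObject::Reference { number: 2, generation: 0 });
    let catalog = PdfObject::Dictionary(map);

    println!("{:?}", catalog.get("Pages"));
}
```

An enum keeps every object in one type, so pattern matching forces callers to handle each variant explicitly.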
Features
- PDF Version Support: Supports PDF versions from 1.0 to 2.0
- Object Parsing: Parses various object types in PDF, including dictionaries, arrays, strings, etc.
- Cross-reference Table Parsing: Parses PDF's xref table to locate objects
- Stream Reading: Uses the Sequence trait for efficient streaming file reading
- Memory Efficiency: Designed to minimize memory usage during parsing
- Error Handling: Comprehensive error handling with detailed error messages
- Type Safety: Fully utilizes Rust's type system for safety guarantees
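Version identification starts from the file header, which begins with %PDF- followed by the version number (for example %PDF-1.7 or %PDF-2.0). The following is a minimal sketch of parsing that header; parse_header_version is a hypothetical helper, not part of the pdf-rs API, and it assumes the header starts at byte 0 with single-digit version components.

```rust
/// Hypothetical helper (not part of pdf-rs): parses the "%PDF-M.m" header
/// at the start of a file into (major, minor) version numbers.
fn parse_header_version(bytes: &[u8]) -> Option<(u8, u8)> {
    // The header must begin with the literal marker "%PDF-".
    let rest = bytes.strip_prefix(b"%PDF-")?;
    // Expect a digit, a '.', and another digit, e.g. "1.7".
    match rest {
        [major @ b'0'..=b'9', b'.', minor @ b'0'..=b'9', ..] => {
            Some((*major - b'0', *minor - b'0'))
        }
        _ => None,
    }
}

fn main() {
    let header = b"%PDF-1.7\n";
    match parse_header_version(header) {
        Some((major, minor)) => println!("PDF version {}.{}", major, minor),
        None => println!("not a PDF header"),
    }
}
```

Real-world files sometimes contain bytes before the header, and from PDF 1.4 onward the document catalog's /Version entry can override the header version, so a complete parser has to account for both.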
Installation
Add this to your Cargo.toml:
[dependencies]
pdf-rs = "0.1"
Usage Example
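use std::path::PathBuf;
// Assumed import path: the crate and module names are not shown in this example
use pdf_rs::document::PDFDocument;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create PDF document parser ("example.pdf" is a placeholder path)
    let path = PathBuf::from("example.pdf");
    let document = PDFDocument::open(path)?;
    // Access PDF version (the accessor is not shown in this example; see the crate documentation)
    // Get cross-reference table
    let xrefs = document.get_xref();
    // Assumes the returned xref type implements Debug
    println!("{:?}", xrefs);
    Ok(())
}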
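The ? operator in the example propagates parse failures to the caller. Below is a minimal sketch of an error enum that carries detailed context and converts into Box<dyn Error> through ?; PdfError, check_header and run are hypothetical names, and the error module in pdf-rs may be organized differently.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error enum (pdf-rs' error module may be shaped differently).
#[derive(Debug, PartialEq)]
enum PdfError {
    /// The file does not start with a "%PDF-" header.
    InvalidHeader,
    /// An unexpected byte was found at the given offset.
    UnexpectedByte { offset: u64, found: u8 },
    /// An object number is not listed in the cross-reference table.
    MissingObject(u32),
}

impl fmt::Display for PdfError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PdfError::InvalidHeader => write!(f, "missing %PDF- header"),
            PdfError::UnexpectedByte { offset, found } => {
                write!(f, "unexpected byte 0x{:02X} at offset {}", found, offset)
            }
            PdfError::MissingObject(n) => write!(f, "object {} not found in xref table", n),
        }
    }
}

// Debug + Display + this impl let `?` convert PdfError into Box<dyn Error>.
impl Error for PdfError {}

fn check_header(bytes: &[u8]) -> Result<(), PdfError> {
    if bytes.starts_with(b"%PDF-") {
        Ok(())
    } else {
        Err(PdfError::InvalidHeader)
    }
}

fn run() -> Result<(), Box<dyn Error>> {
    check_header(b"%PDF-1.7")?;
    check_header(b"GIF89a")?; // fails: `?` boxes the PdfError and returns early
    Ok(())
}

fn main() {
    if let Err(e) = run() {
        println!("error: {}", e); // prints "error: missing %PDF- header"
    }
}
```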
API Documentation
For detailed API documentation, please refer to the crate documentation.
Module Structure
- document: Main PDF document parsing functionality
- objects: PDF object representations (dictionaries, arrays, strings, etc.)
- parser: Core parsing logic for PDF objects
- sequence: Streaming file reading utilities
- tokenizer: Tokenization of PDF content
- error: Error types and handling
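To illustrate what a tokenizer stage does, here is a sketch that splits raw bytes into PDF tokens using the whitespace and delimiter characters defined by the PDF specification. Token and tokenize are hypothetical names, not pdf-rs' tokenizer API, and literal strings, hex strings and #-escapes in names are deliberately not handled.

```rust
/// Hypothetical token type for illustration; pdf-rs' tokenizer module may differ.
#[derive(Debug, PartialEq)]
enum Token {
    DictStart,       // <<
    DictEnd,         // >>
    ArrayStart,      // [
    ArrayEnd,        // ]
    Name(String),    // /Type (stored without the leading slash)
    Regular(String), // numbers and keywords such as 12, R, obj, true
}

/// PDF whitespace characters: NUL, HT, LF, FF, CR and SP.
fn is_whitespace(b: u8) -> bool {
    matches!(b, 0x00 | 0x09 | 0x0A | 0x0C | 0x0D | 0x20)
}

/// PDF delimiter characters.
fn is_delimiter(b: u8) -> bool {
    matches!(b, b'(' | b')' | b'<' | b'>' | b'[' | b']' | b'{' | b'}' | b'/' | b'%')
}

fn tokenize(input: &[u8]) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        if is_whitespace(b) {
            i += 1;
        } else if b == b'%' {
            // Comment: skip everything up to the end of the line.
            while i < input.len() && input[i] != b'\n' && input[i] != b'\r' {
                i += 1;
            }
        } else if input[i..].starts_with(b"<<") {
            tokens.push(Token::DictStart);
            i += 2;
        } else if input[i..].starts_with(b">>") {
            tokens.push(Token::DictEnd);
            i += 2;
        } else if b == b'[' {
            tokens.push(Token::ArrayStart);
            i += 1;
        } else if b == b']' {
            tokens.push(Token::ArrayEnd);
            i += 1;
        } else if b == b'/' || !is_delimiter(b) {
            // Names and regular tokens run until the next whitespace or delimiter.
            let is_name = b == b'/';
            let start = if is_name { i + 1 } else { i };
            let mut end = start;
            while end < input.len() && !is_whitespace(input[end]) && !is_delimiter(input[end]) {
                end += 1;
            }
            let text = String::from_utf8_lossy(&input[start..end]).into_owned();
            tokens.push(if is_name { Token::Name(text) } else { Token::Regular(text) });
            i = end;
        } else {
            // Other delimiters (string and brace syntax) are skipped in this sketch.
            i += 1;
        }
    }
    tokens
}

fn main() {
    let tokens = tokenize(b"<< /Type /Catalog /Pages 2 0 R >>");
    println!("{:?}", tokens);
}
```

Keeping tokenization separate from parsing means the parser only deals with tokens such as Regular("2"), Regular("0"), Regular("R") when it assembles an indirect reference.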
Design Highlights
- Modular Design: Different functionalities separated into different modules for easy maintenance and extension
- Error Handling: Comprehensive error type system providing detailed error information
- Memory Efficiency: Reading the file as a stream avoids loading the entire file into memory
- Type Safety: Fully utilizes Rust's type system for safety guarantees
- Extensibility: Designed with extensibility in mind for future enhancements
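Streaming pays off when locating the cross-reference data: a PDF file ends with startxref, a byte offset and %%EOF, so a reader can seek to the end and load only the last few hundred bytes instead of the whole file. The sketch below shows that technique with std::io::Seek; find_startxref and the 1024-byte window are assumptions for illustration, not necessarily how pdf-rs' Sequence-based reader works.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper (not pdf-rs' API): returns the offset written after the
/// last "startxref" keyword, reading only the final `tail_len` bytes.
fn find_startxref<R: Read + Seek>(reader: &mut R, tail_len: u64) -> io::Result<Option<u64>> {
    // Find the total length, then jump to the start of the tail window.
    let len = reader.seek(SeekFrom::End(0))?;
    reader.seek(SeekFrom::Start(len.saturating_sub(tail_len)))?;

    // Only the tail window is held in memory.
    let mut tail = Vec::new();
    reader.read_to_end(&mut tail)?;

    // Search backwards so the last "startxref" wins (incremental updates
    // append a new one at the end of the file).
    let keyword: &[u8] = b"startxref";
    let pos = match tail.windows(keyword.len()).rposition(|w| w == keyword) {
        Some(p) => p,
        None => return Ok(None),
    };

    // Skip whitespace after the keyword and collect the decimal digits.
    let digits: String = tail[pos + keyword.len()..]
        .iter()
        .skip_while(|b| b.is_ascii_whitespace())
        .take_while(|b| b.is_ascii_digit())
        .map(|&b| char::from(b))
        .collect();
    Ok(digits.parse().ok())
}

fn main() -> io::Result<()> {
    // Cursor stands in for std::fs::File; both implement Read + Seek.
    let data = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\nstartxref\n116\n%%EOF\n".to_vec();
    let mut reader = Cursor::new(data);
    println!("{:?}", find_startxref(&mut reader, 1024)?); // prints Some(116)
    Ok(())
}
```

Because the function is generic over Read + Seek, the same code works on a File, a BufReader<File> or an in-memory buffer.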
Current Status
The project is at an early stage of development. Basic PDF parsing functionality has been implemented, including version detection, xref table parsing, and basic object parsing.
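For reference, a classic cross-reference table consists of subsections. Each starts with a line giving the first object number and the entry count, followed by one entry per object: a 10-digit number, a 5-digit generation number, and n (in use, the number is a byte offset) or f (free, the number is the next free object). The sketch below parses that layout; XrefEntry, parse_xref_section and parse_field are hypothetical names, and cross-reference streams (PDF 1.5+) are not covered.

```rust
/// Hypothetical entry type for a classic cross-reference table (not pdf-rs' API).
#[derive(Debug, PartialEq)]
enum XrefEntry {
    /// "n" entry: byte offset of the object and its generation number.
    InUse { offset: u64, generation: u16 },
    /// "f" entry: number of the next free object and the generation number.
    Free { next_free: u64, generation: u16 },
}

/// Parses one optional whitespace-separated field into any FromStr type.
fn parse_field<T: std::str::FromStr>(field: Option<&str>, what: &str) -> Result<T, String> {
    field
        .and_then(|s| s.parse().ok())
        .ok_or_else(|| format!("invalid {}", what))
}

/// Parses the lines between the "xref" keyword and "trailer" into
/// (object number, entry) pairs. This sketch works line by line instead of
/// relying on the fixed 20-byte entry width.
fn parse_xref_section(text: &str) -> Result<Vec<(u32, XrefEntry)>, String> {
    let mut entries = Vec::new();
    let mut lines = text.lines().map(str::trim).filter(|l| !l.is_empty());
    while let Some(header) = lines.next() {
        // Subsection header: "<first object number> <entry count>".
        let mut parts = header.split_whitespace();
        let first: u32 = parse_field(parts.next(), "first object number")?;
        let count: u32 = parse_field(parts.next(), "entry count")?;
        for i in 0..count {
            let line = lines.next().ok_or("xref subsection ended early")?;
            let mut fields = line.split_whitespace();
            let value: u64 = parse_field(fields.next(), "offset")?;
            let generation: u16 = parse_field(fields.next(), "generation number")?;
            let entry = match fields.next() {
                Some("n") => XrefEntry::InUse { offset: value, generation },
                Some("f") => XrefEntry::Free { next_free: value, generation },
                other => return Err(format!("unknown entry type {:?}", other)),
            };
            entries.push((first + i, entry));
        }
    }
    Ok(entries)
}

fn main() {
    let section = "0 3\n0000000000 65535 f \n0000000017 00000 n \n0000000081 00000 n \n";
    match parse_xref_section(section) {
        Ok(entries) => {
            for (number, entry) in &entries {
                println!("object {}: {:?}", number, entry);
            }
        }
        Err(e) => eprintln!("xref error: {}", e),
    }
}
```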
Future Plans
- Improve PDF object parsing functionality
- Add encrypted PDF support
- Implement advanced PDF content extraction features
- Provide a more user-friendly API
- Add comprehensive documentation with examples
- Improve performance and memory usage
Build Requirements
- Rust 1.5+ (latest stable version recommended)
Build Steps
cargo build
Run Tests
cargo test
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Acknowledgments
- Thanks to the Rust community for providing excellent tools and libraries
- Inspired by other PDF parsing libraries in different languages