Open XML SDK for Rust
ooxmlsdk is a Rust library for reading, writing, and round-tripping Office Open XML documents such as .docx, .xlsx, and .pptx. It uses the .NET Open XML SDK as a primary reference for OOXML package and schema behavior, but exposes Rust-native generated types, serializers, and strongly typed package parts.
Features
The runtime crate exposes a small public feature surface:
- default: enables parts; this is the recommended configuration for most users
- parts: enables package-level OOXML read/write support such as WordprocessingDocument, SpreadsheetDocument, and PresentationDocument
- flat-opc: enables Flat OPC package read/write helpers and depends on parts
- mce: enables Markup Compatibility and Extensibility processing and depends on parts
- validators: enables optional validator APIs
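The dependencies between these features can be modelled in a few lines. The sketch below is not the crate's Cargo.toml or any part of its API; `implied` and `resolve` are hypothetical helpers, and the assumption that validators implies no other feature is mine, since the list above does not say either way.

```rust
use std::collections::BTreeSet;

/// Hypothetical model of the feature graph described above.
/// Only the "depends on parts" edges come from the README text.
fn implied(feature: &str) -> &'static [&'static str] {
    match feature {
        "default" => &["parts"],
        "flat-opc" => &["parts"],
        "mce" => &["parts"],
        // Assumption: "parts" and "validators" imply nothing else.
        _ => &[],
    }
}

/// Expand the requested features into the full enabled set,
/// optionally starting from the default feature.
fn resolve(requested: &[&'static str], use_default: bool) -> BTreeSet<&'static str> {
    let mut stack: Vec<&'static str> = requested.to_vec();
    if use_default {
        stack.push("default");
    }
    let mut enabled = BTreeSet::new();
    while let Some(feature) = stack.pop() {
        // Only follow implied features the first time a feature is seen.
        if enabled.insert(feature) {
            stack.extend_from_slice(implied(feature));
        }
    }
    enabled
}

fn main() {
    // Models `--no-default-features --features flat-opc`.
    let set = resolve(&["flat-opc"], false);
    println!("{set:?}");
}
```

Under this model, asking for flat-opc or mce without the default feature still pulls in parts.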
The minimum supported Rust version is 1.88, and the workspace uses the Rust 2024 edition.
Documentation
Rust API documentation is published on docs.rs/ooxmlsdk.
Guides and examples are maintained separately in KaiserY/ooxmlsdk-doc. That documentation follows the shape of the Microsoft Learn Open XML SDK documentation, but rewrites the material for this Rust crate, Rust naming, Cargo features, generated schema types, and Rust package APIs.
For background on Open XML package concepts, file format structure, WordprocessingML, SpreadsheetML, PresentationML, Flat OPC, and Markup Compatibility, the Microsoft Learn documentation remains the upstream conceptual reference. This crate follows many of the same package and schema concepts while exposing Rust-native generated types and feature flags.
Round-Trip Coverage
Corpus-scale round-trip data is tracked in KaiserY/ooxmlsdk-test-suite. Latest recorded results:
| Corpus | Files | Round-trip candidates | Open-only | Invalid | Result |
|---|---|---|---|---|---|
| Apache POI | 677 | 602 | 11 | 64 | 677 passed / 0 failed |
| LibreOffice | 3368 | 3335 | 7 | 26 | 3368 passed / 0 failed |
| Open-XML-SDK | 884 | 874 | 6 | 4 | 884 passed / 0 failed |
Last run: 2026-06-07.
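In the rows recorded above, the three category columns add up to the file count for each corpus. A small check of that arithmetic, with the numbers copied from the table and a hypothetical helper name:

```rust
/// One table row: (corpus, files, round-trip candidates, open-only, invalid).
type Row = (&'static str, u32, u32, u32, u32);

const ROWS: [Row; 3] = [
    ("Apache POI", 677, 602, 11, 64),
    ("LibreOffice", 3368, 3335, 7, 26),
    ("Open-XML-SDK", 884, 874, 6, 4),
];

/// True when candidates + open-only + invalid equals the file count.
fn categories_sum_to_files(row: &Row) -> bool {
    let (_, files, candidates, open_only, invalid) = *row;
    candidates + open_only + invalid == files
}

fn main() {
    for row in &ROWS {
        println!("{}: consistent = {}", row.0, categories_sum_to_files(row));
    }
}
```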
Version Coverage
Office 2007 is the baseline. The checked-in generated schemas also include newer OOXML namespaces and package parts from the upstream metadata.
Common build shapes:
- default: generated schemas plus package APIs
- --no-default-features --features parts: package APIs only
- --no-default-features --features flat-opc: package APIs plus Flat OPC helpers
- --no-default-features --features mce: package APIs plus Markup Compatibility and Extensibility processing
- --features validators: optional validator APIs
The generated runtime includes Office 2010, 2013, 2016, 2019, 2021, Microsoft 365-era extensions, and newer upstream namespace revisions currently present in the checked-in metadata. In practice this covers later DrawingML and chart extensions, SVG and 3D-related parts, threaded comments, dynamic-array-era spreadsheet extensions, and other post-2007 additions tracked by Open XML SDK metadata.
Package API
The parts feature exposes package-level APIs for .docx, .xlsx, and .pptx files. The intended public surface follows upstream Open XML SDK concepts:
- open and create packages with constructors such as new, new_with_settings, new_from_file, and new_from_file_with_settings
- save packages with save
- create custom parts with add_new_part_with_content_type_and_path when the caller needs an explicit package path and content type
- inspect package and part relationships with parts, get_all_parts, get_part_by_id, get_parts_of_type, and relationship-specific helpers
- traverse typed related parts with helpers such as related_parts_of_type, related_part_of_type, and relationship-type-specific variants when the relationship id is needed alongside the typed part
- access well-known child parts through typed methods such as main_document_part, workbook_part, presentation_part, worksheet_parts, font_table_part, and chart-related part accessors
- read, replace, or unload parsed part payloads through public data helpers and root-element helpers
Raw package storage, raw relationship sets, generated factory internals, and unchecked dynamic part plumbing are not part of the public API. Prefer the package and part methods above when writing code that should survive generator updates.
The package API follows Open XML SDK container concepts. When relationship metadata matters, typed traversal helpers return RelatedPart<T> so callers can keep the typed part and its r:id together.
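To illustrate why keeping the typed part and its relationship id together is useful, here is a self-contained sketch. It does not use the crate: the `RelatedPart<T>` below is a hypothetical stand-in, and its field names, the `ImagePart` type, and `relationship_id_for` are assumptions made for illustration only.

```rust
/// Hypothetical stand-in for a typed part paired with its `r:id`.
/// The real type's fields and accessors may differ.
#[derive(Debug, Clone, PartialEq)]
struct RelatedPart<T> {
    relationship_id: String,
    part: T,
}

/// Hypothetical typed part, identified here only by its package path.
#[derive(Debug, Clone, PartialEq)]
struct ImagePart {
    path: String,
}

/// Find the relationship id that points at a given package path.
fn relationship_id_for<'a>(
    parts: &'a [RelatedPart<ImagePart>],
    path: &str,
) -> Option<&'a str> {
    parts
        .iter()
        .find(|related| related.part.path == path)
        .map(|related| related.relationship_id.as_str())
}

fn main() {
    let parts = vec![
        RelatedPart {
            relationship_id: "rId4".to_string(),
            part: ImagePart { path: "/word/media/image1.png".to_string() },
        },
        RelatedPart {
            relationship_id: "rId7".to_string(),
            part: ImagePart { path: "/word/media/image2.svg".to_string() },
        },
    ];
    // The id is what markup such as r:embed would reference.
    println!("{:?}", relationship_id_for(&parts, "/word/media/image2.svg"));
}
```

Because the id travels with the part, code that rewrites markup references does not need a second lookup against the relationship set.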
Generated Schema API
The schemas module is generated from upstream Open XML SDK metadata plus checked-in schema extensions. Generated names are intended to read like Rust while staying traceable to the source schema:
- repeated child fields are named for their item type, for example paragraph, extension, or table_row
- choices use concrete child names when the schema provides enough information; generic names remain for genuinely anonymous schema groups
- common scalar shapes are typed: lists are Vec<T>, OOXML booleans are enums, and measures/percentages use unit wrappers
- extension and wildcard content is preserved, with known children exposed through typed choices where possible
Prefer these generated types and conversion helpers over raw XML strings in new code. See the changelog for release-specific API changes.
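The naming rules above can be pictured with a small hand-written model. Nothing below is generated output from the crate: `OnOff`, `Paragraph`, and `Body` are hypothetical types, and the choice of enum variants is an assumption based on the lexical forms OOXML allows for on/off values (true, false, 1, 0, on, off).

```rust
use std::str::FromStr;

/// Hypothetical OOXML boolean. Each lexical form gets its own variant,
/// so a value can be written back in the form it was read.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OnOff {
    True,
    False,
    One,
    Zero,
    On,
    Off,
}

impl OnOff {
    /// Collapse the lexical form to a plain bool.
    fn as_bool(self) -> bool {
        matches!(self, OnOff::True | OnOff::One | OnOff::On)
    }

    /// The attribute text for this variant.
    fn as_str(self) -> &'static str {
        match self {
            OnOff::True => "true",
            OnOff::False => "false",
            OnOff::One => "1",
            OnOff::Zero => "0",
            OnOff::On => "on",
            OnOff::Off => "off",
        }
    }
}

impl FromStr for OnOff {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "true" => Ok(OnOff::True),
            "false" => Ok(OnOff::False),
            "1" => Ok(OnOff::One),
            "0" => Ok(OnOff::Zero),
            "on" => Ok(OnOff::On),
            "off" => Ok(OnOff::Off),
            other => Err(format!("not an on/off value: {other}")),
        }
    }
}

/// Hypothetical element types: the repeated child field is named for
/// its item type (`paragraph`) and is a `Vec<T>`.
struct Paragraph {
    bold: Option<OnOff>,
}

struct Body {
    paragraph: Vec<Paragraph>,
}

fn main() {
    let body = Body {
        paragraph: vec![Paragraph { bold: "on".parse().ok() }, Paragraph { bold: None }],
    };
    let bold_count = body
        .paragraph
        .iter()
        .filter(|p| p.bold.map_or(false, OnOff::as_bool))
        .count();
    println!("bold paragraphs: {bold_count}");
}
```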
XML And MCE Compatibility
The generated XML reader/writer preserves markup compatibility data needed for stable round trips, including common mc:* attributes, mc:AlternateContent, choice/fallback content, unknown extension attributes, and extension namespace children used by newer Office documents.
With the mce feature enabled, package/root loading can process known Markup Compatibility and Extensibility constructs such as mc:AlternateContent and package-level ProcessAllParts behavior. Integration coverage includes upstream-derived MCE, strict, OPC, extension, and real-world compatibility samples, with tests focused on public Rust APIs and stable XML/package round trips.
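For readers new to mc:AlternateContent, the selection rule from the Markup Compatibility specification is: take the first choice whose required namespaces are all understood, otherwise use the fallback if one exists. The sketch below models only that rule. It is not the crate's implementation, and `Choice`, `AlternateContent`, and `select` are hypothetical names.

```rust
use std::collections::HashSet;

/// One mc:Choice: the namespaces it requires and its (opaque) content.
struct Choice<'a> {
    requires: Vec<&'a str>,
    content: &'a str,
}

/// An mc:AlternateContent block with ordered choices and an optional fallback.
struct AlternateContent<'a> {
    choices: Vec<Choice<'a>>,
    fallback: Option<&'a str>,
}

/// Pick the first choice whose requirements are all understood,
/// else the fallback, else nothing.
fn select<'a>(block: &AlternateContent<'a>, understood: &HashSet<&str>) -> Option<&'a str> {
    block
        .choices
        .iter()
        .find(|choice| choice.requires.iter().all(|ns| understood.contains(*ns)))
        .map(|choice| choice.content)
        .or(block.fallback)
}

fn main() {
    let block = AlternateContent {
        choices: vec![
            Choice { requires: vec!["urn:example:v3"], content: "<v3:shape/>" },
            Choice { requires: vec!["urn:example:v2"], content: "<v2:shape/>" },
        ],
        fallback: Some("<pic/>"),
    };
    // A consumer that only understands the v2 namespace.
    let understood: HashSet<&str> = ["urn:example:v2"].into_iter().collect();
    println!("{:?}", select(&block, &understood));
}
```

The namespace URIs here are placeholders; real documents list namespace prefixes in the Requires attribute, which resolve to URIs through the usual xmlns declarations.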
Unknown-element DOM editing and markup compatibility validator behavior are still future work.
Flat OPC
The flat-opc feature exposes Wordprocessing Flat OPC helpers for loading and writing XML package representations. Flat OPC APIs support strings and readers, and written Flat OPC preserves binary package parts such as alternative format import parts while writing XML-safe parts such as SVG media as XML data.
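The distinction between XML data and binary data in Flat OPC can be sketched as follows. This models the general Flat OPC layout (pkg:part elements holding either pkg:xmlData or base64 pkg:binaryData) and is not the crate's writer: `Payload`, `base64`, and `render_part` are hypothetical, and the sketch does no attribute escaping.

```rust
/// Hypothetical part payload: XML is embedded as markup, anything else as base64.
enum Payload {
    Xml(String),
    Binary(Vec<u8>),
}

/// Minimal standard-alphabet base64 encoder with `=` padding.
fn base64(data: &[u8]) -> String {
    const TABLE: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::new();
    for chunk in data.chunks(3) {
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = ((chunk[0] as u32) << 16) | ((b1 as u32) << 8) | (b2 as u32);
        out.push(TABLE[((n >> 18) & 63) as usize] as char);
        out.push(TABLE[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { TABLE[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { TABLE[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Render one pkg:part element. Names and content types are written
/// verbatim, so callers must pass values that are already attribute-safe.
fn render_part(name: &str, content_type: &str, payload: &Payload) -> String {
    let body = match payload {
        Payload::Xml(xml) => format!("<pkg:xmlData>{xml}</pkg:xmlData>"),
        Payload::Binary(bytes) => {
            format!("<pkg:binaryData>{}</pkg:binaryData>", base64(bytes))
        }
    };
    format!("<pkg:part pkg:name=\"{name}\" pkg:contentType=\"{content_type}\">{body}</pkg:part>")
}

fn main() {
    let svg = Payload::Xml("<svg xmlns=\"http://www.w3.org/2000/svg\"/>".to_string());
    println!("{}", render_part("/word/media/image1.svg", "image/svg+xml", &svg));

    let blob = Payload::Binary(vec![0x4d, 0x61]);
    println!("{}", render_part("/word/afchunk.bin", "application/octet-stream", &blob));
}
```

The part names and content types in `main` are illustrative placeholders, not values the crate is known to emit.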
Project Structure
- crates/ooxmlsdk: runtime library exposed to downstream users
- crates/ooxmlsdk-build: generator that turns checked-in metadata into Rust code
- crates/ooxmlsdk-derive: derive macros used by the generated runtime code
- crates/ooxmlsdk-test: integration tests and benchmarks
- sdk_data/: checked-in intermediate generator data
- data/: upstream-derived metadata snapshots consumed by the generator pipeline
- schemas/OpenPackagingConventions-XMLSchema/: package schema inputs used by the generator
The generated runtime code under crates/ooxmlsdk/src/schemas/, crates/ooxmlsdk/src/deserializers/, crates/ooxmlsdk/src/serializers/, crates/ooxmlsdk/src/parts/, and related module files is intended to be checked in and reviewed as generated artifacts.
Validation And Benchmarks
For release validation, this repository uses the full workspace sequence:
For runtime performance work, prefer evaluating cargo bench -p ooxmlsdk-test as a whole. The packages and xml suites have shown a persistent disagreement on wordprocessing_document/write/parsed, so treat that one case as an anomaly rather than as the sole performance signal.
The in-repository compatibility smoke lane is:
Corpus-scale package round-trip validation is maintained in the adjacent
../ooxmlsdk-test-suite/ checkout. Prefer that local path; the remote is
https://github.com/KaiserY/ooxmlsdk-test-suite.
The committed fixture set includes document, presentation, MCE, OPC, DrawingML, WML, and SpreadsheetML coverage, including spreadsheet cell types, defined names, formatting, formulas, freeze panes, merged cells, number formats, row/column dimensions, sheet visibility, and rich shared strings.
Known Limitations
- There is no serde integration.
- The validator surface is optional and still narrower than the core read/write path.
- Unknown-element DOM APIs and markup compatibility validator behavior are not yet exposed.
- Some schema shapes still map to generated enum-based child collections rather than a fully particle-aware hand-modeled API.
- to_string() is just Display; prefer the XML-oriented APIs when you care about write performance.
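The last limitation reflects how Rust's standard library works: to_string() comes from the blanket ToString implementation for every Display type, and each call allocates a fresh String. The sketch below shows that general point with a hypothetical `Run` type; it does not use the crate's XML-oriented APIs, whose names are not given here.

```rust
use std::fmt::{self, Display, Write as _};

/// Hypothetical element type used only to illustrate the Display path.
struct Run {
    text: String,
}

impl Display for Run {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // No escaping here; a real serializer would escape the text.
        write!(f, "<w:r><w:t>{}</w:t></w:r>", self.text)
    }
}

/// Write every run into one shared buffer instead of building
/// one temporary String per run with to_string().
fn write_all_runs(out: &mut String, runs: &[Run]) -> fmt::Result {
    for run in runs {
        write!(out, "{run}")?;
    }
    Ok(())
}

fn main() -> fmt::Result {
    let runs = vec![
        Run { text: "Hello".to_string() },
        Run { text: "world".to_string() },
    ];

    // Per-item to_string(): one allocation per run, then a join.
    let joined: String = runs.iter().map(|run| run.to_string()).collect();

    // Shared buffer: same text, written in place.
    let mut buffer = String::new();
    write_all_runs(&mut buffer, &runs)?;

    println!("{}", joined == buffer);
    Ok(())
}
```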
Changelog
See CHANGELOG.md.
Data Provenance
data/ is directly copied from the upstream .NET Open XML SDK.
sdk_data/ is generated from the upstream .NET Open XML SDK, and schemas/OpenPackagingConventions-XMLSchema/ contains package schema inputs derived from the Open Packaging Conventions XSDs. Review upstream licensing before redistributing refreshed snapshots.
License
MIT OR Apache-2.0