xsd-parser 1.4.0

A comprehensive library for parsing XML schemas and generating code based on them.

This project originated as a fork of [`xsd-parser-rs`](https://github.com/lumeohq/xsd-parser-rs) but has since evolved into a complete rewrite.

If you enjoy the project and would like to support my work, you can [buy me a coffee](https://ko-fi.com/bergmann89) or [send a tip via PayPal](https://paypal.me/bergmann891/5EUR). Thanks a lot! 😊

<a href="https://github.com/Bergmann89/xsd-parser/blob/master/LICENSE"><img src="https://img.shields.io/crates/l/xsd-parser" alt="Crates.io License"></a> <a href="https://crates.io/crates/xsd-parser"><img src="https://img.shields.io/crates/v/xsd-parser" alt="Crates.io Version"></a> <a href="https://crates.io/crates/xsd-parser"><img src="https://img.shields.io/crates/d/xsd-parser" alt="Crates.io Total Downloads"></a> <a href="https://docs.rs/xsd-parser"><img src="https://img.shields.io/docsrs/xsd-parser" alt="docs.rs"></a> <a href="https://github.com/Bergmann89/xsd-parser/actions/workflows/main.yml"><img src="https://github.com/Bergmann89/xsd-parser/actions/workflows/main.yml/badge.svg" alt="Github CI"></a> <a href="https://deps.rs/repo/github/Bergmann89/xsd-parser"><img src="https://deps.rs/repo/github/Bergmann89/xsd-parser/status.svg" alt="Dependency Status"></a>


# Overview

This library is built around a staged transformation pipeline that converts XML schemas into Rust source code. Each stage handles a specific level of abstraction and produces a well-defined intermediate representation. This makes the library highly flexible, testable, and suitable for advanced customization or tooling.

![overview](doc/overview.svg "Overview")

## Pipeline Stages

1. **Parsing:**
   The parsing stage is handled by the `Parser` type. It loads XML schemas from files or URLs and uses pluggable `Resolver`s to fetch and preprocess schema definitions. The result is captured in a `Schemas` model, which stores namespaces, prefixes, and the raw schema structure needed for further processing.

2. **Interpreting:**
   Interpreting is carried out by the `Interpreter`. This stage analyzes the schema definitions stored in the `Schemas` model and converts them into normalized, abstract type descriptions. The resulting `MetaTypes` model encapsulates schema semantics such as complex types, enumerations, references, and groups in a language-agnostic form.

3. **Optimizing:**
   Optimization is performed by the `Optimizer`, which takes the `MetaTypes` and applies structural transformations. These include deduplication, simplification of unions, merging cardinalities, and resolving typedef aliases. The goal is to prepare the type graph for idiomatic translation into Rust while reducing complexity.

4. **Generating:**
   The generation step uses the `Generator` to transform the abstract types into Rust-specific type data. It produces the `DataTypes` model by attaching names, Rust derivations, trait support, and rendering metadata. These enriched types form the basis for later rendering while still preserving schema semantics.

5. **Rendering:**
   Rendering is handled by the `Renderer`, which converts `DataTypes` into structured Rust code organized in a `Module`. It uses the `RenderStep` trait to define individual rendering steps. Several built-in steps are available, including support for `serde` or `quick-xml`. Users can also add custom `RenderStep` implementations to extend or modify the output.
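
The staged design can be pictured as a chain of functions in which each stage consumes the previous model and returns the next one. The following is a self-contained toy sketch, not the crate's API: the structs only borrow the model names used above, and the functions `parse`, `interpret`, `optimize`, `generate`, `render`, and `run_pipeline`, as well as the whitespace-separated input format, are hypothetical.

```rust
// Toy illustration of a staged pipeline. None of these types are the real
// xsd-parser types; they only mirror the model names used in this README.

/// Stand-in for the raw schema data produced by the parsing stage.
struct Schemas {
    element_names: Vec<String>,
}

/// Stand-in for the language-neutral type descriptions.
struct MetaTypes {
    type_names: Vec<String>,
}

/// Stand-in for the Rust-specific type data.
struct DataTypes {
    rust_names: Vec<String>,
}

/// Stand-in for the rendered output.
struct Module {
    code: String,
}

/// Stage 1: read names from a whitespace-separated list (hypothetical input).
fn parse(input: &str) -> Schemas {
    Schemas {
        element_names: input.split_whitespace().map(str::to_owned).collect(),
    }
}

/// Stage 2: turn raw names into abstract type descriptions.
fn interpret(schemas: Schemas) -> MetaTypes {
    MetaTypes {
        type_names: schemas.element_names,
    }
}

/// Stage 3: a single toy optimization, removing duplicate types.
fn optimize(mut types: MetaTypes) -> MetaTypes {
    types.type_names.sort();
    types.type_names.dedup();
    types
}

/// Stage 4: attach Rust-style names (capitalized, with a `Type` suffix).
fn generate(types: MetaTypes) -> DataTypes {
    let rust_names = types
        .type_names
        .iter()
        .map(|name| {
            let mut chars = name.chars();
            match chars.next() {
                Some(first) => format!("{}{}Type", first.to_uppercase(), chars.as_str()),
                None => String::from("Type"),
            }
        })
        .collect();
    DataTypes { rust_names }
}

/// Stage 5: render one unit struct per type.
fn render(types: DataTypes) -> Module {
    let code = types
        .rust_names
        .iter()
        .map(|name| format!("pub struct {name};\n"))
        .collect();
    Module { code }
}

/// Runs all five stages in order and returns the rendered code.
fn run_pipeline(input: &str) -> String {
    render(generate(optimize(interpret(parse(input))))).code
}

fn main() {
    print!("{}", run_pipeline("person address person"));
}
```

Because every stage has a typed input and output, a stage can be tested or replaced in isolation, which is the property the pipeline description above relies on.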

## Data Models

- **`Schemas`:**
  This model is built by the `Parser` and contains the raw XML schema data, including namespaces, prefixes, and schema file content. It serves as the foundation for interpretation and supports multiple sources and resolver types.

- **`MetaTypes`:**
  Generated by the `Interpreter`, this model contains language-neutral type definitions. It includes data like complex types, references, enumerations, and groupings derived from schema structure. It is suitable for introspection, transformation, and optimization.

- **`DataTypes`:**
  Produced by the `Generator`, this model holds enriched Rust-specific type data. Each type includes metadata for layout, naming, derivations, and other traits required for rendering idiomatic Rust code. This is the core input for the rendering process.

- **`Module`:**
  The final model is produced by the `Renderer`. It wraps the Rust source code output into a structured format, ready for file output or consumption as token streams. Modules support nested submodules, file splitting, and embedded metadata for customization.
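
To illustrate what "nested submodules" can mean for the final model, here is a minimal sketch of a module tree that renders itself recursively. `ModuleSketch`, its fields, and `sample` are hypothetical and do not reflect the actual layout of the crate's `Module` type.

```rust
/// Hypothetical stand-in for a rendered module tree; not the crate's `Module`.
struct ModuleSketch {
    name: String,
    code: String,
    submodules: Vec<ModuleSketch>,
}

impl ModuleSketch {
    /// Renders this module's code, followed by each submodule wrapped in an
    /// indented `pub mod` block.
    fn render(&self, indent: usize) -> String {
        let pad = "    ".repeat(indent);
        let mut out = String::new();
        for line in self.code.lines() {
            out.push_str(&format!("{pad}{line}\n"));
        }
        for sub in &self.submodules {
            out.push_str(&format!("{pad}pub mod {} {{\n", sub.name));
            out.push_str(&sub.render(indent + 1));
            out.push_str(&format!("{pad}}}\n"));
        }
        out
    }
}

/// Builds a root module with a single nested submodule.
fn sample() -> ModuleSketch {
    ModuleSketch {
        name: String::from("root"),
        code: String::from("pub struct Root;"),
        submodules: vec![ModuleSketch {
            name: String::from("xs"),
            code: String::from("pub struct Date;"),
            submodules: Vec::new(),
        }],
    }
}

fn main() {
    print!("{}", sample().render(0));
}
```

The same tree could be written to one file or split into one file per submodule, since each node carries its own code.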



# Features

This library provides the following features:

- **Rust Code Generation:** Convert any XML schema into Rust code.
- **Layered Architecture:** Add user-defined code to manipulate type information or generated code.
- **User-Defined Types:** Inject existing types into the generated code to reuse predefined structures.
- **`serde` Support:** Generate code for serialization and deserialization using [`serde`](https://docs.rs/serde) with [`serde-xml-rs`](https://docs.rs/serde-xml-rs) or [`quick_xml`](https://docs.rs/quick-xml) as serializer/deserializer.
- **`quick_xml` Support:** Direct serialization/deserialization support using [`quick_xml`](https://docs.rs/quick-xml), avoiding `serde` limitations and leveraging asynchronous features.


# Planned Features

- **Schema-Based Validation:** Generate validators directly from schemas to validate XML data during reading or writing.


# Changelog

Below you can find a short list of the most important changes for each released version.

## Version 1.4.0

This release focuses on extending customization options, improving `quick_xml` serialization/deserialization, and restructuring the crate into a more modular form. It also includes numerous fixes for naming, type handling, and schema interpretation issues found in real-world XSDs.

- **Customizable Name Generation**
  The entire naming system has been reworked to allow full customization of how identifiers are formatted and generated. This includes support for handling reserved keywords, resolving name collisions, and providing user-defined naming strategies.

- **Crate Split into `xsd-parser` and `xsd-parser-types`**
  The library is now split into two crates. All runtime types and dependencies were moved into `xsd-parser-types`, reducing dependency overhead for users of the generator and improving compatibility with downstream projects.

- **Improved `quick_xml` Serializer/Deserializer**
  The `quick_xml` backend received a comprehensive refactor. Namespace serialization has been improved, state naming is simplified, async reading tests were added, and multiple edge cases involving `xs:any`, `xs:anyType`, CDATA, mixed content, and boxed types are now handled correctly.

- **Extended Interpreter and Generator Capabilities**
  Several complex schema patterns are now correctly interpreted and rendered:
  - complex/simple content combinations with mixed or emptyable bases
  - extended types with mixed content
  - facets applied to enums, unions, and simple types
  - improved handling of `xs:any`, `xs:anyAttribute`, and custom XML runtime types
  - unified configuration for XML helper types
  - new optimizer step to replace `xs:anyType` with `AnyElement` where appropriate

- **Absolute Path Support Improvements**
  Both `GeneratorFlags::BUILD_IN_ABSOLUTE_PATHS` and the new `GeneratorFlags::ABSOLUTE_PATHS_INSTEAD_USINGS` flag now ensure that all built-in types and traits are resolved using absolute paths, reducing the risk of naming conflicts.

- **Additional improvements include**
  - faster loop detection for large schemas, resulting in better performance during code generation
  - improved namespace handling and prefix resolution
  - more reliable display names during serialization
  - fixes for mixed choices, nested `xs:any`, substitution groups, and enum/union facets
  - updated schema definitions and expanded test coverage
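
To make the name-generation rework listed above more concrete, here is a minimal, self-contained sketch of the three concerns it mentions: formatting an identifier, escaping reserved keywords, and resolving collisions. `NameSketch`, `field_name`, and the short keyword list are hypothetical and are not part of the `xsd-parser` API.

```rust
use std::collections::HashSet;

/// Hypothetical name generator that remembers which identifiers are taken.
struct NameSketch {
    used: HashSet<String>,
}

impl NameSketch {
    fn new() -> Self {
        Self { used: HashSet::new() }
    }

    /// Turns an XML name into a field-style Rust identifier.
    fn field_name(&mut self, xml_name: &str) -> String {
        // Formatting: lowercase ASCII letters and digits, everything else
        // becomes an underscore.
        let mut base: String = xml_name
            .chars()
            .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_lowercase() } else { '_' })
            .collect();

        // Reserved keywords: a small, non-exhaustive list for illustration.
        const KEYWORDS: [&str; 5] = ["type", "ref", "use", "match", "struct"];
        if KEYWORDS.contains(&base.as_str()) {
            base.push('_');
        }

        // Collisions: append an increasing counter until the name is unused.
        let mut candidate = base.clone();
        let mut counter = 1;
        while !self.used.insert(candidate.clone()) {
            counter += 1;
            candidate = format!("{base}{counter}");
        }
        candidate
    }
}

fn main() {
    let mut names = NameSketch::new();
    println!("{}", names.field_name("type"));
    println!("{}", names.field_name("Type"));
    println!("{}", names.field_name("my-attr"));
}
```

A user-defined strategy would replace one or more of the three steps, for example by using raw identifiers such as `r#type` instead of a trailing underscore.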

## Version 1.3

This release introduces new configuration options, enhanced schema modularization, extended type handling, improved serializer support, and broader schema compatibility. It also addresses a range of long-standing issues with stability and schema validation.

- **New Examples and Customization Options**
  An example for custom named enum variants has been added, showcasing how to override generated names with user-defined ones.
  The renderer context was refactored and expanded with helper methods, giving developers more control over schema-level configuration and code generation behavior.

- **Schema Modularization**
  Generated code can now be split into modules per schema definition. This improves maintainability and separation of concerns when working with large or multi-schema projects.
  Meta information is now attached to schemas, making it easier to inspect and debug schema processing.

- **Support for Additional Schemas**
  Support for the OFD schema was added to the test suite, expanding real-world coverage.
  Compatibility with the ONIX schema was introduced, ensuring support for publishing and book-industry data standards.

- **Extended Type Handling**
  A new option allows the use of the full path for built-in types, ensuring unambiguous references in complex projects.
  A new `models::data::TagName` type was introduced to provide consistent and reliable handling of XML tag names across the pipeline.
  Inline element types are now lazily interpreted, reducing memory usage and improving performance for large schemas.
  Restrictions expressed through `xs:facet` are now evaluated and applied to simple type definitions, closing a gap in schema validation support.

- **Improved Serializer Support**
  A new configuration for the `quick_xml` serializer has been implemented, making it possible to fine-tune serialization behavior.
  Existing schema tests were updated to use the new serializer configuration, ensuring consistent results across different serializers.

- **Bug Fixes and Stability Improvements**
  - Fixed escaping and unescaping of mixed content and special characters
  - Resolved interpreter errors with self-referencing types
  - Fixed duplicate type names generated when an attribute named `Type` was present
  - Corrected deserialization failures for XML with `elementFormDefault=qualified`
  - Included schemas now properly inherit their target namespace instead of defaulting incorrectly
  - Restored missing integration tests and expanded coverage for real-world scenarios
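
The full-path option for built-in types mentioned above addresses a problem that is easy to reproduce by hand. In the sketch below, a schema type that happens to be named `Option` shadows the standard `Option` inside the module, and only the absolute path still reaches the standard library type. The `generated` module and its types are hand-written and hypothetical; they are not actual output of the crate.

```rust
mod generated {
    /// Hypothetical type derived from a schema element named "Option".
    /// Inside this module it shadows the `Option` from the prelude.
    #[derive(Debug, PartialEq)]
    pub struct Option {
        pub value: ::std::string::String,
    }

    /// Hypothetical type that needs both the schema type and the standard one.
    #[derive(Debug, PartialEq)]
    pub struct Menu {
        // The absolute path selects the standard library type, while the
        // plain `Option` inside the angle brackets is the schema type.
        pub selected: ::std::option::Option<Option>,
    }
}

/// Builds a small value to show that both types can be used together.
fn sample_menu() -> generated::Menu {
    generated::Menu {
        selected: Some(generated::Option {
            value: "tea".to_string(),
        }),
    }
}

fn main() {
    println!("{:?}", sample_menu());
}
```

Without the absolute path, `Option<Option>` would refer to the non-generic schema type on both sides and fail to compile.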

## Version 1.2

This release introduces a series of architectural improvements, enhanced flexibility in code generation, and broader schema compatibility.

- **Refactored Pipeline Structure**
  The internal code generation pipeline has been refactored to introduce a new `Renderer` step and an accompanying `DataTypes` model. This separation gives users more control over the rendering process, allows better extension points for customization, and prepares the architecture for further growth.

- **Refactored Serde Support**
  Support for `serde` has been moved into dedicated renderer steps. This makes it possible to support multiple versions of `serde`-based implementations, such as `serde-xml-rs` 0.7 and 0.8, without mixing code. Each renderer step now cleanly encapsulates the logic for one serialization backend.

- **Implement Support for Unstructured Data**
  Added support for `xs:any` and `xs:anyAttribute` by introducing an internal representation for unstructured XML data. This enables working with flexible or unknown schema elements and fixes a long-standing gap in schema coverage.

- **Implement Support for `BigInt` and `BigUint`**
  Schemas defining integer types without upper bounds can now be mapped to `num::BigInt` or `num::BigUint`, depending on context. This is useful when working with large numeric values.

- **Improved Documentation Support**
  XSD annotations (`xs:documentation`) are now parsed and included as Rust doc comments in the generated code, improving type-level visibility and usability.

- **Various Bug Fixes and Improvements**
  - Enum restrictions on text types are now correctly interpreted and rendered
  - Complex types in the XML Catalog schema are now rendered correctly
  - Introduced per-type `derive` settings for advanced customization
  - Various naming, escaping, and formatting issues were resolved across the pipeline
  - Generated names of nested elements now use the name of the parent element as a prefix to prevent name collisions
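
As a rough illustration of the `BigInt`/`BigUint` support listed above, the sketch below picks a Rust type name for a few XSD built-in integer types. `xs:integer`, `xs:nonNegativeInteger`, and `xs:positiveInteger` have no upper bound in XML Schema, so no fixed-width integer can hold every valid value. The function `rust_integer_type` and the exact mapping are assumptions made for this example; the crate may choose types differently depending on its configuration.

```rust
/// Hypothetical mapping from an XSD built-in integer type to a Rust type name.
/// Returns `None` for names this sketch does not know.
fn rust_integer_type(xsd_type: &str) -> Option<&'static str> {
    match xsd_type {
        // Bounded types fit into fixed-width integers.
        "xs:long" => Some("i64"),
        "xs:int" => Some("i32"),
        "xs:unsignedLong" => Some("u64"),
        "xs:unsignedInt" => Some("u32"),
        // Unbounded in both directions: needs an arbitrary-precision signed type.
        "xs:integer" => Some("num::BigInt"),
        // No upper bound, never negative: an unsigned big integer is enough.
        "xs:nonNegativeInteger" | "xs:positiveInteger" => Some("num::BigUint"),
        _ => None,
    }
}

fn main() {
    for name in ["xs:int", "xs:integer", "xs:positiveInteger"] {
        println!("{name} -> {:?}", rust_integer_type(name));
    }
}
```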

## Version 1.1

- Implemented feature to generate boxed `quick_xml` deserializers to reduce stack usage during deserialization
- Improved naming of the generated types
- Implemented feature to split generated code into multiple module files
- Improved and implemented advanced examples
- General bug fixes and improvements
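
The effect of boxing on stack usage, as mentioned for the boxed deserializers above, can be seen with `std::mem::size_of`. The types below are hypothetical stand-ins for deserializer state and are not taken from the generated code; the 4096-byte buffer is an arbitrary size chosen to make the difference visible.

```rust
use std::mem::size_of;

/// Hypothetical large piece of deserializer state.
#[allow(dead_code)]
struct LargeState {
    buffer: [u8; 4096],
}

/// State stored inline: every value of this enum is at least as large as
/// `LargeState`, even while it is `Idle`.
#[allow(dead_code)]
enum InlineState {
    Idle,
    Active(LargeState),
}

/// State stored behind a `Box`: the enum itself only holds a pointer and the
/// large payload lives on the heap.
#[allow(dead_code)]
enum BoxedState {
    Idle,
    Active(Box<LargeState>),
}

fn main() {
    println!("inline: {} bytes", size_of::<InlineState>());
    println!("boxed:  {} bytes", size_of::<BoxedState>());
}
```

With deeply nested types, inline state adds up along the whole nesting chain, while boxed state keeps each level at roughly pointer size at the cost of a heap allocation.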

## Version 1.0

- First official release of `xsd-parser`


# License

This crate is licensed under the MIT License.