CML - Content Markup Language (Rust Implementation)
CML (Content Markup Language) is an XML-based markup language with profile-based extensibility for representing structured knowledge. This is the reference Rust implementation of CML v0.1, providing parsing, generation, and embedding storage for structured documents.
Overview
CML v0.1 provides:
- ✅ Standardized structure - <cml>/<header>/<body>/<footer> for all documents
- 📋 Profile system - Domain-specific vocabularies (code, legal, wiki, etc.)
- 🗜️ Byte Punch compression - 40-70% size reduction with profile-aware dictionaries
- 🔍 Semantic search - Vector keywords and full-text indexing
- 📝 XSD schemas - Strict validation for all profiles
- 🔄 Round-trip fidelity - Parse → Generate → Parse yields identical results
Quick Start
use ;
// Parse a CML document
let xml = read_to_string?;
let doc = parse_cml?;
// Generate CML
let generator = CmlGenerator;
let output = generator.generate_cml?;
// Create a new document
let doc = CmlDocument ;
Profiles
code:api (v1.0 - Ratified)
For API documentation with semantic search.
Namespace: https://schemas.continuity.org/profiles/code/1.0
Elements:
- <module> - Code modules/packages
- <struct> - Data structures
- <enum> - Enumerations
- <trait> - Traits/interfaces
- <function> - Free functions
- <method> - Methods on types
- <field> - Struct/enum fields
Example:
Rust Standard Library: Vec<T>
std.collections.vec
A contiguous growable array type.
pub fn push(&mut self, value: T)
Appends an element to the back.
amortized O(1)
See examples/cml/code-api-example.cml for full example.
legal:constitution (v1.0 - Ratified)
For constitutional and statutory documents.
Namespace: https://schemas.continuity.org/profiles/legal/1.0
Elements:
- <preamble> - Document preamble
- <article> - Top-level articles
- <section> - Sections within articles
- <clause> - Individual clauses
- <paragraph> - Subdivisions
- <amendment> - Amendments to the document
Example:
Constitution of the United States
us.federal.constitution
We the People of the United States...
All legislative Powers herein granted...
See examples/cml/legal-constitution-example.cml for full example.
bookstack:wiki (v0.1 - Local Namespace)
For knowledge base / wiki content.
Namespace: https://local.namespace/continuity/bookstack/0.1 (pending ratification)
Elements:
- <book> - Top-level book
- <chapter> - Chapters within books
- <page> - Individual pages
- <shelf> - Collections of books
- <content> - Page content (markdown/html/plain)
- <tags> - Metadata tags
Example:
Engineering Documentation
company.engineering.rust-guide
See examples/cml/bookstack-wiki-example.cml for full example.
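The three profiles above can be summarized as a small lookup table. This is a minimal sketch using only the identifiers, versions, and namespaces listed in this README; the `ProfileInfo` type, the `ProfileStatus` enum, and `lookup_profile` are hypothetical names for illustration and are not taken from the crate's API.

```rust
/// Ratification status of a profile, as stated in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProfileStatus {
    Ratified,
    PendingRatification,
}

/// Hypothetical descriptor type (an assumption, not the crate's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ProfileInfo {
    pub id: &'static str,
    pub version: &'static str,
    pub namespace: &'static str,
    pub status: ProfileStatus,
}

/// Profile data copied from the sections above.
pub static PROFILES: [ProfileInfo; 3] = [
    ProfileInfo {
        id: "code:api",
        version: "1.0",
        namespace: "https://schemas.continuity.org/profiles/code/1.0",
        status: ProfileStatus::Ratified,
    },
    ProfileInfo {
        id: "legal:constitution",
        version: "1.0",
        namespace: "https://schemas.continuity.org/profiles/legal/1.0",
        status: ProfileStatus::Ratified,
    },
    ProfileInfo {
        id: "bookstack:wiki",
        version: "0.1",
        namespace: "https://local.namespace/continuity/bookstack/0.1",
        status: ProfileStatus::PendingRatification,
    },
];

/// Returns the descriptor for a profile identifier, if it is one of the three above.
pub fn lookup_profile(id: &str) -> Option<&'static ProfileInfo> {
    PROFILES.iter().find(|p| p.id == id)
}
```

For example, `lookup_profile("bookstack:wiki")` returns an entry whose status is `PendingRatification`, while an unlisted identifier returns `None`.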
CML Structure
Root Element
All CML documents start with:
Attributes:
- version - CML version (currently "0.1")
- encoding - Character encoding (always "utf-8")
- profile - Profile identifier (e.g., "code:api", "legal:constitution")
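As an illustration of the three root attributes, the sketch below builds an opening `<cml>` tag as a string. It assumes the attributes appear in the order listed and that no namespace declaration is needed; the real generator may emit more. `cml_root_open_tag` and `escape_attr` are hypothetical helper names, not part of the crate.

```rust
/// Escapes characters that are not allowed verbatim inside a
/// double-quoted XML attribute value.
fn escape_attr(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for c in raw.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '"' => out.push_str("&quot;"),
            other => out.push(other),
        }
    }
    out
}

/// Hypothetical helper: builds the opening root tag for a given profile.
/// Version and encoding are fixed to the values this README documents.
pub fn cml_root_open_tag(profile: &str) -> String {
    format!(
        r#"<cml version="0.1" encoding="utf-8" profile="{}">"#,
        escape_attr(profile)
    )
}
```

Calling `cml_root_open_tag("code:api")` yields `<cml version="0.1" encoding="utf-8" profile="code:api">`.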
Header Section
Required metadata about the document:
Document Title
Name
unique.document.id
Optional summary
Body Section
Profile-specific content. Structure depends on the profile.
Footer Section (Optional)
Signatures, provenance, and annotations:
Alice
2025-11-07T10:00:00Z
ed25519
base64-encoded-sig
2025-11-07T10:00:00Z
Bob
Initial creation
abc123
Note about this element
Inline Semantic Elements
Available in all profiles:
- <em> - Emphasis
- <strong> - Strong importance
- <ref target="id" type="cross"> - Cross-reference
- <term> - Defined term
- <abbr> - Abbreviation
- <date when="2025-11-07"> - Date/time reference
- <currency code="USD" value="100.00"> - Currency amount
- <snip reason="redacted"> - Elided content
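To show how these inline elements combine a name, optional attributes, and text, here is a small rendering sketch. It assumes each element wraps text content, which this README does not state explicitly. `inline_element` and `escape_xml` are hypothetical helpers, not the crate's generator.

```rust
/// Escapes the five predefined XML special characters.
fn escape_xml(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for c in raw.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            other => out.push(other),
        }
    }
    out
}

/// Hypothetical helper: renders one inline element with its attributes
/// and escaped text content. The element name is trusted as-is.
pub fn inline_element(name: &str, attrs: &[(&str, &str)], text: &str) -> String {
    let mut out = format!("<{name}");
    for (key, value) in attrs {
        out.push_str(&format!(" {key}=\"{}\"", escape_xml(value)));
    }
    out.push('>');
    out.push_str(&escape_xml(text));
    out.push_str(&format!("</{name}>"));
    out
}
```

For instance, `inline_element("date", &[("when", "2025-11-07")], "7 November 2025")` produces `<date when="2025-11-07">7 November 2025</date>`.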
Validation
XSD schemas are provided for strict validation:
use validate_document;
let doc = parse_cml?;
validate_document?; // Validates against schema
Schemas:
- schemas/cml-core-0.1.xsd - Core CML structure
- schemas/profiles/code-api-1.0.xsd - Code profile
- schemas/profiles/legal-constitution-1.0.xsd - Legal profile
- schemas/profiles/bookstack-wiki-0.1.xsd - Bookstack profile
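The profile schema file names listed above appear to follow a `{domain}-{name}-{version}.xsd` pattern. The sketch below derives a path from a profile identifier and version under that assumption; `profile_schema_path` is a hypothetical helper, and the crate may resolve schemas differently.

```rust
/// Hypothetical helper: derives the schema path for a profile identifier
/// such as "code:api" and a version such as "1.0", following the naming
/// pattern of the schema files listed in this README.
pub fn profile_schema_path(profile_id: &str, version: &str) -> Option<String> {
    // A profile identifier is expected to look like "domain:name".
    let (domain, name) = profile_id.split_once(':')?;
    if domain.is_empty() || name.is_empty() || version.is_empty() {
        return None;
    }
    Some(format!("schemas/profiles/{domain}-{name}-{version}.xsd"))
}
```

With this sketch, `profile_schema_path("legal:constitution", "1.0")` gives `schemas/profiles/legal-constitution-1.0.xsd`, and an identifier without a colon gives `None`.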
Byte Punch Compression
CML integrates with Byte Punch for profile-aware compression:
use ;
// Load profile dictionary
let dict = from_file?;
let compressor = new;
// Compress
let compressed = compressor.compress?;
// Decompress
let decompressed = compressor.decompress?;
assert_eq!; // 100% fidelity
Compression Results:
- Legal documents: ~65% compression
- Code documentation: ~50-60% compression
- Wiki content: ~55% compression
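To give a feel for why a profile-aware dictionary helps on tag-heavy documents, here is a toy dictionary-substitution codec. It is not Byte Punch's actual algorithm or wire format, which this README does not describe. The marker byte, the function names, and the two-byte reference encoding are all assumptions made for illustration.

```rust
/// Marker byte that introduces a dictionary reference in this toy format.
const MARKER: u8 = 0x00;

/// Replaces dictionary entries with two-byte references (marker + index).
/// Returns None if the input contains the marker byte or the dictionary
/// has more entries than a one-byte index can address.
pub fn toy_compress(input: &[u8], dict: &[&[u8]]) -> Option<Vec<u8>> {
    if input.contains(&MARKER) || dict.len() > 256 {
        return None;
    }
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        // Find the longest non-empty dictionary entry matching at position i.
        let mut best: Option<(usize, usize)> = None; // (index, length)
        for (idx, entry) in dict.iter().enumerate() {
            if !entry.is_empty()
                && input[i..].starts_with(entry)
                && best.map_or(true, |(_, len)| entry.len() > len)
            {
                best = Some((idx, entry.len()));
            }
        }
        match best {
            Some((idx, len)) => {
                out.push(MARKER);
                out.push(idx as u8);
                i += len;
            }
            None => {
                out.push(input[i]);
                i += 1;
            }
        }
    }
    Some(out)
}

/// Expands references back into dictionary entries.
/// Returns None on a truncated or out-of-range reference.
pub fn toy_decompress(data: &[u8], dict: &[&[u8]]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        if data[i] == MARKER {
            let idx = *data.get(i + 1)? as usize;
            out.extend_from_slice(dict.get(idx)?);
            i += 2;
        } else {
            out.push(data[i]);
            i += 1;
        }
    }
    Some(out)
}
```

With a dictionary holding the legal profile's `<section>`, `</section>`, `<clause>`, and `</clause>` tags, the 37-byte input `<section><clause>x</clause></section>` shrinks to 9 bytes and decompresses back to the original. Real documents carry far more text per tag, so actual ratios depend on content.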
Testing
# Run all tests
# Run with output
# Run specific test
Test Coverage:
- 42/42 tests passing ✅
- Unit tests for parser, generator, schema
- Integration tests for round-trip fidelity
- Profile-specific tests for each supported profile
Development
Project Structure
crates/sam-cml/
├── src/
│ ├── lib.rs # Public API
│ ├── types.rs # CML document types
│ ├── parser.rs # XML → Rust parsing
│ ├── generator.rs # Rust → XML generation
│ └── schema.rs # Validation logic
├── tests/
│ ├── integration_test.rs # Integration tests
│ ├── v01_tests.rs # CML v0.1 tests
│ └── v01_roundtrip_tests.rs # Round-trip tests
└── Cargo.toml
Adding a New Profile
- Define the profile in types.rs:
- Create XSD schema:
Create schemas/profiles/my-profile-1.0.xsd following the pattern of existing schemas.
- Add parser support:
Update parser.rs to handle your profile's elements.
- Add generator support:
Update generator.rs to output your profile's XML.
- Add tests:
Create tests in tests/ directory.
- Create dictionary:
Add crates/byte-punch/dictionaries/my-profile.json for compression.
- Create example:
Add examples/cml/my-profile-example.cml.
Migration from Legacy Format
The old <document> format is deprecated but still supported:
<!-- OLD (deprecated) -->
...
...
<!-- NEW (CML v0.1) -->
...
...
<!-- Profile-specific content -->
The parser auto-detects the legacy format and upgrades it internally.
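One plausible way to tell the two formats apart is to look at the name of the root element. The sketch below does that with plain string handling; it is not the crate's detection logic, and `RootFormat` and `detect_root_format` are hypothetical names. It skips a leading XML declaration and comments but does not handle DOCTYPE declarations or a byte-order mark.

```rust
/// Which document format a root element indicates.
#[derive(Debug, PartialEq, Eq)]
pub enum RootFormat {
    /// Root element is <cml> (CML v0.1).
    Cml,
    /// Root element is <document> (deprecated legacy format).
    LegacyDocument,
    /// Anything else, including malformed input.
    Unknown,
}

/// Hypothetical helper: inspects the first element name in the input.
pub fn detect_root_format(xml: &str) -> RootFormat {
    let mut rest = xml.trim_start();
    // Skip any XML declaration, processing instructions, and comments.
    loop {
        if let Some(after) = rest.strip_prefix("<?") {
            match after.find("?>") {
                Some(end) => rest = after[end + 2..].trim_start(),
                None => return RootFormat::Unknown,
            }
        } else if let Some(after) = rest.strip_prefix("<!--") {
            match after.find("-->") {
                Some(end) => rest = after[end + 3..].trim_start(),
                None => return RootFormat::Unknown,
            }
        } else {
            break;
        }
    }
    let Some(after_lt) = rest.strip_prefix('<') else {
        return RootFormat::Unknown;
    };
    // The element name ends at whitespace, '>' or '/'.
    let name_end = after_lt
        .find(|c: char| c.is_whitespace() || c == '>' || c == '/')
        .unwrap_or(after_lt.len());
    match &after_lt[..name_end] {
        "cml" => RootFormat::Cml,
        "document" => RootFormat::LegacyDocument,
        _ => RootFormat::Unknown,
    }
}
```

A caller could branch on the result, for example routing `RootFormat::LegacyDocument` through an upgrade step before normal parsing.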
Related Projects
- byte-punch - Profile-aware compression (sister crate)
- sam-engram - Engram packaging (coming soon)
- rustdoc-to-cml - Generate CML from Rust docs
Documentation
- MASTER_PLAN.md - Complete implementation plan
- STATUS.md - Current project status
- XSD Schemas - Validation schemas
- Examples - Example documents
License
MIT OR Apache-2.0