# CML - Content Markup Language (Rust Implementation)
**CML (Content Markup Language)** is an XML-based markup language with profile-based extensibility for representing structured knowledge. This is the reference Rust implementation of CML v0.1, providing parsing, generation, and embedding storage for structured documents.
## Overview
CML v0.1 provides:
- ✅ **Standardized structure** - `<cml>/<header>/<body>/<footer>` for all documents
- 📋 **Profile system** - Domain-specific vocabularies (code, legal, wiki, etc.)
- 🗜️ **Byte Punch compression** - 40-70% size reduction with profile-aware dictionaries
- 🔍 **Semantic search** - Vector keywords and full-text indexing
- 📝 **XSD schemas** - Strict validation for all profiles
- 🔄 **Round-trip fidelity** - Parse → Generate → Parse yields identical results
## Quick Start
```rust
use sam_cml::{Body, CmlDocument, CmlGenerator, CmlParser, CodeBody, Footer, Header, Profile};

// Parse a CML document
let xml = std::fs::read_to_string("example.cml")?;
let doc = CmlParser::parse_cml(&xml)?;

// Generate CML
let generator = CmlGenerator;
let output = generator.generate_cml(&doc)?;

// Create a new document
let doc = CmlDocument {
    version: "0.1".to_string(),
    encoding: "utf-8".to_string(),
    profile: Profile::Code,
    header: Header {
        title: "My API Docs".to_string(),
        // ...
    },
    body: Body::Code(CodeBody { /* ... */ }),
    footer: Footer::default(),
};
```
## Profiles
### code:api (v1.0 - Ratified)
For API documentation with semantic search.
**Namespace:** `https://schemas.continuity.org/profiles/code/1.0`
**Elements:**
- `<module>` - Code modules/packages
- `<struct>` - Data structures
- `<enum>` - Enumerations
- `<trait>` - Traits/interfaces
- `<function>` - Free functions
- `<method>` - Methods on types
- `<field>` - Struct/enum fields
**Example:**
```xml
<cml version="0.1" encoding="utf-8" profile="code:api"
xmlns="https://schemas.continuity.org/cml/0.1"
xmlns:code="https://schemas.continuity.org/profiles/code/1.0">
<header>
<title>Rust Standard Library: Vec&lt;T&gt;</title>
<identifier scheme="continuity">std.collections.vec</identifier>
</header>
<body>
<code:struct id="std.vec.Vec" name="Vec" generic="T">
<code:description vector="vector array dynamic">
A contiguous growable array type.
</code:description>
<code:method id="std.vec.Vec.push" name="push">
<code:signature>pub fn push(&amp;mut self, value: T)</code:signature>
<code:description vector="append add">
Appends an element to the back.
</code:description>
<code:complexity>amortized O(1)</code:complexity>
</code:method>
</code:struct>
</body>
</cml>
```
See [examples/cml/code-api-example.cml](../../examples/cml/code-api-example.cml) for the full example.
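Rust signatures and generic type names often contain `<`, `>`, and `&`, which cannot appear literally in XML text content. The helper below is a minimal sketch of the escaping step, using only the standard library. The name `escape_xml` is hypothetical and is not part of this crate's API.

```rust
/// Replaces the five characters that have predefined XML entities.
/// Hypothetical helper for illustration; not part of the sam-cml API.
fn escape_xml(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(ch),
        }
    }
    out
}
```

Calling `escape_xml("Vec<T>")` yields `Vec&lt;T&gt;`, which is safe to place inside a `<title>` or `<code:signature>` element.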
### legal:constitution (v1.0 - Ratified)
For constitutional and statutory documents.
**Namespace:** `https://schemas.continuity.org/profiles/legal/1.0`
**Elements:**
- `<preamble>` - Document preamble
- `<article>` - Top-level articles
- `<section>` - Sections within articles
- `<clause>` - Individual clauses
- `<paragraph>` - Subdivisions
- `<amendment>` - Amendments to the document
**Example:**
```xml
<cml version="0.1" encoding="utf-8" profile="legal:constitution"
xmlns="https://schemas.continuity.org/cml/0.1"
xmlns:legal="https://schemas.continuity.org/profiles/legal/1.0">
<header>
<title>Constitution of the United States</title>
<identifier scheme="continuity">us.federal.constitution</identifier>
</header>
<body>
<legal:preamble>
We the People of the United States...
</legal:preamble>
<legal:article num="I" title="Legislative Branch" id="article-1">
<legal:section num="1" id="article-1-section-1">
<legal:clause num="1" id="article-1-section-1-clause-1">
All legislative Powers herein granted...
</legal:clause>
</legal:section>
</legal:article>
</body>
</cml>
```
See [examples/cml/legal-constitution-example.cml](../../examples/cml/legal-constitution-example.cml) for the full example.
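The `id` attributes in the example above appear to follow a hierarchical pattern (`article-1-section-1-clause-1`), with the article's Roman numeral `num` mapped to an Arabic number. The following sketch shows one way such ids could be built. The pattern is inferred from the example only, and `roman_to_int` and `clause_id` are hypothetical helpers, not functions exported by this crate.

```rust
/// Converts a Roman numeral such as "XIV" to an integer.
/// Returns None for empty input or unknown characters.
/// This does not reject every non-canonical numeral.
fn roman_to_int(numeral: &str) -> Option<u32> {
    if numeral.is_empty() {
        return None;
    }
    let mut total: u32 = 0;
    let mut prev: u32 = 0;
    // Walk right to left: a smaller value before a larger one is subtracted.
    for ch in numeral.chars().rev() {
        let value = match ch {
            'I' => 1,
            'V' => 5,
            'X' => 10,
            'L' => 50,
            'C' => 100,
            'D' => 500,
            'M' => 1000,
            _ => return None,
        };
        if value < prev {
            total = total.checked_sub(value)?;
        } else {
            total += value;
            prev = value;
        }
    }
    Some(total)
}

/// Builds an id in the style used by the example document (assumed pattern).
fn clause_id(article: &str, section: u32, clause: u32) -> Option<String> {
    let article_num = roman_to_int(article)?;
    Some(format!(
        "article-{}-section-{}-clause-{}",
        article_num, section, clause
    ))
}
```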
### bookstack:wiki (v0.1 - Local Namespace)
For knowledge base / wiki content.
**Namespace:** `https://local.namespace/continuity/bookstack/0.1` (pending ratification)
**Elements:**
- `<book>` - Top-level book
- `<chapter>` - Chapters within books
- `<page>` - Individual pages
- `<shelf>` - Collections of books
- `<content>` - Page content (markdown/html/plain)
- `<tags>` - Metadata tags
**Example:**
```xml
<cml version="0.1" encoding="utf-8" profile="bookstack:wiki"
xmlns="https://schemas.continuity.org/cml/0.1"
xmlns:bookstack="https://local.namespace/continuity/bookstack/0.1">
<header>
<title>Engineering Documentation</title>
<identifier scheme="continuity">company.engineering.rust-guide</identifier>
</header>
<body>
<bookstack:book id="book-1" title="Rust Development Guide">
<bookstack:chapter id="ch-1" title="Getting Started" num="1">
<bookstack:page id="page-1" title="Setup">
<bookstack:content format="markdown"><![CDATA[
# Development Environment Setup
...
]]></bookstack:content>
<bookstack:tags>
<tag name="rust"/>
<tag name="setup"/>
</bookstack:tags>
</bookstack:page>
</bookstack:chapter>
</bookstack:book>
</body>
</cml>
```
See [examples/cml/bookstack-wiki-example.cml](../../examples/cml/bookstack-wiki-example.cml) for the full example.
## CML Structure
### Root Element
All CML documents start with:
```xml
<cml version="0.1" encoding="utf-8" profile="namespace:profile">
```
Attributes:
- `version` - CML version (currently "0.1")
- `encoding` - Character encoding (always "utf-8")
- `profile` - Profile identifier (e.g., "code:api", "legal:constitution")
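The `profile` attribute has the shape `namespace:profile`. As a minimal sketch, assuming exactly one `:` separator and non-empty halves, it can be split as shown below. `ProfileId` and `parse_profile_id` are illustrative names and not necessarily how the reference parser represents profiles.

```rust
/// The two halves of a profile identifier such as "code:api".
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
struct ProfileId<'a> {
    namespace: &'a str,
    name: &'a str,
}

/// Splits "namespace:profile" into its parts.
/// Returns None if the separator is missing, repeated, or a half is empty.
fn parse_profile_id(raw: &str) -> Option<ProfileId<'_>> {
    let (namespace, name) = raw.split_once(':')?;
    if namespace.is_empty() || name.is_empty() || name.contains(':') {
        return None;
    }
    Some(ProfileId { namespace, name })
}
```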
### Header Section
Required metadata about the document:
```xml
<header>
<title>Document Title</title>
<author role="author">Name</author>
<date type="created" when="2025-11-07"/>
<identifier scheme="continuity">unique.document.id</identifier>
<description>Optional summary</description>
<meta name="key" value="value"/>
<link rel="related" href="https://example.com"/>
</header>
```
### Body Section
Profile-specific content. Structure depends on the profile.
### Footer Section (Optional)
Signatures, provenance, and annotations:
```xml
<footer>
<signatures>
<signature>
<signer>Alice</signer>
<timestamp>2025-11-07T10:00:00Z</timestamp>
<algorithm>ed25519</algorithm>
<value>base64-encoded-sig</value>
</signature>
</signatures>
<provenance>
<change>
<timestamp>2025-11-07T10:00:00Z</timestamp>
<author>Bob</author>
<description>Initial creation</description>
<commit>abc123</commit>
</change>
</provenance>
<annotations>
<annotation author="Carol" target="element-id">
Note about this element
</annotation>
</annotations>
</footer>
```
## Inline Semantic Elements
Available in all profiles:
- `<em>` - Emphasis
- `<strong>` - Strong importance
- `<ref target="id" type="cross">` - Cross-reference
- `<term>` - Defined term
- `<abbr>` - Abbreviation
- `<date when="2025-11-07">` - Date/time reference
- `<currency code="USD" value="100.00">` - Currency amount
- `<snip reason="redacted">` - Elided content
## Validation
XSD schemas are provided for strict validation:
```rust
use sam_cml::{validate_document, CmlParser};

let doc = CmlParser::parse_cml(&xml)?;
validate_document(&doc)?; // Validates against schema
```
**Schemas:**
- `schemas/cml-core-0.1.xsd` - Core CML structure
- `schemas/profiles/code-api-1.0.xsd` - Code profile
- `schemas/profiles/legal-constitution-1.0.xsd` - Legal profile
- `schemas/profiles/bookstack-wiki-0.1.xsd` - Bookstack profile
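The profile schema files listed above share a naming pattern: the profile identifier with `:` replaced by `-`, followed by the profile version. The sketch below derives a path from that pattern. It is an assumption based on the three listed files, and `profile_schema_path` is a hypothetical helper, not part of the crate.

```rust
/// Builds the schema path for a profile, following the naming pattern of the
/// listed schema files (assumed convention, not a documented guarantee).
fn profile_schema_path(profile: &str, version: &str) -> Option<String> {
    let (namespace, name) = profile.split_once(':')?;
    Some(format!(
        "schemas/profiles/{}-{}-{}.xsd",
        namespace, name, version
    ))
}
```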
## Byte Punch Compression
CML integrates with Byte Punch for profile-aware compression:
```rust
use byte_punch::{Compressor, Dictionary};
// Load profile dictionary
let dict = Dictionary::from_file("dictionaries/code-api.json")?;
let compressor = Compressor::new(dict);
// Compress
let compressed = compressor.compress(&cml_xml)?;
// Decompress
let decompressed = compressor.decompress(&compressed)?;
assert_eq!(cml_xml, decompressed); // 100% fidelity
```
**Compression Results:**
- Legal documents: ~65% size reduction
- Code documentation: ~50-60% size reduction
- Wiki content: ~55% size reduction
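To reproduce figures like these for your own documents, compare the byte lengths before and after compression. The function below is a small sketch of that arithmetic, assuming the percentages describe size reduction (one minus compressed size over original size). It does not call Byte Punch itself, and `size_reduction_percent` is an illustrative name.

```rust
/// Percentage of the original size saved by compression.
/// Returns None for an empty original, where the ratio is undefined.
fn size_reduction_percent(original_len: usize, compressed_len: usize) -> Option<f64> {
    if original_len == 0 {
        return None;
    }
    let ratio = compressed_len as f64 / original_len as f64;
    Some((1.0 - ratio) * 100.0)
}
```

For example, a 1000-byte document that compresses to 350 bytes gives a reduction of about 65%.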
## Testing
```bash
# Run all tests
cargo test -p sam-cml
# Run with output
cargo test -p sam-cml -- --nocapture
# Run specific test
cargo test -p sam-cml test_code_profile_roundtrip
```
**Test Coverage:**
- 42/42 tests passing ✅
- Unit tests for parser, generator, schema
- Integration tests for round-trip fidelity
- Profile-specific tests for each supported profile
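The round-trip property (Parse → Generate → Parse yields identical results) can be expressed as a generic check. The sketch below uses a toy `key=value` format in place of CML so that it is self-contained. `roundtrips`, `parse_pair`, and `generate_pair` are illustrative names, and the crate's actual tests may be structured differently.

```rust
/// Returns true when parsing, generating, and parsing again gives the same value.
fn roundtrips<T, P, G>(input: &str, parse: P, generate: G) -> bool
where
    T: PartialEq,
    P: Fn(&str) -> Option<T>,
    G: Fn(&T) -> String,
{
    let first = match parse(input) {
        Some(value) => value,
        None => return false,
    };
    let regenerated = generate(&first);
    match parse(&regenerated) {
        Some(second) => first == second,
        None => false,
    }
}

/// Toy parser standing in for a real CML parser: "key=value" with optional spaces.
fn parse_pair(s: &str) -> Option<(String, String)> {
    let (key, value) = s.split_once('=')?;
    Some((key.trim().to_string(), value.trim().to_string()))
}

/// Toy generator: writes the pair back in canonical form.
fn generate_pair(pair: &(String, String)) -> String {
    format!("{}={}", pair.0, pair.1)
}
```

Note that the comparison is between parsed values, not raw text, so insignificant whitespace in the input does not break the property.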
## Development
### Project Structure
```
crates/sam-cml/
├── src/
│ ├── lib.rs # Public API
│ ├── types.rs # CML document types
│ ├── parser.rs # XML → Rust parsing
│ ├── generator.rs # Rust → XML generation
│ └── schema.rs # Validation logic
├── tests/
│ ├── integration_test.rs # Integration tests
│ ├── v01_tests.rs # CML v0.1 tests
│ └── v01_roundtrip_tests.rs # Round-trip tests
└── Cargo.toml
```
### Adding a New Profile
1. **Define the profile in `types.rs`:**
```rust
pub enum Profile {
    Code,
    Legal,
    Bookstack,
    MyProfile, // Add here
}

pub enum Body {
    Code(CodeBody),
    Legal(LegalBody),
    Bookstack(BookstackBody),
    MyProfile(MyProfileBody), // Add here
}

pub struct MyProfileBody {
    // Your profile structure
}
```
2. **Create XSD schema:**
Create `schemas/profiles/my-profile-1.0.xsd` following the pattern of existing schemas.
3. **Add parser support:**
Update `parser.rs` to handle your profile's elements.
4. **Add generator support:**
Update `generator.rs` to output your profile's XML.
5. **Add tests:**
Create tests in `tests/` directory.
6. **Create dictionary:**
Add `crates/byte-punch/dictionaries/my-profile.json` for compression.
7. **Create example:**
Add `examples/cml/my-profile-example.cml`.
## Migration from Legacy Format
The legacy `<document>` format is deprecated but still supported. A legacy document and its CML v0.1 equivalent:
```xml
<!-- Legacy format (deprecated) -->
<document id="..." version="1.0">
<metadata>
<title>...</title>
</metadata>
<section>...</section>
</document>

<!-- CML v0.1 equivalent -->
<cml version="0.1" encoding="utf-8" profile="code:api">
<header>
<title>...</title>
<identifier scheme="continuity">...</identifier>
</header>
<body>
</body>
</cml>
```
The parser auto-detects the legacy format and upgrades it internally.
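One plausible way to distinguish the two formats is to inspect the name of the root element. The sketch below does this with plain string scanning, skipping the XML declaration, comments, and a simple DOCTYPE. It is an illustration under those assumptions, not the crate's actual detection logic, and `DocumentFormat`, `root_element_name`, and `detect_format` are hypothetical names.

```rust
/// Which top-level format a document appears to use.
#[derive(Debug, PartialEq, Eq)]
enum DocumentFormat {
    Legacy,
    CmlV01,
    Unknown,
}

/// Returns the name of the first element in the input, skipping the XML
/// declaration, comments, and a DOCTYPE without an internal subset.
fn root_element_name(xml: &str) -> Option<&str> {
    let mut rest = xml;
    loop {
        let start = rest.find('<')?;
        rest = &rest[start..];
        if rest.starts_with("<?") {
            // Processing instruction or XML declaration.
            let end = rest.find("?>")?;
            rest = &rest[end + 2..];
        } else if rest.starts_with("<!--") {
            let end = rest.find("-->")?;
            rest = &rest[end + 3..];
        } else if rest.starts_with("<!") {
            // Simplification: assumes the DOCTYPE contains no nested '>'.
            let end = rest.find('>')?;
            rest = &rest[end + 1..];
        } else {
            let after = &rest[1..];
            let end = after
                .find(|c: char| c.is_whitespace() || c == '>' || c == '/')
                .unwrap_or(after.len());
            let name = &after[..end];
            return if name.is_empty() { None } else { Some(name) };
        }
    }
}

/// Classifies a document by its root element name.
fn detect_format(xml: &str) -> DocumentFormat {
    match root_element_name(xml) {
        Some("document") => DocumentFormat::Legacy,
        Some("cml") => DocumentFormat::CmlV01,
        _ => DocumentFormat::Unknown,
    }
}
```

A real implementation would more likely read the first start tag from an XML parser's event stream, which also handles namespace prefixes and DOCTYPE subsets correctly.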
## Related Projects
- **byte-punch** - Profile-aware compression (sister crate)
- **sam-engram** - Engram packaging (coming soon)
- **rustdoc-to-cml** - Generate CML from Rust docs
## Documentation
- [MASTER_PLAN.md](../../MASTER_PLAN.md) - Complete implementation plan
- [STATUS.md](../../STATUS.md) - Current project status
- [XSD Schemas](../../schemas/) - Validation schemas
- [Examples](../../examples/cml/) - Example documents
## License
MIT OR Apache-2.0