Data Modelling SDK
Shared SDK for model operations across platforms (API, WASM, Native).
Copyright (c) 2025 Mark Olliver - Licensed under MIT
CLI Tool
The SDK includes a command-line interface (CLI) for importing and exporting schemas. See CLI.md for detailed usage instructions.
Quick Start:
# Build the CLI (with OpenAPI and ODPS validation support)
cargo build --release --features "openapi,odps-validation"

# Run it (the binary name below is a placeholder)
./target/release/<cli-binary> --help
Note: The CLI now includes OpenAPI support by default in GitHub releases. For local builds, include the openapi feature to enable OpenAPI import/export. Include odps-validation to enable ODPS schema validation.
ODPS Import/Export Examples:
# Import ODPS YAML file
# Export ODCS to ODPS format
# Test ODPS round-trip (requires odps-validation feature)
Features
- Storage Backends: File system, browser storage (IndexedDB/localStorage), and HTTP API
- Database Backends: DuckDB (embedded) and PostgreSQL for high-performance queries
- Model Loading/Saving: Load and save models from various storage backends
- Import/Export: Import from SQL (PostgreSQL, MySQL, SQLite, Generic, Databricks), ODCS, ODCL, JSON Schema, AVRO, Protobuf (proto2/proto3), CADS, ODPS, BPMN, DMN, OpenAPI; Export to various formats
- Decision Records (DDL): MADR-compliant Architecture Decision Records with full lifecycle management
- Knowledge Base (KB): Domain-partitioned knowledge articles with Markdown content support
- Business Domain Schema: Organize systems, CADS nodes, and ODCS nodes within business domains
- Universal Converter: Convert any format to ODCS v3.1.0 format
- OpenAPI to ODCS Converter: Convert OpenAPI schema components to ODCS table definitions
- Validation: Table and relationship validation (naming conflicts, circular dependencies)
- Relationship Modeling: Crow's feet notation cardinality (zeroOrOne, exactlyOne, zeroOrMany, oneOrMany) and data flow directions
- Schema Reference: JSON Schema definitions for all supported formats in the schemas/ directory
- Database Sync: Bidirectional sync between YAML files and database with change detection
- Git Hooks: Automatic pre-commit and post-checkout hooks for database synchronization
Decision Records (DDL)
The SDK includes full support for Architecture Decision Records following the MADR (Markdown Any Decision Records) format. Decisions are stored as YAML files and can be exported to Markdown for documentation.
Decision File Structure
workspace/
├── decisions/
│   ├── index.yaml                        # Decision index with metadata
│   ├── 0001-use-postgresql-database.yaml # Individual decision records
│   ├── 0002-adopt-microservices.yaml
│   └── ...
└── decisions-md/                         # Markdown exports (auto-generated)
    ├── 0001-use-postgresql-database.md
    └── 0002-adopt-microservices.md
Decision Lifecycle
Decisions follow a defined lifecycle with these statuses:
- Draft: Initial proposal, open for discussion
- Proposed: Formal proposal awaiting decision
- Accepted: Approved and in effect
- Deprecated: No longer recommended but still valid
- Superseded: Replaced by a newer decision
- Rejected: Not approved
Decision Categories
- Architecture: System design and structure decisions
- Technology: Technology stack and tool choices
- Process: Development workflow decisions
- Security: Security-related decisions
- Data: Data modeling and storage decisions
- Integration: External system integration decisions
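The lifecycle statuses and categories above translate naturally into union types when handling decision records as JSON (for example, via the WASM bindings described below). The sketch is purely illustrative; the field names are assumptions, not the SDK's actual schema.

```typescript
// Illustrative sketch only: decision record metadata as TypeScript types.
// Field names (id, title, status, category) are assumptions, not the SDK's schema.
type DecisionStatus =
  | "draft" | "proposed" | "accepted" | "deprecated" | "superseded" | "rejected";

type DecisionCategory =
  | "architecture" | "technology" | "process" | "security" | "data" | "integration";

interface DecisionRecord {
  id: string;               // e.g. "0001"
  title: string;            // e.g. "Use PostgreSQL database"
  status: DecisionStatus;
  category: DecisionCategory;
}
```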
CLI Commands
# Create a new decision
# List all decisions
# Show a specific decision
# Filter by status or category
# Export decisions to Markdown
Knowledge Base (KB)
The SDK provides a Knowledge Base system for storing domain knowledge, guides, and documentation as structured articles.
Knowledge Base File Structure
workspace/
├── knowledge/
│   ├── index.yaml                         # Knowledge index with metadata
│   ├── 0001-api-authentication-guide.yaml # Individual knowledge articles
│   ├── 0002-deployment-procedures.yaml
│   └── ...
└── knowledge-md/                          # Markdown exports (auto-generated)
    ├── 0001-api-authentication-guide.md
    └── 0002-deployment-procedures.md
Article Types
- Guide: Step-by-step instructions and tutorials
- Reference: API documentation and technical references
- Concept: Explanations of concepts and principles
- Tutorial: Learning-focused content with examples
- Troubleshooting: Problem-solving guides
- Runbook: Operational procedures
Article Status
- Draft: Work in progress
- Review: Ready for peer review
- Published: Approved and available
- Archived: No longer actively maintained
- Deprecated: Outdated, pending replacement
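As with decision records, the article types and statuses map onto simple union types when handling articles as JSON. Again, this is an illustrative sketch; the field names (including contentMarkdown) are assumptions rather than the SDK's schema.

```typescript
// Illustrative sketch only: knowledge article metadata as TypeScript types.
type ArticleType = "guide" | "reference" | "concept" | "tutorial" | "troubleshooting" | "runbook";
type ArticleStatus = "draft" | "review" | "published" | "archived" | "deprecated";

interface KnowledgeArticle {
  id: string;               // e.g. "0001"
  title: string;            // e.g. "API authentication guide"
  domain: string;           // articles are partitioned by business domain
  type: ArticleType;
  status: ArticleStatus;
  contentMarkdown: string;  // Markdown body of the article
}
```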
CLI Commands
# Create a new knowledge article
# List all articles
# Show a specific article
# Filter by type, status, or domain
# Search article content
# Export articles to Markdown
File Structure
The SDK organizes files using a flat file naming convention within a workspace:
workspace/
├── .git/ # Git folder (if present)
├── README.md # Repository files
├── workspace.yaml # Workspace metadata with assets and relationships
├── myworkspace_sales_customers.odcs.yaml # ODCS table: workspace_domain_resource.type.yaml
├── myworkspace_sales_orders.odcs.yaml # Another ODCS table in sales domain
├── myworkspace_sales_crm_leads.odcs.yaml # ODCS table with system: workspace_domain_system_resource.type.yaml
├── myworkspace_analytics_metrics.odps.yaml # ODPS product file
├── myworkspace_platform_api.cads.yaml # CADS asset file
├── myworkspace_platform_api.openapi.yaml # OpenAPI specification file
├── myworkspace_ops_approval.bpmn.xml # BPMN process model file
└── myworkspace_ops_routing.dmn.xml # DMN decision model file
File Naming Convention
Files follow the pattern: {workspace}_{domain}_{system}_{resource}.{type}.{ext}
- workspace: The workspace name (required)
- domain: The business domain (required)
- system: The system within the domain (optional)
- resource: The resource/asset name (required)
- type: The asset type (odcs, odps, cads, openapi, bpmn, dmn)
- ext: The file extension (yaml, xml, json)
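The convention can be applied mechanically. The helper below is a hypothetical illustration of the pattern (it is not part of the SDK); it assumes the optional system segment is simply omitted when absent.

```typescript
// Hypothetical helper illustrating {workspace}_{domain}_{system}_{resource}.{type}.{ext}.
type AssetType = 'odcs' | 'odps' | 'cads' | 'openapi' | 'bpmn' | 'dmn';
type AssetExt = 'yaml' | 'xml' | 'json';

function assetFileName(
  workspace: string,
  domain: string,
  resource: string,
  type: AssetType,
  ext: AssetExt,
  system?: string,
): string {
  // The optional system segment is dropped when not provided.
  const parts = [workspace, domain, ...(system ? [system] : []), resource];
  return `${parts.join('_')}.${type}.${ext}`;
}

// "myworkspace_sales_crm_leads.odcs.yaml"
console.log(assetFileName('myworkspace', 'sales', 'leads', 'odcs', 'yaml', 'crm'));
// "myworkspace_analytics_metrics.odps.yaml"
console.log(assetFileName('myworkspace', 'analytics', 'metrics', 'odps', 'yaml'));
```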
Workspace-Level Files
workspace.yaml: Workspace metadata including domains, systems, asset references, and relationships
Asset Types
- *.odcs.yaml: ODCS table/schema definitions (Open Data Contract Standard)
- *.odps.yaml: ODPS data product definitions (Open Data Product Standard)
- *.cads.yaml: CADS asset definitions (architecture assets)
- *.openapi.yaml / *.openapi.json: OpenAPI specification files
- *.bpmn.xml: BPMN 2.0 process model files
- *.dmn.xml: DMN 1.3 decision model files
Usage
File System Backend (Native Apps)
use data_modelling_sdk::{FileSystemStorageBackend, ModelLoader};

// Inside an async function returning a Result; paths and exact type paths are illustrative.
let storage = FileSystemStorageBackend::new("./workspace");
let loader = ModelLoader::new(storage);
let result = loader.load_model().await?;
Browser Storage Backend (WASM Apps)
use data_modelling_sdk::{BrowserStorageBackend, ModelLoader};

// Inside an async function returning a Result; constructor arguments are illustrative.
let storage = BrowserStorageBackend::new();
let loader = ModelLoader::new(storage);
let result = loader.load_model().await?;
API Backend (Online Mode)
use data_modelling_sdk::{ApiStorageBackend, ModelLoader};

// Inside an async function returning a Result; the endpoint URL is illustrative.
let storage = ApiStorageBackend::new("https://api.example.com");
let loader = ModelLoader::new(storage);
let result = loader.load_model().await?;
WASM Bindings (Browser/Offline Mode)
The SDK exposes WASM bindings for parsing and export operations, enabling offline functionality in web applications.
Build the WASM module:
Use in JavaScript/TypeScript:
import init, { parseOdcsYaml, exportToOdcsYaml } from './pkg/data_modelling_sdk.js';

// Initialize the module
await init();

// Parse ODCS YAML
const yaml = `apiVersion: v3.1.0
kind: DataContract
name: users
schema:
  fields:
    - name: id
      type: bigint`;
const resultJson = parseOdcsYaml(yaml);
const result = JSON.parse(resultJson);
console.log(result);

// Export to ODCS YAML (re-serialize the parsed workspace structure)
const workspace = JSON.stringify(result);
const exportedYaml = exportToOdcsYaml(workspace);
console.log(exportedYaml);
Available WASM Functions:
Import/Export:
- `parseOdcsYaml(yamlContent: string): string` - Parse ODCS YAML to workspace structure
- `exportToOdcsYaml(workspaceJson: string): string` - Export workspace to ODCS YAML
- `importFromSql(sqlContent: string, dialect: string): string` - Import from SQL (supported dialects: "postgres"/"postgresql", "mysql", "sqlite", "generic", "databricks")
- `importFromAvro(avroContent: string): string` - Import from AVRO schema
- `importFromJsonSchema(jsonSchemaContent: string): string` - Import from JSON Schema
- `importFromProtobuf(protobufContent: string): string` - Import from Protobuf
- `importFromCads(yamlContent: string): string` - Import CADS (Compute Asset Description Specification) YAML
- `importFromOdps(yamlContent: string): string` - Import ODPS (Open Data Product Standard) YAML
- `exportToOdps(productJson: string): string` - Export ODPS data product to YAML format
- `validateOdps(yamlContent: string): void` - Validate ODPS YAML content against the ODPS JSON Schema (requires the odps-validation feature)
- `importBpmnModel(domainId: string, xmlContent: string, modelName?: string): string` - Import BPMN 2.0 XML model
- `importDmnModel(domainId: string, xmlContent: string, modelName?: string): string` - Import DMN 1.3 XML model
- `importOpenapiSpec(domainId: string, content: string, apiName?: string): string` - Import OpenAPI 3.1.1 specification
- `exportToSql(workspaceJson: string, dialect: string): string` - Export to SQL (supported dialects: "postgres"/"postgresql", "mysql", "sqlite", "generic", "databricks")
- `exportToAvro(workspaceJson: string): string` - Export to AVRO schema
- `exportToJsonSchema(workspaceJson: string): string` - Export to JSON Schema
- `exportToProtobuf(workspaceJson: string): string` - Export to Protobuf
- `exportToCads(workspaceJson: string): string` - Export to CADS YAML
- `exportToOdps(workspaceJson: string): string` - Export to ODPS YAML
- `exportBpmnModel(xmlContent: string): string` - Export BPMN model to XML
- `exportDmnModel(xmlContent: string): string` - Export DMN model to XML
- `exportOpenapiSpec(content: string, sourceFormat: string, targetFormat?: string): string` - Export OpenAPI spec with optional format conversion
- `convertToOdcs(input: string, format?: string): string` - Universal converter: convert any format to ODCS v3.1.0
- `convertOpenapiToOdcs(openapiContent: string, componentName: string, tableName?: string): string` - Convert OpenAPI schema component to ODCS table
- `analyzeOpenapiConversion(openapiContent: string, componentName: string): string` - Analyze OpenAPI component conversion feasibility
- `migrateDataflowToDomain(dataflowYaml: string, domainName?: string): string` - Migrate DataFlow YAML to Domain schema format
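As a sketch of how these calls compose, assuming the import functions return the same workspace JSON string that the export functions accept (the SQL text, dialect, and the convertToOdcs format hint are illustrative):

```typescript
// Sketch: import a CREATE TABLE statement, then re-export it as an AVRO schema.
import init, { importFromSql, exportToAvro, convertToOdcs } from './pkg/data_modelling_sdk.js';

await init();

const sql = 'CREATE TABLE customers (id BIGINT PRIMARY KEY, email TEXT NOT NULL);';

// Import SQL into a workspace JSON string, then export it as AVRO.
const workspaceJson = importFromSql(sql, 'postgres');
const avroSchema = exportToAvro(workspaceJson);
console.log(avroSchema);

// Or normalise any supported input straight to ODCS v3.1.0 via the universal converter
// (the 'sql' format hint is an assumption about the accepted values).
const odcsYaml = convertToOdcs(sql, 'sql');
console.log(odcsYaml);
```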
Domain Operations:
- `createDomain(name: string): string` - Create a new business domain
- `addSystemToDomain(workspaceJson: string, domainId: string, systemJson: string): string` - Add a system to a domain
- `addCadsNodeToDomain(workspaceJson: string, domainId: string, nodeJson: string): string` - Add a CADS node to a domain
- `addOdcsNodeToDomain(workspaceJson: string, domainId: string, nodeJson: string): string` - Add an ODCS node to a domain
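A minimal sketch of the domain operations, assuming createDomain returns the new domain as JSON with an id field and that the placeholder workspace and system payload shown are representative:

```typescript
// Sketch: create a domain, then attach a system to it.
import init, { createDomain, addSystemToDomain } from './pkg/data_modelling_sdk.js';

await init();

// Placeholder workspace JSON; in practice this is the output of parseOdcsYaml or an import call.
const workspaceJson = '{"domains": []}';

// Assumption: the created domain JSON carries an `id` field, and a system only needs a name here.
const domain = JSON.parse(createDomain('sales'));
const systemJson = JSON.stringify({ name: 'crm' });

const updatedWorkspace = addSystemToDomain(workspaceJson, domain.id, systemJson);
console.log(JSON.parse(updatedWorkspace));
```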
Filtering:
- `filterNodesByOwner(workspaceJson: string, owner: string): string` - Filter tables by owner
- `filterRelationshipsByOwner(workspaceJson: string, owner: string): string` - Filter relationships by owner
- `filterNodesByInfrastructureType(workspaceJson: string, infrastructureType: string): string` - Filter tables by infrastructure type
- `filterRelationshipsByInfrastructureType(workspaceJson: string, infrastructureType: string): string` - Filter relationships by infrastructure type
- `filterByTags(workspaceJson: string, tag: string): string` - Filter nodes and relationships by tag (supports Simple, Pair, and List tag formats)
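Each filter takes a workspace JSON string and returns a filtered copy, so calls can be applied independently or chained. A small sketch (the owner and tag values are illustrative):

```typescript
// Sketch: narrow a workspace by owner and by tag.
import init, { filterNodesByOwner, filterByTags } from './pkg/data_modelling_sdk.js';

await init();

// Placeholder workspace JSON; in practice the output of an import or parse call.
const workspaceJson = '{}';

// Keep only tables owned by a given team.
const ownedByDataTeam = filterNodesByOwner(workspaceJson, 'data-platform-team');

// Keep only nodes/relationships carrying a given tag (Simple form shown; Pair and List tags are also supported).
const taggedPii = filterByTags(workspaceJson, 'pii');

console.log(JSON.parse(ownedByDataTeam), JSON.parse(taggedPii));
```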
Database Support
The SDK includes an optional database layer for high-performance queries on large workspaces (10-100x faster than file-based operations).
Database Backends
- DuckDB: Embedded analytical database, ideal for CLI tools and local development
- PostgreSQL: Server-based database for team environments and shared access
Quick Start
# Build CLI with database support
# Initialize database for a workspace
# Sync YAML files to database
# Query the database
Configuration
Database settings are stored in .data-model.toml. The key names below are illustrative; see CLI.md for the authoritative settings reference:
[database]
backend = "duckdb"            # or "postgres"
path = ".data-model.duckdb"

[sync]
enabled = true

[hooks]
enabled = true
Git Hooks Integration
When initializing a database in a Git repository, the CLI automatically installs:
- Pre-commit hook: Exports database changes to YAML before commit
- Post-checkout hook: Syncs YAML files to database after checkout
This ensures YAML files and database stay in sync across branches and collaborators.
See CLI.md for detailed database command documentation.
Development
Pre-commit Hooks
This project uses pre-commit hooks to ensure code quality. Install them with:
# Install pre-commit (if not already installed)
pip install pre-commit

# Install the git hooks
pre-commit install

# Run hooks manually on all files
pre-commit run --all-files
The hooks will automatically run on git commit and check:
- Rust formatting (cargo fmt)
- Rust linting (cargo clippy)
- Security audit (cargo audit)
- File formatting (trailing whitespace, end of file, etc.)
- YAML/TOML/JSON syntax
CI/CD
GitHub Actions workflows automatically run on push and pull requests:
- Lint: Format check, clippy, and security audit
- Test: Unit and integration tests on Linux, macOS, and Windows
- Build: Release build verification
- Publish: Automatic publishing to crates.io on main branch (after all checks pass)
Documentation
- Architecture Guide: Comprehensive guide to project architecture, design decisions, and use cases
- Schema Overview Guide: Detailed documentation of all supported schemas
The SDK supports:
- ODCS v3.1.0: Primary format for data contracts (tables)
- ODCL v1.2.1: Legacy data contract format (backward compatibility)
- ODPS: Data products linking to ODCS Tables
- CADS v1.0: Compute assets (AI/ML models, applications, pipelines)
- BPMN 2.0: Business Process Model and Notation (process models stored in native XML)
- DMN 1.3: Decision Model and Notation (decision models stored in native XML)
- OpenAPI 3.1.1: API specifications (stored in native YAML or JSON)
- Business Domain Schema: Organize systems, CADS nodes, and ODCS nodes
- Universal Converter: Convert any format to ODCS v3.1.0
- OpenAPI to ODCS Converter: Convert OpenAPI schema components to ODCS table definitions
Schema Reference Directory
The SDK maintains JSON Schema definitions for all supported formats in the schemas/ directory:
- ODCS v3.1.0: schemas/odcs-json-schema-v3.1.0.json - Primary format for data contracts
- ODCL v1.2.1: schemas/odcl-json-schema-1.2.1.json - Legacy data contract format
- ODPS: schemas/odps-json-schema-latest.json - Data products linking to ODCS tables
- CADS v1.0: schemas/cads.schema.json - Compute assets (AI/ML models, applications, pipelines)
These schemas serve as authoritative references for validation, documentation, and compliance. See schemas/README.md for detailed information about each schema.
Data Pipeline
The SDK includes a complete data pipeline for ingesting JSON data, inferring schemas, and mapping to target formats.
Pipeline Features
- JSON Ingestion: Ingest JSON/JSONL files into a staging database with deduplication
- S3 Ingestion: Ingest directly from AWS S3 buckets with streaming downloads (feature: s3)
- Databricks Volumes: Ingest from Databricks Unity Catalog Volumes (feature: databricks)
- Progress Reporting: Real-time progress bars with throughput metrics
- Schema Inference: Automatically infer types, formats, and nullability from data (a conceptual sketch follows this list)
- LLM Refinement: Optionally enhance schemas using Ollama or local LLM models
- Schema Mapping: Map inferred schemas to target schemas with transformation generation
- Checkpointing: Resume pipelines from the last successful stage
- Secure Credentials: Credential wrapper types preventing accidental logging
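To picture the schema inference stage, the code below is a conceptual illustration of type and nullability inference over JSON records; it is not the SDK's implementation, and real inference also detects formats such as dates and UUIDs.

```typescript
// Conceptual sketch: collect the observed type for each field and mark a field
// nullable if it is null, undefined, or missing in any record.
type InferredField = { name: string; types: Set<string>; nullable: boolean };

function inferSchema(records: Record<string, unknown>[]): InferredField[] {
  const fields = new Map<string, InferredField>();
  for (const record of records) {
    for (const [name, value] of Object.entries(record)) {
      const field = fields.get(name) ?? { name, types: new Set<string>(), nullable: false };
      if (value === null || value === undefined) {
        field.nullable = true;
      } else {
        field.types.add(typeof value);
      }
      fields.set(name, field);
    }
  }
  // A field absent from some records is also nullable.
  for (const field of fields.values()) {
    if (records.some((r) => !(field.name in r))) field.nullable = true;
  }
  return [...fields.values()];
}

console.log(inferSchema([{ id: 1, email: 'a@example.com' }, { id: 2, email: null }]));
```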
Quick Start
# Build with pipeline support
# Initialize staging database
# Run full pipeline
# Check pipeline status
Schema Mapping
Map source schemas to target schemas with fuzzy matching and transformation script generation:
# Map schemas with fuzzy matching
# Generate SQL transformation
# Generate Python transformation
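To give a feel for the fuzzy-matching step, the following is a conceptual sketch rather than the SDK's algorithm: it normalises field names and scores similarity with a simple bigram overlap before pairing each source field with its best target candidate.

```typescript
// Conceptual sketch of fuzzy field matching between a source and a target schema.
function normalise(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, '');
}

function similarity(a: string, b: string): number {
  const na = normalise(a);
  const nb = normalise(b);
  if (na === nb) return 1;
  if (na.includes(nb) || nb.includes(na)) return 0.8;
  // Fallback: shared character bigrams (Dice coefficient).
  const bigrams = (s: string) =>
    new Set(Array.from({ length: Math.max(s.length - 1, 0) }, (_, i) => s.slice(i, i + 2)));
  const ba = bigrams(na);
  const bb = bigrams(nb);
  const overlap = [...ba].filter((g) => bb.has(g)).length;
  return (2 * overlap) / (ba.size + bb.size || 1);
}

function mapFields(source: string[], target: string[], threshold = 0.6): Array<[string, string]> {
  return source.flatMap((s) => {
    const best = target.map((t) => [t, similarity(s, t)] as const).sort((x, y) => y[1] - x[1])[0];
    return best && best[1] >= threshold ? [[s, best[0]] as [string, string]] : [];
  });
}

// [['customer_id', 'customerId'], ['e-mail', 'email']]
console.log(mapFields(['customer_id', 'e-mail'], ['customerId', 'email', 'createdAt']));
```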
See CLI.md for detailed pipeline and mapping documentation.
Status
The SDK provides comprehensive support for multiple data modeling formats:
- ✅ Storage backend abstraction and implementations
- ✅ Database backend abstraction (DuckDB, PostgreSQL)
- ✅ Model loader/saver structure
- ✅ Full import/export implementation for all supported formats
- ✅ Validation module structure
- ✅ Business Domain schema support
- ✅ Universal format converter
- ✅ Enhanced tag support (Simple, Pair, List)
- ✅ Full ODCS/ODCL field preservation
- ✅ Schema reference directory (schemas/) with JSON Schema definitions for all supported formats
- ✅ Bidirectional YAML ↔ Database sync with change detection
- ✅ Git hooks for automatic synchronization
- ✅ Decision Records (DDL) with MADR format support
- ✅ Knowledge Base (KB) with domain partitioning
- ✅ Data Pipeline with staging, inference, and mapping
- ✅ Schema Mapping with fuzzy matching and transformation generation
- ✅ LLM-enhanced schema refinement (Ollama and local models)
- ✅ S3 ingestion with AWS SDK for Rust
- ✅ Databricks Unity Catalog Volumes ingestion
- ✅ Real-time progress reporting with indicatif
- ✅ Secure credential handling with automatic redaction