# Data Modelling SDK
Shared SDK for model operations across platforms (API, WASM, Native).
Copyright (c) 2025 Mark Olliver - Licensed under MIT
## CLI Tool
The SDK includes a command-line interface (CLI) for importing and exporting schemas. See [CLI.md](CLI.md) for detailed usage instructions.
**Quick Start:**
```bash
# Build the CLI (with OpenAPI and ODPS validation support)
cargo build --release --bin data-modelling-cli --features cli,openapi,odps-validation
# Run it
./target/release/data-modelling-cli --help
```
**Note:** The CLI now includes OpenAPI support by default in GitHub releases. For local builds, include the `openapi` feature to enable OpenAPI import/export. Include `odps-validation` to enable ODPS schema validation.
**ODPS Import/Export Examples:**
```bash
# Import ODPS YAML file
data-modelling-cli import odps product.odps.yaml
# Export ODCS to ODPS format
data-modelling-cli export odps input.odcs.yaml output.odps.yaml
# Test ODPS round-trip (requires odps-validation feature)
cargo run --bin test-odps --features odps-validation,cli -- product.odps.yaml --verbose
```
## Features
- **Storage Backends**: File system, browser storage (IndexedDB/localStorage), and HTTP API
- **Database Backends**: DuckDB (embedded) and PostgreSQL for high-performance queries
- **Model Loading/Saving**: Load and save models from various storage backends
- **Import/Export**: Import from SQL (PostgreSQL, MySQL, SQLite, Generic, Databricks), ODCS, ODCL, JSON Schema, AVRO, Protobuf (proto2/proto3), CADS, ODPS, BPMN, DMN, OpenAPI; Export to various formats
- **Business Domain Schema**: Organize systems, CADS nodes, and ODCS nodes within business domains
- **Universal Converter**: Convert any format to ODCS v3.1.0 format
- **OpenAPI to ODCS Converter**: Convert OpenAPI schema components to ODCS table definitions
- **Validation**: Table and relationship validation (naming conflicts, circular dependencies)
- **Schema Reference**: JSON Schema definitions for all supported formats in `schemas/` directory
- **Database Sync**: Bidirectional sync between YAML files and database with change detection
- **Git Hooks**: Automatic pre-commit and post-checkout hooks for database synchronization
## File Structure
The SDK organizes files using a flat file naming convention within a workspace:
```
workspace/
├── .git/ # Git folder (if present)
├── README.md # Repository files
├── workspace.yaml # Workspace metadata with assets and relationships
├── myworkspace_sales_customers.odcs.yaml # ODCS table: workspace_domain_resource.type.yaml
├── myworkspace_sales_orders.odcs.yaml # Another ODCS table in sales domain
├── myworkspace_sales_crm_leads.odcs.yaml # ODCS table with system: workspace_domain_system_resource.type.yaml
├── myworkspace_analytics_metrics.odps.yaml # ODPS product file
├── myworkspace_platform_api.cads.yaml # CADS asset file
├── myworkspace_platform_api.openapi.yaml # OpenAPI specification file
├── myworkspace_ops_approval.bpmn.xml # BPMN process model file
└── myworkspace_ops_routing.dmn.xml # DMN decision model file
```
### File Naming Convention
Files follow the pattern: `{workspace}_{domain}_{system}_{resource}.{type}.{ext}`
- **workspace**: The workspace name (required)
- **domain**: The business domain (required)
- **system**: The system within the domain (optional)
- **resource**: The resource/asset name (required)
- **type**: The asset type (`odcs`, `odps`, `cads`, `openapi`, `bpmn`, `dmn`)
- **ext**: File extension (`yaml`, `xml`, `json`)
### Workspace-Level Files
- `workspace.yaml`: Workspace metadata including domains, systems, asset references, and relationships
### Asset Types
- `*.odcs.yaml`: ODCS table/schema definitions (Open Data Contract Standard)
- `*.odps.yaml`: ODPS data product definitions (Open Data Product Standard)
- `*.cads.yaml`: CADS asset definitions (architecture assets)
- `*.openapi.yaml` / `*.openapi.json`: OpenAPI specification files
- `*.bpmn.xml`: BPMN 2.0 process model files
- `*.dmn.xml`: DMN 1.3 decision model files
## Usage
### File System Backend (Native Apps)
```rust
use data_modelling_sdk::storage::filesystem::FileSystemStorageBackend;
use data_modelling_sdk::model::ModelLoader;
let storage = FileSystemStorageBackend::new("/path/to/workspace");
let loader = ModelLoader::new(storage);
let result = loader.load_model("workspace_path").await?;
```
### Browser Storage Backend (WASM Apps)
```rust
use data_modelling_sdk::storage::browser::BrowserStorageBackend;
use data_modelling_sdk::model::ModelLoader;
let storage = BrowserStorageBackend::new("db_name", "store_name");
let loader = ModelLoader::new(storage);
let result = loader.load_model("workspace_path").await?;
```
### API Backend (Online Mode)
```rust
use data_modelling_sdk::storage::api::ApiStorageBackend;
use data_modelling_sdk::model::ModelLoader;
let storage = ApiStorageBackend::new("http://localhost:8081/api/v1", Some("session_id"));
let loader = ModelLoader::new(storage);
let result = loader.load_model("workspace_path").await?;
```
### WASM Bindings (Browser/Offline Mode)
The SDK exposes WASM bindings for parsing and export operations, enabling offline functionality in web applications.
**Build the WASM module**:
```bash
wasm-pack build --target web --out-dir pkg --features wasm
```
**Use in JavaScript/TypeScript**:
```javascript
import init, { parseOdcsYaml, exportToOdcsYaml } from './pkg/data_modelling_sdk.js';
// Initialize the module
await init();
// Parse ODCS YAML
const yaml = `apiVersion: v3.1.0
kind: DataContract
name: users
schema:
fields:
- name: id
type: bigint`;
const resultJson = parseOdcsYaml(yaml);
const result = JSON.parse(resultJson);
console.log('Parsed tables:', result.tables);
// Export to ODCS YAML
const workspace = {
tables: [{
id: "550e8400-e29b-41d4-a716-446655440000",
name: "users",
columns: [{ name: "id", data_type: "bigint", nullable: false, primary_key: true }]
}],
relationships: []
};
const exportedYaml = exportToOdcsYaml(JSON.stringify(workspace));
console.log('Exported YAML:', exportedYaml);
```
**Available WASM Functions**:
**Import/Export**:
- `parseOdcsYaml(yamlContent: string): string` - Parse ODCS YAML to workspace structure
- `exportToOdcsYaml(workspaceJson: string): string` - Export workspace to ODCS YAML
- `importFromSql(sqlContent: string, dialect: string): string` - Import from SQL (supported dialects: "postgres"/"postgresql", "mysql", "sqlite", "generic", "databricks")
- `importFromAvro(avroContent: string): string` - Import from AVRO schema
- `importFromJsonSchema(jsonSchemaContent: string): string` - Import from JSON Schema
- `importFromProtobuf(protobufContent: string): string` - Import from Protobuf
- `importFromCads(yamlContent: string): string` - Import CADS (Compute Asset Description Specification) YAML
- `importFromOdps(yamlContent: string): string` - Import ODPS (Open Data Product Standard) YAML
- `exportToOdps(productJson: string): string` - Export ODPS data product to YAML format
- `validateOdps(yamlContent: string): void` - Validate ODPS YAML content against ODPS JSON Schema (requires `odps-validation` feature)
- `importBpmnModel(domainId: string, xmlContent: string, modelName?: string): string` - Import BPMN 2.0 XML model
- `importDmnModel(domainId: string, xmlContent: string, modelName?: string): string` - Import DMN 1.3 XML model
- `importOpenapiSpec(domainId: string, content: string, apiName?: string): string` - Import OpenAPI 3.1.1 specification
- `exportToSql(workspaceJson: string, dialect: string): string` - Export to SQL (supported dialects: "postgres"/"postgresql", "mysql", "sqlite", "generic", "databricks")
- `exportToAvro(workspaceJson: string): string` - Export to AVRO schema
- `exportToJsonSchema(workspaceJson: string): string` - Export to JSON Schema
- `exportToProtobuf(workspaceJson: string): string` - Export to Protobuf
- `exportToCads(workspaceJson: string): string` - Export to CADS YAML
- `exportToOdps(workspaceJson: string): string` - Export to ODPS YAML
- `exportBpmnModel(xmlContent: string): string` - Export BPMN model to XML
- `exportDmnModel(xmlContent: string): string` - Export DMN model to XML
- `exportOpenapiSpec(content: string, sourceFormat: string, targetFormat?: string): string` - Export OpenAPI spec with optional format conversion
- `convertToOdcs(input: string, format?: string): string` - Universal converter: convert any format to ODCS v3.1.0
- `convertOpenapiToOdcs(openapiContent: string, componentName: string, tableName?: string): string` - Convert OpenAPI schema component to ODCS table
- `analyzeOpenapiConversion(openapiContent: string, componentName: string): string` - Analyze OpenAPI component conversion feasibility
- `migrateDataflowToDomain(dataflowYaml: string, domainName?: string): string` - Migrate DataFlow YAML to Domain schema format
**Domain Operations**:
- `createDomain(name: string): string` - Create a new business domain
- `addSystemToDomain(workspaceJson: string, domainId: string, systemJson: string): string` - Add a system to a domain
- `addCadsNodeToDomain(workspaceJson: string, domainId: string, nodeJson: string): string` - Add a CADS node to a domain
- `addOdcsNodeToDomain(workspaceJson: string, domainId: string, nodeJson: string): string` - Add an ODCS node to a domain
**Filtering**:
- `filterNodesByOwner(workspaceJson: string, owner: string): string` - Filter tables by owner
- `filterRelationshipsByOwner(workspaceJson: string, owner: string): string` - Filter relationships by owner
- `filterNodesByInfrastructureType(workspaceJson: string, infrastructureType: string): string` - Filter tables by infrastructure type
- `filterRelationshipsByInfrastructureType(workspaceJson: string, infrastructureType: string): string` - Filter relationships by infrastructure type
- `filterByTags(workspaceJson: string, tag: string): string` - Filter nodes and relationships by tag (supports Simple, Pair, and List tag formats)
## Database Support
The SDK includes an optional database layer for high-performance queries on large workspaces (10-100x faster than file-based operations).
### Database Backends
- **DuckDB**: Embedded analytical database, ideal for CLI tools and local development
- **PostgreSQL**: Server-based database for team environments and shared access
### Quick Start
```bash
# Build CLI with database support
cargo build --release --bin data-modelling-cli --features cli-full
# Initialize database for a workspace
./target/release/data-modelling-cli db init --workspace ./my-workspace
# Sync YAML files to database
./target/release/data-modelling-cli db sync --workspace ./my-workspace
# Query the database
./target/release/data-modelling-cli query "SELECT name FROM tables" --workspace ./my-workspace
```
### Configuration
Database settings are stored in `.data-model.toml`:
```toml
[database]
backend = "duckdb"
path = ".data-model.duckdb"
[sync]
auto_sync = true
[git]
hooks_enabled = true
```
### Git Hooks Integration
When initializing a database in a Git repository, the CLI automatically installs:
- **Pre-commit hook**: Exports database changes to YAML before commit
- **Post-checkout hook**: Syncs YAML files to database after checkout
This ensures YAML files and database stay in sync across branches and collaborators.
See [CLI.md](docs/CLI.md) for detailed database command documentation.
## Development
### Pre-commit Hooks
This project uses pre-commit hooks to ensure code quality. Install them with:
```bash
# Install pre-commit (if not already installed)
pip install pre-commit
# Install the git hooks
pre-commit install
# Run hooks manually on all files
pre-commit run --all-files
```
The hooks will automatically run on `git commit` and check:
- Rust formatting (`cargo fmt`)
- Rust linting (`cargo clippy`)
- Security audit (`cargo audit`)
- File formatting (trailing whitespace, end of file, etc.)
- YAML/TOML/JSON syntax
### CI/CD
GitHub Actions workflows automatically run on push and pull requests:
- **Lint**: Format check, clippy, and security audit
- **Test**: Unit and integration tests on Linux, macOS, and Windows
- **Build**: Release build verification
- **Publish**: Automatic publishing to crates.io on main branch (after all checks pass)
## Documentation
- **[Architecture Guide](docs/ARCHITECTURE.md)**: Comprehensive guide to project architecture, design decisions, and use cases
- **[Schema Overview Guide](docs/SCHEMA_OVERVIEW.md)**: Detailed documentation of all supported schemas
The SDK supports:
- **ODCS v3.1.0**: Primary format for data contracts (tables)
- **ODCL v1.2.1**: Legacy data contract format (backward compatibility)
- **ODPS**: Data products linking to ODCS Tables
- **CADS v1.0**: Compute assets (AI/ML models, applications, pipelines)
- **BPMN 2.0**: Business Process Model and Notation (process models stored in native XML)
- **DMN 1.3**: Decision Model and Notation (decision models stored in native XML)
- **OpenAPI 3.1.1**: API specifications (stored in native YAML or JSON)
- **Business Domain Schema**: Organize systems, CADS nodes, and ODCS nodes
- **Universal Converter**: Convert any format to ODCS v3.1.0
- **OpenAPI to ODCS Converter**: Convert OpenAPI schema components to ODCS table definitions
### Schema Reference Directory
The SDK maintains JSON Schema definitions for all supported formats in the `schemas/` directory:
- **ODCS v3.1.0**: `schemas/odcs-json-schema-v3.1.0.json` - Primary format for data contracts
- **ODCL v1.2.1**: `schemas/odcl-json-schema-1.2.1.json` - Legacy data contract format
- **ODPS**: `schemas/odps-json-schema-latest.json` - Data products linking to ODCS tables
- **CADS v1.0**: `schemas/cads.schema.json` - Compute assets (AI/ML models, applications, pipelines)
These schemas serve as authoritative references for validation, documentation, and compliance. See [schemas/README.md](schemas/README.md) for detailed information about each schema.
## Status
The SDK provides comprehensive support for multiple data modeling formats:
- ✅ Storage backend abstraction and implementations
- ✅ Database backend abstraction (DuckDB, PostgreSQL)
- ✅ Model loader/saver structure
- ✅ Full import/export implementation for all supported formats
- ✅ Validation module structure
- ✅ Business Domain schema support
- ✅ Universal format converter
- ✅ Enhanced tag support (Simple, Pair, List)
- ✅ Full ODCS/ODCL field preservation
- ✅ Schema reference directory (`schemas/`) with JSON Schema definitions for all supported formats
- ✅ Bidirectional YAML ↔ Database sync with change detection
- ✅ Git hooks for automatic synchronization