data-modelling-sdk 2.4.0

Shared SDK for model operations across platforms (API, WASM, Native)
# Schema Overview Guide

This guide provides an overview of the different schemas supported by the Data Modelling SDK and how they are used.

## Table of Contents

1. [ODCS (Open Data Contract Standard)](#odcs-open-data-contract-standard)
2. [ODCL (Open Data Contract Language)](#odcl-open-data-contract-language)
3. [ODPS (Open Data Product Standard)](#odps-open-data-product-standard)
4. [CADS (Compute Asset Description Specification)](#cads-compute-asset-description-specification)
5. [Business Domain Schema](#business-domain-schema)
6. [DBMV (Databricks Metric Views)](#dbmv-databricks-metric-views)
7. [Other Formats](#other-formats)
8. [BPMN (Business Process Model and Notation)](#bpmn-business-process-model-and-notation)
9. [DMN (Decision Model and Notation)](#dmn-decision-model-and-notation)
10. [OpenAPI](#openapi)
11. [OpenAPI to ODCS Converter](#openapi)
12. [Universal Converter](#universal-converter)

---

## ODCS (Open Data Contract Standard)

**Version**: v3.1.0
**Purpose**: Data Contracts (tables/schemas)
**Status**: Primary format for data models

### Overview

ODCS is the primary format for defining data contracts (tables). It provides comprehensive metadata about data structures, including:

- Schema definitions with properties/fields
- Quality rules and validation checks
- Service level agreements (SLAs)
- Tags and metadata
- References to external definitions

### Key Features

- **Full Schema Coverage**: Supports all ODCS v3.1.0 fields including `description`, `quality` arrays, and `$ref` references
- **Field Preservation**: All metadata is preserved during import/export operations
- **Enhanced Tags**: Supports Simple, Pair, and List tag formats

### Usage

```rust
use data_modelling_sdk::import::ODCSImporter;
use data_modelling_sdk::export::ODCSExporter;

// Import ODCS YAML
let mut importer = ODCSImporter::new();
let result = importer.import(odcs_yaml)?;

// Export to ODCS YAML
let exporter = ODCSExporter::new();
let yaml = exporter.export_table(&table, "odcs_v3_1_0")?;
```

### When to Use

- Defining data contracts (tables)
- Sharing schema definitions between systems
- Data governance and documentation
- Quality assurance and validation

---

## ODCL (Open Data Contract Language)

**Version**: v1.2.1 (Last Supported)
**Purpose**: Legacy data contract format
**Status**: Legacy format, full support maintained

### Overview

ODCL is the legacy format for data contracts. It's similar to ODCS but uses a different structure. The SDK provides full backward compatibility.

### Key Features

- **Legacy Support**: Full support for ODCL v1.2.1
- **Field Preservation**: All fields including `description`, `quality`, and `$ref` are preserved
- **Auto-Detection**: Automatically detected during import

### Usage

```rust
use data_modelling_sdk::import::ODCSImporter; // Same importer handles both ODCS and ODCL
use data_modelling_sdk::export::ODCLExporter;

// Import ODCL YAML (automatically detected)
let mut importer = ODCSImporter::new();
let result = importer.import(odcl_yaml)?;

// Export to ODCL format
let exporter = ODCLExporter::new();
let yaml = exporter.export_table(&table, "odcl")?;
```

### When to Use

- Working with legacy ODCL files
- Migrating from ODCL to ODCS
- Maintaining backward compatibility

---

## ODPS (Open Data Product Standard)

**Version**: Latest
**Purpose**: Data Products
**Status**: Full import/export support

### Overview

ODPS defines data products that link to ODCS Tables via `contractId` references. Data products represent higher-level abstractions that consume or produce data contracts.

### Key Features

- **Data Product Definition**: Complete support for ODPS data product structure
- **Contract Linking**: Links to ODCS Tables via `contractId` in input/output ports
- **Validation**: Validates `contractId` references against known ODCS Tables
- **Full Metadata**: Supports all ODPS fields including ports, support, team, and custom properties
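The `contractId` validation described above boils down to a set-membership test. Below is a minimal sketch that assumes a hypothetical `Port` struct holding only a port name and an optional contract ID. The SDK's actual ODPS types are richer and are not reproduced here.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down ODPS port: only the fields this check needs.
pub struct Port {
    pub name: String,
    /// The `contractId` this port points at, if any.
    pub contract_id: Option<String>,
}

/// Returns `(port name, contractId)` for every port whose `contractId`
/// does not match a known ODCS table ID. Ports without one are skipped.
pub fn unknown_contract_refs<'a>(
    ports: &'a [Port],
    known_table_ids: &HashSet<&str>,
) -> Vec<(&'a str, &'a str)> {
    ports
        .iter()
        .filter_map(|port| {
            // `?` inside the closure skips ports that carry no contractId.
            let id = port.contract_id.as_deref()?;
            (!known_table_ids.contains(id)).then(|| (port.name.as_str(), id))
        })
        .collect()
}
```

Returning the offending pairs, rather than a bare `bool`, lets a caller report exactly which port references a table that does not exist.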

### Usage

```rust
use data_modelling_sdk::import::ODPSImporter;
use data_modelling_sdk::export::ODPSExporter;

// Import ODPS YAML
let importer = ODPSImporter::new();
let product = importer.import(odps_yaml)?;

// Export to ODPS YAML
let exporter = ODPSExporter::new();
let yaml = exporter.export_product(&product)?;
```

### When to Use

- Defining data products
- Linking multiple data contracts together
- Product-level governance and documentation
- API and service definitions

---

## CADS (Compute Asset Description Specification)

**Version**: v1.0
**Purpose**: AI/ML models, applications, pipelines
**Status**: Full import/export support

### Overview

CADS describes computational assets including AI/ML models, ML pipelines, traditional applications, ETL pipelines, and source/destination systems. It focuses on governance, risk management, and operational clarity without embedding data schemas.

### Key Features

- **Asset Kinds**: Supports AIModel, MLPipeline, Application, ETLPipeline, SourceSystem, DestinationSystem
- **Governance-First**: Risk, compliance, and ownership are first-class concepts
- **Runtime Context**: Describes where and how assets execute
- **SLA Support**: Service level agreements for operational guarantees
- **Validation Profiles**: Defines expected checks based on asset type or risk

### Usage

```rust
use data_modelling_sdk::import::CADSImporter;
use data_modelling_sdk::export::CADSExporter;

// Import CADS YAML
let importer = CADSImporter::new();
let asset = importer.import(cads_yaml)?;

// Export to CADS YAML
let exporter = CADSExporter::new();
let yaml = exporter.export(&asset)?;
```

### When to Use

- Describing AI/ML models
- Documenting applications and pipelines
- Governance and risk management
- Operational documentation

---

## Business Domain Schema

**Version**: Custom (SDK-specific)
**Purpose**: Organize systems, CADS nodes, and ODCS nodes
**Status**: Full support

### Overview

The Business Domain schema is a top-level organizational structure that groups systems, CADS nodes, and ODCS nodes within business domains. It provides:

- **Systems**: Physical infrastructure entities (Kafka, Cassandra, EKS, EC2, etc.)
- **CADS Nodes**: References to CADS assets (AI/ML models, applications, pipelines)
- **ODCS Nodes**: References to ODCS Tables (data contracts)
- **Connections**: ERD-style connections between systems, and crow's feet notation for relationships between ODCS nodes

### Crow's Feet Notation Cardinality

The SDK supports standard crow's feet notation for ERD-style data modeling:

| Cardinality | Symbol | JSON Value | Description |
|-------------|--------|------------|-------------|
| Zero or One | ○─ | `zeroOrOne` | Optional single (0..1) |
| Exactly One | ├─ | `exactlyOne` | Required single (1..1) |
| Zero or Many | ○─< | `zeroOrMany` | Optional multiple (0..*) |
| One or Many | ├─< | `oneOrMany` | Required multiple (1..*) |
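One way to model these four endpoints in Rust is an enum that knows both its JSON value and its numeric bounds. The example below is an illustrative stand-in named after the `EndpointCardinality` values listed later in this guide. It is not the SDK's own type, and `bounds` and `allows` are hypothetical helpers.

```rust
/// Illustrative crow's feet endpoint cardinality (not the SDK type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EndpointCardinality {
    ZeroOrOne,
    ExactlyOne,
    ZeroOrMany,
    OneOrMany,
}

impl EndpointCardinality {
    const ALL: [EndpointCardinality; 4] = [
        EndpointCardinality::ZeroOrOne,
        EndpointCardinality::ExactlyOne,
        EndpointCardinality::ZeroOrMany,
        EndpointCardinality::OneOrMany,
    ];

    /// The camelCase JSON value from the table above.
    pub fn json_value(self) -> &'static str {
        match self {
            Self::ZeroOrOne => "zeroOrOne",
            Self::ExactlyOne => "exactlyOne",
            Self::ZeroOrMany => "zeroOrMany",
            Self::OneOrMany => "oneOrMany",
        }
    }

    /// Parses a JSON value; unknown strings yield `None`.
    pub fn from_json_value(s: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|c| c.json_value() == s)
    }

    /// (min, max) bounds; a `None` upper bound means "many" (*).
    pub fn bounds(self) -> (u32, Option<u32>) {
        match self {
            Self::ZeroOrOne => (0, Some(1)),
            Self::ExactlyOne => (1, Some(1)),
            Self::ZeroOrMany => (0, None),
            Self::OneOrMany => (1, None),
        }
    }

    /// True if `count` related records satisfy this cardinality.
    pub fn allows(self, count: u32) -> bool {
        let (min, max) = self.bounds();
        count >= min && max.map_or(true, |m| count <= m)
    }
}
```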

### Data Flow Direction

Relationships can specify data flow direction:

| Direction | JSON Value | Description |
|-----------|------------|-------------|
| Source to Target | `sourceToTarget` | Data flows from source to target only |
| Target to Source | `targetToSource` | Data flows from target to source only |
| Bidirectional | `bidirectional` | Data flows in both directions |
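The three directions can be sketched as a Rust enum that uses the JSON values above. The helpers `source_feeds_target`, `target_feeds_source`, and `reversed` are hypothetical and exist only for illustration. They are not SDK methods.

```rust
/// Illustrative flow direction (not the SDK type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FlowDirection {
    SourceToTarget,
    TargetToSource,
    Bidirectional,
}

impl FlowDirection {
    /// Parses the camelCase JSON value; unknown strings yield `None`.
    pub fn from_json_value(s: &str) -> Option<Self> {
        match s {
            "sourceToTarget" => Some(Self::SourceToTarget),
            "targetToSource" => Some(Self::TargetToSource),
            "bidirectional" => Some(Self::Bidirectional),
            _ => None,
        }
    }

    /// Does data move from the source endpoint to the target?
    pub fn source_feeds_target(self) -> bool {
        matches!(self, Self::SourceToTarget | Self::Bidirectional)
    }

    /// Does data move from the target endpoint back to the source?
    pub fn target_feeds_source(self) -> bool {
        matches!(self, Self::TargetToSource | Self::Bidirectional)
    }

    /// The same flow described with source and target swapped.
    pub fn reversed(self) -> Self {
        match self {
            Self::SourceToTarget => Self::TargetToSource,
            Self::TargetToSource => Self::SourceToTarget,
            Self::Bidirectional => Self::Bidirectional,
        }
    }
}
```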

### Key Features

- **System Metadata**: Systems inherit DataFlow metadata (owner, SLA, contact_details, infrastructure_type, notes)
- **Shared References**: Systems, CADS nodes, and ODCS nodes can be shared across domains
- **Relationship Types**: ERD-style for systems, Crow's feet notation for ODCS nodes
- **Versioning**: Systems have version fields for cross-domain sharing
- **Endpoint Cardinality**: Source and target cardinality using crow's feet notation
- **Flow Direction**: Directional data flow modeling

### Usage

```rust
use data_modelling_sdk::models::{Domain, System, InfrastructureType};
use data_modelling_sdk::models::domain::{CADSNode, ODCSNode, CADSKind};
use uuid::Uuid;

// Create a domain
let mut domain = Domain::new("customer-service".to_string());

// Add a system
let system = System::new(
    "kafka-cluster".to_string(),
    InfrastructureType::Kafka,
    domain.id,
);
domain.add_system(system);

// Import/Export Domain YAML
let yaml = domain.to_yaml()?;
let domain2 = Domain::from_yaml(&yaml)?;
```

### When to Use

- Organizing infrastructure within business domains
- Mapping data flow across systems
- Cross-domain data sharing
- Enterprise architecture documentation

---

## DBMV (Databricks Metric Views)

**Version**: v1.0
**Purpose**: Semantic layer metric definitions for Databricks
**Schema**: `schemas/dbmv.schema.json`

### Overview

DBMV provides a wrapper format for Databricks Metric Views, allowing one YAML file per system containing multiple metric view definitions. The envelope uses camelCase (`apiVersion`, `kind`, `metricViews`) while inner content uses snake_case (Databricks-native format).

### Key Features

- **Multi-view documents**: Multiple metric views per file, organized by system
- **Dimensions and measures**: Full support for Databricks dimension/measure definitions
- **Window functions**: Time-based windowing with semiadditive support
- **Recursive joins**: Snowflake schema modeling with nested join definitions
- **Materialization**: Schedule, mode, and materialized view configuration
- **Format specification**: Measure formatting (currency, percentage, etc.)

### Usage

```rust
use data_modelling_sdk::import::DBMVImporter;
use data_modelling_sdk::export::DBMVExporter;

// Import DBMV document
let importer = DBMVImporter::new();
let doc = importer.import(yaml_content)?;

// Export DBMV document
let yaml = DBMVExporter::export_document(&doc);

// Import standalone Databricks metric view
let view = importer.import_single_view(yaml_content)?;
```

### File Extension

`.dbmv.yaml` (e.g., `sales-metrics.dbmv.yaml`)

---

## Other Formats

### SQL

**Purpose**: SQL DDL statements
**Support**: Import and export

```rust
use data_modelling_sdk::import::SQLImporter;
use data_modelling_sdk::export::SQLExporter;

let importer = SQLImporter::new("postgresql");
let result = importer.parse(sql_ddl)?;

let exporter = SQLExporter;
let sql = exporter.export(&tables, Some("postgresql"))?;
```

### JSON Schema

**Purpose**: JSON Schema definitions
**Support**: Import and export

```rust
use data_modelling_sdk::import::JSONSchemaImporter;
use data_modelling_sdk::export::JSONSchemaExporter;

let importer = JSONSchemaImporter::new();
let result = importer.import(json_schema)?;

let exporter = JSONSchemaExporter;
let json = exporter.export(&tables)?;
```

### AVRO

**Purpose**: AVRO schema definitions
**Support**: Import and export

```rust
use data_modelling_sdk::import::AvroImporter;
use data_modelling_sdk::export::AvroExporter;

let importer = AvroImporter::new();
let result = importer.import(avro_schema)?;

let exporter = AvroExporter;
let avro = exporter.export(&tables)?;
```

### Protobuf

**Purpose**: Protocol Buffer definitions
**Support**: Import and export

```rust
use data_modelling_sdk::import::ProtobufImporter;
use data_modelling_sdk::export::ProtobufExporter;

let importer = ProtobufImporter::new();
let result = importer.import(protobuf_content)?;

let exporter = ProtobufExporter;
let proto = exporter.export(&tables)?;
```

---

## BPMN (Business Process Model and Notation)

**Version**: 2.0
**Purpose**: Business process models
**Status**: Full support (requires `bpmn` feature)
**Storage**: Native XML format

### Overview

BPMN 2.0 is a standard for modeling business processes. The SDK stores BPMN models in their native XML format within domain directories, allowing CADS assets to reference process models.

### Key Features

- **Native XML Storage**: BPMN models are stored as-is in XML format
- **Domain Organization**: Models are stored within domain directories (`{domain_name}/{model_name}.bpmn.xml`)
- **CADS Integration**: CADS assets can reference BPMN models via `bpmn_models` field
- **Validation**: XML well-formedness checks and basic validation
- **Metadata Extraction**: Model name and metadata extracted from XML

### Usage

```rust
#[cfg(feature = "bpmn")]
use data_modelling_sdk::import::bpmn::BPMNImporter;
#[cfg(feature = "bpmn")]
use data_modelling_sdk::export::bpmn::BPMNExporter;
use uuid::Uuid;

// `domain_id` is the owning domain's `Uuid`;
// `xml_content` holds the BPMN 2.0 XML document.

// Import BPMN XML
let mut importer = BPMNImporter::new();
let model = importer.import(domain_id, xml_content, Some("process-name"))?;

// Export BPMN XML. `export` is async, so call it from an async context;
// `storage_backend` is the storage backend instance in use.
let exporter = BPMNExporter::new();
let xml = exporter.export(&model, &storage_backend).await?;
```

### When to Use

- Documenting business processes
- Linking processes to compute assets (CADS)
- Process automation and workflow documentation
- Business process analysis

---

## DMN (Decision Model and Notation)

**Version**: 1.3
**Purpose**: Decision models
**Status**: Full support (requires `dmn` feature)
**Storage**: Native XML format

### Overview

DMN 1.3 is a standard for modeling business decisions. The SDK stores DMN models in their native XML format within domain directories, allowing CADS assets to reference decision models.

### Key Features

- **Native XML Storage**: DMN models are stored as-is in XML format
- **Domain Organization**: Models are stored within domain directories (`{domain_name}/{model_name}.dmn.xml`)
- **CADS Integration**: CADS assets can reference DMN models via `dmn_models` field
- **Validation**: XML well-formedness checks and basic validation
- **Metadata Extraction**: Model name and metadata extracted from XML

### Usage

```rust
#[cfg(feature = "dmn")]
use data_modelling_sdk::import::dmn::DMNImporter;
#[cfg(feature = "dmn")]
use data_modelling_sdk::export::dmn::DMNExporter;
use uuid::Uuid;

// `xml_content` holds the DMN 1.3 XML document;
// `domain_id` is the owning domain's `Uuid`.

// Import DMN XML
let mut importer = DMNImporter::new();
let model = importer.import(xml_content, domain_id, Some("decision-name"))?;

// Export DMN XML. `export` is async, so call it from an async context;
// `storage_backend` is the storage backend instance in use.
let exporter = DMNExporter::new();
let xml = exporter.export(&model, &storage_backend).await?;
```

### When to Use

- Documenting business decisions
- Linking decisions to compute assets (CADS)
- Decision automation and rule documentation
- Business rule analysis

---

## OpenAPI

**Version**: 3.1.1
**Purpose**: API specifications
**Status**: Full support (requires `openapi` feature)
**Storage**: Native YAML or JSON format

### Overview

OpenAPI 3.1.1 is a standard for describing REST APIs. The SDK stores OpenAPI specifications in their native YAML or JSON format within domain directories, allowing CADS assets to reference API specifications. Additionally, OpenAPI schema components can be converted to ODCS table definitions.

### Key Features

- **Native Format Storage**: OpenAPI specs are stored as-is in YAML or JSON format
- **Domain Organization**: Specs are stored within domain directories (`{domain_name}/{api_name}.openapi.yaml` or `.openapi.json`)
- **CADS Integration**: CADS assets can reference OpenAPI specs via `openapi_specs` field
- **Format Conversion**: YAML ↔ JSON conversion supported
- **ODCS Conversion**: Schema components can be converted to ODCS tables
- **Validation**: JSON Schema validation against OpenAPI 3.1.1 specification

### Usage

```rust
#[cfg(feature = "openapi")]
use data_modelling_sdk::import::openapi::OpenAPIImporter;
#[cfg(feature = "openapi")]
use data_modelling_sdk::export::openapi::OpenAPIExporter;
#[cfg(feature = "openapi")]
use data_modelling_sdk::models::openapi::OpenAPIFormat;
#[cfg(feature = "openapi")]
use data_modelling_sdk::convert::openapi_to_odcs::OpenAPIToODCSConverter;
use uuid::Uuid;

// `domain_id` is the owning domain's `Uuid`; `yaml_content` holds the spec text.

// Import OpenAPI spec
let mut importer = OpenAPIImporter::new();
let model = importer.import(domain_id, yaml_content, Some("api-name"))?;

// Export OpenAPI spec, converting it to JSON. `export` is async, so call it
// from an async context; `storage_backend` is the storage backend instance in use.
let exporter = OpenAPIExporter::new();
let json_content = exporter.export(&model, &storage_backend, Some(OpenAPIFormat::Json)).await?;

// Convert the `User` schema component of a spec (`openapi_content`) to an ODCS table
let converter = OpenAPIToODCSConverter::new();
let table = converter.convert_component(openapi_content, "User", Some("users"))?;
```

### When to Use

- Documenting REST APIs
- Linking APIs to compute assets (CADS)
- Converting API schemas to data contracts (ODCS)
- API-first development workflows
- API documentation and governance

---

## Universal Converter

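The universal converter (`convert_to_odcs`) accepts the formats listed below and converts them to ODCS v3.1.0 where the input carries table definitions. See Conversion Notes for the cases that need extra input or return an error.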

### Supported Formats

- SQL
- ODCS v3.1.0
- ODCL v1.2.1
- JSON Schema
- AVRO
- Protobuf
- CADS v1.0
- ODPS (Latest)
- Domain Schema

### Usage

```rust
use data_modelling_sdk::convert::convert_to_odcs;

// Auto-detect format
let odcs_yaml = convert_to_odcs(input_content, None)?;

// Explicit format
let odcs_yaml = convert_to_odcs(input_content, Some("sql"))?;
```

### Format Detection

The converter automatically detects formats based on content:

- **ODCS**: Contains `apiVersion:` and `kind: DataContract`
- **ODCL**: Contains `dataContractSpecification:`
- **SQL**: Contains `CREATE TABLE`
- **JSON Schema**: JSON object with `$schema` or `type`
- **AVRO**: JSON with `type`, `fields`, `name`
- **Protobuf**: Contains `syntax`, `message`, or `service`
- **CADS**: Contains `apiVersion:` and `kind: AIModel|MLPipeline|Application|...`
- **ODPS**: Contains `apiVersion:` and `kind: DataProduct`
- **Domain**: Contains `systems:` and `odcs_nodes:` or `cads_nodes:`
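The rules above can be approximated with plain substring checks. The sketch below illustrates the idea under several assumptions and is not the SDK's detector. The `Format` enum and `detect_format` are hypothetical. The sketch checks the `apiVersion`/`kind` formats first, tests AVRO before JSON Schema because an AVRO record also contains `type`, and checks the loose Protobuf keywords last.

```rust
/// Formats distinguished by the detection rules (illustrative enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Odcs,
    Odcl,
    Odps,
    Cads,
    Domain,
    Sql,
    Avro,
    JsonSchema,
    Protobuf,
}

/// CADS asset kinds listed in this guide.
const CADS_KINDS: [&str; 6] = [
    "AIModel", "MLPipeline", "Application", "ETLPipeline", "SourceSystem", "DestinationSystem",
];

/// Naive, substring-based approximation of the detection rules.
pub fn detect_format(content: &str) -> Option<Format> {
    let has = |needle: &str| content.contains(needle);

    // YAML documents with `apiVersion:` are told apart by their `kind`.
    if has("apiVersion:") {
        if has("kind: DataContract") {
            return Some(Format::Odcs);
        }
        if has("kind: DataProduct") {
            return Some(Format::Odps);
        }
        if CADS_KINDS.iter().any(|k| has(format!("kind: {}", k).as_str())) {
            return Some(Format::Cads);
        }
    }
    if has("dataContractSpecification:") {
        return Some(Format::Odcl);
    }
    if has("systems:") && (has("odcs_nodes:") || has("cads_nodes:")) {
        return Some(Format::Domain);
    }
    // SQL keywords are case-insensitive.
    if content.to_uppercase().contains("CREATE TABLE") {
        return Some(Format::Sql);
    }
    if content.trim_start().starts_with('{') {
        // An AVRO record also has "type", so test for AVRO first.
        if has("\"type\"") && has("\"fields\"") && has("\"name\"") {
            return Some(Format::Avro);
        }
        if has("\"$schema\"") || has("\"type\"") {
            return Some(Format::JsonSchema);
        }
    }
    // The Protobuf keywords are the loosest signal, so they come last.
    if has("syntax") || has("message") || has("service") {
        return Some(Format::Protobuf);
    }
    None
}
```

Substring checks can misfire (for example, any YAML containing the word `message` falls through to Protobuf). A real detector would parse the document first and inspect its keys.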

### Conversion Notes

- **CADS → ODCS**: Returns an error explaining that CADS assets represent compute resources, not data contracts
- **ODPS → ODCS**: Requires `contractId` references and ODCS Table definitions
- **Domain → ODCS**: Requires Table definitions (Domain only stores references)

---

## Schema Comparison

| Schema | Purpose | Primary Use Case | Data Contracts | Compute Assets | Products |
|--------|---------|------------------|----------------|---------------|----------|
| **ODCS** | Data Contracts | Tables/Schemas | ✅ | ❌ | ❌ |
| **ODCL** | Data Contracts (Legacy) | Legacy Tables | ✅ | ❌ | ❌ |
| **ODPS** | Data Products | Products linking Tables | ✅ (via refs) | ❌ | ✅ |
| **CADS** | Compute Assets | AI/ML/Applications | ❌ | ✅ | ❌ |
| **Domain** | Organization | Systems & Nodes | ✅ (via refs) | ✅ (via refs) | ❌ |

---

## Migration Guide

### DataFlow → Domain Schema

The DataFlow format has been superseded by the Domain schema. Convert existing DataFlow YAML with the migration utility:

```rust
use data_modelling_sdk::convert::migrate_dataflow::migrate_dataflow_to_domain;

let domain = migrate_dataflow_to_domain(dataflow_yaml, Some("domain-name"))?;
```

**Migration Mapping**:
- DataFlow nodes → Systems (with all metadata preserved)
- DataFlow relationships → SystemConnections (ERD-style)
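The mapping above amounts to one system per node and one connection per relationship. The minimal sketch below uses hypothetical, trimmed-down structs, with only `owner` and `sla` standing in for the preserved metadata. The real `migrate_dataflow_to_domain` works on YAML and the SDK's own types.

```rust
use std::collections::HashSet;

// Hypothetical, trimmed-down shapes; the SDK's real types carry more
// metadata (contact details, infrastructure type, notes, ...).
pub struct DataFlowNode {
    pub id: String,
    pub name: String,
    pub owner: Option<String>,
    pub sla: Option<String>,
}

pub struct DataFlowRelationship {
    pub source: String,
    pub target: String,
}

pub struct System {
    pub id: String,
    pub name: String,
    pub owner: Option<String>,
    pub sla: Option<String>,
}

pub struct SystemConnection {
    pub source_system_id: String,
    pub target_system_id: String,
}

pub struct MigratedDomain {
    pub name: String,
    pub systems: Vec<System>,
    pub connections: Vec<SystemConnection>,
}

/// Maps each node to a System (copying its metadata) and each relationship
/// to a SystemConnection, rejecting relationships that name unknown nodes.
pub fn migrate(
    domain_name: &str,
    nodes: &[DataFlowNode],
    relationships: &[DataFlowRelationship],
) -> Result<MigratedDomain, String> {
    let known: HashSet<&str> = nodes.iter().map(|n| n.id.as_str()).collect();

    let systems: Vec<System> = nodes
        .iter()
        .map(|n| System {
            id: n.id.clone(),
            name: n.name.clone(),
            owner: n.owner.clone(),
            sla: n.sla.clone(),
        })
        .collect();

    let mut connections = Vec::with_capacity(relationships.len());
    for rel in relationships {
        for end in [&rel.source, &rel.target] {
            if !known.contains(end.as_str()) {
                return Err(format!("relationship references unknown node `{}`", end));
            }
        }
        connections.push(SystemConnection {
            source_system_id: rel.source.clone(),
            target_system_id: rel.target.clone(),
        });
    }

    Ok(MigratedDomain { name: domain_name.to_string(), systems, connections })
}
```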

---

## Best Practices

1. **Use ODCS for Data Contracts**: ODCS v3.1.0 is the primary format for tables/schemas
2. **Use CADS for Compute Assets**: CADS is designed for AI/ML models and applications
3. **Use ODPS for Data Products**: ODPS links multiple data contracts together
4. **Use Domain Schema for Organization**: Domain schema organizes systems and nodes within business domains
5. **Preserve Metadata**: Always use import/export functions to preserve metadata during conversions
6. **Validate References**: When using ODPS, validate `contractId` references against known ODCS Tables

---

## Serialization Format

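SDK models use **camelCase** serialization for JSON and YAML output, aligning with ODCS format conventions. DBMV is an exception: only its envelope is camelCase, while the inner metric view content keeps Databricks-native snake_case.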

```yaml
# Example relationship in YAML
id: "dd0e8400-e29b-41d4-a716-446655440008"
sourceTableId: "990e8400-e29b-41d4-a716-446655440004"
targetTableId: "aa0e8400-e29b-41d4-a716-446655440005"
sourceCardinality: "exactlyOne"
targetCardinality: "zeroOrMany"
flowDirection: "sourceToTarget"
relationshipType: "foreignKey"
createdAt: "2025-01-01T09:00:00Z"
updatedAt: "2025-01-01T09:00:00Z"
```

Key enum values:
- **Cardinality**: `oneToOne`, `oneToMany`, `manyToOne`, `manyToMany`
- **RelationshipType**: `dataFlow`, `dependency`, `foreignKey`, `etl`
- **EndpointCardinality**: `zeroOrOne`, `exactlyOne`, `zeroOrMany`, `oneOrMany`
- **FlowDirection**: `sourceToTarget`, `targetToSource`, `bidirectional`
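Rust field names such as `source_table_id` and variant names such as `SourceToTarget` appear in camelCase in the output. Rust crates commonly get this behavior from serde's `#[serde(rename_all = "camelCase")]` attribute, but whether the SDK uses it is an assumption here. The std-only sketch below shows the two transformations involved. `snake_to_camel` and `pascal_to_camel` are hypothetical helpers.

```rust
/// `source_table_id` -> `sourceTableId`: drop underscores and uppercase the
/// character that follows each one (a leading underscore is simply dropped).
pub fn snake_to_camel(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut upper_next = false;
    for ch in name.chars() {
        if ch == '_' {
            upper_next = !out.is_empty();
        } else if upper_next {
            out.extend(ch.to_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}

/// `SourceToTarget` -> `sourceToTarget`: lowercase only the first character.
pub fn pascal_to_camel(name: &str) -> String {
    let mut chars = name.chars();
    match chars.next() {
        Some(first) => first.to_lowercase().chain(chars).collect(),
        None => String::new(),
    }
}
```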

---

## Enhanced Tag Support

All schemas support enhanced tag formats:

- **Simple**: `"finance"` - Single word tags
- **Pair**: `"Environment:Dev"` - Key:Value pairs
- **List**: `"SecondaryDomains:[XXXXX, PPPP]"` - Key:[Value1, Value2, ...] lists

Tags are automatically detected and parsed during import, and serialized as strings during export.
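The following minimal sketch shows how the three tag forms could be told apart. It assumes that the first `:` separates the key from the value and that a value wrapped in `[...]` marks a list. The `Tag` enum and its methods are hypothetical, not the SDK's tag type.

```rust
/// Hypothetical parsed tag.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tag {
    Simple(String),
    Pair { key: String, value: String },
    List { key: String, values: Vec<String> },
}

impl Tag {
    /// `Key:[a, b]` -> List, `Key:Value` -> Pair, anything else -> Simple.
    pub fn parse(raw: &str) -> Tag {
        let raw = raw.trim();
        if let Some((key, rest)) = raw.split_once(':') {
            let (key, rest) = (key.trim(), rest.trim());
            if !key.is_empty() {
                // A bracketed value is a comma-separated list.
                if let Some(inner) = rest.strip_prefix('[').and_then(|r| r.strip_suffix(']')) {
                    let values = inner
                        .split(',')
                        .map(str::trim)
                        .filter(|v| !v.is_empty())
                        .map(String::from)
                        .collect();
                    return Tag::List { key: key.to_string(), values };
                }
                if !rest.is_empty() {
                    return Tag::Pair { key: key.to_string(), value: rest.to_string() };
                }
            }
        }
        Tag::Simple(raw.to_string())
    }

    /// Serializes back to the string form used on export.
    pub fn to_tag_string(&self) -> String {
        match self {
            Tag::Simple(s) => s.clone(),
            Tag::Pair { key, value } => format!("{}:{}", key, value),
            Tag::List { key, values } => format!("{}:[{}]", key, values.join(", ")),
        }
    }
}
```

Because every variant serializes back to a plain string, the round trip `parse` followed by `to_tag_string` reproduces well-formed input unchanged.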

---

## Further Reading

- [ODCS Specification](https://github.com/bitol-io/open-data-contract-standard)
- [ODPS Specification](https://github.com/bitol-io/open-data-product-standard)
- [CADS Specification](https://github.com/your-org/cads-spec) (when available)
- SDK Documentation: See `README.md` and `LLM.txt` for detailed API documentation