# Data Modelling SDK

## Overview

The Data Modelling SDK is a Rust library that provides unified interfaces for data modeling operations across multiple platforms (native apps, WASM/web apps, and API backends). It supports importing from several formats (SQL, ODCS, JSON Schema, AVRO, Protobuf), exporting to multiple formats, validating models, and abstracting storage across environments.

- **Repository**: https://github.com/pixie79/data-modelling-sdk
- **License**: MIT
- **Rust Edition**: 2024
- **Version**: 1.2.0

## Architecture

The SDK is designed with a modular architecture centered around:

1. **Storage Backends**: Abstract storage operations (file system, browser storage, HTTP API)
2. **Models**: Core data structures (Table, Column, Relationship, DataModel)
3. **Import/Export**: Format converters for various data contract formats
4. **Validation**: Table and relationship validation logic
5. **Model Management**: Loading and saving models from storage backends

## Directory Structure

```
data-modelling-sdk/
├── src/
│   ├── lib.rs                 # Main library entry point, re-exports public API
│   ├── auth/                   # Authentication types (OAuth, session management)
│   │   └── mod.rs
│   ├── export/                 # Export functionality
│   │   ├── mod.rs
│   │   ├── avro.rs            # AVRO format exporter
│   │   ├── json_schema.rs     # JSON Schema exporter
│   │   ├── odcl.rs            # ODCL format exporter
│   │   ├── dataflow.rs        # Data Flow format exporter (lightweight, separate from ODCS)
│   │   ├── odcs.rs            # ODCS v3.1.0 format exporter
│   │   ├── png.rs             # PNG diagram exporter (feature-gated)
│   │   ├── protobuf.rs        # Protobuf exporter
│   │   └── sql.rs             # SQL DDL exporter
│   ├── git/                    # Git operations (feature-gated)
│   │   ├── mod.rs
│   │   └── git_service.rs     # Git service for version control operations
│   ├── import/                 # Import functionality
│   │   ├── mod.rs
│   │   ├── avro.rs            # AVRO format importer
│   │   ├── dataflow.rs        # Data Flow format importer (lightweight, separate from ODCS)
│   │   ├── json_schema.rs     # JSON Schema importer
│   │   ├── odcs.rs            # ODCS v3.1.0 format importer (primary)
│   │   ├── protobuf.rs        # Protobuf importer
│   │   └── sql.rs             # SQL DDL importer
│   ├── model/                  # Model loading/saving
│   │   ├── mod.rs
│   │   ├── api_loader.rs      # API-based model loader
│   │   ├── loader.rs          # File-based model loader
│   │   └── saver.rs           # Model saver
│   ├── models/                 # Core data structures
│   │   ├── mod.rs
│   │   ├── column.rs          # Column and ForeignKey models
│   │   ├── cross_domain.rs    # Cross-domain relationship models
│   │   ├── data_model.rs      # DataModel container
│   │   ├── enums.rs           # Enums (DatabaseType, MedallionLayer, etc.)
│   │   ├── relationship.rs    # Relationship model
│   │   └── table.rs           # Table model
│   ├── storage/                # Storage backend abstraction
│   │   ├── mod.rs             # StorageBackend trait
│   │   ├── api.rs             # HTTP API backend (feature-gated)
│   │   ├── browser.rs         # Browser storage backend (WASM, feature-gated)
│   │   └── filesystem.rs      # File system backend (native, feature-gated)
│   ├── validation/             # Validation logic
│   │   ├── mod.rs
│   │   ├── input.rs           # Input validation (table/column names, UUIDs)
│   │   ├── relationships.rs   # Relationship validation (circular deps)
│   │   └── tables.rs          # Table validation (naming conflicts)
│   └── workspace/              # Workspace management types
│       └── mod.rs             # WorkspaceInfo, ProfileInfo types
├── tests/                      # Test suite
│   ├── auth_tests.rs
│   ├── export_tests.rs
│   ├── git_tests.rs
│   ├── import_tests.rs
│   ├── integration_tests.rs
│   ├── model_tests.rs
│   ├── models_tests.rs
│   ├── nested_structures_tests.rs
│   ├── odcs_comprehensive_tests.rs
│   ├── storage_tests.rs
│   ├── validation_tests.rs
│   └── workspace_tests.rs
├── Cargo.toml                  # Package manifest
├── cargo-audit.toml           # Security audit configuration
├── .pre-commit-config.yaml    # Pre-commit hooks configuration
└── README.md                   # User documentation
```

## Key Modules

### Storage Backends (`src/storage/`)

Abstracts file operations across different environments:

- **`StorageBackend` trait**: Common interface for all storage backends
- **`FileSystemStorageBackend`**: Native file system operations (requires `native-fs` feature)
- **`BrowserStorageBackend`**: Browser IndexedDB/localStorage (WASM, requires `wasm` feature)
- **`ApiStorageBackend`**: HTTP API backend (requires `api-backend` feature, default)

### Models (`src/models/`)

Core data structures:

- **`Table`**: Represents a database table/data contract with columns, metadata, and relationships. Also used as Data Flow nodes with enhanced metadata (owner, SLA, contact_details, infrastructure_type, notes)
- **`Column`**: Column definition with data type, constraints, foreign keys
- **`Relationship`**: Relationship between tables (source/target, type, metadata). Also used as Data Flow relationships with enhanced metadata (owner, SLA, contact_details, infrastructure_type, notes)
- **`DataModel`**: Container for tables and relationships representing a workspace/domain. Includes filter methods for Data Flow nodes and relationships
- **`ForeignKey`**: Foreign key relationship details
- **`SlaProperty`**: SLA property structure (property, value, unit, element, driver, description, scheduler, schedule)
- **`ContactDetails`**: Contact details structure (email, phone, name, role, other)
- **Enums**: `DatabaseType`, `MedallionLayer`, `SCDPattern`, `DataVaultClassification`, `ModelingLevel`, `Cardinality`, `RelationshipType`, `InfrastructureType` (70+ infrastructure types)
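
To make the `DataModel` filter methods concrete, here is a minimal sketch using simplified stand-ins for `Table`, `Relationship`, and `DataModel`. The field names (`owner`, `source_table`, `target_table`), the return types, and the `sample_model` helper are assumptions for illustration; the SDK's real structs carry many more fields and may expose different signatures.

```rust
// Simplified stand-ins for the SDK's model types; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub name: String,
    pub owner: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Relationship {
    pub source_table: String,
    pub target_table: String,
    pub owner: Option<String>,
}

#[derive(Debug, Default)]
pub struct DataModel {
    pub tables: Vec<Table>,
    pub relationships: Vec<Relationship>,
}

impl DataModel {
    /// Returns every table (Data Flow node) whose owner matches exactly.
    pub fn filter_nodes_by_owner(&self, owner: &str) -> Vec<&Table> {
        self.tables
            .iter()
            .filter(|t| t.owner.as_deref() == Some(owner))
            .collect()
    }

    /// Returns every relationship whose owner matches exactly.
    pub fn filter_relationships_by_owner(&self, owner: &str) -> Vec<&Relationship> {
        self.relationships
            .iter()
            .filter(|r| r.owner.as_deref() == Some(owner))
            .collect()
    }
}

/// Hypothetical helper: builds a two-table model to exercise the filters.
pub fn sample_model() -> DataModel {
    DataModel {
        tables: vec![
            Table { name: "orders".to_string(), owner: Some("data-platform".to_string()) },
            Table { name: "customers".to_string(), owner: Some("crm".to_string()) },
        ],
        relationships: vec![Relationship {
            source_table: "orders".to_string(),
            target_table: "customers".to_string(),
            owner: Some("data-platform".to_string()),
        }],
    }
}
```

Returning borrowed references keeps the filters cheap: callers can inspect the matching nodes without cloning the model.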

### Import (`src/import/`)

Importers convert various formats to SDK `Table` models:

- **`ODCSImporter`**: Primary importer for ODCS v3.1.0 format (also handles legacy ODCL) - for Data Models (tables)
- **`DataFlowImporter`**: Lightweight importer for Data Flow format YAML (nodes and relationships) - separate from ODCS
- **`SQLImporter`**: SQL DDL parser (CREATE TABLE statements)
- **`JSONSchemaImporter`**: JSON Schema to Table conversion
- **`AvroImporter`**: AVRO schema to Table conversion
- **`ProtobufImporter`**: Protobuf .proto files to Table conversion

### Export (`src/export/`)

Exporters convert SDK `Table` models to various formats:

- **`ODCSExporter`**: Exports to ODCS v3.1.0 format - for Data Models (tables)
- **`DataFlowExporter`**: Lightweight exporter for Data Flow format YAML (nodes and relationships) - separate from ODCS
- **`SQLExporter`**: Generates SQL DDL (CREATE TABLE statements)
- **`JSONSchemaExporter`**: Exports to JSON Schema format
- **`AvroExporter`**: Exports to AVRO schema format
- **`ProtobufExporter`**: Exports to Protobuf .proto format
- **`PNGExporter`**: Generates PNG diagrams (requires `png-export` feature)

### Model Management (`src/model/`)

- **`ModelLoader`**: Loads models from storage backends (tables + relationships)
- **`ModelSaver`**: Saves models to storage backends
- **`ApiModelLoader`**: Loads models via HTTP API

### Validation (`src/validation/`)

- **`TableValidator`**: Validates table names, detects naming conflicts
- **`RelationshipValidator`**: Validates relationships, detects circular dependencies
- **`InputValidator`**: Validates table/column names, UUIDs, SQL identifiers
- **`DataModel` filter methods**: `filter_nodes_by_owner()`, `filter_relationships_by_owner()`, `filter_nodes_by_infrastructure_type()`, `filter_relationships_by_infrastructure_type()`, `filter_by_tags()`
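
As an illustration of what naming-conflict detection involves, the sketch below flags tables whose names collide when compared case-insensitively. It is not the SDK's `TableValidator` implementation: the `Table` stand-in, the `u32` IDs (the SDK uses UUIDs), the case-insensitive rule, and the return shape are all assumptions.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Simplified stand-in for the SDK's `Table`; `u32` replaces the real UUID.
#[derive(Debug, Clone)]
pub struct Table {
    pub id: u32,
    pub name: String,
}

/// Returns `(first_id, duplicate_id)` pairs for tables whose names collide
/// when compared case-insensitively (an assumed rule, for illustration).
pub fn detect_naming_conflicts(tables: &[Table]) -> Vec<(u32, u32)> {
    let mut first_seen: HashMap<String, u32> = HashMap::new();
    let mut conflicts = Vec::new();

    for table in tables {
        match first_seen.entry(table.name.to_lowercase()) {
            // The name was already claimed by an earlier table: record the clash.
            Entry::Occupied(existing) => conflicts.push((*existing.get(), table.id)),
            // First table with this name: remember its ID.
            Entry::Vacant(slot) => {
                slot.insert(table.id);
            }
        }
    }
    conflicts
}
```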

### Git Operations (`src/git/`)

- **`GitService`**: Git operations (status, commit, push, pull) - requires `git` feature

## Features

The SDK uses Cargo features to enable optional functionality:

- **`default`**: Includes `api-backend` (HTTP API support)
- **`api-backend`**: Enables `ApiStorageBackend` (reqwest, urlencoding)
- **`native-fs`**: Enables `FileSystemStorageBackend` (tokio)
- **`wasm`**: Enables `BrowserStorageBackend` (wasm-bindgen, web-sys)
- **`png-export`**: Enables PNG diagram export (image, imageproc)
- **`databricks-dialect`**: Enables Databricks SQL dialect support (datafusion)
- **`git`**: Enables Git operations (git2)
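
Feature gating of this kind is usually done with Rust's `cfg` machinery. The sketch below uses the `cfg!` macro to report which storage features were compiled in. The function `enabled_backends` and its labels are hypothetical and not part of the SDK; only the feature names come from the list above.

```rust
/// Lists the storage backends compiled into the current build.
/// `cfg!(feature = "...")` evaluates to a compile-time `true` or `false`.
pub fn enabled_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    if cfg!(feature = "api-backend") {
        backends.push("api");
    }
    if cfg!(feature = "native-fs") {
        backends.push("filesystem");
    }
    if cfg!(feature = "wasm") {
        backends.push("browser");
    }
    backends
}
```

In library code the same conditions typically appear as `#[cfg(feature = "...")]` attributes on modules or items, so that the gated code and its dependencies are left out of the build entirely.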

## Dependencies

### Core Dependencies
- `serde`, `serde_json`, `serde_yaml`: Serialization
- `anyhow`, `thiserror`: Error handling
- `async-trait`: Async trait support
- `uuid`: UUID generation (v4 random, v5 deterministic)
- `chrono`: Timestamps
- `tracing`: Logging
- `petgraph`: Graph operations for validation

### Optional Dependencies
- `tokio`: Async runtime (for `native-fs` feature)
- `reqwest`: HTTP client (for `api-backend` feature)
- `wasm-bindgen`, `web-sys`: WASM support (for `wasm` feature)
- `git2`: Git operations (for `git` feature)
- `image`, `imageproc`: Image processing (for `png-export` feature)
- `datafusion`: SQL parsing (for `databricks-dialect` feature)
- `sqlparser`: SQL parsing
- `yaml-rust`: YAML processing

## Usage Examples

### File System Backend (Native Apps)

```rust
// Requires the `native-fs` feature.
use data_modelling_sdk::storage::filesystem::FileSystemStorageBackend;
use data_modelling_sdk::model::ModelLoader;

// Call from an async function that returns a `Result`, since `.await?` is used below.
let storage = FileSystemStorageBackend::new("/path/to/workspace");
let loader = ModelLoader::new(storage);
let result = loader.load_model("workspace_path").await?;
```

### Browser Storage Backend (WASM Apps)

```rust
// Requires the `wasm` feature.
use data_modelling_sdk::storage::browser::BrowserStorageBackend;
use data_modelling_sdk::model::ModelLoader;

// Call from an async function that returns a `Result`, since `.await?` is used below.
let storage = BrowserStorageBackend::new("db_name", "store_name");
let loader = ModelLoader::new(storage);
let result = loader.load_model("workspace_path").await?;
```

### API Backend (Online Mode)

```rust
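// Requires the `api-backend` feature (enabled by default).
use data_modelling_sdk::storage::api::ApiStorageBackend;
use data_modelling_sdk::model::ModelLoader;

// Call from an async function that returns a `Result`, since `.await?` is used below.
let storage = ApiStorageBackend::new("http://localhost:8081/api/v1", Some("session_id"));
let loader = ModelLoader::new(storage);
let result = loader.load_model("workspace_path").await?;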
```

### Import/Export

```rust
use data_modelling_sdk::import::ODCSImporter;
use data_modelling_sdk::export::ODCSExporter;

// Import: `yaml_content` is a `&str` holding an ODCS YAML document.
let mut importer = ODCSImporter::new();
let result = importer.import(yaml_content)?;

// Export: `table` is an SDK `Table` model.
let exporter = ODCSExporter::new();
let yaml = exporter.export_table(&table)?;
```

### Validation

```rust
use data_modelling_sdk::validation::{TableValidator, RelationshipValidator};

let table_validator = TableValidator::new();
let result = table_validator.detect_naming_conflicts(&tables)?;

let rel_validator = RelationshipValidator::new();
let (has_cycle, cycle_path) = rel_validator.check_circular_dependency(&relationships, source_id, target_id)?;
```
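
To show what a circular-dependency check does, here is a minimal std-only sketch: before adding an edge `source -> target`, it searches from `target` and reports a cycle if `source` is reachable. The SDK lists `petgraph` as a dependency for graph operations, so its implementation likely differs. The `u32` IDs (in place of UUIDs) and the edge-tuple input are assumptions.

```rust
use std::collections::{HashMap, HashSet};

/// Checks whether adding the edge `source -> target` would close a cycle.
/// On success the returned path runs from `target` to `source` along existing edges.
pub fn check_circular_dependency(
    edges: &[(u32, u32)],
    source: u32,
    target: u32,
) -> (bool, Vec<u32>) {
    // Build an adjacency list from the existing relationships.
    let mut adjacency: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(from, to) in edges {
        adjacency.entry(from).or_default().push(to);
    }

    // Iterative depth-first search that carries the path taken so far.
    let mut visited: HashSet<u32> = HashSet::new();
    let mut stack: Vec<Vec<u32>> = vec![vec![target]];

    while let Some(path) = stack.pop() {
        let node = *path.last().expect("paths are never empty");
        if node == source {
            return (true, path);
        }
        // Skip nodes that were already expanded.
        if !visited.insert(node) {
            continue;
        }
        if let Some(next_nodes) = adjacency.get(&node) {
            for &next in next_nodes {
                let mut extended = path.clone();
                extended.push(next);
                stack.push(extended);
            }
        }
    }
    (false, Vec::new())
}
```

Cloning the path per branch keeps the sketch short; for large models, a predecessor map would avoid the repeated allocations.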

## Testing

The SDK includes comprehensive tests:

- **Unit tests**: Individual module tests
- **Integration tests**: End-to-end workflows
- **Doctests**: Documentation examples

Run tests:
```bash
# All tests
cargo test --all-features

# Specific test suite
cargo test --test model_tests --features native-fs

# With output
cargo test --all-features -- --nocapture
```

## Development Workflow

### Pre-commit Hooks

The project uses pre-commit hooks for code quality:

```bash
# Install pre-commit
pip install pre-commit

# Install hooks
pre-commit install

# Run manually
pre-commit run --all-files
```

Hooks check:
- Rust formatting (`cargo fmt`)
- Rust linting (`cargo clippy`)
- Security audit (`cargo audit`)
- File formatting (trailing whitespace, end of file)
- YAML/TOML/JSON syntax

### CI/CD

GitHub Actions workflows:
- **Lint**: Format check, clippy, security audit
- **Test**: Unit and integration tests on Linux, macOS, Windows
- **Build**: Release build verification
- **Publish**: Publishing to crates.io (manually triggered)

### Building

```bash
# Development build
cargo build

# Release build
cargo build --release

# With specific features
cargo build --features native-fs,git

# All features
cargo build --all-features
```

## Key Design Decisions

1. **Storage Abstraction**: Uses trait-based storage backends to support multiple environments (native, WASM, API)
2. **Feature Flags**: Optional functionality gated behind features to minimize dependencies
3. **Async/Await**: Uses async traits for storage operations to support both native and WASM
4. **Error Handling**: Uses `anyhow::Result` for convenience, `thiserror` for structured errors
5. **UUID Strategy**: Uses UUIDv5 (deterministic) for model/table IDs to avoid random number generation requirements
6. **ODCS Primary Format**: ODCS v3.1.0 is the primary format, with legacy ODCL support for backward compatibility

## Common Patterns

### Error Handling

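The SDK uses `anyhow::Result` for most operations. The alias defaults its error type to `anyhow::Error` but also accepts an explicit one, so storage-facing functions can name a structured error such as `StorageError`: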

```rust
use anyhow::Result;

pub async fn load_model(&self, path: &str) -> Result<ModelLoadResult, StorageError> {
    // ...
}
```
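
The split between structured and convenience errors can be sketched with the standard library alone. Below, `StorageError` is written by hand in the shape `thiserror` would derive, and `Box<dyn Error>` stands in for `anyhow::Error`. The variant names, messages, the in-code file table, and `load_table_name` are assumptions for illustration, not the SDK's definitions.

```rust
use std::error::Error;
use std::fmt;

/// Hand-written equivalent of a small `thiserror`-style enum (variants assumed).
#[derive(Debug, PartialEq)]
pub enum StorageError {
    NotFound(String),
    PermissionDenied(String),
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StorageError::NotFound(path) => write!(f, "file not found: {}", path),
            StorageError::PermissionDenied(path) => write!(f, "permission denied: {}", path),
        }
    }
}

impl Error for StorageError {}

/// Low-level operation: returns the structured error so callers can match on it.
/// The "file system" here is a hard-coded table, purely for the example.
pub fn read_file(path: &str) -> Result<Vec<u8>, StorageError> {
    match path {
        "tables/orders.yaml" => Ok(b"name: orders".to_vec()),
        p if p.starts_with("/root") => Err(StorageError::PermissionDenied(p.to_string())),
        other => Err(StorageError::NotFound(other.to_string())),
    }
}

/// Higher-level operation: erases error types, much as `anyhow::Result` would.
pub fn load_table_name(path: &str) -> Result<String, Box<dyn Error>> {
    let bytes = read_file(path)?; // StorageError converts via `?`
    let text = String::from_utf8(bytes)?; // so does FromUtf8Error
    let name = text.strip_prefix("name: ").ok_or("missing `name:` key")?;
    Ok(name.trim().to_string())
}
```

Callers that need to react to a specific failure match on `StorageError`; callers that only report failures can use the erased form and propagate with `?`.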

### Storage Operations

All storage operations are async and use the `StorageBackend` trait:

```rust
#[async_trait(?Send)]
pub trait StorageBackend: Send + Sync {
    async fn read_file(&self, path: &str) -> Result<Vec<u8>, StorageError>;
    async fn write_file(&self, path: &str, content: &[u8]) -> Result<(), StorageError>;
    // ...
}
```
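
A trait like this makes it easy to swap in a test double. The sketch below is a synchronous simplification (the SDK's trait is async) with a hypothetical `InMemoryStorageBackend`. Neither the in-memory backend nor `copy_file` is part of the SDK, and the single `NotFound` variant is an assumption.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Assumed error shape; the SDK's `StorageError` may have different variants.
#[derive(Debug, PartialEq)]
pub enum StorageError {
    NotFound(String),
}

/// Synchronous simplification of the storage trait (the SDK's version is async).
pub trait StorageBackend {
    fn read_file(&self, path: &str) -> Result<Vec<u8>, StorageError>;
    fn write_file(&self, path: &str, content: &[u8]) -> Result<(), StorageError>;
}

/// Hypothetical in-memory backend, handy for unit tests.
#[derive(Default)]
pub struct InMemoryStorageBackend {
    // A Mutex gives interior mutability so `write_file` can take `&self`.
    files: Mutex<HashMap<String, Vec<u8>>>,
}

impl StorageBackend for InMemoryStorageBackend {
    fn read_file(&self, path: &str) -> Result<Vec<u8>, StorageError> {
        let files = self.files.lock().expect("storage mutex poisoned");
        files
            .get(path)
            .cloned()
            .ok_or_else(|| StorageError::NotFound(path.to_string()))
    }

    fn write_file(&self, path: &str, content: &[u8]) -> Result<(), StorageError> {
        let mut files = self.files.lock().expect("storage mutex poisoned");
        files.insert(path.to_string(), content.to_vec());
        Ok(())
    }
}

/// Code written against the trait works with any backend.
pub fn copy_file(storage: &dyn StorageBackend, from: &str, to: &str) -> Result<(), StorageError> {
    let bytes = storage.read_file(from)?;
    storage.write_file(to, &bytes)
}
```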

### Import/Export Pattern

Importers convert external formats to SDK `Table` models; exporters convert SDK models to external formats:

```rust
pub trait Importer {
    fn import(&mut self, content: &str) -> Result<ImportResult, ImportError>;
}

pub trait Exporter {
    fn export_table(&self, table: &Table) -> Result<String, ExportError>;
}
```
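
To show how the two traits compose, the sketch below implements them for a made-up text format and a bare-bones SQL DDL output. Everything except the trait method signatures is an assumption: the simplified `Table`, `Column`, `ImportResult`, and error types, plus the hypothetical `ToyImporter` and `ToySqlExporter`.

```rust
// Simplified stand-ins for the SDK's types (fields assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub name: String,
    pub data_type: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub name: String,
    pub columns: Vec<Column>,
}

pub struct ImportResult {
    pub tables: Vec<Table>,
}

#[derive(Debug, PartialEq)]
pub struct ImportError(pub String);

#[derive(Debug, PartialEq)]
pub struct ExportError(pub String);

pub trait Importer {
    fn import(&mut self, content: &str) -> Result<ImportResult, ImportError>;
}

pub trait Exporter {
    fn export_table(&self, table: &Table) -> Result<String, ExportError>;
}

/// Toy format: the first non-empty line is the table name,
/// and each following line is `column:type`.
pub struct ToyImporter;

impl Importer for ToyImporter {
    fn import(&mut self, content: &str) -> Result<ImportResult, ImportError> {
        let mut lines = content.lines().map(str::trim).filter(|l| !l.is_empty());
        let name = lines
            .next()
            .ok_or_else(|| ImportError("empty input".to_string()))?;

        let mut columns = Vec::new();
        for line in lines {
            let (col, ty) = line
                .split_once(':')
                .ok_or_else(|| ImportError(format!("expected `column:type`, got `{}`", line)))?;
            columns.push(Column {
                name: col.trim().to_string(),
                data_type: ty.trim().to_string(),
            });
        }
        Ok(ImportResult {
            tables: vec![Table { name: name.to_string(), columns }],
        })
    }
}

/// Emits a bare-bones `CREATE TABLE` statement without constraints or quoting.
pub struct ToySqlExporter;

impl Exporter for ToySqlExporter {
    fn export_table(&self, table: &Table) -> Result<String, ExportError> {
        if table.columns.is_empty() {
            return Err(ExportError(format!("table `{}` has no columns", table.name)));
        }
        let cols: Vec<String> = table
            .columns
            .iter()
            .map(|c| format!("  {} {}", c.name, c.data_type))
            .collect();
        Ok(format!("CREATE TABLE {} (\n{}\n);", table.name, cols.join(",\n")))
    }
}
```

Because both sides meet at `Table`, any importer can be paired with any exporter, which is what makes format-to-format conversion a two-step call.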

## Security

- Security advisories are tracked via `cargo audit`
- Configuration in `cargo-audit.toml` allows specific unmaintained warnings
- Pre-commit hooks run security audits automatically

## License

MIT License - See LICENSE file for details.