data-modelling-sdk 2.0.6

Shared SDK for model operations across platforms (API, WASM, Native)
Documentation
# Security Policy

## Security Measures

The Data Modelling SDK implements several security measures to protect against common vulnerabilities.

### 1. Path Traversal Protection

The `FileSystemStorageBackend` prevents directory traversal attacks:

- Paths containing `..` are rejected with `StorageError::PermissionDenied`
- All resolved paths are verified to remain within the configured base directory
- Symlinks are resolved and validated to prevent escape via symbolic links

```rust
// This will fail with PermissionDenied
backend.read_file("../etc/passwd").await; // Error!
backend.read_file("/foo/../../etc/passwd").await; // Error!
```

### 2. Input Validation

All imported data is validated before processing:

- **Table names**: Must start with a letter or underscore, contain only alphanumeric characters, hyphens, and underscores
- **Column names**: Same as table names, plus dots for nested columns
- **Data types**: Checked for SQL injection patterns
- **Lengths**: All inputs have maximum length limits (255 chars for identifiers)

```rust
use data_modelling_sdk::validation::input::validate_table_name;

// Valid
assert!(validate_table_name("users").is_ok());

// Invalid - contains SQL injection attempt
assert!(validate_table_name("users; DROP TABLE users").is_err());
```

### 3. SQL Identifier Escaping

When exporting to SQL, all identifiers are properly quoted and escaped:

- PostgreSQL: `"identifier"` with internal `"` doubled
- MySQL: `` `identifier` `` with internal `` ` `` doubled
- SQL Server: `[identifier]` with internal `]` doubled

```rust
use data_modelling_sdk::validation::input::sanitize_sql_identifier;

// Returns: "user""table"
let safe = sanitize_sql_identifier("user\"table", "postgres");
```

### 4. Domain Validation (API Backend)

Domain parameters in API requests are validated:

- Maximum 100 characters
- Only alphanumeric, hyphens, and underscores allowed
- Cannot start with a period
- URL-encoded before use in API paths

### 5. Reserved Word Detection

SQL reserved words are detected and flagged:

- `SELECT`, `TABLE`, `INSERT`, `UPDATE`, `DELETE`, etc.
- Validation warnings are logged but don't block import
- Use `validate_table_name()` or `validate_column_name()` to check

### 6. Safe Error Handling

- No `unwrap()` calls on user-provided data in non-test code
- All errors are properly propagated with context
- Sensitive information is not leaked in error messages

### 7. Secure Credential Handling

The SDK provides secure credential handling for cloud service integrations:

- **`SecureCredentials`**: Wrapper type that prevents accidental logging of secrets
  - Implements `Display` and `Debug` traits to show redacted values
  - Use `expose()` method to access the actual credential value
  - Prevents credentials from appearing in logs, error messages, or debug output

```rust
use data_modelling_sdk::staging::SecureCredentials;

// Create secure credential
let token = SecureCredentials::new("dapi_secret_token_123456789");

// Safe to log - shows redacted value
println!("Token: {}", token);  // Output: "Token: dapi****"

// Get actual value when needed
let actual_value = token.expose();
```

- **`redact_secret()`**: Utility function to redact secrets while showing prefix
  - Shows first N characters (default: 4) followed by asterisks
  - Useful for debugging while protecting sensitive data

```rust
use data_modelling_sdk::staging::redact_secret;

let redacted = redact_secret("AKIAIOSFODNN7EXAMPLE", 4);
// Returns: "AKIA****"
```

- **`redact_secrets_in_string()`**: Automatically detects and redacts secrets in text
  - AWS access keys (AKIA...)
  - AWS secret keys (40-character alphanumeric)
  - Bearer tokens
  - URL-embedded passwords (user:password@host)

```rust
use data_modelling_sdk::staging::redact_secrets_in_string;

let log_message = "Connecting with key AKIAIOSFODNN7EXAMPLE";
let safe_message = redact_secrets_in_string(&log_message);
// Returns: "Connecting with key AKIA****"
```

### 8. Cloud Service Security

For S3 and Databricks integrations:

- **AWS Credentials**: Use AWS SDK credential chain (environment, profile, IAM role)
  - Never hardcode credentials in source files
  - Prefer IAM roles for EC2/ECS/Lambda workloads
  - Use `--s3-profile` flag for named profiles

- **Databricks Tokens**: Use `SecureCredentials` wrapper
  - Store tokens in environment variables (`$DATABRICKS_TOKEN`)
  - Tokens are automatically redacted in logs and errors
  - Never pass tokens via command-line arguments in scripts

## Reporting Vulnerabilities

If you discover a security vulnerability, please:

1. **Do not** open a public GitHub issue
2. Email the maintainers directly at [security contact TBD]
3. Include a detailed description and steps to reproduce
4. Allow reasonable time for a fix before public disclosure

## Secure Usage Guidelines

### For SDK Users

1. **Always validate user input** before passing to SDK functions
2. **Use the provided validation functions** (`validate_table_name`, etc.)
3. **Handle errors gracefully** - don't expose internal errors to end users
4. **Keep the SDK updated** to receive security patches

### For Import Operations

```rust
use data_modelling_sdk::import::sql::SQLImporter;
use data_modelling_sdk::validation::input::validate_table_name;

let importer = SQLImporter::new("postgres");
let result = importer.parse(user_provided_sql)?;

// Additional validation on imported data
for table in &result.tables {
    if let Some(name) = &table.name {
        if let Err(e) = validate_table_name(name) {
            log::warn!("Potentially unsafe table name: {}", e);
        }
    }
}
```

### For Export Operations

```rust
use data_modelling_sdk::export::sql::SQLExporter;

// The exporter automatically quotes and escapes identifiers
let exporter = SQLExporter;
let result = exporter.export(&tables, Some("postgres"))?;
// result.content contains safe SQL with properly escaped identifiers
```

### For File System Operations

```rust
use data_modelling_sdk::storage::filesystem::FileSystemStorageBackend;

// Always use a restricted base path
let backend = FileSystemStorageBackend::new("/app/data");

// User paths are automatically validated
backend.read_file(user_path).await?; // Safe - validates against base path
```

## Changelog

### v2.0.0
- Added `SecureCredentials` wrapper type for preventing credential leaks
- Added `redact_secret()` and `redact_secrets_in_string()` utilities
- Automatic secret detection and redaction for AWS keys, tokens, and passwords
- S3 ingestion uses AWS SDK credential chain (secure by default)
- Databricks integration uses secure token handling

### v1.14.0
- Consistent camelCase serialization across all models for secure API responses
- All enums serialize with camelCase values to prevent injection via field manipulation

### v0.3.0
- Added path traversal protection to `FileSystemStorageBackend`
- Added domain validation to `ApiStorageBackend`
- Added input validation module with sanitizers
- Replaced unsafe `unwrap()` calls with proper error handling
- Added SQL identifier quoting with escape handling
- Added Protobuf reserved word handling