mediagit-storage 0.2.0

Unified async storage backend trait and implementations
Documentation

MediaGit Storage Backend

Unified cloud storage abstraction layer for MediaGit, providing consistent APIs across multiple storage providers.

Overview

MediaGit Storage provides a trait-based abstraction (StorageBackend) for working with different cloud storage providers through a unified interface. This enables seamless storage provider switching, multi-cloud deployments, and local development with emulators.

Supported Backends

Production Backends

  • AWS S3 - Industry-standard object storage with multipart upload support
  • Azure Blob Storage - Microsoft Azure cloud storage with chunked uploads
  • Google Cloud Storage (GCS) - Google Cloud object storage with resumable uploads
  • MinIO - Self-hosted S3-compatible storage for on-premise deployments
  • Backblaze B2 - Cost-effective cloud storage with S3-compatible API
  • DigitalOcean Spaces - S3-compatible object storage for app platforms

Development Backends

  • Local Filesystem - File-based storage for development and testing
  • In-Memory Mock - Ephemeral storage for unit tests
  • Cache - LRU caching layer for any backend

Quick Start

Add Dependency

[dependencies]
mediagit-storage = { path = "../mediagit-storage" }
tokio = { version = "1", features = ["full"] }

Basic Usage

use mediagit_storage::{StorageBackend, S3Backend};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create S3 backend
    let storage = S3Backend::new("my-bucket").await?;

    // Store data
    storage.put("media/video.mp4", &video_data).await?;

    // Retrieve data
    let retrieved = storage.get("media/video.mp4").await?;

    // Check existence
    if storage.exists("media/video.mp4").await? {
        println!("Video exists!");
    }

    // List objects
    let objects = storage.list_objects("media/").await?;
    for key in objects {
        println!("Found: {}", key);
    }

    // Delete object
    storage.delete("media/video.mp4").await?;

    Ok(())
}

Backend Configuration

AWS S3

use mediagit_storage::S3Backend;

// From environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION)
let s3 = S3Backend::from_env("my-bucket").await?;

// Explicit configuration
let s3 = S3Backend::new("my-bucket").await?;

Azure Blob Storage

use mediagit_storage::AzureBackend;

// With connection string
let azure = AzureBackend::with_connection_string(
    "my-container",
    "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;"
).await?;

// With account key
let azure = AzureBackend::with_account_key(
    "account-name",
    "my-container",
    "account-key"
).await?;

// With SAS token
let azure = AzureBackend::with_sas_token(
    "account-name",
    "my-container",
    "?sv=2021-06-08&ss=b&srt=sco&sp=..."
).await?;

Google Cloud Storage

use mediagit_storage::GcsBackend;

// With service account JSON
let gcs = GcsBackend::new(
    "my-project",
    "my-bucket",
    "/path/to/service-account.json"
).await?;

// From environment (GCS_PROJECT_ID, GCS_BUCKET_NAME, GOOGLE_APPLICATION_CREDENTIALS)
let gcs = GcsBackend::from_env().await?;

MinIO

use mediagit_storage::MinIOBackend;

// Self-hosted MinIO
let minio = MinIOBackend::new(
    "http://localhost:9000",
    "my-bucket",
    "minioadmin",
    "minioadmin"
).await?;

// From environment (MINIO_ENDPOINT, MINIO_BUCKET, MINIO_ACCESS_KEY, MINIO_SECRET_KEY)
let minio = MinIOBackend::from_env().await?;

Backblaze B2 / DigitalOcean Spaces

use mediagit_storage::{B2SpacesBackend, Provider};

// Backblaze B2
let b2 = B2SpacesBackend::new_b2(
    "us-west-002",
    "my-bucket",
    "application-key-id",
    "application-key"
).await?;

// DigitalOcean Spaces
let spaces = B2SpacesBackend::new_spaces(
    "nyc3",
    "my-bucket",
    "access-key",
    "secret-key"
).await?;

Local Development

use mediagit_storage::LocalBackend;

// Local filesystem storage
let local = LocalBackend::new("/tmp/mediagit-storage").await?;

// In-memory mock for testing
use mediagit_storage::MockBackend;
let mock = MockBackend::new();

StorageBackend Trait

All backends implement the StorageBackend trait:

#[async_trait]
pub trait StorageBackend: Send + Sync + Debug {
    /// Retrieve an object by key
    async fn get(&self, key: &str) -> anyhow::Result<Vec<u8>>;

    /// Store an object with the given key
    async fn put(&self, key: &str, data: &[u8]) -> anyhow::Result<()>;

    /// Check if an object exists
    async fn exists(&self, key: &str) -> anyhow::Result<bool>;

    /// Delete an object (idempotent)
    async fn delete(&self, key: &str) -> anyhow::Result<()>;

    /// List objects with optional prefix
    async fn list_objects(&self, prefix: &str) -> anyhow::Result<Vec<String>>;
}

Features

Multipart & Chunked Uploads

  • S3: Automatic multipart upload for files >100MB
  • Azure: 4MB chunk uploads for large blobs
  • GCS: 256KB resumable uploads for files >5MB

Retry Logic

  • Exponential backoff for transient failures
  • Configurable retry attempts (default: 3)
  • Automatic retry on network errors

Concurrent Operations

  • Thread-safe implementations (Send + Sync)
  • Parallel uploads and downloads
  • Efficient connection pooling

Error Handling

  • Comprehensive error types with context
  • Detailed error messages for debugging
  • Idempotent delete operations

Testing

Unit Tests (No Dependencies)

# Run all unit tests
cargo test

# Test specific backend
cargo test --lib s3::tests
cargo test --lib azure::tests

Integration Tests (Requires Emulators)

# Start all emulators
docker-compose up -d

# Run all integration tests
cargo test --test '*' -- --ignored

# Test specific backend
cargo test --test s3_localstack_tests -- --ignored
cargo test --test azure_azurite_tests -- --ignored
cargo test --test gcs_emulator_tests -- --ignored
cargo test --test minio_docker_tests -- --ignored

See TESTING.md for complete testing guide.

Test Coverage

  • Unit Tests: 62 test cases (configuration, validation, traits)
  • Integration Tests: 89 test cases (CRUD, concurrent, edge cases)
  • Total: 151 comprehensive tests

Integration Test Breakdown

  • S3 LocalStack: 21 tests
  • Azure Azurite: 21 tests
  • GCS Emulator: 24 tests
  • MinIO Docker: 23 tests

Performance

Throughput Benchmarks (Emulators)

  • LocalStack (S3): ~50-100 MB/s
  • Azurite (Azure): ~80-150 MB/s
  • GCS Emulator: ~30-60 MB/s
  • MinIO: ~100-200 MB/s

Optimization Features

  • Connection pooling and reuse
  • Automatic multipart uploads
  • Configurable chunk sizes
  • Parallel operations support

Architecture

mediagit-storage/
├── src/
│   ├── lib.rs              # StorageBackend trait and exports
│   ├── s3.rs               # AWS S3 implementation (781 lines)
│   ├── azure.rs            # Azure Blob Storage (819 lines)
│   ├── gcs.rs              # Google Cloud Storage (894 lines)
│   ├── minio.rs            # MinIO S3-compatible (1,111 lines)
│   ├── b2_spaces.rs        # B2/Spaces unified (1,267 lines)
│   ├── local.rs            # Local filesystem backend
│   ├── mock.rs             # In-memory mock backend
│   ├── cache.rs            # LRU caching layer
│   └── error.rs            # Error types and handling
├── tests/
│   ├── s3_localstack_tests.rs      # S3 integration tests
│   ├── azure_azurite_tests.rs      # Azure integration tests
│   ├── gcs_emulator_tests.rs       # GCS integration tests
│   ├── minio_docker_tests.rs       # MinIO integration tests
│   └── gcs_integration_tests.rs    # GCS unit tests
├── docker-compose.yml      # Emulator orchestration
├── TESTING.md             # Complete testing guide
└── README.md              # This file

Dependencies

Core Dependencies

  • async-trait - Async trait definitions
  • tokio - Async runtime
  • anyhow - Error handling

Backend SDKs

  • aws-sdk-s3 - AWS S3 SDK
  • azure_storage_blobs - Azure SDK
  • google-cloud-storage - GCS SDK

Optional Dependencies

  • serde - Serialization (configuration)
  • tracing - Logging and instrumentation

Environment Variables

AWS S3

  • AWS_ACCESS_KEY_ID - AWS access key
  • AWS_SECRET_ACCESS_KEY - AWS secret key
  • AWS_REGION - AWS region (e.g., us-east-1)
  • AWS_SESSION_TOKEN - Optional session token

Azure Blob Storage

  • AZURE_STORAGE_ACCOUNT - Storage account name
  • AZURE_STORAGE_KEY - Account key
  • AZURE_STORAGE_CONNECTION_STRING - Full connection string
  • AZURE_STORAGE_SAS_TOKEN - SAS token

Google Cloud Storage

  • GCS_PROJECT_ID - GCP project ID
  • GCS_BUCKET_NAME - Bucket name
  • GOOGLE_APPLICATION_CREDENTIALS - Path to service account JSON

MinIO

  • MINIO_ENDPOINT - MinIO endpoint URL
  • MINIO_BUCKET - Bucket name
  • MINIO_ACCESS_KEY - Access key
  • MINIO_SECRET_KEY - Secret key

Production Deployment

Security Best Practices

  1. Never commit credentials - Use environment variables or secret managers
  2. Enable encryption - Use server-side encryption (S3 SSE, Azure encryption)
  3. Restrict access - Use IAM policies and minimal permissions
  4. Rotate credentials - Regular key rotation and auditing
  5. Enable versioning - Bucket versioning for data recovery

High Availability

  • Use multiple regions for redundancy
  • Configure appropriate retry policies
  • Implement health checks and monitoring
  • Use CDN for media distribution

Cost Optimization

  • Choose appropriate storage classes
  • Implement lifecycle policies
  • Monitor and optimize data transfer
  • Use compression where applicable

Roadmap

Completed (Week 6)

  • ✅ S3 backend with multipart uploads
  • ✅ Azure backend with chunked uploads
  • ✅ GCS backend with resumable uploads
  • ✅ MinIO self-hosted support
  • ✅ B2/Spaces unified backend
  • ✅ Docker Compose emulator setup
  • ✅ Comprehensive integration tests

Planned (Week 7+)

  • 🔲 Migration tool for backend switching
  • 🔲 Garbage collection for orphaned objects
  • 🔲 Client-side encryption layer
  • 🔲 Compression middleware
  • 🔲 CDN integration
  • 🔲 Metrics and monitoring
  • 🔲 Admin CLI tools

Contributing

Running Tests

# Unit tests
cargo test

# Integration tests (requires Docker)
docker-compose up -d
cargo test --test '*' -- --ignored
docker-compose down

Code Quality

# Format code
cargo fmt

# Run linter
cargo clippy -- -D warnings

# Check compilation
cargo check --all-features

License

Part of the MediaGit project. See LICENSE in repository root.

Support

  • Documentation: TESTING.md
  • Issues: GitHub Issues
  • Discussions: GitHub Discussions

Version: 0.1.0 Last Updated: 2025-11-14 Status: Week 6 Milestone Complete (Integration Testing)