spring-batch-rs 0.3.4

---
title: Architecture
description: Deep dive into Spring Batch RS architecture and design patterns
sidebar:
  order: 1
---

import { Card, CardGrid, Tabs, TabItem, Aside } from '@astrojs/starlight/components';

# Spring Batch RS Architecture

Spring Batch RS is built on proven batch processing patterns from the Java Spring Batch framework, adapted for Rust's unique strengths in performance and safety.

## High-Level Architecture

```mermaid
graph TB
    subgraph "Application Layer"
        App[Your Application]
    end

    subgraph "Spring Batch RS Core"
        Job[Job]
        JobExec[JobExecution]
        Step[Step]
        StepExec[StepExecution]
    end

    subgraph "Processing Layer"
        ChunkOrient[Chunk-Oriented Processing]
        TaskletProc[Tasklet Processing]
    end

    subgraph "I/O Layer"
        Reader[ItemReader]
        Processor[ItemProcessor]
        Writer[ItemWriter]
        Tasklet[Tasklet]
    end

    subgraph "Data Sources"
        Files[Files<br/>CSV, JSON, XML]
        DB[Databases<br/>PostgreSQL, MySQL, SQLite]
        NoSQL[NoSQL<br/>MongoDB]
        Network[Network<br/>FTP, FTPS]
    end

    App --> Job
    Job --> JobExec
    JobExec --> Step
    Step --> StepExec
    StepExec --> ChunkOrient
    StepExec --> TaskletProc

    ChunkOrient --> Reader
    ChunkOrient --> Processor
    ChunkOrient --> Writer
    TaskletProc --> Tasklet

    Reader -.-> Files
    Reader -.-> DB
    Reader -.-> NoSQL
    Writer -.-> Files
    Writer -.-> DB
    Writer -.-> NoSQL
    Tasklet -.-> Network
    Tasklet -.-> Files

    style Job fill:#3b82f6,stroke:#1e40af,color:#fff
    style Step fill:#10b981,stroke:#059669,color:#fff
    style ChunkOrient fill:#f59e0b,stroke:#d97706,color:#fff
    style TaskletProc fill:#f59e0b,stroke:#d97706,color:#fff
```

## Core Components

### Job

A **Job** represents the entire batch process. It's the top-level container that orchestrates one or more steps.

```rust
use spring_batch_rs::core::job::JobBuilder;

let job = JobBuilder::new()
    .start(&step1)
    .next(&step2)
    .next(&step3)
    .build();

let result = job.run()?;
```

**Key Characteristics:**
- Immutable once created
- Can have multiple steps executed sequentially
- Maintains execution state and metadata
- Provides rollback capabilities on failure

### Step

A **Step** is an independent, sequential phase of a Job. Each step can either process data in chunks or execute a single task.

```mermaid
graph LR
    Step[Step] --> Type{Step Type?}
    Type -->|Chunk-Oriented| Chunk[Read → Process → Write]
    Type -->|Tasklet| Task[Single Task Execution]

    style Step fill:#10b981,color:#fff
    style Chunk fill:#3b82f6,color:#fff
    style Task fill:#f59e0b,color:#fff
```

### Chunk-Oriented Processing

The read-process-write pattern for handling large datasets efficiently.

```mermaid
sequenceDiagram
    participant Step
    participant Reader
    participant Processor
    participant Writer

    loop For each chunk
        Step->>Reader: read(chunk_size)
        Reader-->>Step: items[1..N]

        loop For each item
            Step->>Processor: process(item)
            Processor-->>Step: transformed_item
        end

        Step->>Writer: write(chunk)
        Writer-->>Step: success
    end
```

**Architecture Benefits:**
- **Memory Efficient**: Only loads chunk_size items at a time
- **Transactional**: Commits per chunk, not per item
- **Fault Tolerant**: Can skip failed items within limits
- **Performant**: Batches I/O operations

### ItemReader

Abstracts data retrieval from various sources.

```rust
pub trait ItemReader<T> {
    fn read(&mut self) -> ItemReaderResult<T>;
}
```

**Design Pattern**: Iterator-like pattern with error handling

<CardGrid>
  <Card title="File Readers" icon="document">
    - CsvItemReader
    - JsonItemReader
    - XmlItemReader
  </Card>
  <Card title="Database Readers" icon="seti:db">
    - RdbcItemReader (SQL)
    - OrmItemReader (SeaORM)
    - MongoItemReader
  </Card>
  <Card title="Utility Readers" icon="star">
    - FakeItemReader
    - Custom implementations
  </Card>
</CardGrid>

### ItemProcessor

Transforms and validates items during processing.

```rust
pub trait ItemProcessor<I, O> {
    fn process(&self, item: I) -> ItemProcessorResult<O>;
}
```

**Key Features:**
- Type transformation: `I` → `O`
- Filtering: Return `None` to skip items
- Validation: Return `Err` for invalid items
- Stateless design for parallelization

### ItemWriter

Outputs processed items to destinations.

```rust
pub trait ItemWriter<T> {
    fn write(&mut self, items: &[T]) -> ItemWriterResult;
}
```

**Batch Writing**: Receives chunks of items for efficient I/O

### Tasklet

Single-task operations that don't fit the chunk pattern.

```rust
pub trait Tasklet {
    fn execute(&self, step_execution: &StepExecution)
        -> Result<RepeatStatus, BatchError>;
}
```

**Common Use Cases:**
- File compression (ZIP)
- File transfer (FTP/FTPS)
- Database maintenance
- Cleanup operations
- API calls

## Execution Flow

### Complete Job Execution

```mermaid
stateDiagram-v2
    [*] --> JobStarting
    JobStarting --> StepExecution

    state StepExecution {
        [*] --> ReadChunk
        ReadChunk --> ProcessItems
        ProcessItems --> WriteChunk
        WriteChunk --> MoreData?
        MoreData? --> ReadChunk: Yes
        MoreData? --> [*]: No
    }

    StepExecution --> NextStep?
    NextStep? --> StepExecution: More Steps
    NextStep? --> JobComplete: Done
    JobComplete --> [*]
```

### Error Handling Flow

```mermaid
graph TB
    Read[Read Item] --> Process[Process Item]
    Process --> Success{Success?}
    Success -->|Yes| Write[Write to Chunk]
    Success -->|No| SkipCheck{Skip Limit<br/>Reached?}
    SkipCheck -->|No| Skip[Skip Item & Continue]
    SkipCheck -->|Yes| Fail[Fail Job]
    Write --> MoreItems{More Items?}
    MoreItems -->|Yes| Read
    MoreItems -->|No| Commit[Commit Chunk]
    Skip --> MoreItems

    style Success fill:#10b981,color:#fff
    style SkipCheck fill:#f59e0b,color:#fff
    style Fail fill:#ef4444,color:#fff
    style Commit fill:#3b82f6,color:#fff
```

## Design Patterns

### Builder Pattern

All components use the builder pattern for flexible, type-safe construction.

```rust
let reader = CsvItemReaderBuilder::<Product>::new()
    .has_headers(true)
    .delimiter(b',')
    .from_path("products.csv")?;

let step = StepBuilder::new("process-products")
    .chunk(100)
    .reader(&reader)
    .processor(&processor)
    .writer(&writer)
    .skip_limit(10)
    .build();
```

**Benefits:**
- Clear, readable API
- Compile-time validation
- Sensible defaults
- Flexible configuration

### Strategy Pattern

Readers, processors, and writers are interchangeable strategies.

```mermaid
graph LR
    Step[Step] --> IReader[ItemReader Trait]
    IReader --> CSV[CsvReader]
    IReader --> JSON[JsonReader]
    IReader --> DB[DatabaseReader]

    style IReader fill:#3b82f6,color:#fff
    style CSV fill:#10b981,color:#fff
    style JSON fill:#10b981,color:#fff
    style DB fill:#10b981,color:#fff
```

### Template Method Pattern

Job and Step execution follows a template with customizable steps.

```rust
// Framework provides the template
pub fn run(&self) -> Result<JobExecution, BatchError> {
    self.before_job()?;        // Hook
    let result = self.execute_steps()?;
    self.after_job()?;         // Hook
    Ok(result)
}
```

## Memory Model

### Chunk Processing Memory Usage

```mermaid
graph TB
    subgraph "Memory Usage Per Chunk"
        Input[Input Buffer<br/>~chunk_size items]
        Processing[Processing Buffer<br/>~chunk_size items]
        Output[Output Buffer<br/>~chunk_size items]
    end

    subgraph "Total Memory"
        Total[~3 × chunk_size × item_size]
    end

    Input --> Processing
    Processing --> Output
    Output --> Total

    style Total fill:#f59e0b,color:#fff
```

**Memory Optimization:**
- Adjust `chunk_size` based on available memory
- Use streaming for large items
- Paginate database queries
- Clear buffers after each chunk

### Resource Management

```rust
// Resources are automatically cleaned up
{
    let reader = CsvItemReaderBuilder::new()
        .from_path("large_file.csv")?;

    // File handle opened
    let step = StepBuilder::new("process")
        .chunk(1000)  // Only 1000 items in memory
        .reader(&reader)
        .build();

    job.run()?;
    // File handle automatically closed
}
```

## Concurrency Model

Spring Batch RS is designed for single-threaded execution by default, but supports parallelization strategies.

### Current Model: Sequential

```mermaid
sequenceDiagram
    participant J as Job
    participant S1 as Step 1
    participant S2 as Step 2
    participant S3 as Step 3

    J->>S1: Execute
    S1-->>J: Complete
    J->>S2: Execute
    S2-->>J: Complete
    J->>S3: Execute
    S3-->>J: Complete
```

### Future: Parallel Steps

```mermaid
graph TB
    Job[Job] --> Split{Split}
    Split --> Step1[Step 1]
    Split --> Step2[Step 2]
    Split --> Step3[Step 3]
    Step1 --> Join{Join}
    Step2 --> Join
    Step3 --> Join
    Join --> Next[Next Step]

    style Split fill:#3b82f6,color:#fff
    style Join fill:#10b981,color:#fff
```

## Transaction Management

### Database Transactions

```mermaid
sequenceDiagram
    participant Step
    participant Reader
    participant Writer
    participant DB

    Step->>DB: BEGIN TRANSACTION

    loop Each chunk
        Step->>Reader: read(chunk_size)
        Reader->>DB: SELECT...
        DB-->>Reader: rows
        Reader-->>Step: items

        Step->>Writer: write(items)
        Writer->>DB: INSERT/UPDATE...
    end

    Step->>DB: COMMIT

    Note over Step,DB: If error: ROLLBACK
```

### File Operations

File operations are not transactional by default. Use staging directories:

```rust
// Write to temporary location
let temp_writer = JsonItemWriterBuilder::<TempData>::new()
    .from_path("/tmp/output.json")?;

// On success, move to final location
std::fs::rename("/tmp/output.json", "/final/output.json")?;
```

## Error Handling Architecture

<Tabs>
  <TabItem label="Skip Strategy">
    ```rust
    let step = StepBuilder::new("fault-tolerant")
        .chunk(100)
        .reader(&reader)
        .processor(&processor)
        .writer(&writer)
        .skip_limit(10)  // Skip up to 10 errors
        .build();
    ```

    **Use when**: Individual item failures shouldn't stop the job
  </TabItem>

  <TabItem label="Fail-Fast Strategy">
    ```rust
    let step = StepBuilder::new("critical-process")
        .chunk(100)
        .reader(&reader)
        .processor(&processor)
        .writer(&writer)
        // No skip_limit - fail on first error
        .build();
    ```

    **Use when**: Data integrity is critical
  </TabItem>

  <TabItem label="Retry Strategy">
    ```rust
    struct RetryProcessor<P> {
        inner: P,
        max_retries: u32,
    }

    impl<I, O, P> ItemProcessor<I, O> for RetryProcessor<P>
    where
        P: ItemProcessor<I, O>,
    {
        fn process(&self, item: I) -> ItemProcessorResult<O> {
            let mut attempts = 0;
            loop {
                match self.inner.process(item.clone()) {
                    Ok(result) => return Ok(result),
                    Err(e) if attempts < self.max_retries => {
                        attempts += 1;
                        std::thread::sleep(Duration::from_millis(100 * attempts));
                    }
                    Err(e) => return Err(e),
                }
            }
        }
    }
    ```

    **Use when**: Transient failures are expected (network, locks)
  </TabItem>
</Tabs>

## Performance Characteristics

### Throughput vs Memory Trade-offs

```mermaid
graph LR
    Small[Small Chunks<br/>10-50 items] -->|Lower Memory<br/>Lower Throughput| Result1[Safe for<br/>Large Items]
    Medium[Medium Chunks<br/>100-500 items] -->|Balanced| Result2[Recommended<br/>Default]
    Large[Large Chunks<br/>1000+ items] -->|Higher Memory<br/>Higher Throughput| Result3[High-Performance<br/>Small Items]

    style Medium fill:#10b981,color:#fff
```

### Benchmarks (Typical)

| Operation | Small Chunks (10) | Medium Chunks (100) | Large Chunks (1000) |
|-----------|-------------------|---------------------|---------------------|
| CSV Read | 5,000/sec | 45,000/sec | 180,000/sec |
| JSON Write | 3,000/sec | 28,000/sec | 95,000/sec |
| DB Insert | 500/sec | 4,000/sec | 12,000/sec |

<Aside type="tip">
  **Optimization Tip**: Start with chunk size of 100, then increase based on memory availability and item size.
</Aside>

## Extension Points

### Custom ItemReader

```rust
use spring_batch_rs::core::item::ItemReader;
use spring_batch_rs::BatchError;

struct ApiItemReader {
    url: String,
    page: usize,
    buffer: Vec<Item>,
}

impl ItemReader<Item> for ApiItemReader {
    fn read(&mut self) -> ItemReaderResult<Item> {
        if self.buffer.is_empty() {
            // Fetch next page
            self.fetch_page()?;
        }

        Ok(self.buffer.pop())
    }
}
```

### Custom Tasklet

```rust
use spring_batch_rs::core::step::{Tasklet, StepExecution, RepeatStatus};

struct CleanupTasklet {
    directory: PathBuf,
    days_old: u32,
}

impl Tasklet for CleanupTasklet {
    fn execute(&self, execution: &StepExecution)
        -> Result<RepeatStatus, BatchError>
    {
        // Custom cleanup logic
        self.delete_old_files()?;
        Ok(RepeatStatus::Finished)
    }
}
```

## Best Practices

<CardGrid>
  <Card title="1. Size Your Chunks Wisely" icon="rocket">
    - Start with 100 items
    - Monitor memory usage
    - Adjust based on item size
    - Consider database batch limits
  </Card>

  <Card title="2. Handle Errors Gracefully" icon="warning">
    - Set appropriate skip limits
    - Log skipped items
    - Implement retry logic for transient errors
    - Use validation early
  </Card>

  <Card title="3. Optimize I/O" icon="setting">
    - Use buffered readers/writers
    - Batch database operations
    - Compress network transfers
    - Cache reference data
  </Card>

  <Card title="4. Monitor & Measure" icon="information">
    - Track execution times
    - Monitor memory usage
    - Log progress regularly
    - Profile critical paths
  </Card>
</CardGrid>

## Summary

Spring Batch RS architecture provides:

✅ **Separation of Concerns**: Clear separation between reading, processing, and writing
✅ **Flexibility**: Multiple processing models (chunk vs tasklet)
✅ **Extensibility**: Easy to add custom components
✅ **Reliability**: Built-in error handling and recovery
✅ **Performance**: Optimized for throughput and memory efficiency
✅ **Type Safety**: Rust's strong type system prevents common errors

## Next Steps

- [Processing Models](/processing-models/) - Deep dive into chunk vs tasklet
- [Item Readers & Writers](/item-readers-writers/overview/) - Explore all I/O options
- [Examples](/examples/) - See architecture in action