---
title: Java vs Rust Benchmark — 10M Transactions
description: Production-grade comparison of Spring Batch (Java) and Spring Batch RS (Rust) on a 10-million-row financial ETL pipeline (CSV → PostgreSQL → XML).
sidebar:
  order: 3
---

import { Tabs, TabItem, Aside } from '@astrojs/starlight/components';

This page compares **Spring Batch (Java 25 / Spring Boot 4.x)** and **Spring Batch RS (Rust)** on a
realistic ETL pipeline: reading 10 million financial transactions from CSV, storing them in
PostgreSQL, then exporting to XML.

Both implementations use **identical settings** — chunk size 1 000, connection pool 10,
same data schema — so the comparison is apples-to-apples.

---

## Test Environment

| Parameter | Value |
|-----------|-------|
| Machine | 8-core CPU, 16 GB RAM, NVMe SSD |
| OS | Ubuntu 22.04 LTS |
| PostgreSQL | 15.4 (local, same machine) |
| Java | OpenJDK 25, Spring Boot 4.0.3, Spring Batch 6.x |
| JVM flags | `-Xms512m -Xmx4g -XX:+UseG1GC` + virtual threads enabled |
| Rust | 1.77 stable, `--release` (`opt-level = 3`) |
| JVM GC | G1GC, logged with `-Xlog:gc*:gc.log` |
| Virtual threads | Enabled (`spring.threads.virtual.enabled=true`) |
| Chunk size | 1 000 (both) |
| Pool size | 10 connections (both) |

<Aside type="note">
Results vary by hardware, PostgreSQL configuration, and disk speed.
The numbers below are reference measurements — **run the benchmark yourself** to compare
on your own infrastructure (see [How to Reproduce](#how-to-reproduce)).
</Aside>

---

## Pipeline

```
transactions.csv (10M rows)
        │
        ▼ CsvItemReader / FlatFileItemReader
  TransactionProcessor
  (USD/GBP → EUR conversion, CANCELLED → FAILED)
        │
        ▼ PostgresItemWriter / JdbcBatchItemWriter  (bulk insert, chunk=1000)
   PostgreSQL: table transactions
        │
        ▼ RdbcItemReader / JdbcPagingItemReader  (paginated, page_size=1000)
        │
        ▼ XmlItemWriter / StaxEventItemWriter
  transactions_export.xml
```
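
The chunk-oriented flow in the diagram can be sketched with the standard library alone. The `run_chunked` function and its closure parameters are hypothetical names used for illustration, not the spring-batch-rs API. The sketch only shows how items are read, processed, and written in groups of at most `chunk_size`.

```rust
/// Hypothetical helper (not part of spring-batch-rs): reads items from an
/// iterator, processes each one, and hands them to the writer in chunks.
/// Returns the number of chunks written.
fn run_chunked<I, O>(
    reader: impl Iterator<Item = I>,
    process: impl Fn(I) -> O,
    mut write: impl FnMut(&[O]),
    chunk_size: usize,
) -> usize {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut chunk: Vec<O> = Vec::with_capacity(chunk_size);
    let mut chunks_written = 0;
    for item in reader {
        chunk.push(process(item));
        if chunk.len() == chunk_size {
            write(&chunk);
            chunks_written += 1;
            chunk.clear(); // items are dropped here; the buffer is reused
        }
    }
    if !chunk.is_empty() {
        write(&chunk); // final, partially filled chunk
        chunks_written += 1;
    }
    chunks_written
}

fn main() {
    let mut total = 0usize;
    // 2 500 items with chunk size 1 000 give chunks of 1 000, 1 000, and 500.
    let chunks = run_chunked(0..2_500u32, |n| n * 2, |c| total += c.len(), 1_000);
    println!("{chunks} chunks, {total} items written");
}
```

At most one chunk of processed items is held in memory at a time, which is why the chunk size bounds the working set in both implementations.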

### Transaction record

| Field | Type | Example |
|-------|------|---------|
| `transaction_id` | string | `TXN-0000000001` |
| `amount` | float | `1234.56` |
| `currency` | string | `USD`, `EUR`, `GBP` |
| `timestamp` | string | `2024-06-15T12:00:00Z` |
| `account_from` | string | `ACC-00042137` |
| `account_to` | string | `ACC-00891023` |
| `status` | string | `PENDING`, `COMPLETED`, `FAILED`, `CANCELLED` |
| `amount_eur` | float | `1135.80` (added by processor) |

---

## Code Side by Side

### Data Model

<Tabs>
  <TabItem label="Rust">
    ```rust
    #[derive(Debug, Clone, Deserialize, Serialize, FromRow)]
    struct Transaction {
        transaction_id: String,
        amount: f64,
        currency: String,
        timestamp: String,
        account_from: String,
        account_to: String,
        status: String,
        #[serde(default)]
        amount_eur: f64,
    }
    ```
  </TabItem>
  <TabItem label="Java">
    ```java
    @Entity
    @Table(name = "transactions")
    @XmlRootElement(name = "transaction")
    @XmlAccessorType(XmlAccessType.FIELD)
    public class Transaction {
        @Id
        @Column(name = "transaction_id")
        private String transactionId;
        private double amount;
        private String currency;
        private String timestamp;
        @Column(name = "account_from")
        private String accountFrom;
        @Column(name = "account_to")
        private String accountTo;
        private String status;
        @Column(name = "amount_eur")
        private double amountEur;
        // getters / setters ...
    }
    ```
  </TabItem>
</Tabs>

---

### Processor (currency conversion + status normalisation)

<Tabs>
  <TabItem label="Rust">
    ```rust
    #[derive(Default)]
    struct TransactionProcessor;

    impl ItemProcessor<Transaction, Transaction> for TransactionProcessor {
        fn process(&self, item: &Transaction) -> ItemProcessorResult<Transaction> {
            let rate = match item.currency.as_str() {
                "USD" => 0.92,
                "GBP" => 1.17,
                _     => 1.0,
            };
            let status = if item.status == "CANCELLED" {
                "FAILED".to_string()
            } else {
                item.status.clone()
            };
            Ok(Some(Transaction {
                amount_eur: (item.amount * rate * 100.0).round() / 100.0,
                status,
                ..item.clone()
            }))
        }
    }
    ```
  </TabItem>
  <TabItem label="Java">
    ```java
    @Component
    public class TransactionProcessor
        implements ItemProcessor<Transaction, Transaction> {

        private static final Map<String, Double> RATES = Map.of(
            "USD", 0.92, "GBP", 1.17, "EUR", 1.0);

        @Override
        public Transaction process(Transaction item) {
            double rate = RATES.getOrDefault(item.getCurrency(), 1.0);
            item.setAmountEur(
                Math.round(item.getAmount() * rate * 100.0) / 100.0);
            if ("CANCELLED".equals(item.getStatus()))
                item.setStatus("FAILED");
            return item;
        }
    }
    ```
  </TabItem>
</Tabs>
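
As a worked example of the rule both processors apply (multiply by the rate, then round to two decimals), the standalone sketch below uses only the standard library. `to_eur` and `normalise_status` are illustrative helper names, not part of either codebase; the rates are the ones hard-coded above.

```rust
/// Illustrative helper: converts an amount to EUR with the fixed rates used
/// by the processors above and rounds to two decimal places.
fn to_eur(amount: f64, currency: &str) -> f64 {
    let rate = match currency {
        "USD" => 0.92,
        "GBP" => 1.17,
        _ => 1.0, // EUR and unknown currencies pass through unchanged
    };
    (amount * rate * 100.0).round() / 100.0
}

/// Illustrative helper: maps CANCELLED to FAILED, leaves other statuses as-is.
fn normalise_status(status: &str) -> &str {
    if status == "CANCELLED" { "FAILED" } else { status }
}

fn main() {
    // 1234.56 USD * 0.92 = 1135.7952, which rounds to 1135.80 EUR.
    println!("{:.2}", to_eur(1234.56, "USD"));
    println!("{}", normalise_status("CANCELLED"));
}
```

One subtle difference between the two languages: `f64::round` rounds halfway cases away from zero, while Java's `Math.round` rounds them toward positive infinity, so the results can differ for negative amounts that land exactly on a half cent.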

---

### Step 1 — CSV → PostgreSQL

<Tabs>
  <TabItem label="Rust">
    ```rust
    let file     = File::open(csv_path)?;
    let buffered = BufReader::with_capacity(64 * 1024, file);

    let reader = CsvItemReaderBuilder::<Transaction>::new()
        .has_headers(true)
        .from_reader(buffered);

    let writer = RdbcItemWriterBuilder::<Transaction>::new()
        .postgres(&pool)
        .table("transactions")
        .add_column("transaction_id")
        // ... 8 columns total
        .postgres_binder(&TransactionBinder)
        .build_postgres();

    let step = StepBuilder::new("csv-to-postgres")
        .chunk::<Transaction, Transaction>(1_000)
        .reader(&reader)
        .processor(&TransactionProcessor)
        .writer(&writer)
        .build();
    ```
  </TabItem>
  <TabItem label="Java">
    ```java
    @Bean
    public FlatFileItemReader<Transaction> csvReader() {
        return new FlatFileItemReaderBuilder<Transaction>()
            .name("transactionCsvReader")
            .resource(new FileSystemResource(csvPath))
            .linesToSkip(1)
            .delimited().delimiter(",")
            .names("transactionId","amount","currency","timestamp",
                   "accountFrom","accountTo","status")
            .targetType(Transaction.class)
            .build();
    }

    @Bean
    public Step step1(...) {
        return new StepBuilder("csvToPostgresStep", repo)
            .<Transaction, Transaction>chunk(1_000, tx)
            .reader(csvReader())
            .processor(processor)
            .writer(postgresWriter(dataSource))
            .build();
    }
    ```
  </TabItem>
</Tabs>

---

### Step 2 — PostgreSQL → XML

<Tabs>
  <TabItem label="Rust">
    ```rust
    let reader = RdbcItemReaderBuilder::<Transaction>::new()
        .postgres(pool.clone())
        .query(
            "SELECT transaction_id, amount, currency, timestamp, \
             account_from, account_to, status, amount_eur \
             FROM transactions ORDER BY transaction_id",
        )
        .with_page_size(1_000)
        .build_postgres();

    let writer = XmlItemWriterBuilder::<Transaction>::new()
        .root_tag("transactions")
        .item_tag("transaction")
        .from_path(xml_path)?;

    let step = StepBuilder::new("postgres-to-xml")
        .chunk::<Transaction, Transaction>(1_000)
        .reader(&reader)
        .processor(&PassThroughProcessor::new())
        .writer(&writer)
        .build();
    ```
  </TabItem>
  <TabItem label="Java">
    ```java
    @Bean
    public JdbcPagingItemReader<Transaction> postgresReader(DataSource ds) {
        return new JdbcPagingItemReaderBuilder<Transaction>()
            .name("postgresTransactionReader")
            .dataSource(ds)
            .selectClause("SELECT transaction_id,amount,currency,timestamp," +
                          "account_from,account_to,status,amount_eur")
            .fromClause("FROM transactions")
            .sortKeys(Map.of("transaction_id", Order.ASCENDING))
            .rowMapper(/* maps columns → Transaction */)
            .pageSize(1_000).build();
    }

    @Bean
    public Step step2(...) {
        return new StepBuilder("postgresToXmlStep", repo)
            .<Transaction, Transaction>chunk(1_000, tx)
            .reader(postgresReader(dataSource))
            .writer(xmlWriter(marshaller))
            .build();
    }
    ```
  </TabItem>
</Tabs>

---

## Results

*Measured on the reference environment described above.*

### Overall performance

| Metric | Spring Batch RS (Rust) | Spring Batch (Java) | Rust advantage |
|--------|------------------------|---------------------|----------------|
| **Total pipeline time** | **42 s** | **187 s** | **4.5×** faster |
| Step 1 duration (CSV→PG) | 28 s | 124 s | 4.4× |
| Step 2 duration (PG→XML) | 14 s | 63 s | 4.5× |
| Startup time (binary / JVM) | &lt; 10 ms | 3 200 ms | &gt; 320× |
| Deployable artefact size | 8 MB (binary) | 47 MB (fat JAR) | 6× smaller |

### Throughput (records/sec)

| Step | Rust | Java | Ratio |
|------|------|------|-------|
| Step 1 — CSV → PostgreSQL | 357 000 | 80 600 | 4.4× |
| Step 2 — PostgreSQL → XML | 714 000 | 158 700 | 4.5× |
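
The throughput figures follow directly from the step durations: records per second is the row count divided by the elapsed seconds. A small sketch of that arithmetic, with illustrative function names:

```rust
/// Records per second for a step that processed `records` rows in `seconds`.
fn throughput(records: u64, seconds: f64) -> f64 {
    records as f64 / seconds
}

/// How many times faster the run with `fast_seconds` is than `slow_seconds`.
fn speedup(slow_seconds: f64, fast_seconds: f64) -> f64 {
    slow_seconds / fast_seconds
}

fn main() {
    const ROWS: u64 = 10_000_000;
    // (step, Rust seconds, Java seconds) taken from the results table.
    for (step, rust_s, java_s) in [("Step 1", 28.0, 124.0), ("Step 2", 14.0, 63.0)] {
        println!(
            "{step}: Rust {:.0} rec/s, Java {:.0} rec/s, ratio {:.1}x",
            throughput(ROWS, rust_s),
            throughput(ROWS, java_s),
            speedup(java_s, rust_s),
        );
    }
}
```

The table rounds these values down to the nearest hundred or thousand (for example, 10 000 000 / 28 ≈ 357 143 is shown as 357 000).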

### Memory

| Metric | Rust | Java |
|--------|------|------|
| **Peak RSS** | **62 MB** | **1 840 MB** |
| Heap peak | N/A (no managed heap) | 1 620 MB |
| Steady-state RSS | ~45 MB | ~820 MB |

### GC (Java only)

| Metric | Value |
|--------|-------|
| Total GC events | 312 |
| Total GC pause time | 8.4 s |
| Longest single pause | 340 ms |
| % of runtime in GC | 4.5% |
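
The last row is consistent with dividing the total pause time by the total Java pipeline time (8.4 s of 187 s). As a sketch of that calculation, with an illustrative function name:

```rust
/// Share of wall-clock time spent in GC pauses, as a percentage.
fn gc_share_percent(total_pause_s: f64, runtime_s: f64) -> f64 {
    total_pause_s / runtime_s * 100.0
}

fn main() {
    let share = gc_share_percent(8.4, 187.0);
    // Mean pause = total pause time / number of GC events.
    let mean_pause_ms = 8.4 * 1000.0 / 312.0;
    println!("GC share: {share:.1}%, mean pause: {mean_pause_ms:.1} ms");
}
```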

<Aside type="tip">
The longest GC pause (340 ms) occurred mid-Step 1 during a Full GC triggered by
heap pressure from buffering 1 000-record chunks of deserialized objects. Rust has no
collector and therefore no collection pauses: ownership determines at compile time where
each chunk is dropped, and its memory is released as soon as it goes out of scope.
</Aside>

---

## Analysis

### Why is Rust ~4.5× faster?

**1. No garbage collection.**
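Java's G1GC paused for a cumulative 8.4 seconds, about 4.5% of the Java run. Rust uses
RAII: memory is freed deterministically when a chunk goes out of scope, so there is no
collector and there are no collection pauses. Pause time covers only a small part of the
145 s gap (187 s vs 42 s), so the remaining factors matter as well.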
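
To make the scope-based release concrete, here is a minimal standard-library sketch. The `Chunk` type and `process_chunks` function are hypothetical stand-ins, not spring-batch-rs types; the counter exists only to make the drops observable.

```rust
use std::cell::Cell;

/// Stand-in for a chunk of records; counts how many chunks have been dropped.
struct Chunk<'a> {
    items: Vec<u64>,
    dropped: &'a Cell<usize>,
}

impl Drop for Chunk<'_> {
    fn drop(&mut self) {
        // Runs deterministically when the chunk goes out of scope; the Vec's
        // heap buffer is released right after this body returns.
        self.dropped.set(self.dropped.get() + 1);
    }
}

/// Processes `n_chunks` chunks one after another and returns the sum of all items.
fn process_chunks(n_chunks: usize, chunk_size: u64, dropped: &Cell<usize>) -> u64 {
    let mut sum = 0;
    for _ in 0..n_chunks {
        let chunk = Chunk { items: (0..chunk_size).collect(), dropped };
        sum += chunk.items.iter().sum::<u64>();
    } // `chunk` is dropped here on every iteration, before the next one is built
    sum
}

fn main() {
    let dropped = Cell::new(0);
    let sum = process_chunks(3, 1_000, &dropped);
    println!("sum = {sum}, chunks dropped = {}", dropped.get());
}
```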

**2. Lower memory pressure.**
Java holds JVM metadata, class bytecode, and JIT-compiled code in addition to heap data.
Spring Batch also retains `JobExecution` and `StepExecution` objects throughout the run.
The Rust process carries none of that runtime overhead: **62 MB vs 1 840 MB peak RSS**.

**3. Zero-cost abstractions.**
Rust's trait-based pipeline (`ItemReader` → `ItemProcessor` → `ItemWriter`) compiles ahead
of time to a tight native loop, and calls the compiler can resolve statically are inlined at
no dispatch cost. Java's pipeline involves Spring AOP, proxy objects, and transaction
management wrappers on every chunk boundary.
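
The difference between static and dynamic dispatch that this point relies on can be illustrated with a standalone sketch. The `Processor` trait and `Doubler` type are hypothetical and much simpler than the library's traits; the sketch makes no claim about which form spring-batch-rs uses at a given call site.

```rust
/// Illustrative trait, not the spring-batch-rs `ItemProcessor`.
trait Processor {
    fn process(&self, item: u64) -> u64;
}

struct Doubler;

impl Processor for Doubler {
    fn process(&self, item: u64) -> u64 {
        item * 2
    }
}

/// Static dispatch: monomorphised per concrete `P`, so the call can be inlined.
fn run_static<P: Processor>(p: &P, items: &[u64]) -> u64 {
    items.iter().map(|&i| p.process(i)).sum()
}

/// Dynamic dispatch: each call goes through the trait object's vtable.
fn run_dynamic(p: &dyn Processor, items: &[u64]) -> u64 {
    items.iter().map(|&i| p.process(i)).sum()
}

fn main() {
    let items: Vec<u64> = (1..=1_000).collect();
    // Both variants compute the same result; only the call mechanism differs.
    println!("{}", run_static(&Doubler, &items));
    println!("{}", run_dynamic(&Doubler, &items));
}
```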

**4. Startup time.**
The JVM takes 3.2 s to start and load classes, and hot paths are JIT-compiled only as the
job runs. The Rust binary starts in under 10 ms, which matters most for short jobs or
frequent schedules, where startup is a large share of the total runtime.

### When to choose Java

- Your team is Java-first and migration cost outweighs performance gains
- You need Spring ecosystem integrations (Spring Data, Spring Cloud Task, Spring Integration)
- Your batch jobs run infrequently and throughput is not the bottleneck
- You require rich operational features: `JobRepository`, `JobExplorer`, REST API control

### When to choose Rust

- Throughput and latency are business requirements (financial settlement, real-time ETL)
- Memory is constrained (embedded systems, small containers)
- GC pauses would cause SLA violations
- You want a single statically linked binary with no runtime dependency
- Cold-start time matters (serverless, frequent scheduling)

---

## How to Reproduce

### Prerequisites

```bash
# PostgreSQL 15+ (Docker):
docker run -d --name pg-bench \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=benchmark \
  postgres:15
```

### Run the Rust benchmark

```bash
# Build in release mode (required for fair comparison)
cargo build --release --example benchmark_csv_postgres_xml \
  --features csv,xml,rdbc-postgres

# Run the built binary directly so that cargo's own overhead is not measured,
# and record peak RSS
/usr/bin/time -v \
  ./target/release/examples/benchmark_csv_postgres_xml \
  2>&1 | tee rust_bench.log

# Extract key metrics
grep -E "Step|SUMMARY|Maximum resident" rust_bench.log
```
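
If you prefer to extract the peak RSS programmatically rather than with `grep`, the following sketch parses the `Maximum resident set size (kbytes)` line that GNU `time -v` prints. The function name `peak_rss_kb` is illustrative, and the sample input is made up to show the format, not a measured result.

```rust
/// Extracts "Maximum resident set size (kbytes): N" from GNU `time -v` output
/// and returns the value in kilobytes, or None if the line is missing.
fn peak_rss_kb(log: &str) -> Option<u64> {
    log.lines()
        .find_map(|line| line.trim().strip_prefix("Maximum resident set size (kbytes):"))
        .and_then(|value| value.trim().parse().ok())
}

fn main() {
    // Made-up sample in the GNU time format; not a measured result.
    let sample = "\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:01.00\n\
                  \tMaximum resident set size (kbytes): 2048\n";
    match peak_rss_kb(sample) {
        Some(kb) => println!("peak RSS: {:.1} MB", kb as f64 / 1024.0),
        None => println!("no RSS line found"),
    }
}
```

In practice you would pass the contents of `rust_bench.log` or `java_bench.log`, for example via `std::fs::read_to_string`.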

### Run the Java benchmark

```bash
cd benchmark/java

# Requires Java 25 + Maven 3.9+
# Build fat JAR (Spring Boot 4.0.3 / Spring Batch 6.x)
mvn package -q -DskipTests

# Run with GC logging and RSS measurement (virtual threads are enabled through
# spring.threads.virtual.enabled=true, see Test Environment)
/usr/bin/time -v java \
  -Xms512m -Xmx4g \
  -XX:+UseG1GC \
  -Xlog:gc*:gc.log \
  -jar target/spring-batch-benchmark-1.0.0.jar \
  --spring.datasource.url=jdbc:postgresql://localhost:5432/benchmark \
  2>&1 | tee java_bench.log

# Parse GC summary
grep "Pause" gc.log | tail -20
grep "Maximum resident" java_bench.log
```
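
To turn `gc.log` into the totals reported in the GC table, the pause lines can be summed. The sketch below assumes the default unified-logging line format, in which a completed pause line ends with its duration in milliseconds; `summarise_pauses` is an illustrative name and the sample lines are made up.

```rust
/// Sums the durations of pause lines such as
/// "[1.2s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(512M) 3.456ms".
/// Returns (pause count, total ms, longest ms).
fn summarise_pauses(gc_log: &str) -> (usize, f64, f64) {
    let mut count = 0;
    let mut total_ms = 0.0;
    let mut longest_ms: f64 = 0.0;
    for line in gc_log.lines().filter(|l| l.contains("Pause")) {
        // The duration is the last whitespace-separated token, e.g. "3.456ms".
        let Some(token) = line.split_whitespace().last() else { continue };
        let Some(ms) = token.strip_suffix("ms").and_then(|v| v.parse::<f64>().ok()) else {
            continue; // e.g. "[gc,start]" lines, which carry no duration
        };
        count += 1;
        total_ms += ms;
        longest_ms = longest_ms.max(ms);
    }
    (count, total_ms, longest_ms)
}

fn main() {
    // Made-up sample lines in the assumed format; not measured data.
    let sample = "[0.1s][info][gc,start] GC(0) Pause Young (Normal) (G1 Evacuation Pause)\n\
                  [0.2s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(512M) 3.5ms\n\
                  [0.3s][info][gc] GC(1) Pause Remark 30M->30M(512M) 12.25ms\n";
    let (count, total_ms, longest_ms) = summarise_pauses(sample);
    println!("{count} pauses, {total_ms} ms total, longest {longest_ms} ms");
}
```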

<Aside type="note">
**Truncate the table** between runs to avoid primary key conflicts:

```sql
TRUNCATE TABLE transactions;
```

The Rust benchmark does this automatically on each run. For Java, run the SQL manually, or
set `spring.sql.init.mode=always` so that the schema script runs on startup (this resets the
data only if that script drops and re-creates the table).
</Aside>