credit-data-simulator 0.2.0

Credit data pipeline simulator — Core Banking, Mapping, Rulepack, Regulator (OJK)

Credit Data Simulator

Multi-service credit data pipeline simulator built with Rust and axum. Provides realistic Indonesian banking credit data with NDJSON bulk export, field mapping, validation rules, and regulatory submission endpoints.

Purpose: Benchmarking and overhead measurement. This simulator provides a known-performance baseline (NDJSON: ~1.9K req/s = ~191K records/sec; JSON: ~10.6K req/s = ~1.06M records/sec, both at 100 records per page on the reference system listed under Performance) so you can measure the processing overhead of any application consuming the data. By comparing direct simulator throughput with throughput through your application, you isolate how much overhead your transform/validation logic adds.

Direct (baseline):   oha -c 200 -n 2000 'http://localhost:18081/api/v1/credits/ndjson?page_size=100'  → X req/s
Via your app:        oha -c 200 -n 2000 'http://localhost:3080/trigger'                                → Y req/s
Your app overhead:   (X - Y) / X × 100%
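
As a worked example of the formula above, here is a minimal Rust sketch. The function name `overhead_percent` and the sample figures are illustrative only, not part of the simulator.

```rust
/// Relative overhead of your application as a percentage of baseline throughput.
/// `baseline_rps` is X (direct simulator), `app_rps` is Y (through your app).
/// Returns `None` when the baseline is zero, negative, or NaN.
fn overhead_percent(baseline_rps: f64, app_rps: f64) -> Option<f64> {
    if baseline_rps.is_nan() || baseline_rps <= 0.0 {
        return None;
    }
    Some((baseline_rps - app_rps) / baseline_rps * 100.0)
}
```

With a hypothetical baseline of X = 2000 req/s and Y = 1500 req/s through your app, `overhead_percent(2000.0, 1500.0)` gives 25%.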

Built primarily for VIL (Vastar Intermediate Language) NDJSON pipeline examples (005, 007-009, 101-107). Also usable as a standalone mock for any credit/banking data integration testing. Supports deterministic seed-based data generation for reproducible benchmarks (see GUIDE.md).

Services

Service             Port   Description
Core Banking        18081  Credit records: NDJSON stream, JSON paginated, SSE, filtering, dirty data
Mapping Service     18082  SLIK field mapping versions (v1/v2/v3)
Rulepack Service    18083  Validation rules engine (NIK, amounts, dates, cross-field)
Regulator Endpoint  18084  OJK submission simulator (accept/reject/delay modes)

Quick Start

git clone https://github.com/Vastar-AI/credit-data-simulator.git
cd credit-data-simulator
cargo build --release
./run_simulator.sh

Before starting the services, run_simulator.sh kills any processes already using ports 18081-18084.

Endpoints

Core Banking (:18081)

Method  Path                                  Description
GET     /health                               Health check
GET     /api/v1/credits?page=1&page_size=100  Paginated JSON (~10,600 req/s)
GET     /api/v1/credits/ndjson?page_size=100  NDJSON bulk stream (~1,900 req/s)
GET     /api/v1/credits/stream                SSE streaming
GET     /api/v1/credits/count                 Total record count
GET     /api/v1/credits/:id                   Single record by ID
POST    /api/v1/credits/generate              Generate new batch
GET     /api/v1/stats                         Simulator statistics
POST    /api/v1/reset                         Reset state

Query Parameters:

  • page / page_size — Pagination (default: 1, 100)
  • cursor — Cursor-based pagination (optimized)
  • cutoff_start / cutoff_end — Date range filter (YYYY-MM-DD)
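
A minimal sketch of how a client might assemble these query parameters. The helper `credits_url` is hypothetical, and it assumes the date filter can be combined with page-based pagination on the JSON endpoint, which is not confirmed above.

```rust
/// Hypothetical helper: builds a Core Banking URL for the paginated JSON endpoint.
/// `cutoff` is an optional (start, end) pair of YYYY-MM-DD dates.
/// Dates in this format contain no characters that need URL encoding.
fn credits_url(base: &str, page: u32, page_size: u32, cutoff: Option<(&str, &str)>) -> String {
    let mut url = format!("{base}/api/v1/credits?page={page}&page_size={page_size}");
    if let Some((start, end)) = cutoff {
        // Assumption: the date filter is accepted alongside page/page_size.
        url.push_str(&format!("&cutoff_start={start}&cutoff_end={end}"));
    }
    url
}
```

For example, `credits_url("http://localhost:18081", 1, 100, None)` yields the same URL as the paginated JSON endpoint in the table above.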

Mapping Service (:18082)

Method  Path                              Description
GET     /api/v1/mappings                  List mapping versions
GET     /api/v1/mappings/:version         Get mapping by version
GET     /api/v1/mappings/:version/fields  Get field mappings
POST    /api/v1/mappings                  Create custom mapping

Pre-loaded: v1 (SLIK-2023), v2 (SLIK-2024), v3 (SLIK-2024-REG)

Rulepack Service (:18083)

Method  Path                                 Description
GET     /api/v1/rulepacks                    List rulepack versions
GET     /api/v1/rulepacks/:version/rules     Get rules
POST    /api/v1/rulepacks/:version/validate  Validate data against rules

Rule types: required, length, range, pattern, enum, date_format, cross-field
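
To illustrate what some of these rule types check, here is a rough Rust sketch. The `Rule` enum and `passes` function are hypothetical and do not reflect the Rulepack Service's internal representation or its request format. Only four of the listed types are covered (pattern, date_format and cross-field are left out), and treating a missing value as passing every rule except `required` is an assumption.

```rust
/// Hypothetical model of four of the rule types listed above.
#[derive(Debug)]
enum Rule {
    Required,
    Length { min: usize, max: usize },
    Range { min: i64, max: i64 },
    Enum(&'static [&'static str]),
}

/// Checks one field value against one rule.
/// Assumption: a missing value only fails the `Required` rule.
fn passes(rule: &Rule, value: Option<&str>) -> bool {
    match (rule, value) {
        (Rule::Required, v) => v.is_some_and(|s| !s.trim().is_empty()),
        (_, None) => true,
        (Rule::Length { min, max }, Some(s)) => (*min..=*max).contains(&s.chars().count()),
        (Rule::Range { min, max }, Some(s)) => s
            .parse::<i64>()
            .map_or(false, |n| (*min..=*max).contains(&n)),
        (Rule::Enum(allowed), Some(s)) => allowed.iter().any(|a| *a == s),
    }
}
```

For instance, `passes(&Rule::Length { min: 16, max: 16 }, Some("8272346211567351"))` returns true for the 16-character NIK from the sample record.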

Regulator Endpoint (:18084)

Method  Path                     Description
POST    /api/v1/submit           Submit credit data
GET     /api/v1/submissions      List submissions
GET     /api/v1/submissions/:id  Submission status
POST    /api/v1/mode             Change response mode (Accept/Reject/Delay)

Performance

System: Intel i9-11900F (8C/16T), 32GB RAM, Ubuntu 22.04, Rust 1.93.1

Endpoint              req/s   records/s  P50    P99    Notes
Health check          52,744  -          0.3ms  27ms   Baseline
JSON 100 rec/page     10,591  1,059K     16ms   24ms   Pre-serialized array
NDJSON 100 rec/page   1,910   191K       96ms   202ms  Per-line streaming
NDJSON 1000 rec/page  1,935   1,935K     97ms   190ms  10x records, same req/s
Mapping fields        62,347  -          0.5ms  23ms   In-memory lookup
Rulepack rules        73,154  -          0.3ms  21ms   In-memory lookup

NDJSON throughput: ~191K records/sec (100 rec/page) or ~1.9M records/sec (1000 rec/page).
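
The records/s figures follow directly from requests per second multiplied by page size. A small sketch of that arithmetic (the function name is illustrative):

```rust
/// Records delivered per second, given request throughput and records per page.
fn records_per_sec(req_per_sec: u64, page_size: u64) -> u64 {
    req_per_sec * page_size
}
```

With the measured values, `records_per_sec(1_910, 100)` is 191,000 and `records_per_sec(1_935, 1_000)` is 1,935,000, matching the two NDJSON rows.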

Full benchmark: PERFORMANCE_REPORT.md

Data Model

Each credit record contains realistic Indonesian banking data:

{
  "id": "CR0000000000",
  "nik": "8272346211567351",
  "nama_lengkap": "Queena Simanjuntak",
  "jenis_fasilitas": "KKB",
  "jumlah_kredit": 8804025599,
  "mata_uang": "IDR",
  "suku_bunga_bps": 579,
  "tanggal_mulai": "2023-06-05",
  "tanggal_jatuh_tempo": "2026-06-16",
  "saldo_outstanding": 6933193728,
  "kolektabilitas": 3,
  "kode_cabang": "BDG001",
  "account_officer": "AO0835",
  "last_updated": "2024-01-01T10:00:00Z"
}

Dirty data injection: Configurable ratio (0-100%) with error types: invalid NIK, negative amounts, invalid dates, missing fields, invalid currency, invalid collectability, and outstanding balance greater than the credit limit (plafon).
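
A consumer that wants to spot dirty records might apply checks along these lines. This is a sketch under stated assumptions, not the simulator's own logic: the struct holds only a subset of the fields, the field types are guesses from the sample record, and the specific rules (a NIK is exactly 16 digits, only IDR is a valid currency, collectability is 1 to 5, jumlah_kredit acts as the credit limit) are assumptions. Invalid dates and missing fields are not covered.

```rust
/// Subset of the credit record fields shown above (types are assumptions).
struct CreditRecord {
    nik: String,
    jumlah_kredit: i64,
    mata_uang: String,
    saldo_outstanding: i64,
    kolektabilitas: u8,
}

#[derive(Debug, PartialEq)]
enum DirtyReason {
    InvalidNik,
    NegativeAmount,
    InvalidCurrency,
    InvalidCollectability,
    OutstandingExceedsLimit,
}

/// Returns every assumed dirty-data condition that the record matches.
fn dirty_reasons(r: &CreditRecord) -> Vec<DirtyReason> {
    let mut reasons = Vec::new();
    // Assumption: a valid NIK is exactly 16 ASCII digits.
    if r.nik.len() != 16 || !r.nik.bytes().all(|b| b.is_ascii_digit()) {
        reasons.push(DirtyReason::InvalidNik);
    }
    if r.jumlah_kredit < 0 || r.saldo_outstanding < 0 {
        reasons.push(DirtyReason::NegativeAmount);
    }
    // Assumption: IDR is the only valid currency code.
    if r.mata_uang != "IDR" {
        reasons.push(DirtyReason::InvalidCurrency);
    }
    // Assumption: collectability levels run from 1 to 5.
    if !(1..=5).contains(&r.kolektabilitas) {
        reasons.push(DirtyReason::InvalidCollectability);
    }
    // Assumption: jumlah_kredit is the credit limit (plafon).
    if r.saldo_outstanding > r.jumlah_kredit {
        reasons.push(DirtyReason::OutstandingExceedsLimit);
    }
    reasons
}
```

Applied to the sample record above, `dirty_reasons` returns an empty list.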

Data Generator (CLI)

Generate NDJSON files offline:

# Generate 100K records, 25% dirty, written to credits.ndjson
./target/release/datagen -c 100000 -d 0.25 -o credits.ndjson

# Generate 50K records, 10% dirty, and load them into a running simulator
./target/release/datagen -c 50000 -d 0.1 --load-to http://localhost:18081

Testing

# Health
curl http://localhost:18081/health

# NDJSON stream (first 3 records)
curl 'http://localhost:18081/api/v1/credits/ndjson?page_size=3'

# JSON paginated
curl 'http://localhost:18081/api/v1/credits?page=1&page_size=10'

# Count
curl http://localhost:18081/api/v1/credits/count

# Mapping fields
curl http://localhost:18082/api/v1/mappings/v1/fields

# Validation rules
curl http://localhost:18083/api/v1/rulepacks/v1/rules

# Benchmark
oha -c 200 -n 2000 'http://localhost:18081/api/v1/credits/ndjson?page_size=100'
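
NDJSON carries one JSON object per line, so a consumer can process the stream line by line without buffering the whole response. A minimal std-only sketch that counts records from any buffered reader (the function name is illustrative, and no JSON parsing is attempted):

```rust
use std::io::{self, BufRead};

/// Counts NDJSON records in a reader: one record per non-empty line.
fn count_ndjson_records<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        // Propagate I/O errors; skip blank lines such as a trailing newline.
        if !line?.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}
```

Called with `std::io::stdin().lock()`, the same function could count records piped in from the NDJSON curl command above.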

Project Structure

credit-data-simulator/
├── src/
│   ├── main.rs                 # Entry point — starts all 4 services
│   ├── lib.rs                  # SimulatorServer, common types, traits
│   ├── config.rs               # SimulatorConfig with per-service settings
│   ├── core_banking.rs         # Credit data: NDJSON, JSON, SSE, filtering
│   ├── mapping_service.rs      # SLIK field mapping versions
│   ├── rulepack_service.rs     # Validation rules engine
│   ├── regulator_endpoint.rs   # OJK submission simulator
│   ├── engine.rs               # Engine + Admin simulators
│   ├── models/                 # Credit record model + data generation
│   └── bin/datagen.rs          # CLI data generator
├── run_simulator.sh            # Start script (auto port cleanup)
├── stop_simulator.sh           # Stop script
├── PERFORMANCE_REPORT.md       # Benchmark results
├── Cargo.toml
└── .gitignore

License

MIT OR Apache-2.0
