transactor 0.1.4

A toy (yet intentionally robust) transaction processing engine and accounting system with support for deposits, withdrawals, disputes, resolutions and chargebacks.

Design Philosophy

  • Streaming I/O: Designed to handle files larger than available RAM. It processes transactions row-by-row using buffered readers, maintaining a compact memory footprint.
  • Dependency Austerity: Dependencies are kept to a minimum to avoid bloating the project and to reduce exposure to security risks.
  • Error Safety: All business logic errors and I/O failures are logged to stderr with configurable log levels, so stdout remains a clean data stream for downstream pipes or file redirection.
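
The streaming and stdout/stderr separation described above can be sketched with the standard library alone. This is only an illustration, not the crate's actual code: the function name stream_rows, the four-column layout and the validity rule are assumptions, and the real engine relies on the csv crate rather than manual splitting.

```rust
use std::io::{self, BufRead, Write};

/// Streams rows from `input`, echoing well-formed rows to `out` and reporting
/// malformed ones on `err`. Returns the number of rejected rows.
fn stream_rows<R: BufRead, W: Write, E: Write>(
    input: R,
    mut out: W,
    mut err: E,
) -> io::Result<usize> {
    let mut rejected = 0;
    // `lines()` yields one row at a time, so only the current row is held in memory.
    // The first line is assumed to be a header and is skipped.
    for (idx, line) in input.lines().enumerate().skip(1) {
        let line = line?;
        let fields: Vec<&str> = line.split(',').map(str::trim).collect();
        // Hypothetical validity rule: four columns with a numeric client id.
        if fields.len() == 4 && fields[1].parse::<u16>().is_ok() {
            writeln!(out, "{}", fields.join(","))?;
        } else {
            rejected += 1;
            writeln!(err, "line {}: malformed row: {:?}", idx + 1, line)?;
        }
    }
    Ok(rejected)
}

fn main() -> io::Result<()> {
    // In-memory input keeps the demo self-contained; a file or stdin wrapped in
    // a `BufReader` would work the same way.
    let input = "type,client,tx,amount\ndeposit,1,1,1.5\nbogus\n";
    // Data goes to stdout and diagnostics to stderr, so redirecting stdout
    // into a file captures only the data rows.
    let rejected = stream_rows(input.as_bytes(), io::stdout(), io::stderr())?;
    eprintln!("{rejected} row(s) rejected");
    Ok(())
}
```

Because the function is generic over BufRead and Write, the same code path can serve a file, stdin or a socket without buffering the whole input.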

Basic Usage

The application accepts a single argument: the path to the input CSV file.

cargo run -- transactions.csv > accounts.csv

As per the requirements, adding any extra arguments will make the application fail.
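
A minimal sketch of this "exactly one argument" rule, using only the standard library. The function name parse_args and the error messages are hypothetical; the crate's real argument handling may differ.

```rust
use std::path::PathBuf;

/// Returns the input path if exactly one argument follows the program name.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<PathBuf, String> {
    let mut args = args.into_iter().skip(1); // skip the program name
    match (args.next(), args.next()) {
        (Some(path), None) => Ok(PathBuf::from(path)),
        (None, _) => Err("missing argument: path_to_transactions_csv".to_string()),
        (Some(_), Some(_)) => Err("too many arguments: expected exactly one path".to_string()),
    }
}

fn main() {
    // A real binary would call `parse_args(std::env::args())`; fixed argument
    // lists are used here so the demo behaves the same on every run.
    let samples = [
        vec!["transactor", "transactions.csv"],
        vec!["transactor"],
        vec!["transactor", "a.csv", "b.csv"],
    ];
    for argv in samples {
        let result = parse_args(argv.into_iter().map(String::from));
        println!("{result:?}");
    }
}
```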

Command Line Interface

Argument                  Type  Description
path_to_transactions_csv  Path  The path to the CSV file containing transaction data.

Advanced Usage

Because this crate works both as a CLI tool and as a standalone library, the engine can be integrated into a wide range of runtimes.

As proof of this adaptability, I have included the additional examples described below.

Reading Input From stdin

The simple_stdin example reads the transactions input from stdin instead of a file path provided as an argument:

cat transactions.csv | RUST_LOG=trace cargo run --example simple_stdin > accounts.csv

Processing Input From Multiple Clients Over TCP

The tokio_tcp example showcases the concurrent processing of transaction inputs coming from multiple clients over TCP:

# Start the server at 127.0.0.1:8080
RUST_LOG=trace cargo run --example tokio_tcp > accounts.csv

# Use netcat on a different terminal to send the transactions input CSV file over TCP
nc 127.0.0.1 8080 < transactions.csv

# Stop the server (uses SIGINT signal, so Ctrl+C also works)
kill -INT $(pidof tokio_tcp)

Logging

Logging of errors and diagnostics is handled via the log crate abstraction.

To view processing details or debug information without corrupting the CSV output, use the RUST_LOG environment variable:

# View errors only (default)
RUST_LOG=error cargo run -- transactions.csv > accounts.csv

# View detailed processing steps
RUST_LOG=trace cargo run -- transactions.csv > accounts.csv

Assumptions

The following assumptions were made where the specification leaves requirements or behaviors unclear:

  • Only deposit and withdrawal transactions can be disputed, resolved and charged back.
  • Disputing a withdrawal transaction must actually violate the "total funds should remain the same" principle.
  • Disputing a deposit should fail if the current balance of the account is less than the disputed amount.
  • The value field of disputes, resolves and chargebacks should be quietly ignored.
  • Input CSV files always contain headers (the first line in a file will be skipped when reading transactions).
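
As an illustration of the deposit-dispute assumption above, here is a minimal sketch. The Account fields available and held, the InsufficientFunds error and the choice to move disputed funds into held are all assumptions made for this example; amounts are plain integers in ten-thousandths of a unit rather than the crate's fixed-point type.

```rust
/// Account balances in ten-thousandths of a unit (an assumption of this sketch).
#[derive(Debug, Default, PartialEq)]
struct Account {
    available: u64, // hypothetical field: funds free to use
    held: u64,      // hypothetical field: funds frozen by open disputes
}

/// Hypothetical error returned when a dispute cannot be covered.
#[derive(Debug, PartialEq)]
struct InsufficientFunds {
    available: u64,
    disputed: u64,
}

impl Account {
    /// Freezes `amount` of a disputed deposit, or fails without changing the
    /// account if the available balance is smaller than the disputed amount.
    fn dispute_deposit(&mut self, amount: u64) -> Result<(), InsufficientFunds> {
        if self.available < amount {
            return Err(InsufficientFunds {
                available: self.available,
                disputed: amount,
            });
        }
        self.available -= amount;
        self.held += amount;
        Ok(())
    }
}

fn main() {
    let mut account = Account { available: 50_000, held: 0 };
    // First dispute succeeds and freezes 3.0000 units.
    println!("{:?}", account.dispute_deposit(30_000));
    // Second dispute fails: only 2.0000 units remain available.
    println!("{:?}", account.dispute_deposit(30_000));
    println!("{account:?}");
}
```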

Technical Decisions

Note: additional rationale on specific design decisions and code style choices can be found inline in affected modules.

Precision & Arithmetic

While the specification hints at the use of floating-point values for monetary amounts, a real production environment requires fixed-point decimals or integer-based "bitcoin-like" math to avoid unpredictable rounding errors caused by limited precision.

For this implementation, I am using FixedU64<U14> from the fixed crate: its 14 fractional bits give a resolution of $2^{-14}$ (about 0.000061), which is finer than the required four decimal places.
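
To make the rounding concern concrete, the following sketch contrasts f64 arithmetic with the integer-based approach mentioned above, storing amounts as whole ten-thousandths. It deliberately does not use the fixed crate; the helper names parse_units and format_units are hypothetical and exist only for this illustration.

```rust
/// Number of integer units per whole unit: four decimal places.
const SCALE: u64 = 10_000;

/// Parses a non-negative decimal string with at most four fractional digits
/// into ten-thousandths. Returns `None` for anything else.
fn parse_units(s: &str) -> Option<u64> {
    let s = s.trim();
    let (whole, frac) = s.split_once('.').unwrap_or((s, ""));
    if whole.is_empty() || frac.len() > 4 {
        return None;
    }
    if !whole.bytes().chain(frac.bytes()).all(|b| b.is_ascii_digit()) {
        return None;
    }
    let whole: u64 = whole.parse().ok()?;
    // Right-pad the fraction to four digits, e.g. "5" becomes "5000".
    let frac: u64 = format!("{frac:0<4}").parse().ok()?;
    whole.checked_mul(SCALE)?.checked_add(frac)
}

/// Formats ten-thousandths back into a decimal string with four places.
fn format_units(units: u64) -> String {
    format!("{}.{:04}", units / SCALE, units % SCALE)
}

fn main() {
    // Binary floating point cannot represent 0.1 or 0.2 exactly.
    println!("f64:     0.1 + 0.2 == 0.3 is {}", 0.1_f64 + 0.2_f64 == 0.3_f64);

    // Integer units add exactly.
    let sum = parse_units("0.1").unwrap() + parse_units("0.2").unwrap();
    println!("integer: 0.1 + 0.2 = {}", format_units(sum));
}
```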

Data Structures

Client accounts are stored in an IndexMap. This provides average-case $O(1)$ lookup time while maintaining deterministic iteration based on insertion order.
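
The idea behind an insertion-ordered map can be illustrated with the standard library: a Vec keeps entries in insertion order while a HashMap maps each key to its position. This is a simplified sketch, not the indexmap crate's actual code; the OrderedMap type and the u16 client id key are assumptions for this example.

```rust
use std::collections::HashMap;

/// Minimal insertion-ordered map keyed by a (hypothetical) u16 client id.
struct OrderedMap<V> {
    index: HashMap<u16, usize>, // key -> position in `entries`
    entries: Vec<(u16, V)>,     // entries in insertion order
}

impl<V> OrderedMap<V> {
    fn new() -> Self {
        Self { index: HashMap::new(), entries: Vec::new() }
    }

    /// Returns the value for `key`, inserting `default()` first if it is absent.
    fn get_or_insert_with(&mut self, key: u16, default: impl FnOnce() -> V) -> &mut V {
        let existing = self.index.get(&key).copied();
        let pos = match existing {
            Some(pos) => pos,
            None => {
                self.entries.push((key, default()));
                let pos = self.entries.len() - 1;
                self.index.insert(key, pos);
                pos
            }
        };
        &mut self.entries[pos].1
    }

    /// Hash lookup followed by a direct index into the Vec.
    fn get(&self, key: u16) -> Option<&V> {
        self.index.get(&key).map(|&pos| &self.entries[pos].1)
    }

    /// Iterates in insertion order, independent of hash values.
    fn iter(&self) -> impl Iterator<Item = (u16, &V)> + '_ {
        self.entries.iter().map(|(k, v)| (*k, v))
    }
}

fn main() {
    let mut balances: OrderedMap<u64> = OrderedMap::new();
    *balances.get_or_insert_with(7, || 0) += 10;
    *balances.get_or_insert_with(3, || 0) += 5;
    *balances.get_or_insert_with(7, || 0) += 1;
    // Client 7 was seen first, so it is listed first.
    for (client, balance) in balances.iter() {
        println!("{client},{balance}");
    }
}
```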

Testing Determinism

To ensure that test vectors remain reproducible across different environments, the test suite utilizes conditional compilation (#[cfg(test)]) to sort account output by client_id before writing.

This deterministic behavior can also be enforced in production by enabling the deterministic feature flag.
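
One way such a switch could look is sketched below. The function output_order and the (client_id, total) tuple layout are hypothetical, and the feature name deterministic is taken from the text above; when compiled outside Cargo, newer compilers may warn that the feature is not declared.

```rust
/// True in test builds and when the `deterministic` Cargo feature is enabled.
const DETERMINISTIC: bool = cfg!(any(test, feature = "deterministic"));

/// Returns the accounts in output order: sorted by client id when
/// `deterministic` is set, otherwise left in the order they arrived.
fn output_order(mut accounts: Vec<(u16, i64)>, deterministic: bool) -> Vec<(u16, i64)> {
    if deterministic {
        accounts.sort_by_key(|&(client_id, _)| client_id);
    }
    accounts
}

fn main() {
    // (client_id, total) pairs in arrival order.
    let accounts = vec![(3, 150_000), (1, 20_000), (2, 0)];
    for (client_id, total) in output_order(accounts, DETERMINISTIC) {
        println!("{client_id},{total}");
    }
}
```

Passing the flag as a parameter keeps the sorting logic testable on its own, while the constant decides the behavior of a given build.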

Transaction Safety and State Machine

The engine handles the lifecycle of disputes by deriving "movements" from deposit and withdrawal transactions, and storing the movements with the account. This enables retrieving the original transaction amount upon processing a subsequent dispute, resolve or chargeback affecting it.

To ensure financial integrity, the state machine only allows three legal transitions:

  • InForce → Disputed, upon processing a dispute on an undisputed deposit or withdrawal transaction.
  • Disputed → ChargedBack, upon processing a chargeback on an already-disputed deposit or withdrawal.
  • Disputed → InForce, upon processing a resolve on an already-disputed deposit or withdrawal.

Any other transitions are strictly forbidden.
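
The transition rules above can be sketched as a small enum-based state machine. The states InForce, Disputed and ChargedBack follow the list above; the Event, IllegalTransition and Movement types and their fields are hypothetical names chosen for this illustration, not necessarily those used in the crate.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    InForce,
    Disputed,
    ChargedBack,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    Dispute,
    Resolve,
    Chargeback,
}

/// Returned for any (state, event) pair outside the three legal transitions.
#[derive(Debug, PartialEq, Eq)]
struct IllegalTransition {
    from: State,
    event: Event,
}

impl State {
    /// Applies `event`, returning the next state or an error.
    fn apply(self, event: Event) -> Result<State, IllegalTransition> {
        match (self, event) {
            (State::InForce, Event::Dispute) => Ok(State::Disputed),
            (State::Disputed, Event::Chargeback) => Ok(State::ChargedBack),
            (State::Disputed, Event::Resolve) => Ok(State::InForce),
            (from, event) => Err(IllegalTransition { from, event }),
        }
    }
}

/// A stored movement keeps the original amount (in ten-thousandths here) so
/// that later disputes, resolves and chargebacks can look it up.
struct Movement {
    amount: u64,
    state: State,
}

fn main() {
    let mut movement = Movement { amount: 25_000, state: State::InForce };
    for event in [Event::Dispute, Event::Resolve, Event::Resolve] {
        match movement.state.apply(event) {
            Ok(next) => {
                movement.state = next;
                println!("{event:?} on amount {} -> {next:?}", movement.amount);
            }
            // The second resolve is rejected: the movement is no longer disputed.
            Err(e) => eprintln!("rejected: {e:?}"),
        }
    }
}
```

Encoding the legal pairs in a single match with a catch-all error arm means ChargedBack has no outgoing transitions, so it is terminal by construction.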

Testing

The suite includes unit tests for core logic and integration tests for CSV streaming.

# Run all tests
cargo test

To verify the engine against specific test vectors:

cargo run --features deterministic -- tests/test_input.csv > actual_output.csv
diff actual_output.csv tests/expected_output.csv

Dependencies

csv

Provides the main CSV deserialization and serialization functionality. Suggested by the specification.

fixed

Provides fixed-point numbers, which are needed to guarantee four decimal places of precision. The serde-str feature flag is needed for deserialization from and serialization to CSV. This crate was favored for the simplicity of its API, but the rust_decimal crate would work equally well here.

indexmap

A clever way to keep a transaction history that can be queried by key in $O(1)$ time on average and iterated in the original order of insertion.

thiserror

The de facto standard for deriving error types and their display messages; it supersedes the older, now deprecated failure crate.

serde

Needed for deriving deserialization and serialization for transactions. Suggested by the specification.

log

A must for proper logging of runtime errors.

env_logger

Goes hand in hand with log.

Deep Dive

For a more in-depth and always up-to-date deep dive into the inner workings of this repository, you can visit: