jsonl_logger 70.70.70

Async queue-based structured logging with JSONL output
# DOMAIN_RULES.md — jsonl_logger.rs

## 1. Environment Coupling

The library couples to environment variables via `LoggerConfig::from_env()` which calls
`dotenvy::dotenv().ok()` before reading vars. Config is loaded eagerly on `init()`,
not lazily on first log call. This design ensures the logger is fully configured
before any log entry is written.

| Aspect | Choice | Rationale |
|--------|--------|----------|
| Timing | `init()` call (explicit) | Caller controls when config is loaded; linking the crate has no side effects and cannot fail |
| Loading | `dotenv().ok()` inside `from_env()` | Loads from .env file; `.ok()` means missing .env is not fatal when env vars are already set |
| Access | `LoggerConfig` struct | Single source of truth, no scattered `env::var()` calls at log call sites |

**Trade-offs:**
- Testability: Set the variables in the test process and call `from_vars()`, or run the code under test in a subprocess with a patched environment
- Runtime reconfiguration: Not supported — config is cached after first validation
- Multiple log directories: Supported — each `Logger::new(config)` creates its own writer thread

## 2. Tier Assignments

| Function | Tier | Reason |
|----------|------|--------|
| `log_info()` | 1 | Primary audit trail — silent failure corrupts all downstream debugging |
| `log_warn()` | 1 | Warning signals approaching-limit conditions — silent failure causes monitoring gaps |
| `log_error()` | 1 | Failure recording — silent failure hides incidents from on-call engineers |
| `log_metric()` | 1 | Metrics emission — silent failure corrupts monitoring dashboards (Grafana, Datadog) |
| `send_notification()` | 1 | Cross-module operational signals — silent failure loses deployment/alert records |
| `send_notification_async()` | 1 | Async variant — same stakes as send_notification |
| `flush_logs()` | 1 | Shutdown guarantee — silent failure loses all buffered entries |
| `run_performance_test()` | 3 | Dev/debug benchmark — failure is operator-visible, no production impact |
| `resolve_logfile_name()` | 2 | Caller detection — wrong detection misroutes logs to wrong JSONL file |
| `get_source_file_with_fallback()` | 2 | Source detection — wrong source breaks dual-source tracking |
| `get_timestamps()` | 2 | Timestamp formatting — wrong format breaks all downstream log parsers |
| `get_log_path()` | 2 | Path construction — wrong path means silent data loss |
| `with_file_retry()` | 2 | Core resilience layer for disk writes; failure is always visible as a returned `Err` |
| `flush_buffer()` | 2 | Bridge between in-memory buffers and disk — double-write or silent drop loses data |
| `writer_worker()` | 2 | Only path from log call to writer thread — wrong tuple shape causes silent data loss |
| `flush_logs()` | 2 | Last-mile guarantee — missed shutdown flag or join loses all in-flight logs |

## 3. Logging Strategy

This module IS the logging system. It writes directly to files via `std::fs::OpenOptions`
with append mode. No external logging crate (tracing, log, etc.) is used.

The writer thread logs to stderr when the channel disconnects unexpectedly — structured
message with context about the failure.

`with_file_retry()` prints to stderr when `debug_print` is enabled — development-only
visibility, disabled in production.

Buffer overflow prints a warning to stderr when oldest entries are dropped.

## 4. Environment Variables

| Variable | Required | Default | Purpose |
|----------|----------|---------|---------|
| `PROJECT_DIRECTORY` | Yes | (none) | Root directory for log output |
| `LOGS_LOCAL_TIMEZONE` | Yes | (none) | Local timezone for dual-timestamp (e.g. Asia/Kolkata) |
| `LOGGER_FILE_NAME` | No | `LOGS` | Default log file name for grouping |
| `LOGGER_BUFFER_SIZE` | No | `1000` | Entries per buffer before auto-flush |
| `LOGGER_FLUSH_INTERVAL_MS` | No | `500` | Periodic flush interval in milliseconds |
| `LOGGER_RETRY_MAX_ATTEMPTS` | No | `3` | Max retry attempts for disk writes |
| `LOGGER_RETRY_BACKOFF_BASE_MS` | No | `100` | Base delay for exponential backoff (ms) |
| `LOGGER_MAX_BUFFER_SIZE` | No | `50000` | Per-file buffer cap before oldest entries dropped |
| `LOGGER_DEBUG_PRINT` | No | `false` | Print retry debug info to stderr |
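
The table above could map onto a config struct roughly as follows. This is a minimal sketch, not the crate's actual `LoggerConfig`: the `SketchConfig` type, its field names, and the `from_lookup` / `lookup_from_map` helpers are hypothetical, and the lookup closure stands in for the process environment so the parsing can be exercised without touching real env vars. The directory existence and writability checks are left out.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the variable table; field names are assumptions.
#[derive(Debug, PartialEq)]
pub struct SketchConfig {
    pub project_directory: String,
    pub local_timezone: String,
    pub file_name: String,
    pub buffer_size: usize,
    pub flush_interval_ms: u64,
    pub retry_max_attempts: u32,
    pub retry_backoff_base_ms: u64,
    pub max_buffer_size: usize,
    pub debug_print: bool,
}

/// Parse an optional variable, falling back to `default` when it is unset.
fn parse_or<T: std::str::FromStr>(
    lookup: &dyn Fn(&str) -> Option<String>,
    key: &str,
    default: T,
) -> Result<T, String> {
    match lookup(key) {
        None => Ok(default),
        Some(raw) => raw
            .trim()
            .parse::<T>()
            .map_err(|_| format!("{key} has an invalid value: {raw:?}")),
    }
}

impl SketchConfig {
    /// Build the config from any key/value source (hypothetical constructor).
    pub fn from_lookup(lookup: &dyn Fn(&str) -> Option<String>) -> Result<Self, String> {
        // Required variables produce an error string when missing.
        let required = |key: &str| lookup(key).ok_or_else(|| format!("{key} is not set"));
        Ok(Self {
            project_directory: required("PROJECT_DIRECTORY")?,
            local_timezone: required("LOGS_LOCAL_TIMEZONE")?,
            file_name: lookup("LOGGER_FILE_NAME").unwrap_or_else(|| "LOGS".to_string()),
            buffer_size: parse_or(lookup, "LOGGER_BUFFER_SIZE", 1000)?,
            flush_interval_ms: parse_or(lookup, "LOGGER_FLUSH_INTERVAL_MS", 500)?,
            retry_max_attempts: parse_or(lookup, "LOGGER_RETRY_MAX_ATTEMPTS", 3)?,
            retry_backoff_base_ms: parse_or(lookup, "LOGGER_RETRY_BACKOFF_BASE_MS", 100)?,
            max_buffer_size: parse_or(lookup, "LOGGER_MAX_BUFFER_SIZE", 50000)?,
            debug_print: parse_or(lookup, "LOGGER_DEBUG_PRINT", false)?,
        })
    }
}

/// Test helper: turn a map into a lookup closure.
pub fn lookup_from_map(vars: HashMap<String, String>) -> impl Fn(&str) -> Option<String> {
    move |key| vars.get(key).cloned()
}
```

Passing a closure keeps the defaults testable in-process; a production path could pass `|k| std::env::var(k).ok()` instead.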

## 5. Config Files

No config files. All configuration is via environment variables (loaded from `.env`
by `from_env()`, or set directly in the process environment for `from_vars()`).

## 6. Error Classes

No custom error types. Failures surface as `Err(String)` or as a panic:

| Kind | Occurs when |
|------|-------------|
| `Err(String)` | `PROJECT_DIRECTORY` not set — returned by `LoggerConfig::from_env()` |
| `Err(String)` | `PROJECT_DIRECTORY` path does not exist |
| `Err(String)` | `PROJECT_DIRECTORY` path is not writable |
| `Err(String)` | `LOGS_LOCAL_TIMEZONE` not set in .env |
| `panic` | Writer thread fails to spawn |
| `panic` | `global_logger()` called before `init()` |

## 7. Retry Policy

| Function | Retries | Max Attempts | Retryable Errors | Delay |
|----------|---------|--------------|------------------|-------|
| `flush_buffer()` via `with_file_retry()` | Yes | 3 | Any `std::io::Error` during file write | Exponential backoff + ±25% jitter |

All `std::io::Error` values are retryable; local file IO has no 4xx-equivalent.
Retry rationale: Transient disk/NFS blips can cause write failures; 3 attempts with
exponential backoff and jitter recovers from spikes without hiding real bugs.
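
The retry policy can be sketched as below. This is an illustration under stated assumptions, not the crate's `with_file_retry()`: the names `backoff_delay_ms`, `clock_jitter`, and `with_retry_sketch` are hypothetical, the delay is assumed to be `base * 2^attempt`, and the clock-derived jitter is a stand-in for whatever random source the real code uses.

```rust
use std::io;
use std::thread;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Delay before retry number `attempt` (0-based): base * 2^attempt, scaled by
/// `jitter` in [-1.0, 1.0] so the result lands within +/-25% of the nominal delay.
pub fn backoff_delay_ms(base_ms: u64, attempt: u32, jitter: f64) -> u64 {
    let nominal = base_ms.saturating_mul(1u64 << attempt.min(20)) as f64;
    (nominal * (1.0 + 0.25 * jitter.clamp(-1.0, 1.0))).round() as u64
}

/// Cheap jitter source in [-1.0, 1.0] derived from the clock. This is an
/// assumption for the sketch; a real implementation would likely use an RNG.
fn clock_jitter() -> f64 {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos())
        .unwrap_or(0);
    (nanos % 2001) as f64 / 1000.0 - 1.0
}

/// Run `op` up to `max_attempts` times, sleeping between failed attempts.
/// Every `io::Error` is treated as retryable; the last error message is
/// returned once attempts are exhausted.
pub fn with_retry_sketch<T>(
    max_attempts: u32,
    base_ms: u64,
    mut op: impl FnMut() -> io::Result<T>,
) -> Result<T, String> {
    let mut last_err = String::from("no attempts were made");
    for attempt in 0..max_attempts {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) => last_err = e.to_string(),
        }
        // Sleep only if another attempt follows.
        if attempt + 1 < max_attempts {
            let delay = backoff_delay_ms(base_ms, attempt, clock_jitter());
            thread::sleep(Duration::from_millis(delay));
        }
    }
    Err(last_err)
}
```

Keeping the delay computation in a pure function makes the exponential growth checkable without sleeping.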

On exhaustion: lines are re-buffered. If the combined buffer exceeds
`LOGGER_MAX_BUFFER_SIZE`, oldest entries are dropped with a stderr warning.
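
A minimal sketch of the exhaustion path, assuming the per-file buffer is a `VecDeque<String>` and that failed lines are placed ahead of entries that arrived during the retries (both are assumptions; `rebuffer_with_cap` is a hypothetical name):

```rust
use std::collections::VecDeque;

/// Put lines that failed to flush back in front of newer entries, then trim
/// from the front (oldest first) until the cap is respected.
/// Returns how many entries were dropped.
pub fn rebuffer_with_cap(
    buffer: &mut VecDeque<String>,
    failed_lines: Vec<String>,
    max_buffer_size: usize,
) -> usize {
    // Failed lines are older than anything currently buffered, so they go first.
    // Iterating in reverse keeps their original relative order.
    for line in failed_lines.into_iter().rev() {
        buffer.push_front(line);
    }
    let mut dropped = 0;
    while buffer.len() > max_buffer_size {
        buffer.pop_front();
        dropped += 1;
    }
    if dropped > 0 {
        eprintln!("logger sketch: buffer cap {max_buffer_size} exceeded, dropped {dropped} oldest entries");
    }
    dropped
}
```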

**Non-idempotent note:** `log_error()` dual-writes the same line to two files.
Retry at the `log_error()` level would produce duplicate entries in both files.
Retry belongs only in `flush_buffer()`.

## 8. Domain-Specific Test Scenarios

### log_info() / log_warn() / log_error()
- Explicit `logfile_name` must route to correct log file
- Auto-detected `module_name` and `source_file` must be non-empty
- Empty message must log without error
- Special characters and emoji must be preserved intact
- 10,000-char message must not be truncated
- Dual-source tracking: `logfile_name`, `module_name`, and `source_file` all present

### log_error() dual-write
- Both audit (`{module}.jsonl`) and errors (`{module}.errors.jsonl`) must receive identical content
- Info and warn calls must NOT reach the errors file

### log_metric()
- Must write to `.metrics.jsonl` only
- Float and integer values must be stored without coercion
- Tags must be attached as queryable structured fields
- Must NOT write to audit log
- Zero and negative values must be accepted (not filtered)
- `metric_name` must be stored as a structured field, not only in message

### send_notification()
- Must write to `main_logger.jsonl`
- `source_file` must reflect the actual calling module
- Empty message must write without error
- Emoji and unicode must be preserved intact

### send_notification_async()
- Must be an async function
- Must delegate to `send_notification()` via `spawn_blocking`
- Must complete without blocking the Tokio event loop

### with_file_retry()
- Success on first attempt must return Ok without delay
- Success on third attempt (transient failures) must return Ok
- Total exhaustion must return Err with last error message
- Any `std::io::Error` kind must be retried up to max attempts
- Backoff delays must grow exponentially (with jitter, check approximate ratio)

### flush_buffer()
- No-op on empty buffer — no `open()` call made
- Buffer must be cleared after successful write
- Lines must be re-buffered on retry exhaustion
- Oldest entries must be dropped when buffer exceeds cap
- Stderr warning must be emitted when buffer cap exceeded
- Must create missing parent directory before writing — midnight rollover support
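
The first, second, and last scenarios above can be illustrated with a small append routine. This is a sketch only: `flush_buffer_sketch` is a hypothetical name, the buffer is assumed to be a `Vec<String>` of already-serialised JSON lines, and the retry and re-buffer steps are omitted.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Append all buffered lines to `path`, one JSON document per line.
/// Returns the number of lines written.
pub fn flush_buffer_sketch(path: &Path, buffer: &mut Vec<String>) -> io::Result<usize> {
    if buffer.is_empty() {
        return Ok(0); // nothing to write, so the file is never opened
    }
    // A date folder may not exist yet right after midnight.
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // Build one payload so the whole batch goes through a single write_all call.
    let mut payload = String::new();
    for line in buffer.iter() {
        payload.push_str(line);
        payload.push('\n');
    }
    file.write_all(payload.as_bytes())?;
    let written = buffer.len();
    buffer.clear(); // cleared only after the write succeeded
    Ok(written)
}
```

Because the buffer is cleared after `write_all` returns, an early `?` return leaves the lines in place for the caller to retry or re-buffer.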

### writer_worker / Logger
- `flush_logs()` must block until all buffers are confirmed on disk (ack pattern)
- `Drop` must flush pending entries when Logger goes out of scope without explicit `flush_logs()`
- `Drop` on a clone must NOT shut down the writer when other clones still exist —
  only the last clone (Arc strong_count == 1) sends Shutdown
- Periodic flush must trigger without Shutdown — long-running processes rely on timer
- Buffer size threshold must trigger immediate flush without waiting for timer
- All per-file buffers must be flushed on Shutdown — not just the last-written file

### resolve_logfile_name()
- Explicit param must win over `LOGGER_FILE_NAME` env var
- `.rs` and `.py` extensions must be stripped from explicit param
- Falls back to `"LOGS"` when neither param nor env var is set
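
The precedence rules above can be expressed as a pure function. In this sketch the names `strip_source_extension` and `resolve_logfile_name_sketch` are hypothetical, the `env_value` parameter stands in for the `LOGGER_FILE_NAME` lookup, and stripping the extension from the env value as well as the explicit parameter is an assumption.

```rust
/// Strip a trailing `.rs` or `.py` extension, if present.
pub fn strip_source_extension(name: &str) -> &str {
    name.strip_suffix(".rs")
        .or_else(|| name.strip_suffix(".py"))
        .unwrap_or(name)
}

/// Explicit parameter first, then the env value, then the `"LOGS"` default.
pub fn resolve_logfile_name_sketch(explicit: Option<&str>, env_value: Option<&str>) -> String {
    let chosen = explicit.or(env_value).unwrap_or("LOGS");
    strip_source_extension(chosen).to_string()
}
```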

### get_log_path()
- Audit suffix `""` → `{module}.jsonl`
- Errors suffix `".errors"` → `{module}.errors.jsonl`
- Warn suffix `".warn"` → `{module}.warn.jsonl`
- Metrics suffix `".metrics"` → `{module}.metrics.jsonl`
- Date folder must be `YYYY_MM_DD` format with underscores (10 chars)
- `.rs` and `.py` extensions must be stripped from `logfile_name`
- `_LOGS_DIRECTORY` directory must be present in path
- `LOGS` subdirectory must be present after date
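
A sketch of path construction consistent with the scenarios above. The exact layout is an assumption: here it is taken to be `{project}/_LOGS_DIRECTORY/{YYYY_MM_DD}/LOGS/{module}{suffix}.jsonl`, and `log_path_sketch` and `is_date_folder` are hypothetical helpers. The date folder is passed in as a string so the function stays deterministic.

```rust
use std::path::{Path, PathBuf};

/// Build the log file path for one module and suffix (assumed layout).
pub fn log_path_sketch(
    project_dir: &Path,
    date_folder: &str,
    logfile_name: &str,
    suffix: &str,
) -> PathBuf {
    // Drop a trailing `.rs` or `.py` so `billing.rs` becomes `billing`.
    let module = logfile_name
        .strip_suffix(".rs")
        .or_else(|| logfile_name.strip_suffix(".py"))
        .unwrap_or(logfile_name);
    project_dir
        .join("_LOGS_DIRECTORY")
        .join(date_folder)
        .join("LOGS")
        .join(format!("{module}{suffix}.jsonl"))
}

/// Check the `YYYY_MM_DD` shape: 10 chars, digits with underscores at 4 and 7.
pub fn is_date_folder(s: &str) -> bool {
    let bytes = s.as_bytes();
    bytes.len() == 10
        && bytes.iter().enumerate().all(|(i, c)| match i {
            4 | 7 => *c == b'_',
            _ => c.is_ascii_digit(),
        })
}
```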

### get_timestamps()
- UTC must end with `Z` (UTC marker)
- Local must have offset (e.g. `+0530` or `-0800`)
- UTC must match ISO-8601 with millisecond precision
- Real clock must be reflected, not a stale cached value
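
The expected timestamp shapes can be illustrated with the standard library alone. This sketch makes notable assumptions: it takes a fixed offset in minutes instead of resolving a named zone such as `Asia/Kolkata` (which needs a timezone database and is presumably handled by a date/time crate in the real code), and `civil_from_days`, `format_timestamp`, and `timestamps_now` are hypothetical names.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Convert days since 1970-01-01 to (year, month, day) in the proleptic
/// Gregorian calendar (the widely used "civil from days" algorithm).
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of 400-year era
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Format `epoch_ms` shifted by `offset_minutes` with millisecond precision.
/// An offset of 0 is rendered with the `Z` marker, otherwise as `+HHMM`/`-HHMM`.
pub fn format_timestamp(epoch_ms: i64, offset_minutes: i32) -> String {
    let shifted = epoch_ms + i64::from(offset_minutes) * 60_000;
    let (secs, millis) = (shifted.div_euclid(1000), shifted.rem_euclid(1000));
    let (days, sod) = (secs.div_euclid(86_400), secs.rem_euclid(86_400));
    let (y, mo, d) = civil_from_days(days);
    let zone = if offset_minutes == 0 {
        "Z".to_string()
    } else {
        let sign = if offset_minutes < 0 { '-' } else { '+' };
        let abs = offset_minutes.abs();
        format!("{sign}{:02}{:02}", abs / 60, abs % 60)
    };
    format!(
        "{y:04}-{mo:02}-{d:02}T{:02}:{:02}:{:02}.{millis:03}{zone}",
        sod / 3600,
        sod % 3600 / 60,
        sod % 60
    )
}

/// Read the clock once and return (utc, local) so both describe the same instant.
pub fn timestamps_now(offset_minutes: i32) -> (String, String) {
    let ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as i64)
        .unwrap_or(0);
    (format_timestamp(ms, 0), format_timestamp(ms, offset_minutes))
}
```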

### get_source_file_with_fallback()
- Must always return a non-empty string
- Cached value must be stable across calls
- `"unknown_source"` is the fallback when env var is not set

### Global singleton (OnceLock)
- `init()` must be idempotent — second call returns same instance, no second writer thread
- `global_logger()` must return same pointer on every call after `init()`
- `global_logger()` before `init()` must panic with message containing
  `"jsonl_logger not initialised"`
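
The singleton behaviour above follows naturally from `std::sync::OnceLock`. A minimal sketch, assuming a placeholder `SketchLogger` type and hypothetical `init_sketch` / `global_logger_sketch` functions in place of the crate's real `init()` and `global_logger()`:

```rust
use std::sync::OnceLock;

/// Placeholder for the real logger; it only carries a name here.
pub struct SketchLogger {
    pub name: String,
}

static GLOBAL: OnceLock<SketchLogger> = OnceLock::new();

/// Idempotent: the closure runs at most once, so a second call returns the
/// instance created by the first and never builds another logger.
pub fn init_sketch(name: &str) -> &'static SketchLogger {
    GLOBAL.get_or_init(|| SketchLogger {
        name: name.to_string(),
    })
}

/// Panics with a recognisable message when called before `init_sketch`.
pub fn global_logger_sketch() -> &'static SketchLogger {
    GLOBAL
        .get()
        .expect("jsonl_logger not initialised: call init() first")
}
```

Since the static lives for the whole process, the panic-before-init scenario is easiest to test in a separate process or test binary.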

### Concurrent logging
- Multiple concurrent threads writing must produce correct number of valid
  JSONL lines — no corruption, no dropped entries, no interleaved partial lines
- Mixed concurrent writes (`log_info`, `log_error`, `log_metric`) must land in
  correct files with correct counts — no cross-contamination between file types