# wbt v0.3.0

> Release date: 2026-05-29
> Crate tag: `crate-v0.3.0` · Python tag: `v0.3.0`

## Summary

MINOR release (0.x) with a **BREAKING** plotting API change, plus the previously-staged
additive `is_good_strategy` method.

1. **New `BacktestResult` standard data model** (`wbt.result`) — a single, plot-ready,
   JSON-serializable object that precomputes everything the plotting layer needs, so
   plotting functions perform **zero data transformation**. Built via `wb.to_result()`.
2. **All `wbt.plotting` functions now take `BacktestResult`** instead of raw DataFrames/dicts
   (**BREAKING**). The duplicated chart family under `wbt.report._plot_backtest` and
   `LongShortComparisonChart` (which re-ran 4 backtests) are **removed**.
3. **New Rust pairs aggregation + key trades**: `aggregated_pairs()` / `key_trades(top)`
   return Arrow IPC tables; surfaced as `wb.aggregated_pairs` / `wb.key_trades(top)`.
4. `wb.is_good_strategy(...)` — objective strategy verdict (details below; unchanged from the
   prior 0.2.4 draft).

Design doc: <https://s0cqcxuy3p.feishu.cn/wiki/JwQjwdN4iiycLBkR9VCceHf4nih>

## BacktestResult & plotting refactor (BREAKING)

### New `wbt.result.BacktestResult`

`wb.to_result(target_vol=0.20) -> BacktestResult`. Eager fields computed once:
`dates`, `year_starts`, `curves` (`dict[str, Curve]`, keyed by 多空/多头/空头/基准/超额, i.e.
long-short / long / short / benchmark / excess),
`return_dist`, `monthly`, `symbol_returns`, `pairs_dist`, `stats`, `stats_by_side`.
Lazy `cached_property` fields: `curves_voladj` (volatility-normalized curves, kept
**separate** from the raw `curves`), `drawdowns`, `key_trades`, `verdict`.
`to_dict(full=False)` returns a JSON-safe structure (numpy→list, datetime→ISO) for
serving the strategy-review page over HTTP.
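
The numpy→list and datetime→ISO conversion described above could look roughly like the following. This is a minimal sketch and not the actual `to_dict` implementation: `to_json_safe` and `DemoCurve` are hypothetical names, and numpy values are detected by duck typing on `.tolist()` so the example runs with the standard library alone.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from datetime import date, datetime


def to_json_safe(value):
    """Recursively convert a value into JSON-serializable builtins."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()  # datetime/date -> ISO 8601 string
    if hasattr(value, "tolist"):
        return to_json_safe(value.tolist())  # numpy array/scalar -> list/builtin
    if is_dataclass(value) and not isinstance(value, type):
        return {f.name: to_json_safe(getattr(value, f.name)) for f in fields(value)}
    if isinstance(value, dict):
        return {str(k): to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    return value


@dataclass
class DemoCurve:  # hypothetical stand-in, not the real wbt.result.Curve
    dates: list
    values: list


payload = to_json_safe({"curve": DemoCurve([date(2026, 5, 29)], [0.0125])})
encoded = json.dumps(payload)  # succeeds because only builtins remain
```
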

Unit convention: ratios stay as raw decimals (axes use `tickformat=".1%"`); only `*_pct`
fields are percentages. Exported dataclasses: `Curve`, `ReturnDist`, `MonthlyHeatmap`,
`SymbolReturns`, `PairsDist`, `KeyTrade`, `KeyTrades`.
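
To make the unit convention concrete, the sketch below formats one raw-decimal ratio and one `*_pct` value. It uses Python's own `".1%"` format spec, which, like the plotly tick format, multiplies by 100 before appending `%`. The helper names are hypothetical and the numbers are illustrative only.

```python
def fmt_ratio(x: float) -> str:
    """Format a raw-decimal ratio; '.1%' multiplies by 100 itself."""
    return format(x, ".1%")


def fmt_pct(x: float) -> str:
    """Format a value that is already a percentage; no extra scaling."""
    return f"{x:.1f}%"


ratio = 0.0123   # raw decimal, as stored in ratio fields
pnl_pct = 1.23   # already a percentage, as stored in *_pct fields

ok_ratio = fmt_ratio(ratio)    # correct: ratio with a percent formatter
ok_pct = fmt_pct(pnl_pct)      # correct: percentage with a plain formatter
wrong = fmt_ratio(pnl_pct)     # wrong: percent formatter applied to a *_pct field
```

Applying the percent formatter to a `*_pct` field scales the value by 100 a second time, which is why the two kinds of field must not share a tick format.
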

### Rust: pairs aggregation + key trades

`src/core/key_trades.rs` (new). Aggregates raw pairs by `(symbol, 开仓时间, 平仓时间)`, i.e.
symbol, open time, and close time, so that LIFO partial fills collapse into one logical trade.
Each aggregated trade records `count` (the number of raw pairs merged); its pnl is the
volume-weighted mean of `profit_bp` (in basis points); `hold_bars` is the shared value (not summed).
`key_trades(top)` buckets aggregated trades by **close-time year** and takes best/worst
`top` each as a flat table with `year` / `kind` columns. Exposed via
`PyWeightBacktest::aggregated_pairs` / `key_trades`, returned as Arrow IPC; Python facades
`wb.aggregated_pairs` / `wb.key_trades(top)`.
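
The aggregation and per-year selection described above can be sketched in plain Python. This is an illustration of the logic only, not the Rust implementation: the dict keys `open_dt`, `close_dt`, and `volume` and the `kind` labels `"best"` / `"worst"` are assumptions, volumes are assumed positive, and how the real code treats a year with fewer than `2 * top` trades is not specified here.

```python
from collections import defaultdict
from datetime import date


def aggregate_pairs(pairs):
    """Merge raw pairs sharing (symbol, open time, close time) into one trade."""
    groups = defaultdict(list)
    for p in pairs:
        groups[(p["symbol"], p["open_dt"], p["close_dt"])].append(p)
    trades = []
    for (symbol, open_dt, close_dt), rows in groups.items():
        total_volume = sum(r["volume"] for r in rows)  # assumed > 0
        weighted = sum(r["profit_bp"] * r["volume"] for r in rows)
        trades.append({
            "symbol": symbol, "open_dt": open_dt, "close_dt": close_dt,
            "count": len(rows),                    # number of merged fills
            "profit_bp": weighted / total_volume,  # volume-weighted mean
            "hold_bars": rows[0]["hold_bars"],     # shared value, not summed
        })
    return trades


def key_trades(trades, top):
    """Bucket trades by close-time year and keep the best/worst `top` of each."""
    by_year = defaultdict(list)
    for t in trades:
        by_year[t["close_dt"].year].append(t)
    rows = []
    for year in sorted(by_year):
        ranked = sorted(by_year[year], key=lambda t: t["profit_bp"], reverse=True)
        rows += [{**t, "year": year, "kind": "best"} for t in ranked[:top]]
        rows += [{**t, "year": year, "kind": "worst"} for t in ranked[::-1][:top]]
    return rows


raw = [
    {"symbol": "A", "open_dt": date(2025, 1, 2), "close_dt": date(2025, 1, 10),
     "volume": 1, "profit_bp": 10.0, "hold_bars": 6},
    {"symbol": "A", "open_dt": date(2025, 1, 2), "close_dt": date(2025, 1, 10),
     "volume": 3, "profit_bp": 30.0, "hold_bars": 6},
    {"symbol": "B", "open_dt": date(2025, 12, 30), "close_dt": date(2026, 1, 5),
     "volume": 2, "profit_bp": -5.0, "hold_bars": 3},
]
aggregated = aggregate_pairs(raw)
flat = key_trades(aggregated, top=1)
```

Trade `B` opens in 2025 but lands in the 2026 bucket, because bucketing uses the close-time year.
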

### Plotting API migration (BREAKING)

All `wbt.plotting` functions now accept `result: BacktestResult`, and are **all
single-purpose figures — no composite (subplot) charts remain**. A composite
chart locks panel sizing/ratios inside one plotly figure, which prevents the HTML
report from laying panels out freely; splitting them into single figures lets the
report compose them into a CSS grid (see below).

Current function set:
`plot_cumulative_returns(result, keys=..., voladj=...)` (`voladj=True` reads the
volatility-normalized `curves_voladj`), `plot_drawdown(result, key=...)` (dual-axis
single figure, kept), `plot_daily_return_dist`, `plot_monthly_heatmap`,
`plot_symbol_returns`, `plot_pairs_pnl_dist`, `plot_pairs_hold_dist`,
`plot_colored_table`, `plot_stats_comparison`, `plot_key_trades`,
`plot_drawdowns_table`, `plot_verdict`.

#### Composite charts removed (BREAKING)

- `plot_backtest_overview` (deleted) → its panels are the existing `plot_drawdown`
  + `plot_daily_return_dist` + `plot_monthly_heatmap`.
- `plot_long_short_comparison` (deleted) → `plot_cumulative_returns` (raw) +
  `plot_cumulative_returns(voladj=True)` + new `plot_stats_comparison` (the
  metric-comparison table across 多空/多头/空头/基准/超额, i.e. long-short / long / short /
  benchmark / excess).
- `plot_pairs_analysis` (deleted) → new `plot_pairs_pnl_dist` + `plot_pairs_hold_dist`.

#### `HtmlReportBuilder.add_chart_grid_tab` (new)

Lays multiple single figures inside one tab as a responsive CSS grid
(`add_chart_grid_tab(name, charts, cols=2, ...)`; items may span full width).
`generate_backtest_report` now composes the report from single figures across three
grid tabs (回测概览 / 多空对比 / 交易分析, i.e. Backtest Overview / Long-Short Comparison /
Trade Analysis) instead of three composite figures. The
tab-resize script resizes **all** plotly divs in a pane (was: only the first).
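
As a rough picture of what a grid tab amounts to, the sketch below wraps pre-rendered figure fragments in a CSS grid container and lets selected items span the full width. `render_chart_grid` is a hypothetical helper, not the `HtmlReportBuilder` code, and the markup it emits is an assumption.

```python
def render_chart_grid(panels, cols=2):
    """Wrap HTML fragments in a CSS grid.

    `panels` is a list of (html_fragment, full_width) tuples; fragments are
    assumed to be trusted, already-rendered figure HTML.
    """
    cells = []
    for fragment, full_width in panels:
        # "1 / -1" makes the cell span from the first to the last grid line.
        style = ' style="grid-column: 1 / -1;"' if full_width else ""
        cells.append(f"<div{style}>{fragment}</div>")
    container = (
        '<div style="display: grid; '
        f'grid-template-columns: repeat({cols}, minmax(0, 1fr)); gap: 12px;">'
    )
    return container + "".join(cells) + "</div>"


grid_html = render_chart_grid(
    [("<p>cumulative returns</p>", True),
     ("<p>drawdown</p>", False),
     ("<p>monthly heatmap</p>", False)],
    cols=2,
)
```

Because each panel is its own figure, the page layout lives in CSS rather than in plotly subplot geometry.
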

### Removals (BREAKING)

- `wbt.report._plot_backtest` module (deleted) — including `plot_backtest_stats`,
  `plot_drawdown_analysis`, `plot_daily_return_distribution`, and the duplicated
  preprocessing helpers `_calculate_drawdown` / `_create_monthly_heatmap_data` /
  `_add_sigma_lines` / `_add_year_boundary_lines`.
- `wbt.report._generator.LongShortComparisonChart` (deleted) — the long/short panel no
  longer re-runs 4 backtests; it consumes `result.curves` / `result.curves_voladj` /
  `result.stats_by_side` from a single backtest.
- `_normalize_stats_for_czsc_view` (deleted) — `get_performance_metrics_cards` now reads
  the Chinese long-name `stats` keys directly.

### Migration

```python
result = wb.to_result()
plot_cumulative_returns(result, keys=["多空", "多头", "空头"])  # was: (wb.daily_return, cols=...)
plot_cumulative_returns(result, voladj=True)                     # vol-normalized panel
plot_stats_comparison(result)                                    # was: plot_long_short_comparison table panel
plot_pairs_pnl_dist(result); plot_pairs_hold_dist(result)        # was: plot_pairs_analysis
```

`generate_backtest_report(df, ...)` is unchanged at the call site.

---

## (Carried) `WeightBacktest.is_good_strategy(...)`

> Originally staged for 0.2.4; ships in 0.3.0. Additive, no breaking change.

### `WeightBacktest.is_good_strategy(...)`

Rust core in `src/core/is_good_strategy.rs`; exposed via PyO3 in
`src/lib.rs::PyWeightBacktest`; Python facade in `python/wbt/backtest.py`.

- `mode="history"` — pass when **every complete calendar year** (>= `min_year_days` trading days)
  satisfies `abs_return > ε` OR `vol-normalized long-alpha > ε`, AND the full-sample
  long-alpha max drawdown is below `max_dd_threshold`.
- `mode="recent"` — pass when the last `recent_days` window satisfies the same return condition
  AND its long-alpha max drawdown is strictly less than both `max_dd_threshold` and the
  **history-excluding-recent** max drawdown. The two windows are guaranteed disjoint; if the
  history segment is shorter than `min_history_days`, `history_window_empty=true` and
  `is_good=false`.

**Tunable parameters with defaults**: `target_vol=0.20`, `max_dd_threshold=0.20`,
`min_year_days=120`, `recent_days=252`, `min_history_days=60`.
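
The drawdown half of the `recent` check can be sketched as below, using the defaults listed above. This is only an illustration under stated assumptions: drawdown is measured on a simple cumulative-sum curve (the real core may compound), the return condition is omitted, and the names `recent_drawdown_check`, `recent_dd`, `history_dd`, and `dd_passed` are hypothetical.

```python
def max_drawdown(returns):
    """Max drawdown of the simple cumulative-sum curve (an assumption)."""
    cum = peak = worst = 0.0
    for r in returns:
        cum += r
        peak = max(peak, cum)
        worst = max(worst, peak - cum)
    return worst


def recent_drawdown_check(alpha, recent_days=252, min_history_days=60,
                          max_dd_threshold=0.20):
    """Split alpha into disjoint history/recent windows and compare drawdowns."""
    # recent_days is assumed >= 1; zero is rejected by input validation.
    history, recent = alpha[:-recent_days], alpha[-recent_days:]
    if len(history) < min_history_days:
        return {"history_window_empty": True, "dd_passed": False}
    recent_dd = max_drawdown(recent)
    history_dd = max_drawdown(history)
    # Strictly below both the threshold and the history-excluding-recent drawdown.
    passed = recent_dd < max_dd_threshold and recent_dd < history_dd
    return {"history_window_empty": False, "recent_dd": recent_dd,
            "history_dd": history_dd, "dd_passed": passed}


alpha = [0.01, 0.01, 0.01, -0.05, 0.01, 0.01, 0.01, 0.01, -0.02, 0.01]
outcome = recent_drawdown_check(alpha, recent_days=3, min_history_days=5)
too_short = recent_drawdown_check([0.01] * 4, recent_days=3, min_history_days=5)
```
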

**Returned dict** uses English `snake_case` keys in stable alphabetical order. Apart from the
shared keys `mode`, `is_good`, `reason`, and `alpha_degenerate`, the `history` and `recent`
modes return **disjoint** key sets, so callers should dispatch on `mode`. The dict is
independent of the existing Chinese-keyed `*_stats` family and does **not** participate in
`STATS_FIELD_ORDER`.

### Degenerate-input handling (no false-positive `is_good=true`)

When the vol-normalization cannot be defined (NaN/Inf in inputs, or `long_vol < 1e-12`,
or `bench_vol < 1e-12`), the result dict carries `alpha_degenerate=true`, all alpha-derived
fields are `None` (e.g. `history_alpha_max_drawdown`,
`history_alpha_max_drawdown_excl_recent`, `recent_alpha_max_drawdown`,
`recent_alpha_return`), and `is_good=false`. This prevents the historical false-positive
where `bench_vol≈0` made `cond_history_dd_passed` trivially true via an all-zero alpha
series.
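
A minimal sketch of the degeneracy guard, assuming volatility is a plain population standard deviation of daily returns (the Rust core may annualize or use a different estimator). `is_alpha_degenerate` and `VOL_EPS` are hypothetical names; only the `1e-12` threshold comes from the text above.

```python
import math
import statistics

VOL_EPS = 1e-12  # threshold quoted in the text


def is_alpha_degenerate(long_returns, bench_returns):
    """Return True when vol-normalization cannot be defined.

    Both inputs are assumed to be non-empty sequences of floats.
    """
    if any(not math.isfinite(v) for v in (*long_returns, *bench_returns)):
        return True  # NaN/Inf anywhere in the inputs
    long_vol = statistics.pstdev(long_returns)
    bench_vol = statistics.pstdev(bench_returns)
    return long_vol < VOL_EPS or bench_vol < VOL_EPS


flat_bench = is_alpha_degenerate([0.01, -0.02, 0.005], [0.0, 0.0, 0.0])
has_nan = is_alpha_degenerate([0.01, float("nan")], [0.01, 0.02])
healthy = is_alpha_degenerate([0.01, -0.02, 0.005], [0.004, -0.01, 0.002])
```

A caller would then blank the alpha-derived fields and force `is_good=false` whenever the guard returns `True`, instead of evaluating conditions on an all-zero alpha series.
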

### Strict input validation (`WbtError::InvalidInput`)

- Empty `date_keys`, length mismatch across the four parallel arrays, invalid YYYYMMDD
  `date_key`, `recent_days=0` (in recent mode), non-positive `target_vol` /
  `max_dd_threshold`, or NaN/Inf in `strategy_daily` → returns the new
  `WbtError::InvalidInput` variant (Python: `Exception` with `"invalid input: ..."` prefix).
- Previously these conditions either panicked at index-OOB sites or were silently
  absorbed into a 1970-01-01 fallback (via the legacy `date_key_to_naive_date` helper).
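
The fail-fast behaviour can be illustrated in Python as follows. This is a sketch of the idea rather than the Rust code: `InvalidInput`, `parse_date_key`, and `validate_inputs` are hypothetical, and only two of the four parallel arrays are modelled.

```python
import math
from datetime import date, datetime


class InvalidInput(ValueError):
    """Hypothetical Python stand-in for WbtError::InvalidInput."""


def parse_date_key(date_key):
    """Parse a YYYYMMDD integer strictly, with no silent fallback date."""
    text = str(date_key)
    if len(text) != 8 or not text.isdigit():
        raise InvalidInput(f"invalid input: bad date_key {date_key}")
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError as exc:
        raise InvalidInput(f"invalid input: bad date_key {date_key}") from exc


def validate_inputs(date_keys, strategy_daily, target_vol=0.20):
    """Reject malformed inputs up front instead of failing deep in the core."""
    if not date_keys:
        raise InvalidInput("invalid input: date_keys is empty")
    if len(date_keys) != len(strategy_daily):
        raise InvalidInput("invalid input: array length mismatch")
    if not target_vol > 0:
        raise InvalidInput("invalid input: target_vol must be positive")
    if any(not math.isfinite(x) for x in strategy_daily):
        raise InvalidInput("invalid input: NaN/Inf in strategy_daily")
    return [parse_date_key(k) for k in date_keys]


parsed = validate_inputs([20260528, 20260529], [0.01, -0.002])
```
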

### Helper: `hashmap_to_pydict` deterministic key order

`src/lib.rs::hashmap_to_pydict` now sorts top-level keys alphabetically before inserting
into the returned `PyDict`; nested `Value::Object` (e.g. `yearly_metrics` entries) are
sorted analogously inside `value_to_py`. The helper still recursively handles
`Value::Bool` / `Array` / `Object` introduced in the original PR.
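
The ordering rule is easy to state in Python terms: sort keys at every object level, and recurse through arrays. The sketch below mirrors that rule on plain dicts and lists; `sort_keys_deep` is a hypothetical name and the sample keys are illustrative.

```python
def sort_keys_deep(value):
    """Return a copy whose dicts are key-sorted at every nesting level."""
    if isinstance(value, dict):
        return {k: sort_keys_deep(value[k]) for k in sorted(value)}
    if isinstance(value, list):
        return [sort_keys_deep(v) for v in value]
    return value  # scalars (bool, number, string, None) pass through


unsorted = {
    "reason": "ok",
    "is_good": True,
    "yearly_metrics": [{"year": 2025, "abs_return": 0.1}],
}
ordered = sort_keys_deep(unsorted)
```
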

## Tests

- Rust: 7 new unit tests in `src/core/key_trades::tests` (empty / same-key merge / distinct
  close / per-year best-worst / top-exceeds / close-year grouping / df schema) plus the
  existing `is_good_strategy` suite. Full Rust suite: **200 passed**.
- Python: new `python/tests/test_result.py` (18 tests — DTO field shapes, unit conventions,
  cached_property laziness, `to_dict(full=True)` JSON-safety, vol-norm/monthly numerical
  regression); `test_plotting.py` rewritten to the `BacktestResult` contract — now all
  single-purpose figures, no subplots (27 tests, incl. a pnl-axis unit regression);
  `test_generate_backtest_report.py` updated for the CSS-grid generator. Full Python suite:
  **259 passed / 1 skipped** (no regression).
- `python/scripts/quick_start.ipynb` executes end-to-end (now self-contained via
  `mock_weights`).

## Compatibility

- **BREAKING (plotting)**: every `wbt.plotting.plot_*` function now takes a `BacktestResult`
  (`wb.to_result()`) instead of raw DataFrames/dicts; `wbt.report._plot_backtest`,
  `LongShortComparisonChart`, and `_normalize_stats_for_czsc_view` are removed. Callers
  migrate via `result = wb.to_result()` (see Migration above). No transition shim is kept
  (hard switch; 0.x).
- `WeightBacktest` core compute, `stats`, `daily_return`, `dailys`, `pairs`, `alpha`,
  `segment_stats`, `is_good_strategy` are unchanged. New additive members: `to_result`,
  `aggregated_pairs`, `key_trades`.
- All `*_stats` callers see deterministic alphabetical key order in the dict returned from
  Rust; Python `_reorder_stats` re-imposes `STATS_FIELD_ORDER` on top of that, so existing
  user code is unaffected.
- Python: requires ≥ 3.10 (unchanged).
- Rust: edition 2024, `pyo3 = 0.28` / `numpy = 0.28` (unchanged).
- New `WbtError::InvalidInput` variant: additive (no existing match-arm needs to change
  because all consumers go through `Display` to `PyException::new_err`).
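
For intuition on why the alphabetical ordering does not leak into `*_stats` results, re-imposing a field order over an already-sorted dict might look like this. It is a sketch only: `reorder` and `DEMO_FIELD_ORDER` are hypothetical, the keys are placeholders, and the real `_reorder_stats` may treat unknown keys differently.

```python
DEMO_FIELD_ORDER = ["start", "end", "sharpe"]  # placeholder for STATS_FIELD_ORDER


def reorder(stats, field_order):
    """Put known fields first in the given order, then any remaining keys."""
    ordered = {k: stats[k] for k in field_order if k in stats}
    for key, value in stats.items():
        ordered.setdefault(key, value)  # keep extras in their incoming order
    return ordered


alphabetical = {"end": "2026-05-29", "extra": 1, "sharpe": 1.5, "start": "2020-01-02"}
restored = reorder(alphabetical, DEMO_FIELD_ORDER)
```
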

## Release checklist outcome

- **§1 SemVer**: 0.x MINOR bump (`0.2.3 → 0.3.0`). The plotting API change is breaking;
  in 0.x a breaking change is signalled by a MINOR bump and explicitly flagged here.
- **§2 Lint/typecheck**: `cargo fmt --check`, `cargo clippy --all -- -D warnings -A non_snake_case`,
  `cargo test --lib` (**200 passed**), `ruff format --check`, `ruff check`, `basedpyright` all clean.
  `cargo publish --dry-run` packaging is clean (91 files); the `--dry-run` *verify build* fails to
  link on macOS arm64 (`__Py_NoneStruct` undefined) — this is the expected pyo3 `extension-module`
  cdylib limitation on macOS (symbols resolved at runtime by the interpreter) and is **not a release
  blocker**: the crate is published on Linux CI (`release-crate.yml`), where undefined cdylib symbols
  are permitted, as proven by every prior `crate-v*` release.
- **§3 Tests**: see above.
- **§4 LLM review**: two passes. (a) Deep review of the `BacktestResult` + Rust pairs-aggregation
  work — 15 findings (panic paths, false-positive conditions, semantic mismatches, contract drift),
  all fixed. (b) Pre-release gate review of the chart-split work — 1 high (`H1`: `plot_pairs_pnl_dist`
  applied `tickformat=".1%"` to the `pnl_pct` *percentage* field, a 100× axis blow-up — **fixed**,
  axis now labelled `(%)` with no fraction formatter, locked by a regression test), 2 medium fixed
  (`M1` plotly.js inlining is now robust to a failing first panel; `M3` `HtmlReportBuilder.render()`
  auto-finalizes pending chart tabs), 1 low fixed (`L1` panel-error HTML is now escaped). Non-blocking
  noted: `M2` the two duplicate 年化收益 (annualized return) metric cards are pre-existing and test-locked; `L2`
  `month_win_rate` excludes zero-trade months by design.
- **§5 doc-vs-code (four-way consistency)**: `wbt/__init__.py` exports `BacktestResult` &
  the DTO dataclasses; `_wbt.pyi` adds `aggregated_pairs` / `key_trades`; `README.md` /
  `README_CN.md` / `python/README*.md` "Plotting" sections rewritten to the single-figure
  `BacktestResult` API (composite charts removed); `python/scripts/quick_start.ipynb` migrated to
  `result` inputs, single figures, and made self-contained. `STATS_FIELD_ORDER` unchanged.

## Known issues (carried forward from v0.2.0)

Unchanged; see `docs/release_notes/v0.2.0.md`.