ambers 0.3.7

Pure Rust reader for SPSS .sav and .zsav files
# Changelog

All notable changes to ambers are documented in this file.

## [0.3.7] - 2025-02-24

- Fix ZSAV writer: three bugs that caused SPSS to crash on all ambers-written .zsav files
  - ZTrailer bias field: write -100 (negative) per PSPP spec, was incorrectly +100
  - ZTrailer block uncompressed_offset: start at zheader file position per PSPP/ReadStat, was incorrectly 0
  - Subtype 3 compression_code: always write 1 per PSPP spec, was incorrectly writing actual compression value
- Fix reader subtype 21 (long string value labels): add missing var_width field parse
- Fix writer subtype 21: use long_name instead of short_name, pad values to var_width per SPSS spec
- Add format/type mismatch validation: reject string format on numeric column and vice versa
- Add 36 writer stress tests (pyreadstat issues #267, #119, #264)
- Add CI workflow (fmt, clippy, test on Linux/Windows/macOS + Python smoke test)
- Add unit tests for arrow_convert and scanner modules (15 new tests)
- Add overflow protection: AllocationTooLarge error, 2 GB pre-allocation cap, 16 GB zlib guard
- Split writer.rs (2,930 lines) into writer/{mod, layout, records, data, tests} submodules
- Add fail-fast validation: `validate_write_inputs()` catches metadata errors before data processing
- Add Python-side early metadata validation before PyCapsule consumption
- Stream zlib decompression block-by-block instead of all blocks upfront (lower peak memory)
- Add BytecodeDecompressor checkpoint/restore for streaming support
- Fix 29 clippy warnings across codebase
- Fix CI Python smoke test to use `maturin build` instead of `maturin develop`
- Use uv instead of pip in CI for faster dependency installs
- Update write benchmark results: 6–41x faster than pyreadstat (up from 4–20x)
- Remove Co-Authored-By trailers from git history
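
To illustrate the first two ZTrailer fixes above, here is a minimal sketch of a trailer encoder. It follows the layout in the PSPP system file format documentation as we read it: a negated bias, a zero field, the block size, the block count, and then one 24-byte descriptor per block. The names `BlockSizes` and `encode_ztrailer` are hypothetical and are not ambers' internal API, and little-endian output is assumed.

```rust
/// Sizes of one zlib block (hypothetical helper type, not ambers' own).
pub struct BlockSizes {
    pub uncompressed_size: i32,
    pub compressed_size: i32,
}

/// Length of the ZLIB data header (three int64 fields) that sits at `zheader_ofs`.
const ZHEADER_LEN: i64 = 24;

/// Encode a ZSAV trailer, assuming a header bias of 100.0 and little-endian layout.
pub fn encode_ztrailer(zheader_ofs: i64, block_size: i32, blocks: &[BlockSizes]) -> Vec<u8> {
    let mut out = Vec::with_capacity(24 + blocks.len() * 24);

    // The trailer stores the bias as a NEGATIVE integer: -100, not +100.
    out.extend_from_slice(&(-100i64).to_le_bytes());
    out.extend_from_slice(&0i64.to_le_bytes());
    out.extend_from_slice(&block_size.to_le_bytes());
    out.extend_from_slice(&(blocks.len() as i32).to_le_bytes());

    // The first descriptor's uncompressed offset starts at the zheader
    // position, not at 0; compressed data starts right after the zheader.
    let mut uncompressed_ofs = zheader_ofs;
    let mut compressed_ofs = zheader_ofs + ZHEADER_LEN;
    for b in blocks {
        out.extend_from_slice(&uncompressed_ofs.to_le_bytes());
        out.extend_from_slice(&compressed_ofs.to_le_bytes());
        out.extend_from_slice(&b.uncompressed_size.to_le_bytes());
        out.extend_from_slice(&b.compressed_size.to_le_bytes());
        uncompressed_ofs += i64::from(b.uncompressed_size);
        compressed_ofs += i64::from(b.compressed_size);
    }
    out
}
```

Each later descriptor's offsets are the running sums of the previous block's offsets and sizes, which is why a wrong starting value shifts every entry in the trailer.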

## [0.3.3] - 2025-02-21

- Add compression field to `meta.schema` and reorder schema fields
- Fix VLS last segment width to match SPSS spec (ReadStat compatibility)

## [0.3.2] - 2025-02-19

- Optimize writer performance and redesign compression API
- Add NumPy-style docstrings to `.pyi` type stubs for IDE documentation

## [0.3.1] - 2025-02-17

- Fix variable attributes using long names in subtype 18
- Fix subtype 22 format for SPSS-compatible long string missing values
- Fix string missing values on long strings (width > 8) for SPSS compatibility
- Fix MR set double-`$` prefix and mixed-type missing values bugs

## [0.3.0] - 2025-02-14

- **Milestone 3: SAV/ZSAV Writer** — full roundtrip support
- `write_sav()` and `write_sav_to_writer()` in Rust
- Python `ambers.write_sav()` with auto-detect compression from extension
- All three compression modes: uncompressed, bytecode, zlib
- SpssMetadata construction API: `SpssMetadata()` constructor, `update()`, `with_*()` methods
- Variable attributes read and write (subtype 18)
- Variable roles read and write (subtype 18 `$@Role`)
- Subtype 19 (MRSETS2) support for modern SPSS MR set definitions
- Python roundtrip tests and write benchmarks
- Fix VLS segment count formula and ghost name leaking
- Fix A254→A256 format bug

## [0.2.6] - 2025-02-08

- Fix VLS segment assembly: use 255 bytes per segment, not 252

## [0.2.5] - 2025-02-07

- Tiled parallel column processing for wide files (>12 KB row width)
- Bias LUT optimization: pre-computed 2 KB lookup table for bytecode decompression
- Use unsafe pointer copies and unchecked f64 reads in the hot path
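
As a rough illustration of the bias lookup table mentioned above: 256 `f64` entries occupy exactly 2 KB, so one plausible shape is a table indexed by the bytecode opcode that holds `code - bias`. This is a sketch under that assumption. The function names are hypothetical, and the actual table in ambers may be built differently.

```rust
/// Compression bias from the file header; 100.0 is the usual value.
pub const DEFAULT_BIAS: f64 = 100.0;

/// Pre-compute `code - bias` for every possible opcode byte.
/// 256 entries * 8 bytes = 2048 bytes.
pub fn build_bias_lut(bias: f64) -> [f64; 256] {
    let mut lut = [0.0f64; 256];
    for (code, slot) in lut.iter_mut().enumerate() {
        *slot = code as f64 - bias;
    }
    lut
}

/// Decode an opcode that encodes a small integer inline.
/// Opcodes outside 1..=251 (padding, end-of-file, raw value,
/// blank string, system-missing) are not handled by the table.
pub fn decode_small_int(lut: &[f64; 256], code: u8) -> Option<f64> {
    match code {
        1..=251 => Some(lut[code as usize]),
        _ => None,
    }
}
```

The table replaces an integer-to-float conversion and a subtraction per value with a single indexed load.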

## [0.2.4] - 2025-02-06

- Unified columnar pipeline: decompress bytecode directly to raw buffer
- Eliminate intermediate `SlotValue` representation

## [0.2.3] - 2025-02-05

- Cap uncompressed chunk size to 256 MB for cache-friendly large file reads
- Switch to zlib-rs backend for faster zlib decompression
- Add mimalloc allocator for Python builds
- Direct-write decompression, zero-fill avoidance
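
The chunk-size cap above can be pictured as a rows-per-chunk calculation. The sketch below assumes the cap is 256 MiB and is applied to `row_width * rows`. The constant and function names are hypothetical, and the changelog does not say how ambers actually derives its chunk boundaries.

```rust
/// Assumed cap on the uncompressed bytes held by one chunk (256 MiB).
pub const MAX_CHUNK_BYTES: usize = 256 * 1024 * 1024;

/// Number of rows to read per chunk so that a chunk stays under the cap.
/// Always returns at least one row when there is data, even if a single
/// row is wider than the cap.
pub fn rows_per_chunk(row_width_bytes: usize, total_rows: usize) -> usize {
    if row_width_bytes == 0 || total_rows == 0 {
        return 0;
    }
    let by_cap = (MAX_CHUNK_BYTES / row_width_bytes).max(1);
    by_cap.min(total_rows)
}
```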

## [0.2.2] - 2025-02-04

- Add `columns`, `n_rows`, `row_index_name`, `row_index_offset` params to Python `read_sav()`/`scan_sav()`

## [0.2.0] - 2025-02-03

- Arrow temporal types: DATE→Date32, DATETIME→Timestamp(us), TIME→Duration(us)
- WKDAY/MONTH formats stay Float64 (not temporal)
- Temporal conversion in `finish()` post-processing (not in hot path)
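
For readers unfamiliar with SPSS date storage, the temporal mappings above come down to an epoch shift: SPSS stores dates and datetimes as seconds since 1582-10-14, while Arrow's Date32 and Timestamp count from 1970-01-01. The sketch below shows that arithmetic. The function names are hypothetical, and mapping non-finite inputs to `None` is an assumption, since the changelog does not describe how missing values are handled at this stage.

```rust
/// Seconds between the SPSS epoch (1582-10-14) and the Unix epoch
/// (1970-01-01): 141,428 days * 86,400 seconds.
pub const SPSS_EPOCH_OFFSET_SECS: f64 = 12_219_379_200.0;
const SECS_PER_DAY: f64 = 86_400.0;

/// DATE -> Date32 (days since 1970-01-01).
pub fn spss_date_to_date32(spss_secs: f64) -> Option<i32> {
    if !spss_secs.is_finite() {
        return None;
    }
    Some(((spss_secs - SPSS_EPOCH_OFFSET_SECS) / SECS_PER_DAY).floor() as i32)
}

/// DATETIME -> Timestamp(us) (microseconds since 1970-01-01).
pub fn spss_datetime_to_timestamp_us(spss_secs: f64) -> Option<i64> {
    if !spss_secs.is_finite() {
        return None;
    }
    Some(((spss_secs - SPSS_EPOCH_OFFSET_SECS) * 1e6).round() as i64)
}

/// TIME -> Duration(us); TIME values are plain second counts with no epoch.
pub fn spss_time_to_duration_us(secs: f64) -> Option<i64> {
    if !secs.is_finite() {
        return None;
    }
    Some((secs * 1e6).round() as i64)
}
```

Running these conversions once per column in a post-processing step keeps the per-value decode loop free of branching on the variable's format.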

## [0.1.8] - 2025-02-02

- Optimize large uncompressed file performance: 2.3x faster on 5.4 GB files

## [0.1.7] - 2025-02-01

- Six performance optimizations to beat polars_readstat on all file sizes:
  - Bytecode match reorder (1..=251 first)
  - `Cow<str>` string decoding (zero-copy UTF-8)
  - Bulk I/O for uncompressed (single `read_exact` per row)
  - VLS segment pre-compute
  - Smart string capacity
  - StringViewArray with deduplication
- `scan_sav()` LazyFrame with `register_io_source`
- Direct-to-columnar builders (StringViewBuilder, Float64Builder)
- Drop PyArrow runtime dependency — PyCapsule-only data transfer
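
The `Cow<str>` item above refers to a general Rust technique: when the bytes are already valid UTF-8, return a borrowed `&str` and allocate only when a fix-up is required. A minimal sketch using the standard library follows. The function name is hypothetical, and both the trailing-space trimming and the lossy fallback are assumptions, not a description of how ambers handles padding or non-UTF-8 encodings.

```rust
use std::borrow::Cow;

/// Decode one fixed-width string cell.
/// Returns `Cow::Borrowed` (no allocation) when the bytes are valid UTF-8,
/// and `Cow::Owned` with replacement characters otherwise.
pub fn decode_string_cell(raw: &[u8]) -> Cow<'_, str> {
    // SAV string cells are space-padded to their declared width;
    // find the end of the content before decoding.
    let end = raw
        .iter()
        .rposition(|&b| b != b' ')
        .map_or(0, |i| i + 1);
    String::from_utf8_lossy(&raw[..end])
}
```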

## [0.1.6] - 2025-01-31

- Revamp README benchmarks

## [0.1.5] - 2025-01-30

- Initial public release on crates.io and PyPI
- **Milestone 1:** SPSS .sav/.zsav reader (all compression modes)
- **Milestone 2:** PyO3 Python bindings with Polars DataFrame output
- `read_sav()`, `scan_sav()`, `read_sav_metadata()` API
- Full SpssMetadata with 22 fields
- Streaming `SavScanner` with column projection and row limits