# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
## [0.2.1] - 2026-01-28
### Removed
- `rust-ai-core` dependency
### Changed
- The `cuda` feature now enables `trit-vsa/cuda` for GPU types
## [0.2.0] - 2026-01-25
### Changed
- Migrated CubeCL kernels to the 0.9 API
- Replaced early returns with the `terminate!()` macro
- Updated math functions to use trait methods (`F::floor()`, `F::sqrt()`)
- Added `usize` suffix to `SharedMemory::new()` calls
- Added `usize` casts at array index sites
- Wrapped kernel launches in `unsafe` blocks with SAFETY comments (sketched below)
### Fixed
- `Result` type collision, resolved by naming `std::result::Result` explicitly (sketched below)
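Two of the migration patterns above, sketched in plain Rust. The error type and kernel name are hypothetical, and CubeCL's actual launch signature is not reproduced here:

```rust
// Hypothetical types for illustration; not the crate's actual API.
pub struct QuantError;

// Pattern 1: a crate-local `Result` alias shadows the prelude's
// `Result`, so signatures that need the two-parameter form name
// `std::result::Result` explicitly.
pub type Result<T> = std::result::Result<T, QuantError>;

pub fn pack(weights: &[f32]) -> std::result::Result<Vec<i8>, QuantError> {
    // Toy body: keep only the sign of each weight.
    Ok(weights.iter().map(|w| w.signum() as i8).collect())
}

// Pattern 2: kernel launches are unsafe in the CubeCL 0.9 API, so
// each call site carries a SAFETY comment justifying its invariants:
//
//     // SAFETY: cube_count and cube_dim are derived from the tensor
//     // shape, so every thread index stays within the input buffers.
//     unsafe { pack_ternary_kernel::launch(/* ... */) };
```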
### Known Limitations
- GPU kernel launches not yet integrated into public API (CPU fallback used)
- Kernel definitions ready for future GPU integration
## [0.1.1] - 2026-01-24
### Added
- CPU fallback warning when CUDA is unavailable
### Changed
- Bumped minimum Rust version to 1.92
### Fixed
- Documentation links and formatting
## [0.1.0] - 2026-01-24
### Added
- `BitNetConfig`: Configuration for BitNet quantization (construction sketch below)
  - Configurable group size for weight quantization
  - Per-token or per-tensor activation scaling
  - Training mode with STE support
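A minimal construction sketch for the configuration described above; the field names are assumptions about the API, not verified against the crate:

```rust
use bitnet_quantize::BitNetConfig;

fn main() {
    // Field names are hypothetical; consult the crate docs for the
    // actual configuration surface.
    let cfg = BitNetConfig {
        group_size: 128,  // weights quantized in groups of 128
        per_token: true,  // per-token (vs. per-tensor) activation scaling
        training: false,  // enables the STE path when true
        ..BitNetConfig::default()
    };
    let _ = cfg;
}
```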
- `TernaryWeight`: Packed ternary weight storage (sketch below)
  - AbsMean quantization: `W_q = clamp(round(W / mean(|W|)), -1, 1)`
  - Per-group scale factors
  - Compression tracking and sparsity metrics
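The AbsMean scheme in plain Rust, independent of the crate's actual types; the clamp is what keeps codes ternary when a weight is much larger than the mean magnitude:

```rust
/// AbsMean ternary quantization: scale by the mean absolute weight,
/// round, and clamp codes to {-1, 0, +1}.
fn absmean_quantize(w: &[f32]) -> (Vec<i8>, f32) {
    let scale = (w.iter().map(|x| x.abs()).sum::<f32>() / w.len() as f32).max(1e-8);
    let codes = w
        .iter()
        .map(|x| (x / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    // Dequantize as codes[i] as f32 * scale.
    (codes, scale)
}
```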
- `QuantizedActivations`: INT8 activation quantization (sketch below)
  - AbsMax quantization: `X_q = round(X * 127 / max(|X|))`
  - Per-token scaling for sequence models
  - Efficient dequantization
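Likewise, AbsMax INT8 quantization for one token's activations (plain Rust, not the crate's API). Because `x / max(|x|)` lies in `[-1, 1]`, the rounded codes already fit in `[-127, 127]`:

```rust
/// AbsMax INT8 quantization: scale so the largest magnitude maps to 127.
fn absmax_quantize(x: &[f32]) -> (Vec<i8>, f32) {
    let max = x.iter().fold(0.0f32, |m, v| m.max(v.abs())).max(1e-8);
    let codes = x.iter().map(|v| (v * 127.0 / max).round() as i8).collect();
    // Dequantize as codes[i] as f32 * (max / 127.0).
    (codes, max / 127.0)
}
```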
- `BitLinear`: Drop-in replacement for `nn::Linear` (usage sketch below)
  - Compatible with the candle-nn `Module` trait
  - Supports 2D and 3D input tensors
  - Optional bias term
  - Forward pass with automatic dequantization
  - `forward_quantized` for explicit quantization control
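A hypothetical usage sketch. The `BitLinear::new` signature is an assumption; only the candle-nn `Module::forward` contract and the 3D-input support come from the entries above:

```rust
use bitnet_quantize::BitLinear;
use candle_core::{Device, Result, Tensor};
use candle_nn::Module;

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // (batch, seq, features): 3D inputs are supported per the entry above.
    let x = Tensor::randn(0f32, 1f32, (2, 16, 512), &dev)?;
    // Hypothetical constructor; the real signature may differ.
    let layer = BitLinear::new(512, 1024, /* bias: */ true, &dev)?;
    let y = layer.forward(&x)?; // weights dequantized automatically
    assert_eq!(y.dims(), &[2, 16, 1024]);
    Ok(())
}
```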
- Straight-Through Estimator (STE) functions (sketch below)
  - `ternary_ste`: Forward quantization with gradient passthrough
  - `int8_ste`: INT8 quantization with gradient passthrough
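The STE pattern itself, as commonly written over candle tensors. This is a conceptual sketch assuming `Tensor::detach` as in candle 0.9; the crate's actual `ternary_ste`/`int8_ste` signatures may differ:

```rust
use candle_core::{Result, Tensor};

/// Straight-through estimator: the forward value is the quantized
/// tensor `q`, while gradients flow to `x` as if quantization were
/// the identity, via the classic `x + detach(q - x)` rewrite.
fn ste(x: &Tensor, q: &Tensor) -> Result<Tensor> {
    x.add(&q.sub(x)?.detach())
}
```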
- peft-rs adapter integration (optional, `peft` feature)
  - `BitNetAdapter` implementing the `Adapter` trait
  - Configuration via `BitNetAdapterConfig`
- GGUF export support (optional, `gguf-export` feature)
- CubeCL GPU kernel stubs (optional, `cuda` feature)
- Comprehensive test suite (35 unit tests)
- Criterion benchmarks for quantization and forward pass
### Technical Details
- Built on candle 0.9.x tensor library
- Minimum Rust version: 1.92
- Optional dependencies gated behind feature flags
- Integration with rust-ai workspace
### References
- BitNet b1.58: "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" (Ma et al., 2024)
- Original BitNet: "BitNet: Scaling 1-bit Transformers for Large Language Models" (Wang et al., 2023)
[Unreleased]: https://github.com/tzervas/bitnet-quantize/compare/v0.2.1...HEAD
[0.2.1]: https://github.com/tzervas/bitnet-quantize/compare/v0.2.0...v0.2.1
[0.2.0]: https://github.com/tzervas/bitnet-quantize/compare/v0.1.1...v0.2.0
[0.1.1]: https://github.com/tzervas/bitnet-quantize/compare/v0.1.0...v0.1.1
[0.1.0]: https://github.com/tzervas/bitnet-quantize/releases/tag/v0.1.0