# Changelog
All notable changes to tabkit are documented here. The format follows
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and tabkit
adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
tabkit is pre-1.0 — the public API surface (`Engine`, `Reader`,
`Table`, `Column`, `Value`, `Error`) is intended to stay stable, but
minor versions may introduce additive changes to feature flags and
auxiliary types until 1.0 lands.
## [Unreleased]
## [0.4.1] — 2026-04-27
### Added
- **`examples/inspect.rs`** — runnable CLI that takes a path to a
tabular file and prints schema + sample rows. Demonstrates the
v0.1 `Engine::read` surface, the v0.3 typed `Date` / `DateTime`
cells, and (with `--features parquet`) the v0.2 Parquet path.
Run with:
```bash
cargo run --example inspect -- /path/to/data.xlsx
cargo run --example inspect --features parquet -- /path/to/data.parquet
```
- README "Examples" section pointing at the new `examples/`
directory.
### Notes
- v0.4.1 is the first of v0.4.x's planned "examples + cookbook"
iteration. Future releases will add a `custom_reader.rs` example
(showing how to implement the `Reader` trait for a new format)
and a `compose_with_duckdb.rs` example (showing the recommended
pattern for SQL queries on tabular data).
- Examples are deliberately dep-light: no `clap` for arg parsing,
no `serde` for output. Reading the surface should not require
wading through unrelated crate ceremony.
## [0.4.0] — 2026-04-27
### API stability candidate (1.0 prep)
v0.4 is the **API stability candidate** for 1.0. Format coverage
closed in v0.3 — calamine + csv + parquet readers, typed
`Date` / `DateTime` cells. v0.4 freezes the public surface ahead
of 1.0 and locks in SemVer commitments. v0.4.x can iterate on
examples, docs polish, and niche reader additions without
changing the public API shape.
### Added
- **Stability section in `lib.rs` module docs** explicitly
enumerates what's covered by the API freeze (Reader trait,
Engine dispatch surface, Table / Column / Value / Error /
ReadOptions / DataType field+variant sets, feature flag names,
backend `name()` strings) and what stays implementation detail
(private reader layout, exact Table.metadata key sets per
backend, auto-registration order).
- README mirrors the same Stability section.
### Changed
- **No API-shape changes.** v0.4.0 is intentionally a
documentation-only release. `#[non_exhaustive]` was already in
place on every public struct + enum (added incrementally across
v0.1 → v0.3); `#[must_use]` was already on every constructor +
builder + accessor. The audit confirmed no gaps.
### Migration
For most callers: bump the dep, rebuild, ship. Zero code changes
required.
### Notes
- **Why no `duckdb` feature.** Documented in
[README §"Why no `duckdb` feature"](README.md#why-no-duckdb-feature):
dep weight (~50 MB), scope creep, duplicate-reader confusion,
cleaner composition with `duckdb` directly. README also adds a
worked "Composing with DuckDB" example showing the recommended
pattern.
- v0.4.x will iterate on **examples** (`examples/` directory),
**cookbook**-style docs ("how to write a custom reader",
"extending the Engine"), and any **niche backend polish** that
doesn't change the public surface.
- 1.0 will be cut once the API is exercised by at least one
downstream production user. Sery Link is the canonical
integration target.
## [0.3.0] — 2026-04-27
### Added
- **Typed `Date` / `DateTime` cell values.** `Value` gains
`Date(String)` and `DateTime(String)` variants whose payloads
are ISO-8601 strings (`YYYY-MM-DD` / `YYYY-MM-DDTHH:MM:SS[.fff]
[±HH:MM|Z]`). No `chrono` dep — the contract is "the string
parses as ISO-8601"; callers wanting typed dates parse the
string with `chrono::NaiveDate::parse_from_str` (or equivalent).
- **`DataType::Date` / `DataType::DateTime`** join the inferred
type set. Type-promotion rule for date widening: `Date + DateTime
→ DateTime`. All other date-mixing falls back to `Text`.
- **All three readers emit typed dates** for source values that
carry date semantics:
- **calamine**: `Data::DateTime` → `Value::DateTime`;
`Data::DateTimeIso` → `Value::Date` if the string matches
`YYYY-MM-DD` exactly, otherwise `Value::DateTime`.
- **csv**: rigid pattern-match on cell strings —
`YYYY-MM-DD` (10 chars, dashes at positions 4 / 7) → `Date`;
`YYYY-MM-DDTHH:MM:SS` or `YYYY-MM-DD HH:MM:SS` (≥19 chars,
optional fractional + timezone tail) → `DateTime`. Other date
dialects (`MM/DD/YYYY`, etc.) fall through to `Text` — caller
can post-process.
- **parquet**: `Field::Date(i32)` → `Value::Date` (parquet's
own `Display` impl emits `YYYY-MM-DD`); `Field::TimestampMillis`
/ `TimestampMicros` → `Value::DateTime` (parquet's `Display`
emits `YYYY-MM-DD HH:MM:SS`; we replace the space separator
with `T` to conform to our ISO contract).
### Changed
- **`#[non_exhaustive]` on `Value`** so future variants
(`Decimal`, `Time` for date-less time-of-day, etc.) can land in
minor versions without breaking downstream `match` blocks.
External pattern-matchers must include a wildcard arm.
- The `infer_column_type` promotion rules are extracted into a
small `promote()` helper. Same outputs as v0.2; just easier to
read once date-widening was added.
### Migration
For most callers: bump the dep, rebuild. The only breaking change
is `#[non_exhaustive]` on `Value` — callers exhaustively matching
on `Value` in another crate need a wildcard arm:
```rust
match value {
Value::Null => ...,
Value::Bool(b) => ...,
// ... existing variants
_ => panic!("new tabkit Value variant — check the changelog"),
}
```
Callers that consumed `Date` / `DateTime` cells as `Value::Text`
will now see them as `Value::Date` / `Value::DateTime` instead.
Both carry `String` payloads; the migration is `match value
- 5 new csv tests covering: ISO-8601 date detection, datetime
with `Z` / `.fff` / `±HH:MM`, space-separated form, rejection
of `MM/DD/YYYY` and other non-ISO dialects, column-level Date
inference, Date+DateTime widening to DateTime.
## [0.2.0] — 2026-04-27
### Added
- **`ParquetReader`** — Apache Parquet read support, gated behind
the new `parquet` feature. Backed by the
[`parquet`](https://crates.io/crates/parquet) crate (default
features off — we don't need the Arrow runtime, async reader, or
CLI helpers for the schema-and-samples surface).
- **`parquet` feature flag** — opt-in. Not part of `default` so
consumers reading only XLSX/CSV don't pay the extra ~3 MB
compile cost. The `full` feature enables it alongside calamine
+ csv.
- **Parquet `Field` → tabkit `Value` mapping** documented in the
module-level docs as a table. Highlights:
- `Byte` / `Short` / `Int` / `Long` and `UByte` / `UShort` /
`UInt` → `Integer`
- `ULong` (≤ `i64::MAX`) → `Integer`; `ULong` (> `i64::MAX`) →
`Text` (decimal stringified, so the magnitude survives the
JSON round-trip)
- `Float` / `Double` → `Float` (lossless `f32`→`f64` widening)
- `Decimal` / `Date` / `Timestamp*` / `Bytes` / `Group` / list /
map → `Text` (parquet's `Display` form). Typed dates land in
a future `dates` feature; nested types in a future `nested`
feature.
- **Parquet metadata** surfaced via `Table.metadata`:
`num_row_groups` (parquet's row-group count, useful for
diagnostics on large files).
- 5 new unit tests covering: extensions, name, missing-file →
`Error::Io`, invalid-content → `Error::ParseError`, basic
field-to-value mapping, `ULong` overflow → `Text` fallback.
### Notes
- **Why not the full Arrow runtime?** The `parquet` crate's
default feature set pulls in `arrow-array` + `arrow-buffer` +
several other Arrow crates that together weigh ~10 MB compiled.
tabkit's row-level reader API doesn't need any of that. If a
future tabkit feature wants to expose Arrow-typed batches
(e.g. for zero-copy hand-off to DuckDB), that'd be its own
feature with the heavier dep set.
- **`row_count` semantics** match the calamine + csv readers:
`Some(n)` when known, where `n` excludes any header. Parquet
has no header concept — every row is data — so `n` is the
whole-file row count.
- **Streamed/unknown writers**: parquet's `num_rows` can be `-1`
in some edge cases. We clamp to `0` rather than surfacing a
signed integer in the public contract.
## [0.1.0] — 2026-04-27
### Added
- Initial release. Establishes the crate name on crates.io and the
public API surface.
- **`Engine`** — the dispatcher. Routes `read(path, options)` calls
to the registered `Reader` for the file's extension.
- **`Reader` trait** — per-format integration point. Implementors
declare `extensions()`, `name()`, `read(path, options)`.
- **`Table`** — the unit of output. `columns` + `sample_rows` +
optional `row_count` + backend-specific `metadata`.
- **`Column`** — name + inferred `data_type` + `nullable` flag.
- **`Value`** — six narrow variants: `Null` / `Bool` / `Integer` /
`Float` / `Text`. JSON-round-trippable for clean Tauri IPC.
- **`DataType`** — `Bool` / `Integer` / `Float` / `Text` / `Unknown`.
Type inference promotes Integer + Float → Float, anything-mixed
→ Text, all-null → Unknown.
- **`ReadOptions`** — `max_sample_rows` (default 100), `sheet_name`
(multi-sheet XLSX picker), `has_header` (default `true`).
- **`CalamineReader`** — XLSX / XLS / XLSB / XLSM / ODS via the
[`calamine`](https://crates.io/crates/calamine) crate. Detects
whole-number floats (Excel stores `1` as `Float(1.0)`) and
demotes them to `Integer` for the schema.
- **`CsvReader`** — CSV / TSV via the
[`csv`](https://crates.io/crates/csv) crate. Tab vs comma
selected by extension. Tolerates ragged rows (pads with
`Value::Null`). Handles empty header cells (falls back to
`column_<idx>`). Headerless mode generates `column_<i>` names.
- **Typed `Error` enum** — `Io` / `UnsupportedFormat` /
`ParseError` / `SheetNotFound`. `#[non_exhaustive]` for
forward-compat.
- **Feature flags pre-declared**:
- `calamine` (default) — XLSX-family reader
- `csv` (default) — CSV/TSV reader
- `full` — both
- 30 unit tests covering: type inference (all-int, int+float,
int+text, with-null, all-null, empty), CSV happy path, headerless
mode, ragged rows, sample cap, Excel float-to-int demotion,
sheet not-found, missing files, parse-cell rules.
- Dual-licensed under MIT OR Apache-2.0 (Rust ecosystem
convention).
- CI workflow on Ubuntu + macOS + Windows (stable Rust + MSRV
1.85 + clippy + rustfmt + cargo-audit gates) — same template
as `mdkit` and `scankit`.
- `CONTRIBUTING.md`, `SECURITY.md` for repo hygiene.
### Notes
- **Why `unsafe_code = forbid` (not `deny`).** No FFI surface;
every backend is pure Rust. Same posture as `scankit`.
- **Why MSRV 1.85.** Matches the kit-family floor — single MSRV
across `mdkit` / `scankit` / `tabkit` so downstream Tauri apps
don't manage divergent toolchains.
- **Why no `parquet` / `duckdb` backends in v0.1.** Both add
significant compile-time + binary-size cost; consumers reading
only XLSX/CSV shouldn't pay for them. Planned for v0.2 / v0.3
behind opt-in features.
[Unreleased]: https://github.com/seryai/tabkit/compare/v0.4.1...HEAD
[0.4.1]: https://github.com/seryai/tabkit/compare/v0.4.0...v0.4.1
[0.4.0]: https://github.com/seryai/tabkit/compare/v0.3.0...v0.4.0
[0.3.0]: https://github.com/seryai/tabkit/compare/v0.2.0...v0.3.0
[0.2.0]: https://github.com/seryai/tabkit/compare/v0.1.0...v0.2.0
[0.1.0]: https://github.com/seryai/tabkit/releases/tag/v0.1.0