Crate tabkit

Expand description

§tabkit — tabular files → schema + sample rows.

tabkit is the shared spreadsheet reader that Tauri / Iced / native desktop apps reach for when they need to introspect XLSX / CSV / TSV files without reinventing the same calamine- plus-type-inference glue twice.

Its job is small but easy to get wrong:

Open a tabular file by extension (XLSX, CSV, TSV, ODS, …).
Read enough of it to produce a schema (column name + inferred data type) and a sample of rows (first N).
Hand those back as a Table — typed enough for the downstream UI to render and the downstream agent to reason about, but JSON-friendly so it serialises cleanly across the Tauri IPC boundary.

What tabkit deliberately does NOT do:

SQL queries. Different consumers want different SQL engines; pick yours and call it directly. tabkit stops at schema + samples.
Full table iteration. Use the underlying crate (calamine, csv) directly if you need to stream every row.
Persistence, caching, change tracking. Those are the consuming application’s concerns — pair tabkit with scankit for walk-and-watch and persist however suits.

§Quick start

use tabkit::{Engine, ReadOptions};
use std::path::Path;

let engine = Engine::with_defaults();
let table = engine.read(
    Path::new("/Users/me/data/sales.xlsx"),
    &ReadOptions::default().max_sample_rows(10),
)?;

for col in &table.columns {
    println!("{} : {:?}", col.name, col.data_type);
}
for row in &table.sample_rows {
    println!("{row:?}");
}

§Why a separate crate

Every “show the user what’s in their spreadsheet” project rebuilds the same calamine wrapper, the same type-inference pass, the same first-row-is-headers guess, the same ragged-row padding. tabkit ships the bits once with the edge cases (empty sheets, headerless CSVs, mixed-type columns) handled in one place.

§Stability commitment (v0.4+)

v0.4 marks the API stability candidate for 1.0. The following surface is committed to and will only change with a major version bump:

The Reader trait shape — required methods, default implementations, Send + Sync bound. Future trait methods will land with default impls so existing implementors don’t break.
Engine construction + dispatch — new, with_defaults, register, read, len, is_empty.
Table field set + the ReadOptions builder methods. Marked #[non_exhaustive] so we can add fields without major bumps.
Column, DataType, Value, Error enums + structs. All #[non_exhaustive] for the same forward-compat reason — pattern-matchers must include a wildcard arm.
Feature flag names: calamine, csv, parquet, full. Each backend’s per-format extension list (xlsx / csv / parquet / etc.) is also stable.
Per-reader name() strings ("calamine", "csv", "parquet") — used by callers for filtering / logging.

The following are implementation details and may change in minor versions:

The internal layout of any specific reader (private fields, helper methods, type-inference heuristics).
The exact set of Table.metadata keys per backend (new keys may appear; documented keys stay).
The auto-registration order in Engine::with_defaults (the fact that the first registered wins for overlapping extensions stays; the specific order doesn’t).

1.0 will be cut once the API is exercised by at least one downstream production user. Sery Link is the canonical integration target.

Structs§

CalamineReadercalamine: XLSX-family reader. Construct via CalamineReader::new (cannot fail — the crate has no runtime dependency).
Column: One column’s name + inferred type. nullable is true if any sample row had a missing/empty value in this position.
CsvReadercsv: CSV / TSV reader.
Engine: Dispatches read calls to the registered Reader for the file’s extension. Construct with Engine::new for an empty engine, or Engine::with_defaults for the readers matching enabled feature flags.
ReadOptions: Per-call read configuration. Construct via ReadOptions::default then layer on with the builder methods.
Table: One file’s worth of structured tabular content.

Enums§

DataType: Coarse-grained data types the inference pass produces. Designed to round-trip through JSON.
Error: Errors that can arise during tabular extraction.
Value: One cell value. Keep the variants narrow — anything richer (decimals with arbitrary precision, embedded formulas, currency types) degrades to Text so callers don’t have to handle a combinatorial explosion of types. Round-trips cleanly through serde_json::Value for any caller that adds serde.

Traits§

Reader: A backend that knows how to read one or more tabular formats. Implementors register themselves with an Engine.

Type Aliases§

Result: Result alias used across the crate.
Row: One row of sampled data. Row[i] corresponds to Table::columns[i]. Length is always equal to the column count, with Value::Null filling positions where the source row was shorter than the header.

Crate tabkit

Crate tabkit Copy item path

§tabkit — tabular files → schema + sample rows.

§Quick start

§Why a separate crate

§Stability commitment (v0.4+)

Structs§

Enums§

Traits§

Type Aliases§

Crate tabkit