§tabkit — tabular files → schema + sample rows.
tabkit is the shared spreadsheet reader that Tauri / Iced /
native desktop apps reach for when they need to introspect
XLSX / CSV / TSV files without reinventing the same calamine-
plus-type-inference glue twice.
Its job is small but easy to get wrong:
- Open a tabular file by extension (XLSX, CSV, TSV, ODS, …).
- Read enough of it to produce a schema (column name + inferred data type) and a sample of rows (first N).
- Hand those back as a `Table`: typed enough for the downstream UI to render and for a downstream agent to reason about, yet JSON-friendly so it serialises cleanly across the Tauri IPC boundary.
What tabkit deliberately does NOT do:
- SQL queries. Different consumers want different SQL engines; pick yours and call it directly. tabkit stops at schema + samples.
- Full table iteration. If you need to stream every row, use the underlying crate (`calamine`, `csv`) directly.
- Persistence, caching, change tracking. Those are the consuming application's concerns: pair `tabkit` with `scankit` for walk-and-watch, and persist results however suits the app.
§Quick start
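```rust
use std::path::Path;
use tabkit::{Engine, ReadOptions};

fn main() -> tabkit::Result<()> {
    // Register every reader enabled by the crate's feature flags.
    let engine = Engine::with_defaults();

    // Read the schema plus at most 10 sample rows.
    let table = engine.read(
        Path::new("/Users/me/data/sales.xlsx"),
        &ReadOptions::default().max_sample_rows(10),
    )?;

    // One line per column: name and inferred type.
    for col in &table.columns {
        println!("{} : {:?}", col.name, col.data_type);
    }
    // Each sampled row lines up with `table.columns`.
    for row in &table.sample_rows {
        println!("{row:?}");
    }
    Ok(())
}
```
§Why a separate crate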
Every “show the user what’s in their spreadsheet” project
rebuilds the same calamine wrapper, the same type-inference
pass, the same first-row-is-headers guess, the same
ragged-row padding. tabkit ships the bits once with the
edge cases (empty sheets, headerless CSVs, mixed-type
columns) handled in one place.
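The first-row-is-headers guess is the kind of heuristic that is easy to get subtly wrong. As a minimal sketch, assuming a simple rule that is not necessarily tabkit's: treat the first row as headers when every cell is non-empty, none parses as a number, and no name repeats; otherwise fall back to synthetic names. `first_row_is_header` and `synthetic_headers` are hypothetical helpers, and the `column_N` naming scheme is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical heuristic (not tabkit's actual code): the first row is
/// treated as headers when every cell is non-empty, no cell parses as a
/// number, and no two cells share the same (trimmed) text.
fn first_row_is_header(row: &[&str]) -> bool {
    if row.is_empty() {
        return false; // empty sheet: nothing to name
    }
    let mut seen = HashSet::new();
    row.iter().all(|&cell| {
        let cell = cell.trim();
        !cell.is_empty() && cell.parse::<f64>().is_err() && seen.insert(cell)
    })
}

/// Hypothetical fallback names for headerless files: "column_1", "column_2", ...
fn synthetic_headers(width: usize) -> Vec<String> {
    (1..=width).map(|i| format!("column_{i}")).collect()
}
```

A row such as `["north", "2024", "1200.5"]` fails the numeric check, so a file starting with it would get synthetic names instead.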
§Stability commitment (v0.4+)
v0.4 marks the API stability candidate for 1.0. The following surface is committed to and will only change with a major version bump:
- The `Reader` trait shape: required methods, default implementations, and the `Send + Sync` bound. Future trait methods will land with default impls so existing implementors don't break.
- `Engine` construction and dispatch: `new`, `with_defaults`, `register`, `read`, `len`, `is_empty`.
- The `Table` field set and the `ReadOptions` builder methods. Marked `#[non_exhaustive]` so we can add fields without major bumps.
- The `Column`, `DataType`, `Value`, and `Error` types. All are `#[non_exhaustive]` for the same forward-compat reason: pattern-matchers must include a wildcard arm (`_` in enum matches, `..` in struct patterns).
- Feature flag names: `calamine`, `csv`, `parquet`, `full`. Each backend's per-format extension list (`xlsx` / `csv` / `parquet` / etc.) is also stable.
- Per-reader `name()` strings (`"calamine"`, `"csv"`, `"parquet"`), used by callers for filtering / logging.
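Because these types are `#[non_exhaustive]`, a downstream `match` needs a fallback arm. The sketch below uses a hypothetical local stand-in for `DataType` (its variant names are assumptions; the real ones are not listed here) and a hypothetical `alignment` helper of the kind a UI might write.

```rust
/// Hypothetical stand-in for `tabkit::DataType`; real variant names may differ.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Integer,
    Float,
    Text,
}

/// Maps a column type to a UI alignment. The `_` arm keeps this compiling
/// when a later minor version adds a variant.
// Inside the defining crate `#[non_exhaustive]` is not enforced, so the
// compiler would warn that `_` is unreachable here; downstream it is required.
#[allow(unreachable_patterns)]
pub fn alignment(dt: DataType) -> &'static str {
    match dt {
        DataType::Integer | DataType::Float => "right",
        DataType::Text => "left",
        _ => "left", // unknown future types: render as plain text
    }
}
```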
The following are implementation details and may change in minor versions:
- The internal layout of any specific reader (private fields, helper methods, type-inference heuristics).
- The exact set of `Table.metadata` keys per backend (new keys may appear; documented keys stay).
- The auto-registration order in `Engine::with_defaults` (the rule that the first registered reader wins for overlapping extensions stays; the specific order doesn't).
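The "first registered wins" rule can be illustrated with a stripped-down engine. This is a minimal sketch, not tabkit's implementation: the trait keeps only `name()`, and `extensions()`, `reader_for`, `CsvReader`, and `CustomCsv` are hypothetical names introduced for illustration.

```rust
use std::path::Path;

/// Hypothetical, trimmed-down reader trait; `extensions()` is an assumption.
trait Reader: Send + Sync {
    fn name(&self) -> &'static str;
    fn extensions(&self) -> &'static [&'static str];
}

struct CsvReader;
struct CustomCsv; // hypothetical app-provided reader that also claims "csv"

impl Reader for CsvReader {
    fn name(&self) -> &'static str { "csv" }
    fn extensions(&self) -> &'static [&'static str] { &["csv", "tsv"] }
}

impl Reader for CustomCsv {
    fn name(&self) -> &'static str { "custom-csv" }
    fn extensions(&self) -> &'static [&'static str] { &["csv"] }
}

#[derive(Default)]
struct Engine {
    readers: Vec<Box<dyn Reader>>, // kept in registration order
}

impl Engine {
    fn register(&mut self, reader: Box<dyn Reader>) {
        self.readers.push(reader);
    }

    /// Returns the first registered reader whose extension list contains
    /// the file's extension (compared case-insensitively).
    fn reader_for(&self, path: &Path) -> Option<&dyn Reader> {
        let ext = path.extension()?.to_str()?;
        self.readers
            .iter()
            .find(|r| r.extensions().iter().any(|e| e.eq_ignore_ascii_case(ext)))
            .map(|r| &**r)
    }
}
```

Registering `CustomCsv` before `CsvReader` routes `.csv` files to it while `.tsv` still falls through to `CsvReader`.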
1.0 will be cut once the API is exercised by at least one downstream production user. Sery Link is the canonical integration target.
§Structs
- `CalamineReader` (requires feature `calamine`): XLSX-family reader. Construct via `CalamineReader::new` (cannot fail: the crate has no runtime dependency).
- `Column`: One column's name + inferred type. `nullable` is `true` if any sample row had a missing/empty value in this position.
- `CsvReader` (requires feature `csv`): CSV / TSV reader.
- `Engine`: Dispatches `read` calls to the registered `Reader` for the file's extension. Construct with `Engine::new` for an empty engine, or `Engine::with_defaults` for the readers matching enabled feature flags.
- `ReadOptions`: Per-call read configuration. Construct via `ReadOptions::default`, then layer on settings with the builder methods.
- `Table`: One file's worth of structured tabular content.
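A consuming builder like `ReadOptions` pairs naturally with `#[non_exhaustive]`: code outside the defining crate cannot build such a struct with a struct literal, so `default()` plus chained setters is the way in. This is a sketch of that pattern only; the fields, the default of 100 sample rows, and the `sheet` option are hypothetical and not tabkit's actual values.

```rust
/// Hypothetical stand-in for `ReadOptions`; fields and defaults are assumptions.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub struct ReadOptions {
    pub max_sample_rows: usize,
    pub sheet: Option<String>, // hypothetical option, for illustration only
}

impl Default for ReadOptions {
    fn default() -> Self {
        Self { max_sample_rows: 100, sheet: None }
    }
}

impl ReadOptions {
    /// Takes `self` by value and returns it, so calls chain off `default()`.
    pub fn max_sample_rows(mut self, n: usize) -> Self {
        self.max_sample_rows = n;
        self
    }

    /// Hypothetical: select a worksheet by name.
    pub fn sheet(mut self, name: impl Into<String>) -> Self {
        self.sheet = Some(name.into());
        self
    }
}
```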
§Enums
- `DataType`: Coarse-grained data types the inference pass produces. Designed to round-trip through JSON.
- `Error`: Errors that can arise during tabular extraction.
- `Value`: One cell value. The variants are deliberately narrow: anything richer (decimals with arbitrary precision, embedded formulas, currency types) degrades to `Text` so callers don't have to handle a combinatorial explosion of types. Round-trips cleanly through `serde_json::Value` for any caller that adds `serde`.
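To make the inference pass and the `nullable` flag concrete, here is a minimal sketch of one plausible per-column rule. It is an assumption, not tabkit's heuristic (which is explicitly an implementation detail): nulls and blank strings mark the column nullable without voting on its type, integers and floats widen to a float type, and any other disagreement degrades to text. The narrowed `Value` and `DataType` enums and `infer_column` are hypothetical.

```rust
/// Hypothetical, narrowed cell and type enums for this sketch only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Text(String),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Bool,
    Int,
    Float,
    Text,
}

/// Infers one column's type from its sampled cells and reports whether any
/// cell was missing or blank (the `nullable` flag).
fn infer_column(cells: &[Value]) -> (DataType, bool) {
    let mut inferred: Option<DataType> = None;
    let mut nullable = false;
    for cell in cells {
        let observed = match cell {
            Value::Null => { nullable = true; continue; }
            Value::Text(s) if s.trim().is_empty() => { nullable = true; continue; }
            Value::Bool(_) => DataType::Bool,
            Value::Int(_) => DataType::Int,
            Value::Float(_) => DataType::Float,
            Value::Text(_) => DataType::Text,
        };
        inferred = Some(match (inferred, observed) {
            (None, t) => t,
            (Some(a), b) if a == b => a,
            // Mixed integers and floats widen to floats.
            (Some(DataType::Int), DataType::Float) | (Some(DataType::Float), DataType::Int) => {
                DataType::Float
            }
            // Any other mix is a mixed-type column: degrade to text.
            _ => DataType::Text,
        });
    }
    // An all-empty column carries no type evidence; text is a safe default.
    (inferred.unwrap_or(DataType::Text), nullable)
}
```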
§Traits
- `Reader`: A backend that knows how to read one or more tabular formats. Implementations are registered with an `Engine` via `Engine::register`.
§Type Aliases
- `Result`: Result alias used across the crate.
- `Row`: One row of sampled data. `Row[i]` corresponds to `Table::columns[i]`. Its length always equals the column count, with `Value::Null` filling positions where the source row was shorter than the header.
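The padding rule can be sketched with `Vec::resize`, assuming `Row` is a `Vec<Value>` (the alias target is not spelled out above). The two-variant `Value` and the `normalise_row` helper are hypothetical; truncating over-long rows is also an assumption, since the text only guarantees that a row's length equals the column count.

```rust
/// Hypothetical minimal cell type for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Text(String),
}

/// Assumed alias target, for illustration.
type Row = Vec<Value>;

/// Forces a raw row to exactly `width` cells: `resize` pads short rows with
/// clones of `Value::Null` and truncates rows that are longer than `width`.
fn normalise_row(mut raw: Row, width: usize) -> Row {
    raw.resize(width, Value::Null);
    raw
}
```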