Crate tabkit

§tabkit — tabular files → schema + sample rows.

tabkit is the shared spreadsheet reader that Tauri / Iced / native desktop apps reach for when they need to introspect XLSX / CSV / TSV files, so that each app doesn't reinvent the same calamine-plus-type-inference glue.

Its job is small but easy to get wrong:

  1. Open a tabular file, picking the reader by its extension (XLSX, CSV, TSV, ODS, …).
  2. Read enough of it to produce a schema (column name + inferred data type) and a sample of rows (first N).
  3. Hand those back as a Table — typed enough for the downstream UI to render and the downstream agent to reason about, but JSON-friendly so it serialises cleanly across the Tauri IPC boundary.
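Most of the edge cases live in step 2's type inference. Below is a minimal sketch of one plausible approach. It assumes cells arrive as strings and uses a hypothetical InferredType enum rather than tabkit's real DataType, whose variants and widening rules are not documented here. Each non-empty cell is classified, Int and Float widen to Float, any other mix degrades to Text, and the column is marked nullable if any sampled cell is missing or empty.

```rust
/// Hypothetical stand-in for tabkit's DataType; the real variant set may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum InferredType {
    Empty, // no non-empty sample value seen yet
    Bool,
    Int,
    Float,
    Text,
}

/// Classify a single non-empty, trimmed cell.
fn classify(cell: &str) -> InferredType {
    if cell.eq_ignore_ascii_case("true") || cell.eq_ignore_ascii_case("false") {
        InferredType::Bool
    } else if cell.parse::<i64>().is_ok() {
        InferredType::Int
    } else if cell.parse::<f64>().is_ok() {
        // Note: f64 parsing also accepts inputs such as "inf" and "NaN".
        InferredType::Float
    } else {
        InferredType::Text
    }
}

/// Merge two observations into the narrowest type that covers both.
fn widen(a: InferredType, b: InferredType) -> InferredType {
    use InferredType::*;
    match (a, b) {
        (Empty, t) | (t, Empty) => t,
        (x, y) if x == y => x,
        (Int, Float) | (Float, Int) => Float,
        _ => Text, // any other mixed-type column degrades to Text
    }
}

/// Infer (type, nullable) for column `col` from sampled string rows.
/// A row shorter than `col + 1` counts as a missing value.
fn infer_column(rows: &[Vec<&str>], col: usize) -> (InferredType, bool) {
    let mut ty = InferredType::Empty;
    let mut nullable = false;
    for row in rows {
        match row.get(col).map(|c| c.trim()) {
            None | Some("") => nullable = true,
            Some(cell) => ty = widen(ty, classify(cell)),
        }
    }
    (ty, nullable)
}
```

For the sample rows ["1", "2.5", "yes"], ["2", "3", ""] and ["3", "4.0", "no"], columns 0, 1 and 2 infer as (Int, false), (Float, false) and (Text, true).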

What tabkit deliberately does NOT do:

  • SQL queries. Different consumers want different SQL engines; pick yours and call it directly. tabkit stops at schema + samples.
  • Full table iteration. Use the underlying crate (calamine, csv) directly if you need to stream every row.
  • Persistence, caching, change tracking. Those are the consuming application’s concerns. Pair tabkit with scankit for walk-and-watch, and persist the results however suits your application.

§Quick start

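```rust
use std::path::Path;
use tabkit::{Engine, ReadOptions};

fn main() -> tabkit::Result<()> {
    // Registers the readers for every enabled feature flag.
    let engine = Engine::with_defaults();
    // Dispatches on the file extension; keeps at most 10 sample rows.
    let table = engine.read(
        Path::new("/Users/me/data/sales.xlsx"),
        &ReadOptions::default().max_sample_rows(10),
    )?;

    // Schema: one Column (name + inferred type) per column.
    for col in &table.columns {
        println!("{} : {:?}", col.name, col.data_type);
    }
    // Sample rows; each row has exactly table.columns.len() cells.
    for row in &table.sample_rows {
        println!("{row:?}");
    }
    Ok(())
}
```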

§Why a separate crate

Every “show the user what’s in their spreadsheet” project rebuilds the same calamine wrapper, the same type-inference pass, the same first-row-is-headers guess, the same ragged-row padding. tabkit ships the bits once with the edge cases (empty sheets, headerless CSVs, mixed-type columns) handled in one place.
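The header guess and ragged-row padding are small but easy to get subtly wrong. Here is a minimal std-only sketch. Several pieces are hypothetical rather than tabkit's documented rules: the header heuristic (every first-row cell non-empty and non-numeric), the column_N naming for unnamed columns, and None standing in for Value::Null.

```rust
/// Assumed heuristic: the first row is a header if every cell is
/// non-empty and none of them parses as a number.
fn looks_like_header(row: &[String]) -> bool {
    !row.is_empty()
        && row.iter().all(|c| {
            let c = c.trim();
            !c.is_empty() && c.parse::<f64>().is_err()
        })
}

/// Split raw rows into (column names, rectangular rows).
/// Empty or missing cells become None (a stand-in for Value::Null).
fn normalise(mut raw: Vec<Vec<String>>) -> (Vec<String>, Vec<Vec<Option<String>>>) {
    let header = if raw.first().map_or(false, |r| looks_like_header(r)) {
        Some(raw.remove(0))
    } else {
        None
    };
    // Width = widest of the header and the data rows, so no cell is dropped.
    let width = raw
        .iter()
        .map(Vec::len)
        .chain(header.as_ref().map(Vec::len))
        .max()
        .unwrap_or(0);
    // Hypothetical naming scheme for headerless or overflow columns.
    let mut names = header.unwrap_or_default();
    for i in names.len()..width {
        names.push(format!("column_{}", i + 1));
    }
    let rows = raw
        .into_iter()
        .map(|row| {
            let mut cells: Vec<Option<String>> = row
                .into_iter()
                .map(|c| if c.trim().is_empty() { None } else { Some(c) })
                .collect();
            cells.resize(width, None); // pad ragged rows up to the column count
            cells
        })
        .collect();
    (names, rows)
}
```

Widening to the longest row instead of truncating is a deliberate choice in this sketch. It never drops a cell, at the cost of inventing names for overflow columns.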

§Stability commitment (v0.4+)

v0.4 is the API-stability candidate for 1.0. The following surface is committed and will only change with a major version bump:

  • The Reader trait shape — required methods, default implementations, Send + Sync bound. Future trait methods will land with default impls so existing implementors don’t break.
  • Engine construction + dispatch — new, with_defaults, register, read, len, is_empty.
  • Table field set + the ReadOptions builder methods. Marked #[non_exhaustive] so we can add fields without major bumps.
  • The Column struct and the DataType, Value, and Error enums. All are #[non_exhaustive] for the same forward-compat reason: downstream matches on the enums must include a wildcard arm, and struct patterns must end with a .. rest pattern.
  • Feature flag names: calamine, csv, parquet, full. Each backend’s per-format extension list (xlsx / csv / parquet / etc.) is also stable.
  • Per-reader name() strings ("calamine", "csv", "parquet") — used by callers for filtering / logging.
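In downstream code, the #[non_exhaustive] commitment means you match on tabkit's enums with a trailing wildcard arm and destructure its structs with a .. rest pattern. The sketch below uses local stand-in types. CellKind and ColumnInfo are hypothetical, not tabkit names. The attribute is only enforced across crate boundaries, so inside a single file this compiles either way.

```rust
// Local stand-ins for tabkit's #[non_exhaustive] types (hypothetical names).
#[non_exhaustive]
enum CellKind {
    Int,
    Float,
    Text,
}

#[non_exhaustive]
struct ColumnInfo {
    name: String,
    nullable: bool,
}

fn describe(kind: &CellKind) -> &'static str {
    match kind {
        CellKind::Int | CellKind::Float => "numeric",
        // Wildcard arm: still compiles if a minor release adds variants.
        _ => "other",
    }
}

fn label(col: &ColumnInfo) -> String {
    // `..` rest pattern: still compiles if a minor release adds fields.
    let ColumnInfo { name, nullable, .. } = col;
    format!("{name}{}", if *nullable { "?" } else { "" })
}
```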

The following are implementation details and may change in minor versions:

  • The internal layout of any specific reader (private fields, helper methods, type-inference heuristics).
  • The exact set of Table.metadata keys per backend (new keys may appear; documented keys stay).
  • The auto-registration order in Engine::with_defaults. The rule that the first-registered reader wins for overlapping extensions is stable; the specific order is not.
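Extension dispatch with the first-registered-wins rule can be sketched as follows. SketchReader, Named and SketchEngine are hypothetical stand-ins. They mirror the shape described above: a Send + Sync trait, a defaulted method, and an ordered registry. tabkit's actual Reader methods and Engine internals may differ.

```rust
use std::path::Path;

/// Hypothetical mirror of the Reader trait's shape.
trait SketchReader: Send + Sync {
    fn name(&self) -> &'static str;
    fn extensions(&self) -> &'static [&'static str];

    /// Defaulted method: adding one like this doesn't break implementors.
    fn supports(&self, ext: &str) -> bool {
        self.extensions().iter().any(|e| e.eq_ignore_ascii_case(ext))
    }
}

/// Trivial reader that only carries a name and an extension list.
struct Named {
    name: &'static str,
    exts: &'static [&'static str],
}

impl SketchReader for Named {
    fn name(&self) -> &'static str {
        self.name
    }
    fn extensions(&self) -> &'static [&'static str] {
        self.exts
    }
}

#[derive(Default)]
struct SketchEngine {
    readers: Vec<Box<dyn SketchReader>>, // kept in registration order
}

impl SketchEngine {
    fn register(&mut self, reader: Box<dyn SketchReader>) {
        self.readers.push(reader);
    }

    /// The first registered reader that claims the extension wins.
    fn reader_for(&self, path: &Path) -> Option<&dyn SketchReader> {
        let ext = path.extension()?.to_str()?;
        self.readers.iter().find(|r| r.supports(ext)).map(|r| &**r)
    }
}
```

If two registered readers both claim csv, a.CSV goes to whichever was registered first. That rule is the stable part. The order in which Engine::with_defaults registers its readers is not.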

1.0 will be cut once the API is exercised by at least one downstream production user. Sery Link is the canonical integration target.

Structs§

CalamineReader (feature: calamine)
XLSX-family reader. Construct via CalamineReader::new (cannot fail — the crate has no runtime dependency).
Column
One column’s name + inferred type. nullable is true if any sample row had a missing/empty value in this position.
CsvReader (feature: csv)
CSV / TSV reader.
Engine
Dispatches read calls to the registered Reader for the file’s extension. Construct with Engine::new for an empty engine, or Engine::with_defaults for the readers matching enabled feature flags.
ReadOptions
Per-call read configuration. Construct via ReadOptions::default, then layer on settings with the builder methods.
Table
One file’s worth of structured tabular content.

Enums§

DataType
Coarse-grained data types the inference pass produces. Designed to round-trip through JSON.
Error
Errors that can arise during tabular extraction.
Value
One cell value. Keep the variants narrow — anything richer (decimals with arbitrary precision, embedded formulas, currency types) degrades to Text so callers don’t have to handle a combinatorial explosion of types. Round-trips cleanly through serde_json::Value for any caller that adds serde.

Traits§

Reader
A backend that knows how to read one or more tabular formats. Implementors register themselves with an Engine.

Type Aliases§

Result
Result alias used across the crate.
Row
One row of sampled data. Row[i] corresponds to Table::columns[i]. Length is always equal to the column count, with Value::Null filling positions where the source row was shorter than the header.