Crate openqbw

QuickBooks .qbw file parser.

Parses invoice line-item records from an SA17 page-store produced by QuickBooks. Built on the opensqlany crate, which handles the lower-level page-store layer (CRC validation, AP-fill deobfuscation, slotted-page directories).

This release (v0.1) covers the invoice line-item record format only. Invoice headers, bills, checks, journal entries, and the system catalog are deferred to later releases.

Status

Prototype quality. See OpenQBW/re/NOTES.md (entries C.40–C.43) for the reverse-engineered record layout and remaining gaps.

Structs

AttributionAgreement
Statistics from comparing position-based and content-based attribution over a set of pages.
ContentAttribution
Content-signature attribution map.
CrossValidation
Per-table-id agreement counts.
FkEdge
One inferred foreign-key edge.
FkGraphStats
Summary of edge resolution rates.
LineItem
Parsed invoice line-item record.
NullsFlagBucket
One bucket in the histogram output.
PageAttribution
Resolves page numbers to the SYSTABLE entry that most likely owns them.
RowSignature
A row-0 prefix used as a per-table fingerprint.
SchemaAttribution
Schema-aware width-validator built from SYSCOLUMN + SYSTABLE.
SysColumn
One parsed SYSCOLUMN row.
SysIndexEntry
One parsed SYSINDEX row.
SysTableEntry
A single parsed SYSTABLE row.
TransactionHeader
A minimal transaction-header record.
ValidationStats
Summary of SchemaAttribution::validate_corpus over a page set.
WidthBand
A row-width band derived from a table’s SYSCOLUMN schema.
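The comparison behind AttributionAgreement and CrossValidation can be pictured as a per-page vote between the two attribution methods. The sketch below is hypothetical and is not the crate's API: it models each attribution as a plain page-number to table_id map, invents the names CrossCheck, cross_validate, and SAMPLE_LIMIT (a stand-in for DISAGREE_SAMPLE_LIMIT, whose real value is not given here), and keys the counts by the position-based table_id.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for DISAGREE_SAMPLE_LIMIT (the real value is not documented here).
const SAMPLE_LIMIT: usize = 3;

/// Hypothetical stand-in for CrossValidation.
#[derive(Debug, Default)]
struct CrossCheck {
    /// Position-based table_id -> (agreeing pages, disagreeing pages).
    per_table: HashMap<u32, (usize, usize)>,
    /// (page, position table_id, content table_id) for the first disagreements seen.
    disagree_samples: Vec<(u32, u32, u32)>,
}

/// Compare two page -> table_id attributions over the pages both of them cover.
fn cross_validate(position: &HashMap<u32, u32>, content: &HashMap<u32, u32>) -> CrossCheck {
    let mut out = CrossCheck::default();
    // Sort the shared pages so the retained samples are deterministic.
    let mut pages: Vec<u32> = position
        .keys()
        .copied()
        .filter(|p| content.contains_key(p))
        .collect();
    pages.sort_unstable();
    for page in pages {
        let (pos_t, con_t) = (position[&page], content[&page]);
        let counts = out.per_table.entry(pos_t).or_insert((0, 0));
        if pos_t == con_t {
            counts.0 += 1;
        } else {
            counts.1 += 1;
            // Keep only a bounded number of examples for later inspection.
            if out.disagree_samples.len() < SAMPLE_LIMIT {
                out.disagree_samples.push((page, pos_t, con_t));
            }
        }
    }
    out
}

fn main() {
    let position: HashMap<u32, u32> = HashMap::from([(10, 1), (11, 1), (12, 2), (13, 2)]);
    let content: HashMap<u32, u32> = HashMap::from([(10, 1), (11, 2), (12, 2)]);
    let cv = cross_validate(&position, &content);
    println!("{:?}", cv.per_table.get(&1)); // Some((1, 1))
    println!("{:?}", cv.disagree_samples); // [(11, 1, 2)]
}
```

Pages attributed by only one method are skipped here because there is nothing to compare; the crate may account for them differently.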

Enums

AmountType
Classification of the amount-type byte in a line item.
AuditOutcome
Outcome of auditing one SYSINDEX (root_page, table_id) pair against a position-heuristic PageAttribution.
LineItemError
Errors from line-item parsing.
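One plausible shape for an AuditOutcome-style check is a three-way match on the owner that the position heuristic reports for a SYSINDEX root_page. This is a sketch only: the Audit type, its variants Match, Mismatch, and Unattributed, and the audit function are hypothetical, and PageAttribution is modelled as a simple page-to-table_id map.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of AuditOutcome; the crate's real variants are not documented here.
#[derive(Debug, PartialEq, Eq)]
enum Audit {
    /// The heuristic assigns root_page to the table SYSINDEX names.
    Match,
    /// The heuristic assigns root_page to some other table.
    Mismatch { attributed: u32 },
    /// The heuristic has no owner for root_page at all.
    Unattributed,
}

/// Audit one SYSINDEX (root_page, table_id) pair against a page -> table_id map.
fn audit(attribution: &HashMap<u32, u32>, root_page: u32, table_id: u32) -> Audit {
    match attribution.get(&root_page) {
        Some(&t) if t == table_id => Audit::Match,
        Some(&t) => Audit::Mismatch { attributed: t },
        None => Audit::Unattributed,
    }
}

fn main() {
    let attribution: HashMap<u32, u32> = HashMap::from([(100, 7), (200, 9)]);
    println!("{:?}", audit(&attribution, 100, 7)); // Match
    println!("{:?}", audit(&attribution, 200, 7)); // Mismatch { attributed: 9 }
    println!("{:?}", audit(&attribution, 300, 7)); // Unattributed
}
```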

Constants

APAGE_MAGIC
C.37 plaintext magic shared by every regular A-page (SA17 allocation/free-space map B-tree). The 8-byte sequence appears at a varying offset (typically 0xC0..0xF7) inside the page body and is part of the SA17 page-level metadata block.
DATE_EPOCH_DAYS_BEFORE_UNIX
SA-day -> Unix-day offset: unix_day = sa_day - DATE_EPOCH_DAYS_BEFORE_UNIX. Equivalently, SA-day 0 = Unix-day -4017, but the codebase uses the “SA-day 4017 = Unix-day 0” framing throughout.
DISAGREE_SAMPLE_LIMIT
Maximum disagreement samples retained in CrossValidation::disagree_samples.
MIN_ROW_BODY_BYTES
Minimum row body length any plausible row must clear, regardless of schema. SA17 rows include a small row header that is not modelled per-column.
OPAQUE_ENTROPY_THRESHOLD
Threshold (bits/byte) above which the page body is treated as uniformly random. Empirically, every page in the four-file Phase 5 corpus that none of the four bv-recovery tiers can decode scores in [7.5, 8.0); pages that do decode have body entropy well below 7.5 once the AP cipher is removed.
SA_DAY_MAX_PLAUSIBLE
Upper plausibility bound (~year 2200) for SA-day values. SA-day 80000 is roughly 2199-12-26 in the Unix calendar.
SA_DAY_MIN_PLAUSIBLE
Lower plausibility bound for SA-day values: any sa_day < 1 is rejected as a missing/zero placeholder.
SIG_LEN
Length in bytes of the row-0 prefix used as a table signature.
SYSCOLUMN_TAG
Fixed 8-byte anchor that precedes the numeric portion of every SYSCOLUMN row body.
SYSINDEX_CREATOR
Two-byte creator id appearing at row offset +2 of every SYSINDEX row. Empirically constant across all QBW files inspected so far (C.30).
SYSOBJECT_NAME_OFFSET
Byte distance between the object_id u32 and the <name_len> byte in a SYSOBJECT row. Determined empirically on RC page 550.
VARIABLE_COLUMN_UPPER_ALLOWANCE
Per-column variable-width allowance (bytes) added to the upper bound when the column’s domain is variable-length (Y, V, C, A).
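OPAQUE_ENTROPY_THRESHOLD is a cutoff on Shannon entropy, H = -Σ p_i log2 p_i over the 256 byte values, which peaks at 8.0 bits/byte for a perfectly uniform byte distribution. A minimal sketch of that measurement, assuming a threshold of 7.5 (the lower edge of the [7.5, 8.0) range quoted above; the crate's actual constant may differ) and a strict "above" comparison. shannon_entropy and looks_opaque are hypothetical names, not the signature of is_opaque_high_entropy.

```rust
/// Assumed value; the crate's OPAQUE_ENTROPY_THRESHOLD may differ.
const ENTROPY_THRESHOLD: f64 = 7.5;

/// Shannon entropy of `bytes` in bits per byte; 0.0 for an empty slice.
fn shannon_entropy(bytes: &[u8]) -> f64 {
    if bytes.is_empty() {
        return 0.0;
    }
    // Histogram of the 256 possible byte values.
    let mut counts = [0usize; 256];
    for &b in bytes {
        counts[b as usize] += 1;
    }
    let n = bytes.len() as f64;
    // H = -sum(p * log2 p) over the byte values that actually occur.
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// True when the body looks uniformly random (entropy strictly above the threshold).
fn looks_opaque(body: &[u8]) -> bool {
    shannon_entropy(body) > ENTROPY_THRESHOLD
}

fn main() {
    // Every byte value appears 16 times: the 8.0 bits/byte maximum.
    let uniform: Vec<u8> = (0..4096).map(|i| (i % 256) as u8).collect();
    // Mostly zero bytes with a short counter run: far below the threshold.
    let mut sparse = vec![0u8; 4096];
    for (i, b) in sparse.iter_mut().take(32).enumerate() {
        *b = i as u8;
    }
    println!("{:.3} {}", shannon_entropy(&uniform), looks_opaque(&uniform)); // 8.000 true
    println!("{}", looks_opaque(&sparse)); // false
}
```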

Functions

bridge_owners_to_tables
Scan all Extent pages for SYSOBJECT-style rows and return a map from SysColumn::owner_object_id to SysTableEntry::name.
build_fk_graph
Build a heuristic FK edge list from the parsed SYSTABLE + SYSCOLUMN catalogs. The list is sorted by (source_table, source_column_id).
collect_unique
Collect a deduplicated catalog keyed by (table_id, name), choosing the first occurrence found.
collect_unique_syscolumns
Deduplicate SYSCOLUMN rows by (owner_object_id, column_id, name) and return them ordered by (owner_object_id, column_id).
collect_unique_sysindex
Collect a deduplicated set of SYSINDEX entries keyed by (table_id, root_page, name), preferring the first sighting.
deobfuscate_with_bv
Decode an E-page with an explicit bv (the inverse of the AP stream cipher).
fk_graph_stats
Compute simple counts over an edge set.
is_opaque_high_entropy
Returns true if the raw page bytes are indistinguishable from random data and therefore unlikely to be standard AP ciphertext.
iter_lineitems
Yields every line item found in store by scanning each E-type page.
iter_lineitems_with_attribution
Like iter_lineitems, but tags each emitted item with the SysTableEntry::name of the table that the line item’s source page is attributed to (via PageAttribution).
iter_syscolumns
Iterate every SYSCOLUMN row recovered from store.
iter_sysindex
Iterate every SYSINDEX row recovered from store.
iter_systable_entries
Iterate every SYSTABLE row recovered from store.
iter_transaction_headers
Iterate transaction headers across every E-page that PageAttribution attributes to a *_header table.
nulls_flag_histogram
Build a histogram of the nulls_flag byte across every SYSCOLUMN row in store, sorted ascending by byte value.
oracle_bv_e_page
Compute the E-page oracle bv assuming plain[0] == 0x00.
recover_bv_any
Cascade of bv-recovery oracles in order of decreasing confidence.
recover_bv_apage
Recover the page bv for an A-page (allocation/free-space map) using the C.37 magic anchor.
recover_bv_brute
Recover the page bv by exhaustive search over all 256 candidates.
recover_bv_qb_data
Try to recover the correct bv for E-page pn (zero-based) using the QB page-trailer anchor. Returns Some(bv) on success, None if no candidate bv decodes the anchor (page is not a QB user-data page).
sa_day_to_unix_day
Convert an SA-day to days since the Unix epoch (1970-01-01).
sa_day_to_unix_seconds
SA-day -> Unix seconds at midnight UTC. Convenience wrapper for downstream code that uses chrono or time for formatting.
scan_syscolumn_page
Scan every slotted-page row body on a single decoded page for SYSCOLUMN rows.
scan_sysindex_page
Scan a single decoded page body for SYSINDEX rows.
scan_systable_page
Scan a decoded page body for SYSTABLE row tags and append parsed entries to out. Matches every occurrence of the 16-byte framed tag followed by a plausible name-length byte; the file-specific 4-byte magic is accepted as a wildcard.
schema_for
Return all columns for the table named table_name, ordered by column_id. The bridge is via the SYSOBJECT catalog (see crate::sysobject::bridge_owners_to_tables).
unix_day_to_sa_day
Inverse of sa_day_to_unix_day.
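The SA-day helpers reduce to a fixed-offset shift. A sketch under stated assumptions: DATE_EPOCH_DAYS_BEFORE_UNIX is taken to be 4017 (from the "SA-day 4017 = Unix-day 0" framing), the plausibility window is taken to be 1..=80000 from the SA_DAY_MIN_PLAUSIBLE and SA_DAY_MAX_PLAUSIBLE descriptions, i64 is used throughout, and plausible_unix_day is a hypothetical helper. The crate's real integer types and signatures may differ.

```rust
/// Assumed values, inferred from the constant descriptions above.
const DATE_EPOCH_DAYS_BEFORE_UNIX: i64 = 4017;
const SA_DAY_MIN_PLAUSIBLE: i64 = 1;
const SA_DAY_MAX_PLAUSIBLE: i64 = 80_000;
const SECONDS_PER_DAY: i64 = 86_400;

/// unix_day = sa_day - DATE_EPOCH_DAYS_BEFORE_UNIX
fn sa_day_to_unix_day(sa_day: i64) -> i64 {
    sa_day - DATE_EPOCH_DAYS_BEFORE_UNIX
}

/// Inverse of `sa_day_to_unix_day`.
fn unix_day_to_sa_day(unix_day: i64) -> i64 {
    unix_day + DATE_EPOCH_DAYS_BEFORE_UNIX
}

/// Seconds since 1970-01-01T00:00:00Z at midnight UTC of the given SA-day.
fn sa_day_to_unix_seconds(sa_day: i64) -> i64 {
    sa_day_to_unix_day(sa_day) * SECONDS_PER_DAY
}

/// Hypothetical helper: convert only SA-days inside the plausibility window,
/// rejecting zero/missing placeholders and far-future values.
fn plausible_unix_day(sa_day: i64) -> Option<i64> {
    (SA_DAY_MIN_PLAUSIBLE..=SA_DAY_MAX_PLAUSIBLE)
        .contains(&sa_day)
        .then(|| sa_day_to_unix_day(sa_day))
}

fn main() {
    println!("{}", sa_day_to_unix_day(4017)); // 0, i.e. 1970-01-01
    println!("{}", unix_day_to_sa_day(0)); // 4017
    println!("{}", sa_day_to_unix_seconds(4018)); // 86400
    println!("{:?}", plausible_unix_day(0)); // None
    println!("{:?}", plausible_unix_day(4027)); // Some(10)
}
```

Shifting in whole days and multiplying by 86 400 only at the end keeps the result pinned to midnight UTC, which is what sa_day_to_unix_seconds is documented to return.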