QuickBooks .qbw file parser.
Provides parsing of invoice line-item records out of an SA17 page-store
that was produced by QuickBooks. Built on the opensqlany crate which
handles the lower-level page-store layer (CRC validation, AP-fill
deobfuscation, slotted-page directories).
This release (v0.1) covers the invoice line-item record format only. Invoice headers, bills, checks, journal entries, and the system catalog are deferred to later releases.
§Status
Prototype quality. See OpenQBW/re/NOTES.md (entries C.40–C.43) for the
reverse-engineered record layout and remaining gaps.
Structs§
- AttributionAgreement - Statistics from comparing position-based and content-based attribution over a set of pages.
- ContentAttribution - Content-signature attribution map.
- CrossValidation - Per-table-id agreement counts.
- FkEdge - One inferred foreign-key edge.
- FkGraphStats - Summary of edge resolution rates.
- LineItem - Parsed invoice line-item record.
- NullsFlagBucket - One bucket in the histogram output.
- PageAttribution - Resolves page numbers to the SYSTABLE entry that most likely owns them.
- RowSignature - A row-0 prefix used as a per-table fingerprint.
- SchemaAttribution - Schema-aware width-validator built from SYSCOLUMN + SYSTABLE.
- SysColumn - One parsed SYSCOLUMN row.
- SysIndexEntry - One parsed SYSINDEX row.
- SysTableEntry - A single parsed SYSTABLE row.
- TransactionHeader - A minimal transaction-header record.
- ValidationStats - Summary of SchemaAttribution::validate_corpus over a page set.
- WidthBand - A row-width band derived from a table’s SYSCOLUMN schema.
Enums§
- AmountType - Classification of the amount-type byte in a line item.
- AuditOutcome - Outcome of auditing one SYSINDEX (root_page, table_id) pair against a position-heuristic PageAttribution.
- LineItemError - Errors from line-item parsing.
Constants§
- APAGE_MAGIC - C.37 plaintext magic shared by every regular A-page (SA17 allocation/free-space map B-tree). The 8-byte sequence appears at a varying offset (typically 0xC0..0xF7) inside the page body and is part of the SA17 page-level metadata block.
- DATE_EPOCH_DAYS_BEFORE_UNIX - SA-day -> Unix-day offset: unix_day = sa_day - DATE_EPOCH_DAYS_BEFORE_UNIX. Equivalently, SA-day 0 = Unix-day -4017 = 1956-12-?? – but we use the “SA-day 4017 = Unix-day 0” framing throughout the codebase.
- DISAGREE_SAMPLE_LIMIT - Maximum disagreement samples retained in CrossValidation::disagree_samples.
- MIN_ROW_BODY_BYTES - Minimum row body length any plausible row must clear, regardless of schema. SA17 rows include a small row header that is not modelled per-column.
- OPAQUE_ENTROPY_THRESHOLD - Threshold (bits/byte) above which the page body is treated as uniformly-random. Empirically every page in the four-file Phase 5 corpus that survives all four bv-recovery tiers scores in [7.5, 8.0); pages that do decode have body entropy well below 7.5 once the AP cipher is removed.
- SA_DAY_MAX_PLAUSIBLE - Upper plausibility bound (~year 2200) for SA-day values. SA-day 80000 is roughly 2199-12-26 in the Unix calendar.
- SA_DAY_MIN_PLAUSIBLE - Lower plausibility bound for SA-day values: any sa_day < 1 is rejected as a missing/zero placeholder.
- SIG_LEN - Length in bytes of the row-0 prefix used as a table signature.
- SYSCOLUMN_TAG - Fixed 8-byte anchor that precedes the numeric portion of every SYSCOLUMN row body.
- SYSINDEX_CREATOR - Two-byte creator id appearing at row offset +2 of every SYSINDEX row. Empirically constant across all QBW files inspected so far (C.30).
- SYSOBJECT_NAME_OFFSET - Byte distance between the object_id u32 and the <name_len> byte in a SYSOBJECT row. Determined empirically on RC page 550.
- VARIABLE_COLUMN_UPPER_ALLOWANCE - Per-column variable-width allowance (bytes) added to the upper bound when the column’s domain is variable-length (Y, V, C, A).
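The SA-day constants above imply a simple offset conversion plus a range check. The following is a minimal sketch of that arithmetic, not the crate's actual code: the `i64` types, the inclusive upper bound, the literal values 1 and 80000 for the plausibility bounds, and the helper name `is_plausible_sa_day` are all assumptions made for illustration. Only the offset 4017 and the formula `unix_day = sa_day - DATE_EPOCH_DAYS_BEFORE_UNIX` come from the descriptions above.

```rust
/// Offset taken from the constant's description ("SA-day 4017 = Unix-day 0").
const DATE_EPOCH_DAYS_BEFORE_UNIX: i64 = 4017;
/// Assumed value: the description rejects any `sa_day < 1`.
const SA_DAY_MIN_PLAUSIBLE: i64 = 1;
/// Assumed value: the description mentions SA-day 80000 as the upper bound.
const SA_DAY_MAX_PLAUSIBLE: i64 = 80_000;

/// Convert an SA-day to days since 1970-01-01.
fn sa_day_to_unix_day(sa_day: i64) -> i64 {
    sa_day - DATE_EPOCH_DAYS_BEFORE_UNIX
}

/// Inverse of `sa_day_to_unix_day`.
fn unix_day_to_sa_day(unix_day: i64) -> i64 {
    unix_day + DATE_EPOCH_DAYS_BEFORE_UNIX
}

/// SA-day to Unix seconds at midnight UTC (86 400 seconds per day).
fn sa_day_to_unix_seconds(sa_day: i64) -> i64 {
    sa_day_to_unix_day(sa_day) * 86_400
}

/// Hypothetical helper: true if the value lies inside the plausibility band.
/// Treating the upper bound as inclusive is an assumption.
fn is_plausible_sa_day(sa_day: i64) -> bool {
    (SA_DAY_MIN_PLAUSIBLE..=SA_DAY_MAX_PLAUSIBLE).contains(&sa_day)
}
```

Under these assumptions, `sa_day_to_unix_day(4017)` is 0 and a zero SA-day fails the plausibility check, which matches the "missing/zero placeholder" wording above.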
Functions§
- bridge_owners_to_tables - Scan all Extent pages for SYSOBJECT-style rows and return a map from SysColumn::owner_object_id to SysTableEntry::name.
- build_fk_graph - Build a heuristic FK edge list from the parsed SYSTABLE + SYSCOLUMN catalogs. The list is sorted by (source_table, source_column_id).
- collect_unique - Collect a deduplicated catalog keyed by (table_id, name), choosing the first occurrence found.
- collect_unique_syscolumns - Deduplicate SYSCOLUMN rows by (owner_object_id, column_id, name) and return them ordered by (owner_object_id, column_id).
- collect_unique_sysindex - Collect a deduplicated set of SYSINDEX entries keyed by (table_id, root_page, name), preferring the first sighting.
- deobfuscate_with_bv - Decode an E-page with an explicit bv (the inverse of the AP stream cipher).
- fk_graph_stats - Compute simple counts over an edge set.
- is_opaque_high_entropy - Returns true if the raw page bytes are indistinguishable from random data and therefore unlikely to be standard AP ciphertext.
- iter_lineitems - Yields every line item found in store by scanning each E-type page.
- iter_lineitems_with_attribution - Like iter_lineitems, but tags each emitted item with the SysTableEntry::name of the table that the line item’s source page is attributed to (via PageAttribution).
- iter_syscolumns - Iterate every SYSCOLUMN row recovered from store.
- iter_sysindex - Iterate every SYSINDEX row recovered from store.
- iter_systable_entries - Iterate every SYSTABLE row recovered from store.
- iter_transaction_headers - Iterate transaction headers across every E-page that PageAttribution attributes to a *_header table.
- nulls_flag_histogram - Build a histogram of the nulls_flag byte across every SYSCOLUMN row in store, sorted ascending by byte value.
- oracle_bv_e_page - Compute the E-page oracle bv assuming plain[0] == 0x00.
- recover_bv_any - Cascade of bv-recovery oracles in order of decreasing confidence.
- recover_bv_apage - Recover the page bv for an A-page (allocation/free-space map) using the C.37 magic anchor.
- recover_bv_brute - Recover the page bv by exhaustive search over all 256 candidates.
- recover_bv_qb_data - Try to recover the correct bv for E-page pn (zero-based) using the QB page-trailer anchor. Returns Some(bv) on success, None if no candidate bv decodes the anchor (page is not a QB user-data page).
- sa_day_to_unix_day - Convert an SA-day to days since the Unix epoch (1970-01-01).
- sa_day_to_unix_seconds - SA-day -> Unix seconds at midnight UTC. Convenience wrapper for downstream code that uses chrono or time for formatting.
- scan_syscolumn_page - Scan every slotted-page row body on a single decoded page for SYSCOLUMN rows.
- scan_sysindex_page - Scan a single decoded page body for SYSINDEX rows.
- scan_systable_page - Scan a decoded page body for SYSTABLE row tags and append parsed entries to out. Matches every occurrence of the 16-byte framed tag followed by a plausible name-length byte; the file-specific 4-byte magic is accepted as wildcard.
- schema_for - Return all columns for the table named table_name, ordered by column_id. The bridge is via the SYSOBJECT catalog (see crate::sysobject::bridge_owners_to_tables).
- unix_day_to_sa_day - Inverse of sa_day_to_unix_day.
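As an illustration of the kind of check is_opaque_high_entropy describes, the sketch below computes Shannon entropy in bits per byte and compares it to a 7.5 threshold. This is an assumption-laden sketch rather than the crate's implementation: the function names `shannon_entropy_bits_per_byte` and `looks_opaque`, the threshold literal (inferred from the documented [7.5, 8.0) band), and the `>=` comparison are all hypothetical, and the real function may look at a different byte range of the page.

```rust
/// Assumed value, inferred from the documented [7.5, 8.0) band.
const OPAQUE_ENTROPY_THRESHOLD: f64 = 7.5;

/// Shannon entropy of `data` in bits per byte (0.0 for an empty slice).
fn shannon_entropy_bits_per_byte(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Count how often each of the 256 possible byte values occurs.
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    // Sum -p * log2(p) over the byte values that actually occur.
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical stand-in for the documented check: true when the bytes
/// score at or above the threshold. Using `>=` here is an assumption.
fn looks_opaque(data: &[u8]) -> bool {
    shannon_entropy_bits_per_byte(data) >= OPAQUE_ENTROPY_THRESHOLD
}
```

A buffer of identical bytes scores 0 bits/byte, while a buffer in which all 256 byte values are equally frequent scores 8 bits/byte, so only near-uniform data crosses the threshold.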