OpenQVD
A free, open, clean-room specification and implementation of the Qlik QVD binary file format, derived purely by binary analysis of publicly available sample files. The goal is a Rust reader and writer that the data science community can use without depending on any proprietary Qlik tooling.
The specification was developed against the
QVD-Sources corpus — a curated
collection of ~1,100 publicly available .qvd files gathered from GitHub.
Status
Seven stages complete:
- XML header and envelope structure. (Spec section 1.)
- Per-field symbol table encoding. (Spec section 2.)
- Bit-packed row index encoding. (Spec section 3.)
- Validation against the full public corpus via a clean-room Python decoder.
- Rust reader prototype (crates/openqvd) with edge-case tests.
- Writer + semantic round-trip tests.
- Python bindings (crates/openqvd-py): PyArrow, Polars, Pandas.
See SPEC.md for the current specification and NOTES.md for the
working log of observations.
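As an illustration of the per-field symbol table stage listed above, here is a minimal Python sketch of a symbol decoder. It is not the project's decoder. The type byte values (1 = Int, 2 = Float, 4 = String, 5 = DualInt, 6 = DualFloat) and the little-endian payload layout are assumptions taken from public descriptions of the format; SPEC.md section 2 is the authoritative source. The name parse_symbols is hypothetical.

```python
import struct

def parse_symbols(buf: bytes) -> list:
    """Decode one field's symbol table into (number, text) pairs.

    Assumed layout: each symbol is a type byte followed by an optional
    numeric payload and an optional NUL-terminated UTF-8 string.
    """
    out, pos = [], 0
    while pos < len(buf):
        kind = buf[pos]
        pos += 1
        number = None
        if kind in (1, 5):        # Int / DualInt: 4-byte signed little-endian
            (number,) = struct.unpack_from("<i", buf, pos)
            pos += 4
        elif kind in (2, 6):      # Float / DualFloat: 8-byte IEEE 754 double
            (number,) = struct.unpack_from("<d", buf, pos)
            pos += 8
        elif kind != 4:           # anything else is rejected, not skipped
            raise ValueError(f"unknown symbol type byte {kind:#x}")
        text = None
        if kind in (4, 5, 6):     # String and both Dual types carry text
            end = buf.index(b"\x00", pos)
            text = buf[pos:end].decode("utf-8")
            pos = end + 1
        out.append((number, text))
    return out
```

Rejecting unknown type bytes outright, rather than skipping them, matters because a symbol's length depends on its type: once a type is misread, every later symbol in the table would be decoded from the wrong offset.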
Rust usage
# Cargo.toml
[dependencies]
openqvd = "1"
# Enable Arrow integration (PyArrow, RecordBatch, type inference):
openqvd = { version = "1", features = ["arrow"] }
use Qvd;
let qvd = from_path.unwrap;
println!;
for row in qvd.rows
Reader
The Rust reader parses 1,044 of 1,047 valid public QVD samples. The
three remaining files are deliberately corrupted test fixtures from
third-party projects (two named damaged.qvd, one with invalid UTF-8).
Ten unit and integration tests cover bias-based NULL, 2+6 bit packing,
zero-width fields, every symbol type byte, unknown-type rejection,
overlapping bit-field rejection, inconsistent root Length
rejection, and the LF-terminator header variant.
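To make "bias-based NULL", "2+6 bit packing", and "zero-width fields" concrete, the following Python sketch decodes one bit-packed row record. It assumes that each record is read as a little-endian integer, that each field has a bit offset, a bit width, and a bias, and that a negative index after adding the bias means NULL. The function name and the tuple layout are hypothetical; SPEC.md section 3 defines the actual rules.

```python
def decode_row(record: bytes, fields):
    """Return one symbol index (or None for NULL) per field.

    fields: list of (bit_offset, bit_width, bias) tuples, one per column.
    """
    packed = int.from_bytes(record, "little")
    row = []
    for bit_offset, bit_width, bias in fields:
        # A zero-width field has mask 0, so raw is always 0.
        raw = (packed >> bit_offset) & ((1 << bit_width) - 1)
        index = raw + bias
        row.append(None if index < 0 else index)
    return row

# One byte holding a 2-bit field (bias 0) and a 6-bit field (bias -2).
layout = [(0, 2, 0), (2, 6, -2)]
first = decode_row(bytes([0b00000111]), layout)
second = decode_row(bytes([0b00001101]), layout)
```

In this layout a single byte carries both columns. With a bias of -2, the raw values 0 and 1 of the second column fall below zero and decode as NULL, while a zero-width field with bias 0 always resolves to symbol 0, which is how a constant column can occupy no bits at all.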
Writer
A compliant writer is implemented in crates/openqvd::writer. Running
read -> write -> read over the entire corpus yields 1,093 of 1,093
valid files semantically equivalent (same row count, same field
names, byte-for-byte equal cell values). Nine writer tests cover NULL
handling, all five symbol types, zero-width collapse for constant
columns, wide columns with 500 distinct values, NUL-in-string rejection,
uneven-column rejection, and deterministic output.
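The zero-width collapse and the input checks listed above can be sketched in a few lines of Python. This is an illustration under assumptions, not the writer's code: it assumes the index width is the smallest number of bits that can address every distinct symbol and ignores any extra slots a writer may reserve for NULL. Both function names are hypothetical.

```python
def index_bit_width(distinct_symbols: int) -> int:
    """Smallest bit width that can address every symbol index.

    A constant column (one symbol) needs 0 bits: the zero-width collapse.
    """
    return max(distinct_symbols - 1, 0).bit_length()

def validate_columns(columns: dict) -> int:
    """Check writer input and return the row count.

    Rejects columns of unequal length and strings containing NUL,
    which could not be stored as NUL-terminated symbols.
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("columns have uneven lengths")
    for name, values in columns.items():
        for value in values:
            if isinstance(value, str) and "\x00" in value:
                raise ValueError(f"NUL byte in string value of column {name!r}")
    return lengths.pop() if lengths else 0
```

Under these assumptions a column with 500 distinct values needs 9 bits per row, since 8 bits address only 256 symbols.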
Python bindings
crates/openqvd-py is a maturin mixed-layout
package that exposes a pure-Python API on top of the Rust library.
Install (development)
&&
Usage
# Read as a PyArrow Table
=
=
# Predicate pushdown (filtering at the Rust level, before Arrow conversion)
=
# Inspect metadata only (no row decoding)
=
# Write from a PyArrow Table
# Polars (import registers pl.read_qvd, pl.scan_qvd, df.qvd.write)
=
=
=
# Pandas (via PyArrow)
=
The Python bindings read 1,044 of 1,047 valid corpus files (99.7%), matching the Rust reader baseline. The 3 failures are deliberately corrupted test fixtures.
DuckDB integration
=
# Register a QVD file as a SQL view
# Or get a relation directly
=
# Write a DuckDB query result to a QVD file
Install with pip install openqvd[duckdb]. DuckDB support is provided through
Arrow interop; a native read_qvd() SQL table function would require a C++
extension, which is out of scope.
Arrow type mapping
| QVD NumberFormat/Type | Arrow type |
|---|---|
| DATE | Date32 (Qlik epoch → Unix epoch) |
| TIMESTAMP | Timestamp(Microsecond, None) |
| TIME | Duration(Microsecond) |
| Int / DualInt symbols | Int64 |
| Float / DualFloat symbols | Float64 |
| String symbols | LargeUtf8 |
| Empty symbol table | Null |
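The "Qlik epoch → Unix epoch" conversion in the DATE row amounts to subtracting a fixed day offset. The Python sketch below assumes Qlik date serials count days from 1899-12-30, so 1970-01-01 has serial 25569, and that timestamps are fractional day serials on the same scale. The helper names are hypothetical and the library's own conversion may treat rounding differently.

```python
import math
from datetime import date, timedelta

QLIK_EPOCH = date(1899, 12, 30)   # assumed day 0 of Qlik date serials
UNIX_EPOCH_SERIAL = 25569         # Qlik serial of 1970-01-01
MICROS_PER_DAY = 86_400_000_000

def qlik_serial_to_date32(serial: float) -> int:
    """Days since 1970-01-01, the value stored in an Arrow Date32."""
    return math.floor(serial) - UNIX_EPOCH_SERIAL

def qlik_serial_to_timestamp_us(serial: float) -> int:
    """Microseconds since 1970-01-01T00:00:00 for a fractional day serial."""
    return round((serial - UNIX_EPOCH_SERIAL) * MICROS_PER_DAY)

days = qlik_serial_to_date32(44197)
as_date = date(1970, 1, 1) + timedelta(days=days)
```

Here serial 44197 maps to Date32 value 18628, which is 2021-01-01. Because the serial is a double, timestamps far from the epoch carry less than microsecond precision, so exact round-trips of sub-millisecond values should not be assumed.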
CLI
The openqvd binary provides end-user tooling:
openqvd stat <file> # header summary (fields, widths, rows)
openqvd head <file> [--rows N] # first N rows
openqvd csv <file> # every row as tab-separated text
openqvd json <file> # one JSON object per row
openqvd rewrite <in> <out> # read then re-serialise through the writer
Non-goals
- Executing, shipping, or linking any proprietary Qlik code.
- Reading closed or encrypted QVD variants (if they exist).
- Parsing QVW, QVF, or QVS files (those are separate formats).
License
The software (all .rs, .py source files) is licensed under
Apache-2.0.
The specification (SPEC.md) is licensed under
CC BY-SA 4.0.