# qvd
High-performance Rust library for reading, writing and converting Qlik QVD files. Parquet/Arrow interop, DataFusion SQL, DuckDB integration, streaming reader, CLI tool, and Python bindings (PyArrow, pandas, Polars).
First and only QVD crate on crates.io.
## Features
- Read/Write QVD — byte-identical roundtrip, Qlik Sense-compatible output
- Parquet ↔ QVD — bidirectional conversion with compression (snappy, zstd, gzip, lz4). Binary-identical to Qlik Sense output
- Arrow RecordBatch — zero-copy QVD ↔ Arrow for DataFusion, DuckDB, Polars
- DuckDB integration — register QVD files as SQL tables, query with JOINs, batch folder registration
- DataFusion SQL — register QVD files as tables, run SQL queries
- Streaming reader — read QVD in chunks without loading everything into memory
- EXISTS() index — O(1) hash lookup like Qlik's `EXISTS()`
- Streaming filtered reads — 2.5× faster than Qlik Sense
- normalize() — auto-detects and sets proper symbol types, NumberFormat, Tags, BitWidth for Qlik compatibility
- CLI tool — `qvd-cli`: `convert`, `inspect`, `head`, `schema`, `filter`
- Python bindings — PyArrow, pandas, Polars via zero-copy Arrow bridge
- Zero dependencies for core QVD read/write (Parquet/Arrow/DataFusion/Python are optional features)
## Performance
Tested on 399 real QVD files (11 KB to 2.8 GB); every file roundtrips byte-identical (MD5 match).
| File | Size | Rows | Columns | Read | Write |
|---|---|---|---|---|---|
| sample_tiny.qvd | 11 KB | 12 | 5 | 0.0s | 0.0s |
| sample_small.qvd | 418 KB | 2,746 | 8 | 0.0s | 0.0s |
| sample_medium.qvd | 41 MB | 465,810 | 12 | 0.5s | 0.0s |
| sample_large.qvd | 587 MB | 5,458,618 | 15 | 6.1s | 0.4s |
| sample_xlarge.qvd | 1.7 GB | 87,617,047 | 8 | 23.6s | 1.6s |
| sample_huge.qvd | 2.8 GB | 11,907,648 | 42 | 24.3s | 2.4s |
### Streaming EXISTS() filter vs Qlik Sense
1.7 GB QVD, 87.6M rows × 8 columns → filter by 2 values, select 3 columns → 20.4M rows × 3 columns
Qlik Sense script equivalent:
```
types:
LOAD * INLINE [%Type_ID
7
9];

filtered:
LOAD %Key_ID, DateField_BK, %Type_ID
FROM [lib://data/large_table.qvd] (qvd)
WHERE EXISTS(%Type_ID);

STORE filtered INTO [lib://data/result.qvd] (qvd);
DROP TABLE filtered;
```
| | Qlik Sense | qvdrs |
|---|---|---|
| Total (→ QVD) | ~28s | 11.4s |
| Total (→ Parquet) | — | 15.5s |
| Speedup | 1× | 2.5× |
The streaming reader loads only the symbol tables into memory, then scans the index table in chunks. For each row it decodes the filter column first; matching rows then get their selected columns decoded, while non-matching rows are skipped entirely.
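In outline, the scan works like the sketch below (simplified; `decode_field`, `matching_rows`, and the record layout are illustrative, not the crate's actual internals):

```rust
use std::collections::HashSet;

// Extract one bit-packed field from a record: bits [bit_offset, bit_offset + bit_width).
fn decode_field(record: &[u8], bit_offset: usize, bit_width: usize, bias: i64) -> i64 {
    let mut raw: u64 = 0;
    for i in 0..bit_width {
        let bit = bit_offset + i;
        raw |= (((record[bit / 8] >> (bit % 8)) & 1) as u64) << i;
    }
    raw as i64 + bias // stored value + Bias = symbol index
}

// Decode only the filter column per row; non-matching rows are skipped
// without touching any other field.
fn matching_rows(records: &[Vec<u8>], off: usize, width: usize, bias: i64,
                 wanted: &HashSet<i64>) -> Vec<usize> {
    let mut out = Vec::new();
    for (row, rec) in records.iter().enumerate() {
        if wanted.contains(&decode_field(rec, off, width, bias)) {
            out.push(row);
        }
    }
    out
}
```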
### Parquet → QVD conversion
QVD files generated from Parquet are binary-identical to those created by Qlik Sense (same symbol types, NumberFormat, Tags, BitWidth, BitOffset ordering). Verified by MD5 hash comparison of the binary section.
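To reproduce that check yourself, a minimal sketch (assumes the `md5` crate; in the QVD format the binary section begins after the NUL byte that terminates the XML header):

```rust
use std::fs;

// Compare the MD5 of everything after the XML header's terminating NUL byte.
fn binary_section_md5(path: &str) -> std::io::Result<String> {
    let bytes = fs::read(path)?;
    let start = bytes.iter().position(|&b| b == 0).map_or(0, |p| p + 1);
    Ok(format!("{:x}", md5::compute(&bytes[start..])))
}

fn main() -> std::io::Result<()> {
    assert_eq!(binary_section_md5("from_qlik.qvd")?,
               binary_section_md5("from_qvd_crate.qvd")?);
    Ok(())
}
```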
## Installation

### Rust
```toml
# Core QVD read/write (zero dependencies)
[dependencies]
qvd = "0.5.0"

# With Parquet/Arrow support
[dependencies]
qvd = { version = "0.5.0", features = ["parquet_support"] }

# With DataFusion SQL support
[dependencies]
qvd = { version = "0.5.0", features = ["datafusion_support"] }
```
### Python
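Assuming the PyPI package shares the crate's name:

```bash
pip install qvd
```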
### CLI
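A sketch, assuming the CLI is built from the crate's `cli` feature:

```bash
cargo install qvd --features cli
```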
Or run without installing via uvx:
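```bash
# package/command names assumed
uvx qvd-cli inspect data.qvd
```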
## Rust Examples

### Read and write QVD
```rust
use qvd::{read_qvd_file, write_qvd_file};

// Paths and accessors below are illustrative — see the crate docs.
let table = read_qvd_file("data.qvd")?;
println!("{:?}", table);        // table name, row/column counts
println!("{:?}", table.fields); // field metadata

// Access individual values
for row in 0..5 {
    // e.g. table.get(row, "Amount")
}

// Byte-identical roundtrip
write_qvd_file(&table, "copy.qvd")?;
```
### Convert Parquet ↔ QVD
```rust
use qvd::{convert_parquet_to_qvd, convert_qvd_to_parquet};

// Parquet → QVD (Qlik Sense-compatible output)
convert_parquet_to_qvd("input.parquet", "output.qvd")?;

// QVD → Parquet — the compression-argument shape is illustrative
convert_qvd_to_parquet("input.qvd", "out.parquet", None)?;            // default (snappy)
convert_qvd_to_parquet("input.qvd", "out.parquet", Some("snappy"))?;
convert_qvd_to_parquet("input.qvd", "out.parquet", Some("zstd"))?;
convert_qvd_to_parquet("input.qvd", "out.parquet", Some("gzip"))?;
convert_qvd_to_parquet("input.qvd", "out.parquet", Some("lz4"))?;
```
### Arrow RecordBatch
```rust
use qvd::{read_qvd_file, write_qvd_file, qvd_to_record_batch, record_batch_to_qvd};

// QVD → Arrow
let table = read_qvd_file("data.qvd")?;
let batch = qvd_to_record_batch(&table)?;
println!("{:?}", batch.schema());

// Arrow → QVD (the table-name argument is illustrative)
let qvd_table = record_batch_to_qvd(&batch, "data")?;
write_qvd_file(&qvd_table, "out.qvd")?;
```
### Normalize — Qlik Sense compatibility

`normalize()` auto-detects and sets proper Qlik-compatible metadata on any `QvdTable`.
```rust
use qvd::{read_qvd_file, write_qvd_file};

let mut table = read_qvd_file("data.qvd")?;

// Filter or modify the table (column/values are illustrative)
let matching = table.filter_by_values("Region", &["EU", "US"]);
let mut subset = table.subset_rows(&matching);

// Normalize for Qlik compatibility before saving
subset.normalize();
write_qvd_file(&subset, "subset.qvd")?;
```
What `normalize()` does:
- Converts DualInt → Int, DualDouble → Double (removes redundant string representations)
- Uses Int for float values that are exact integers (like Qlik does)
- Sets NumberFormat: `INTEGER (###0)`, `REAL (14 decimals)`, `ASCII`
- Sets Tags: `$numeric`, `$integer`, `$ascii`, `$text`
- Reserves the NULL sentinel in BitWidth (`bits_needed(num_symbols + 1)`)
- Sorts BitOffsets by descending width (optimal packing)

`normalize()` is called automatically during Parquet/Arrow → QVD conversion. Call it manually only when modifying existing tables.
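For example, the NULL-sentinel rule in action (a minimal sketch; `bits_needed` here is illustrative, not the crate's internal helper):

```rust
// Bits required to index n distinct values (0..n-1).
fn bits_needed(n: u32) -> u32 {
    if n <= 1 { 0 } else { 32 - (n - 1).leading_zeros() }
}

fn main() {
    // 8 unique symbols alone need 3 bits (indices 0..7), but reserving the
    // NULL sentinel makes 9 representable values, so BitWidth becomes 4.
    assert_eq!(bits_needed(8), 3);
    assert_eq!(bits_needed(8 + 1), 4); // what normalize() writes
}
```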
### Streaming reader
```rust
use qvd::open_qvd_stream;

// Accessor names on the reader are illustrative.
let mut reader = open_qvd_stream("large.qvd")?;
println!("{:?}", reader.fields());
println!("{} rows total", reader.total_rows());

// Process in chunks of 64K rows
while let Some(chunk) = reader.next_chunk()? {
    // `chunk` is a QvdTable with up to 65,536 rows
}
```
### EXISTS() — O(1) lookup

Like Qlik's `EXISTS()` function — build an index of unique values and use it to filter another table.
```rust
use qvd::{read_qvd_file, write_qvd_file, ExistsIndex, filter_rows_by_exists_fast};

// Column names, values and index methods below are illustrative.
// Build index from a table column
let clients = read_qvd_file("clients.qvd")?;
let index = ExistsIndex::from_column(&clients, "ClientID").unwrap();

// O(1) lookup
assert!(index.exists("C-1001"));
println!("{} unique values", index.len());

// Filter another table
let facts = read_qvd_file("facts.qvd")?;
let col_idx = facts.column_index("ClientID").unwrap();
let matching_rows = filter_rows_by_exists_fast(&facts, col_idx, &index);
println!("{} matching rows", matching_rows.len());

// Create subset and save
let filtered = facts.subset_rows(&matching_rows);
write_qvd_file(&filtered, "filtered.qvd")?;
```
### Streaming EXISTS() — filtered read (recommended for large files)

For large QVD files, `read_filtered()` streams the index table and only loads matching rows into memory.
```rust
use qvd::{read_qvd_file, write_qvd_file, open_qvd_stream, ExistsIndex};

// Values, column names and exact signatures below are illustrative.
// Build index from explicit values
let index = ExistsIndex::from_values(&["7", "9"]);

// Or from another table
let clients = read_qvd_file("clients.qvd")?;
let index = ExistsIndex::from_column(&clients, "%Type_ID").unwrap();
drop(clients); // free memory before opening the large file

// Stream + filter + select columns
let mut stream = open_qvd_stream("large_table.qvd")?;
let filtered = stream.read_filtered("%Type_ID", &index,
    &["%Key_ID", "DateField_BK", "%Type_ID"])?;
println!("{} matching rows", filtered.row_count());

// Save as QVD or Parquet
write_qvd_file(&filtered, "result.qvd")?;
```
### DataFusion SQL
```rust
use datafusion::prelude::*;
use qvd::register_qvd;

// Sketch — the exact register_qvd signature is illustrative; see crate docs.
async fn query() -> Result<(), Box<dyn std::error::Error>> {
    let ctx = SessionContext::new();
    register_qvd(&ctx, "sales", "sales.qvd")?;
    ctx.sql("SELECT Region, SUM(Amount) FROM sales GROUP BY Region")
        .await?
        .show()
        .await?;
    Ok(())
}
```
### Build QVD from scratch
```rust
use qvd::{write_qvd_file, QvdTableBuilder};

// Column contents are illustrative; see docs for accepted value types.
let table = QvdTableBuilder::new("MyTable")
    .add_column("ID", vec![1, 2, 3])
    .add_column("Name", vec!["Alice", "Bob", "Carol"])
    .add_column("Amount", vec![10.5, 20.0, 30.25])
    .build();

write_qvd_file(&table, "built.qvd")?;
```
## Python Examples

### Read and write QVD
```python
import qvd  # module name assumed; attribute/method names illustrative

table = qvd.read_qvd("data.qvd")
print(table)          # QvdTable(table='data', rows=1000, cols=5)
print(table.columns)  # ['ID', 'Name', 'Region', 'Amount', 'Date']
print(table.rows)     # 1000
print(table.cols)     # 5
print(table.head())   # first 5 rows as formatted string

table.write_qvd("copy.qvd")
```
### Convert Parquet ↔ QVD
```python
import qvd  # module name assumed; function names illustrative

# Parquet → QVD
qvd.parquet_to_qvd("input.parquet", "output.qvd")

# QVD → Parquet
qvd.qvd_to_parquet("input.qvd", "output.parquet")

# Load Parquet as QvdTable, inspect, save
table = qvd.read_parquet("input.parquet")
print(table)
table.write_qvd("output.qvd")
```
### PyArrow
```python
import pyarrow as pa
import qvd  # module name assumed; function names illustrative

# QVD → PyArrow (zero-copy via Arrow C Data Interface)
batch = qvd.read_arrow("data.qvd")

# Or via QvdTable
table = qvd.read_qvd("data.qvd")
batch = table.to_arrow()

# PyArrow → QVD
qvd.from_arrow(batch).write_qvd("out.qvd")

# Any PyArrow RecordBatch works — from pandas, Polars, DuckDB, etc.
batch = pa.RecordBatch.from_pandas(df)  # df: an existing pandas DataFrame
```
### pandas
```python
import qvd  # module name assumed; function names illustrative

# QVD → pandas DataFrame
df = qvd.read_pandas("data.qvd")

# Or via QvdTable
df = qvd.read_qvd("data.qvd").to_pandas()

# pandas → QVD
qvd.from_pandas(df).write_qvd("out.qvd")
```
### Polars
```python
import qvd  # module name assumed; function names illustrative

# QVD → Polars DataFrame
df = qvd.read_polars("data.qvd")

# Or via QvdTable
df = qvd.read_qvd("data.qvd").to_polars()

# Polars → QVD
qvd.from_polars(df).write_qvd("out.qvd")
```
### EXISTS() — filter and subset
```python
import qvd  # module name assumed; method names illustrative

# Build index from a table column
clients = qvd.read_qvd("clients.qvd")
index = qvd.ExistsIndex.from_column(clients, "ClientID")

# O(1) lookup
print(index.exists("C-1001"))                   # True/False
print("C-1001" in index)                        # same
print(len(index))                               # unique values count
print(index.exists_many(["C-1001", "C-9999"]))  # [True, False]

# Filter another table
facts = qvd.read_qvd("facts.qvd")
rows = index.filter_rows(facts, "ClientID")

# Subset rows and save
facts.subset_rows(rows).write_qvd("filtered.qvd")
```
### EXISTS() from explicit values
```python
import qvd  # module name assumed; function/keyword names illustrative

# Build index from a list of values (like LOAD * INLINE in Qlik)
index = qvd.ExistsIndex.from_values([7, 9])

# Streaming filtered read — memory-efficient for large files
table = qvd.read_qvd_filtered("large_table.qvd", column="%Type_ID", index=index,
                              select=["%Key_ID", "DateField_BK", "%Type_ID"])
```
### Normalize for Qlik compatibility
```python
import qvd  # module name assumed; method names illustrative

table = qvd.read_qvd("data.qvd")

# Filter
rows = table.filter_by_values("Region", ["EU", "US"])
subset = table.subset_rows(rows)

# Normalize before saving — sets proper types, tags, format
subset.normalize()
subset.write_qvd("subset.qvd")
```
### DuckDB — register single file
```python
import duckdb
import qvd  # module name assumed; register function illustrative

con = duckdb.connect()

# Register QVD as a DuckDB table
qvd.duckdb_register(con, "sales", "sales.qvd")

# SQL queries
con.sql("SELECT Region, SUM(Amount) FROM sales GROUP BY Region").show()
```
### DuckDB — register multiple files
```python
import duckdb
import qvd  # module name assumed; register function illustrative

con = duckdb.connect()

# Register multiple QVD files
qvd.duckdb_register(con, "orders", "orders.qvd")
qvd.duckdb_register(con, "clients", "clients.qvd")

# JOIN across QVD tables
con.sql("""
    SELECT c.Name, COUNT(*) AS n
    FROM orders o JOIN clients c ON o.ClientID = c.ClientID
    GROUP BY c.Name
""").show()
```
### DuckDB — register folder
```python
import duckdb
import qvd  # module name assumed; function/keyword names illustrative

con = duckdb.connect()

# Register all QVD files from a folder (table name = file name without .qvd)
qvd.duckdb_register_folder(con, "data/")

# Register from multiple folders
qvd.duckdb_register_folder(con, ["data/", "archive/"])

# With glob pattern — only matching files
qvd.duckdb_register_folder(con, "data/", pattern="sales_*.qvd")

# Recursive scan of subdirectories
qvd.duckdb_register_folder(con, "data/", recursive=True)

# Skip large files (default 500 MB limit)
qvd.duckdb_register_folder(con, "data/", max_size_mb=1000)

# All options together
qvd.duckdb_register_folder(con, "data/", pattern="*.qvd",
                           recursive=True, max_size_mb=1000)

# Query any registered table
con.sql("SELECT COUNT(*) FROM sales").show()
```
### DuckDB — register with EXISTS() filter
```python
import duckdb
import qvd  # module name assumed; keyword names illustrative

con = duckdb.connect()

# Register with streaming filter — only matching rows loaded
index = qvd.ExistsIndex.from_values([7, 9])
qvd.duckdb_register(con, "filtered", "large_table.qvd",
                    filter_column="%Type_ID", exists=index)
```
### DuckDB — export results to QVD
```python
import duckdb
import qvd  # module name assumed; from_pandas helper illustrative

con = duckdb.connect()

# Query → pandas
df = con.sql("SELECT * FROM sales WHERE Amount > 100").df()

# Query → PyArrow
tbl = con.sql("SELECT Region, SUM(Amount) AS total FROM sales GROUP BY Region").arrow()

# Query result → QVD (aggregation, JOIN, filter — anything)
qvd.from_pandas(df).write_qvd("result.qvd")
```
### Database → QVD (PostgreSQL, MySQL, SQLite, Snowflake, etc.)
Any database that can return Arrow/pandas data can save to QVD. Dates, timestamps, integers, floats — all types are automatically converted.
```python
import pandas as pd
import qvd  # module name assumed; from_arrow/from_pandas helpers illustrative

# === PostgreSQL (via connectorx — fastest) ===
import connectorx as cx
tbl = cx.read_sql("postgresql://user:pass@host/db",
                  "SELECT * FROM sales", return_type="arrow")
# connectorx returns a PyArrow Table, convert to RecordBatch
batch = tbl.to_batches()[0]  # or combine if multiple batches
qvd.from_arrow(batch).write_qvd("sales.qvd")

# === PostgreSQL (via psycopg + pandas) ===
import psycopg
conn = psycopg.connect("dbname=db user=user")
df = pd.read_sql("SELECT * FROM sales", conn)
qvd.from_pandas(df).write_qvd("sales.qvd")

# === SQLite ===
import sqlite3
conn = sqlite3.connect("local.db")
df = pd.read_sql("SELECT * FROM sales", conn)
qvd.from_pandas(df).write_qvd("sales.qvd")

# === DuckDB (local or remote) ===
import duckdb
df = duckdb.sql("SELECT * FROM sales").df()
qvd.from_pandas(df).write_qvd("sales.qvd")

# === Snowflake (via snowflake-connector-python) ===
# pip install snowflake-connector-python[pandas]
import snowflake.connector
conn = snowflake.connector.connect(account="...", user="...", password="...")
cur = conn.cursor()
cur.execute("SELECT * FROM sales")
qvd.from_pandas(cur.fetch_pandas_all()).write_qvd("sales.qvd")

# === BigQuery ===
from google.cloud import bigquery
client = bigquery.Client()
df = client.query("SELECT * FROM dataset.sales").to_dataframe()
qvd.from_pandas(df).write_qvd("sales.qvd")

# === Any ADBC-compatible database ===
import adbc_driver_postgresql.dbapi as adbc
conn = adbc.connect("postgresql://user:pass@host/db")
cur = conn.cursor()
cur.execute("SELECT * FROM sales")
qvd.from_arrow(cur.fetch_arrow_table().to_batches()[0]).write_qvd("sales.qvd")
```
### CSV/Excel → QVD
```python
import pandas as pd
import qvd  # module name assumed; from_pandas helper illustrative

# CSV → QVD
df = pd.read_csv("data.csv")
qvd.from_pandas(df).write_qvd("data.qvd")

# Excel → QVD
df = pd.read_excel("report.xlsx", sheet_name="Sales")
qvd.from_pandas(df).write_qvd("sales.qvd")

# Multiple sheets → multiple QVDs
sheets = pd.read_excel("report.xlsx", sheet_name=None)
for name, sheet in sheets.items():
    qvd.from_pandas(sheet).write_qvd(f"{name}.qvd")
```
## CLI
Or via uvx (no install needed):
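```bash
# package/command names assumed
uvx qvd-cli inspect data.qvd
uvx qvd-cli convert data.qvd data.parquet
```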
### Convert between formats
```bash
# Commands below are illustrative — flag names may differ; see `qvd-cli --help`.

# Parquet → QVD
qvd-cli convert input.parquet output.qvd

# QVD → Parquet (default: snappy)
qvd-cli convert data.qvd data.parquet

# QVD → Parquet with compression
qvd-cli convert data.qvd data.parquet --compression zstd

# Rewrite QVD (re-generate from internal representation)
qvd-cli convert data.qvd rewritten.qvd

# Recompress Parquet
qvd-cli convert data.parquet recompressed.parquet --compression gzip
```
### Inspect QVD metadata
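Example output of an inspect run (command form assumed):

```bash
qvd-cli inspect data.qvd
```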
```text
File: data.qvd
Size: 41.3 MB
Table: SalesData
Rows: 465,810
Columns: 12
Created: 2024-01-15 10:30:00
Build: 50699
RecordSize: 89 bytes
Read time: 0.50s

Column        Symbols   BitWidth   Bias   FmtType   Tags
--------------------------------------------------------------------------------
OrderID        465810         20      0   INTEGER   $numeric, $integer
CustomerID      12500         14      0   INTEGER   $numeric, $integer
Region              5          3      0   ASCII     $ascii, $text
Amount         389201         19      0   REAL      $numeric
```
### Preview rows
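A sketch of the command (name assumed):

```bash
qvd-cli head data.qvd
```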
### Filter with EXISTS() (streaming)
```bash
# Flags below are illustrative — see `qvd-cli filter --help`.

# Filter by column values
qvd-cli filter large.qvd out.qvd --column %Type_ID --values 7,9

# Filter + select columns
qvd-cli filter large.qvd out.qvd --column %Type_ID --values 7,9 \
    --select %Key_ID,DateField_BK,%Type_ID

# Filter → Parquet
qvd-cli filter large.qvd out.parquet --column %Type_ID --values 7,9
```
### Show Arrow schema
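Command form assumed:

```bash
qvd-cli schema data.qvd
```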
```text
Arrow Schema for 'data.qvd':
  OrderID      Int64
  CustomerID   Int64
  Region       Utf8
  Amount       Float64 (nullable)
  OrderDate    Date32
```
## Architecture
```text
src/
├── lib.rs        — public API, re-exports
├── error.rs      — error types (QvdError, QvdResult)
├── header.rs     — XML header parser/writer
├── value.rs      — QVD data types (QvdSymbol, QvdValue)
├── symbol.rs     — symbol table binary reader/writer
├── index.rs      — index table bit-packed reader/writer
├── reader.rs     — QVD reader + normalize()
├── writer.rs     — QVD writer + QvdTableBuilder
├── exists.rs     — ExistsIndex + filter functions
├── streaming.rs  — streaming chunk-based reader with filtered reads
├── parquet.rs    — Parquet/Arrow ↔ QVD conversion (optional)
├── datafusion.rs — DataFusion TableProvider (optional)
├── python.rs     — PyO3 bindings (optional)
└── bin/qvd.rs    — CLI binary (optional)
```
## QVD file format
A QVD file consists of three sections:
- XML header — metadata: table name, field definitions (name, BitOffset, BitWidth, Bias, NumberFormat, Tags), record count
- Symbol tables — unique values per column, each encoded as Int (0x01), Double (0x02), Text (0x04), DualInt (0x05), or DualDouble (0x06). Dates are stored as DualDouble (Qlik serial number + formatted string)
- Index table — bit-packed rows, each row is `RecordByteSize` bytes. Fields are packed at their `BitOffset` with `BitWidth` bits. The stored value + `Bias` = symbol index. Index = `NoOfSymbols` means NULL
NumberFormat types: UNKNOWN, ASCII, INTEGER, REAL, FIX, MONEY, DATE, TIMESTAMP.
Tags: $numeric, $integer, $text, $ascii, $timestamp, $date, $key.
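A compact sketch of those rules (type-tag byte values taken from the list above; the helper function is illustrative, not the crate's internal code):

```rust
/// Symbol-table type tags from the format description above.
#[repr(u8)]
#[derive(Debug, Clone, Copy)]
enum SymbolType {
    Int = 0x01,
    Double = 0x02,
    Text = 0x04,
    DualInt = 0x05,    // integer + formatted string
    DualDouble = 0x06, // double + formatted string (dates)
}

/// Stored value + Bias = symbol index; index == NoOfSymbols means NULL.
fn symbol_index(stored: u32, bias: i64, no_of_symbols: u32) -> Option<u32> {
    let idx = (stored as i64 + bias) as u32;
    (idx != no_of_symbols).then_some(idx)
}
```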
## Feature Flags
| Feature | Dependencies | Description |
|---|---|---|
| (default) | none | Core QVD read/write |
| `parquet_support` | `arrow`, `parquet`, `chrono` | Parquet/Arrow conversion |
| `datafusion_support` | + `datafusion`, `tokio` | SQL queries via DataFusion |
| `cli` | + `clap` | CLI binary |
| `python` | + `pyo3`, `arrow`/pyarrow | Python bindings |
## Author
Stanislav Chernov (@bintocher)
## License
MIT — see LICENSE