wblidar 0.1.0

High-performance library for reading and writing LiDAR point cloud data (LAS, LAZ, COPC, PLY, E57)

wblidar

wblidar is the core LiDAR I/O engine for the Whitebox project. It provides fast, standards-focused, pure-Rust read/write support for common point-cloud formats so Whitebox tools can reliably ingest and emit LiDAR data.

Mission

  • Provide robust LiDAR format I/O for Whitebox applications and workflows.
  • Keep codec logic in Rust with minimal external dependencies.
  • Prioritize standards compliance, interoperability, and predictable behavior.

The Whitebox Project

Whitebox is a collection of related open-source geospatial data analysis software. The Whitebox project began in 2009 at the University of Guelph, Canada, developed by Dr. John Lindsay, a professor of geomatics. Whitebox has long served as Dr. Lindsay's platform for disseminating the output of his geomatics research and has developed an extensive worldwide user base. In 2021, Dr. Lindsay and Anthony Francioni founded Whitebox Geospatial Inc. to ensure the sustainable, ongoing development of this open-source geospatial project. We are currently working on the next iteration of the Whitebox software, Whitebox Next Gen; this crate is part of that larger effort.

Is wblidar Only for Whitebox?

No. wblidar is developed primarily to support Whitebox, but it is not restricted to Whitebox projects.

  • Whitebox-first: API and roadmap decisions prioritize Whitebox I/O needs.
  • General-purpose: the crate is usable as a standalone LiDAR I/O engine in other Rust applications.
  • Interop-focused: standards-compliant LAS/LAZ/COPC/PLY/E57 support makes it suitable for broader tooling and data pipelines.

What wblidar Is Not

wblidar is an I/O and format layer. It is not intended to be a full LiDAR processing framework.

  • Not a filtering/classification framework.
  • Not a replacement for Whitebox analysis/processing tools.
  • Not a pipeline engine for arbitrary geospatial transformations.

Point-cloud processing, filtering, segmentation, and analysis belong in the Whitebox frontend/tooling layer.

Supported Formats

Format  Read  Write  Notes
LAS     yes   yes    LAS 1.1-1.5, PDRF 0-15
LAZ     yes   yes    Standards-compliant LASzip v2/v3 Point10/Point14 codecs
COPC    yes   yes    COPC 1.0 hierarchy with Point14-family payloads
PLY     yes   yes    ASCII, binary little-endian, binary big-endian
E57     yes   yes    ASTM E2807 with CRC-32 page validation

Design Goals

  • Standards first: prefer interoperable, standards-compliant encoding/decoding paths.
  • Pure Rust codecs: avoid native/C++ LASzip dependency by implementing core codecs in Rust.
  • Streaming I/O APIs: expose incremental read/write interfaces for large files.
  • Minimal dependencies: keep dependency surface tight and auditable.
  • Whitebox integration: maintain a stable API for Whitebox ingestion/export workflows.
  • Predictable behavior: deterministic output where applicable and explicit error modes.

Compilation Features

wblidar uses optional Cargo features for specific capabilities.

Feature        Default  Purpose
copc-http      no       Enables HTTP range fetching for COPC access (reqwest).
parallel       no       Convenience umbrella feature enabling all current parallel paths.
copc-parallel  no       Enables Rayon-based parallel work in COPC writing paths (node encoding/sorting thresholds).
laz-parallel   no       Enables optional parallel LAZ chunk decode paths where beneficial.

Example:

cargo build -p wblidar --features "parallel"

Use copc-parallel or laz-parallel individually when you want narrower benchmarking or regression isolation.

API Overview

wblidar exposes two main usage styles:

  • Low-level streaming APIs via format-specific readers/writers and PointReader / PointWriter traits.
  • Unified frontend API via PointCloud for format-agnostic workflows.

1) Stream LAS -> LAS

This example shows minimal-memory, record-by-record conversion between LAS files using the streaming reader/writer traits.

use std::fs::File;
use std::io::{BufReader, BufWriter};

use wblidar::{
    io::{PointReader, PointWriter},
    las::{LasReader, LasWriter, WriterConfig},
    PointRecord,
};

fn main() -> wblidar::Result<()> {
    let input = BufReader::new(File::open("input.las")?);
    let mut reader = LasReader::new(input)?;

    let output = BufWriter::new(File::create("output.las")?);
    let mut writer = LasWriter::new(output, WriterConfig::default())?;

    let mut p = PointRecord::default();
    while reader.read_point(&mut p)? {
        writer.write_point(&p)?;
    }
    writer.finish()?;
    Ok(())
}

2) Format-Agnostic Read/Write

This example shows the high-level PointCloud API auto-detecting input format and writing multiple output formats.

use wblidar::{LidarFormat, PointCloud};

fn main() -> wblidar::Result<()> {
    let cloud = PointCloud::read("input.laz")?;

    cloud.write("out.copc.laz")?;
    cloud.write("out.ply")?;

    // Force output format regardless of extension.
    cloud.write_as("out.data", LidarFormat::E57)?;
    Ok(())
}

3) Read With Diagnostics

This example shows ingest diagnostics for observability, including partial Point14 recovery counters.

use wblidar::read_with_diagnostics;

fn main() -> wblidar::Result<()> {
    let (cloud, diag) = read_with_diagnostics("input.copc.laz")?;
    println!("points: {}", cloud.points.len());

    if diag.point14_partial_events > 0 {
        println!(
            "partial Point14 recovery: events={} decoded/expected={}/{}",
            diag.point14_partial_events,
            diag.point14_partial_decoded_points,
            diag.point14_partial_expected_points
        );
    }
    Ok(())
}

4) Reproject a PointCloud

This example shows a straightforward end-to-end reprojection workflow using PointCloud convenience methods.

use wblidar::PointCloud;

fn main() -> wblidar::Result<()> {
    let mut cloud = PointCloud::read("input.las")?;
    cloud.reproject_in_place_to_epsg(3857)?;
    cloud.write("output_3857.laz")?;
    Ok(())
}

5) Write COPC with Explicit Spatial Ordering

This example shows COPC writing with explicit root geometry and node point ordering configuration.

use std::fs::File;
use std::io::BufWriter;

use wblidar::{
    copc::{CopcNodePointOrdering, CopcWriter, CopcWriterConfig},
    io::PointWriter,
    PointRecord,
};

fn main() -> wblidar::Result<()> {
    let out = BufWriter::new(File::create("out.copc.laz")?);
    let cfg = CopcWriterConfig {
        center_x: 500000.0,
        center_y: 6000000.0,
        center_z: 100.0,
        halfsize: 500.0,
        spacing: 5.0,
        node_point_ordering: CopcNodePointOrdering::Auto,
        ..CopcWriterConfig::default()
    };

    let mut writer = CopcWriter::new(out, cfg);
    writer.write_point(&PointRecord::default())?;
    writer.finish()?;
    Ok(())
}

6) Optional Parallel LAZ Decode (Feature-Gated)

This example shows feature-gated parallel LAZ decode for high-volume workloads where chunk-level parallelism can improve throughput.

// Requires Cargo feature: parallel or laz-parallel
use std::fs::File;
use std::io::BufReader;

use wblidar::laz::reader::LazReader;

fn main() -> wblidar::Result<()> {
    let input = BufReader::new(File::open("input.laz")?);
    let mut reader = LazReader::new(input)?;

    #[cfg(any(feature = "parallel", feature = "laz-parallel"))]
    {
        let points = reader.read_all_points_parallel()?;
        println!("decoded points: {}", points.len());
    }

    Ok(())
}

Architecture

At a high level:

  1. Common model: PointRecord is the central in-memory point representation.
  2. Traits: PointReader and PointWriter provide streaming semantics.
  3. Format modules: las, laz, copc, ply, e57 encapsulate format-specific details.
  4. Frontend: PointCloud and helper functions provide a unified API for common workflows.
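The trait layering above can be sketched with self-contained stand-ins. The types below are simplified, illustrative definitions (toy field sets and std::io error types), not the crate's real signatures; they show how any format module can plug into the shared traits.

```rust
// Illustrative stand-ins for the real wblidar types; the field set and
// error type here are simplified assumptions, not the crate's actual API.
#[derive(Default, Clone, Debug, PartialEq)]
struct PointRecord {
    x: f64,
    y: f64,
    z: f64,
}

trait PointReader {
    // Fills `p` and returns Ok(true), or Ok(false) at end of stream.
    fn read_point(&mut self, p: &mut PointRecord) -> std::io::Result<bool>;
}

trait PointWriter {
    fn write_point(&mut self, p: &PointRecord) -> std::io::Result<()>;
    fn finish(&mut self) -> std::io::Result<()>;
}

// Format-agnostic conversion: any reader feeds any writer through the
// shared PointRecord model, which is the layering described above.
fn convert(r: &mut impl PointReader, w: &mut impl PointWriter) -> std::io::Result<u64> {
    let mut p = PointRecord::default();
    let mut n = 0;
    while r.read_point(&mut p)? {
        w.write_point(&p)?;
        n += 1;
    }
    w.finish()?;
    Ok(n)
}

// In-memory implementations so the sketch runs without any file format.
struct VecReader(std::vec::IntoIter<PointRecord>);
impl PointReader for VecReader {
    fn read_point(&mut self, p: &mut PointRecord) -> std::io::Result<bool> {
        match self.0.next() {
            Some(rec) => {
                *p = rec;
                Ok(true)
            }
            None => Ok(false),
        }
    }
}

struct VecWriter(Vec<PointRecord>);
impl PointWriter for VecWriter {
    fn write_point(&mut self, p: &PointRecord) -> std::io::Result<()> {
        self.0.push(p.clone());
        Ok(())
    }
    fn finish(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}

fn main() -> std::io::Result<()> {
    let pts = vec![PointRecord { x: 1.0, y: 2.0, z: 3.0 }, PointRecord::default()];
    let mut r = VecReader(pts.into_iter());
    let mut w = VecWriter(Vec::new());
    let n = convert(&mut r, &mut w)?;
    println!("copied {} points", n);
    Ok(())
}
```

In the real crate, LasReader/LazWriter and friends play the roles of VecReader/VecWriter here: the conversion loop never needs to know which format sits on either side.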

Format notes:

  • LAS: direct structured read/write with VLR/CRS support.
  • LAZ: in-house LASzip-compatible codecs for Point10/Point14 families.
  • COPC: LAZ-backed octree hierarchy with COPC metadata/hierarchy pages.
  • PLY: ASCII and binary interchange for general point cloud exchange.
  • E57: standards-oriented reader/writer with integrity checks.

Performance Notes

  • wblidar uses SIMD in hot numeric paths where safe and beneficial.
  • Optional parallelism is feature-gated and thresholded to avoid regressions on small jobs.
  • Streaming APIs are the default path for low-memory workflows.
  • Some decode/encode paths intentionally trade memory for correctness and interoperability.

Point14 compression_level Behavior

LazWriterConfig::compression_level is honored for Point14-family LAZ writes. It tunes the effective chunk target size used during encoding:

  • Lower levels favor smaller chunks (often faster writes, sometimes slightly larger files).
  • Higher levels favor larger chunks (often slightly better compression, potentially more memory/latency per flush).

Current mapping (base chunk_size = configured chunk_size):

Level  Effective chunk target
0      chunk_size / 2
1      2 * chunk_size / 3
2      3 * chunk_size / 4
3-6    chunk_size
7      5 * chunk_size / 4
8      3 * chunk_size / 2
9      2 * chunk_size

Notes:

  • This behavior currently applies to Point14-family LAZ writes.
  • Point10 paths continue to use the configured chunk size directly.
  • COPC compression_level remains independent of this LAZ chunk-size tuning.
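The mapping above is pure integer arithmetic, so it can be written out directly. This is a sketch; the function name is illustrative, not a wblidar API.

```rust
// Level-to-effective-chunk-target mapping from the table above.
// `effective_chunk_target` is an illustrative name, not the crate's API.
fn effective_chunk_target(level: u8, chunk_size: u32) -> u32 {
    match level {
        0 => chunk_size / 2,
        1 => 2 * chunk_size / 3,
        2 => 3 * chunk_size / 4,
        3..=6 => chunk_size,
        7 => 5 * chunk_size / 4,
        8 => 3 * chunk_size / 2,
        _ => 2 * chunk_size, // level 9 (values above 9 treated the same here)
    }
}

fn main() {
    // With a 50_000-point base chunk (LASzip's customary default):
    for level in [0u8, 2, 5, 9] {
        println!("level {} -> {}", level, effective_chunk_target(level, 50_000));
    }
}
```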

Useful environment knobs:

  • WBLIDAR_COPC_PARALLEL_MIN_NODES (default: 16, requires parallel or copc-parallel): Minimum number of COPC nodes required before parallel node encoding is considered. Effective threshold is max(WBLIDAR_COPC_PARALLEL_MIN_NODES, 2 * rayon_thread_count). Increase to reduce thread overhead on smaller jobs; decrease to parallelize sooner.
  • WBLIDAR_COPC_PARALLEL_MIN_POINTS (default: 400000, requires parallel or copc-parallel): Minimum total points across candidate COPC nodes before parallel node encoding is used. Increase to keep more workloads serial; decrease to enable parallel encoding for smaller datasets.
  • WBLIDAR_COPC_PARALLEL_SORT_MIN_POINTS (default: 80000, requires parallel or copc-parallel): Minimum per-node point count before Morton/Hilbert code sorting switches to parallel sort. Increase to favor serial sort on medium nodes; decrease to parallelize sort earlier.
  • WBLIDAR_LAZ_PARALLEL_MIN_CHUNKS (default: 4, requires parallel or laz-parallel): Minimum non-empty LAZ chunks required before read_all_points_parallel() uses parallel decode. Increase to avoid parallel overhead on files with few chunks; decrease to parallelize earlier.
  • WBLIDAR_LAZ_PARALLEL_MIN_POINTS (default: 200000, requires parallel or laz-parallel): Minimum total points required before read_all_points_parallel() uses parallel decode. Increase to keep more files on serial fallback; decrease to use parallel decode more aggressively.
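As a sketch of how the COPC node knob combines with thread count (helper names here are illustrative, not the crate's internals; the documented rule is effective threshold = max(WBLIDAR_COPC_PARALLEL_MIN_NODES, 2 * rayon_thread_count)):

```rust
use std::env;

// Illustrative helper, not a wblidar internal: read a numeric knob from
// the environment, falling back to the documented default.
fn env_knob(name: &str, default: usize) -> usize {
    env::var(name).ok().and_then(|v| v.parse().ok()).unwrap_or(default)
}

// Documented COPC rule: parallel node encoding is considered only once the
// node count reaches max(WBLIDAR_COPC_PARALLEL_MIN_NODES, 2 * threads).
fn copc_parallel_nodes_ok(node_count: usize, rayon_threads: usize) -> bool {
    let min_nodes = env_knob("WBLIDAR_COPC_PARALLEL_MIN_NODES", 16);
    node_count >= min_nodes.max(2 * rayon_threads)
}

fn main() {
    // With the default of 16 and 8 Rayon threads, the effective
    // threshold is max(16, 16) = 16 nodes.
    println!("16 nodes, 8 threads  -> {}", copc_parallel_nodes_ok(16, 8));
    // With 12 threads, the thread term dominates: max(16, 24) = 24 nodes.
    println!("16 nodes, 12 threads -> {}", copc_parallel_nodes_ok(16, 12));
}
```

Raising the knob above 2 * thread count is what makes "increase to reduce thread overhead on smaller jobs" effective; below that, the thread term dominates.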

Using the Environment Knobs

Set knobs inline for a single command:

WBLIDAR_COPC_PARALLEL_MIN_NODES=24 \
WBLIDAR_COPC_PARALLEL_MIN_POINTS=600000 \
WBLIDAR_COPC_PARALLEL_SORT_MIN_POINTS=120000 \
cargo run -p wblidar --features "copc-parallel" --example copc_parity_benchmark_csv -- input.las out_prefix

For normal builds, prefer --features "parallel"; keep copc-parallel for COPC-only benchmarking or regression isolation.

Set LAZ knobs for a single parallel-decode run:

WBLIDAR_LAZ_PARALLEL_MIN_CHUNKS=8 \
WBLIDAR_LAZ_PARALLEL_MIN_POINTS=500000 \
cargo run -p wblidar --features "laz-parallel" --example laz_parallel_parity_benchmark -- input.laz /tmp/laz_bench

For normal builds, prefer --features "parallel"; keep laz-parallel for LAZ-only benchmarking or regression isolation.

Export knobs for the current shell session:

export WBLIDAR_COPC_PARALLEL_MIN_NODES=16
export WBLIDAR_COPC_PARALLEL_MIN_POINTS=400000
export WBLIDAR_COPC_PARALLEL_SORT_MIN_POINTS=80000
export WBLIDAR_LAZ_PARALLEL_MIN_CHUNKS=4
export WBLIDAR_LAZ_PARALLEL_MIN_POINTS=200000

Quick starting presets:

Preset                   COPC Min Nodes  COPC Min Points  COPC Sort Min Points  LAZ Min Chunks  LAZ Min Points  When to Use
Conservative             32              1000000          160000                12              1000000         Prioritize predictable serial behavior on mixed or smaller jobs
Balanced (default-like)  16              400000           80000                 4               200000          Good first choice for most workloads
Aggressive               8               150000           40000                 2               100000          Favor parallelism earlier on large multi-core systems

Notes:

  • Knobs are read once per process startup; restart your process to apply changed values.
  • Knobs only affect feature-enabled code paths (parallel, copc-parallel, and laz-parallel).

Known Limitations

  • wblidar focuses on I/O and format correctness, not higher-level LiDAR processing algorithms.
  • COPC payloads are Point14-family; some LAS 1.5-specific fields are promoted or omitted when mapping to COPC-compatible formats.
  • Legacy wb-native LAZ DEFLATE paths are intentionally out of scope; standards-compliant LASzip paths are used instead.
  • Some Point14-heavy paths can require substantial memory because layered decode/encode may materialize large in-memory buffers.
  • COPC writing is batch-oriented; appending incremental updates to an existing COPC file is not currently supported.
  • COPC node ordering is configurable (Auto, Morton, Hilbert) but not yet auto-tuned per dataset.
  • Partial Point14 handling defaults to lenient recovery; strict failure mode is opt-in via WBLIDAR_FAIL_ON_PARTIAL_POINT14.
  • LAZ parallel decode tuning knobs apply to read_all_points_parallel(); regular streaming read_point() remains serial.
  • External interoperability validation is strong but still benefits from broader real-world fixture coverage across toolchains.
  • Some advanced paths are feature-gated (copc-http, parallel, and the granular parallel modes) and are not enabled by default.
  • Performance characteristics vary by file structure (for example, chunking strategy can limit parallel speedups on some LAZ datasets).

Validation and Interoperability

Internal validation checklists and QA procedures are maintained in docs/internal/. These cover external interoperability workflows (PDAL, LAStools, validate.copc.io) and are intended for maintainers rather than library users.

License

Licensed under either the Apache License, Version 2.0 or the MIT License, at your option.