block-mesh-bgm

block-mesh-bgm is a companion crate for block-mesh.

It provides a block_mesh::greedy_quads-style API backed by a binary-mask greedy meshing implementation designed for high performance and low overhead.

API At A Glance

binary_greedy_quads for the maximum-merge fast path
binary_greedy_quads_ao_safe when quad boundaries must stay compatible with per-vertex ambient occlusion

Goals

Match block_mesh::greedy_quads visible-face geometry for supported inputs
Stay close to block_mesh::greedy_quads quad counts while prioritizing speed
Reuse block_mesh public types (QuadBuffer, UnorientedQuad, OrientedBlockFace)
Avoid voxel remapping or intermediate buffer conversions
Keep AO-safe meshing close to the performance of the non-AO path

How It Works

The public API mirrors block_mesh::greedy_quads, but the internal representation is different.

Treat the queried extent as a padded box: the outer one-voxel shell is only used to determine face visibility
Build compact occupancy columns: each column is stored as a u64 bitmask representing voxels along one axis
Derive visible-face rows: face visibility becomes simple bitwise comparisons between adjacent columns
Greedily merge visible cells: merging only consults MergeVoxel::merge_value() when the masks indicate a merge is possible

The 62-voxel interior limit exists because each padded axis must fit inside a single u64: 62 interior voxels plus one shell voxel on each side use all 64 bits.
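As a rough illustration of the column layout and the 62-voxel arithmetic, here is a minimal sketch. It is not this crate's code: it assumes bit `i` of a `u64` is set when the voxel at position `i` along the column is solid, treats every solid voxel as opaque, and uses hypothetical names (`pack_column`, `visible_faces`, `visible_toward`, `INTERIOR_MASK`).

```rust
/// Bits in one padded column (hypothetical layout: bit i = voxel i along the axis).
pub const PADDED: u32 = u64::BITS; // 64
/// Interior voxels per axis: 64 bits minus one padding bit at each end.
pub const INTERIOR: u32 = PADDED - 2; // 62
/// Keeps interior bits 1..=62 and drops the two padding-shell bits.
pub const INTERIOR_MASK: u64 = !(1u64 | (1u64 << 63));

/// Packs one padded column into a u64; bit `i` is set when voxel `i` is solid.
pub fn pack_column(solid: &[bool]) -> u64 {
    assert!(solid.len() <= PADDED as usize, "column does not fit in a u64");
    solid
        .iter()
        .enumerate()
        .fold(0u64, |acc, (i, &s)| acc | ((s as u64) << i))
}

/// Faces along the column's own axis, found with one shift each.
/// +axis face: voxel i is solid and voxel i + 1 is empty.
/// -axis face: voxel i is solid and voxel i - 1 is empty.
/// Padding bits only serve as neighbours, so they are masked out of the result.
pub fn visible_faces(col: u64) -> (u64, u64) {
    let pos = col & !(col >> 1);
    let neg = col & !(col << 1);
    (pos & INTERIOR_MASK, neg & INTERIOR_MASK)
}

/// Faces toward a parallel neighbouring column (the other two axes):
/// exposed where this column is solid and the neighbour at the same depth is empty.
pub fn visible_toward(col: u64, neighbour: u64) -> u64 {
    col & !neighbour & INTERIOR_MASK
}
```

Shifting a column compares each voxel with its neighbour along the column's own axis, while faces on the other two axes compare a column against the parallel column next to it, which is all `visible_toward` does. Either way, one instruction handles up to 62 voxels at once.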

AO-Safe Meshing

binary_greedy_quads is the fast path: it does no AO-related work and merges as far as the masks allow.

binary_greedy_quads_ao_safe enforces merge boundaries compatible with ambient occlusion shading. It does not compute AO values for you; it only preserves the boundaries that AO shading depends on.

Key Idea

Instead of computing AO first and attaching signatures to each face:

This implementation derives AO-compatible merge constraints directly from binary occupancy
Specifically, it examines opaque-voxel occupancy in the exterior plane, the layer of voxels directly in front of each face
Using bitwise shifts and masks, it determines where merges would violate AO consistency
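A simplified, conservative version of this idea fits in a few bit operations. The sketch below only finds cells whose whole 3x3 exterior-plane neighbourhood is free of opaque voxels; every corner of such a cell is unoccluded, so rectangles made only of these cells can merge in any direction without changing AO. It is not the crate's classification (which also permits restricted row-only and column-only merges), and `ao_neutral_cells` and its one-`u64`-per-row layout are assumptions for illustration.

```rust
/// Hypothetical layout: `visible[r]` marks visible faces in row `r` of one face
/// plane; `exterior[r]` marks opaque voxels in the layer directly in front of it.
/// Returns, per row, the visible cells whose 3x3 exterior neighbourhood is empty.
/// Rows and columns outside the slice are treated as empty (a real mesher would
/// read them from the padding shell instead).
pub fn ao_neutral_cells(visible: &[u64], exterior: &[u64]) -> Vec<u64> {
    assert_eq!(visible.len(), exterior.len());
    let n = visible.len();
    (0..n)
        .map(|r| {
            // OR together the exterior rows above, at, and below this row.
            let above = if r > 0 { exterior[r - 1] } else { 0 };
            let below = if r + 1 < n { exterior[r + 1] } else { 0 };
            let rows = above | exterior[r] | below;
            // Spread every occluder one column left and right.
            let occluded = rows | (rows << 1) | (rows >> 1);
            // Keep visible cells with no occluder anywhere in their 3x3 window.
            visible[r] & !occluded
        })
        .collect()
}
```

A single opaque voxel in front of column 4 of row 0 knocks out columns 3 to 5 in rows 0 and 1, while row 2 stays fully AO-neutral. No AO values are computed anywhere; the constraint comes purely from occupancy.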

Merge Classification

From exterior-plane occupancy, each visible cell is classified into one of:

unit → must remain a single quad
horizontal → may merge only within its row
vertical → may merge only across rows (width = 1)
bidirectional → can use full greedy merging

This classification happens using whole-row bit operations before the hot merge loop.
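The whole-row classification can be pictured as four disjoint bit masks per row. In this minimal sketch, `row_ok` (AO allows merging within the row) and `col_ok` (AO allows merging across rows) are hypothetical inputs; deriving them from exterior-plane occupancy is the crate's job and is not reproduced here. `ClassMasks` and `classify_row` are illustrative names.

```rust
/// One bit mask per merge class for a single row (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct ClassMasks {
    pub unit: u64,
    pub horizontal: u64,
    pub vertical: u64,
    pub bidirectional: u64,
}

/// Splits a row of visible cells into the four classes with whole-row bit ops.
/// `row_ok`: cells allowed to merge within the row (hypothetical input).
/// `col_ok`: cells allowed to merge across rows (hypothetical input).
pub fn classify_row(visible: u64, row_ok: u64, col_ok: u64) -> ClassMasks {
    ClassMasks {
        unit: visible & !row_ok & !col_ok,
        horizontal: visible & row_ok & !col_ok,
        vertical: visible & !row_ok & col_ok,
        bidirectional: visible & row_ok & col_ok,
    }
}
```

The four masks are disjoint and their union is exactly `visible`, so each class can be handled by its own loop with no per-cell branching.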

Only the remaining bidirectional cells go through the full greedy carry merge.
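For cells that may merge in both directions, a common mask-based greedy merge finds the lowest run of set bits in a row with `trailing_zeros` and `trailing_ones`, then extends that run into the following rows while they contain the same bits. This is a generic sketch of binary greedy rectangle merging, not necessarily the crate's "greedy carry" merge, and it assumes every set bit has the same merge value (a real mesher would also compare `MergeVoxel::merge_value()` before extending). `greedy_rects` is a hypothetical name.

```rust
/// Greedily merges the set bits of `rows` (one u64 per row) into rectangles,
/// returned as (x, y, width, height). Consumed bits are cleared from `rows`.
pub fn greedy_rects(rows: &mut [u64]) -> Vec<(u32, u32, u32, u32)> {
    let mut out = Vec::new();
    for y in 0..rows.len() {
        while rows[y] != 0 {
            // Start of the lowest run and its length.
            let x = rows[y].trailing_zeros();
            let w = (rows[y] >> x).trailing_ones();
            // Mask covering bits x..x + w (avoid shifting by 64 for a full row).
            let run = if w == 64 { u64::MAX } else { ((1u64 << w) - 1) << x };
            rows[y] &= !run;
            // Extend downward while the next row contains the whole run.
            let mut h = 1;
            while y + h < rows.len() && (rows[y + h] & run) == run {
                rows[y + h] &= !run;
                h += 1;
            }
            out.push((x, y as u32, w, h as u32));
        }
    }
    out
}
```

Each row needs only a handful of bit instructions per emitted quad, which is why keeping the hot loop free of per-cell AO comparisons matters.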

What This Means

No AO signatures are computed or stored during meshing
No per-cell AO comparisons inside the merge loop
AO-safe constraints are enforced purely from topology

After meshing, AO values can still be computed per vertex in the usual way.
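The crate leaves per-vertex AO to the caller. One widely used formula, shown here as an assumption about what "the usual way" means, counts the two edge-adjacent occluders and the one diagonal occluder in the exterior plane around each face corner. `vertex_ao` is a hypothetical helper.

```rust
/// Classic corner-counting AO for one face vertex.
/// `side1`, `side2`: exterior-plane voxels sharing an edge with the corner.
/// `corner`: the diagonal exterior-plane voxel.
/// Returns 0 (darkest) to 3 (unoccluded).
pub fn vertex_ao(side1: bool, side2: bool, corner: bool) -> u8 {
    if side1 && side2 {
        // Both sides block the corner, so the diagonal cannot add light.
        0
    } else {
        // At most two occluders reach here, so this never underflows.
        3 - (side1 as u8 + side2 as u8 + corner as u8)
    }
}
```

Many renderers also choose each quad's triangle split from its four corner values to reduce interpolation artifacts; that step is independent of how the quads were merged.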

Why This Is Fast

Traditional AO-safe greedy meshing:

Computes AO per vertex or per face
Stores AO signatures
Compares signatures during merging

This implementation:

Uses bitwise row operations instead of per-cell AO computation
Removes AO as a data dependency during merging
Pushes most AO work outside the hot loop

In practice, this keeps AO-safe meshing much closer to the baseline fast path, and in some cases it even outperforms meshers that do no AO work at all (for example layered-caves-2x2x2 in the benchmark snapshot).

Reading the Source

The crate is split by stage:

src/lib.rs — public API and pipeline
src/context.rs — query validation and layout
src/prep.rs — occupancy columns and visibility masks
src/merge.rs — greedy merging
src/ao.rs — AO-safe merge logic derived from occupancy masks
src/face.rs — face orientation mapping

Limitations

Interior query extents are limited to 62 voxels per axis
Transparency semantics match block_mesh::VoxelVisibility
Operates directly on the caller’s voxel slice (no repacking)
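Callers sizing chunks can check the 62-voxel limit up front. The crate does its own validation (in src/context.rs); this hypothetical helper only mirrors the arithmetic, assuming that `min` and `max` are inclusive and that the outermost layer on each side is the padding shell, as in the example's `[0; 3]` to `[17; 3]` query on an 18x18x18 array.

```rust
/// Largest interior extent per axis (64 bits minus two padding bits).
pub const MAX_INTERIOR: u32 = 62;

/// Interior length of the inclusive range `min..=max` once the one-voxel shell
/// on each side is removed; `None` if the range is too small or inverted.
pub fn interior_len(min: u32, max: u32) -> Option<u32> {
    let padded = max.checked_add(1)?.checked_sub(min)?;
    padded.checked_sub(2)
}

/// True when every axis of the query fits the interior limit (hypothetical helper).
pub fn fits_interior_limit(min: [u32; 3], max: [u32; 3]) -> bool {
    (0..3).all(|i| matches!(interior_len(min[i], max[i]), Some(n) if n <= MAX_INTERIOR))
}
```

With this arithmetic, a 64-voxel padded chunk (`[0; 3]` to `[63; 3]`) is the largest query that fits, and the 18-voxel example chunk has a 16-voxel interior.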

Example

use block_mesh::ndshape::{ConstShape, ConstShape3u32};
use block_mesh::{
    MergeVoxel, Voxel, VoxelVisibility, RIGHT_HANDED_Y_UP_CONFIG,
};
use block_mesh_bgm::{
    binary_greedy_quads, binary_greedy_quads_ao_safe, BinaryGreedyQuadsBuffer,
};

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
struct BoolVoxel(bool);

const EMPTY: BoolVoxel = BoolVoxel(false);
const FULL: BoolVoxel = BoolVoxel(true);

impl Voxel for BoolVoxel {
    fn get_visibility(&self) -> VoxelVisibility {
        if *self == EMPTY {
            VoxelVisibility::Empty
        } else {
            VoxelVisibility::Opaque
        }
    }
}

impl MergeVoxel for BoolVoxel {
    type MergeValue = Self;

    fn merge_value(&self) -> Self::MergeValue {
        *self
    }
}

type ChunkShape = ConstShape3u32<18, 18, 18>;

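fn main() {
    // Fill an octant of a radius-15 sphere centered on the chunk's origin corner.
    let mut voxels = [EMPTY; ChunkShape::SIZE as usize];
    for i in 0..ChunkShape::SIZE {
        let [x, y, z] = ChunkShape::delinearize(i);
        voxels[i as usize] = if ((x * x + y * y + z * z) as f32).sqrt() < 15.0 {
            FULL
        } else {
            EMPTY
        };
    }

    let mut buffer = BinaryGreedyQuadsBuffer::new();

    // Query [0, 17] on every axis: the outer layer (0 and 17) is the padding
    // shell used only for visibility, leaving a 16x16x16 interior.
    binary_greedy_quads(
        &voxels,
        &ChunkShape {},
        [0; 3],
        [17; 3],
        &RIGHT_HANDED_Y_UP_CONFIG.faces,
        &mut buffer,
    );

    // Same inputs, but quad boundaries stay compatible with per-vertex AO.
    binary_greedy_quads_ao_safe(
        &voxels,
        &ChunkShape {},
        [0; 3],
        [17; 3],
        &RIGHT_HANDED_Y_UP_CONFIG.faces,
        &mut buffer,
    );
}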

Development

Release-quality changes should be validated with:

cargo test
cargo bench
cargo doc --no-deps
cargo check -p block-mesh-bgm-examples --examples
cargo package --allow-dirty --list

The benchmark suite always includes the main reference points:

visible_block_faces: the fast "one quad per visible face" baseline from block-mesh
greedy_quads: the upstream greedy implementation
binary_greedy_quads: this crate
binary_greedy_quads_ao_safe: the AO-safe path from this crate

That makes it easier to reason about where time is going: visible_block_faces is the speed target, while greedy_quads is the output-shape baseline.

Benchmark Snapshot

These are Criterion medians from a recent local run of cargo bench --bench bench on my machine. Treat them as relative comparisons between meshers, not as universal absolute timings.

Core Cases

Case	`visible_block_faces`	`greedy_quads`	`binary_greedy_quads`
`dense-sphere`	`40.262 µs`	`252.25 µs`	`33.284 µs`
`translucent-sphere`	`40.621 µs`	`260.87 µs`	`34.820 µs`
`translucent-shell-sphere`	`38.191 µs`	`259.11 µs`	`35.486 µs`
`layered-caves`	`69.536 µs`	`742.53 µs`	`73.930 µs`
`checkerboard`	`74.162 µs`	`656.34 µs`	`67.541 µs`
`partial-extent`	`36.081 µs`	`231.45 µs`	`42.878 µs`
`translucent-mix`	`81.532 µs`	`830.92 µs`	`119.75 µs`
`layered-caves-2x2x2`	`612.30 µs`	`4.1293 ms`	`558.68 µs`

AO-Safe Cases

Case	`visible_block_faces`	`binary_greedy_quads`	`binary_greedy_quads_ao_safe`
`dense-sphere-ao`	`40.569 µs`	`33.737 µs`	`44.427 µs`
`translucent-shell-sphere-ao`	`38.530 µs`	`36.091 µs`	`48.287 µs`
`layered-caves-2x2x2`	`612.30 µs`	`558.68 µs`	`432.73 µs`
`ao-boundary-stress`	`18.419 µs`	`19.301 µs`	`29.456 µs`
`ao-unit-patterns`	`18.234 µs`	`19.128 µs`	`29.429 µs`

Useful takeaways from that run:

binary_greedy_quads is faster than visible_block_faces on dense-sphere, translucent-sphere, translucent-shell-sphere, checkerboard, and layered-caves-2x2x2.
It is still very close on layered-caves, and the main remaining misses are partial-extent and the deliberately hostile translucent-mix case.
binary_greedy_quads remains much faster than upstream greedy_quads on every benchmark in the suite, from roughly 5.4x (partial-extent) to 10x (layered-caves) in this run.
binary_greedy_quads_ao_safe is slower than the vanilla path on the small AO-sensitive microbenchmarks, but on the real multi-chunk layered-caves-2x2x2 case it is currently faster than both visible_block_faces and vanilla binary.
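The speedup over greedy_quads can be recomputed from the Core Cases medians with a few lines of arithmetic. This sketch simply copies the table's numbers (with 4.1293 ms written as 4129.3 µs); `CORE`, `speedups`, and `min_speedup` are illustrative names, not part of the benchmark suite.

```rust
/// Core-case medians from the snapshot, in microseconds:
/// (case, greedy_quads, binary_greedy_quads).
pub const CORE: [(&str, f64, f64); 8] = [
    ("dense-sphere", 252.25, 33.284),
    ("translucent-sphere", 260.87, 34.820),
    ("translucent-shell-sphere", 259.11, 35.486),
    ("layered-caves", 742.53, 73.930),
    ("checkerboard", 656.34, 67.541),
    ("partial-extent", 231.45, 42.878),
    ("translucent-mix", 830.92, 119.75),
    ("layered-caves-2x2x2", 4129.3, 558.68),
];

/// Speedup of binary_greedy_quads over greedy_quads for each case.
pub fn speedups() -> Vec<(&'static str, f64)> {
    CORE.iter()
        .map(|&(name, greedy, binary)| (name, greedy / binary))
        .collect()
}

/// The smallest speedup in the table and the case it comes from.
pub fn min_speedup() -> (&'static str, f64) {
    speedups()
        .into_iter()
        .fold(("", f64::INFINITY), |best, cur| if cur.1 < best.1 { cur } else { best })
}
```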

Visual Examples

The workspace includes an examples_crate for visual inspection.

Build-check the examples with:

cargo check -p block-mesh-bgm-examples --examples

Run the side-by-side renderer with:

cargo run -p block-mesh-bgm-examples --example render

That example places visible_block_faces, greedy_quads, and binary_greedy_quads side by side and logs their quad counts. Press Space to toggle wireframe so you can switch between surface shading and quad layout. Press T to switch between the original striped opaque sphere and a solid-core sphere wrapped in translucent voxels. Press O to switch the translucent demo between AlphaToCoverage and camera-level OIT.

Run the ambient-occlusion viewer with:

cargo run -p block-mesh-bgm-examples --example ambient_occlusion

That example renders a single opaque shape (a sphere by default) with wireframe always enabled so you can inspect how AO-safe meshing changes the quad layout. Press U to toggle between binary greedy output and unit quads, S to toggle between an opaque sphere and an opaque torus, and A to toggle AO-safe merging together with AO vertex shading.

Run the bevy_voxel_world integration example with:

cargo run -p block-mesh-bgm-examples --example custom_meshing

That example is based on the bevy_voxel_world custom meshing demo, but swaps in this crate's binary greedy mesher. Press A to toggle ambient occlusion for the visible-faces path and AO-safe binary greedy meshing, so you can compare both shaded output and chunk timing inside the same world.

License

This crate follows the same dual-license model as block-mesh:

Apache License, Version 2.0, in LICENSE.Apache-2.0
MIT license, in LICENSE.MIT

You may choose either license.

block-mesh-bgm 0.2.1