docs.rs failed to build forge-filter-0.1.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
forge-filter
GPU filter+compact for Apple Silicon, built on Metal compute shaders. In the benchmarks below it runs 6.8x (ordered) to 10.1x (unordered) faster than Polars on a numeric WHERE clause over 16M u32 values.
use ;
let mut filter = new?;
let data: = .collect;
let result = filter.filter_u32?;
Benchmarks
Measured on Apple M4 Pro (20-core GPU, 48GB unified memory). Polars baseline: 5.8ms @ 16M u32.
filter_u32 @ 16M elements
| Mode | Time (50% selectivity) | Speedup vs Polars |
|---|---|---|
| Ordered | 848 us | 6.8x |
| Unordered | 574 us | 10.1x |
Selectivity sweep (ordered, 16M u32)
| Selectivity | Time | Mrows/s |
|---|---|---|
| 1% | 695 us | 23,022 |
| 10% | 735 us | 21,769 |
| 50% | 848 us | 18,868 |
| 90% | 935 us | 17,112 |
| 99% | 960 us | 16,667 |
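The Mrows/s and speedup columns follow directly from the measured times. A minimal sketch of that arithmetic, assuming "16M" means 16,000,000 elements (this reproduces the table's figures); the function names are hypothetical and not part of the crate:

```rust
/// Throughput in millions of rows per second for `rows` elements
/// processed in `micros` microseconds.
/// rows / (micros * 1e-6 s) / 1e6 simplifies to rows / micros.
fn mrows_per_sec(rows: u64, micros: f64) -> f64 {
    rows as f64 / micros
}

/// Speedup of a measured time over a baseline time (same unit for both).
fn speedup(baseline_us: f64, measured_us: f64) -> f64 {
    baseline_us / measured_us
}
```

For example, `mrows_per_sec(16_000_000, 695.0)` gives roughly 23,022, and `speedup(5800.0, 574.0)` gives roughly 10.1, matching the 1% selectivity row and the unordered row.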
Features
- 6 numeric types: u32, i32, f32, u64, i64, f64
- 7 predicates: `>`, `<`, `>=`, `<=`, `==`, `!=`, `BETWEEN`
- Compound predicates: AND/OR with automatic BETWEEN optimization
- Index output: get matching row indices for multi-column gather
- Unordered mode: roughly 50% higher throughput (574 us vs 848 us at 16M u32) via atomic scatter, for aggregation queries where output order does not matter
- Zero-copy: `FilterBuffer<T>` API for GPU-resident data pipelines
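One plausible reading of the "automatic BETWEEN optimization" is that a compound predicate such as `x >= lo AND x <= hi` is rewritten into a single range test, so each element is checked once. The sketch below illustrates that idea on the CPU. The `Pred` enum, `eval`, and `fuse` are hypothetical names for illustration only; the crate's real predicate types and rewrite rules are not shown in this README.

```rust
/// Hypothetical predicate type for illustration only; not the crate's API.
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    Ge(u32),
    Le(u32),
    Between(u32, u32), // inclusive on both ends
    And(Box<Pred>, Box<Pred>),
}

/// Evaluate a predicate against one value.
fn eval(p: &Pred, x: u32) -> bool {
    match p {
        Pred::Ge(lo) => x >= *lo,
        Pred::Le(hi) => x <= *hi,
        Pred::Between(lo, hi) => *lo <= x && x <= *hi,
        Pred::And(a, b) => eval(a, x) && eval(b, x),
    }
}

/// Rewrite `x >= lo AND x <= hi` (in either order) as one range test.
/// Any other AND is kept, with its operands fused recursively.
fn fuse(p: Pred) -> Pred {
    match p {
        Pred::And(a, b) => match (*a, *b) {
            (Pred::Ge(lo), Pred::Le(hi)) | (Pred::Le(hi), Pred::Ge(lo)) => {
                Pred::Between(lo, hi)
            }
            (a, b) => Pred::And(Box::new(fuse(a)), Box::new(fuse(b))),
        },
        other => other,
    }
}
```

The fused form gives the same result for every input, including the empty range where `lo > hi`.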
Requirements
- macOS with Apple Silicon (M1 or later)
- Metal 3.2 support
- Rust 1.70+
- Xcode Command Line Tools (for the `xcrun metal` shader compiler)
Usage
[dependencies]
forge-filter = "0.1"
Simple (slice in, Vec out)
use ;
let mut filter = new?;
let result = filter.filter_u32?;
Zero-copy (FilterBuffer)
let mut filter = new?;
let mut buf = filter.;
buf.copy_from_slice;
let result = filter.filter?;
let filtered = result.as_slice;
Index output
let indices_result = filter.filter_indices?;
let indices: & = indices_result.indices.unwrap;
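To illustrate how row indices drive a multi-column gather, here is a CPU-only sketch: filter on one column, then use the matching positions to pull values from another. `matching_indices` and `gather` are hypothetical helpers, not part of forge-filter, and the `u32` index type is an assumption.

```rust
/// CPU stand-in for an index-producing filter:
/// returns the row positions where `pred` holds, in ascending order.
fn matching_indices<T: Copy>(column: &[T], pred: impl Fn(T) -> bool) -> Vec<u32> {
    column
        .iter()
        .enumerate()
        .filter(|&(_, &v)| pred(v))
        .map(|(i, _)| i as u32)
        .collect()
}

/// Pull the rows named by `indices` out of another column of the same table.
/// Panics if an index is out of bounds for `column`.
fn gather<T: Copy>(column: &[T], indices: &[u32]) -> Vec<T> {
    indices.iter().map(|&i| column[i as usize]).collect()
}
```

With a `price` column of `[5, 50, 7, 80]` and the predicate `price > 10`, the indices are `[1, 3]`, and gathering a `qty` column of `[1.0, 2.0, 3.0, 4.0]` with them yields `[2.0, 4.0]`.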
Unordered (faster for aggregation)
let result = filter.filter_unordered?;
// Same elements as ordered, but in arbitrary order — 50% faster
Algorithm
Fused 3-dispatch pipeline within a single Metal command encoder:
- Predicate + Scan: evaluate the predicate per element, run a SIMD prefix sum, and write per-threadgroup (TG) totals
- Scan Partials: exclusive prefix sum of the TG totals (hierarchical for >16M elements)
- Scatter: re-evaluate the predicate, compute global write positions, and scatter matches to the output
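The three stages above can be sketched as a sequential CPU reference. This is only an illustration of the count, scan, and scatter structure, not the Metal implementation: the function name and the group size `TG` are hypothetical, and each loop stands in for work the GPU would do in parallel.

```rust
/// Elements per simulated threadgroup (hypothetical; real sizes are hardware-tuned).
const TG: usize = 4;

/// Order-preserving filter+compact in three passes.
fn filter_compact(input: &[u32], pred: impl Fn(u32) -> bool) -> Vec<u32> {
    // Pass 1: evaluate the predicate and record one match count per group.
    let tg_totals: Vec<usize> = input
        .chunks(TG)
        .map(|c| c.iter().filter(|&&v| pred(v)).count())
        .collect();

    // Pass 2: exclusive prefix sum of the totals gives each group's base offset.
    let mut bases = Vec::with_capacity(tg_totals.len());
    let mut running = 0usize;
    for &t in &tg_totals {
        bases.push(running);
        running += t;
    }

    // Pass 3: re-evaluate the predicate and scatter each match to
    // base offset + its rank within the group.
    let mut out = vec![0u32; running];
    for (chunk, &base) in input.chunks(TG).zip(&bases) {
        let mut local = 0usize;
        for &v in chunk {
            if pred(v) {
                out[base + local] = v;
                local += 1;
            }
        }
    }
    out
}
```

Re-evaluating the predicate in the last pass avoids storing a per-element flag array between passes, at the cost of computing the predicate twice.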
Unordered mode uses a single dispatch with SIMD-aggregated atomics.
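A rough CPU analogue of the unordered approach, using threads in place of SIMD groups: each worker collects its matches locally, reserves a contiguous output range with a single atomic add, and writes into it. This is a sketch under those assumptions, not the crate's shader; `filter_unordered_cpu` and its `workers` parameter are hypothetical.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};
use std::thread;

/// Returns the matching elements in arbitrary order.
fn filter_unordered_cpu(
    input: &[u32],
    pred: impl Fn(u32) -> bool + Sync,
    workers: usize,
) -> Vec<u32> {
    // Worst-case-sized output; atomics allow shared writes without unsafe code.
    let out: Vec<AtomicU32> = (0..input.len()).map(|_| AtomicU32::new(0)).collect();
    let cursor = AtomicUsize::new(0); // next free output slot
    let w = workers.max(1);
    let chunk_len = ((input.len() + w - 1) / w).max(1);

    thread::scope(|s| {
        for chunk in input.chunks(chunk_len) {
            let (out, cursor, pred) = (&out, &cursor, &pred);
            s.spawn(move || {
                // Aggregate locally so each worker issues one atomic add.
                let hits: Vec<u32> = chunk.iter().copied().filter(|&v| pred(v)).collect();
                let base = cursor.fetch_add(hits.len(), Ordering::Relaxed);
                for (i, v) in hits.into_iter().enumerate() {
                    out[base + i].store(v, Ordering::Relaxed);
                }
            });
        }
    });

    // All workers have joined here, so their writes are visible.
    let n = cursor.load(Ordering::Relaxed);
    out[..n].iter().map(|a| a.load(Ordering::Relaxed)).collect()
}
```

Because workers reserve ranges in whatever order they finish, the output order varies between runs, which is acceptable when the result feeds an order-insensitive aggregation such as a sum or count.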
License
Dual-licensed.
- Open source: AGPL-3.0 — free for open-source projects that comply with AGPL terms.
- Commercial: Proprietary license available for closed-source / commercial use. Contact kavanagh.patrick@gmail.com for pricing.