Crate oxiblas_core


OxiBLAS Core - Foundational types and traits for OxiBLAS.

This crate provides the core infrastructure for OxiBLAS:

  • Scalar traits: Scalar, Real, ComplexScalar, Field for numeric types
  • SIMD abstraction: Custom SIMD layer via core::arch intrinsics
  • Memory management: Aligned allocation, stack-based temporaries
  • Parallelization: Work partitioning and parallel execution
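
As an illustration of the work-partitioning idea listed above, the sketch below splits an index range into nearly equal contiguous chunks, one per worker. It uses only the standard library. The helper name `split_evenly` and its signature are assumptions made for this example; they are not the crate's `partition_work` API, whose exact behavior is not documented here.

```rust
use std::ops::Range;

/// Hypothetical helper (not the crate's `partition_work`): split `0..n`
/// into at most `parts` contiguous ranges whose lengths differ by at most one.
fn split_evenly(n: usize, parts: usize) -> Vec<Range<usize>> {
    if n == 0 || parts == 0 {
        return Vec::new();
    }
    // Never hand out empty ranges: use at most one part per element.
    let parts = parts.min(n);
    let base = n / parts;
    // The first `extra` ranges each take one additional element.
    let extra = n % parts;

    let mut out = Vec::with_capacity(parts);
    let mut start = 0;
    for i in 0..parts {
        let len = base + usize::from(i < extra);
        out.push(start..start + len);
        start += len;
    }
    out
}
```

Calling `split_evenly(10, 3)` yields the ranges `0..4`, `4..7`, and `7..10`. Each range could then be handed to a separate thread or processed sequentially.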

Supported Types

  • f32, f64: Real floating-point numbers
  • Complex32, Complex64: Complex numbers (via num-complex)
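
To show why a shared scalar trait is useful across real and complex types, here is a minimal sketch using only the standard library. The trait `AbsSq` and the struct `Cplx` are hypothetical stand-ins invented for this example; the crate itself exposes `Scalar`, `Real`, `ComplexScalar`, and `Field`, and takes its complex types from num-complex.

```rust
/// Hypothetical stand-in for a complex number (the crate uses num-complex).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Cplx {
    re: f64,
    im: f64,
}

/// Hypothetical trait sketching the idea of one interface for all scalars.
trait AbsSq {
    /// The real type that magnitudes are expressed in.
    type Real;
    /// Squared magnitude: x^2 for reals, re^2 + im^2 for complex values.
    fn abs_sq(self) -> Self::Real;
}

impl AbsSq for f32 {
    type Real = f32;
    fn abs_sq(self) -> f32 {
        self * self
    }
}

impl AbsSq for f64 {
    type Real = f64;
    fn abs_sq(self) -> f64 {
        self * self
    }
}

impl AbsSq for Cplx {
    type Real = f64;
    fn abs_sq(self) -> f64 {
        self.re * self.re + self.im * self.im
    }
}

/// Generic code written once works for every implementing type.
fn sum_abs_sq<T: AbsSq<Real = f64> + Copy>(xs: &[T]) -> f64 {
    xs.iter().map(|&x| x.abs_sq()).sum()
}
```

With these definitions, `sum_abs_sq(&[3.0_f64, 4.0])` and `sum_abs_sq(&[Cplx { re: 3.0, im: 4.0 }])` both evaluate to `25.0`.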

SIMD Support

The SIMD abstraction automatically detects and uses the best available instruction set:

  • x86_64: AVX2 (256-bit), AVX512F (512-bit)
  • AArch64: NEON (128-bit), with 256-bit emulation
  • Fallback: Scalar operations for unsupported platforms
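
Runtime detection of this kind can be sketched with the standard library's feature-detection macros. The enum `Level` and the functions `detect_level` and `lanes_f64` below are assumptions made for illustration; the crate's own `SimdLevel` and `detect_simd_level` may be structured differently.

```rust
/// Hypothetical level enum, ordered from least to most capable.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Scalar,
    Neon,
    Avx2,
    Avx512F,
}

/// Probe the running CPU and return the widest level it supports.
fn detect_level() -> Level {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return Level::Avx512F;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return Level::Avx2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Level::Neon;
        }
    }
    // Fallback for unsupported platforms or older CPUs.
    Level::Scalar
}

/// Number of f64 values that fit in one native register at each level,
/// using the register widths listed above (128, 256, 512 bits).
fn lanes_f64(level: Level) -> usize {
    match level {
        Level::Scalar => 1,
        Level::Neon => 2,
        Level::Avx2 => 4,
        Level::Avx512F => 8,
    }
}
```

A kernel would typically call `detect_level()` once, cache the result, and branch on it to select an implementation.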

Example

use oxiblas_core::scalar::{Scalar, Field};
use oxiblas_core::simd::detect_simd_level;

// Check SIMD capability
let level = detect_simd_level();
println!("SIMD level: {:?}", level);

// Use scalar traits
let x: f64 = 3.0;
let y: f64 = 4.0;
assert_eq!(x.abs_sq() + y.abs_sq(), 25.0);

Re-exports

pub use blocking::BASE_CASE_THRESHOLD;
pub use blocking::BlockRange;
pub use blocking::BlockVisitor;
pub use blocking::MAX_BLOCK_SIZE;
pub use blocking::MIN_BLOCK_SIZE;
pub use blocking::RecursiveTask;
pub use blocking::cache_oblivious_traverse;
pub use blocking::factorization_panel_width;
pub use blocking::gemm_block_sizes;
pub use blocking::morton_decode;
pub use blocking::morton_index;
pub use blocking::trsm_block_size;
pub use memory::AlignedPool;
pub use memory::AlignedVec;
pub use memory::Alloc;
pub use memory::CACHE_LINE_SIZE;
pub use memory::DEFAULT_ALIGN;
pub use memory::Global;
pub use memory::MemStack;
pub use memory::MemoryPool;
pub use memory::NumaAllocHint;
pub use memory::NumaInterleavingStrategy;
pub use memory::NumaTopology;
pub use memory::NumaWorkHint;
pub use memory::PrefetchDistance;
pub use memory::PrefetchLocality;
pub use memory::StackReq;
pub use memory::get_huge_page_size;
pub use memory::get_page_size;
pub use memory::numa_alloc;
pub use memory::numa_alloc_zeroed;
pub use memory::numa_distribute_work;
pub use memory::prefetch_read;
pub use memory::prefetch_read_range;
pub use memory::prefetch_write;
pub use memory::prefetch_write_range;
pub use parallel::Par;
pub use parallel::ParThreshold;
pub use parallel::PoolScope;
pub use parallel::SequentialPool;
pub use parallel::ThreadPool;
pub use parallel::WorkRange;
pub use parallel::default_pool;
pub use parallel::for_each_indexed;
pub use parallel::for_each_range;
pub use parallel::map_reduce;
pub use parallel::partition_work;
pub use parallel::with_default_pool;
pub use scalar::C32;
pub use scalar::C64;
pub use scalar::ComplexExt;
pub use scalar::ComplexScalar;
pub use scalar::ExtendedPrecision;
pub use scalar::Field;
pub use scalar::HasFastFma;
pub use scalar::I32;
pub use scalar::I64;
pub use scalar::KBKSum;
pub use scalar::KahanSum;
pub use scalar::Real;
pub use scalar::Scalar;
pub use scalar::ScalarBatch;
pub use scalar::ScalarClass;
pub use scalar::ScalarClassify;
pub use scalar::SimdCompatible;
pub use scalar::ToComplex;
pub use scalar::UnrollHints;
pub use scalar::c32;
pub use scalar::c64;
pub use scalar::from_polar;
pub use scalar::from_polar32;
pub use scalar::imag;
pub use scalar::imag_unit;
pub use scalar::imag_unit32;
pub use scalar::imag32;
pub use scalar::pairwise_sum;
pub use scalar::real;
pub use scalar::real32;
pub use simd::SimdChunks;
pub use simd::SimdLevel;
pub use simd::SimdRegister;
pub use simd::SimdScalar;
pub use simd::detect_simd_level;
pub use simd::detect_simd_level_raw;
pub use tuning::AutoTuner;
pub use tuning::TuningCache;
pub use tuning::TuningConfig;
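
The re-exports above include `KahanSum`, whose name refers to compensated summation. As background, the sketch below shows the classic Kahan algorithm next to a naive sum. The function names are hypothetical and this is not the crate's implementation, which may differ in interface and detail.

```rust
/// Plain left-to-right summation, for comparison.
fn naive_sum(xs: &[f64]) -> f64 {
    let mut sum = 0.0;
    for &x in xs {
        sum += x;
    }
    sum
}

/// Classic Kahan compensated summation (hypothetical helper).
/// `c` accumulates the low-order bits lost in each addition.
fn kahan_sum(xs: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut c = 0.0;
    for &x in xs {
        let y = x - c; // apply the correction carried from the previous step
        let t = sum + y; // low-order bits of `y` may be lost here
        c = (t - sum) - y; // recover what was lost (negated)
        sum = t;
    }
    sum
}
```

For an input such as `1.0` followed by ten copies of `1e-16`, the naive sum stays at exactly `1.0` because each small term is rounded away, while the compensated sum retains their combined contribution.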

Modules

blocking
Cache-oblivious blocking utilities.
memory
Memory management utilities for OxiBLAS.
parallel
Parallelization primitives for OxiBLAS.
prelude
Prelude module for convenient imports.
scalar
Scalar traits for numeric types used in OxiBLAS.
simd
SIMD abstraction layer for OxiBLAS.
tuning
Auto-tuning utilities for optimal block sizes and algorithm selection.
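
The blocking module re-exports `morton_index` and `morton_decode`. As background on the underlying idea, the sketch below interleaves the bits of a row and column index into a single Z-order (Morton) index, so that blocks close together in 2D tend to be close together in memory. The function names and the 16-bit coordinate limit are assumptions for this example, not the crate's signatures.

```rust
/// Spread the low 16 bits of `v` into the even bit positions of a u32.
fn spread_bits(v: u32) -> u32 {
    let mut x = v & 0x0000_FFFF;
    x = (x | (x << 8)) & 0x00FF_00FF;
    x = (x | (x << 4)) & 0x0F0F_0F0F;
    x = (x | (x << 2)) & 0x3333_3333;
    x = (x | (x << 1)) & 0x5555_5555;
    x
}

/// Inverse of `spread_bits`: gather the even bit positions into the low 16 bits.
fn compact_bits(v: u32) -> u32 {
    let mut x = v & 0x5555_5555;
    x = (x | (x >> 1)) & 0x3333_3333;
    x = (x | (x >> 2)) & 0x0F0F_0F0F;
    x = (x | (x >> 4)) & 0x00FF_00FF;
    x = (x | (x >> 8)) & 0x0000_FFFF;
    x
}

/// Hypothetical encoder: column bits go to even positions, row bits to odd.
/// Coordinates are assumed to fit in 16 bits.
fn morton_encode_2d(row: u32, col: u32) -> u32 {
    spread_bits(col) | (spread_bits(row) << 1)
}

/// Hypothetical decoder: returns `(row, col)`.
fn morton_decode_2d(index: u32) -> (u32, u32) {
    (compact_bits(index >> 1), compact_bits(index))
}
```

Under this convention the 2x2 block at the origin is visited in the order (0,0), (0,1), (1,0), (1,1), which corresponds to indices 0 through 3.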

Macros

simd_dispatch
Dispatch macro for runtime SIMD level selection.
stack_req_all
Combines multiple stack requirements (all must be satisfied).
stack_req_any
Takes the maximum of multiple stack requirements.
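
The difference between the two stack-requirement macros can be sketched with a small size-and-alignment pair. The type `Req` and its methods `and` and `or` are hypothetical and written only from the one-line descriptions above; the crate's `StackReq` and the exact arithmetic used by `stack_req_all` and `stack_req_any` may differ.

```rust
/// Hypothetical scratch-memory requirement: a byte size and an alignment.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Req {
    size: usize,
    align: usize,
}

/// Round `n` up to the next multiple of `align` (a power of two).
fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

impl Req {
    fn new(size: usize, align: usize) -> Req {
        assert!(align.is_power_of_two(), "alignment must be a power of two");
        Req { size, align }
    }

    /// Both buffers are live at once ("all must be satisfied"):
    /// sizes add up, with padding so the second buffer is aligned.
    fn and(self, other: Req) -> Req {
        let start = round_up(self.size, other.align);
        Req {
            size: start + other.size,
            align: self.align.max(other.align),
        }
    }

    /// Only one buffer is live at a time: the larger size suffices.
    fn or(self, other: Req) -> Req {
        Req {
            size: self.size.max(other.size),
            align: self.align.max(other.align),
        }
    }
}
```

For example, combining a 100-byte, 8-aligned request with a 64-byte, 64-aligned request gives 192 bytes under `and` (100 rounded up to 128, plus 64) but only 100 bytes under `or`, both at 64-byte alignment.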