BitNet Core
The core foundation library for BitNet neural networks, providing sophisticated memory management, device abstraction, tensor infrastructure, and GPU acceleration optimized for Apple Silicon and high-performance computing.
🎯 Purpose
`bitnet-core` serves as the foundational layer for the BitNet ecosystem, focusing on:
- Advanced Memory Management: Production-ready hybrid memory pool system
- Device Abstraction: Unified interface for CPU, Metal GPU, and future accelerators
- Metal GPU Acceleration: Complete Metal compute pipeline with shader compilation
- Tensor Infrastructure: Basic tensor operations and metadata management
- Performance Optimization: Zero-copy operations and SIMD-friendly data structures
✅ What's Implemented
🟢 Memory Management System (Production Ready)
Hybrid Memory Pool Architecture
- SmallBlockPool: Fixed-size allocation for blocks < 1MB with O(1) operations
- LargeBlockPool: Buddy allocation algorithm for blocks ≥ 1MB with coalescing
- DeviceSpecificPools: Separate memory pools for CPU and Metal GPU memory
- Thread Safety: Fine-grained locking with minimal contention
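The split between the two pools can be illustrated with a minimal, self-contained sketch (plain Rust, independent of the crate's internals; the 64-byte class step is an assumption for illustration): requests under 1MB snap to fixed size classes served in O(1) from free lists, while larger requests round up to a power of two, as a buddy allocator requires.

```rust
const SMALL_BLOCK_LIMIT: usize = 1 << 20; // 1 MB boundary between the two pools

/// Round a request up to the size class that would serve it.
/// Small requests snap to fixed 64-byte-aligned classes (O(1) free-list lookup);
/// large requests snap to the next power of two, as in buddy allocation.
fn size_class(request: usize) -> usize {
    if request < SMALL_BLOCK_LIMIT {
        // Fixed-size classes in 64-byte steps (illustrative step size)
        (request + 63) & !63
    } else {
        // Buddy allocator: next power of two
        request.next_power_of_two()
    }
}

fn main() {
    assert_eq!(size_class(100), 128);         // small block: 64-byte step
    assert_eq!(size_class(3 << 20), 4 << 20); // large block: next power of two
    println!("1.5 MB request -> {} byte block", size_class(1536 * 1024));
}
```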
Advanced Memory Tracking
- Real-time Metrics: Allocation patterns, peak usage, fragmentation analysis
- Memory Pressure Detection: Automatic detection of memory pressure with callbacks
- Leak Detection: Comprehensive tracking of unreleased allocations
- Performance Profiling: Timeline analysis and allocation pattern recognition
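Pressure detection with callbacks typically follows a watermark pattern: track bytes in use and fire registered handlers when usage crosses a configured fraction of capacity. A self-contained sketch of that pattern (not the crate's actual API; bitnet-core's tracker additionally classifies pressure levels and allocation patterns):

```rust
/// Minimal watermark-based pressure detector. Illustrative only.
struct PressureDetector {
    used: usize,
    capacity: usize,
    threshold: f64, // fraction of capacity that counts as "pressure"
    callbacks: Vec<Box<dyn Fn(usize, usize)>>,
}

impl PressureDetector {
    fn new(capacity: usize, threshold: f64) -> Self {
        Self { used: 0, capacity, threshold, callbacks: Vec::new() }
    }
    /// Register a handler invoked with (used, capacity) on pressure.
    fn on_pressure(&mut self, cb: Box<dyn Fn(usize, usize)>) {
        self.callbacks.push(cb);
    }
    /// Account for an allocation; fire callbacks past the watermark.
    fn record_alloc(&mut self, bytes: usize) {
        self.used += bytes;
        if (self.used as f64) > self.threshold * self.capacity as f64 {
            for cb in &self.callbacks {
                cb(self.used, self.capacity);
            }
        }
    }
    /// Account for a deallocation.
    fn record_free(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

fn main() {
    let mut det = PressureDetector::new(1024, 0.8);
    det.on_pressure(Box::new(|used, cap| {
        println!("pressure: {used}/{cap} bytes in use");
    }));
    det.record_alloc(512); // below the 80% watermark: silent
    det.record_alloc(512); // crosses the watermark: callback fires
    det.record_free(256);
}
```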
Automatic Cleanup System
- Intelligent Compaction: Automatic memory defragmentation
- Configurable Strategies: Idle, pressure-based, and periodic cleanup
- Device-Specific Cleanup: Optimized cleanup for different device types
- Safety Validation: Prevents corruption of active tensors
🟢 Device Abstraction Layer (Production Ready)
Device Management
- Automatic Device Selection: Intelligent selection of optimal compute device
- Device Capabilities: Runtime detection of device features and limitations
- Memory Bandwidth Detection: Automatic detection of memory bandwidth characteristics
- Cross-Platform Support: Unified API across different hardware platforms
Device-Specific Optimizations
- CPU Optimizations: Cache-friendly memory layouts and SIMD alignment
- Metal GPU Support: Optimized memory management for Apple Silicon GPUs
- Future Extensibility: Architecture ready for CUDA and other accelerators
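Automatic device selection usually reduces to an ordered fallback list. A minimal stand-alone sketch of that policy (the crate's actual `auto_select_device` logic may differ; the runtime GPU probe is stubbed out here):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Device {
    MetalGpu,
    Cpu,
}

/// Pick the best available device: prefer Metal on Apple platforms,
/// fall back to CPU everywhere else. A real implementation would
/// probe the Metal API at runtime instead of taking a flag.
fn auto_select(metal_available: bool) -> Device {
    if cfg!(target_os = "macos") && metal_available {
        Device::MetalGpu
    } else {
        Device::Cpu
    }
}

fn main() {
    let device = auto_select(false); // pretend no GPU was found
    assert_eq!(device, Device::Cpu);
    println!("selected: {device:?}");
}
```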
🟢 Metal GPU Acceleration (Production Ready)
Metal Compute Pipeline
- Device Management: Automatic Metal device detection and initialization
- Command Buffer Management: Advanced command buffer pooling and lifecycle management
- Shader Compilation: Dynamic Metal shader compilation with caching
- Pipeline Creation: Automatic compute pipeline state management
BitNet-Specific Shaders
- BitLinear Operations: GPU-accelerated BitLinear forward/backward passes
- Quantization Kernels: 1-bit weight and 8-bit activation quantization
- Activation Functions: Optimized ReLU, GELU, Swish, Sigmoid, Tanh, and more
- Mixed Precision: Support for mixed precision operations
Advanced Metal Features
- Buffer Pooling: High-performance Metal buffer allocation and reuse
- Synchronization: Events, fences, and sync points for GPU operations
- Resource Tracking: Automatic dependency management for GPU resources
- Error Handling: Comprehensive error recovery and validation
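Buffer pooling of this kind is usually a map from size class to a free list: released buffers go back to their class and are handed out again instead of being reallocated. A simplified CPU-side sketch of the reuse pattern (bitnet-core's Metal pool manages `MTLBuffer` objects rather than `Vec<u8>`):

```rust
use std::collections::HashMap;

/// Reuse buffers by rounded-up size class instead of reallocating.
struct BufferPool {
    free: HashMap<usize, Vec<Vec<u8>>>, // size class -> returned buffers
}

impl BufferPool {
    fn new() -> Self {
        Self { free: HashMap::new() }
    }
    /// Round a length up to its power-of-two size class (min 64 bytes).
    fn class_of(len: usize) -> usize {
        len.next_power_of_two().max(64)
    }
    /// Hand out a pooled buffer if one exists, otherwise allocate.
    fn acquire(&mut self, len: usize) -> Vec<u8> {
        let class = Self::class_of(len);
        self.free
            .get_mut(&class)
            .and_then(|v| v.pop())
            .unwrap_or_else(|| vec![0u8; class])
    }
    /// Return a buffer to its size class for reuse.
    fn release(&mut self, buf: Vec<u8>) {
        self.free
            .entry(buf.capacity().next_power_of_two())
            .or_default()
            .push(buf);
    }
}

fn main() {
    let mut pool = BufferPool::new();
    let b = pool.acquire(100); // allocates a 128-byte class buffer
    assert_eq!(b.len(), 128);
    pool.release(b);
    let b2 = pool.acquire(120); // same class: reused, not reallocated
    assert_eq!(b2.len(), 128);
}
```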
🟡 Tensor Infrastructure (Basic Implementation)
Tensor Metadata System
- BitNetDType: Custom data types optimized for quantized operations
- TensorMetadata: Comprehensive tensor shape, stride, and device information
- TensorHandle: Safe reference counting and lifetime management
- Memory Layout: Optimized memory layouts for different tensor operations
Basic Tensor Operations
- Tensor Creation: Basic tensor allocation and initialization
- Memory Management: Integration with the hybrid memory pool system
- Device Placement: Automatic tensor placement on appropriate devices
- Metadata Tracking: Comprehensive tracking of tensor properties
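Stride metadata like this is typically derived from the shape in row-major (C) order, with the last axis varying fastest. A small sketch of the computation and of how strides map an n-dimensional index to a flat offset:

```rust
/// Compute contiguous row-major (C-order) strides, in elements,
/// for a given shape: the last axis varies fastest.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1usize; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Map an n-dimensional index onto a flat element offset.
fn offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}

fn main() {
    let shape = [2, 3, 4];
    let strides = row_major_strides(&shape);
    assert_eq!(strides, vec![12, 4, 1]);
    // Element [1, 2, 3] lives at flat offset 1*12 + 2*4 + 3*1 = 23
    assert_eq!(offset(&[1, 2, 3], &strides), 23);
    println!("strides for {shape:?}: {strides:?}");
}
```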
🔴 What Needs Implementation
High Priority
- Advanced Tensor Operations
  - Matrix multiplication optimizations
  - Element-wise operations (add, mul, etc.)
  - Reduction operations (sum, mean, max, etc.)
  - Broadcasting and reshaping operations
- SIMD Optimizations
  - AVX2/AVX-512 implementations for x86_64
  - NEON optimizations for ARM64
  - Auto-vectorization hints and intrinsics
- Memory Layout Optimizations
  - Strided tensor support
  - Memory-efficient tensor views
  - Zero-copy tensor slicing
Medium Priority
- Advanced Device Features
  - Multi-GPU support and load balancing
  - Device-to-device memory transfers
  - Asynchronous operations and streams
- Performance Monitoring
  - Detailed performance counters
  - Operation-level profiling
  - Memory bandwidth utilization tracking
- Error Handling
  - Comprehensive error recovery
  - Graceful degradation on memory pressure
  - Device failure handling
Low Priority
- Serialization Support
  - Tensor serialization/deserialization
  - Memory pool state persistence
  - Cross-platform compatibility
- Advanced Memory Features
  - Memory-mapped file support
  - Shared memory between processes
  - Memory compression for inactive tensors
🚀 Quick Start
Metal GPU Acceleration
```rust
// Illustrative reconstruction: type names and argument lists follow the
// crate's public API loosely; consult the API docs for exact signatures.
use bitnet_core::metal::*;

// Initialize Metal context
let (device, command_queue, library) = initialize_metal_context()?;
println!("Metal device: {}", device.name());

// Create BitNet shader collection
let shaders = BitNetShaders::new(device.clone())?;

// Create and execute a ReLU operation
let input_data = vec![1.0f32, -2.0, 3.0, -4.0];
let input_buffer = create_buffer(&device, &input_data)?;
let output_buffer = create_empty_buffer(&device, input_data.len() * std::mem::size_of::<f32>())?;

// Create command buffer and encoder
let command_buffer = command_queue.new_command_buffer();
let encoder = shaders.create_compute_encoder_with_pipeline(
    &command_buffer,
    BitNetShaderFunction::ReluForward, // illustrative variant name
)?;

// Set buffers and dispatch
encoder.set_buffer(0, Some(&input_buffer), 0);
encoder.set_buffer(1, Some(&output_buffer), 0);
set_compute_bytes(&encoder, &[input_data.len() as u32], 2);
let (threads, threadgroup) = shaders.calculate_dispatch_params(input_data.len())?;
dispatch_compute(&encoder, threads, threadgroup);
encoder.end_encoding();
command_buffer.commit();
command_buffer.wait_until_completed();

// Read results
let output_data: Vec<f32> = read_buffer(&output_buffer, input_data.len())?;
println!("{:?}", output_data); // [1.0, 0.0, 3.0, 0.0]
```
Basic Memory Pool Usage
```rust
// Illustrative reconstruction; import paths and metric field names
// may differ from the crate's actual API.
use bitnet_core::memory::HybridMemoryPool;
use bitnet_core::device::auto_select_device;

// Create memory pool with default configuration
let pool = HybridMemoryPool::new()?;
let device = auto_select_device();

// Allocate 1MB of memory with 64-byte alignment
let handle = pool.allocate(1024 * 1024, 64, &device)?;

// Get memory metrics
let metrics = pool.get_metrics();
println!("Allocated: {} bytes", metrics.total_allocated);
println!("Peak usage: {} bytes", metrics.peak_allocated);

// Deallocate memory
pool.deallocate(handle)?;
```
Advanced Memory Tracking
```rust
// Illustrative reconstruction; configuration type names are approximate.
use bitnet_core::memory::{HybridMemoryPool, MemoryPoolConfig};

// Configure advanced tracking
let mut config = MemoryPoolConfig::default();
config.enable_advanced_tracking = true;
config.tracking_config = Some(Default::default());

let pool = HybridMemoryPool::with_config(config)?;

// Register pressure callback
pool.register_pressure_callback(Box::new(|level| {
    println!("Memory pressure detected: {:?}", level);
}));

// Get detailed metrics
if let Some(detailed) = pool.get_detailed_metrics() {
    println!("{:#?}", detailed);
}
```
Advanced Metal Operations
```rust
// Illustrative reconstruction; elided arguments are marked with comments.
use bitnet_core::metal::*;

// Initialize with custom configuration (fields elided; see crate docs)
let config = ShaderCompilerConfig { /* ... */ };
let shaders = BitNetShaders::new_with_config(device.clone(), config)?;

// Execute BitLinear forward pass
let encoder = shaders.create_bitlinear_forward_encoder(&command_buffer)?;
dispatch_bitlinear_forward(&encoder /* , buffers and dimensions */);

// Execute quantization
let quant_encoder = shaders.create_quantization_encoder(&command_buffer)?;
dispatch_quantization(&quant_encoder /* , buffers and parameters */);
```
Device Abstraction
```rust
// Illustrative reconstruction; capability field names are approximate.
use bitnet_core::device::{auto_select_device, DeviceCapabilities};

// Automatic device selection
let device = auto_select_device();
println!("Selected device: {:?}", device);

// Check device capabilities
let caps = DeviceCapabilities::for_device(&device);
println!("Unified memory: {}", caps.unified_memory);
println!("Memory bandwidth: {} GB/s", caps.memory_bandwidth_gbps);
```
Basic Tensor Operations
```rust
// Illustrative reconstruction; constructor arguments are approximate.
use bitnet_core::memory::tensor::{BitNetTensor, TensorMetadata, BitNetDType};
use bitnet_core::memory::HybridMemoryPool;
use bitnet_core::device::auto_select_device;

let device = auto_select_device();
let pool = HybridMemoryPool::new()?;

// Create tensor metadata
let metadata = TensorMetadata::new(vec![2, 3], BitNetDType::F32, &device);

// Create tensor
let tensor = BitNetTensor::new(metadata, &pool)?;
println!("Shape: {:?}", tensor.shape());
println!("Device: {:?}", tensor.device());
```
📊 Performance Characteristics
Metal GPU Performance (Apple M1 Pro)
| Operation | Throughput | Latency | Notes |
|---|---|---|---|
| Buffer Creation | 1,000+ ops/sec | ~1 ms | Includes data transfer |
| Shader Compilation | 10-50 shaders/sec | ~20-100 ms | Cached after first compile |
| Command Buffer | 10,000+ ops/sec | ~100 μs | Pooled and reused |
| ReLU Forward | 50+ GB/s | <1 ms | 1M elements |
| BitLinear Forward | 20+ GB/s | ~2 ms | Depends on matrix size |
| Quantization | 30+ GB/s | ~1 ms | 1-bit weights, 8-bit activations |
Memory Pool Performance (Apple M1 Pro)
| Operation | Small Blocks (<1MB) | Large Blocks (≥1MB) |
|---|---|---|
| Allocation | ~50 ns | ~200 ns |
| Deallocation | ~30 ns | ~150 ns |
| Throughput | 20M ops/sec | 5M ops/sec |
| Memory Overhead | <2% | <1% |
Memory Tracking Overhead
| Tracking Level | CPU Overhead | Memory Overhead |
|---|---|---|
| None | 0% | 0% |
| Basic | <1% | <0.1% |
| Standard | ~2% | ~0.5% |
| Detailed | ~5% | ~1% |
🏗️ Architecture
Memory Management Architecture
```
HybridMemoryPool
├── SmallBlockPool (< 1MB allocations)
│   ├── Fixed-size block allocation
│   ├── Fast O(1) allocation/deallocation
│   └── Minimal fragmentation
├── LargeBlockPool (≥ 1MB allocations)
│   ├── Buddy allocation algorithm
│   ├── Efficient large block handling
│   └── Memory coalescing
├── DeviceSpecificPools
│   ├── CPU memory pools
│   ├── Metal GPU memory pools
│   └── Future: CUDA memory pools
└── AdvancedTracking
    ├── Memory pressure detection
    ├── Allocation pattern analysis
    ├── Leak detection and reporting
    └── Performance profiling
```
Module Structure
```
bitnet-core/src/
├── device/                    # Device abstraction layer
│   └── mod.rs                 # Device selection and capabilities
├── memory/                    # Memory management system
│   ├── mod.rs                 # Main memory pool interface
│   ├── small_block.rs         # Small block allocator
│   ├── large_block.rs         # Large block allocator
│   ├── device_pool.rs         # Device-specific pools
│   ├── handle.rs              # Memory handle management
│   ├── metrics.rs             # Memory metrics and monitoring
│   ├── tracking/              # Advanced memory tracking
│   │   ├── mod.rs             # Tracking system interface
│   │   ├── tracker.rs         # Main tracking implementation
│   │   ├── patterns.rs        # Allocation pattern analysis
│   │   ├── pressure.rs        # Memory pressure detection
│   │   ├── timeline.rs        # Timeline analysis
│   │   ├── profiler.rs        # Performance profiling
│   │   └── config.rs          # Tracking configuration
│   ├── cleanup/               # Automatic cleanup system
│   │   ├── mod.rs             # Cleanup system interface
│   │   ├── manager.rs         # Cleanup manager
│   │   ├── scheduler.rs       # Cleanup scheduling
│   │   ├── strategies.rs      # Cleanup strategies
│   │   ├── metrics.rs         # Cleanup metrics
│   │   ├── config.rs          # Cleanup configuration
│   │   └── device_cleanup.rs  # Device-specific cleanup
│   └── tensor/                # Tensor memory management
│       ├── mod.rs             # Tensor system interface
│       ├── tensor.rs          # Tensor implementation
│       ├── handle.rs          # Tensor handle management
│       ├── metadata.rs        # Tensor metadata
│       └── dtype.rs           # BitNet data types
├── metal/                     # Metal GPU acceleration
│   ├── mod.rs                 # Metal device and command buffer management
│   ├── shader_compiler.rs     # Dynamic shader compilation and caching
│   ├── shader_utils.rs        # High-level BitNet shader utilities
│   └── shaders/               # Metal compute shaders
│       ├── README.md          # Shader documentation
│       ├── bitlinear.metal    # BitLinear layer operations
│       ├── quantization.metal # Quantization kernels
│       └── activation.metal   # Activation functions
├── tensor/                    # Basic tensor operations
│   └── mod.rs                 # Tensor operation interface
└── lib.rs                     # Library root and re-exports
```
🧪 Testing
Run the comprehensive test suite:
```sh
# Run all tests
cargo test

# Run specific test modules (filter by name)
cargo test memory

# Run with detailed output
cargo test -- --nocapture

# Run Metal-specific tests (macOS only)
cargo test metal

# Run integration tests
cargo test --test '*'
```
Running Examples
```sh
# Demos in the examples/ directory cover Metal shader compilation,
# memory tracking, the cleanup system, and the tensor lifecycle.
# Run one with:
cargo run --example <example_name>
```
📈 Benchmarks
Run performance benchmarks:
```sh
# Run all benchmarks
cargo bench

# Run memory-specific benchmarks (filter by name)
cargo bench memory

# Criterion-based benchmarks write HTML reports to target/criterion/
```
🔧 Configuration
Metal GPU Configuration
```rust
// Illustrative reconstruction; configuration fields are elided.
use bitnet_core::metal::*;

// Shader compiler configuration
let shader_config = ShaderCompilerConfig { /* ... */ };

// Command buffer pool configuration
let cb_config = CommandBufferPoolConfig { /* ... */ };

// Buffer pool configuration
let buffer_config = BufferPoolConfig { /* ... */ };

// Create configured Metal context
let (device, command_queue, library) = initialize_metal_context()?;
let shaders = BitNetShaders::new_with_config(device.clone(), shader_config)?;
let manager = create_command_buffer_manager_with_config(&command_queue, cb_config);
let buffer_pool = create_buffer_pool_with_config(&device, buffer_config);
```
Memory Pool Configuration
```rust
use bitnet_core::memory::{HybridMemoryPool, MemoryPoolConfig};

// Fields elided; see the crate docs for the available options
let config = MemoryPoolConfig { /* ... */ };
let pool = HybridMemoryPool::with_config(config)?;
```
🤝 Contributing
Contributions are welcome! Priority areas for `bitnet-core`:
- Metal Shaders: Add new BitNet-specific compute kernels
- Tensor Operations: Implement missing tensor operations
- SIMD Optimizations: Add platform-specific optimizations
- Device Support: Extend device abstraction for new hardware
- Performance: Optimize critical paths and reduce overhead
Metal Development
When contributing Metal shaders:
- Add `.metal` files to `src/metal/shaders/`
- Update the `BitNetShaderFunction` enum
- Add function mapping in `shader_utils.rs`
- Include comprehensive tests and benchmarks
- Document shader parameters and usage
See the main project README for contribution guidelines.
📄 License
Licensed under the MIT License. See LICENSE for details.