# Task 014: Performance Optimization
**Priority**: ⚡ High Impact
**Phase**: 2-3 (Foundation for Scale)
**Estimated Effort**: 3-4 weeks
**Dependencies**: Task 001 (Cargo Integration), existing core functionality
## **Objective**
Optimize workspace_tools performance to handle large-scale projects, complex workspace hierarchies, and high-frequency operations efficiently. Ensure the library scales from small personal projects to enterprise monorepos without performance degradation.
## **Performance Targets**
### **Micro-benchmarks**
- Workspace resolution: < 1ms (currently ~5ms)
- Path joining operations: < 100μs (currently ~500μs)
- Standard directory access: < 50μs (currently ~200μs)
- Configuration loading: < 5ms for 1KB files (currently ~20ms)
- Resource discovery (glob): < 100ms for 10k files (currently ~800ms)
### **Macro-benchmarks**
- Near-zero cold-start overhead in build scripts
- Memory usage: < 1MB additional heap allocation
- Support 100k+ files in workspace without degradation
- Handle 50+ nested workspace levels efficiently
- Concurrent access from 100+ threads without contention
### **Real-world Performance**
- Large monorepos (Rust compiler scale): < 10ms initialization
- CI/CD environments: < 2ms overhead per invocation
- IDE integration: < 1ms for autocomplete/navigation
- Hot reload scenarios: < 500μs for path resolution
## **Technical Requirements**
### **Core Optimizations**
1. **Lazy Initialization and Caching**
- Lazy workspace detection with memoization
- Path resolution result caching
- Standard directory path pre-computation
2. **Memory Optimization**
- String interning for common paths
- Compact data structures
- Memory pool allocation for frequent operations
3. **I/O Optimization**
- Asynchronous file operations where beneficial
- Batch filesystem calls
- Efficient directory traversal algorithms
4. **Algorithmic Improvements**
- Fast workspace root detection using heuristics (see the sketch after this list)
- Optimized glob pattern matching
- Efficient path canonicalization
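To make the root-detection heuristic concrete, here is a minimal sketch. It assumes marker files such as `Cargo.toml`, `.git`, and `.workspace` identify a root; the heuristics actually shipped may weigh more signals:
```rust
use std::path::{Path, PathBuf};

/// Walk upward from `start` and return the first ancestor containing a
/// workspace marker. `ancestors()` visits each level once, and the walk
/// stops as soon as any marker is found.
fn detect_workspace_root(start: &Path) -> Option<PathBuf> {
    // Hypothetical marker set, ordered roughly by how often each appears.
    const MARKERS: &[&str] = &["Cargo.toml", ".git", ".workspace"];
    start.ancestors().find_map(|dir| {
        MARKERS
            .iter()
            .any(|marker| dir.join(marker).exists())
            .then(|| dir.to_path_buf())
    })
}
```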
## **Implementation Steps**
### **Phase 1: Benchmarking and Profiling** (Week 1)
#### **Comprehensive Benchmark Suite**
```rust
// benches/workspace_performance.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
use workspace_tools::{workspace, Workspace};
use std::path::PathBuf;
use std::sync::Arc;
use tempfile::TempDir;
fn bench_workspace_resolution(c: &mut Criterion) {
let (_temp_dir, test_ws) = create_large_test_workspace();
std::env::set_var("WORKSPACE_PATH", test_ws.root());
c.bench_function("workspace_resolution_cold", |b| {
b.iter(|| {
// Simulate a cold start by clearing the resolution caches
// (see the clear_caches sketch in Phase 2)
workspace_tools::clear_caches();
let ws = workspace().unwrap();
black_box(ws.root());
})
});
c.bench_function("workspace_resolution_warm", |b| {
let _ws = workspace().unwrap(); // Prime the cache
b.iter(|| {
let ws = workspace().unwrap();
black_box(ws.root());
})
});
}
fn bench_path_operations(c: &mut Criterion) {
let (_temp_dir, _test_ws) = create_large_test_workspace();
let ws = workspace().unwrap();
let paths = vec![
"config/app.toml",
"data/cache/sessions.db",
"logs/application.log",
"docs/api/reference.md",
"tests/integration/user_tests.rs",
];
c.bench_function("path_joining", |b| {
b.iter_batched(
|| paths.clone(),
|paths| {
for path in paths {
black_box(ws.join(path));
}
},
BatchSize::SmallInput,
)
});
c.bench_function("standard_directories", |b| {
b.iter(|| {
black_box(ws.config_dir());
black_box(ws.data_dir());
black_box(ws.logs_dir());
black_box(ws.docs_dir());
black_box(ws.tests_dir());
})
});
}
fn bench_concurrent_access(c: &mut Criterion) {
let (_temp_dir, _test_ws) = create_large_test_workspace();
let ws = Arc::new(workspace().unwrap());
c.bench_function("concurrent_path_resolution_10_threads", |b| {
b.iter(|| {
let handles: Vec<_> = (0..10)
.map(|i| {
let ws = ws.clone();
std::thread::spawn(move || {
for j in 0..100 {
let path = format!("config/service_{}.toml", i * 100 + j);
black_box(ws.join(&path));
}
})
})
.collect();
for handle in handles {
handle.join().unwrap();
}
})
});
}
#[cfg(feature = "glob")]
fn bench_resource_discovery(c: &mut Criterion) {
let (_temp_dir, test_ws) = create_large_test_workspace();
let ws = workspace().unwrap();
// Create test structure with many files
create_test_files(&test_ws, 10_000);
c.bench_function("glob_small_pattern", |b| {
b.iter(|| {
let results = ws.find_resources("src/**/*.rs").unwrap();
black_box(results.len());
})
});
c.bench_function("glob_large_pattern", |b| {
b.iter(|| {
let results = ws.find_resources("**/*.rs").unwrap();
black_box(results.len());
})
});
c.bench_function("glob_complex_pattern", |b| {
b.iter(|| {
let results = ws.find_resources("**/test*/**/*.{rs,toml,md}").unwrap();
black_box(results.len());
})
});
}
fn bench_memory_usage(c: &mut Criterion) {
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};
struct TrackingAllocator {
allocated: AtomicUsize,
}
unsafe impl GlobalAlloc for TrackingAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
let ret = System.alloc(layout);
if !ret.is_null() {
self.allocated.fetch_add(layout.size(), Ordering::Relaxed);
}
ret
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
System.dealloc(ptr, layout);
self.allocated.fetch_sub(layout.size(), Ordering::Relaxed);
}
}
// NOTE: a crate may have only one #[global_allocator]; in a real build
// this registration belongs at the crate root rather than inside a
// single benchmark function.
#[global_allocator]
static ALLOCATOR: TrackingAllocator = TrackingAllocator {
    allocated: AtomicUsize::new(0),
};
c.bench_function("memory_usage_workspace_creation", |b| {
b.iter_custom(|iters| {
let start_memory = ALLOCATOR.allocated.load(Ordering::Relaxed);
let start_time = std::time::Instant::now();
for _ in 0..iters {
let ws = workspace().unwrap();
black_box(ws);
}
let end_time = std::time::Instant::now();
let end_memory = ALLOCATOR.allocated.load(Ordering::Relaxed);
println!("Memory delta: {} bytes", end_memory - start_memory);
end_time.duration_since(start_time)
})
});
}
fn create_large_test_workspace() -> (TempDir, Workspace) {
let temp_dir = TempDir::new().unwrap();
let workspace_root = temp_dir.path();
// Create realistic directory structure
let dirs = [
"src/bin", "src/lib", "src/models", "src/routes", "src/services",
"tests/unit", "tests/integration", "tests/fixtures",
"config/environments", "config/schemas",
"data/cache", "data/state", "data/migrations",
"logs/application", "logs/access", "logs/errors",
"docs/api", "docs/guides", "docs/architecture",
"scripts/build", "scripts/deploy", "scripts/maintenance",
"assets/images", "assets/styles", "assets/fonts",
];
for dir in &dirs {
std::fs::create_dir_all(workspace_root.join(dir)).unwrap();
}
std::env::set_var("WORKSPACE_PATH", workspace_root);
let workspace = Workspace::resolve().unwrap();
(temp_dir, workspace)
}
fn create_test_files(workspace: &Workspace, count: usize) {
let base_dirs = ["src", "tests", "docs", "config"];
let extensions = ["rs", "toml", "md", "json"];
for i in 0..count {
let dir = base_dirs[i % base_dirs.len()];
let ext = extensions[i % extensions.len()];
let subdir = format!("subdir_{}", i / 100);
let filename = format!("file_{}.{}", i, ext);
let full_dir = workspace.join(dir).join(subdir);
std::fs::create_dir_all(&full_dir).unwrap();
let file_path = full_dir.join(filename);
std::fs::write(file_path, format!("// Test file {}\n", i)).unwrap();
}
}
criterion_group!(
workspace_benches,
bench_workspace_resolution,
bench_path_operations,
bench_concurrent_access,
);
#[cfg(feature = "glob")]
criterion_group!(
glob_benches,
bench_resource_discovery,
);
criterion_group!(
memory_benches,
bench_memory_usage,
);
#[cfg(feature = "glob")]
criterion_main!(workspace_benches, glob_benches, memory_benches);
#[cfg(not(feature = "glob"))]
criterion_main!(workspace_benches, memory_benches);
```
#### **Profiling Integration**
```rust
// profiling/src/lib.rs - Profiling utilities
use std::time::{Duration, Instant};
use std::sync::{Arc, Mutex};
use std::collections::HashMap;
#[derive(Debug, Clone)]
pub struct ProfileData {
pub name: String,
pub duration: Duration,
pub call_count: u64,
pub memory_delta: i64,
}
pub struct Profiler {
measurements: Arc<Mutex<HashMap<String, Vec<ProfileData>>>>,
}
impl Profiler {
pub fn new() -> Self {
Self {
measurements: Arc::new(Mutex::new(HashMap::new())),
}
}
pub fn measure<F, R>(&self, name: &str, f: F) -> R
where
F: FnOnce() -> R,
{
let start_time = Instant::now();
let start_memory = self.get_memory_usage();
let result = f();
let end_time = Instant::now();
let end_memory = self.get_memory_usage();
let profile_data = ProfileData {
name: name.to_string(),
duration: end_time.duration_since(start_time),
call_count: 1,
memory_delta: end_memory - start_memory,
};
let mut measurements = self.measurements.lock().unwrap();
measurements.entry(name.to_string())
.or_insert_with(Vec::new)
.push(profile_data);
result
}
fn get_memory_usage(&self) -> i64 {
// Platform-specific memory usage measurement
#[cfg(target_os = "linux")]
{
use std::fs;
let status = fs::read_to_string("/proc/self/status").unwrap_or_default();
for line in status.lines() {
if line.starts_with("VmRSS:") {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 2 {
return parts[1].parse::<i64>().unwrap_or(0) * 1024; // Convert KB to bytes
}
}
}
}
0 // Fallback for unsupported platforms
}
pub fn report(&self) -> ProfilingReport {
let measurements = self.measurements.lock().unwrap();
let mut report = ProfilingReport::new();
for (name, data_points) in measurements.iter() {
let total_duration: Duration = data_points.iter().map(|d| d.duration).sum();
let total_calls = data_points.len() as u64;
let avg_duration = total_duration / total_calls.max(1) as u32;
let total_memory_delta: i64 = data_points.iter().map(|d| d.memory_delta).sum();
report.add_measurement(name.clone(), MeasurementSummary {
total_duration,
avg_duration,
call_count: total_calls,
memory_delta: total_memory_delta,
});
}
report
}
}
#[derive(Debug)]
pub struct ProfilingReport {
measurements: HashMap<String, MeasurementSummary>,
}
#[derive(Debug, Clone)]
pub struct MeasurementSummary {
pub total_duration: Duration,
pub avg_duration: Duration,
pub call_count: u64,
pub memory_delta: i64,
}
impl ProfilingReport {
fn new() -> Self {
Self {
measurements: HashMap::new(),
}
}
fn add_measurement(&mut self, name: String, summary: MeasurementSummary) {
self.measurements.insert(name, summary);
}
pub fn print_report(&self) {
println!("Performance Profiling Report");
println!("==========================");
println!();
let mut sorted: Vec<_> = self.measurements.iter().collect();
sorted.sort_by(|a, b| b.1.total_duration.cmp(&a.1.total_duration));
for (name, summary) in sorted {
println!("Function: {}", name);
println!(" Total time: {:?}", summary.total_duration);
println!(" Average time: {:?}", summary.avg_duration);
println!(" Call count: {}", summary.call_count);
println!(" Memory delta: {} bytes", summary.memory_delta);
println!();
}
}
}
// Global profiler instance
lazy_static::lazy_static! {
pub static ref GLOBAL_PROFILER: Profiler = Profiler::new();
}
// Convenience macro for profiling
#[macro_export]
macro_rules! profile {
($name:expr, $body:expr) => {
$crate::GLOBAL_PROFILER.measure($name, || $body)
};
}
```
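A usage sketch for these utilities, assuming the `profiling` crate above is a dependency and `workspace_tools::workspace()` is available; the `join_config` label is illustrative:
```rust
use profiling::{profile, GLOBAL_PROFILER};

fn main() {
    let ws = workspace_tools::workspace().unwrap();
    for _ in 0..1_000 {
        // Each call records one ProfileData sample under "join_config".
        let _path = profile!("join_config", ws.join("config/app.toml"));
    }
    // Aggregate samples and print totals, averages, and memory deltas.
    GLOBAL_PROFILER.report().print_report();
}
```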
### **Phase 2: Core Performance Optimizations** (Week 2)
#### **Lazy Initialization and Caching**
```rust
// Optimized workspace implementation with caching
use std::sync::{Arc, OnceLock};
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use parking_lot::RwLock; // Faster RwLock implementation
// Global workspace cache
static WORKSPACE_CACHE: OnceLock<Arc<RwLock<WorkspaceCache>>> = OnceLock::new();
#[derive(Debug)]
struct WorkspaceCache {
resolved_workspaces: HashMap<PathBuf, Arc<CachedWorkspace>>,
path_resolutions: HashMap<(PathBuf, PathBuf), PathBuf>,
standard_dirs: HashMap<PathBuf, StandardDirectories>,
}
impl WorkspaceCache {
fn new() -> Self {
Self {
resolved_workspaces: HashMap::new(),
path_resolutions: HashMap::new(),
standard_dirs: HashMap::new(),
}
}
fn get_or_compute_workspace<F>(&mut self, key: PathBuf, f: F) -> Arc<CachedWorkspace>
where
F: FnOnce() -> Result<Workspace>,
{
if let Some(cached) = self.resolved_workspaces.get(&key) {
return cached.clone();
}
// Compute new workspace
let workspace = f().unwrap_or_else(|_| Workspace::from_cwd());
let cached = Arc::new(CachedWorkspace::new(workspace));
self.resolved_workspaces.insert(key, cached.clone());
cached
}
}
#[derive(Debug)]
struct CachedWorkspace {
inner: Workspace,
standard_dirs: OnceLock<StandardDirectories>,
path_cache: RwLock<HashMap<PathBuf, PathBuf>>,
}
impl CachedWorkspace {
fn new(workspace: Workspace) -> Self {
Self {
inner: workspace,
standard_dirs: OnceLock::new(),
path_cache: RwLock::new(HashMap::new()),
}
}
fn standard_directories(&self) -> &StandardDirectories {
self.standard_dirs.get_or_init(|| {
StandardDirectories::new(self.inner.root())
})
}
fn join_cached(&self, path: &Path) -> PathBuf {
// Check cache first
{
let cache = self.path_cache.read();
if let Some(cached_result) = cache.get(path) {
return cached_result.clone();
}
}
// Compute and cache
let result = self.inner.root().join(path);
let mut cache = self.path_cache.write();
cache.insert(path.to_path_buf(), result.clone());
result
}
}
// Optimized standard directories with pre-computed paths
#[derive(Debug, Clone)]
pub struct StandardDirectories {
config: PathBuf,
data: PathBuf,
logs: PathBuf,
docs: PathBuf,
tests: PathBuf,
workspace: PathBuf,
cache: PathBuf,
tmp: PathBuf,
}
impl StandardDirectories {
fn new(workspace_root: &Path) -> Self {
Self {
config: workspace_root.join("config"),
data: workspace_root.join("data"),
logs: workspace_root.join("logs"),
docs: workspace_root.join("docs"),
tests: workspace_root.join("tests"),
workspace: workspace_root.join(".workspace"),
cache: workspace_root.join(".workspace/cache"),
tmp: workspace_root.join(".workspace/tmp"),
}
}
}
// Optimized workspace implementation
impl Workspace {
/// Fast workspace resolution with caching
pub fn resolve_cached() -> Result<Arc<CachedWorkspace>> {
let cache = WORKSPACE_CACHE.get_or_init(|| Arc::new(RwLock::new(WorkspaceCache::new())));
let current_dir = std::env::current_dir()
.map_err(|e| WorkspaceError::IoError(e.to_string()))?;
let mut cache_guard = cache.write();
Ok(cache_guard.get_or_compute_workspace(current_dir, || Self::resolve()))
}
/// Ultra-fast standard directory access.
/// Assumes a `standard_dirs: OnceLock<StandardDirectories>` field on
/// `Workspace`; a function-local static would wrongly pin the first
/// workspace's root for every instance in the process.
#[inline]
pub fn config_dir_fast(&self) -> &Path {
    // Pre-computed once per instance, no per-call allocations
    &self
        .standard_dirs
        .get_or_init(|| StandardDirectories::new(&self.root))
        .config
}
/// Optimized path joining with string interning
pub fn join_optimized<P: AsRef<Path>>(&self, path: P) -> PathBuf {
let path = path.as_ref();
// Fast path for common directories
if let Some(std_dir) = self.try_standard_directory(path) {
return std_dir;
}
// Fall back to a plain join; CachedWorkspace::join_cached layers
// memoization on top of this for repeated lookups
self.root.join(path)
}
fn try_standard_directory(&self, path: &Path) -> Option<PathBuf> {
if let Some(path_str) = path.to_str() {
match path_str {
"config" => Some(self.root.join("config")),
"data" => Some(self.root.join("data")),
"logs" => Some(self.root.join("logs")),
"docs" => Some(self.root.join("docs")),
"tests" => Some(self.root.join("tests")),
_ => None,
}
} else {
None
}
}
}
```
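The cold-start benchmark in Phase 1 calls `workspace_tools::clear_caches()`; a minimal sketch against the cache structures above (the exact public API is still open):
```rust
/// Drop all memoized state so the next resolution takes the cold path.
pub fn clear_caches() {
    if let Some(cache) = WORKSPACE_CACHE.get() {
        let mut guard = cache.write();
        guard.resolved_workspaces.clear();
        guard.path_resolutions.clear();
        guard.standard_dirs.clear();
    }
}
```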
#### **String Interning for Path Performance**
```rust
// String interning system for common paths
use string_interner::{DefaultStringInterner, DefaultSymbol};
use std::path::PathBuf;
use std::sync::Mutex;
// The interner's constructor is not const, so the global interner is
// initialized lazily rather than in a static initializer.
lazy_static::lazy_static! {
    static ref PATH_INTERNER: Mutex<DefaultStringInterner> =
        Mutex::new(DefaultStringInterner::default());
}
pub struct InternedPath {
    symbol: DefaultSymbol,
}
impl InternedPath {
    pub fn new<P: AsRef<str>>(path: P) -> Self {
        let mut interner = PATH_INTERNER.lock().unwrap();
        let symbol = interner.get_or_intern(path.as_ref());
        Self { symbol }
    }
    /// Resolve to an owned String; a borrowed &str cannot be returned
    /// here because it would outlive the interner's mutex guard.
    pub fn resolve(&self) -> String {
        let interner = PATH_INTERNER.lock().unwrap();
        interner.resolve(self.symbol).unwrap().to_string()
    }
    pub fn to_path_buf(&self) -> PathBuf {
        PathBuf::from(self.resolve())
    }
}
// Memory pool for path allocations
use bumpalo::Bump;
use std::cell::RefCell;
thread_local! {
static PATH_ARENA: RefCell<Bump> = RefCell::new(Bump::new());
}
// A borrowed &str cannot safely escape the thread-local RefCell borrow,
// so arena-backed paths are exposed through a scoped closure instead of
// a self-referential struct.
pub fn with_arena_path<R>(path: &str, f: impl FnOnce(&str) -> R) -> R {
    PATH_ARENA.with(|arena| {
        let bump = arena.borrow();
        // Copy the path into the bump arena; freed en masse on reset
        let allocated: &str = bump.alloc_str(path);
        f(allocated)
    })
}
// Reset arena periodically
pub fn reset_path_arena() {
PATH_ARENA.with(|arena| {
arena.borrow_mut().reset();
});
}
```
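A brief usage sketch of both mechanisms; note that the arena-backed string is only valid inside the closure:
```rust
fn main() {
    // Interning: repeated paths share one allocation behind a symbol.
    let a = InternedPath::new("config/app.toml");
    let b = InternedPath::new("config/app.toml");
    assert_eq!(a.resolve(), b.resolve());

    // Arena: no individual frees; everything is released on reset.
    with_arena_path("logs/app.log", |p| {
        assert_eq!(p, "logs/app.log");
    });
    reset_path_arena();
}
```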
### **Phase 3: I/O and Filesystem Optimizations** (Week 3)
#### **Async I/O Integration**
```rust
// Async workspace operations for high-performance scenarios
#[cfg(feature = "async")]
pub mod async_ops {
use super::*;
use tokio::fs;
use futures::Stream;
impl Workspace {
/// Asynchronously load multiple configuration files
pub async fn load_configs_batch<T>(&self, names: &[&str]) -> Result<Vec<T>>
where
T: serde::de::DeserializeOwned + Send + 'static,
{
let futures: Vec<_> = names.iter()
.map(|name| self.load_config_async(*name))
.collect();
futures::future::try_join_all(futures).await
}
/// Async configuration loading with caching
pub async fn load_config_async<T>(&self, name: &str) -> Result<T>
where
T: serde::de::DeserializeOwned + Send + 'static,
{
let config_path = self.find_config(name)?;
let content = fs::read_to_string(&config_path).await
.map_err(|e| WorkspaceError::IoError(e.to_string()))?;
// Deserialize on a blocking worker thread (JSON shown; a real
// implementation would dispatch on the file extension)
let deserialized = tokio::task::spawn_blocking(move || {
serde_json::from_str(&content)
.map_err(|e| WorkspaceError::ConfigurationError(e.to_string()))
}).await
.map_err(|e| WorkspaceError::ConfigurationError(e.to_string()))??;
Ok(deserialized)
}
/// High-performance directory scanning
pub async fn scan_directory_fast(&self, pattern: &str) -> Result<Vec<PathBuf>> {
let base_path = self.root().to_path_buf();
let pattern = pattern.to_string();
tokio::task::spawn_blocking(move || {
    use glob::Pattern;
    use rayon::prelude::*; // brings par_bridge() into scope
    use walkdir::WalkDir;
let glob_pattern = Pattern::new(&pattern)
.map_err(|e| WorkspaceError::GlobError(e.to_string()))?;
let results: Vec<PathBuf> = WalkDir::new(&base_path)
.into_iter()
.par_bridge() // Use rayon for parallel processing
.filter_map(|entry| entry.ok())
.filter(|entry| entry.file_type().is_file())
.filter(|entry| {
if let Ok(relative) = entry.path().strip_prefix(&base_path) {
glob_pattern.matches_path(relative)
} else {
false
}
})
.map(|entry| entry.path().to_path_buf())
.collect();
Ok(results)
}).await
.map_err(|e| WorkspaceError::ConfigurationError(e.to_string()))?
}
/// Batch file operations for workspace setup
pub async fn create_directories_batch(&self, dirs: &[&str]) -> Result<()> {
let futures: Vec<_> = dirs.iter()
.map(|dir| {
let path = self.join(dir);
async move {
fs::create_dir_all(&path).await
.map_err(|e| WorkspaceError::IoError(e.to_string()))
}
})
.collect();
futures::future::try_join_all(futures).await?;
Ok(())
}
/// Watch workspace for changes, rate-limiting the event stream.
/// The watcher is returned alongside the stream and must be kept
/// alive by the caller: dropping it stops filesystem notifications.
pub async fn watch_changes(
    &self,
) -> Result<(notify::RecommendedWatcher, impl Stream<Item = WorkspaceEvent>)> {
    use notify::{EventKind, RecursiveMode, Watcher};
    use tokio::sync::mpsc;
    use tokio_stream::StreamExt as _; // for throttle()
    use std::time::Duration;
    let (tx, rx) = mpsc::unbounded_channel();
    let workspace_root = self.root().to_path_buf();
    let mut watcher = notify::recommended_watcher(move |res: notify::Result<notify::Event>| {
        if let Ok(event) = res {
            let workspace_event = match event.kind {
                EventKind::Create(_) => WorkspaceEvent::Created(event.paths),
                EventKind::Modify(_) => WorkspaceEvent::Modified(event.paths),
                EventKind::Remove(_) => WorkspaceEvent::Removed(event.paths),
                _ => WorkspaceEvent::Other(event),
            };
            let _ = tx.send(workspace_event);
        }
    }).map_err(|e| WorkspaceError::IoError(e.to_string()))?;
    watcher.watch(&workspace_root, RecursiveMode::Recursive)
        .map_err(|e| WorkspaceError::IoError(e.to_string()))?;
    // Rate-limit events to avoid flooding; true burst coalescing would
    // use something like the notify-debouncer-mini crate
    let stream = tokio_stream::wrappers::UnboundedReceiverStream::new(rx)
        .throttle(Duration::from_millis(100));
    Ok((watcher, stream))
}
}
#[derive(Debug, Clone)]
pub enum WorkspaceEvent {
Created(Vec<PathBuf>),
Modified(Vec<PathBuf>),
Removed(Vec<PathBuf>),
Other(notify::Event),
}
}
```
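A minimal caller sketch for the batch API, assuming the `async` feature, a tokio runtime, and hypothetical config names:
```rust
#[derive(serde::Deserialize)]
struct ServiceConfig {
    name: String,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ws = workspace_tools::workspace()?;
    // Loads and deserializes both files concurrently.
    let configs: Vec<ServiceConfig> = ws.load_configs_batch(&["app", "db"]).await?;
    for cfg in &configs {
        println!("loaded config: {}", cfg.name);
    }
    Ok(())
}
```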
#### **Optimized Glob Implementation**
```rust
// High-performance glob matching
pub mod fast_glob {
use super::*;
use rayon::prelude::*;
use regex::Regex;
use std::sync::Arc;
pub struct FastGlobMatcher {
patterns: Vec<CompiledPattern>,
workspace_root: PathBuf,
}
#[derive(Debug, Clone)]
struct CompiledPattern {
regex: Regex,
original: String,
is_recursive: bool,
}
impl FastGlobMatcher {
pub fn new(workspace_root: PathBuf) -> Self {
Self {
patterns: Vec::new(),
workspace_root,
}
}
pub fn compile_pattern(&mut self, pattern: &str) -> Result<()> {
let regex_pattern = self.glob_to_regex(pattern)?;
let regex = Regex::new(&regex_pattern)
.map_err(|e| WorkspaceError::GlobError(e.to_string()))?;
self.patterns.push(CompiledPattern {
regex,
original: pattern.to_string(),
is_recursive: pattern.contains("**"),
});
Ok(())
}
pub fn find_matches(&self) -> Result<Vec<PathBuf>> {
let workspace_root = &self.workspace_root;
// Use parallel directory traversal
let results: Result<Vec<Vec<PathBuf>>> = self.patterns.par_iter()
.map(|pattern| {
self.find_matches_for_pattern(pattern, workspace_root)
})
.collect();
let all_matches: Vec<PathBuf> = results?
.into_iter()
.flatten()
.collect();
// Remove duplicates while preserving order
let mut seen = std::collections::HashSet::new();
let unique_matches: Vec<PathBuf> = all_matches
.into_iter()
.filter(|path| seen.insert(path.clone()))
.collect();
Ok(unique_matches)
}
fn find_matches_for_pattern(
&self,
pattern: &CompiledPattern,
root: &Path,
) -> Result<Vec<PathBuf>> {
use walkdir::WalkDir;
let mut results = Vec::new();
let walk_depth = if pattern.is_recursive { None } else { Some(3) };
let walker = if let Some(depth) = walk_depth {
WalkDir::new(root).max_depth(depth)
} else {
WalkDir::new(root)
};
// Process entries in parallel batches
let entries: Vec<_> = walker
.into_iter()
.filter_map(|e| e.ok())
.collect();
let batch_size = 1000;
for batch in entries.chunks(batch_size) {
let batch_results: Vec<PathBuf> = batch
.par_iter()
.filter_map(|entry| {
if let Ok(relative_path) = entry.path().strip_prefix(root) {
if pattern.regex.is_match(&relative_path.to_string_lossy()) {
Some(entry.path().to_path_buf())
} else {
None
}
} else {
None
}
})
.collect();
results.extend(batch_results);
}
Ok(results)
}
fn glob_to_regex(&self, pattern: &str) -> Result<String> {
let mut regex = String::new();
let mut chars = pattern.chars().peekable();
regex.push('^');
while let Some(ch) = chars.next() {
match ch {
'*' => {
if chars.peek() == Some(&'*') {
chars.next(); // consume second *
if chars.peek() == Some(&'/') {
chars.next(); // consume /
regex.push_str("(?:.*/)?"); // **/ -> zero or more directories
} else {
regex.push_str(".*"); // ** -> match everything
}
} else {
regex.push_str("[^/]*"); // * -> match anything except /
}
}
'?' => regex.push_str("[^/]"), // ? -> any single character except /
'[' => {
    regex.push('[');
    // Glob negates a character class with '!', regex with '^'
    if chars.peek() == Some(&'!') {
        chars.next();
        regex.push('^');
    }
    while let Some(bracket_char) = chars.next() {
        regex.push(bracket_char);
        if bracket_char == ']' {
            break;
        }
    }
}
'.' | '+' | '(' | ')' | '{' | '}' | '^' | '$' | '|' | '\\' => {
regex.push('\\');
regex.push(ch);
}
_ => regex.push(ch),
}
}
regex.push('$');
Ok(regex)
}
}
}
```
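A usage sketch: compile once, match many times. As a worked example, `src/**/*.rs` translates to the regex `^src/(?:.*/)?[^/]*\.rs$`:
```rust
use std::path::PathBuf;

fn find_rust_sources(root: PathBuf) -> Vec<PathBuf> {
    let mut matcher = fast_glob::FastGlobMatcher::new(root);
    matcher.compile_pattern("src/**/*.rs").unwrap();
    // Patterns are matched in parallel and results de-duplicated.
    matcher.find_matches().unwrap()
}
```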
### **Phase 4: Memory and Algorithmic Optimizations** (Week 4)
#### **Memory Pool Allocations**
```rust
// Custom allocator for workspace operations
pub mod memory {
use std::alloc::{alloc, dealloc, Layout};
use std::ptr::NonNull;
use std::sync::Mutex;
use std::collections::VecDeque;
const POOL_SIZES: &[usize] = &[32, 64, 128, 256, 512, 1024, 2048];
const POOL_CAPACITY: usize = 1000;
pub struct MemoryPool {
    pools: Vec<Mutex<VecDeque<NonNull<u8>>>>,
}
// SAFETY: the pool exclusively owns its raw allocations; NonNull<u8> is
// only !Send/!Sync because raw pointers carry no ownership semantics.
// Without these impls the global pool below would not compile.
unsafe impl Send for MemoryPool {}
unsafe impl Sync for MemoryPool {}
impl MemoryPool {
pub fn new() -> Self {
let pools = POOL_SIZES.iter()
.map(|_| Mutex::new(VecDeque::with_capacity(POOL_CAPACITY)))
.collect();
Self { pools }
}
pub fn allocate(&self, size: usize) -> Option<NonNull<u8>> {
    let Some(pool_index) = self.find_pool_index(size) else {
        // Oversized request: fall back to a direct allocation
        let layout = Layout::from_size_align(size, 8).ok()?;
        return unsafe { NonNull::new(alloc(layout)) };
    };
    let mut pool = self.pools[pool_index].lock().unwrap();
    if let Some(ptr) = pool.pop_front() {
        Some(ptr)
    } else {
        // Pool is empty; allocate a fresh block of the bucket size
        let layout = Layout::from_size_align(POOL_SIZES[pool_index], 8)
            .ok()?;
        unsafe {
            let ptr = alloc(layout);
            NonNull::new(ptr)
        }
    }
}
pub fn deallocate(&self, ptr: NonNull<u8>, size: usize) {
    if let Some(pool_index) = self.find_pool_index(size) {
        let mut pool = self.pools[pool_index].lock().unwrap();
        if pool.len() < POOL_CAPACITY {
            pool.push_back(ptr);
        } else {
            // Pool is full, actually deallocate
            let layout = Layout::from_size_align(POOL_SIZES[pool_index], 8)
                .unwrap();
            unsafe {
                dealloc(ptr.as_ptr(), layout);
            }
        }
    } else {
        // Oversized block from the direct-allocation fallback above
        let layout = Layout::from_size_align(size, 8).unwrap();
        unsafe {
            dealloc(ptr.as_ptr(), layout);
        }
    }
}
fn find_pool_index(&self, size: usize) -> Option<usize> {
POOL_SIZES.iter().position(|&pool_size| size <= pool_size)
}
}
// Global memory pool instance
lazy_static::lazy_static! {
static ref GLOBAL_POOL: MemoryPool = MemoryPool::new();
}
// Custom allocator for PathBuf
#[derive(Debug)]
pub struct PooledPathBuf {
data: NonNull<u8>,
len: usize,
capacity: usize,
}
impl PooledPathBuf {
pub fn new(path: &str) -> Self {
let len = path.len();
let capacity = POOL_SIZES.iter()
.find(|&&size| len <= size)
.copied()
.unwrap_or(len.next_power_of_two());
let data = GLOBAL_POOL.allocate(capacity)
.expect("Failed to allocate memory");
unsafe {
std::ptr::copy_nonoverlapping(
path.as_ptr(),
data.as_ptr(),
len
);
}
Self { data, len, capacity }
}
pub fn as_str(&self) -> &str {
unsafe {
let slice = std::slice::from_raw_parts(self.data.as_ptr(), self.len);
std::str::from_utf8_unchecked(slice)
}
}
}
impl Drop for PooledPathBuf {
fn drop(&mut self) {
GLOBAL_POOL.deallocate(self.data, self.capacity);
}
}
}
```
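A usage sketch, assuming the `memory` module above is in scope: `PooledPathBuf` borrows a fixed-size bucket instead of making a fresh heap allocation, and returns it to the pool on drop:
```rust
fn main() {
    let p = memory::PooledPathBuf::new("config/app.toml");
    assert_eq!(p.as_str(), "config/app.toml");
    // Dropping `p` pushes its 32-byte bucket back for the next caller.
}
```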
#### **SIMD-Optimized Path Operations**
```rust
// SIMD-accelerated path operations where beneficial
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub mod simd_ops {
use std::arch::x86_64::*;
/// Fast path separator normalization using SIMD
pub unsafe fn normalize_path_separators_simd(path: &mut [u8]) -> usize {
let len = path.len();
let mut i = 0;
// Process 32 bytes at a time with AVX2
if is_x86_feature_detected!("avx2") {
let separator_mask = _mm256_set1_epi8(b'\\' as i8);
let replacement = _mm256_set1_epi8(b'/' as i8);
while i + 32 <= len {
let chunk = _mm256_loadu_si256(path.as_ptr().add(i) as *const __m256i);
let mask = _mm256_cmpeq_epi8(chunk, separator_mask);
let normalized = _mm256_blendv_epi8(chunk, replacement, mask);
_mm256_storeu_si256(path.as_mut_ptr().add(i) as *mut __m256i, normalized);
i += 32;
}
}
// Handle remaining bytes
while i < len {
if path[i] == b'\\' {
path[i] = b'/';
}
i += 1;
}
len
}
/// Fast string comparison for path matching
pub unsafe fn fast_path_compare(a: &[u8], b: &[u8]) -> bool {
if a.len() != b.len() {
return false;
}
let len = a.len();
let mut i = 0;
// Use SSE2 for fast comparison
if is_x86_feature_detected!("sse2") {
while i + 16 <= len {
let a_chunk = _mm_loadu_si128(a.as_ptr().add(i) as *const __m128i);
let b_chunk = _mm_loadu_si128(b.as_ptr().add(i) as *const __m128i);
let comparison = _mm_cmpeq_epi8(a_chunk, b_chunk);
let mask = _mm_movemask_epi8(comparison);
if mask != 0xFFFF {
return false;
}
i += 16;
}
}
// Compare remaining bytes
a[i..] == b[i..]
}
}
```
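A safe wrapper sketch over the normalization routine; rewriting ASCII `\` bytes to `/` cannot break UTF-8 validity, which is what makes the `String::as_mut_vec` call sound:
```rust
#[cfg(target_arch = "x86_64")]
pub fn normalize_separators(path: &mut String) {
    // SAFETY: only single-byte ASCII '\\' values are rewritten to '/',
    // so the buffer remains valid UTF-8 throughout.
    unsafe {
        simd_ops::normalize_path_separators_simd(path.as_mut_vec());
    }
}
```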
## **Success Criteria**
- [ ] All micro-benchmark targets met (< 1ms workspace resolution, etc.)
- [ ] Memory usage stays under 1MB additional allocation
- [ ] Zero performance regression in existing functionality
- [ ] 10x improvement in large workspace scenarios (>10k files)
- [ ] Concurrent access performance scales linearly up to 16 threads
- [ ] CI/CD integration completes in <2ms per invocation
## **Metrics to Track**
- Benchmark results across different project sizes
- Memory usage profiling
- Real-world performance in popular Rust projects
- User-reported performance improvements
- CI/CD build time impact
## **Future Performance Enhancements**
- GPU-accelerated glob matching for massive projects
- Machine learning-based path prediction and caching
- Integration with OS-level file system events for instant updates
- Compression of cached workspace metadata
- Background pre-computation of common operations
Together, these optimizations ensure workspace_tools can scale from personal projects to enterprise monorepos without becoming a bottleneck.