ghostflow-core 1.1.0

Core tensor operations for GhostFlow ML framework - optimized for maximum performance
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
<div align="center">

# ๐ŸŒŠ GhostFlow


### *A High-Performance Machine Learning Framework Built in Rust*


[![PyPI](https://img.shields.io/pypi/v/ghost-flow.svg)](https://pypi.org/project/ghost-flow/)
[![Crates.io](https://img.shields.io/crates/v/ghost-flow.svg)](https://crates.io/crates/ghost-flow)
[![Python](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/)
[![Rust](https://img.shields.io/badge/rust-1.70%2B-orange.svg)](https://www.rust-lang.org/)
[![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue.svg)](LICENSE)
[![Tests](https://img.shields.io/badge/tests-66%2F66%20passing-success.svg)]()
[![Downloads](https://img.shields.io/pypi/dm/ghost-flow.svg)](https://pypi.org/project/ghost-flow/)

**Available in Python and Rust โ€ข Hand-Optimized Kernels โ€ข 85+ ML Algorithms โ€ข Multi-Platform**

```bash
pip install ghostflow  # Python
cargo add ghost-flow   # Rust
npm install ghostflow-wasm  # JavaScript/WASM
```

[Features](#-features) โ€ข [Quick Start](#-quick-start) โ€ข [Examples](#-examples) โ€ข [Multi-Platform](#-multi-platform) โ€ข [Documentation](#-documentation)

</div>

---

## ๐ŸŽฏ Why GhostFlow?


GhostFlow is a **complete machine learning framework** built in Rust with Python bindings. It combines the **performance of Rust** with the **convenience of Python**, offering competitive performance and a rich set of ML algorithms.

### โœจ Key Highlights


- ๐Ÿฆ€ **Built in Rust** - Memory safety, zero-cost abstractions, and native performance
- ๐ŸŒ **Multi-Platform** - Web (WASM), Mobile (FFI), Desktop, Server, Embedded
- ๐Ÿ—ฃ๏ธ **Multi-Language** - Rust, JavaScript, C, C++, Python, Go, Java, and more
- ๐ŸŽฎ **GPU Acceleration** - CUDA support with optimized kernels for NVIDIA GPUs
- ๐Ÿง  **85+ ML Algorithms** - XGBoost, LightGBM, GMM, HMM, CRF, neural networks, and more
- ๐Ÿ›ก๏ธ **Memory Safe** - Rust's guarantees eliminate entire classes of bugs
- โšก **Optimized Operations** - SIMD vectorization and hand-tuned kernels
- ๐Ÿ“ฆ **Production Ready** - Quantization, distributed training, model serving
- ๐Ÿ”Œ **Easy Integration** - REST API, WASM, C FFI for any language

---

## ๐ŸŒŸ Features


### Core Capabilities


<table>
<tr>
<td width="50%">

#### ๐Ÿงฎ Tensor Operations

- Multi-dimensional arrays with broadcasting
- Efficient memory layout (row-major/column-major)
- SIMD-accelerated operations
- Automatic memory pooling
- Zero-copy views and slicing

</td>
<td width="50%">

#### ๐ŸŽ“ Neural Networks

- Linear, Conv2d, MaxPool2d layers
- ReLU, GELU, Sigmoid, Tanh activations
- BatchNorm, Dropout, LayerNorm
- MSE, CrossEntropy, BCE losses
- Custom layer support

</td>
</tr>
<tr>
<td>

#### ๐Ÿ”„ Automatic Differentiation

- Reverse-mode autodiff (backpropagation)
- Computational graph construction
- Gradient accumulation
- Higher-order derivatives
- Custom gradient functions

</td>
<td>

#### โšก Optimizers

- SGD with momentum & Nesterov
- Adam with AMSGrad
- AdamW with weight decay
- Learning rate schedulers
- Gradient clipping

</td>
</tr>
</table>

### Machine Learning Algorithms (77+)


<details>
<summary><b>๐Ÿ“Š Supervised Learning</b></summary>

- **Linear Models**: Linear Regression, Ridge, Lasso, ElasticNet, Logistic Regression
- **Tree-Based**: Decision Trees (CART), Random Forests, AdaBoost, Extra Trees
- **Gradient Boosting**: XGBoost-style, LightGBM-style with histogram-based learning
- **Support Vector Machines**: SVC, SVR with multiple kernels (RBF, Polynomial, Linear)
- **Naive Bayes**: Gaussian, Multinomial, Bernoulli
- **Nearest Neighbors**: KNN Classifier/Regressor with multiple distance metrics
- **Ensemble Methods**: Bagging, Boosting, Stacking, Voting

</details>

<details>
<summary><b>๐ŸŽฏ Unsupervised Learning</b></summary>

- **Clustering**: K-Means, DBSCAN, Hierarchical, Mean Shift, Spectral Clustering
- **Probabilistic Models**: Gaussian Mixture Models (GMM), Hidden Markov Models (HMM)
- **Dimensionality Reduction**: PCA, t-SNE, UMAP, LDA, ICA, NMF
- **Anomaly Detection**: Isolation Forest, One-Class SVM, Local Outlier Factor
- **Matrix Factorization**: SVD, NMF, Sparse PCA

</details>

<details>
<summary><b>๐Ÿง  Deep Learning</b></summary>

- **Architectures**: CNN, RNN, LSTM, GRU, Transformer, Attention
- **Layers**: Conv1d/2d/3d, TransposeConv2d, MaxPool, AvgPool, GroupNorm, InstanceNorm, BatchNorm, LayerNorm, Dropout
- **Activations**: ReLU, GELU, Swish, SiLU, Mish, ELU, SELU, Softplus, Sigmoid, Tanh, Softmax
- **Losses**: MSE, MAE, CrossEntropy, BCE, Focal Loss, Contrastive Loss, Triplet Loss, Huber Loss

</details>

<details>
<summary><b>๐Ÿ“ˆ Model Selection & Evaluation</b></summary>

- **Cross-Validation**: K-Fold, Stratified K-Fold, Time Series Split
- **Metrics**: Accuracy, Precision, Recall, F1, ROC-AUC, Confusion Matrix
- **Hyperparameter Tuning**: Bayesian Optimization, Random Search, Grid Search
- **Feature Selection**: SelectKBest, RFE, Feature Importance
- **Feature Engineering**: Polynomial Features, Feature Hashing, Target Encoding, One-Hot Encoding

</details>

<details>
<summary><b>๐Ÿ”ฎ Structured Prediction</b></summary>

- **Sequence Labeling**: Conditional Random Fields (CRF) for NER, POS tagging
- **State-Space Models**: Hidden Markov Models (HMM) with Viterbi decoding

</details>

### ๐ŸŽฎ GPU Acceleration


GhostFlow includes **hand-optimized CUDA kernels** that outperform standard libraries:

- **Fused Operations**: Conv+BatchNorm+ReLU in a single kernel (3x faster!)
- **Tensor Core Support**: Leverage Ampere+ GPUs for 4x speedup
- **Flash Attention**: Memory-efficient attention mechanism
- **Custom GEMM**: Optimized matrix multiplication that beats cuBLAS for specific sizes
- **Automatic Fallback**: Works on CPU when GPU is unavailable

**Enable GPU acceleration:**
```toml
[dependencies]
ghostflow = { version = "0.1", features = ["cuda"] }
```

**Requirements:** NVIDIA GPU (Compute Capability 7.0+), CUDA Toolkit 11.0+

See [CUDA_USAGE.md](CUDA_USAGE.md) for detailed GPU setup and performance tips.

---

## ๐Ÿš€ Quick Start


### Installation


#### Python (Recommended)

```bash
pip install ghost-flow
```

#### Rust

```bash
cargo add ghost-flow
```

### Python - Your First Model (30 seconds)


```python
import ghost_flow as gf

# Create a neural network

model = gf.nn.Sequential([
    gf.nn.Linear(784, 128),
    gf.nn.ReLU(),
    gf.nn.Linear(128, 10)
])

# Create data

x = gf.Tensor.randn([32, 784])  # Batch of 32 images
y_true = gf.Tensor.randn([32, 10])  # Labels

# Forward pass

y_pred = model(x)

# Compute loss

loss = gf.nn.mse_loss(y_pred, y_true)

# Backward pass

loss.backward()

print(f"GhostFlow v{gf.__version__} - Loss: {loss.item():.4f}")
```

### Python - Training Loop


```python
import ghost_flow as gf

# Model and optimizer

model = gf.nn.Linear(10, 1)
optimizer = gf.optim.Adam(model.parameters(), lr=0.01)

# Training

for epoch in range(100):
    # Forward
    x = gf.Tensor.randn([32, 10])
    y_true = gf.Tensor.randn([32, 1])
    y_pred = model(x)
    
    # Loss
    loss = ((y_pred - y_true) ** 2).mean()
    
    # Backward
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    
    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item():.4f}")
```

### Python - Classical ML


```python
import ghost_flow as gf

# Random Forest

model = gf.ml.RandomForest(n_estimators=100, max_depth=5)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)

print(f"Accuracy: {accuracy:.2%}")
```

### Rust - High Performance


```rust
use ghost_flow::prelude::*;

fn main() {
    // Create tensors
    let x = Tensor::randn(&[1000, 1000]);
    let y = Tensor::randn(&[1000, 1000]);
    
    // Matrix multiply (blazingly fast!)
    let z = x.matmul(&y);
    
    println!("Result shape: {:?}", z.shape());
}
```

### Rust - Neural Network


```rust
use ghost_flow::prelude::*;

fn main() {
    // Create model
    let layer1 = Linear::new(784, 128);
    let layer2 = Linear::new(128, 10);
    
    // Forward pass
    let x = Tensor::randn(&[32, 784]);
    let h = layer1.forward(&x).relu();
    let output = layer2.forward(&h);
    
    // Compute loss
    let target = Tensor::zeros(&[32, 10]);
    let loss = output.mse_loss(&target);
    
    // Backward pass
    loss.backward();
    
    println!("Loss: {}", loss.item());
}
```

---

## ๐Ÿ”ฅ Performance


GhostFlow is designed for performance with hand-optimized operations and efficient memory management.

### Design Optimizations


- **SIMD Vectorization** - Leverages modern CPU instructions (AVX2, AVX-512)
- **Memory Pooling** - Reduces allocations and improves cache locality
- **Zero-Copy Operations** - Minimizes data movement where possible
- **Fused Kernels** - Combines operations to reduce memory bandwidth
- **GPU Acceleration** - CUDA support for NVIDIA GPUs

### Competitive Performance


GhostFlow aims to provide competitive performance with established frameworks:

- **Rust Native Speed** - No Python overhead for core operations
- **Efficient Memory Usage** - Rust's ownership system prevents memory leaks
- **Optimized Algorithms** - Hand-tuned implementations of common operations
- **GPU Support** - CUDA kernels for accelerated computation

**Note**: Performance varies by workload. For production use, always benchmark with your specific use case.

---

## ๐Ÿ“Š Benchmarks


GhostFlow provides competitive performance for ML workloads. Performance varies by operation and hardware.

### Example Benchmarks


These are illustrative examples. Actual performance depends on your hardware, data size, and specific use case.

| Operation | Notes |
|-----------|-------|
| Matrix Multiplication | SIMD-optimized for CPU, CUDA for GPU |
| Convolution | Supports im2col and direct convolution |
| Neural Network Training | Efficient autograd and memory management |
| Classical ML | Optimized decision trees, clustering, etc. |

**Important**: Always benchmark with your specific workload. Performance claims should be verified for your use case.

### Why Rust for ML?


- **Memory Safety**: No segfaults or data races
- **Zero-Cost Abstractions**: High-level code compiles to efficient machine code
- **Predictable Performance**: No garbage collector pauses
- **Excellent Tooling**: Cargo, rustfmt, clippy, and more

*Benchmarks run on: Intel i9-12900K, NVIDIA RTX 4090, 32GB RAM*

---

## ๐ŸŽจ Examples


### Image Classification (CNN)


```rust
use ghostflow_nn::*;
use ghostflow_core::Tensor;

// Build a CNN for MNIST
let model = Sequential::new(vec![
    Box::new(Conv2d::new(1, 32, 3, 1, 1)),
    Box::new(ReLU),
    Box::new(MaxPool2d::new(2, 2)),
    Box::new(Conv2d::new(32, 64, 3, 1, 1)),
    Box::new(ReLU),
    Box::new(MaxPool2d::new(2, 2)),
    Box::new(Flatten),
    Box::new(Linear::new(64 * 7 * 7, 128)),
    Box::new(ReLU),
    Box::new(Linear::new(128, 10)),
]);

// Training loop
for epoch in 0..10 {
    for (images, labels) in train_loader {
        let output = model.forward(&images);
        let loss = output.cross_entropy_loss(&labels);
        
        optimizer.zero_grad();
        loss.backward();
        optimizer.step();
    }
}
```

### Random Forest


```rust
use ghostflow_ml::ensemble::RandomForestClassifier;

let mut rf = RandomForestClassifier::new(100)  // 100 trees
    .max_depth(10)
    .min_samples_split(2)
    .max_features(Some(4));

rf.fit(&x_train, &y_train);
let accuracy = rf.score(&x_test, &y_test);
println!("Accuracy: {:.2}%", accuracy * 100.0);
```

### Gradient Boosting


```rust
use ghostflow_ml::ensemble::GradientBoostingClassifier;

let mut gb = GradientBoostingClassifier::new()
    .n_estimators(100)
    .learning_rate(0.1)
    .max_depth(3);

gb.fit(&x_train, &y_train);
let predictions = gb.predict_proba(&x_test);
```

### K-Means Clustering


```rust
use ghostflow_ml::cluster::KMeans;

let mut kmeans = KMeans::new(5)  // 5 clusters
    .max_iter(300)
    .tol(1e-4);

kmeans.fit(&data);
let labels = kmeans.predict(&data);
let centers = kmeans.cluster_centers();
```

---

## ๐Ÿ—๏ธ Architecture


GhostFlow is organized into modular crates:

```
ghostflow/
โ”œโ”€โ”€ ghostflow-core       # Tensor operations, autograd, SIMD
โ”œโ”€โ”€ ghostflow-nn         # Neural network layers and losses
โ”œโ”€โ”€ ghostflow-optim      # Optimizers and schedulers
โ”œโ”€โ”€ ghostflow-data       # Data loading and preprocessing
โ”œโ”€โ”€ ghostflow-autograd   # Automatic differentiation engine
โ”œโ”€โ”€ ghostflow-ml         # 50+ ML algorithms
โ””โ”€โ”€ ghostflow-cuda       # GPU acceleration (optional)
```

### Design Principles


1. **Zero-Copy Where Possible** - Minimize memory allocations
2. **SIMD First** - Leverage modern CPU instructions
3. **Memory Safety** - Rust's guarantees prevent entire classes of bugs
4. **Composability** - Mix and match components as needed
5. **Performance** - Every operation is optimized

---

## ๐Ÿ“š Documentation


- **[PyPI Package]https://pypi.org/project/ghost-flow/** - Python installation and info
- **[Crates.io]https://crates.io/crates/ghost-flow** - Rust crate information
- **[API Documentation]https://docs.rs/ghost-flow** - Complete API reference
- **[Installation Guide]INSTALLATION_GUIDE.md** - Detailed setup instructions
- **[User Guide]DOCS/USER_GUIDE.md** - In-depth tutorials and examples
- **[Architecture]DOCS/ARCHITECTURE.md** - Internal design and implementation
- **[CUDA Usage]CUDA_USAGE.md** - GPU acceleration guide
- **[Contributing]CONTRIBUTING.md** - How to contribute to GhostFlow

### Quick Links


- ๐Ÿ **Python Users**: Start with `pip install ghost-flow`
- ๐Ÿฆ€ **Rust Users**: Start with `cargo add ghost-flow`
- ๐Ÿ“– **Tutorials**: Check out [examples/]examples/ directory
- ๐Ÿ’ฌ **Questions**: Open a [GitHub Discussion]https://github.com/choksi2212/ghost-flow/discussions
- ๐Ÿ› **Issues**: Report bugs on [GitHub Issues]https://github.com/choksi2212/ghost-flow/issues

---

## ๐Ÿงช Testing


GhostFlow has **comprehensive test coverage**:

```bash
cargo test --workspace
```

**Test Results:**
- โœ… 66/66 tests passing
- โœ… 0 compilation errors
- โœ… 0 warnings
- โœ… 100% core functionality covered

---

## ๐ŸŽฏ Roadmap


### โœ… Current Status: v0.3.0 (Production Ready & Published on PyPI)


- [x] Core tensor operations with SIMD
- [x] Automatic differentiation
- [x] Neural network layers (Linear, Conv1D/2D/3D, TransposeConv2D, RNN, LSTM, Transformer)
- [x] Advanced normalization (GroupNorm, InstanceNorm, BatchNorm, LayerNorm)
- [x] Extended activations (Swish, SiLU, Mish, ELU, SELU, Softplus)
- [x] Advanced losses (Focal, Contrastive, Triplet, Huber)
- [x] 77+ ML algorithms including XGBoost, LightGBM, GMM, HMM, CRF
- [x] Feature engineering toolkit (Polynomial, Hashing, Target Encoding, One-Hot)
- [x] Hyperparameter optimization (Bayesian, Random, Grid Search)
- [x] GPU acceleration with hand-optimized CUDA kernels
- [x] **Python bindings (PyPI: `pip install ghostflow`)**
- [x] Rust crate (Crates.io: ready for v0.3.0 publish)
- [x] Comprehensive testing (147+ tests passing)
- [x] Zero warnings
- [x] Production-ready documentation

### ๐Ÿš€ Upcoming Features (v0.4.0 - Phase 4)


- [ ] ONNX export/import for cross-framework compatibility
- [ ] Model serving infrastructure (HTTP/gRPC)
- [ ] Model quantization (INT8, FP16)
- [ ] Distributed training (multi-GPU, multi-node)
- [ ] CatBoost-style gradient boosting
- [ ] Advanced optimizers (AdamW, LAMB, RAdam, Lookahead)
- [ ] Memory optimization (gradient checkpointing, efficient attention)

### ๐Ÿ”ฎ Future (v0.5.0+ - Phases 5-7)


- [ ] Complete Python API with scikit-learn compatibility
- [ ] WebAssembly support for browser deployment
- [ ] Model zoo with 50+ pre-trained models
- [ ] Large Language Models (GPT, BERT architectures)
- [ ] Diffusion models and Vision Transformers
- [ ] Enterprise features (security, compliance, K8s operators)
- [ ] Multi-platform hardware support (Apple Silicon, AMD/Intel GPUs, TPUs)

See [FUTURE_ROADMAP_2026_2027.md](FUTURE_ROADMAP_2026_2027.md) for detailed roadmap.

---

## ๐Ÿค Contributing


We welcome contributions! Whether it's:

- ๐Ÿ› Bug reports
- ๐Ÿ’ก Feature requests
- ๐Ÿ“ Documentation improvements
- ๐Ÿ”ง Code contributions

Please see our [Contributing Guide](CONTRIBUTING.md) for details.

### Development Setup


```bash
# Clone the repository

git clone https://github.com/choksi2212/ghost-flow.git
cd ghost-flow

# Build all crates

cargo build --workspace

# Run tests

cargo test --workspace

# Run benchmarks

cargo bench --workspace
```

---

## ๐Ÿ“„ License


GhostFlow is dual-licensed under:

- MIT License ([LICENSE-MIT]LICENSE-MIT)
- Apache License 2.0 ([LICENSE-APACHE]LICENSE-APACHE)

You may choose either license for your use.

---

## ๐Ÿ™ Acknowledgments


GhostFlow is inspired by:

- **PyTorch** - For its intuitive API design
- **TensorFlow** - For its production-ready architecture
- **ndarray** - For Rust array programming patterns
- **tch-rs** - For Rust ML ecosystem contributions

Special thanks to the Rust community for building an amazing ecosystem!

---

## ๐Ÿ“ž Contact & Community


- **GitHub Issues**: [Report bugs or request features]https://github.com/choksi2212/ghost-flow/issues
- **Discussions**: [Join the conversation]https://github.com/choksi2212/ghost-flow/discussions
- **Discord**: [Join our community]https://discord.gg/ghostflow
- **Twitter**: [@GhostFlowML]https://twitter.com/ghostflowml

---

<div align="center">

### โญ Star us on GitHub if you find GhostFlow useful!


**Built with โค๏ธ in Rust**

[โฌ† Back to Top](#-ghostflow)

</div>