docs.rs failed to build rustorch-0.4.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

Visit the last successful build: rustorch-0.6.29

RusTorch 🚀

A production-ready deep learning library in Rust with PyTorch-like API, GPU acceleration, and enterprise-grade performance
本番環境対応のRust製ディープラーニングライブラリ - PyTorchライクなAPI、GPU加速、エンタープライズグレードパフォーマンス

RusTorch is a fully functional deep learning library that leverages Rust's safety and performance, providing comprehensive tensor operations, automatic differentiation, neural network layers, transformer architectures, multi-backend GPU acceleration (CUDA/Metal/OpenCL), advanced SIMD optimizations, and enterprise-grade memory management features.
RusTorchは、Rustの安全性とパフォーマンスを活かした完全機能のディープラーニングライブラリです。包括的なテンソル演算、自動微分システム、ニューラルネットワーク層、Transformerアーキテクチャ、マルチバックエンドGPU加速（CUDA/Metal/OpenCL）、高度なSIMD最適化、エンタープライズグレードメモリ管理機能を提供します。

✨ Features / 主な特徴

🔥 Comprehensive Tensor Operations: Math operations, broadcasting, indexing, and statistics
包括的テンソル演算: 数学演算、ブロードキャスティング、インデックス操作、統計機能
🤖 Transformer Architecture: Complete transformer implementation with multi-head attention
Transformerアーキテクチャ: マルチヘッドアテンション付き完全なTransformer実装
📝 Embedding Systems: Word embeddings, positional encoding, sinusoidal encoding
埋め込みシステム: 単語埋め込み、位置エンコーディング、正弦波エンコーディング
📊 Advanced Statistics: Mean, variance, std, median, quantile, covariance, correlation
高度な統計: 平均、分散、標準偏差、中央値、分位数、共分散、相関
🎯 Broadcasting Support: Automatic shape compatibility and dimension expansion
ブロードキャスティング: 自動形状互換性と次元拡張
🔍 Flexible Indexing: Select operations, slicing, and advanced tensor manipulation
柔軟なインデックス: 選択操作、スライシング、高度なテンソル操作
🧮 Mathematical Functions: Trigonometric, exponential, power, and activation functions
数学関数: 三角関数、指数関数、べき乗、活性化関数
🎲 Special Mathematical Functions: Gamma, Bessel, error functions with high precision and PyTorch compatibility
特殊数学関数: 高精度でPyTorch互換のガンマ、ベッセル、誤差関数
- ⚠️ Note: Some special functions have precision limitations (1e-3 to 1e-6 accuracy) - contributions welcome for improved implementations
- 注意: 一部の特殊関数に精度制限あり(1e-3〜1e-6精度) - より高精度な実装の貢献を歓迎
📊 Statistical Distributions: Complete probability distributions (Normal, Gamma, Beta, etc.)
統計分布: 完全な確率分布（正規、ガンマ、ベータ等）
🧠 Automatic Differentiation: Tape-based computational graph for gradient computation
自動微分: テープベースの計算グラフによる勾配計算
🏗️ Neural Network Layers: Linear, Conv1d/2d/3d, ConvTranspose, RNN/LSTM/GRU, BatchNorm, Dropout, and more
ニューラルネットワーク層: Linear、Conv1d/2d/3d、ConvTranspose、RNN/LSTM/GRU、BatchNorm、Dropout等
🖼️ Computer Vision: Advanced transformation pipelines with caching, conditional transforms, built-in datasets (MNIST, CIFAR-10/100)
コンピュータビジョン: キャッシュ、条件付き変換、組み込みデータセット（MNIST、CIFAR-10/100）を持つ高度な変換パイプライン
🔧 Safe Operations: Type-safe tensor operations with comprehensive error handling
安全な操作: 包括的エラーハンドリング付き型安全テンソル演算
⚙️ Shared Base Traits: Reusable convolution and pooling base implementations
共有基底トレイト: 再利用可能な畳み込み・プーリング基底実装
⚡ SIMD Optimizations: AVX2/SSE4.1 vectorized operations for high performance
SIMD最適化: 高性能なAVX2/SSE4.1ベクトル化演算
🔄 Unified Parallel Operations: Trait-based parallel tensor operations with intelligent scheduling
統一並列操作: インテリジェントスケジューリング付きトレイトベース並列テンソル演算
🚀 Multi-threaded Processing: Rayon-based parallel batch operations and reductions
マルチスレッド処理: Rayonベース並列バッチ演算とリダクション
🎮 GPU Integration: CUDA/Metal/OpenCL support with automatic device selection
GPU統合: 自動デバイス選択付きCUDA/Metal/OpenCLサポート
💾 Advanced Memory Management: Zero-copy operations, SIMD-aligned allocation, and memory pools
高度メモリ管理: ゼロコピー操作、SIMDアライメント割り当て、メモリプール
🛡️ Rust Safety: Memory safety and thread safety guarantees
Rust安全性: メモリ安全性とスレッドセーフティを保証
🌐 WebAssembly Support: Browser-compatible WASM bindings for client-side ML
WebAssemblyサポート: クライアントサイドML向けブラウザ互換WASMバインディング
🧮 Matrix Decomposition: Complete SVD, QR, LU decomposition and eigenvalue solver with PyTorch compatibility
行列分解: PyTorch互換の完全なSVD、QR、LU分解と固有値求解
📁 Model Format Support: Safetensors, ONNX inference, PyTorch state dict compatibility
モデル形式サポート: Safetensors、ONNX推論、PyTorchステートディクト互換
✅ Production Ready: 682 tests passing (100% success rate), unified error handling system
本番環境対応: 682個テスト合格(100%成功率)、統一エラーハンドリングシステム

Installation

Add this to your Cargo.toml:

[dependencies]
rustorch = "0.4.0"

# Optional features
[features]
default = ["linalg"]
linalg = ["rustorch/linalg"]           # Linear algebra operations (SVD, QR, LU, eigenvalue)
cuda = ["rustorch/cuda"]
metal = ["rustorch/metal"] 
opencl = ["rustorch/opencl"]
safetensors = ["rustorch/safetensors"]
onnx = ["rustorch/onnx"]
all-gpu = ["cuda", "metal", "opencl"]
all-formats = ["safetensors", "onnx"]

# To disable linalg features (avoid OpenBLAS/LAPACK dependencies):
rustorch = { version = "0.4.0", default-features = false }

📊 Performance / パフォーマンス

Latest benchmark results (実測値):
最新ベンチマーク結果（実測値）:

🔥 Core Performance Metrics / コア性能指標

Operation / 演算	Performance / 性能	Details / 詳細
Tensor Addition / テンソル加算	34K - 2.3M ops/sec	✅ Broadcasting support / ブロードキャスト対応
Tensor Sum / テンソル合計	52M+ ops/sec	✅ Consistently high performance / 一貫した高性能
Matrix Multiplication / 行列乗算	0.71 - 0.77 GFLOPS	✅ Stable scaling / 安定したスケーリング
Neural Network Inference / NN推論	15 - 60 inferences/sec	✅ Batch processing / バッチ処理対応

🧮 Matrix Decomposition Performance / 行列分解パフォーマンス

Pure Rust implementation benchmarks (100 iterations) / 純Rust実装ベンチマーク (100回反復):

Matrix Size / 行列サイズ	SVD	QR	LU	Symeig	Eig
4×4	0.96 μs	0.56 μs	1.12 μs	0.51 μs	0.70 μs
8×8	1.38 μs	1.17 μs	1.65 μs	0.47 μs	0.71 μs
16×16	3.02 μs	4.98 μs	3.60 μs	0.43 μs	0.71 μs
32×32	9.92 μs	33.41 μs	11.81 μs	0.54 μs	0.78 μs

✅ 100% Success Rate: All matrix sizes and algorithms achieve 100/100 successful decompositions
✅ 100% 成功率: 全ての行列サイズ・アルゴリズムで100/100の分解成功

⚡ Detailed Performance Breakdown / 詳細性能内訳

Matrix Size	MatMul Performance	Batch Size	NN Inference Rate
64×64	0.77 GFLOPS	32	59.86 inferences/sec
128×128	0.76 GFLOPS	64	29.35 inferences/sec
256×256	0.76 GFLOPS	128	15.09 inferences/sec
512×512	0.71 GFLOPS	-	-

🚀 System Status / システム状態

✅ 682 Tests Passing / 682個全テスト通過
✅ Zero Compilation Errors / コンパイルエラーゼロ
✅ Unified Error Handling / 統一エラーハンドリング
✅ Broadcasting Support / ブロードキャスト対応
✅ Matrix Decomposition / 行列分解対応
✅ Production Ready / 本番環境対応

🚀 Quick Start / クイックスタート

Basic Tensor Operations / 基本的なテンソル演算

use rustorch::tensor::Tensor;

fn main() {
    // Create tensors / テンソル作成
    let a = Tensor::from_vec(vec![1.0f32, 2.0, 3.0, 4.0], vec![2, 2]);
    let b = Tensor::from_vec(vec![5.0f32, 6.0, 7.0, 8.0], vec![2, 2]);
    
    // Basic operations / 基本演算
    let c = &a + &b;  // Addition / 加算
    let d = a.matmul(&b);  // Matrix multiplication / 行列乗算
    
    // Mathematical functions / 数学関数
    let e = a.sin();  // Sine function / サイン関数
    let f = a.exp();  // Exponential function / 指数関数
    
    println!("Shape: {:?}", c.shape());
    println!("Result: {:?}", c.as_slice());
}

Broadcasting Support / ブロードキャスト対応

use rustorch::tensor::Tensor;

fn main() {
    // Broadcasting: (batch, features) + (1, features)
    // ブロードキャスト: (バッチ, 特徴量) + (1, 特徴量)
    let batch_data = Tensor::from_vec(
        (0..64).map(|i| i as f32 * 0.01).collect(),
        vec![32, 2]  // 32 samples, 2 features
    );
    
    let bias = Tensor::from_vec(
        vec![0.1, 0.2],
        vec![1, 2]  // Broadcast shape
    );
    
    // Automatic broadcasting / 自動ブロードキャスト
    let result = batch_data.add(&bias).unwrap();
    println!("Result shape: {:?}", result.shape()); // [32, 2]
    
    // Neural network bias addition / ニューラルネットワークバイアス加算
    use rustorch::nn::{Linear, Module};
    use rustorch::autograd::Variable;
    
    let linear = Linear::<f32>::new(256, 128);
    let input = Variable::new(
        Tensor::from_vec((0..32*256).map(|i| i as f32 * 0.01).collect(), vec![32, 256]),
        false
    );
    
    let output = linear.forward(&input); // Automatic bias broadcasting
    println!("Linear output: {:?}", output.data().read().unwrap().shape());
}

Advanced Tensor Operations / 高度なテンソル演算

use rustorch::tensor::Tensor;

fn main() {
    // Create a 3x4 matrix / 3x4行列を作成
    let data = Tensor::from_vec(
        vec![1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
        vec![3, 4]
    );
    
    // Statistical operations / 統計演算
    let mean = data.mean(None);  // Overall mean / 全体平均
    let std_dev = data.std(Some(0), true);  // Standard deviation along axis 0 / 軸0の標準偏差
    let median = data.median(Some(1));  // Median along axis 1 / 軸1の中央値
    
    // Broadcasting operations / ブロードキャスティング演算
    let broadcasted = data.broadcast_to(&[6, 4]).unwrap();
    
    // Indexing operations / インデックス演算
    let selected = data.select(0, &[0, 2]).unwrap();  // Select rows 0 and 2 / 行0と2を選択
    
    println!("Mean: {:?}", mean.as_slice());
    println!("Selected shape: {:?}", selected.shape());
}

🧮 Matrix Decomposition / 行列分解

⚠️ Important Note / 重要な注意:
Matrix decomposition features require the linalg feature (enabled by default). On some systems, this may require OpenBLAS/LAPACK libraries to be installed. To avoid these dependencies, use:
行列分解機能は linalg フィーチャーが必要です（デフォルト有効）。一部のシステムでは、OpenBLAS/LAPACKライブラリのインストールが必要な場合があります。これらの依存関係を避けるには：

rustorch = { version = "0.4.0", default-features = false }

SVD, QR, LU Decomposition and Eigenvalue Decomposition / SVD、QR、LU分解と固有値分解

use rustorch::tensor::Tensor;

fn main() {
    // Create a 3x3 matrix / 3x3行列を作成
    let matrix = Tensor::from_vec(
        vec![1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
        vec![3, 3]
    );
    
    // Singular Value Decomposition (torch.svd compatible) / 特異値分解（torch.svd互換）
    let (u, s, v) = matrix.svd(false).unwrap();
    println!("SVD - U: {:?}, S: {:?}, V: {:?}", u.shape(), s.shape(), v.shape());
    
    // QR decomposition / QR分解
    let (q, r) = matrix.qr().unwrap();
    println!("QR - Q: {:?}, R: {:?}", q.shape(), r.shape());
    
    // LU decomposition with partial pivoting / 部分ピボット選択付きLU分解
    let (l, u, p) = matrix.lu().unwrap();
    println!("LU - L: {:?}, U: {:?}, P: {:?}", l.shape(), u.shape(), p.shape());
    
    // Create symmetric matrix for eigenvalue decomposition / 固有値分解用対称行列作成
    let sym_data = vec![4.0f32, 2.0, 1.0, 2.0, 3.0, 0.5, 1.0, 0.5, 1.0];
    let sym_matrix = Tensor::from_vec(sym_data, vec![3, 3]);
    
    // Symmetric eigenvalue decomposition (torch.symeig compatible) / 対称固有値分解（torch.symeig互換）
    let (eigenvals, eigenvecs) = sym_matrix.symeig(true, true).unwrap();
    println!("Symeig - Values: {:?}, Vectors: {:?}", eigenvals.shape(), eigenvecs.shape());
    
    // General eigenvalue decomposition (torch.eig compatible) / 一般固有値分解（torch.eig互換）
    let (eig_vals, eig_vecs) = matrix.eig(true).unwrap();
    println!("Eig - Values: {:?}, Vectors: {:?}", eig_vals.shape(), eig_vecs.unwrap().shape());
    
    // Performance: All operations complete in microseconds / 性能: 全演算がマイクロ秒で完了
    // 4x4 matrices: SVD ~0.96μs, QR ~0.56μs, LU ~1.12μs, Symeig ~0.51μs
    // 32x32 matrices: SVD ~9.92μs, QR ~33.41μs, LU ~11.81μs, Symeig ~0.54μs
}

### Automatic Differentiation and Neural Networks / 自動微分とニューラルネットワーク

```rust
use rustorch::prelude::*;
use rustorch::nn::{Linear, loss::mse_loss};
use rustorch::optim::{SGD, Optimizer};

fn main() {
    // Create model / モデル作成
    let model = Linear::new(784, 10);
    let params = model.parameters();
    let mut optimizer = SGD::new(params, 0.01, None, None, None, None);
    
    // Prepare data / データ準備
    let input = Variable::new(
        Tensor::from_vec((0..784).map(|i| i as f32 * 0.01).collect(), vec![1, 784]),
        false
    );
    let target = Variable::new(
        Tensor::from_vec(vec![1.0; 10], vec![1, 10]),
        false
    );
    
    // Training loop / 訓練ループ
    for epoch in 0..100 {
        optimizer.zero_grad();
        
        let output = model.forward(&input);
        let loss = mse_loss(&output, &target);
        
        loss.backward();
        optimizer.step();
        
        if epoch % 10 == 0 {
            println!("Epoch {}: Loss = {:.4}", epoch, loss.data().as_array()[[0]]);
        }
    }
}

Computer Vision / コンピュータビジョン

Basic Transforms / 基本変換

use rustorch::prelude::*;
use rustorch::vision::{transforms::*, datasets::*, Image, ImageFormat};

fn main() {
    // Load MNIST dataset / MNISTデータセットを読み込み
    let train_dataset = MNIST::new("./data", true, true).unwrap();
    
    // Create basic transforms / 基本変換を作成
    let transform = Compose::new(vec![
        Box::new(Resize::new((224, 224))),
        Box::new(RandomHorizontalFlip::new(0.5)),
        Box::new(ToTensor::new()),
        Box::new(Normalize::imagenet()),
    ]);
    
    let cifar10 = CIFAR10::new("./data", true, true)
        .unwrap()
        .with_transform(Box::new(transform));
    
    let train_loader = DataLoader::new(cifar10, 32, true);
}

Advanced Pipeline / 高度なパイプライン

use rustorch::prelude::*;

fn main() {
    // Create advanced pipeline with caching and conditional transforms
    // キャッシュと条件付き変換を持つ高度なパイプラインを作成
    let pipeline = PipelineBuilder::new("training_pipeline".to_string())
        .transform(Box::new(Resize::new((256, 256))))
        .conditional_transform(
            Box::new(RandomCrop::new((224, 224))),
            predicates::min_size(100, 100), // Only for images >= 100x100
            "large_image_crop".to_string()
        )
        .conditional_transform(
            Box::new(RandomHorizontalFlip::new(1.0)),
            predicates::probability(0.5), // 50% chance
            "random_flip".to_string()
        )
        .transform(Box::new(ToTensor::new()))
        .transform(Box::new(Normalize::imagenet()))
        .cache(500) // Cache 500 processed images
        .execution_mode(ExecutionMode::Batch)
        .build();
    
    // Use preset pipelines / プリセットパイプラインを使用
    let imagenet_train = ImageNetPreprocessing::training();
    let cifar_train = CIFARPreprocessing::training();
    let mobile_optimized = MobileOptimizedPreprocessing::mobile_inference();
    
    // Apply pipeline with performance monitoring
    // パフォーマンス監視付きでパイプラインを適用
    let result = pipeline.apply(&image).unwrap();
    let stats = pipeline.get_stats();
    println!("Processed: {} images, Cache hit rate: {:.1}%", 
             stats.total_processed,
             stats.cache_hits as f64 / (stats.cache_hits + stats.cache_misses) as f64 * 100.0);
}

Safe Operations and ReLU Activation / 安全な操作とReLU活性化

use rustorch::nn::safe_ops::SafeOps;
use rustorch::autograd::Variable;
use rustorch::tensor::Tensor;

fn main() {
    // Create a variable safely with validation / 検証付きで変数を安全に作成
    let var = SafeOps::create_variable(
        vec![-2.0, -1.0, 0.0, 1.0, 2.0], 
        vec![5], 
        false
    ).unwrap();
    
    // Apply ReLU activation: max(0, x) / ReLU活性化を適用: max(0, x)
    let relu_result = SafeOps::relu(&var).unwrap();
    println!("ReLU output: {:?}", relu_result.data().read().unwrap().as_array());
    // Output: [0.0, 0.0, 0.0, 1.0, 2.0]
    
    // Get tensor statistics safely / テンソル統計を安全に取得
    let stats = SafeOps::get_stats(&var).unwrap();
    println!("Mean: {:.2}, Std: {:.2}", stats.mean, stats.std_dev());
    
    // Validate tensor for NaN or infinity / NaNや無限大を検証
    SafeOps::validate_finite(&var).unwrap();
    println!("Tensor is finite and valid!");
}

GPU Acceleration / GPU加速

use rustorch::gpu::{DeviceType, kernels::{KernelExecutor, AddKernel, MatMulKernel}};

fn main() {
    // Automatic device detection / 自動デバイス検出
    let available_devices = DeviceType::available_devices();
    println!("Available devices: {:?}", available_devices);
    
    // GPU kernel execution / GPUカーネル実行
    let device = DeviceType::best_available();
    let executor = KernelExecutor::new(device);
    
    // Element-wise addition on GPU / GPU上での要素ごと加算
    let a = vec![1.0f32; 1024];
    let b = vec![2.0f32; 1024];
    let mut c = vec![0.0f32; 1024];
    
    let kernel = AddKernel;
    let inputs = [a.as_slice(), b.as_slice()];
    let mut outputs = [c.as_mut_slice()];
    
    executor.execute_kernel(&kernel, &inputs, &mut outputs)
        .expect("GPU kernel execution failed");
    
    println!("GPU computation completed: {:?}", &c[..5]);
    
    // Matrix multiplication with GPU acceleration / GPU加速行列乗算
    let kernel = MatMulKernel;
    // ... matrix multiplication setup
}

WebAssembly Support / WebAssemblyサポート

RusTorch provides comprehensive WebAssembly support for running neural networks in browsers with optimized performance and memory management.
RusTorchは、最適化されたパフォーマンスとメモリ管理でブラウザ内でニューラルネットワークを実行するための包括的なWebAssemblyサポートを提供します。

// Browser usage / ブラウザでの使用
import init, * as rustorch from './pkg/rustorch.js';

async function main() {
    // Initialize WASM / WASMを初期化
    await init();
    
    // Basic tensor operations / 基本的なテンソル操作
    const tensor1 = rustorch.WasmTensor.zeros([2, 3]);
    const tensor2 = rustorch.WasmTensor.ones([2, 3]);
    const tensor3 = rustorch.WasmTensor.random([2, 3]);
    
    // Mathematical operations / 数学演算
    const sum = tensor1.add(tensor2);
    const product = tensor1.multiply(tensor2);
    const relu_result = tensor1.relu();
    const sigmoid_result = tensor2.sigmoid();
    const tanh_result = tensor3.tanh();
    
    // Advanced operations / 高度な操作
    const reshaped = tensor1.reshape([3, 2]);
    const transposed = reshaped.transpose();
    const scalar_added = tensor2.add_scalar(0.5);
    const power = tensor3.pow(2.0);
    
    // Statistics / 統計
    console.log('Mean:', tensor3.mean());
    console.log('Max:', tensor3.max());
    console.log('Min:', tensor3.min());
    console.log('Sum:', tensor3.sum());
    
    // Neural network layers / ニューラルネットワーク層
    const relu_layer = new rustorch.WasmReLU();
    const relu_output = relu_layer.forward(tensor1);
    
    // Neural network model / ニューラルネットワークモデル
    const model = new rustorch.WasmModel();
    model.add_linear(4, 8, true);  // Linear layer: 4 inputs → 8 outputs
    model.add_relu();              // ReLU activation
    model.add_linear(8, 2, true);  // Output layer: 8 → 2
    
    console.log('Model layers:', model.num_layers());
    
    // JavaScript interoperability / JavaScript相互運用
    const interop = new rustorch.JsInterop();
    
    // Create tensors from JavaScript data / JavaScriptデータからテンソル作成
    const js_array = [[1.0, 2.0], [3.0, 4.0]];
    const tensor_from_array = rustorch.tensor_from_nested_array(js_array);
    
    // Convert tensor to JavaScript array / テンソルをJavaScript配列に変換
    const back_to_array = rustorch.tensor_to_nested_array(tensor_from_array);
    
    // Float32Array conversion / Float32Array変換
    const float32_data = new Float32Array([1, 2, 3, 4, 5, 6]);
    const shape = [2, 3];
    const tensor_from_float32 = rustorch.tensor_from_float32_array(float32_data, shape);
    const back_to_float32 = rustorch.tensor_to_float32_array(tensor_from_float32);
    
    // Performance benchmarking / パフォーマンス・ベンチマーク
    const benchmark = rustorch.benchmark_matmul(256, 10);
    console.log('Matrix multiplication benchmark:');
    console.log(`- Operation: ${benchmark.operation}`);
    console.log(`- Duration: ${benchmark.duration_ms}ms`);
    console.log(`- Throughput: ${benchmark.throughput} FLOPS`);
}

main();

Advanced WASM Features / 高度なWASM機能

// Browser storage integration / ブラウザストレージ統合
const storage = new rustorch.BrowserStorage();

// Save tensor to localStorage / テンソルをlocalStorageに保存
const my_tensor = rustorch.WasmTensor.random([5, 5]);
await storage.save_tensor('my_model_weights', my_tensor);

// Load tensor from localStorage / localStorageからテンソルを読み込み
const loaded_tensor = await storage.load_tensor('my_model_weights');

// Canvas visualization / Canvas可視化
const canvas_renderer = new rustorch.CanvasRenderer('my-canvas');
const heatmap_data = rustorch.WasmTensor.random([20, 20]);
canvas_renderer.render_heatmap(heatmap_data);

// Performance monitoring / パフォーマンス監視
rustorch.PerformanceMonitor.time_function('inference');
const result = model.forward(input_tensor);
rustorch.PerformanceMonitor.time_end('inference');

// Memory optimization / メモリ最適化
const memory_pool = new rustorch.WasmMemoryPool();
const optimized_ops = new rustorch.OptimizedOps();

// Fast matrix multiplication with blocking / ブロッキング付き高速行列乗算
const a = rustorch.WasmTensor.random([512, 512]);
const b = rustorch.WasmTensor.random([512, 512]);
const fast_result = optimized_ops.fast_matmul(a, b);

// Vectorized operations / ベクトル化操作
const vec_result = optimized_ops.vectorized_add(tensor1, tensor2);

// Batch processing / バッチ処理
const batch_processor = new rustorch.BatchProcessor();
batch_processor.add_tensor(tensor1);
batch_processor.add_tensor(tensor2);
const batch_results = batch_processor.batch_relu();

// Web worker integration / Web Worker統合
const worker_manager = new rustorch.WorkerManager();
await worker_manager.create_worker('ml_worker.js');
worker_manager.send_tensor(my_tensor);

<!-- HTML Integration / HTML統合 -->
<!DOCTYPE html>
<html>
<head>
    <title>RusTorch WASM Demo</title>
    <style>
        #tensor-canvas { border: 1px solid #ccc; }
        #performance-stats { font-family: monospace; }
    </style>
</head>
<body>
    <h1>RusTorch WebAssembly Demo</h1>
    <canvas id="tensor-canvas" width="400" height="400"></canvas>
    <div id="performance-stats"></div>
    <button onclick="runInference()">Run Neural Network</button>
    
    <script type="module">
        import init, * as rustorch from './pkg/rustorch.js';
        
        await init();
        
        // Global model for demo / デモ用グローバルモデル
        const model = new rustorch.WasmModel();
        model.add_linear(10, 20, true);
        model.add_relu();
        model.add_linear(20, 5, true);
        
        // Canvas renderer / Canvas描画器
        const renderer = new rustorch.CanvasRenderer('tensor-canvas');
        
        window.runInference = function() {
            const input = rustorch.WasmTensor.random([1, 10]);
            
            rustorch.PerformanceMonitor.time_function('inference');
            const output = model.forward(input);
            rustorch.PerformanceMonitor.time_end('inference');
            
            // Visualize random data / ランダムデータを可視化
            const viz_data = rustorch.WasmTensor.random([20, 20]);
            renderer.render_heatmap(viz_data);
            
            // Update performance stats / パフォーマンス統計更新
            const memory_info = rustorch.PerformanceMonitor.get_memory_info();
            document.getElementById('performance-stats').innerHTML = 
                `Memory: ${JSON.stringify(memory_info)}<br>Output: ${output.data().slice(0, 5)}`;
        };
    </script>
</body>
</html>

Building for WebAssembly / WebAssembly向けビルド

# Install wasm-pack / wasm-packをインストール
curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh

# Build for web / Web向けビルド
wasm-pack build --target web --features wasm

# Build for Node.js / Node.js向けビルド
wasm-pack build --target nodejs --features wasm

# Run examples / 例を実行
cd examples
python -m http.server 8000
# Open http://localhost:8000/wasm_basic.html

🏗️ Architecture / アーキテクチャ

src/
├── tensor/          # Tensor operations (ndarray-based) / テンソル演算（ndarray基盤）
│   ├── operations.rs       # Matrix decomposition (SVD, QR, LU, eigenvalue) / 行列分解（SVD、QR、LU、固有値）
│   ├── parallel_traits.rs  # Parallel operation traits / 並列操作トレイト
│   ├── parallel_impl.rs    # Parallel implementations / 並列実装
│   ├── parallel_ops.rs     # Legacy parallel ops / レガシー並列操作
│   ├── gpu_parallel.rs     # GPU-integrated parallel ops / GPU統合並列操作
│   ├── memory_optimized.rs # Memory optimization strategies / メモリ最適化戦略
│   ├── zero_copy.rs        # Zero-copy operations / ゼロコピー操作
│   ├── simd_aligned.rs     # SIMD-aligned tensors / SIMDアライメントテンソル
│   ├── math_ops.rs         # Mathematical functions / 数学関数
│   ├── broadcasting.rs     # Broadcasting operations / ブロードキャスト操作
│   ├── indexing.rs         # Indexing and selection / インデックスと選択
│   └── statistics.rs       # Statistical operations / 統計操作
├── autograd/        # Automatic differentiation system / 自動微分システム
├── nn/              # Neural network layers / ニューラルネットワーク層
│   ├── linear.rs    # Linear layers / 線形層
│   ├── conv2d.rs    # Convolution layers / 畳み込み層
│   ├── rnn.rs       # RNN/LSTM/GRU
│   ├── activation.rs # Activation functions / 活性化関数
│   └── loss.rs      # Loss functions / 損失関数
├── simd/            # SIMD optimizations / SIMD最適化
│   ├── vectorized.rs # AVX2/SSE4.1 operations / AVX2/SSE4.1演算
│   └── traits.rs     # SIMD trait system / SIMDトレイトシステム
├── memory/          # Advanced memory management / 高度メモリ管理
├── gpu/             # GPU acceleration support / GPU加速サポート
│   ├── device.rs    # Device management / デバイス管理
│   ├── memory.rs    # GPU memory pools / GPUメモリプール
│   ├── kernels.rs   # Unified kernel interface / 統一カーネルインターフェース
│   ├── cuda_kernels.rs   # CUDA implementations with cuBLAS / cuBLAS統合CUDA実装
│   ├── metal_kernels.rs  # Metal Performance Shaders / Metal Performance Shaders
│   ├── opencl_kernels.rs # OpenCL cross-platform kernels / OpenCLクロスプラットフォームカーネル
│   └── validation.rs     # GPU kernel validation framework / GPUカーネル検証フレームワーク
├── wasm/            # WebAssembly support / WebAssemblyサポート
│   ├── tensor.rs    # WASM tensor operations with enhanced math functions / 拡張数学関数付きWASMテンソル演算
│   ├── bindings.rs  # Neural network layer bindings (Linear, ReLU, Model) / ニューラルネットワーク層バインディング（Linear、ReLU、Model）
│   ├── interop.rs   # JavaScript interoperability and benchmarking / JavaScript相互運用とベンチマーク
│   ├── browser.rs   # Browser-specific features (storage, canvas, workers) / ブラウザ専用機能（ストレージ、Canvas、Worker）
│   └── optimized.rs # Performance-optimized WASM operations / パフォーマンス最適化WASM操作
├── special/         # Special mathematical functions / 特殊数学関数
│   ├── gamma.rs     # Gamma functions (Γ, ln Γ, ψ, B) / ガンマ関数（Γ、ln Γ、ψ、B）
│   ├── bessel.rs    # Bessel functions (J, Y, I, K) / ベッセル関数（J、Y、I、K）
│   ├── error.rs     # Error functions (erf, erfc, erfinv) / 誤差関数（erf、erfc、erfinv）
│   └── utils.rs     # Utility functions / ユーティリティ関数
├── distributions/   # Statistical distributions / 統計分布
│   ├── normal.rs    # Normal distribution / 正規分布
│   ├── gamma.rs     # Gamma distribution / ガンマ分布
│   ├── beta.rs      # Beta distribution / ベータ分布
│   └── ...         # Other distributions / その他の分布
├── optim/           # Optimization algorithms / 最適化アルゴリズム
└── data/            # Data loaders / データローダー

📚 Rich Features / 豊富な機能

Tensor Operations / テンソル演算

Basic operations / 基本演算: +, -, *, /, matmul()
Mathematical functions / 数学関数: sin(), cos(), exp(), log(), sqrt(), pow(), sigmoid(), tanh()
Special functions / 特殊関数: gamma(), lgamma(), digamma(), erf(), erfc(), bessel_j(), bessel_y(), bessel_i(), bessel_k()
Statistical operations / 統計演算: mean(), var(), std(), median(), quantile(), cumsum(), cov(), corrcoef()
Matrix decomposition / 行列分解: svd(), qr(), lu(), eig(), symeig() with PyTorch compatibility
Broadcasting / ブロードキャスティング: broadcast_to(), broadcast_with(), unsqueeze(), squeeze(), repeat()
Indexing / インデックス: select(), advanced slicing and tensor manipulation
Shape manipulation / 形状操作: transpose(), reshape(), permute()
Parallel operations / 並列操作: Trait-based parallel processing with automatic SIMD acceleration
GPU operations / GPU操作: CUDA/Metal/OpenCL unified kernel execution with automatic device selection
Memory optimization / メモリ最適化: Zero-copy views, SIMD-aligned allocation, memory pools

Neural Network Layers / ニューラルネットワーク層

Linear: Fully connected layers / 全結合層
Conv2d: 2D convolution layers / 2D畳み込み層
RNN/LSTM/GRU: Recurrent neural networks (multi-layer & bidirectional) / 再帰ニューラルネットワーク（多層・双方向対応）
Transformer: Complete transformer architecture with encoder/decoder / エンコーダー・デコーダー付き完全Transformerアーキテクチャ
Multi-Head Attention: Self-attention and cross-attention mechanisms / セルフアテンション・クロスアテンション機構
Embedding: Word embeddings, positional encoding, sinusoidal encoding / 単語埋め込み、位置エンコーディング、正弦波エンコーディング
Normalization: BatchNorm, LayerNorm, GroupNorm, RMSNorm / バッチ正規化、レイヤー正規化、グループ正規化、RMS正規化
Dropout: Standard and Alpha dropout layers / 標準・Alphaドロップアウト層
Pooling: MaxPool2d, AvgPool2d

Activation Functions / 活性化関数

ReLU, Sigmoid, Tanh, Softmax, GELU, Swish, Mish, LeakyReLU, ELU, SELU

Loss Functions / 損失関数

MSELoss, CrossEntropyLoss, BCELoss, HuberLoss

Optimization Algorithms / 最適化アルゴリズム

SGD, Adam + Learning rate schedulers / 学習率スケジューラー

📖 Examples / サンプル

Comprehensive examples in the examples/ directory:
examples/ ディレクトリに包括的なサンプルを用意:

Tensor Operations / テンソル演算:
- math_ops_demo.rs - Mathematical functions demonstration
- broadcasting_demo.rs - Broadcasting operations
- indexing_demo.rs - Indexing and selection operations
- statistics_demo.rs - Statistical functions
Matrix Decomposition / 行列分解:
- svd_demo.rs - SVD demonstration with verification and edge cases
- eigenvalue_demo.rs - Eigenvalue decomposition with PCA examples
- matrix_decomposition_demo.rs - QR/LU demonstrations with linear system solving
Transformer & Attention / Transformer・アテンション:
- transformer_demo.rs - Complete transformer pipeline
- embedding_demo.rs - Word and positional embeddings
- attention_demo.rs - Multi-head attention mechanisms
Special Functions / 特殊関数:
- special_functions_demo.rs - Gamma, Bessel, error functions demonstration
Model Formats / モデル形式:
- model_formats_demo.rs - Safetensors, ONNX, PyTorch compatibility demonstration
Performance Optimization / パフォーマンス最適化:
- parallel_operations_demo.rs - Parallel tensor operations with trait-based system
- memory_optimization_demo.rs - Advanced memory optimization strategies
- gpu_acceleration_demo.rs - GPU acceleration with multi-backend support
- gpu_kernel_demo.rs - GPU kernel validation and performance demonstration
- simd_demo.rs - SIMD vectorized operations
Basic / 基本: tensor_demo.rs, autograd_demo.rs
Neural Networks / NN: linear_regression.rs, neural_network_demo.rs
Advanced / 高度: rnn_demo.rs, advanced_features_demo.rs

Running Examples / サンプル実行

# Run tensor operations examples / テンソル演算サンプル実行
cargo run --example math_ops_demo --release
cargo run --example broadcasting_demo --release
cargo run --example statistics_demo --release

# Run special functions examples / 特殊関数サンプル実行
cargo run --example special_functions_demo --release

# Run matrix decomposition examples / 行列分解サンプル実行
cargo run --example svd_demo --release
cargo run --example eigenvalue_demo --release
cargo run --example matrix_decomposition_demo --release

# Run transformer examples / Transformerサンプル実行
cargo run --example transformer_demo --release
cargo run --example embedding_demo --release
cargo run --example attention_demo --release

# Run performance optimization examples / パフォーマンス最適化サンプル実行
cargo run --example parallel_operations_demo --release
cargo run --example memory_optimization_demo --release
cargo run --example gpu_acceleration_demo --release
cargo run --example gpu_kernel_demo --release
cargo run --example simd_demo --release

# Run neural network examples / ニューラルネットワークサンプル実行
cargo run --example linear_regression --release
cargo run --example neural_network_demo --release
cargo run --example rnn_demo --release

# Run advanced examples / 高度なサンプル実行
cargo run --example autograd_demo --release
cargo run --example advanced_features_demo --release

🧪 Testing / テスト

All 682 tests passing - Production-ready quality assurance with unified error handling system
682個全テスト合格 - 統一エラーハンドリングシステム付き本番環境対応の品質保証

🧮 Matrix Decomposition Tests / 行列分解テスト

25 comprehensive matrix decomposition tests including:
包括的な25個の行列分解テスト含む:

SVD (9 tests): Square matrices, rectangular matrices, identity matrices, rank-deficient cases, orthogonality verification
Eigenvalue (8 tests): General matrices, symmetric matrices, eigenvector computation, identity and zero matrices
QR (4 tests): Basic decomposition, rectangular matrices, identity matrices, error handling
LU (4 tests): Basic decomposition, identity matrices, rectangular matrices, partial pivoting

All matrix decomposition algorithms achieve 100% success rate across all test cases.
全ての行列分解アルゴリズムが全テストケースで100%成功率を達成。

# Run all tests / 全テスト実行
cargo test

# Run with release optimizations / リリース最適化でテスト実行
cargo test --release

# Run specific test modules / 特定のテストモジュール実行
cargo test tensor
cargo test nn
cargo test autograd

📊 Benchmarks / ベンチマーク

Comprehensive performance measurement with dedicated benchmark suites:
専用ベンチマークスイートで包括的な性能測定:

# Run all benchmarks / 全ベンチマーク実行
cargo bench

# Run specific benchmark suites / 特定のベンチマークスイート実行
cargo bench --bench parallel_performance      # Parallel processing benchmarks
cargo bench --bench simd_performance         # SIMD optimization benchmarks  
cargo bench --bench memory_strategy_performance  # Memory optimization benchmarks
cargo bench --bench gpu_cpu_performance      # GPU vs CPU comparison benchmarks
cargo bench --bench gpu_kernel_performance   # GPU kernel validation and performance
cargo bench --bench integrated_performance   # Integrated performance tests

# Matrix decomposition benchmarks / 行列分解ベンチマーク
cargo bench --bench matrix_decomposition_benchmark  # Comprehensive matrix decomposition
cargo bench --bench optimized_matrix_benchmark      # Timeout-optimized matrix benchmarks
cargo bench --bench quick_matrix_benchmark         # Quick matrix performance tests

# Legacy benchmarks / レガシーベンチマーク
cargo bench --bench tensor_ops
cargo bench --bench neural_networks
cargo bench --bench optimized_ops
cargo bench --bench memory_pool
cargo bench --bench memory_optimization
cargo bench --bench gpu_integration

New Benchmark Suites / 新しいベンチマークスイート:

parallel_performance: Parallel vs sequential operations, thread scaling, execution strategies / 並列vs逐次演算、スレッドスケーリング、実行戦略
simd_performance: SIMD vs scalar operations, vectorization effectiveness, instruction sets / SIMDvsスカラー演算、ベクトル化効果、命令セット
memory_strategy_performance: Memory allocation strategies, zero-copy operations, cache optimization / メモリ割り当て戦略、ゼロコピー操作、キャッシュ最適化
gpu_cpu_performance: GPU acceleration vs CPU processing, device selection, memory transfer / GPU加速vsCPU処理、デバイス選択、メモリ転送
integrated_performance: End-to-end performance validation across all optimizations / 全最適化の統合パフォーマンス検証

Matrix Decomposition Benchmarks / 行列分解ベンチマーク:

matrix_decomposition_benchmark: Comprehensive SVD, QR, LU, eigenvalue benchmarks with scaling analysis / SVD、QR、LU、固有値の包括的ベンチマークとスケーリング解析
optimized_matrix_benchmark: Timeout-resistant benchmarks with conservative settings / タイムアウト耐性の保守的設定ベンチマーク
quick_matrix_benchmark: Fast matrix operation benchmarks for development / 開発用高速行列演算ベンチマーク

Legacy Benchmark Categories / レガシーベンチマークカテゴリ:

tensor_ops: Basic tensor operations / 基本テンソル演算
autograd_ops: Automatic differentiation operations / 自動微分演算
neural_networks: Neural network operations / ニューラルネットワーク
optimized_ops: SIMD and parallel optimizations / SIMD・並列最適化
memory_pool: Memory management performance / メモリ管理性能
memory_optimization: Advanced memory strategies / 高度メモリ戦略
gpu_integration: GPU acceleration benchmarks / GPU加速ベンチマーク

📖 Documentation / ドキュメント

For detailed API documentation, please refer to docs.rs/rustorch.
詳細なAPIドキュメントは docs.rs/rustorch をご覧ください。

🚀 Production Deployment / 本番環境デプロイ

Docker Deployment

RusTorch provides production-ready Docker images with multi-stage builds for optimal performance:

# Production deployment
docker build -t rustorch:latest .
docker run -it rustorch:latest

# GPU-enabled deployment (requires NVIDIA Docker)
docker build -f Dockerfile.gpu -t rustorch:gpu .
docker run --gpus all -it rustorch:gpu

# Development environment
docker compose up rustorch-dev

# Complete multi-service stack
docker compose --profile gpu up  # With GPU support
docker compose --profile python up  # With Jupyter notebooks

CI/CD Pipeline

Automated testing and deployment through GitHub Actions:

Multi-platform Testing: Ubuntu, macOS, Windows across Rust stable/beta/nightly
Code Quality: Rustfmt, Clippy, security audits, dependency reviews
Performance Regression: Automated benchmark comparisons
Security Scanning: Trivy vulnerability scanning, CodeQL analysis
Documentation: Auto-generated and deployed to GitHub Pages
Release Automation: Automated crates.io publishing on releases

Production Features

Memory Safety: Zero unsafe code in core functionality
Thread Safety: Full concurrent operation support
Error Handling: Comprehensive error types and recovery
Monitoring: Built-in performance metrics and logging
Scalability: Horizontal scaling with distributed computing support
Security: Regular dependency audits and vulnerability scanning

🏗️ Architecture Overview / アーキテクチャ概要

🏢 Production Stack
├── 🚀 Application Layer
│   ├── High-level APIs (Sequential, Trainer)
│   ├── Model definitions (CNN, RNN, Transformer)
│   └── Training loops and inference
├── 🧠 Neural Network Layer  
│   ├── Core layers (Linear, Conv2d, Attention)
│   ├── Activation functions (ReLU, Softmax, GELU)
│   └── Normalization (BatchNorm, LayerNorm)
├── 🔧 Computation Engine
│   ├── Tensor operations (Math, Broadcasting)
│   ├── Automatic differentiation (Backprop)
│   └── Memory management (Pools, Zero-copy)
├── ⚡ Optimization Layer
│   ├── SIMD vectorization (AVX2, SSE4.1)
│   ├── Parallel processing (Rayon threading)
│   └── GPU acceleration (CUDA, Metal, OpenCL)
└── 🏗️ Infrastructure Layer
    ├── Cross-platform support (Linux, macOS, Windows)
    ├── WebAssembly bindings (Browser deployment)
    └── Docker containerization (Production-ready)

🤝 Contributing / コントリビューション

We welcome contributions from the community! Here are some areas where your help would be especially valuable:

Priority Areas / 優先領域

🎯 Special Functions Precision: Improve numerical accuracy of Bessel functions, error functions, and gamma functions
⚡ Performance Optimization: SIMD improvements, GPU kernel optimization, memory management
🧪 Testing: Add more comprehensive test cases, edge case handling, numerical stability tests
📚 Documentation: Examples, tutorials, API documentation improvements
🌐 Platform Support: WebAssembly optimizations, mobile platforms, embedded systems
🔧 Algorithm Implementations: New neural network layers, optimizers, loss functions

Special Functions Improvements Needed / 特殊関数の改善が必要

Bessel Functions: Current Y₀(x) implementation has significant precision issues
Error Functions: erfinv/erfcinv precision could be improved from 1e-5 to 1e-10
Numerical Stability: Bessel function recurrence relations need stabilization

How to Contribute / コントリビューション方法

Fork the repository and create a feature branch
Write comprehensive tests for your changes
Ensure all tests pass: cargo test --all-features
Follow Rust best practices and maintain zero unsafe code in core functionality
Add documentation for public APIs
Submit a pull request with a clear description

Development Setup / 開発環境セットアップ

# Clone the repository
git clone https://github.com/JunSuzukiJapan/rustorch.git
cd rustorch

# Run tests
cargo test --all-features

# Run benchmarks
cargo bench

# Check formatting
cargo fmt --check

# Run clippy
cargo clippy --all-targets --all-features

We appreciate all contributions, from bug fixes to major feature additions! すべてのコントリビューション（バグ修正から主要機能追加まで）を歓迎します！

License

Licensed under either of:

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

rustorch 0.4.0