# trtx-rs

> **⚠️ EXPERIMENTAL - NOT FOR PRODUCTION USE**
>
> This project is in early experimental development. The API is unstable and will change.
> This is NOT production-ready software. Use at your own risk.
>
> Published on crates.io to reserve the crate names.

Safe Rust bindings to [NVIDIA TensorRT-RTX](https://github.com/NVIDIA/TensorRT-RTX) for high-performance deep learning inference.

## Overview

This project provides ergonomic Rust bindings to TensorRT-RTX, enabling efficient inference of deep learning models on NVIDIA GPUs with minimal overhead.

### Features

- **Safe API**: RAII-based memory management and type-safe abstractions
- **Two-phase workflow**: Separate build (AOT) and inference (runtime) phases
- **Zero-cost abstractions**: Minimal overhead over C++ API
- **Comprehensive error handling**: Proper Rust error types for all operations
- **Flexible logging**: Customizable log handlers for TensorRT messages

## Project Structure

```
trtx-rs/
├── trtx-sys/       # Raw FFI bindings (unsafe)
└── trtx/           # Safe Rust wrapper (use this!)
```

## Prerequisites

### Required (building) 

1. **NVIDIA TensorRT-RTX 1.3**: Download and install from [NVIDIA Developer](https://developer.nvidia.com/tensorrt)
2. **CUDA Runtime**: Version compatible with your TensorRT-RTX installation
3. **Clang**: Required for autocxx. On Windows: `winget install LLVM.LLVM`
4. **NVIDIA GPU**: Compatible with TensorRT-RTX requirements

TensorRT-RTX is dynamically loaded by default, so the TensorRT SDK is only required when building
with the Cargo features `link_tensorrt_rtx`/`link_tensorrt_onnxparser`, which link the TensorRT libraries at build time.
Use `TENSORRT_RTX_DIR` to point to the TensorRT SDK root directory (the path that contains the `lib` folder with the shared libraries).
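
For example, a build that links the libraries directly might look like this (the SDK path is illustrative; check the feature list under "Cargo features" below for the exact feature names):

```bash
# Point the build at the TensorRT-RTX SDK root (the directory containing lib/)
export TENSORRT_RTX_DIR=/opt/tensorrt-rtx
cargo build --features link_tensorrt_rtx
```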

### Required (GPU execution) 

1. **NVIDIA TensorRT-RTX**: Download and install from [NVIDIA Developer](https://developer.nvidia.com/tensorrt)
     - The TensorRT libraries must be in a location where they can be dynamically loaded
       (e.g. on `PATH` on Windows or `LD_LIBRARY_PATH` on Linux).
     - This crate currently requires TensorRT-RTX version 1.3 (see the Cargo feature `v_1_3`).
       Other versions may become available in the future.

2. **NVIDIA GPU**: Compatible with TensorRT-RTX requirements
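
For example, on Linux the runtime loader can be pointed at the installed libraries like this (the path is illustrative):

```bash
# Linux: make the TensorRT-RTX shared libraries discoverable at runtime
export LD_LIBRARY_PATH=/opt/tensorrt-rtx/lib:$LD_LIBRARY_PATH

# Windows (cmd): add the lib directory to PATH instead
# set PATH=C:\tensorrt-rtx\lib;%PATH%
```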


### Development Without TensorRT-RTX (Mock Mode)

If you're developing on a machine without TensorRT-RTX (e.g., macOS, or for testing), you can use the `mock` feature. This enables the **trtx mock layer** (safe Rust stubs in `trtx` that mirror the real API), not the low-level FFI:

```bash
# Build with mock mode
cargo build --features mock

# Run examples with mock mode
cargo run --features mock --example basic_workflow

# Run tests with mock mode
cargo test --features mock
```

Mock mode provides stub implementations that allow you to:
- Verify the API compiles correctly
- Test your application structure
- Develop without needing an NVIDIA GPU
- Run CI/CD pipelines on any platform

**Note:** Mock mode only validates structure and API usage. For actual inference, you need real TensorRT-RTX.
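
For instance, a unit test exercised in mock mode might look like this (a sketch based on the types used in the examples below; it assumes the mock constructors succeed):

```rust
#[cfg(test)]
mod tests {
    // Runs with `cargo test --features mock` on machines without TensorRT-RTX or a GPU.
    #[test]
    fn builds_logger_and_builder_in_mock_mode() {
        let logger = trtx::Logger::stderr().expect("stderr logger");
        let _builder = trtx::Builder::new(&logger).expect("builder");
    }
}
```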

## Cargo features

The `trtx` crate has the following Cargo features:

- `default`: `real`, `dlopen_tensorrt_onnxparser`, `dlopen_tensorrt_rtx`, `onnxparser`, `v_1_3`
- `mock`: use this library in mock mode; the TensorRT libraries and an NVIDIA GPU are no longer necessary for execution
- `real`: the opposite of `mock` mode; TensorRT and an NVIDIA GPU are required for execution
- `dlopen_tensorrt_rtx`: enables dynamic loading of the TensorRT library via `trtx::dynamically_load_tensorrt`
- `dlopen_tensorrt_onnxparser`: enables dynamic loading of the TensorRT ONNX parser library via `trtx::dynamically_load_tensorrt_onnxparser`
- `links_tensorrt_rtx`: links the TensorRT library; calling `trtx::dynamically_load_tensorrt` becomes optional
- `links_tensorrt_onnxparser`: links the TensorRT ONNX parser library; calling `trtx::dynamically_load_tensorrt_onnxparser` becomes optional
- `onnxparser`: enables the ONNX parser functionality of this crate. Optional if you do not use ONNX as the input format for TensorRT
  but build networks with the builder API instead
- `v_1_3`: must always be enabled. Future TensorRT versions may become selectable via higher version numbers
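
For example, a downstream `Cargo.toml` might select features like this (illustrative only; the defaults already cover the common GPU setup):

```toml
[dependencies]
# Default features: real mode with dlopen'd TensorRT-RTX and the ONNX parser
trtx = "0.3"

# For GPU-less development or CI, enable the mock layer instead:
# trtx = { version = "0.3", features = ["mock"] }
```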

## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
trtx = "0.3"
```

## Usage

### Build Phase (Creating an Engine)

```rust
use trtx::{Logger, Builder};
use trtx::builder::{network_flags, MemoryPoolType};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Dynamically load TensorRT, optionally from an explicit path. This is needed with the
    // default dlopen_tensorrt_rtx feature and is a no-op when link_tensorrt_rtx is also enabled.
    trtx::dynamically_load_tensorrt(None::<String>).unwrap();

    // Create logger
    let logger = Logger::stderr()?;

    // Create builder
    let builder = Builder::new(&logger)?;

    // Create network with explicit batch dimensions
    let network = builder.create_network(network_flags::EXPLICIT_BATCH)?;

    // Configure builder
    let mut config = builder.create_config()?;
    config.set_memory_pool_limit(MemoryPoolType::Workspace, 1 << 30)?; // 1GB

    // Build serialized engine
    let engine_data = builder.build_serialized_network(&network, &config)?;

    // Save to disk
    std::fs::write("model.engine", &engine_data)?;

    Ok(())
}
```

### Inference Phase (Running Inference)

```rust
use trtx::{Logger, Runtime};
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Dynamically load TensorRT, optionally from an explicit path. This is needed with the
    // default dlopen_tensorrt_rtx feature and is a no-op when link_tensorrt_rtx is also enabled.
    trtx::dynamically_load_tensorrt(None::<String>).unwrap();

    // Create logger and runtime
    let logger = Logger::stderr()?;
    let runtime = Runtime::new(&logger)?;

    // Load serialized engine
    let engine_data = fs::read("model.engine")?;
    let engine = runtime.deserialize_cuda_engine(&engine_data)?;

    // Create execution context
    let mut context = engine.create_execution_context()?;

    // Query tensor information
    let num_tensors = engine.get_nb_io_tensors()?;
    for i in 0..num_tensors {
        let name = engine.get_tensor_name(i)?;
        println!("Tensor {}: {}", i, name);
    }

    // Set tensor addresses (input_device_ptr / output_device_ptr must point to valid
    // CUDA device memory allocated elsewhere)
    unsafe {
        context.set_tensor_address("input", input_device_ptr)?;
        context.set_tensor_address("output", output_device_ptr)?;
    }

    // Execute inference on a previously created CUDA stream (cuda_stream)
    unsafe {
        context.enqueue_v3(cuda_stream)?;
    }

    Ok(())
}
```

### Custom Logging

```rust
use trtx::{Logger, LogHandler, Severity};

struct MyLogger;

impl LogHandler for MyLogger {
    fn log(&self, severity: Severity, message: &str) {
        match severity {
            Severity::Error | Severity::InternalError => {
                eprintln!("ERROR: {}", message);
            }
            Severity::Warning => {
                println!("WARN: {}", message);
            }
            _ => {
                println!("INFO: {}", message);
            }
        }
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let logger = Logger::new(MyLogger)?;
    // Use logger...
    Ok(())
}
```
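
If an application already uses the `log` ecosystem, a handler can simply forward TensorRT messages to it. This is a sketch that assumes the `log` crate is a dependency and reuses the `Severity` variants shown above:

```rust
use trtx::{LogHandler, Severity};

struct ForwardToLog;

impl LogHandler for ForwardToLog {
    fn log(&self, severity: Severity, message: &str) {
        // Map TensorRT severities onto log levels.
        match severity {
            Severity::Error | Severity::InternalError => log::error!("{}", message),
            Severity::Warning => log::warn!("{}", message),
            _ => log::info!("{}", message),
        }
    }
}

// Usage: let logger = trtx::Logger::new(ForwardToLog)?;
```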

## API Overview

### Core Types

- **`Logger`**: Captures TensorRT messages with custom handlers
- **`Builder`**: Creates optimized inference engines
- **`NetworkDefinition`**: Defines the computational graph
- **`BuilderConfig`**: Configures optimization parameters
- **`Runtime`**: Deserializes engines for inference
- **`CudaEngine`**: Optimized inference engine
- **`ExecutionContext`**: Manages inference execution

### Error Handling

All fallible operations return `Result<T, Error>`:

```rust
use trtx::Error;

match builder.create_network(0) {
    Ok(network) => {
        // Use network
    }
    Err(Error::InvalidArgument(msg)) => {
        eprintln!("Invalid argument: {}", msg);
    }
    Err(e) => {
        eprintln!("Error: {}", e);
    }
}
```

## Safety

### Safe Operations

Most operations are safe and use RAII for resource management:
- Creating loggers, builders, runtimes
- Building and serializing engines
- Deserializing engines
- Creating execution contexts

### Unsafe Operations

CUDA-related operations require `unsafe`:
- **`set_tensor_address`**: Must point to valid CUDA device memory
- **`enqueue_v3`**: Requires valid CUDA stream and properly bound tensors

## Building from Source

```bash
# Clone the repository
git clone https://github.com/rustnn/trtx-rs.git
cd trtx-rs

# Option 1: Build with TensorRT-RTX (requires NVIDIA GPU)
export TENSORRT_RTX_DIR=/path/to/tensorrt-rtx
cargo build --release
cargo test

# Option 2: Build in mock mode (no GPU required)
cargo build --features mock --release
cargo test --features mock
cargo run --features mock --example basic_workflow
```

## Examples

See the `trtx/examples/` directory for complete examples:

- `basic_workflow.rs`: Build and serialize an engine (optionally from ONNX), then run inference
- `tiny_network.rs`: Build a small ReLU-based network from scratch using the Network API (no ONNX)
- `rustnn_executor.rs`: rustnn-compatible executor integration
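
They can be run either against a real installation or against the mock layer, for example:

```bash
# With TensorRT-RTX and an NVIDIA GPU available (default features)
cargo run --release --example basic_workflow

# Without a GPU or TensorRT-RTX, using the mock layer
cargo run --features mock --example basic_workflow
```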

## Architecture

### trtx-sys (FFI Layer)

- **autocxx**-generated bindings for the TensorRT-RTX C++ API
- Slim C++ logger bridge for virtual method handling (e.g., log callbacks)
- Optional mock FFI (when `mock` feature is enabled) so the crate can build without TensorRT installed
- No safety guarantees; internal use only

### trtx (Safe Wrapper)

- **Mock layer**: When the `mock` feature is enabled, the trtx crate uses a Rust mock layer (`trtx/src/mock/`) that mirrors the real API—this is the “mock mode” you use for development without GPU. Real implementation lives in `trtx/src/real/`.
- RAII-based resource management
- Type-safe API
- Lifetime tracking
- Comprehensive error handling
- User-facing API
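
The backend selection is feature-gated; conceptually the crate root might switch modules like this (a simplified sketch, the actual layout in `trtx/src/lib.rs` may differ):

```rust
// Compile the mock layer when the `mock` feature is enabled,
// otherwise compile the real TensorRT-RTX-backed implementation.
#[cfg(feature = "mock")]
mod mock;
#[cfg(feature = "mock")]
pub use mock::*;

#[cfg(not(feature = "mock"))]
mod real;
#[cfg(not(feature = "mock"))]
pub use real::*;
```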

## Troubleshooting

### Build Errors

**Cannot find TensorRT headers:**
```bash
export TENSORRT_RTX_DIR=/path/to/tensorrt-rtx
```

**Linking errors:**
```bash
export LD_LIBRARY_PATH=$TENSORRT_RTX_DIR/lib:$LD_LIBRARY_PATH
```

### Runtime Errors

**CUDA not initialized:**
Ensure CUDA runtime is properly initialized before creating engines or contexts.

**Invalid tensor addresses:**
Verify that all tensor addresses point to valid CUDA device memory with correct sizes.
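
When debugging either class of runtime error, it helps to first confirm that the driver and GPU are visible to the system:

```bash
# Lists the detected NVIDIA GPUs and the driver/CUDA versions
nvidia-smi
```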

## Development

### Pre-commit Hooks

To ensure code quality, set up the pre-commit hook:

```bash
cp .githooks/pre-commit .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
```

The hook will automatically run `cargo fmt` and `cargo clippy` before each commit.

### Manual Checks

You can also run checks manually using the Makefile:

```bash
make check-all  # Run fmt, clippy, and tests
make fmt        # Format code
make clippy     # Run lints
make test       # Run tests
```

### GPU Testing

The project includes CI workflows for testing with real NVIDIA GPUs:

- **Mock mode CI**: Runs on every push (Ubuntu, macOS) and tests the API without a GPU
- **GPU tests**: Run on a self-hosted Windows runner with a T4 GPU and exercise real TensorRT-RTX

To set up a GPU runner for real hardware testing, see [GPU Runner Setup Guide](.github/GPU_RUNNER_SETUP.md).

The GPU tests workflow:
- Builds without mock mode (uses real TensorRT-RTX)
- Verifies CUDA and GPU availability
- Runs tests and examples with actual GPU acceleration
- Can be triggered manually or runs automatically on code changes

## Contributing

Contributions are welcome! Please see [docs/DESIGN.md](docs/DESIGN.md) for architecture details.

## License

This project is licensed under the Apache License, Version 2.0 - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- NVIDIA for TensorRT-RTX
- The Rust community for excellent FFI tools

## Status

This project is in early development. APIs may change before 1.0 release.

### Implemented

- ✅ Core FFI layer (autocxx); trtx **mock layer** for development without TensorRT (no GPU)
- ✅ Logger interface with custom handlers
- ✅ Builder API for engine creation
- ✅ Runtime and engine deserialization
- ✅ Execution context
- ✅ Error handling with detailed messages
- ✅ **Network API**: TensorRT-RTX `INetworkDefinition` supported—build networks in Rust without ONNX
- ✅ **ONNX parser bindings** (nvonnxparser integration)
- ✅ **CUDA**: cudarc integration for memory management and device sync
- ✅ **rustnn-compatible executor API** (ready for integration)
- ✅ RAII-based resource management

### Planned

- ⬜ Dynamic shape support
- ⬜ Optimization profiles
- ⬜ Weight refitting
- ⬜ INT8 quantization support
- ⬜ Comprehensive examples with real models
- ⬜ Performance benchmarking
- ⬜ Documentation improvements

## Resources

- [TensorRT-RTX Documentation](https://docs.nvidia.com/deeplearning/tensorrt-rtx/)
- [TensorRT-RTX GitHub](https://github.com/NVIDIA/TensorRT-RTX)
- [CUDA Programming Guide](https://docs.nvidia.com/cuda/)