# trtx-rs

> **⚠️ EXPERIMENTAL - NOT FOR PRODUCTION USE**
>
> This project is in early experimental development. The API is unstable and will change.
> This is NOT production-ready software. Use at your own risk.
>
> Published on crates.io to reserve the crate names.

Safe Rust bindings to [NVIDIA TensorRT-RTX](https://github.com/NVIDIA/TensorRT-RTX) for high-performance deep learning inference.

## Overview

This project provides ergonomic Rust bindings to TensorRT-RTX, enabling efficient inference of deep learning models on NVIDIA GPUs with minimal overhead.

### Features

- **Safe API**: RAII-based memory management and type-safe abstractions
- **Two-phase workflow**: Separate build (AOT) and inference (runtime) phases
- **Zero-cost abstractions**: Minimal overhead over C++ API
- **Comprehensive error handling**: Proper Rust error types for all operations
- **Flexible logging**: Customizable log handlers for TensorRT messages

## Project Structure

```
trtx-rs/
├── trtx-sys/       # Raw FFI bindings (unsafe)
└── trtx/           # Safe Rust wrapper (use this!)
```

## Prerequisites

### Required (building) 

1. **NVIDIA TensorRT-RTX 1.3**: Download and install from [NVIDIA Developer](https://developer.nvidia.com/tensorrt)
2. **CUDA Runtime**: Version compatible with your TensorRT-RTX installation
3. **Clang**: Required for autocxx. On Windows: `winget install LLVM.LLVM`
4. **NVIDIA GPU**: Compatible with TensorRT-RTX requirements

TensorRT-RTX is dynamically loaded by default, so the TensorRT SDK is only required when building
with the Cargo features `link_tensorrt_rtx`/`link_tensorrt_onnxparser`, which link the TensorRT libraries at build time.
Use `TENSORRT_RTX_DIR` to point to the TensorRT SDK root directory (the path that contains the `lib` folder with the shared libraries).
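
For example, a build that links the libraries directly might look like this (the SDK path is illustrative; check the feature list under "Cargo features" below for the exact feature names):

```bash
# Point the build at the TensorRT-RTX SDK root (the directory containing lib/)
export TENSORRT_RTX_DIR=/opt/tensorrt-rtx
cargo build --features link_tensorrt_rtx
```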

### Required (GPU execution) 

1. **NVIDIA TensorRT-RTX**: Download and install from [NVIDIA Developer](https://developer.nvidia.com/tensorrt)
     - The TensorRT libraries must be in a location where they can be dynamically loaded
       (e.g. on `PATH` on Windows or `LD_LIBRARY_PATH` on Linux).
     - This crate currently requires TensorRT-RTX version 1.3 (see the Cargo feature `v_1_3`).
       Other versions may become available in the future.

2. **NVIDIA GPU**: Compatible with TensorRT-RTX requirements
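
For example, on Linux the runtime loader can be pointed at the installed libraries like this (the path is illustrative):

```bash
# Linux: make the TensorRT-RTX shared libraries discoverable at runtime
export LD_LIBRARY_PATH=/opt/tensorrt-rtx/lib:$LD_LIBRARY_PATH

# Windows (cmd): add the lib directory to PATH instead
# set PATH=C:\tensorrt-rtx\lib;%PATH%
```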


### Development Without TensorRT-RTX (Mock Mode)

If you're developing on a machine without TensorRT-RTX (e.g., macOS, or for testing), you can use the `mock` feature. This enables the **trtx mock layer** (safe Rust stubs in `trtx` that mirror the real API), not the low-level FFI:

```bash
# Build with mock mode
cargo build --features mock

# Run examples with mock mode
cargo run --features mock --example basic_workflow

# Run tests with mock mode
cargo test --features mock
```

Mock mode provides stub implementations that allow you to:
- Verify the API compiles correctly
- Test your application structure
- Develop without needing an NVIDIA GPU
- Run CI/CD pipelines on any platform

**Note:** Mock mode only validates structure and API usage. For actual inference, you need real TensorRT-RTX.
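
For instance, a unit test exercised in mock mode might look like this (a sketch based on the types used in the examples below; it assumes the mock constructors succeed):

```rust
#[cfg(test)]
mod tests {
    // Runs with `cargo test --features mock` on machines without TensorRT-RTX or a GPU.
    #[test]
    fn builds_logger_and_builder_in_mock_mode() {
        let logger = trtx::Logger::stderr().expect("stderr logger");
        let _builder = trtx::Builder::new(&logger).expect("builder");
    }
}
```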

## Cargo features

The `trtx` crate has the following Cargo features:

- `default`: `real`, `dlopen_tensorrt_onnxparser`, `dlopen_tensorrt_rtx`, `onnxparser`, `v_1_3`
- `mock`: use this library in mock mode; the TensorRT libraries and an NVIDIA GPU are no longer necessary for execution
- `real`: the opposite of `mock` mode; TensorRT and an NVIDIA GPU are required for execution
- `dlopen_tensorrt_rtx`: enables dynamic loading of the TensorRT library via `trtx::dynamically_load_tensorrt`
- `dlopen_tensorrt_onnxparser`: enables dynamic loading of the TensorRT ONNX parser library via `trtx::dynamically_load_tensorrt_onnxparser`
- `links_tensorrt_rtx`: links the TensorRT library; calling `trtx::dynamically_load_tensorrt` becomes optional
- `links_tensorrt_onnxparser`: links the TensorRT ONNX parser library; calling `trtx::dynamically_load_tensorrt_onnxparser` becomes optional
- `onnxparser`: enables the ONNX parser functionality of this crate. Optional if you do not use ONNX as the input format for TensorRT
  but build networks with the builder API instead
- `v_1_3`: must always be enabled. Future TensorRT versions may become selectable via higher version numbers
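
For example, a downstream `Cargo.toml` might select features like this (illustrative only; the defaults already cover the common GPU setup):

```toml
[dependencies]
# Default features: real mode with dlopen'd TensorRT-RTX and the ONNX parser
trtx = "0.3"

# For GPU-less development or CI, enable the mock layer instead:
# trtx = { version = "0.3", features = ["mock"] }
```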

## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
trtx = "0.3"
```

## Usage

### Build Phase (Creating an Engine)

```rust
use trtx::{Logger, Builder};
use trtx::builder::{network_flags, MemoryPoolType};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Dynamically load TensorRT, optionally from an explicit path. This is needed with the
    // default dlopen_tensorrt_rtx feature and is a no-op when link_tensorrt_rtx is also enabled.
    trtx::dynamically_load_tensorrt(None::<String>).unwrap();

    // Create logger
    let logger = Logger::stderr()?;

    // Create builder
    let builder = Builder::new(&logger)?;

    // Create network with explicit batch dimensions
    let network = builder.create_network(network_flags::EXPLICIT_BATCH)?;

    // Configure builder
    let mut config = builder.create_config()?;
    config.set_memory_pool_limit(MemoryPoolType::Workspace, 1 << 30)?; // 1GB

    // Build serialized engine
    let engine_data = builder.build_serialized_network(&network, &config)?;

    // Save to disk
    std::fs::write("model.engine", &engine_data)?;

    Ok(())
}
```

### Inference Phase (Running Inference)

```rust
use trtx::{Logger, Runtime};
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Dynamically load TensorRT, optionally from an explicit path. This is needed with the
    // default dlopen_tensorrt_rtx feature and is a no-op when link_tensorrt_rtx is also enabled.
    trtx::dynamically_load_tensorrt(None::<String>).unwrap();

    // Create logger and runtime
    let logger = Logger::stderr()?;
    let runtime = Runtime::new(&logger)?;

    // Load serialized engine
    let engine_data = fs::read("model.engine")?;
    let engine = runtime.deserialize_cuda_engine(&engine_data)?;

    // Create execution context
    let mut context = engine.create_execution_context()?;

    // Query tensor information
    let num_tensors = engine.get_nb_io_tensors()?;
    for i in 0..num_tensors {
        let name = engine.get_tensor_name(i)?;
        println!("Tensor {}: {}", i, name);
    }

    // Set tensor addresses (input_device_ptr / output_device_ptr must point to valid
    // CUDA device memory allocated elsewhere)
    unsafe {
        context.set_tensor_address("input", input_device_ptr)?;
        context.set_tensor_address("output", output_device_ptr)?;
    }

    // Execute inference on a previously created CUDA stream (cuda_stream)
    unsafe {
        context.enqueue_v3(cuda_stream)?;
    }

    Ok(())
}
```

### Custom Logging

```rust
use trtx::{Logger, LogHandler, Severity};

struct MyLogger;

impl LogHandler for MyLogger {
    fn log(&self, severity: Severity, message: &str) {
        match severity {
            Severity::Error | Severity::InternalError => {
                eprintln!("ERROR: {}", message);
            }
            Severity::Warning => {
                println!("WARN: {}", message);
            }
            _ => {
                println!("INFO: {}", message);
            }
        }
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let logger = Logger::new(MyLogger)?;
    // Use logger...
    Ok(())
}
```
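
If an application already uses the `log` ecosystem, a handler can simply forward TensorRT messages to it. This is a sketch that assumes the `log` crate is a dependency and reuses the `Severity` variants shown above:

```rust
use trtx::{LogHandler, Severity};

struct ForwardToLog;

impl LogHandler for ForwardToLog {
    fn log(&self, severity: Severity, message: &str) {
        // Map TensorRT severities onto log levels.
        match severity {
            Severity::Error | Severity::InternalError => log::error!("{}", message),
            Severity::Warning => log::warn!("{}", message),
            _ => log::info!("{}", message),
        }
    }
}

// Usage: let logger = trtx::Logger::new(ForwardToLog)?;
```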

## API Overview

### Core Types

- **`Logger`**: Captures TensorRT messages with custom handlers
- **`Builder`**: Creates optimized inference engines
- **`NetworkDefinition`**: Defines the computational graph
- **`BuilderConfig`**: Configures optimization parameters
- **`Runtime`**: Deserializes engines for inference
- **`CudaEngine`**: Optimized inference engine
- **`ExecutionContext`**: Manages inference execution

### Error Handling

All fallible operations return `Result<T, Error>`:

```rust
use trtx::Error;

match builder.create_network(0) {
    Ok(network) => {
        // Use network
    }
    Err(Error::InvalidArgument(msg)) => {
        eprintln!("Invalid argument: {}", msg);
    }
    Err(e) => {
        eprintln!("Error: {}", e);
    }
}
```

## Safety

### Safe Operations

Most operations are safe and use RAII for resource management:
- Creating loggers, builders, runtimes
- Building and serializing engines
- Deserializing engines
- Creating execution contexts

### Unsafe Operations

CUDA-related operations require `unsafe`:
- **`set_tensor_address`**: Must point to valid CUDA device memory
- **`enqueue_v3`**: Requires valid CUDA stream and properly bound tensors

## Building from Source

```bash
# Clone the repository
git clone https://github.com/rustnn/trtx-rs.git
cd trtx-rs

# Option 1: Build with TensorRT-RTX (requires NVIDIA GPU)
export TENSORRT_RTX_DIR=/path/to/tensorrt-rtx
cargo build --release
cargo test

# Option 2: Build in mock mode (no GPU required)
cargo build --features mock --release
cargo test --features mock
cargo run --features mock --example basic_workflow
```

## Examples

See the `trtx/examples/` directory for complete examples:

- `basic_workflow.rs`: Build and serialize an engine (optionally from ONNX), then run inference
- `tiny_network.rs`: Build a small ReLU-based network from scratch using the Network API (no ONNX)
- `rustnn_executor.rs`: rustnn-compatible executor integration
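
They can be run either against a real installation or against the mock layer, for example:

```bash
# With TensorRT-RTX and an NVIDIA GPU available (default features)
cargo run --release --example basic_workflow

# Without a GPU or TensorRT-RTX, using the mock layer
cargo run --features mock --example basic_workflow
```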

## Architecture

### trtx-sys (FFI Layer)

- **autocxx**-generated bindings for the TensorRT-RTX C++ API
- Slim C++ logger bridge for virtual method handling (e.g., log callbacks)
- Optional mock FFI (when `mock` feature is enabled) so the crate can build without TensorRT installed
- No safety guarantees; internal use only

### trtx (Safe Wrapper)

- **Mock layer**: When the `mock` feature is enabled, the trtx crate uses a Rust mock layer (`trtx/src/mock/`) that mirrors the real API—this is the “mock mode” you use for development without GPU. Real implementation lives in `trtx/src/real/`.
- RAII-based resource management
- Type-safe API
- Lifetime tracking
- Comprehensive error handling
- User-facing API
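
The backend selection is feature-gated; conceptually the crate root might switch modules like this (a simplified sketch, the actual layout in `trtx/src/lib.rs` may differ):

```rust
// Compile the mock layer when the `mock` feature is enabled,
// otherwise compile the real TensorRT-RTX-backed implementation.
#[cfg(feature = "mock")]
mod mock;
#[cfg(feature = "mock")]
pub use mock::*;

#[cfg(not(feature = "mock"))]
mod real;
#[cfg(not(feature = "mock"))]
pub use real::*;
```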

## Troubleshooting

### Build Errors

**Cannot find TensorRT headers:**
```bash
export TENSORRT_RTX_DIR=/path/to/tensorrt-rtx
```

**Linking errors:**
```bash
export LD_LIBRARY_PATH=$TENSORRT_RTX_DIR/lib:$LD_LIBRARY_PATH
```

### Runtime Errors

**CUDA not initialized:**
Ensure CUDA runtime is properly initialized before creating engines or contexts.

**Invalid tensor addresses:**
Verify that all tensor addresses point to valid CUDA device memory with correct sizes.
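
When debugging either class of runtime error, it helps to first confirm that the driver and GPU are visible to the system:

```bash
# Lists the detected NVIDIA GPUs and the driver/CUDA versions
nvidia-smi
```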

## Development

### Pre-commit Hooks

To ensure code quality, set up the pre-commit hook:

```bash
cp .githooks/pre-commit .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
```

The hook will automatically run `cargo fmt` and `cargo clippy` before each commit.

### Manual Checks

You can also run checks manually using the Makefile:

```bash
make check-all  # Run fmt, clippy, and tests
make fmt        # Format code
make clippy     # Run lints
make test       # Run tests
```

### GPU Testing

The project includes CI workflows for testing with real NVIDIA GPUs:

- **Mock mode CI**: Runs on every push (Ubuntu, macOS) and tests the API without a GPU
- **GPU tests**: Run on a self-hosted Windows runner with a T4 GPU and exercise real TensorRT-RTX

To set up a GPU runner for real hardware testing, see [GPU Runner Setup Guide](.github/GPU_RUNNER_SETUP.md).

The GPU tests workflow:
- Builds without mock mode (uses real TensorRT-RTX)
- Verifies CUDA and GPU availability
- Runs tests and examples with actual GPU acceleration
- Can be triggered manually or runs automatically on code changes

## Contributing

Contributions are welcome! Please see [docs/DESIGN.md](docs/DESIGN.md) for architecture details.

## License

This project is licensed under the Apache License, Version 2.0 - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- NVIDIA for TensorRT-RTX
- The Rust community for excellent FFI tools

## Status

This project is in early development. APIs may change before 1.0 release.

### Implemented

- ✅ Core FFI layer (autocxx); trtx **mock layer** for development without TensorRT (no GPU)
- ✅ Logger interface with custom handlers
- ✅ Builder API for engine creation
- ✅ Runtime and engine deserialization
- ✅ Execution context
- ✅ Error handling with detailed messages
- ✅ **Network API**: TensorRT-RTX `INetworkDefinition` supported—build networks in Rust without ONNX
- ✅ **ONNX parser bindings** (nvonnxparser integration)
- ✅ **CUDA**: cudarc integration for memory management and device sync
- ✅ **rustnn-compatible executor API** (ready for integration)
- ✅ RAII-based resource management

### Planned

- ⬜ Dynamic shape support
- ⬜ Optimization profiles
- ⬜ Weight refitting
- ⬜ INT8 quantization support
- ⬜ Comprehensive examples with real models
- ⬜ Performance benchmarking
- ⬜ Documentation improvements

## Resources

- [TensorRT-RTX Documentation](https://docs.nvidia.com/deeplearning/tensorrt-rtx/)
- [TensorRT-RTX GitHub](https://github.com/NVIDIA/TensorRT-RTX)
- [CUDA Programming Guide](https://docs.nvidia.com/cuda/)