# Cloudini Python Decoder
Python decoder for Cloudini-compressed point clouds that uses WebAssembly (via wasmtime) and outputs structured NumPy arrays with named fields.
## Features
✅ **Structured NumPy Arrays**: Returns point clouds with named fields (x, y, z, intensity, etc.)
✅ **Efficient Memory Management**: Pre-allocated 32MB output buffer for optimal performance
✅ **Type-Safe**: Automatic mapping from Cloudini types to NumPy dtypes
✅ **Handles Complex Layouts**: Supports non-contiguous field offsets
✅ **Reusable Decoder**: Single instance can decode multiple messages
✅ **MCAP Integration**: Command-line tool for batch processing MCAP files
## File Structure
- **`cloudini_decoder.py`**: Reusable `CloudiniDecoder` class for decoding CompressedPointCloud2 DDS messages
- **`decode_mcap.py`**: Command-line tool for decoding from MCAP files with statistics
- **`requirements.txt`**: Python dependencies (wasmtime, mcap, numpy)
- **`IMPLEMENTATION_NOTES.md`**: Technical documentation and architecture details
- **`REFACTORING_SUMMARY.md`**: Development history and design decisions
## Requirements
- Python 3.8+
- Cloudini WASM module (built from cloudini_lib)
- Emscripten SDK (for building WASM module)
## Setup
1. **Build the WASM module** (if not already built):
To build the Python-compatible WASM module:
```bash
cd /path/to/cloudini
cmake -B build_wasm -S cloudini_lib \
-DCMAKE_TOOLCHAIN_FILE=$EMSDK/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake
cmake --build build_wasm --target cloudini_wasm
```
This creates:
- `build_wasm/cloudini_wasm.wasm` - Pure WASM binary (for Wasmtime)
- `build_wasm/cloudini_wasm.js` - JavaScript glue code
2. **Create a virtual environment**:
```bash
cd cloudini_py
python3 -m venv venv
source venv/bin/activate
```
3. **Install dependencies**:
```bash
pip install -r requirements.txt
```
## Usage
### Command-Line Tool
Decode compressed point clouds from MCAP files:
```bash
# Decode all messages from an MCAP file
python decode_mcap.py /path/to/encoded.mcap
# Decode only first 10 messages
python decode_mcap.py /path/to/encoded.mcap --max-messages 10
# Use custom WASM module path
python decode_mcap.py /path/to/encoded.mcap --wasm ./cloudini_wasm.wasm
```
**Output Example:**
```
--- Message 1 ---
Topic: /points
Timestamp: 1715873853.286s
✓ Decoded successfully!
Data type: [('x', '<f4'), ('y', '<f4'), ('z', '<f4'), ('intensity', '<f4'), ('ring', '<u2'), ('timestamp', '<f8')]
Fields: x, y, z, intensity, ring, timestamp
First point: (0., 0., 0., 0., 0, 1.71460504e+09)
Compression ratio: 1.70x
```
### As a Python Library
Use the `CloudiniDecoder` class directly in your Python code:
```python
from cloudini_decoder import CloudiniDecoder
import numpy as np
# Initialize the decoder once
decoder = CloudiniDecoder("cloudini_wasm.wasm")
# Decode a raw DDS message (CompressedPointCloud2)
# This could come from MCAP, ROS2, network, file, etc.
point_cloud, header = decoder.decode_message(compressed_dds_message)
# point_cloud is a structured numpy array with named fields
print(f"Number of points: {len(point_cloud)}")
print(f"Field names: {point_cloud.dtype.names}")
print(f"First point: {point_cloud[0]}")
# Access individual fields (returns 1D arrays)
x = point_cloud['x'] # All X coordinates
y = point_cloud['y'] # All Y coordinates
z = point_cloud['z'] # All Z coordinates
intensity = point_cloud['intensity']
# Perform operations on fields
distances = np.sqrt(x**2 + y**2 + z**2)
print(f"Average distance from origin: {distances.mean():.3f}m")
# Filter points (boolean indexing)
nearby_points = point_cloud[distances < 5.0]
print(f"Points within 5m: {len(nearby_points)}")
# Convert to regular arrays for libraries that don't support structured arrays
xyz = np.column_stack([point_cloud['x'],
point_cloud['y'],
point_cloud['z']]) # Shape: (N, 3)
# Reuse the decoder for multiple messages (efficient)
for msg in message_stream:
point_cloud, header = decoder.decode_message(msg)
# Process point_cloud...
```
### Output Format
The decoder returns a tuple of `(point_cloud, header)`:
- **`point_cloud`**: NumPy structured array with named fields
- Shape: `(num_points,)`
- Each point is accessed by field name: `point_cloud['x']`, `point_cloud['intensity']`, etc.
- Supports all NumPy operations: slicing, boolean indexing, arithmetic
- **`header`**: Dictionary with metadata
- `width`, `height`: Point cloud dimensions
- `point_step`: Bytes per point
- `encoding_opt`: Encoding option (LOSSY, LOSSLESS, NONE)
- `compression_opt`: Compression option (LZ4, ZSTD, NONE)
- `fields`: List of field definitions (name, offset, type, resolution)
## How It Works
### Architecture Overview
```
MCAP File
↓
decode_mcap.py (iterates messages)
↓
CloudiniDecoder.decode_message()
↓
┌─────────────────────────────────────┐
│ WASM Module (cloudini_wasm) │
│ │
│ 1. cldn_GetHeaderAsYAMLFromDDS │
│ → Extract header as YAML │
│ │
│ 2. cldn_DecodeCompressedMessage │
│ → Decompress point cloud data │
│ → DDS CompressedPointCloud2 │
│ → DDS PointCloud2 │
└─────────────────────────────────────┘
↓
Parse YAML header (get field definitions)
↓
Extract point data from PointCloud2 DDS
↓
Create structured NumPy array
↓
Return (point_cloud, header)
```
### Key Steps
1. **WASM Initialization**:
- Load WASM module using wasmtime
- Configure WASI (stdin/stdout/stderr)
- Create stub functions for C++ runtime imports
- Pre-allocate 32MB output buffer for efficiency
2. **Header Extraction**:
- Call `cldn_GetHeaderAsYAMLFromDDS` on compressed message
- Parse YAML to extract field definitions (name, offset, type, resolution)
- Build field metadata dictionary
3. **Decompression**:
- Call `cldn_DecodeCompressedMessage` with pre-allocated buffer
- Decompress and decode point cloud data
- Extract raw bytes from PointCloud2 DDS message
4. **NumPy Conversion**:
- Map Cloudini types (FLOAT32, UINT16, etc.) to NumPy dtypes
- Create structured dtype with explicit field offsets
- Handle non-contiguous field layouts (e.g., gaps between fields)
- Return structured array with named fields
### Supported Field Types
- `FLOAT32` → `np.float32`
- `FLOAT64` → `np.float64`
- `UINT8` → `np.uint8`
- `UINT16` → `np.uint16`
- `UINT32` → `np.uint32`
- `INT8` → `np.int8`
- `INT16` → `np.int16`
- `INT32` → `np.int32`
## Performance
### Benchmark Results
Tested on real-world MCAP files:
| dexory_encoded.mcap | 131,072 (1024×128) | ~712 KB | 4.19 MB | **5.89x** |
| bount_encoded.mcap | 64,000 (64000×1) | ~979 KB | 1.66 MB | **1.70x** |
### Optimizations
- **Pre-allocated buffer**: 32MB output buffer allocated once at initialization
- **Zero-copy reads**: Direct memory access from WASM to Python
- **Reusable decoder**: Single instance processes unlimited messages
- **Efficient structured arrays**: No data duplication for field access
### Memory Usage
- **WASM heap**: ~32-40 MB (includes pre-allocated output buffer)
- **Per message**: Minimal additional allocation (input buffer only)
- **NumPy array**: Point step × number of points (1-5 MB typical)
## Troubleshooting
### WASM module not found
```
Error: WASM module not found at /path/to/cloudini_wasm.wasm
```
**Solution**: Build the WASM module first:
```bash
cmake -B build_wasm -S cloudini_lib \
-DCMAKE_TOOLCHAIN_FILE=$EMSDK/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake
cmake --build build_wasm --target cloudini_wasm
```
### Import errors
```
ModuleNotFoundError: No module named 'wasmtime'
```
**Solution**: Install dependencies:
```bash
pip install -r requirements.txt
```
### Memory errors with large point clouds
```
RuntimeError: Failed to allocate memory
```
**Solution**: The decoder pre-allocates 32MB. For larger point clouds, edit `cloudini_decoder.py` line 123:
```python
self.output_ptr = self.allocate(64 * 1024 * 1024) # Increase to 64MB
```
### Decoding failures
```
RuntimeError: Failed to convert compressed message to PointCloud2
```
**Possible causes**:
- Corrupted MCAP file
- Wrong message type (not CompressedPointCloud2)
- Incompatible Cloudini version
**Debug steps**:
1. Verify message type: `ros2 topic info /topic_name` (for ROS2 bags)
2. Test with C++ tool: `./mcap_header_inspector -f file.mcap`
3. Check WASM module version matches compressed data
## Additional Resources
- **Technical details**: See [IMPLEMENTATION_NOTES.md](IMPLEMENTATION_NOTES.md)
- **Development history**: See [REFACTORING_SUMMARY.md](REFACTORING_SUMMARY.md)
- **Cloudini library**: See [../cloudini_lib/](../cloudini_lib/)
- **NumPy structured arrays**: https://numpy.org/doc/stable/user/basics.rec.html