# Security Features
This document outlines the security and integrity features implemented in xgboost-rust.
## SHA256 Checksum Verification
All downloaded files are verified against known SHA256 checksums before use:
### Header Files
- `c_api.h` - Verified against known checksums for each XGBoost version
- `base.h` - Verified against known checksums for each XGBoost version
Supported versions with verified checksums:
- 3.1.1 (latest)
- 3.0.5
- 2.1.4
- 1.7.6
- 1.4.2
If you need to use a different version, you must add its checksums to `build.rs`:
```rust
checksums.insert("YOUR_VERSION", (
    "c_api.h_sha256_here",
    "base.h_sha256_here",
));
```
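
As a rough illustration of how such a table might be consulted, the sketch below looks up the expected digest for a header and compares it with a computed one. The names `known_checksums` and `verify_header` are hypothetical and are not taken from `build.rs`; the digest strings are placeholders, and hashing the downloaded bytes is assumed to happen elsewhere.

```rust
use std::collections::HashMap;

/// Expected SHA256 digests (hex) for one XGBoost version: (c_api.h, base.h).
type HeaderChecksums = (&'static str, &'static str);

/// Builds the version -> checksums table.
/// The digest strings are placeholders, not real checksums.
fn known_checksums() -> HashMap<&'static str, HeaderChecksums> {
    let mut checksums = HashMap::new();
    checksums.insert("YOUR_VERSION", ("c_api.h_sha256_here", "base.h_sha256_here"));
    checksums
}

/// Compares a computed hex digest with the expected one for `file` in `version`.
/// `actual_hex` is assumed to come from hashing the downloaded bytes elsewhere.
fn verify_header(
    checksums: &HashMap<&'static str, HeaderChecksums>,
    version: &str,
    file: &str,
    actual_hex: &str,
) -> Result<(), String> {
    let (c_api, base) = checksums
        .get(version)
        .ok_or_else(|| format!("No known checksums for XGBoost version {version}"))?;
    let expected = match file {
        "c_api.h" => *c_api,
        "base.h" => *base,
        other => return Err(format!("Unexpected header file: {other}")),
    };
    // Hex digests are case-insensitive, so ignore ASCII case when comparing.
    if expected.eq_ignore_ascii_case(actual_hex.trim()) {
        Ok(())
    } else {
        Err(format!(
            "Checksum mismatch for {file} ({version}): expected {expected}, got {actual_hex}"
        ))
    }
}
```

Returning an error for an unknown version, rather than skipping verification, keeps an unlisted version from being used unverified.
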
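To compute checksums (replace `VERSION` with the release number, e.g. `3.1.1`):
```bash
curl -s "https://raw.githubusercontent.com/dmlc/xgboost/vVERSION/include/xgboost/c_api.h" | shasum -a 256
curl -s "https://raw.githubusercontent.com/dmlc/xgboost/vVERSION/include/xgboost/base.h" | shasum -a 256
```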
## Download Safety
### Retry Logic with Exponential Backoff
- Downloads retry up to 3 times on failure
- Exponential backoff: 100ms, 200ms, 400ms
- Prevents transient network errors from failing builds
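
The retry behaviour described above could be sketched as follows. This is a minimal illustration, not the crate's actual code: the helper name `retry_with_backoff` is hypothetical, and it assumes "up to 3 retries" means one initial attempt followed by at most three more.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Runs `op` once, then retries up to `max_retries` times on failure,
/// doubling the delay before each retry (base, 2*base, 4*base, ...).
fn retry_with_backoff<T, E, F>(max_retries: u32, base_delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of retries: surface the last error to the caller.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                // base_delay * 2^attempt: 100ms, 200ms, 400ms for a 100ms base.
                sleep(base_delay * 2u32.pow(attempt));
                attempt += 1;
            }
        }
    }
}
```

With `max_retries = 3` and a 100 ms base delay, a download that keeps failing is attempted four times in total and waits 100 ms, 200 ms, and 400 ms between attempts.
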
### Atomic File Writes
All files are written atomically using the pattern:
1. Write to temporary file (`.tmp` extension)
2. Sync to disk with `sync_all()`
3. Rename to final destination (atomic on POSIX systems)
This prevents partial/corrupted files if the build is interrupted.
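
A minimal sketch of this write-sync-rename pattern using only the standard library is shown below. The function name `write_atomic` is hypothetical, and appending `.tmp` to the full file name is an assumption about how the temporary path is formed.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `bytes` to `dest` via a sibling temporary file so that readers
/// never observe a partially written `dest`.
fn write_atomic(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Append ".tmp" to the full file name (e.g. "c_api.h" -> "c_api.h.tmp").
    let mut tmp_name = dest.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    // Flush file contents and metadata to disk before the rename.
    file.sync_all()?;
    drop(file);

    // Same-directory rename; atomic on POSIX filesystems.
    fs::rename(&tmp, dest)
}
```

Keeping the temporary file in the same directory as the destination matters: a rename across filesystems is not atomic and may fail outright.
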
## Caching Strategy
### Wheel Caching
- Wheels are cached in `OUT_DIR/wheel/`
- Cached wheels are reused if they exist
- Reduces build times and network usage
### Library Caching
- Extracted libraries are cached in `OUT_DIR/libs/`
- If the library already exists, extraction is skipped entirely
- Dramatically speeds up incremental builds
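
Both caches follow the same "skip the work if the file is already there" idea, which could be sketched as below. The helper `ensure_cached` is hypothetical, and existence of the path is assumed to be the only cache-hit criterion.

```rust
use std::io;
use std::path::Path;

/// Calls `produce` only when `path` does not exist yet.
/// Returns `Ok(true)` on a cache hit and `Ok(false)` when the file was produced.
fn ensure_cached<F>(path: &Path, produce: F) -> io::Result<bool>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    if path.exists() {
        // Cache hit: reuse the existing file and skip download/extraction.
        return Ok(true);
    }
    // Make sure the cache directory exists before producing the file.
    if let Some(parent) = path.parent() {
        std::fs::create_dir_all(parent)?;
    }
    produce(path)?;
    Ok(false)
}
```

An existence check like this is only safe when the file is written atomically; otherwise an interrupted build could leave a truncated file that later counts as a cache hit.
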
## Input Validation
### Prediction Data Validation
The `predict()` function validates inputs before calling XGBoost:
1. **Dimension Check**: Verifies `data.len() == num_rows * num_features`
2. **Overflow Check**: Uses `checked_mul()` to detect integer overflow
3. **Output Validation**: Checks that XGBoost returns non-null, non-empty results
Example error messages:
```
Data length mismatch: expected 1000 elements (100×10), got 999
Integer overflow: num_rows (10000000000) * num_features (10000000000) exceeds usize::MAX
XGBoost returned null or empty prediction result
```
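
The dimension and overflow checks could look roughly like the sketch below. The function name `validate_dims` is hypothetical; the real `predict()` signature and error type are not shown in this document, so a plain `String` error is assumed here.

```rust
/// Checks that `data_len` equals `num_rows * num_features` without overflowing.
fn validate_dims(data_len: usize, num_rows: usize, num_features: usize) -> Result<(), String> {
    // checked_mul returns None instead of wrapping on overflow.
    let expected = num_rows.checked_mul(num_features).ok_or_else(|| {
        format!(
            "Integer overflow: num_rows ({num_rows}) * num_features ({num_features}) exceeds usize::MAX"
        )
    })?;
    if data_len != expected {
        return Err(format!(
            "Data length mismatch: expected {expected} elements ({num_rows}×{num_features}), got {data_len}"
        ));
    }
    Ok(())
}
```

Doing the multiplication with `checked_mul` first means the length comparison never runs against a wrapped-around product, which is what would otherwise let an undersized buffer reach the C API.
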
## Memory Safety
### RAII Guards for Resource Cleanup
DMatrix handles are automatically freed using the RAII pattern:
```rust
struct DMatrixGuard(sys::DMatrixHandle);

impl Drop for DMatrixGuard {
    fn drop(&mut self) {
        unsafe { sys::XGDMatrixFree(self.0); }
    }
}
```
This ensures DMatrix handles are **always** freed, even if:
- Prediction fails
- An error occurs
- Function returns early
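
The early-return guarantee can be demonstrated without the native library. In the sketch below, `MockGuard` and `predict_mock` are hypothetical stand-ins: the guard increments a counter in `drop` where the real guard would call `XGDMatrixFree`.

```rust
use std::cell::Cell;

/// Stand-in for a native handle guard; counts how often it has been "freed".
struct MockGuard<'a> {
    freed: &'a Cell<u32>,
}

impl Drop for MockGuard<'_> {
    fn drop(&mut self) {
        // The real guard would release the native handle here.
        self.freed.set(self.freed.get() + 1);
    }
}

/// Mimics a prediction call that may bail out early.
fn predict_mock(freed: &Cell<u32>, fail: bool) -> Result<f32, String> {
    let _guard = MockGuard { freed };
    if fail {
        // Early return: `_guard` is still dropped when the scope exits.
        return Err("prediction failed".to_string());
    }
    Ok(0.5)
}
```

Binding the guard to `_guard` rather than `_` is deliberate: a bare `_` pattern would drop the value immediately instead of at the end of the function.
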
### Booster Cleanup
Booster handles are freed in the `Drop` implementation, ensuring cleanup when the object goes out of scope.
## Thread Safety
Thread safety is **version-aware** and automatically configured at build time:
- **XGBoost ≥ 1.4**: `Send + Sync` automatically implemented
- **XGBoost < 1.4**: `Send + Sync` NOT implemented (requires explicit synchronization)
See [README.md](README.md#thread-safety) for usage examples.
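
One way a build script might make this decision is sketched below. It is an illustration only: the function names and the `xgboost_thread_safe` cfg flag are hypothetical, and the crate is assumed to gate its `unsafe impl Send`/`Sync` blocks on such a flag via `#[cfg(...)]`.

```rust
/// Parses "major.minor[.patch]" and reports whether the version is at least 1.4.
fn supports_send_sync(version: &str) -> bool {
    let mut parts = version.split('.').map(|p| p.parse::<u32>().ok());
    let major = parts.next().flatten();
    let minor = parts.next().flatten();
    match (major, minor) {
        // Tuples compare lexicographically: major first, then minor.
        (Some(major), Some(minor)) => (major, minor) >= (1, 4),
        // Unparseable versions are conservatively treated as not thread-safe.
        _ => false,
    }
}

/// Returns the directive a build script could print to enable the
/// (hypothetical) `xgboost_thread_safe` cfg flag, or None for older versions.
fn cfg_directive(version: &str) -> Option<&'static str> {
    supports_send_sync(version).then_some("cargo:rustc-cfg=xgboost_thread_safe")
}
```

Comparing parsed numbers rather than strings avoids ordering mistakes such as treating `1.10` as older than `1.4`.
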
## Build-Time Verification
The build script (`build.rs`) performs several security checks:
1. ✅ Downloads from official sources only
   - GitHub: https://raw.githubusercontent.com/dmlc/xgboost/
   - PyPI: https://files.pythonhosted.org/packages/py3/x/xgboost/
2. ✅ Verifies SHA256 checksums for all header files
3. ✅ Uses HTTPS for all downloads (enforced by URL scheme)
4. ✅ Atomic file operations to prevent corruption
5. ✅ Clear error messages for debugging
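
The "official sources only" and HTTPS checks could be expressed as a simple prefix allowlist, as in the sketch below. The constant and function names are hypothetical, and a plain prefix match is assumed; the actual `build.rs` may construct its URLs differently.

```rust
/// URL prefixes treated as official sources (taken from the list above).
const ALLOWED_PREFIXES: [&str; 2] = [
    "https://raw.githubusercontent.com/dmlc/xgboost/",
    "https://files.pythonhosted.org/packages/py3/x/xgboost/",
];

/// Rejects any download URL that is not HTTPS or not under an allowed prefix.
fn check_download_url(url: &str) -> Result<(), String> {
    if !url.starts_with("https://") {
        return Err(format!("Refusing non-HTTPS download: {url}"));
    }
    if ALLOWED_PREFIXES.iter().any(|prefix| url.starts_with(*prefix)) {
        Ok(())
    } else {
        Err(format!("Refusing download from unofficial source: {url}"))
    }
}
```

Because each prefix ends with a trailing slash, look-alike paths such as `.../dmlc/xgboost-fork/...` do not match.
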
## Reporting Security Issues
If you discover a security vulnerability, please report it to the maintainer by email rather than opening a public issue.