xgboost-rust 0.1.0

Rust bindings for XGBoost, a gradient boosting library for machine learning. Downloads XGBoost binaries at build time for cross-platform compatibility.

# Security Features

This document outlines the security and integrity features implemented in xgboost-rust.

## SHA256 Checksum Verification

Downloaded header files are verified against known SHA256 checksums before use:

### Header Files
- `c_api.h` - Verified against known checksums for each XGBoost version
- `base.h` - Verified against known checksums for each XGBoost version

Supported versions with verified checksums:
- 3.1.1 (latest)
- 3.0.5
- 2.1.4
- 1.7.6
- 1.4.2

If you need to use a different version, you must add its checksums to `build.rs`:

```rust
checksums.insert("YOUR_VERSION", (
    "c_api.h_sha256_here",
    "base.h_sha256_here"
));
```

To compute the checksums, replace `VERSION` in the URLs with the release number (for example, `3.1.1`):
```bash
curl -s "https://raw.githubusercontent.com/dmlc/xgboost/vVERSION/include/xgboost/c_api.h" | shasum -a 256
curl -s "https://raw.githubusercontent.com/dmlc/xgboost/vVERSION/include/xgboost/base.h" | shasum -a 256
```
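
The lookup-and-compare step can be sketched as follows. This is a minimal illustration, not the crate's actual `build.rs`: the function names `known_checksums`, `verify_digest`, and `verify_headers` are hypothetical, the digests are the same placeholders used above, and computing the SHA256 digest itself (which would need a hashing crate) is left out.

```rust
use std::collections::HashMap;

/// Builds the version -> (c_api.h SHA256, base.h SHA256) table.
/// The entries below are placeholders, not real checksums.
fn known_checksums() -> HashMap<&'static str, (&'static str, &'static str)> {
    let mut checksums = HashMap::new();
    checksums.insert("YOUR_VERSION", ("c_api.h_sha256_here", "base.h_sha256_here"));
    checksums
}

/// Compares a computed hex digest against the expected one, ignoring case.
fn verify_digest(file: &str, expected: &str, actual: &str) -> Result<(), String> {
    if expected.eq_ignore_ascii_case(actual.trim()) {
        Ok(())
    } else {
        Err(format!("Checksum mismatch for {file}: expected {expected}, got {actual}"))
    }
}

/// Looks up the expected digests for `version` and checks both headers.
/// An unknown version is an error rather than a silent pass.
fn verify_headers(version: &str, c_api_digest: &str, base_digest: &str) -> Result<(), String> {
    let table = known_checksums();
    let (c_api, base) = table
        .get(version)
        .ok_or_else(|| format!("No known checksums for XGBoost {version}"))?;
    verify_digest("c_api.h", c_api, c_api_digest)?;
    verify_digest("base.h", base, base_digest)
}
```

Failing on an unknown version, rather than skipping verification, matches the requirement above that new versions must have their checksums added first.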

## Download Safety

### Retry Logic with Exponential Backoff
- Downloads retry up to 3 times on failure
- Exponential backoff: 100ms, 200ms, 400ms
- Prevents transient network errors from failing builds
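
A minimal sketch of this retry pattern is shown below. It assumes "retry up to 3 times" means one initial attempt followed by at most three retries, with the delay doubling each time; the helper name `retry_with_backoff` and its signature are illustrative and not taken from the crate.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Runs `op` once, then retries up to `max_retries` times on failure,
/// doubling the delay before each retry (base, 2*base, 4*base, ...).
fn retry_with_backoff<T, E>(
    max_retries: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = base_delay;
    let mut retries_left = max_retries;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of retries: surface the last error to the caller.
            Err(err) if retries_left == 0 => return Err(err),
            Err(_) => {
                sleep(delay);
                delay *= 2;
                retries_left -= 1;
            }
        }
    }
}
```

Calling it with `max_retries = 3` and `base_delay = Duration::from_millis(100)` would produce the 100ms, 200ms, 400ms schedule listed above.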

### Atomic File Writes
All files are written atomically using the following pattern:
1. Write to a temporary file (`.tmp` extension)
2. Sync to disk with `sync_all()`
3. Rename to the final destination (atomic on POSIX systems)

This prevents partial or corrupted files from being left at the final path if the build is interrupted.
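
The three steps can be sketched with the standard library alone. The function name `write_atomic` and the choice to append `.tmp` to the full file name are assumptions made for illustration; the crate's build script may name its temporary files differently.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `bytes` to `dest` via a sibling temporary file and a rename.
fn write_atomic(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Hypothetical naming: "<dest>.tmp" in the same directory, so the
    // rename stays on one filesystem.
    let mut tmp_name = dest.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    // Flush file contents and metadata to disk before the rename.
    file.sync_all()?;
    drop(file);

    // Readers see either the old file or the complete new one.
    fs::rename(&tmp, dest)
}
```

Keeping the temporary file next to the destination matters because a rename across filesystems is not atomic.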

## Caching Strategy

### Wheel Caching
- Wheels are cached in `OUT_DIR/wheel/`
- Cached wheels are reused if they exist
- Reduces build times and network usage

### Library Caching
- Extracted libraries are cached in `OUT_DIR/libs/`
- If the library already exists, extraction is skipped entirely
- Speeds up incremental builds, since no download or extraction is repeated
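
The "reuse if present" check behind both caches might look like the sketch below. The helper `ensure_cached` is hypothetical, and the closure stands in for whatever download or extraction step would produce the file.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns Ok(true) on a cache hit and Ok(false) if `produce` had to run.
fn ensure_cached(
    path: &Path,
    produce: impl FnOnce() -> io::Result<Vec<u8>>,
) -> io::Result<bool> {
    if path.exists() {
        // Cache hit: skip the expensive step entirely.
        return Ok(true);
    }
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(path, produce()?)?;
    Ok(false)
}
```

For brevity this sketch writes the file directly with `fs::write`, which is not an atomic write.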

## Input Validation

### Prediction Data Validation
The `predict()` function validates inputs before calling XGBoost:

1. **Dimension Check**: Verifies `data.len() == num_rows * num_features`
2. **Overflow Check**: Uses `checked_mul()` to detect integer overflow
3. **Output Validation**: Checks that XGBoost returns non-null, non-empty results

Example error messages:
```
Data length mismatch: expected 1000 elements (100×10), got 999
Integer overflow: num_rows (10000000000) * num_features (10000000000) exceeds usize::MAX
XGBoost returned null or empty prediction result
```
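
The dimension and overflow checks can be sketched as a standalone function. The name `validate_dims` and the use of `String` as the error type are assumptions; the crate's `predict()` may structure its errors differently. The message formats follow the examples above.

```rust
/// Checks that `data_len == num_rows * num_features` without overflowing.
fn validate_dims(data_len: usize, num_rows: usize, num_features: usize) -> Result<(), String> {
    // `checked_mul` returns None instead of wrapping on overflow.
    let expected = num_rows.checked_mul(num_features).ok_or_else(|| {
        format!(
            "Integer overflow: num_rows ({num_rows}) * num_features ({num_features}) exceeds usize::MAX"
        )
    })?;
    if data_len != expected {
        return Err(format!(
            "Data length mismatch: expected {expected} elements ({num_rows}×{num_features}), got {data_len}"
        ));
    }
    Ok(())
}
```

Doing the overflow check first matters: a wrapped product could otherwise match `data.len()` by accident and let a too-small buffer through to the C API.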

## Memory Safety

### RAII Guards for Resource Cleanup
DMatrix handles are freed automatically using the RAII pattern:

```rust
struct DMatrixGuard(sys::DMatrixHandle);
impl Drop for DMatrixGuard {
    fn drop(&mut self) {
        unsafe { sys::XGDMatrixFree(self.0); }
    }
}
```

This ensures the DMatrix handle is freed whenever the guard goes out of scope, even if:
- Prediction fails
- An error is propagated to the caller
- The function returns early
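
The early-return behaviour can be demonstrated without XGBoost itself. In the sketch below, `CountingGuard` is a stand-in for `DMatrixGuard` that increments a counter instead of calling `XGDMatrixFree`, and `predict_like` is a hypothetical function with one early-return path.

```rust
use std::cell::Cell;

/// Stand-in for `DMatrixGuard`: counts drops instead of freeing a handle.
struct CountingGuard<'a>(&'a Cell<u32>);

impl Drop for CountingGuard<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns early with an error when `fail` is true.
/// The guard is dropped on both the success and the error path.
fn predict_like(frees: &Cell<u32>, fail: bool) -> Result<u32, String> {
    let _guard = CountingGuard(frees);
    if fail {
        return Err("prediction failed".to_string());
    }
    Ok(42)
}
```

Binding the guard to `_guard` rather than `_` is deliberate: a bare `_` pattern would drop the value immediately instead of at the end of the function.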

### Booster Cleanup
Booster handles are freed in the `Drop` implementation, ensuring cleanup when the object goes out of scope.

## Thread Safety

Thread safety is **version-aware** and automatically configured at build time:

- **XGBoost ≥ 1.4**: `Send + Sync` automatically implemented
- **XGBoost < 1.4**: `Send + Sync` NOT implemented (requires explicit synchronization)

See [README.md](README.md#thread-safety) for usage examples.
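
The version rule above could be decided in a build script along these lines. This is only a sketch: the function `is_thread_safe_version` is hypothetical, and treating an unparseable version as not thread-safe is an assumption chosen to err on the conservative side.

```rust
/// Parses "MAJOR.MINOR[.PATCH]" and reports whether it is at least 1.4.
fn is_thread_safe_version(version: &str) -> bool {
    let mut parts = version.split('.').map(|p| p.parse::<u32>().ok());
    match (parts.next().flatten(), parts.next().flatten()) {
        // Tuples compare lexicographically, so (1, 10) >= (1, 4) holds.
        (Some(major), Some(minor)) => (major, minor) >= (1, 4),
        // Unparseable versions are treated as not thread-safe.
        _ => false,
    }
}
```

A build script could then print a `cargo:rustc-cfg=...` line when this returns `true`, and the `Send`/`Sync` impls could be gated on that cfg flag.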

## Build-Time Verification

The build script (`build.rs`) performs several security checks:

1. ✅ Downloads from official sources only
   - GitHub: https://raw.githubusercontent.com/dmlc/xgboost/
   - PyPI: https://files.pythonhosted.org/packages/py3/x/xgboost/

2. ✅ Verifies SHA256 checksums for all header files

3. ✅ Uses HTTPS for all downloads (enforced by URL scheme)

4. ✅ Atomic file operations to prevent corruption

5. ✅ Clear error messages for debugging
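
Checks 1 and 3 amount to an allowlist on URL prefixes. The sketch below uses the two prefixes listed above. The names `ALLOWED_PREFIXES` and `is_allowed_url` are hypothetical, and rejecting `..` segments is an extra precaution added here, not something the source describes.

```rust
/// Official download locations, taken from the list above.
const ALLOWED_PREFIXES: [&str; 2] = [
    "https://raw.githubusercontent.com/dmlc/xgboost/",
    "https://files.pythonhosted.org/packages/py3/x/xgboost/",
];

/// Accepts only HTTPS URLs under one of the allowed prefixes.
fn is_allowed_url(url: &str) -> bool {
    // Extra precaution (an assumption): refuse path traversal segments.
    if url.contains("..") {
        return false;
    }
    ALLOWED_PREFIXES.iter().any(|prefix| url.starts_with(*prefix))
}
```

Because every allowed prefix begins with `https://`, a plain `http://` URL fails the check without a separate scheme test.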

## Reporting Security Issues

If you discover a security vulnerability, please email the maintainer rather than opening a public issue.