rcf3 0.4.0

A Rust implementation of the Random Cut Forest algorithm for anomaly detection.
Documentation
# Forest API

`Forest` is the Random Cut Forest detector in `rcf3`. It operates on numerical observations and supports anomaly detection, feature attribution, neighborhood search, missing-value imputation, and time-series forecasting.

## Configuration

| Parameter                 | Type            | Default      | Description                                                             | Constraints                        |
| ------------------------- | --------------- | ------------ | ----------------------------------------------------------------------- | ---------------------------------- |
| `input_dim`               | `usize` / `int` | **Required** | Number of base feature dimensions per observation                       | Must be `> 0`                      |
| `shingle_size`            | `usize` / `int` | `1`          | Temporal window size                                                    | Must be `> 0`                      |
| `capacity`                | `usize` / `int` | `256`        | Maximum number of points stored per tree                                | Must be `> 0`                      |
| `num_trees`               | `usize` / `int` | `50`         | Number of trees in the ensemble                                         | Must be `> 0`                      |
| `time_decay`              | `f64` / `float` | `0.0`        | Exponential time-decay rate; `0.0` uses `0.1 / capacity`                | Must be finite and `>= 0.0`        |
| `output_after`            | `usize` / `int` | `0`          | Minimum updates before non-trivial outputs; `0` uses `1 + capacity / 4` | None                               |
| `internal_shingling`      | `bool`          | `true`       | Whether the forest manages its rolling shingle buffer internally        | None                               |
| `initial_accept_fraction` | `f64` / `float` | `0.125`      | Warm-up sampler acceptance fraction                                     | Must be finite and in `[0.0, 1.0]` |

## Rust API

### Creating a forest

```rust
use rcf3::Forest;

let forest = Forest::builder(2)
    .shingle_size(1)
    .num_trees(50)
    .capacity(256)
    .build()?;
```

With time-series shingling:

```rust
let forest = Forest::builder(4)
    .shingle_size(8)
    .num_trees(100)
    .capacity(512)
    .time_decay(0.01)
    .build()?;
```

From a config object:

```rust
use rcf3::{Forest, RcfConfig};

let config = RcfConfig::new(3)
    .with_num_trees(75)
    .with_capacity(512)
    .with_shingle_size(4);

let forest = Forest::from_config(&config)?;
```

### Basic operations

For online anomaly detection, score first and then update:

```rust
let point = vec![1.5, 2.3];

if forest.is_ready() {
    let score = forest.score(&point)?;
    println!("Anomaly score: {score}");
}

forest.update(&point)?;
println!("Entries seen: {}", forest.entries_seen());
```

### Scoring and interpretation

```rust
let point = vec![1.5, 2.3, -0.5];
let score = forest.score(&point)?;
let displacement = forest.displacement_score(&point)?;
let density = forest.density(&point)?;
```

Feature attribution:

```rust
let point = vec![1.5, 2.3, 100.0];
let attribution = forest.attribution(&point)?;

for (i, attr) in attribution.iter().enumerate() {
    println!("Dimension {i}: below={}, above={}", attr.below, attr.above);
}
```

- `above`: contribution from cuts above the query value
- `below`: contribution from cuts below the query value

### Neighborhood search

`percentile` controls per-tree traversal aggressiveness and must be in `[0, 100]`.
Lower values visit more branches and usually return more candidates.

```rust
let neighbors = forest.near_neighbors(&[1.5, 2.3], 10, 50)?;

for neighbor in neighbors {
    println!("distance={}, score={}", neighbor.distance, neighbor.score);
}
```

### Missing-value imputation

```rust
let point = vec![1.5, f32::NAN, 3.0];
let missing = vec![1];
let imputed = forest.impute(&point, &missing, 1.0)?;
```

### Time-series forecasting

Forecasting requires `internal_shingling = true` and `shingle_size > 1`:

```rust
let mut forest = Forest::builder(4)
    .shingle_size(8)
    .build()?;

for point in stream {
    forest.update(&point)?;
}

let predictions = forest.extrapolate(5)?;
```

`extrapolate(look_ahead)` returns a flat vector of length `look_ahead * input_dim`.

### Serialization

```rust
let json_str = forest.to_json()?;
forest.save_json("forest.json")?;

let loaded = Forest::from_json(&json_str)?;
let loaded_from_file = Forest::load_json("forest.json")?;
```

## Python API

### Creating a forest

```python
from rcf3 import Forest

forest = Forest(
    input_dim=2,
    shingle_size=1,
    num_trees=50,
    capacity=256,
)
```

With time-series shingling:

```python
forest = Forest(
    input_dim=4,
    shingle_size=8,
    num_trees=100,
    capacity=512,
    time_decay=0.01,
    internal_shingling=True,
)
```

### Basic operations

```python
point = [1.5, 2.3]

if forest.is_ready():
    score = forest.score(point)
    print(f"Anomaly score: {score}")

forest.update(point)
print(f"Entries seen: {forest.entries_seen()}")
```

### Scoring and interpretation

```python
point = [1.5, 2.3, -0.5]

score = forest.score(point)
displacement = forest.displacement_score(point)
density = forest.density(point)
```

Feature attribution:

```python
point = [1.5, 2.3, 100.0]
attribution = forest.attribution(point)

for i, attr in enumerate(attribution):
    print(f"Dimension {i}: below={attr['below']}, above={attr['above']}")
```

### Neighborhood search

`percentile` controls per-tree traversal aggressiveness and must be in `[0, 100]`.
Lower values visit more branches and usually return more candidates.

```python
neighbors = forest.near_neighbors([1.5, 2.3], top_k=10, percentile=50)
```

### Missing-value imputation

```python
point = [1.5, float("nan"), 3.0]
missing = [1]
imputed = forest.impute(point, missing, centrality=1.0)
```

### Time-series forecasting

```python
forest = Forest(input_dim=4, shingle_size=8, internal_shingling=True)

for point in stream:
    forest.update(point)

predictions = forest.extrapolate(5)
```

### Serialization

```python
json_str = forest.to_json()
forest.save_json("forest.json")

loaded = Forest.from_json(json_str)
loaded_from_file = Forest.load_json("forest.json")
```

Python objects also support pickle round-trips.

## End-to-end anomaly example

### Rust

```rust
use rcf3::Forest;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut forest = Forest::builder(3)
        .capacity(256)
        .num_trees(50)
        .build()?;

    for i in 0..200 {
        let val = (i as f32) * 0.01;
        forest.update(&[1.0 + val, 2.0 + val, 3.0 + val])?;
    }

    let data = vec![
        vec![1.0, 2.0, 3.0],
        vec![1.1, 2.1, 3.1],
        vec![100.0, 200.0, 300.0],
    ];

    for point in data {
        if forest.is_ready() {
            let score = forest.score(&point)?;
            println!("Point: {point:?}, score={score}");
        }
        forest.update(&point)?;
    }

    Ok(())
}
```

### Python

```python
from rcf3 import Forest

forest = Forest(input_dim=3, capacity=256, num_trees=50)

for i in range(200):
    val = i * 0.01
    forest.update([1.0 + val, 2.0 + val, 3.0 + val])

for point in ([1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [100.0, 200.0, 300.0]):
    if forest.is_ready():
        print(f"Point: {point}, score={forest.score(point)}")
    forest.update(point)
```