diskann-label-filter 0.49.0

# Label Filter Lib

A Rust library for parsing filters and evaluating them against JSON metadata.

The label data format is described in [label-data-format-rfc.md](../docs/rfcs/cy2025/label-data-format-rfc.md).

## Usage

```rust
use serde_json::json;
use diskann_label_filter::{parse_query_filter, eval_query_expr};

// Create a JSON label
let label = json!({
    "a": 1,
    "b": 2,
    "specs": { "cpu": "i7" },
    "tags": ["red", "blue", "green"]
});

// Create a filter that matches labels with a=1 AND b>1 AND specs.cpu="i7" AND tags contains "blue"
let filter = json!({
    "$and": [
        {"a": {"$eq": 1}},
        {"b": {"$gt": 1}},
        {"specs.cpu": {"$eq": "i7"}},
        {"tags": {"$in": ["blue"]}}
    ]
});

// Parse the filter into an AST
let ast = match parse_query_filter(&filter) {
    Ok(ast) => ast,
    Err(e) => {
        eprintln!("Failed to parse filter: {}", e);
        return;
    }
};

// Evaluate the filter against the label
let matches = eval_query_expr(&ast, &label);
assert!(matches);
```

### Examples

Parse a filter into an AST and print it as a simple query expression:

```bash
cargo run --example print_query
```


Process and evaluate JSON Lines (JSONL) files:

```bash
cargo run --example jsonl_reader_example
```

Convert the old text-based label format into JSON Lines files. The converter takes four arguments: `converter <base_input_file> <query_input_file> <base_output_file> <query_output_file>`. For example, with Windows-style paths:

```
cargo run --example converter ..\tests\data\disk_index_search\data.256.label ..\tests\data\disk_index_search\query.128.label ..\tests\data\disk_index_search\data.256.label.jsonl ..\tests\data\disk_index_search\query.128.label.jsonl
```

## Running Benchmarks

The project includes a benchmark suite, which can be run with:

```bash
cargo bench
```

The benchmarks live in modules under the `benches/benchmarks/` directory:
- `parser_bench.rs`: Measures filter parsing performance
- `evaluator_bench.rs`: Measures query evaluation performance

## Implementation Details

### Architecture Overview

The library is built around three core components:

1. **Abstract Syntax Tree (AST)**: A hierarchical representation of query filters
2. **Parser**: Converts JSON query filters to the AST representation
3. **Evaluator**: Evaluates the AST against JSON labels

### Abstract Syntax Tree (AST)

The AST is defined in `ast.rs` and consists of:

```rust
pub enum ASTExpr {
    And(Vec<ASTExpr>),          // Logical AND of sub-expressions
    Or(Vec<ASTExpr>),           // Logical OR of sub-expressions
    Not(Box<ASTExpr>),          // Logical NOT of a sub-expression
    Compare { field: String, op: CompareOp }, // Field comparison
}
```

The `CompareOp` enum uses type-safe representations for different comparison operators:

```rust
pub enum CompareOp {
    Eq(Value),       // Equal to any JSON value
    Ne(Value),       // Not equal to any JSON value
    Lt(f64),         // Less than (numeric only)
    Lte(f64),        // Less than or equal (numeric only)
    Gt(f64),         // Greater than (numeric only)
    Gte(f64),        // Greater than or equal (numeric only)
    In(Vec<Value>),  // Value is in array
    Nin(Vec<Value>), // Value is not in array
}
```

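Because each variant carries a typed operand, an AST cannot hold a mismatched value: `Lt`, for example, always carries an `f64`. This part of correctness is enforced by the compiler, while the parser checks the types of incoming JSON values when it builds the AST.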
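To make the shape of the tree concrete, the sketch below builds by hand roughly the AST one would expect for the filter in the Usage section. It is self-contained for illustration: `Value` here is a small stand-in for `serde_json::Value`, the enums are re-declared locally, and `usage_filter_ast` is a hypothetical helper, not part of the library. The exact tree the real parser produces may differ.

```rust
// Stand-in for `serde_json::Value`, reduced to what this sketch needs (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    String(String),
}

// Local copies of the enums shown above, so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
pub enum CompareOp {
    Eq(Value),
    Ne(Value),
    Lt(f64),
    Lte(f64),
    Gt(f64),
    Gte(f64),
    In(Vec<Value>),
    Nin(Vec<Value>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum ASTExpr {
    And(Vec<ASTExpr>),
    Or(Vec<ASTExpr>),
    Not(Box<ASTExpr>),
    Compare { field: String, op: CompareOp },
}

/// Hypothetical helper: a = 1 AND b > 1 AND specs.cpu = "i7" AND tags contains "blue".
pub fn usage_filter_ast() -> ASTExpr {
    // Small closure to cut down on the struct-variant boilerplate.
    let cmp = |field: &str, op: CompareOp| ASTExpr::Compare {
        field: field.to_string(),
        op,
    };
    ASTExpr::And(vec![
        cmp("a", CompareOp::Eq(Value::Number(1.0))),
        // `Gt` only accepts an `f64`; passing a string here would not compile.
        cmp("b", CompareOp::Gt(1.0)),
        cmp("specs.cpu", CompareOp::Eq(Value::String("i7".to_string()))),
        cmp("tags", CompareOp::In(vec![Value::String("blue".to_string())])),
    ])
}
```

Writing `CompareOp::Gt(Value::String(..))` is rejected by the compiler, which is the guarantee the typed operands provide.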

### Parser

The parser (`parser.rs`) converts JSON filter specifications into the AST. Key features:

- Support for logical operators (`$and`, `$or`, `$not`)
- Support for comparison operators (`$eq`, `$ne`, `$lt`, `$lte`, `$gt`, `$gte`, `$in`, `$nin`)
- Automatic handling of implicit `$and` for multiple field conditions
- Support for dot notation to access nested fields (`user.profile.age`)
- Enforced nesting depth limit
- Type checking for operators (e.g., numeric operators require numeric values)
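As an illustration of the dot-notation feature, the following minimal sketch resolves a path such as `specs.cpu` by descending one object per segment. It assumes a simplified stand-in for `serde_json::Value`; `lookup_path` and `sample_label` are hypothetical names, and the library's actual lookup may handle edge cases (arrays, escaped dots) differently.

```rust
use std::collections::BTreeMap;

// Stand-in for `serde_json::Value`, reduced to the variants this sketch needs (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    String(String),
    Object(BTreeMap<String, Value>),
}

/// Resolves a dotted path by walking one nested object per segment.
/// Returns `None` if a segment is missing or an intermediate value is not an object.
pub fn lookup_path<'a>(root: &'a Value, path: &str) -> Option<&'a Value> {
    path.split('.').try_fold(root, |current, segment| match current {
        Value::Object(map) => map.get(segment),
        _ => None,
    })
}

/// Builds the label `{"a": 1, "specs": {"cpu": "i7"}}`.
pub fn sample_label() -> Value {
    let mut specs = BTreeMap::new();
    specs.insert("cpu".to_string(), Value::String("i7".to_string()));

    let mut root = BTreeMap::new();
    root.insert("a".to_string(), Value::Number(1.0));
    root.insert("specs".to_string(), Value::Object(specs));
    Value::Object(root)
}
```

With this label, `lookup_path(&sample_label(), "specs.cpu")` yields the string `"i7"`, while a path through a non-object such as `a.b` yields `None`.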

### Evaluator

The evaluator (`evaluator.rs`) evaluates the AST against a JSON label to determine whether the label matches:

- Recursive traversal of the AST
- Type-aware comparison operations
- Support for array field values with `$in` and `$nin` operators
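The recursive traversal can be sketched as follows. This is a simplified model, not the library's code: the AST is cut down to three comparison operators, labels are flat maps over a stand-in `Value` type, and the rule that a missing field never matches is an assumption. `eval`, `compare`, `Label`, and `usage_example` are hypothetical names.

```rust
use std::collections::BTreeMap;

// Stand-in for `serde_json::Value` (assumption), enough for scalars and arrays.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    String(String),
    Array(Vec<Value>),
}

// Reduced operator set; the real `CompareOp` has more variants.
pub enum CompareOp {
    Eq(Value),
    Gt(f64),
    In(Vec<Value>),
}

pub enum ASTExpr {
    And(Vec<ASTExpr>),
    Or(Vec<ASTExpr>),
    Not(Box<ASTExpr>),
    Compare { field: String, op: CompareOp },
}

/// Flat label for brevity; nested fields are out of scope for this sketch.
pub type Label = BTreeMap<String, Value>;

/// Recursively evaluates the tree: logical nodes combine the results of their children.
pub fn eval(expr: &ASTExpr, label: &Label) -> bool {
    match expr {
        ASTExpr::And(children) => children.iter().all(|c| eval(c, label)),
        ASTExpr::Or(children) => children.iter().any(|c| eval(c, label)),
        ASTExpr::Not(inner) => !eval(inner, label),
        ASTExpr::Compare { field, op } => match label.get(field) {
            Some(value) => compare(value, op),
            None => false, // assumption: a missing field never matches
        },
    }
}

/// Type-aware comparison: numeric operators only match numeric field values.
fn compare(value: &Value, op: &CompareOp) -> bool {
    match (op, value) {
        (CompareOp::Eq(expected), actual) => actual == expected,
        (CompareOp::Gt(bound), Value::Number(n)) => n > bound,
        (CompareOp::Gt(_), _) => false,
        // Array field: match if any element is one of the candidates.
        (CompareOp::In(candidates), Value::Array(items)) => {
            items.iter().any(|item| candidates.contains(item))
        }
        // Scalar field: match if the value itself is one of the candidates.
        (CompareOp::In(candidates), scalar) => candidates.contains(scalar),
    }
}

/// Mirrors part of the Usage example: a = 1 AND b > 1 AND tags contains "blue".
pub fn usage_example() -> bool {
    let mut label = Label::new();
    label.insert("a".to_string(), Value::Number(1.0));
    label.insert("b".to_string(), Value::Number(2.0));
    label.insert(
        "tags".to_string(),
        Value::Array(vec![
            Value::String("red".to_string()),
            Value::String("blue".to_string()),
        ]),
    );
    let cmp = |field: &str, op: CompareOp| ASTExpr::Compare { field: field.to_string(), op };
    let filter = ASTExpr::And(vec![
        cmp("a", CompareOp::Eq(Value::Number(1.0))),
        cmp("b", CompareOp::Gt(1.0)),
        cmp("tags", CompareOp::In(vec![Value::String("blue".to_string())])),
    ]);
    eval(&filter, &label)
}
```

`And` and `Or` short-circuit through `all` and `any`, so evaluation stops at the first sub-expression that decides the result.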

### Visitor Pattern

The library implements the Visitor pattern to enable extensible operations on the AST:

- The `ASTVisitor` trait defines the interface for visitors
- The `PrintVisitor` implementation converts an AST to a human-readable format
- A `Display` implementation supports easy debugging and logging
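The sketch below shows one way such a visitor could be structured. The trait methods, the `accept` function, and the output format are all assumptions made for illustration; only the names `ASTVisitor` and `PrintVisitor` come from the library, and their real signatures may differ. The AST is reduced to numeric comparisons to keep the example short.

```rust
use std::fmt;

// Reduced operator set for this sketch (assumption).
pub enum CompareOp {
    Eq(f64),
    Gt(f64),
}

pub enum ASTExpr {
    And(Vec<ASTExpr>),
    Or(Vec<ASTExpr>),
    Not(Box<ASTExpr>),
    Compare { field: String, op: CompareOp },
}

/// Hypothetical visitor interface: one method per node kind.
pub trait ASTVisitor {
    type Output;
    fn visit_and(&mut self, children: &[ASTExpr]) -> Self::Output;
    fn visit_or(&mut self, children: &[ASTExpr]) -> Self::Output;
    fn visit_not(&mut self, inner: &ASTExpr) -> Self::Output;
    fn visit_compare(&mut self, field: &str, op: &CompareOp) -> Self::Output;
}

impl ASTExpr {
    /// Dispatches to the visitor method that matches this node.
    pub fn accept<V: ASTVisitor>(&self, visitor: &mut V) -> V::Output {
        match self {
            ASTExpr::And(children) => visitor.visit_and(children),
            ASTExpr::Or(children) => visitor.visit_or(children),
            ASTExpr::Not(inner) => visitor.visit_not(inner),
            ASTExpr::Compare { field, op } => visitor.visit_compare(field, op),
        }
    }
}

/// Renders the tree as a parenthesized infix expression.
pub struct PrintVisitor;

impl PrintVisitor {
    // Renders each child, then joins the results with the given separator.
    fn render_group(&mut self, children: &[ASTExpr], sep: &str) -> String {
        let parts: Vec<String> = children.iter().map(|c| c.accept(&mut *self)).collect();
        format!("({})", parts.join(sep))
    }
}

impl ASTVisitor for PrintVisitor {
    type Output = String;

    fn visit_and(&mut self, children: &[ASTExpr]) -> String {
        self.render_group(children, " AND ")
    }
    fn visit_or(&mut self, children: &[ASTExpr]) -> String {
        self.render_group(children, " OR ")
    }
    fn visit_not(&mut self, inner: &ASTExpr) -> String {
        let rendered = inner.accept(self);
        format!("NOT {}", rendered)
    }
    fn visit_compare(&mut self, field: &str, op: &CompareOp) -> String {
        match op {
            CompareOp::Eq(v) => format!("{} == {}", field, v),
            CompareOp::Gt(v) => format!("{} > {}", field, v),
        }
    }
}

/// `Display` delegates to the printing visitor, so `{}` works in logs.
impl fmt::Display for ASTExpr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.accept(&mut PrintVisitor))
    }
}
```

New operations, such as collecting all referenced field names, can then be added as further `ASTVisitor` implementations without touching the AST types.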