Module tantivy::aggregation

Aggregations

An aggregation summarizes your data as statistics on buckets or metrics.

Aggregations can answer questions like:

What is the average price of all sold articles?
How many errors with status code 500 do we have per day?
What is the average listing price of cars grouped by color?

There are two categories: Metrics and Buckets.
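To make the distinction concrete, here is a minimal sketch in plain Rust that does not use tantivy. A metric reduces all matching documents to one statistic, while a bucket groups documents and can then compute a metric per group. The `Car` type, the helper functions, and the sample data are hypothetical and exist only for this illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical record type, used only for this illustration.
struct Car {
    color: &'static str,
    price: f64,
}

/// Metric: one statistic over all documents (here, the average price).
fn average_price(cars: &[Car]) -> Option<f64> {
    if cars.is_empty() {
        return None;
    }
    let sum: f64 = cars.iter().map(|car| car.price).sum();
    Some(sum / cars.len() as f64)
}

/// Bucket plus metric: group documents by color, then average each group.
fn average_price_by_color(cars: &[Car]) -> BTreeMap<&'static str, f64> {
    // Accumulate (sum, count) per color first.
    let mut acc: BTreeMap<&'static str, (f64, u64)> = BTreeMap::new();
    for car in cars {
        let entry = acc.entry(car.color).or_insert((0.0, 0));
        entry.0 += car.price;
        entry.1 += 1;
    }
    // Every entry has count >= 1, so the division is safe.
    acc.into_iter()
        .map(|(color, (sum, count))| (color, sum / count as f64))
        .collect()
}

/// Made-up sample data.
fn sample_cars() -> Vec<Car> {
    vec![
        Car { color: "red", price: 10_000.0 },
        Car { color: "red", price: 20_000.0 },
        Car { color: "blue", price: 30_000.0 },
    ]
}
```

With the sample data, `average_price` answers "what is the average price of all cars?" and `average_price_by_color` answers "what is the average price grouped by color?".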

To use aggregations, build an aggregation request by constructing Aggregations, then create an AggregationCollector from this request. AggregationCollector implements the Collector trait and can be passed as the collector to searcher.search().

JSON Format

Aggregation request and result structures serialize to and deserialize from Elasticsearch-compatible JSON.

let agg_req: Aggregations = serde_json::from_str(json_request_string).unwrap();
let collector = AggregationCollector::from_aggs(agg_req);
let searcher = reader.searcher();
let agg_res = searcher.search(&term_query, &collector).unwrap();
let json_response_string: String = serde_json::to_string(&agg_res).unwrap();

Limitations

Currently, aggregations work only on single-value fast fields of type u64, f64 and i64.

Example

Compute the average metric by building agg_req::Aggregations, which is collected from an iterator of (String, agg_req::Aggregation) pairs.

use tantivy::aggregation::agg_req::{Aggregations, Aggregation, MetricAggregation};
use tantivy::aggregation::AggregationCollector;
use tantivy::aggregation::metric::AverageAggregation;
use tantivy::query::AllQuery;
use tantivy::aggregation::agg_result::AggregationResults;
use tantivy::IndexReader;

fn aggregate_on_index(reader: &IndexReader) {
    let agg_req: Aggregations = vec![
        (
            "average".to_string(),
            Aggregation::Metric(MetricAggregation::Average(
                AverageAggregation::from_field_name("score".to_string()),
            )),
        ),
    ]
    .into_iter()
    .collect();

    let collector = AggregationCollector::from_aggs(agg_req);

    let searcher = reader.searcher();
    let agg_res: AggregationResults = searcher.search(&AllQuery, &collector).unwrap();
}

Example JSON

Requests are compatible with the Elasticsearch JSON request format.

use tantivy::aggregation::agg_req::Aggregations;

let elasticsearch_compatible_json_req = r#"
{
  "average": {
    "avg": { "field": "score" }
  },
  "range": {
    "range": {
      "field": "score",
      "ranges": [
        { "to": 3.0 },
        { "from": 3.0, "to": 7.0 },
        { "from": 7.0, "to": 20.0 },
        { "from": 20.0 }
      ]
    },
    "aggs": {
      "average_in_range": { "avg": { "field": "score" } }
    }
  }
}
"#;
let agg_req: Aggregations = serde_json::from_str(elasticsearch_compatible_json_req).unwrap();

Code Organization

Check the README on GitHub to see how the code is organized.

Nested Aggregation

Buckets can contain sub-aggregations. In this example, we create buckets with the range aggregation and then calculate the average for each bucket.

use tantivy::aggregation::agg_req::{
    Aggregation, Aggregations, BucketAggregation, BucketAggregationType, MetricAggregation,
};
use tantivy::aggregation::metric::AverageAggregation;
use tantivy::aggregation::bucket::RangeAggregation;

let sub_agg_req_1: Aggregations = vec![(
    "average_in_range".to_string(),
    Aggregation::Metric(MetricAggregation::Average(
        AverageAggregation::from_field_name("score".to_string()),
    )),
)]
.into_iter()
.collect();

let agg_req_1: Aggregations = vec![
    (
        "range".to_string(),
        Aggregation::Bucket(BucketAggregation {
            bucket_agg: BucketAggregationType::Range(RangeAggregation {
                field: "score".to_string(),
                ranges: vec![(3f64..7f64).into(), (7f64..20f64).into()],
            }),
            sub_aggregation: sub_agg_req_1.clone(),
        }),
    ),
]
.into_iter()
.collect();
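
As a rough illustration of what a range aggregation with an average sub-aggregation computes, the following sketch reproduces the idea in plain Rust without tantivy. `RangeBucket` and `average_per_range` are hypothetical names. The sketch assumes half-open ranges (start inclusive, end exclusive), matching Rust's `Range`, and it simply ignores values outside the listed ranges; the library's actual output may differ in both respects.

```rust
use std::ops::Range;

/// Hypothetical result for one range bucket: the number of values that
/// fell into the range and their average (None if the bucket is empty).
#[derive(Debug, PartialEq)]
struct RangeBucket {
    range: Range<f64>,
    doc_count: u64,
    average: Option<f64>,
}

/// Assigns each value to the ranges that contain it and computes the
/// average per range. Ranges are half-open: start inclusive, end exclusive.
fn average_per_range(values: &[f64], ranges: &[Range<f64>]) -> Vec<RangeBucket> {
    ranges
        .iter()
        .map(|range| {
            let mut sum = 0.0;
            let mut count = 0u64;
            for &value in values {
                if range.contains(&value) {
                    sum += value;
                    count += 1;
                }
            }
            RangeBucket {
                range: range.clone(),
                doc_count: count,
                average: if count > 0 {
                    Some(sum / count as f64)
                } else {
                    None
                },
            }
        })
        .collect()
}
```

For example, calling `average_per_range(&[1.0, 3.0, 5.0, 7.0, 13.0, 20.0], &[3.0..7.0, 7.0..20.0])` places 3.0 and 5.0 in the first bucket and 7.0 and 13.0 in the second, while 1.0 and 20.0 fall outside both ranges.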

Distributed Aggregation

When the data is distributed across different crate::Index instances, the DistributedAggregationCollector makes it possible to merge data between independent search calls by returning IntermediateAggregationResults. IntermediateAggregationResults provides the merge_fruits method to merge multiple results. The merged result can then be converted into agg_result::AggregationResults via the Into trait.
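
The reason for merging intermediate rather than final results can be sketched in plain Rust. An average cannot be merged correctly from other averages, but a sum and a count can. The `IntermediateAverage` type and its methods below are hypothetical stand-ins for illustration, not the library's types.

```rust
/// Hypothetical stand-in for an intermediate average result: it keeps the
/// sum and the number of values instead of the final average.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct IntermediateAverage {
    sum: f64,
    count: u64,
}

impl IntermediateAverage {
    /// Collects the values of one independent search call (e.g. one index).
    fn collect(values: &[f64]) -> Self {
        IntermediateAverage {
            sum: values.iter().sum(),
            count: values.len() as u64,
        }
    }

    /// Merges another intermediate result into this one.
    fn merge(&mut self, other: IntermediateAverage) {
        self.sum += other.sum;
        self.count += other.count;
    }

    /// Converts the intermediate result into the final average.
    /// Returns None when no values were collected.
    fn finalize(self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

/// Merges the per-index results and only then computes the final average.
fn merged_average(per_index_values: &[&[f64]]) -> Option<f64> {
    let mut merged = IntermediateAverage::default();
    for values in per_index_values {
        merged.merge(IntermediateAverage::collect(values));
    }
    merged.finalize()
}
```

With one index holding the values 1, 2 and 3 and another holding only 10, merging the intermediate results gives 16 / 4 = 4, whereas averaging the two per-index averages (2 and 10) would give 6, which is incorrect.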

Modules

agg_req

Contains the aggregation request tree. Used to build an AggregationCollector.

agg_result

Contains the final aggregation tree. This tree can be converted from IntermediateAggregationResults via the into() method. This conversion computes the final result. For example, the intermediate result for an average holds the sum and the number of values; the actual average is calculated in the step from the intermediate to the final aggregation result tree.

bucket

Module for all bucket aggregations.

intermediate_agg_result

Contains the intermediate aggregation tree, which can be merged. Intermediate aggregation results can be used to merge results between segments or between indices.

metric

Module for all metric aggregations.

Structs

AggregationCollector

Collector for aggregations.

AggregationSegmentCollector

AggregationSegmentCollector does the aggregation collection on a segment.

DistributedAggregationCollector

Collector for distributed aggregations.

Enums

Key

The key to identify a bucket.

Type Definitions

SerializedKey

The serialized key is used in a HashMap.