flusso-query 0.10.1

Backend-neutral OpenSearch/Elasticsearch query client for flusso indexes.
Documentation

flusso-query — the typed query client

flusso keeps an OpenSearch index in sync with Postgres from a declarative schema (the write side; see README.md). That schema is a contract: it fixes the shape of every document — which fields exist, their types, which are nested arrays, which are scalars. flusso enforces that contract on the write side; flusso-query enforces it on the read side.

flusso-query does not generate the document types for you. You write the document struct by hand and keep full control — its derives, field types, which fields it projects, how doc keys map to Rust names. #[derive(FlussoDocument)] then, at compile time and with no database, does two things against the resolved schema:

  1. Validates that every struct field lines up with the schema — the field exists, its Rust type matches, its nullability matches. A struct that has drifted stops compiling, pointing at the offending field.
  2. Generates the typed query surface from the schema — field handles, get/query, the schema hash — so a service queries the index like a typed function: field names checked at compile time, operators that only exist for the field types that support them, results that deserialize into your struct.

The query surface comes from the full schema, not from the struct — so you can filter or sort on a field even if your struct doesn't deserialize it. When the schema changes and the index is rebuilt, anything that no longer fits stops compiling at cargo build.

Quick reference

Looking for… Jump to
A full worked example (structs + query) A query, from the caller's seat
Which operators each field type exposes What the field type buys you
and/or/not, clause vs combinator style, count/ids Composing queries
Per-query options, compound/scoring, standalone, sort, search-level Query options and extra query types
Nested arrays — filter by vs filter of (any/all, filter_nested) Filtering nested collections
Several searches, one round-trip _msearch
One blended result list across indexes Combined search
How the macro finds the schema; path for nested structs Binding to the schema
flusso type → Rust type → handle flusso types → Rust types
Anything the typed builder can't express The escape hatch
Reading a prefixed deployment Resolving the index name

A query, from the caller's seat

A service that searches the users index from the schema guide. You write the structs; #[derive(FlussoDocument)] validates them and generates the query surface. The only input is the index name — it finds flusso.toml itself (see Binding to the schema).

The document structs

use flusso_query::{Client, FlussoDocument};

/// A `users` document — *you* write this. It's a **projection**: it deserializes
/// the fields below and omits the rest of the index (addresses, profile,
/// avgOrderValue, …), which the derive allows. The derive checks every field
/// against the `users` schema and hangs the typed query surface off `User`.
#[derive(Debug, Clone, serde::Deserialize, FlussoDocument)]
#[flusso(index = "users")]                     // ← the only input: which index
pub struct User {
    pub id: i32,                                // primary key (integer) — never null
    pub email: String,                          // keyword, required → never null
    #[serde(rename = "fullName")]
    pub full_name: Option<String>,              // text, not required → nullable
    pub account: Account,                       // an object — always assembled
    pub orders: Vec<Order>,                     // has_many join → nested, never null
    #[serde(rename = "orderCount")]
    pub order_count: i64,                        // count aggregate → long, never null
    #[serde(rename = "lifetimeValue")]
    pub lifetime_value: Option<f64>,            // sum aggregate → nullable
}

/// The `account` object — a same-row sub-object. A nested/object struct validates
/// against its `path` in the same index; it has no entry points of its own.
#[derive(Debug, Clone, serde::Deserialize, FlussoDocument)]
#[flusso(index = "users", path = "account")]
pub struct Account {
    pub tier: String,                           // enum → keyword, required
    pub country: Option<String>,                // keyword, not required
    #[serde(rename = "createdAt")]
    pub created_at: time::OffsetDateTime,       // timestamp, required
}

#[derive(Debug, Clone, serde::Deserialize, FlussoDocument)]
#[flusso(index = "users", path = "orders")]
pub struct Order {
    pub status: String,                         // enum, required
    pub total: Decimal,                         // decimal column (`decimal` feature); or `f64`
    #[serde(rename = "placedAt")]
    pub placed_at: time::OffsetDateTime,        // timestamp, required
    pub items: Vec<Item>,                       // a deeper has_many → nested
}

#[derive(Debug, Clone, serde::Deserialize, FlussoDocument)]
#[flusso(index = "users", path = "orders.items")]
pub struct Item {
    #[serde(rename = "productId")]
    pub product_id: i32,
    pub quantity: i32,
    #[serde(rename = "unitPrice")]
    pub unit_price: Decimal,
}

The query

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Transport. Points at OpenSearch, not at flusso — the engine is write-only;
    // reads go straight to the index it maintains.
    let client = Client::connect("https://localhost:9200")?
        .basic_auth("admin", std::env::var("OS_PASSWORD")?);

    // Fetch one document by its id. `id` is typed as the root table's primary
    // key (i32 here), and the result is Option<User> — None when absent.
    let user: Option<User> = User::get(&client, 42).await?;

    // A typed search. Every `User::field()` is a handle that only exposes the
    // operators its mapping supports (see "What the field type buys you"). The
    // handles cover the *whole schema* — so `User::addresses()` filters fine even
    // though this projection never deserializes addresses.
    let page = User::query()
        .filter(User::email().eq("ada@example.com"))      // keyword → exact
        .filter(User::order_count().gte(5))               // long → range
        .filter(User::account().tier().eq("gold"))        // into the object's handles
        .query(User::full_name().matches("ada lovelace")) // text → analyzed match
        .filter(User::orders().any(Order::status().eq("delivered")))   // nested, via your Order struct
        .filter(User::addresses().any(AddressFields::city().eq("Boston"))) // not projected — generated namespace
        .sort(User::order_count().desc())
        .from(0)
        .size(20)
        .send(&client)
        .await?;

    println!("{} total matches", page.total);
    for hit in page.hits {
        // hit.source is a fully-typed User. hit.id and hit.score come from the
        // search envelope, not the document body.
        let u: &User = &hit.source;
        println!("{:.3}  {}  ({} orders)", hit.score, u.email, u.order_count);
        for order in &u.orders {                          // Vec<Order>
            println!("    order — {}{}", order.total, order.status);
        }
    }

    Ok(())
}

What you can't write

The compiler refuses the mistakes:

  • User::email().matches(..)matches is a text operator and email is a keyword.
  • User::full_name().eq(..)full_name is analyzed text, so it has no exact eq.
  • User::nmae() — there is no such handle.
  • hit.source.totl — no such field.

And the struct itself can't drift. Declaring email: i32, or email: Option<String> when the field is required, or a field totl the schema doesn't have, is a compile error from the derive. The schema has been lifted into the type system, so the compiler checks both your queries and your struct against it.

OS_PASSWORD is just a variable name picked for the example. The full env-var story (secrets, the reserved deployment overrides, FLUSSO_CONFIG) lives in the configuration guide.


What the field type buys you

The handle a schema field generates determines the operators in scope: an operator that doesn't make sense for a field's type doesn't exist on its handle, so the mistake is a compile error rather than a 400 from OpenSearch.

Handle Operators
Keyword eq, any_of, prefix, wildcard, regexp, fuzzy, exists
Text matches, match_phrase, match_phrase_prefix, matches_fuzzy, any_of (exact, via .keyword), existsno exact eq (it's analyzed)
Bool eq, exists, asc/desc
Number eq, any_of, lt, lte, gt, gte, between, exists
Date eq, any_of, lt, lte, gt, gte, between, exists
Object<S> exists (a same-document sub-object — an object field or a to-one join (belongs_to/has_one); S is the enclosing scope). Its sub-fields are flattened, so query them via the child struct's dotted-path handles (Account::tier()), not through this handle
Nested<S, T> any(q) / all(q) to match parents and lift the child query into scope Sq is a child query built from T's handles (merging); matching(q) (with .sort/.size) to shape what's returned — see Filtering nested collections; plus exists
Geo within(Distance::km(12.0), center), within_box, within_polygon, exists; sort with distance_from(center) (nearest-first sugar) or distance_sort(center, order, DistanceUnit)
TextMap key(k) → a Text leaf for one key; search(q) (cross-key, .prefer(key, weight) / .only_preferred()); has_key(k), exists
KeywordMap key(k) → a Keyword leaf for one key; has_key(k), existsno search (exact-match, use key(..).eq(..))
NumberMap key(k) → a Number leaf for one key (eq/lt/gte/between/…); has_key(k), exists
DateMap key(k) → a Date leaf for one key (eq/gte/between/…); has_key(k), exists
Binary exists — base64-encoded, not searchable
Json exists, raw(serde_json::Value) — the untyped fallback

Each operator's argument is typed by kind, not by one fixed Rust type — a handle accepts any value implementing FlussoValue<kind::…> (which requires serde::Serialize):

  • Numerics are split per type (kind::Byte/Short/Integer/Long/Float/ Double/Decimal), and a value is accepted only if it widens into that kind without loss. So a Long field takes any integer, a Double takes f32/ f64 or a small int, a Decimal takes Decimal (decimal feature) or an integer — but a float on an integer field, or an i64 on a Short, is a compile error. Bare literals work where lossless: eq(5) on long/integer/ double/decimal, eq(5.0) on float/double. A byte/short field needs a typed literal (eq(5i16)), since 5 defaults to i32 (which would narrow).
  • Keyword takes a String/&str or a #[derive(FlussoValue)] keyword enum/newtype (matched against its serde string form).
  • Bool takes a bool or a bool #[derive(FlussoValue)] newtype.
  • Date takes a String/&str ISO-8601 literal or — with the chrono feature — a NaiveDate / NaiveDateTime / DateTime<Utc>.

A custom money/quantity type queries with no cast — Order::total().eq(Money(d)) — as long as it's a FlussoValue of the field's kind (see below). A value of the wrong kind is a compile error (Order::created_at().gte(5), Order::age().eq(1.5)).

Subfield accessors. flusso's sink auto-enriches text/keyword fields (auto_subfields, on by default) with exact / sortable / searchable subfields, and the handles expose them — no Keyword::at("code.keyword") string path:

User::full_name()                      // Text   → analyzed full-text match
User::full_name().keyword()            // Keyword → exact / wildcard / prefix
User::full_name().keyword_lowercase()  // Keyword → case-insensitive match / sort
User::email().text()                   // Text    → full-text over a keyword field

(A wildcard belongs on .keyword(), not the analyzed handle, which matches tokens not the whole value.) These accessors exist only when the subfield is actually provisioned — and the derive enforces it at compile time: a text/keyword handle is stamped with subfields only when every OpenSearch sink has auto_subfields on and the field declares no custom fields. Otherwise the handle is …<NoSubfields> and .keyword() / .text() / .keyword_lowercase() (and the Text::any_of / Text::asc sugar built on them) simply don't exist — calling one is a compile error, not a runtime 400.

Sorting is similar — sort(…) accepts handles whose type is sortable (numbers, dates, keywords, booleans, and text). Text::asc/desc is sugar that sorts via the case-insensitive .keyword_lowercase subfield automatically; reach for User::full_name().keyword().desc() when you want an exact-case sort instead. A geo_point sorts by proximity with User::location().distance_from(center).

A few clauses span more than one field, so they're free functions: multi_match("ada", [User::full_name(), User::bio()]) runs one analyzed query across several Text fields (weight one with User::full_name().boosted(3.0)).

The typed surface is broad — see Query options and extra query types for the per-query options (boost/case_insensitive/fuzziness/…), the compound/scoring queries (constant_score/dis_max/function_score/boosting), the standalone ones (ids/query_string/simple_query_string/script_score/…), and the search-level controls (min_score/collapse/search_after/highlight/…). What's left for the raw hatch is knn, geo_shape, span and parent/child queries — types with no corresponding flusso field.

Composing queries

A handle's operator produces a Query<S> — carrying the scope it was built in. The root document and any flattened object/to-one-join sub-field share the scope Root; only a nested array introduces a fresh scope, tagged with its element type (Query<Order>).

Queries compose with and / or / not within a scope, and the Search builder exposes the bool-query clauses directly:

// Combinator style.
let q = User::email().eq("ada@example.com")
    .and(User::order_count().gte(5))
    .and(User::orders().any(Order::status().eq("delivered").and(Order::total().gt(0.0))));

User::query().query(q).send(&client).await?;

// Clause style — `filter`/`must_not` don't score, `query`(=must)/`should` do.
User::query()
    .query(User::full_name().matches("ada"))       // scored
    .filter(User::order_count().gte(5))            // filtered, cached, no score
    .must_not(User::email().prefix("test-"))
    .should(User::orders().any(Order::status().eq("delivered")))
    .send(&client)
    .await?;

query/filter/must_not/should accept anything that is a Query<Root>, so the two styles mix freely. Which clause to reach for:

Clause Scores? Bool slot Use for
query yes must full-text relevance you want ranked
filter no (cached) filter exact constraints — terms, ranges, nested any
must_not no must_not exclusions
should yes should optional boosts; a real constraint with min_should_match

A built search can finish as a count instead of a page: .count() sends the same query to _count and returns the number of matching documents (u64) — cheaper than send() when the hits aren't needed. Sort, from/size, and filter_nested projections are ignored; they never change which documents match.

let open_orders: u64 = User::query()
    .filter(User::orders().any(Order::status().eq("open")))
    .count(&client)
    .await?;

Or as an id page: .ids() runs the same search with _source: false and returns Vec<String> — the matching document ids (root primary keys, stringified), in order, no sources fetched. Sort and from/size apply as in send(); filter_nested projections are dropped. The cheap way to feed another lookup — e.g. search in OpenSearch, then load the full rows from Postgres:

let user_ids: Vec<String> = User::query()
    .filter(User::orders().any(Order::status().eq("open")))
    .sort(User::order_count().desc())
    .size(100)
    .ids(&client)
    .await?;

Query options and extra query types

Each leaf operator returns a small builder that carries that query's options plus the universal boost(f32) and name(&str) (_name, surfaced in a hit's matched_queries). With no option set it renders the DSL shorthand; set one and it expands. A builder drops straight into a clause (it's an AsQuery), so no .build() is needed:

User::query()
    .should(User::full_name().matches("acme").boost(2.0))         // weighted text
    .should(User::full_name().keyword().wildcard("*acme*").case_insensitive())
    .should(User::full_name().matches("acme").fuzziness(Fuzziness::Auto))  // typo-tolerant
    .min_should_match(1)                                          // make should a real filter
    .filter(User::owner_id().eq(owner_uuid))                      // uuid keyword (feature)
    .filter(User::tier().eq(Tier::Pro))                           // enum keyword
    .sort(User::created_at().desc().missing_first())              // null-aware sort
    .send(&client).await?;

The options per query type (all optional; plus the universal boost/name on every builder):

Query Options
term / prefix / wildcard / regexp case_insensitive
prefix / wildcard rewrite
regexp flags, max_determinized_states
fuzzy fuzziness, prefix_length, max_expansions, transpositions
matches fuzziness, operator, minimum_should_match, prefix_length, analyzer, zero_terms_query, lenient
phrase matches slop, analyzer
multi_match type, operator, fuzziness, tie_breaker, minimum_should_match
range format, time_zone, relation
within (geo) distance_type, validation_method
nested any score_mode, ignore_unmapped

The enumerable options are closed enums, not strings — a typo is a compile error, not a 400. operator/default_operator take Operator { And, Or }; fuzziness takes Fuzziness { Auto, AutoBounds(u32, u32), Edits(u32) }; multi_match's type takes MultiMatchType; zero_terms_query takes ZeroTermsQuery { None, All }; a range relation takes RangeRelation { Intersects, Contains, Within }; function_score's score_mode/boost_mode take ScoreMode/BoostMode; a nested score_mode takes NestedScoreMode (which, unlike ScoreMode, has None for a filter-only nested clause). Geo's distance_type/validation_method take DistanceType/ValidationMethod, and a sort's numeric_type / Sort::script type take NumericType / ScriptSortType. minimum_should_match takes a MinimumShouldMatch (2/.into() for a count, MinimumShouldMatch::percent(75), or ::raw("3<90%") for the combining mini-language). Genuinely open-ended params (analyzer, format, time_zone, unmapped_type, flags) stay String.

.or() / .and() on a builder need use flusso_query::AsQuery; in scope (the combinators are provided methods on that trait). Composing via the Search clauses (.should()/.filter()/…) needs no import.

Bool / compound & scoring. Search::min_should_match(n) (or Query::min_should_match on an or-group) turns a top-level free-text should group into a real constraint instead of scoring-only. The scoring wrappers are free functions: constant_score(filter), dis_max([..]).tie_breaker(..), boosting(positive, negative, negative_boost), and function_score(query).weight(..)/.weight_when(.., filter)/.boost_mode(..).

Standalone query types (free functions, each an AsQuery): ids([..]) (get-many-by-_id), query_string(..) / simple_query_string(..) / combined_fields(.., [fields]) (user-facing full-text), and the relevance ones script(..), script_score(query, source), distance_feature(..), rank_feature(..), more_like_this([fields], [like]). match_bool_prefix is a Text operator (search-as-you-type).

Sort. .asc()/.desc() on a sortable handle (number, date, keyword, bool, or text — the last routing through .keyword_lowercase) return a Sort builder: chain .missing_first()/.missing_last()/.missing(v), .mode(SortMode::..), .unmapped_type(..)/.numeric_type(..)/.format(..), or .nested(path)/.nested_filtered(path, q). Also Sort::score() (by _score), Sort::script(ScriptSortType, source, order), and Geo::distance_from(center) / Geo::distance_sort(center, order, DistanceUnit) for proximity sorting. A geo radius is a typed Distance (Distance::km(12.0) / ::miles(5.0) / ::meters(800.0) / …), so a malformed "12 km" can't reach the query.

Search-level controls on the Search builder: min_score, track_total_hits, track_scores, search_after([..]) (deep pagination), collapse(field), post_filter(q), and highlight(Highlight::new().field(..)).

Queries are values; the client appears once

Type::query() takes no client: a Search<T> is a plain, Cloneable value with no lifetime. Build it anywhere — a helper, a struct field, a cached "prepared search" — and hand a &Client to the terminal (send / ids / count, all &self) when it's time to run. One built query can run many times, against different clients, or with per-call tweaks via clone():

fn busy_users() -> Search<User> {                      // no client in sight
    User::query().filter(User::order_count().gte(5))
}

let page = busy_users().send(&client).await?;          // run it
let next = busy_users().from(20).send(&client).await?; // tweak a copy

Several searches, one round-trip (_msearch)

Independent typed searches — different indexes, different document types — can share one HTTP round-trip. client.msearch(…) takes a tuple of &Search<T> (arity 1–8) and returns one typed SearchResponse per slot, in order:

let users_q  = User::query().query(User::full_name().matches(&q)).size(10);
let orders_q = Order::query().filter(Order::status().eq("open")).size(5);

let (users, orders) = client.msearch((&users_q, &orders_q)).await?;

The "search page with separate sections" primitive: each slot keeps its own query, sort, pagination, and filter_nested projections, and decodes into its own type. A slot-level failure fails the whole call with an error naming the slot (no partial results). For many searches of one type there's client.msearch_all(&searches)Vec<SearchResponse<T>>.

One blended result list (combined search)

The other multi-index shape: one query over several indexes, hits ranked together in a single list — the "one search box over everything" primitive. You declare which document types blend by writing an enum with one variant per type:

/// One item in the storefront's global search.
#[derive(Debug, FlussoMultiDocument)]          // the `derive` feature, like FlussoDocument
enum StoreItem {
    User(User),
    Order(Order),
}

let page = StoreItem::query()                  // MultiSearch<StoreItem> — client-free too
    .query(multi_match("ada", [User::full_name(), Order::customer_name()]))
    .size(20)
    .send(&client)
    .await?;

for hit in page.hits {
    match hit.source {                         // dispatched by the hit's `_index`
        StoreItem::User(u) => render_user(u, hit.score),
        StoreItem::Order(o) => render_order(o, hit.score),
    }
}

Root-scope queries compose across document types (Query<Root> carries no document type), and each hit is decoded into the variant that owns its index. The query goes through the stable {INDEX}_{HASH} alias, but OpenSearch reports the concrete index behind it ({INDEX}_{HASH}_{generation}) in each hit — so the crate normalizes that generation suffix before dispatch; you just match on hit.source. A hit from an index no variant claims is an error, not a skip. count(&client) works on the union too.

Two semantics to know:

  • A query on a field that exists in only one of the indexes is fine — it just doesn't match in the others.
  • A sort on such a field is rejected by OpenSearch unless it carries an unmapped_type. Sort the blended list by relevance, or on fields all the union's indexes share.

The derive validates the enum's shape (single-field tuple variants, no duplicate payload types) and generates the trait's two members. Without the derive feature, the impl is two short members written by hand — a TARGETS const listing each variant's (INDEX, SCHEMA_HASH) and a decode that matches on Type::physical_index().

Building a child filter and merging it into the parent

Because the scope is part of the type, a query is a value you can build, name, store, and reuse. A nested child struct (one whose path ends in a nested array) carries its own field handles tagged with the child scope — so they produce Query<Order>, not Query<Root>:

// Built from Order's own handles. Reusable — a plain function returning a query:
fn big_delivered() -> Query<Order> {
    Order::status().eq("delivered")
        .and(Order::total().gt(100.0))
}

To merge a child filter into a parent, lift it through the nesting that holds it: User::orders().any(child) (or .all(child)) takes a Query<Order> and returns a Query<Root> — a nested clause at the orders path — which composes with parent-scope queries like any other:

let q = User::email().eq("ada@example.com")
    .and(User::orders().any(big_delivered()));   // Query<Order> → lifted → Query<Root>

User::query().filter(q).send(&client).await?;

The scope tag keeps this honest: User::email().and(Order::status().eq(…)) does not compile — you can't and a Query<Root> with a Query<Order>; the child query has to be lifted through User::orders() first. A child constraint can never be silently applied at the wrong level.

Lifting composes through depth: Order::items().any(Item::quantity().gt(1)) is a Query<Order>, which User::orders().any(…) then lifts the rest of the way to Query<Root>.

Optional filters

Callers build queries from optional inputs — request params, form fields — and the if let Some(x) = … { q = q.filter(…) } dance breaks the fluent chain. The primitive that fixes it: Option<Q> is itself a Query, where None contributes nothing in any clause — must_not(None) excludes nothing, and(None) is the identity. So every clause and combinator accepts an optional; you .map the value into the handle:

User::query()
    .filter(params.email.map(|e| User::email().eq(e)))          // skipped when None
    .filter(params.min_orders.map(|n| User::order_count().gte(n)))
    .send(&client).await?;

// Composes inside and/or too — a None branch just drops out:
let q = User::email().eq("ada@example.com")
    .and(params.tier.map(|t| User::account().tier().eq(t)));    // None → just the email clause

A named filter_some(value, |v| …) sugar that drops the .map is an obvious follow-on, left out of the first cut.


Filtering nested collections

orders is a nested array, and there are two independent things you might filter — flusso keeps them separate:

  • Filter by nested — choose which users come back, based on their orders. The any/all you've already seen: it's a Query, so it goes in filter/query/etc. A matching user still carries its whole orders array.
  • Filter of nested — shape the orders array each user comes back with, without changing which users return. A separate clause, filter_nested.

They compose: use either alone, or both together (often with the same predicate).

filter_nested — shaping the returned array

let page = User::query()
    // filter BY: only users with a delivered order
    .filter(User::orders().any(Order::status().eq("delivered")))
    // filter OF: and within each, keep only the delivered orders, newest first, ≤5
    .filter_nested(
        User::orders()
            .matching(Order::status().eq("delivered"))
            .sort(Order::placed_at().desc())
            .size(5),
    )
    .send(&client).await?;

for hit in &page.hits {
    // `source.orders` IS the filtered subset — no extra accessor:
    for order in &hit.source.orders {       // delivered, newest first, ≤ 5
        println!("{}{}", order.total, order.status);
    }
}

User::orders().matching(q) is a nested projection: q is a Query<Order> built from Order's handles, plus optional .sort(Order::field().desc()), .size(n), .from(n). matching itself is optional — drop it to keep every child but still sort or cap the array.

Because filter_nested does not touch which parents match, a user with no delivered orders still comes back — with orders: []. Pair it with a filter(User::orders().any(…)) when you also want to drop those users.

filter_nested always replaces hit.source.<path> with the matched subset: the client fetches the nested matches and substitutes them for that field before deserializing.

Not yet built: a .keep_source() opt-out that leaves the stored array intact and a typed hit.nested(handle) side accessor. Today filter_nested always replaces the array in source.

Depth

filter_nested shapes one nested level — orders. You can still match on deeper nesting from inside the predicate (Order::items().any(Item::quantity().gt(1))), and the returned orders honor it; returning a filtered items array inside each returned order is a deeper inner-hits case left to the raw hatch for now.


Results

pub struct SearchResponse<T> {
    pub total: u64,            // total matches (not the page size)
    pub max_score: Option<f32>,
    pub hits: Vec<Hit<T>>,
    pub took: std::time::Duration,
}

pub struct Hit<T> {
    pub id: String,            // the document id (root primary key, stringified)
    pub score: f32,
    pub source: T,             // the fully-typed document
}

get returns Option<T>; search returns SearchResponse<T>, where T is the struct you wrote. There is no serde_json::Value in the common path.


Binding to the schema

The macro validates against the resolved mapping — flusso's IndexMapping: every field with a concrete type, whether it is nullable, its nested children, and the schema hash.

flusso's schemas are self-describing: every leaf declares its type and whether it's required, and joins/objects/aggregates have structural types — so the mapping resolves with no database, exactly as flusso build does when it writes flusso.lock. The client reuses that resolution. (See the schema guide for the schema format and the configuration guide for how the index is written.)

The one input: the index name

You never point the macro at a file. You name the index, and the macro finds the schema:

#[derive(serde::Deserialize, FlussoDocument)]
#[flusso(index = "users")]
pub struct User { /**/ }

At compile time the macro:

  1. Locates flusso.toml by walking up from the consuming crate's CARGO_MANIFEST_DIR. (Override with #[flusso(index = "users", config = "…")] or the FLUSSO_CONFIG env var — see the configuration guide.)
  2. Selects the [[index]] whose name matches "users" — the reason an index name is required, since one flusso.toml defines several.
  3. Loads that index's schema: file (resolved relative to flusso.toml) and resolves the IndexMapping in-process — the same resolution flusso build performs. Hermetic: no database, no network.
  4. Tracks flusso.toml and every schema file it read as build inputs (via include_bytes!), so editing the config or a schema retriggers compilation and a drifted struct fails the next build.

The resolved mapping's content hash is the binding's SCHEMA_HASH — the same hash flusso build writes into flusso.lock and the engine folds into the physical index name, so binding and index are provably the same schema version.

There is no build.rs, no generated .rs file to include!, and no committed mapping artifact to keep in sync — the struct is the only file you maintain.

A nested or object struct names its path

A same-row object, a to-one join, or a nested join is its own struct, validated against a dotted path into the same index — account, orders, orders.items. It declares the same index so the macro resolves the same config, then walks to that path's children:

#[flusso(index = "users", path = "orders.items")]
pub struct Item { /**/ }

These contribute field validation and their own field handles — Order::status(), Order::total(), … — producing Query<Order> values you can compose, store, and lift into a parent query (see Building a child filter). Only the root struct gets entry points (get/query) and SCHEMA_HASH.

What the derive expands to

#[derive(FlussoDocument)] on the root User emits (roughly):

impl User {
    // Entry points.
    pub fn get(client: &Client, id: i32) -> impl Future<Output = Result<Option<User>>>;
    pub fn query() -> Search<User>;   // client-free: a plain, reusable value

    // Field handles — one per *schema* field, carrying its type. These are what
    // the query builder consumes. They exist for every field in the mapping,
    // whether or not `User` projects it.
    pub fn id() -> Number<kind::Integer> { /**/ }
    pub fn email() -> Keyword { /**/ }
    pub fn full_name() -> Text { /**/ }
    pub fn account() -> Object { /**/ }                  // object/to-one join → `Object<Root>` (scope-only; `.exists()`)
    pub fn addresses() -> Nested<Root, AddressFields> { /**/ } // not projected — generated namespace
    pub fn orders() -> Nested<Root, Order> { /**/ }      // projected — `Nested<enclosing scope, your struct>`
    pub fn order_count() -> Number<kind::Long> { /**/ }
    pub fn lifetime_value() -> Number<kind::Double> { /**/ }
    pub fn avg_order_value() -> Number<kind::Double> { /**/ } // not projected by `User`
    pub fn last_order_at() -> Date { /**/ }                // not projected by `User`
    // …one per schema field.

    /// The physical index this binds to — `get`/`query` use it.
    pub const INDEX: &str = "users_3f2a1b9c…";
    /// The schema hash this binding was generated from (the `INDEX` suffix).
    pub const SCHEMA_HASH: &str = "3f2a1b9c…";
}

// Each nested path has ONE handle namespace whose functions build a `Query` in
// that path's scope. When you wrote a struct for the path, that struct IS the
// namespace — its derive adds the handles, covering the full sub-schema (not just
// the fields it deserializes), producing `Query<Order>`. A `nested` array
// introduces its own scope: `Order`'s handles are tagged `<Order>` (the root and
// flattened objects stay `<Root>`); they must be lifted before joining a root query.
impl Order {
    pub fn status() -> Keyword<Order> { /**/ }
    pub fn total() -> Number<kind::Decimal, Order> { /**/ }
    pub fn placed_at() -> Date<Order> { /**/ }
    pub fn items() -> Nested<Order, Item> { /**/ }   // deeper nested: enclosing scope `Order`, child `Item`
    // …one per field at the `orders` path.
}

// For a nested path you DIDN'T give a struct, the root derive generates a
// handles-only namespace named `<Path>Fields`, so it's still queryable:
pub struct AddressFields;
impl AddressFields {
    pub fn city() -> Keyword<AddressFields> { /**/ }       // nested scope, like `Order`
    pub fn postal_code() -> Keyword<AddressFields> { /**/ }
    // …one per field at the `addresses` path.
}

What the derive checks

For each field the struct declares, the macro resolves the matching schema field by its document key — honoring #[serde(rename = "…")] and a container #[serde(rename_all = "…")], so the struct's serde config and flusso's validation agree — then checks three things:

Check Pass Compile error
field exists the doc key is in the schema no field `totl` in index `users` (span on the field)
type matches leaf Rust type matches the field's type email is `keyword` → expected `String`, found `i32`
nullability matches Option<_> iff the field is nullable email is required → expected `String`, found `Option<String>`

The rules that make this full control rather than a straitjacket:

  • Partial projections are allowed. Leaving schema fields off your struct is fine — you only deserialize the subset you declare. Only the three checks above fail.
  • Type matching is by leaf identifier + Option shape. The macro can't resolve arbitrary type aliases, so it compares the final path segment (String, i32, f64, OffsetDateTime, …) and the Option<_> wrapper against the type table. For an object field it expects a struct, for a nested field a Vec<_>, and defers the inner field checks to that struct's own FlussoDocument derive.
  • Escape hatches. A field typed serde_json::Value opts out of type checking. #[flusso(skip)] drops a field from validation entirely — for a computed or app-only field not backed by the index (pair it with #[serde(skip)] or #[serde(default)]).

flusso types → Rust types

The type the derive expects for each schema type (the same bridge the schema guide defines, with the Rust side added). Declare something else and it won't compile (modulo the leaf-identifier rule above).

flusso type OpenSearch Rust type Field handle
text text String Text
identifier text String Text
keyword keyword String (or a FlussoValue newtype) Keyword
enum keyword String or a #[derive(FlussoValue)] enum Keyword
uuid keyword String, or uuid::Uuid (uuid feature) Keyword
boolean boolean bool Bool
short short i16 Number<kind::Short>
integer integer i32 Number<kind::Integer>
long long i64 Number<kind::Long>
float float f32 Number<kind::Float>
double double f64 Number<kind::Double>
decimal double Decimal (decimal feature) or f64 Number<kind::Decimal>
date date time::Date (feature) Date
timestamp date time::OffsetDateTime (feature) Date
binary binary String (base64) Binary
json object serde_json::Value Json
geo_point geo_point GeoPoint ({ lat, lon }) Geo
custom { opensearch } (given) matching scalar, else serde_json::Value by OS type
object object a struct Object
join belongs_to / has_one object Option< a struct > Object
join has_many / many_to_many nested Vec< a struct > Nested<S, T>

Decimals query without a cast. type: decimal maps to OpenSearch double (lossy in storage) but gets a Number<kind::Decimal> handle, distinct from a true double's Number<kind::Double> — so it accepts a Decimal or a losslessly-widening integer (eq(5)), but not an f64 (that's lossy, a compile error). To pass a rust_decimal::Decimal, enable the decimal feature on flusso-query (it re-exports Decimal and implements FlussoValue<kind::Decimal> for it); the document field can be Decimal or f64 (both deserialize from the stored double). Query precision is f64-bound (serde_json has no arbitrary precision) — fine for a double-stored column; if you need exact querying, declare a custom scaled_float (also kind::Decimal) and note this limit.

Dates are behind a feature so a caller picks time or chrono (or String for raw ISO-8601); the derive accepts whichever leaf type the chosen feature settles on.

Custom value types — #[derive(FlussoValue)]. A newtype inherits its inner type's kinds automatically, so struct Money(Decimal) is a decimal value and struct Sku(String) a keyword + text value — no kind tag needed, and each is queryable and rejected exactly where the inner type would be. FlussoValue requires serde::Serialize (a supertrait), so derive it too. An enum has no inner type, so it needs an explicit string kind — #[flusso(keyword)] (default) or #[flusso(text)]; numeric/date tags don't exist (use a newtype, which inherits). Enum keyword fields stay typed — never #[flusso(skip)] them.

// Newtype: inherits Decimal's kinds — a `decimal`/`scaled_float` value.
#[derive(Clone, Copy, serde::Serialize, serde::Deserialize, FlussoValue)]
struct Money(Decimal);

// Enum: string-valued, so a kind tag is required (keyword default).
#[derive(serde::Serialize, serde::Deserialize, FlussoValue)]
#[serde(rename_all = "camelCase")]
#[flusso(keyword)]
enum Tier { Pro, Enterprise, Free }

#[derive(serde::Deserialize, FlussoDocument)]
#[flusso(index = "customers")]
struct Customer { tier: Tier, balance: Money /**/ }

Customer::tier().eq(Tier::Pro);              // term against "pro"
Customer::balance().gte(Money(Decimal::ONE)); // a decimal value, no cast

uuid::Uuid is a keyword value behind the uuid feature — id / foreign-key fields stay in the struct as Uuid (no #[flusso(skip)], no Keyword::at("…")), and Customer::owner_id().eq(some_uuid) works without .to_string().

Nullability is declared, not guessed

A field is T or Option<T>, and the derive checks it against the resolved mapping. Nullability comes straight from the schema with no database round-trip: a leaf states it with required, and joins, objects, and aggregates carry it structurally. ResolvedField records the resulting nullable: bool; the derive requires nullable: false → T, nullable: true → Option<T>.

Field source nullable Why
root primary_key column false forced non-null — it backs the document id
join primary_key field false forced non-null, just like the root key
leaf column, required: true false declared non-null
leaf column, required: false true nullable by default
object false always assembled from the same row
join belongs_to / has_one (object) true there may be no related row
join has_many / many_to_many false a Vec, empty when there are none, never null
aggregate count false a non-null long — zero rows is 0, not null
aggregate avg true a nullable double — null over zero rows
aggregate sum / min / max true null over zero rows; the result mirrors the column

required is rejected by the schema on joins and aggregates precisely because their nullability is structural — so there's nothing for the author to declare.


The escape hatch

Anything the typed builder can't express stays reachable, and still deserializes into the typed struct:

let page: SearchResponse<User> = User::query()
    .raw(serde_json::json!({
        "knn": { "embedding": { "vector": [/**/], "k": 10 } }
    }))
    .send(&client)
    .await?;

raw takes the OpenSearch query DSL verbatim — the pressure-release valve for the few types with no flusso field (knn/vector, geo_shape, span and parent/child queries) and for percolators — without dropping to an untyped client or losing typed results. Most of what used to need it (function_score, script, constant_score, query_string, search_after, …) is now in the typed surface.


Resolving the index name

The physical index carries the hash suffix (users_3f2a1b9c) — exactly what the OpenSearch sink writes — and rotates on a structural schema change.

Because the binding is generated from the schema at compile time, the derive knows that hash and emits it as a const: User::INDEX is the physical name, and get/query use it. So User::query() addresses the right index directly, with the hash hidden from the caller — no convenience alias required.

This is self-correcting: a structural schema change rotates the hash and changes the resolved mapping, so the next cargo build regenerates the binding against the new physical index. (User::INDEX and User::SCHEMA_HASH are exposed for logging, admin, or a hand-built Search.)

The convenience alias (users → current generation) is still worthwhile for clients that don't recompile against the schema — dynamic/scripting use, dashboards. For a derived binding it's unnecessary: the compile-time hash is the stable name.

Reading a prefixed deployment

If flusso runs with an index prefix (FLUSSO_INDEX_PREFIX, e.g. dev_ so it writes dev_users_<hash>), tell the client the same prefix:

let client = Client::connect("https://localhost:9200")?
    .index_prefix(std::env::var("FLUSSO_INDEX_PREFIX").unwrap_or_default());

The prefix is applied at runtime, on the transport — the derive still bakes the unprefixed User::INDEX/SCHEMA_HASH, and the client prepends the prefix to every request path. So one compiled consumer binary serves every environment: point it at dev or staging by setting FLUSSO_INDEX_PREFIX, no rebuild. It must match the writer's prefix exactly, or queries hit an empty (or wrong) index.


Out of scope for the first cut

  • Search aggregations (facets, histograms, cardinality). The typed surface is filter/query/sort + typed hits first; aggregations need their own typed result tree, and the raw hatch covers them in the meantime.
  • Writes. flusso owns the index; the client never upserts or deletes.
  • Correlating hits across indexes. Both multi-index shapes ship — one blended result list and independent searches via _msearch — but correlating hits across indexes remains the caller's job.
  • Scroll pagination. from/size and search_after (deep pagination) ship; a scroll cursor is a follow-on.
  • Generating the document struct. By design — the developer owns the struct.

Where this lands in the workspace

Crate Role
flusso-query Runtime: the Client transport, the field-handle/Query/Search builder, SearchResponse. Generic over the developer's document types. Targets OpenSearch / Elasticsearch (shared DSL). Re-exports the derive behind a derive feature, so callers use flusso_query::FlussoDocument.
flusso-query-derive The #[derive(FlussoDocument)] proc-macro crate. At compile time it discovers flusso.toml, resolves the named index's IndexMapping from the self-describing schema (no database), validates the annotated struct, and emits the field handles, entry points, and schema hash. Reuses schema-config-toml, schema-index-yaml, and schema-core.

Both crates sit above schema-core and depend only downward — no dependency on the engine, the sources, or the sinks. They share only the domain model, the one thing the read and write sides must agree on.