# flusso-query — the typed query client
flusso keeps an OpenSearch index in sync with Postgres from a declarative schema
(the write side; see [`README.md`](https://github.com/alias2k/flusso/blob/main/README.md)). That schema is a contract: it
fixes the shape of every document — which fields exist, their types, which are
nested arrays, which are scalars. flusso enforces that contract on the **write**
side; `flusso-query` enforces it on the **read** side.
`flusso-query` does **not** generate the document types for you. You write the
document struct by hand and keep full control — its derives, field types, which
fields it projects, how doc keys map to Rust names. `#[derive(FlussoDocument)]`
then, at compile time and with no database, does two things against the resolved
schema:
1. **Validates** that every struct field lines up with the schema — the field
exists, its Rust type matches, its nullability matches. A struct that has
drifted *stops compiling*, pointing at the offending field.
2. **Generates the typed query surface** from the schema — field handles,
`get`/`query`, the schema hash — so a service queries the index like a typed
function: field names checked at compile time, operators that only exist for
the field types that support them, results that deserialize into your struct.
The query surface comes from the **full schema**, not from the struct — so you can
filter or sort on a field even if your struct doesn't deserialize it. When the
schema changes and the index is rebuilt, anything that no longer fits stops
compiling at `cargo build`.
## Quick reference
| A full worked example (structs + query) | [A query, from the caller's seat](#a-query-from-the-callers-seat) |
| Which operators each field type exposes | [What the field type buys you](#what-the-field-type-buys-you) |
| `and`/`or`/`not`, clause vs combinator style, `count`/`ids` | [Composing queries](#composing-queries) |
| Per-query options, compound/scoring, standalone, sort, search-level | [Query options and extra query types](#query-options-and-extra-query-types) |
| Nested arrays — filter *by* vs filter *of* (`any`/`all`, `filter_nested`) | [Filtering nested collections](#filtering-nested-collections) |
| Several searches, one round-trip | [`_msearch`](#several-searches-one-round-trip-_msearch) |
| One blended result list across indexes | [Combined search](#one-blended-result-list-combined-search) |
| How the macro finds the schema; `path` for nested structs | [Binding to the schema](#binding-to-the-schema) |
| flusso `type` → Rust type → handle | [flusso types → Rust types](#flusso-types--rust-types) |
| Anything the typed builder can't express | [The escape hatch](#the-escape-hatch) |
| Reading a prefixed deployment | [Resolving the index name](#resolving-the-index-name) |
---
## A query, from the caller's seat
A service that searches the `users` index from the [schema guide](https://alias2k.github.io/flusso/guides/schema-authoring.html). You write
the structs; `#[derive(FlussoDocument)]` validates them and generates the query
surface. The only input is the **index name** — it finds `flusso.toml` itself (see
[Binding to the schema](#binding-to-the-schema)).
### The document structs
```rust
use flusso_query::{Client, FlussoDocument};
/// A `users` document — *you* write this. It's a **projection**: it deserializes
/// the fields below and omits the rest of the index (addresses, profile,
/// avgOrderValue, …), which the derive allows. The derive checks every field
/// against the `users` schema and hangs the typed query surface off `User`.
#[derive(Debug, Clone, serde::Deserialize, FlussoDocument)]
#[flusso(index = "users")] // ← the only input: which index
pub struct User {
pub id: i32, // primary key (integer) — never null
pub email: String, // keyword, required → never null
#[serde(rename = "fullName")]
pub full_name: Option<String>, // text, not required → nullable
pub account: Account, // an object — always assembled
pub orders: Vec<Order>, // has_many join → nested, never null
#[serde(rename = "orderCount")]
pub order_count: i64, // count aggregate → long, never null
#[serde(rename = "lifetimeValue")]
pub lifetime_value: Option<f64>, // sum aggregate → nullable
}
/// The `account` object — a same-row sub-object. A nested/object struct validates
/// against its `path` in the same index; it has no entry points of its own.
#[derive(Debug, Clone, serde::Deserialize, FlussoDocument)]
#[flusso(index = "users", path = "account")]
pub struct Account {
pub tier: String, // enum → keyword, required
pub country: Option<String>, // keyword, not required
#[serde(rename = "createdAt")]
pub created_at: time::OffsetDateTime, // timestamp, required
}
#[derive(Debug, Clone, serde::Deserialize, FlussoDocument)]
#[flusso(index = "users", path = "orders")]
pub struct Order {
pub status: String, // enum, required
pub total: Decimal, // decimal column (`decimal` feature); or `f64`
#[serde(rename = "placedAt")]
pub placed_at: time::OffsetDateTime, // timestamp, required
pub items: Vec<Item>, // a deeper has_many → nested
}
#[derive(Debug, Clone, serde::Deserialize, FlussoDocument)]
#[flusso(index = "users", path = "orders.items")]
pub struct Item {
#[serde(rename = "productId")]
pub product_id: i32,
pub quantity: i32,
#[serde(rename = "unitPrice")]
pub unit_price: Decimal,
}
```
### The query
```rust
#[tokio::main]
async fn main() -> anyhow::Result<()> {
// Transport. Points at OpenSearch, not at flusso — the engine is write-only;
// reads go straight to the index it maintains.
let client = Client::connect("https://localhost:9200")?
.basic_auth("admin", std::env::var("OS_PASSWORD")?);
// Fetch one document by its id. `id` is typed as the root table's primary
// key (i32 here), and the result is Option<User> — None when absent.
let user: Option<User> = User::get(&client, 42).await?;
// A typed search. Every `User::field()` is a handle that only exposes the
// operators its mapping supports (see "What the field type buys you"). The
// handles cover the *whole schema* — so `User::addresses()` filters fine even
// though this projection never deserializes addresses.
let page = User::query()
.filter(User::email().eq("ada@example.com")) // keyword → exact
.filter(User::order_count().gte(5)) // long → range
.filter(User::account().tier().eq("gold")) // into the object's handles
.query(User::full_name().matches("ada lovelace")) // text → analyzed match
.filter(User::orders().any(Order::status().eq("delivered"))) // nested, via your Order struct
.filter(User::addresses().any(AddressFields::city().eq("Boston"))) // not projected — generated namespace
.sort(User::order_count().desc())
.from(0)
.size(20)
.send(&client)
.await?;
println!("{} total matches", page.total);
for hit in page.hits {
// hit.source is a fully-typed User. hit.id and hit.score come from the
// search envelope, not the document body.
let u: &User = &hit.source;
println!("{:.3} {} ({} orders)", hit.score, u.email, u.order_count);
for order in &u.orders { // Vec<Order>
println!(" order — {} — {}", order.total, order.status);
}
}
Ok(())
}
```
### What you *can't* write
The compiler refuses the mistakes:
- `User::email().matches(..)` — `matches` is a text operator and `email` is a
`keyword`.
- `User::full_name().eq(..)` — `full_name` is analyzed `text`, so it has no exact
`eq`.
- `User::nmae()` — there is no such handle.
- `hit.source.totl` — no such field.
And the struct itself can't drift. Declaring `email: i32`, or `email:
Option<String>` when the field is `required`, or a field `totl` the schema doesn't
have, is a **compile error from the derive**. The schema has been lifted into the
type system, so the compiler checks both your queries and your struct against it.
> `OS_PASSWORD` is just a variable name picked for the example. The full env-var
> story (secrets, the reserved deployment overrides, `FLUSSO_CONFIG`) lives in the
> [configuration guide](https://alias2k.github.io/flusso/guides/configuration.html).
---
## What the field type buys you
The handle a schema field generates determines the operators in scope: an operator
that doesn't make sense for a field's type *doesn't exist* on its handle, so the
mistake is a compile error rather than a 400 from OpenSearch.
| `Keyword` | `eq`, `any_of`, `prefix`, `wildcard`, `regexp`, `fuzzy`, `exists` |
| `Text` | `matches`, `match_phrase`, `match_phrase_prefix`, `matches_fuzzy`, `any_of` (exact, via `.keyword`), `exists` — *no* exact `eq` (it's analyzed) |
| `Bool` | `eq`, `exists`, `asc`/`desc` |
| `Number` | `eq`, `any_of`, `lt`, `lte`, `gt`, `gte`, `between`, `exists` |
| `Date` | `eq`, `any_of`, `lt`, `lte`, `gt`, `gte`, `between`, `exists` |
| `Object<S>` | `exists` (a same-document sub-object — an `object` field or a to-one join (`belongs_to`/`has_one`); `S` is the enclosing scope). Its sub-fields are *flattened*, so query them via the child struct's dotted-path handles (`Account::tier()`), not through this handle |
| `Nested<S, T>` | `any(q)` / `all(q)` to match parents and **lift** the child query into scope `S` — `q` is a child query built from `T`'s handles ([merging](#building-a-child-filter-and-merging-it-into-the-parent)); `matching(q)` (with `.sort`/`.size`) to shape what's returned — see [Filtering nested collections](#filtering-nested-collections); plus `exists` |
| `Geo` | `within(Distance::km(12.0), center)`, `within_box`, `within_polygon`, `exists`; sort with `distance_from(center)` (nearest-first sugar) or `distance_sort(center, order, DistanceUnit)` |
| `TextMap` | `key(k)` → a `Text` leaf for one key; `search(q)` (cross-key, `.prefer(key, weight)` / `.only_preferred()`); `has_key(k)`, `exists` |
| `KeywordMap` | `key(k)` → a `Keyword` leaf for one key; `has_key(k)`, `exists` — *no* `search` (exact-match, use `key(..).eq(..)`) |
| `NumberMap` | `key(k)` → a `Number` leaf for one key (`eq`/`lt`/`gte`/`between`/…); `has_key(k)`, `exists` |
| `DateMap` | `key(k)` → a `Date` leaf for one key (`eq`/`gte`/`between`/…); `has_key(k)`, `exists` |
| `Binary` | `exists` — base64-encoded, not searchable |
| `Json` | `exists`, `raw(serde_json::Value)` — the untyped fallback |
Each operator's argument is typed by **kind**, not by one fixed Rust type — a
handle accepts any value implementing `FlussoValue<kind::…>` (which requires
`serde::Serialize`):
- **Numerics are split per type** (`kind::Byte`/`Short`/`Integer`/`Long`/`Float`/
`Double`/`Decimal`), and a value is accepted only if it widens into that kind
**without loss**. So a `Long` field takes any integer, a `Double` takes `f32`/
`f64` or a small int, a `Decimal` takes `Decimal` (`decimal` feature) or an
integer — but a float on an integer field, or an `i64` on a `Short`, is a
*compile error*. Bare literals work where lossless: `eq(5)` on `long`/`integer`/
`double`/`decimal`, `eq(5.0)` on `float`/`double`. A `byte`/`short` field needs
a typed literal (`eq(5i16)`), since `5` defaults to `i32` (which would narrow).
- `Keyword` takes a `String`/`&str` or a `#[derive(FlussoValue)]` keyword
enum/newtype (matched against its serde string form).
- `Bool` takes a `bool` or a bool `#[derive(FlussoValue)]` newtype.
- `Date` takes a `String`/`&str` ISO-8601 literal or — with the `chrono` feature
— a `NaiveDate` / `NaiveDateTime` / `DateTime<Utc>`.
A custom money/quantity type queries with no cast — `Order::total().eq(Money(d))`
— as long as it's a `FlussoValue` of the field's kind (see below). A value of the
wrong kind is a compile error (`Order::created_at().gte(5)`, `Order::age().eq(1.5)`).
**Subfield accessors.** flusso's sink auto-enriches `text`/`keyword` fields
(`auto_subfields`, on by default) with exact / sortable / searchable subfields,
and the handles expose them — no `Keyword::at("code.keyword")` string path:
```rust
User::full_name() // Text → analyzed full-text match
User::full_name().keyword() // Keyword → exact / wildcard / prefix
User::full_name().keyword_lowercase() // Keyword → case-insensitive match / sort
User::email().text() // Text → full-text over a keyword field
```
(A `wildcard` belongs on `.keyword()`, not the analyzed handle, which matches
tokens not the whole value.) These accessors exist only when the subfield is
actually provisioned — and the derive **enforces it at compile time**: a
`text`/`keyword` handle is stamped with subfields only when every OpenSearch
sink has `auto_subfields` on and the field declares no custom `fields`.
Otherwise the handle is `…<NoSubfields>` and `.keyword()` / `.text()` /
`.keyword_lowercase()` (and the `Text::any_of` / `Text::asc` sugar built on
them) simply don't exist — calling one is a compile error, not a runtime 400.
Sorting is similar — `sort(…)` accepts handles whose type is sortable (numbers,
dates, keywords, booleans, and `text`). `Text::asc`/`desc` is sugar that sorts
via the case-insensitive `.keyword_lowercase` subfield automatically; reach for
`User::full_name().keyword().desc()` when you want an exact-case sort instead.
A `geo_point` sorts by proximity with `User::location().distance_from(center)`.
A few clauses span more than one field, so they're free functions:
`multi_match("ada", [User::full_name(), User::bio()])` runs one analyzed query
across several `Text` fields (weight one with `User::full_name().boosted(3.0)`).
The typed surface is broad — see [Query options and extra query
types](#query-options-and-extra-query-types) for the per-query options
(`boost`/`case_insensitive`/`fuzziness`/…), the compound/scoring queries
(`constant_score`/`dis_max`/`function_score`/`boosting`), the standalone ones
(`ids`/`query_string`/`simple_query_string`/`script_score`/…), and the
search-level controls (`min_score`/`collapse`/`search_after`/`highlight`/…).
What's left for the [`raw`](#the-escape-hatch) hatch is `knn`, `geo_shape`, span
and parent/child queries — types with no corresponding flusso field.
### Composing queries
A handle's operator produces a `Query<S>` — carrying the **scope** it was built
in. The root document and any flattened `object`/to-one-join sub-field share the
scope `Root`; only a `nested` array introduces a fresh scope, tagged with its
element type (`Query<Order>`).
Queries compose with `and` / `or` / `not` *within a scope*, and the `Search`
builder exposes the bool-query clauses directly:
```rust
// Combinator style.
let q = User::email().eq("ada@example.com")
.and(User::order_count().gte(5))
.and(User::orders().any(Order::status().eq("delivered").and(Order::total().gt(0.0))));
User::query().query(q).send(&client).await?;
// Clause style — `filter`/`must_not` don't score, `query`(=must)/`should` do.
User::query()
.query(User::full_name().matches("ada")) // scored
.filter(User::order_count().gte(5)) // filtered, cached, no score
.must_not(User::email().prefix("test-"))
.should(User::orders().any(Order::status().eq("delivered")))
.send(&client)
.await?;
```
`query`/`filter`/`must_not`/`should` accept anything that is a `Query<Root>`, so
the two styles mix freely. Which clause to reach for:
| `query` | yes | `must` | full-text relevance you want ranked |
| `filter` | no (cached) | `filter` | exact constraints — terms, ranges, nested `any` |
| `must_not` | no | `must_not` | exclusions |
| `should` | yes | `should` | optional boosts; a real constraint with `min_should_match` |
A built search can finish as a **count** instead of a page: `.count()` sends the
same query to `_count` and returns the number of matching documents (`u64`) —
cheaper than `send()` when the hits aren't needed. Sort, `from`/`size`, and
`filter_nested` projections are ignored; they never change which documents match.
```rust
let open_orders: u64 = User::query()
.filter(User::orders().any(Order::status().eq("open")))
.count(&client)
.await?;
```
Or as an **id page**: `.ids()` runs the same search with `_source: false` and
returns `Vec<String>` — the matching document ids (root primary keys,
stringified), in order, no sources fetched. Sort and `from`/`size` apply as in
`send()`; `filter_nested` projections are dropped. The cheap way to feed another
lookup — e.g. search in OpenSearch, then load the full rows from Postgres:
```rust
let user_ids: Vec<String> = User::query()
.filter(User::orders().any(Order::status().eq("open")))
.sort(User::order_count().desc())
.size(100)
.ids(&client)
.await?;
```
### Query options and extra query types
Each leaf operator returns a small **builder** that carries that query's options
plus the universal `boost(f32)` and `name(&str)` (`_name`, surfaced in a hit's
`matched_queries`). With no option set it renders the DSL shorthand; set one and
it expands. A builder drops straight into a clause (it's an `AsQuery`), so no
`.build()` is needed:
```rust
User::query()
.should(User::full_name().matches("acme").boost(2.0)) // weighted text
.should(User::full_name().keyword().wildcard("*acme*").case_insensitive())
.should(User::full_name().matches("acme").fuzziness(Fuzziness::Auto)) // typo-tolerant
.min_should_match(1) // make should a real filter
.filter(User::owner_id().eq(owner_uuid)) // uuid keyword (feature)
.filter(User::tier().eq(Tier::Pro)) // enum keyword
.sort(User::created_at().desc().missing_first()) // null-aware sort
.send(&client).await?;
```
The options per query type (all optional; plus the universal `boost`/`name` on every builder):
| `term` / `prefix` / `wildcard` / `regexp` | `case_insensitive` |
| `prefix` / `wildcard` | `rewrite` |
| `regexp` | `flags`, `max_determinized_states` |
| `fuzzy` | `fuzziness`, `prefix_length`, `max_expansions`, `transpositions` |
| `matches` | `fuzziness`, `operator`, `minimum_should_match`, `prefix_length`, `analyzer`, `zero_terms_query`, `lenient` |
| phrase matches | `slop`, `analyzer` |
| `multi_match` | `type`, `operator`, `fuzziness`, `tie_breaker`, `minimum_should_match` |
| range | `format`, `time_zone`, `relation` |
| `within` (geo) | `distance_type`, `validation_method` |
| nested `any` | `score_mode`, `ignore_unmapped` |
The enumerable options are **closed enums**, not strings — a typo is a compile
error, not a 400. `operator`/`default_operator` take `Operator { And, Or }`;
`fuzziness` takes `Fuzziness { Auto, AutoBounds(u32, u32), Edits(u32) }`;
`multi_match`'s `type` takes `MultiMatchType`; `zero_terms_query` takes
`ZeroTermsQuery { None, All }`; a range `relation` takes `RangeRelation
{ Intersects, Contains, Within }`; `function_score`'s `score_mode`/`boost_mode`
take `ScoreMode`/`BoostMode`; a nested `score_mode` takes `NestedScoreMode`
(which, unlike `ScoreMode`, has `None` for a filter-only nested clause). Geo's
`distance_type`/`validation_method` take `DistanceType`/`ValidationMethod`, and
a sort's `numeric_type` / `Sort::script` type take `NumericType` /
`ScriptSortType`. `minimum_should_match` takes a `MinimumShouldMatch`
(`2`/`.into()` for a count, `MinimumShouldMatch::percent(75)`, or `::raw("3<90%")`
for the combining mini-language). Genuinely open-ended params (`analyzer`,
`format`, `time_zone`, `unmapped_type`, `flags`) stay `String`.
> **`.or()` / `.and()` on a builder** need `use flusso_query::AsQuery;` in scope
> (the combinators are provided methods on that trait). Composing via the
> `Search` clauses (`.should()`/`.filter()`/…) needs no import.
**Bool / compound & scoring.** `Search::min_should_match(n)` (or
`Query::min_should_match` on an `or`-group) turns a top-level free-text `should`
group into a real constraint instead of scoring-only. The scoring wrappers are
free functions: `constant_score(filter)`, `dis_max([..]).tie_breaker(..)`,
`boosting(positive, negative, negative_boost)`, and
`function_score(query).weight(..)/.weight_when(.., filter)/.boost_mode(..)`.
**Standalone query types** (free functions, each an `AsQuery`): `ids([..])`
(get-many-by-`_id`), `query_string(..)` / `simple_query_string(..)` /
`combined_fields(.., [fields])` (user-facing full-text), and the relevance ones
`script(..)`, `script_score(query, source)`, `distance_feature(..)`,
`rank_feature(..)`, `more_like_this([fields], [like])`. `match_bool_prefix` is a
`Text` operator (search-as-you-type).
**Sort.** `.asc()`/`.desc()` on a sortable handle (number, date, keyword, bool,
or `text` — the last routing through `.keyword_lowercase`) return a `Sort`
builder: chain `.missing_first()`/`.missing_last()`/`.missing(v)`,
`.mode(SortMode::..)`, `.unmapped_type(..)`/`.numeric_type(..)`/`.format(..)`, or
`.nested(path)`/`.nested_filtered(path, q)`. Also `Sort::score()` (by `_score`),
`Sort::script(ScriptSortType, source, order)`, and `Geo::distance_from(center)`
/ `Geo::distance_sort(center, order, DistanceUnit)` for proximity sorting. A
geo radius is a typed `Distance` (`Distance::km(12.0)` / `::miles(5.0)` /
`::meters(800.0)` / …), so a malformed `"12 km"` can't reach the query.
**Search-level** controls on the `Search` builder: `min_score`,
`track_total_hits`, `track_scores`, `search_after([..])` (deep pagination),
`collapse(field)`, `post_filter(q)`, and `highlight(Highlight::new().field(..))`.
### Queries are values; the client appears once
`Type::query()` takes no client: a `Search<T>` is a plain, `Clone`able value with
no lifetime. Build it anywhere — a helper, a struct field, a cached "prepared
search" — and hand a `&Client` to the terminal (`send` / `ids` / `count`, all
`&self`) when it's time to run. One built query can run many times, against
different clients, or with per-call tweaks via `clone()`:
```rust
fn busy_users() -> Search<User> { // no client in sight
User::query().filter(User::order_count().gte(5))
}
let page = busy_users().send(&client).await?; // run it
let next = busy_users().from(20).send(&client).await?; // tweak a copy
```
### Several searches, one round-trip (`_msearch`)
Independent typed searches — different indexes, different document types — can
share one HTTP round-trip. `client.msearch(…)` takes a tuple of `&Search<T>`
(arity 1–8) and returns one typed `SearchResponse` per slot, in order:
```rust
let users_q = User::query().query(User::full_name().matches(&q)).size(10);
let orders_q = Order::query().filter(Order::status().eq("open")).size(5);
let (users, orders) = client.msearch((&users_q, &orders_q)).await?;
```
The "search page with separate sections" primitive: each slot keeps its own query,
sort, pagination, and `filter_nested` projections, and decodes into its own type.
A slot-level failure fails the whole call with an error naming the slot (no partial
results). For many searches of *one* type there's
`client.msearch_all(&searches)` → `Vec<SearchResponse<T>>`.
### One blended result list (combined search)
The other multi-index shape: **one** query over several indexes, hits ranked
together in a single list — the "one search box over everything" primitive. You
declare which document types blend by writing an enum with one variant per type:
```rust
/// One item in the storefront's global search.
#[derive(Debug, FlussoMultiDocument)] // the `derive` feature, like FlussoDocument
enum StoreItem {
User(User),
Order(Order),
}
let page = StoreItem::query() // MultiSearch<StoreItem> — client-free too
.query(multi_match("ada", [User::full_name(), Order::customer_name()]))
.size(20)
.send(&client)
.await?;
for hit in page.hits {
match hit.source { // dispatched by the hit's `_index`
StoreItem::User(u) => render_user(u, hit.score),
StoreItem::Order(o) => render_order(o, hit.score),
}
}
```
Root-scope queries compose across document types (`Query<Root>` carries no
document type), and each hit is decoded into the variant that owns its index.
The query goes through the stable `{INDEX}_{HASH}` alias, but OpenSearch reports
the concrete index behind it (`{INDEX}_{HASH}_{generation}`) in each hit — so the
crate normalizes that generation suffix before dispatch; you just match on
`hit.source`. A hit from an index no variant claims is an error, not a skip.
`count(&client)` works on the union too.
Two semantics to know:
- A *query* on a field that exists in only one of the indexes is fine — it just
doesn't match in the others.
- A *sort* on such a field is **rejected by OpenSearch** unless it carries an
`unmapped_type`. Sort the blended list by relevance, or on fields all the
union's indexes share.
The derive validates the enum's shape (single-field tuple variants, no duplicate
payload types) and generates the trait's two members. Without the `derive`
feature, the impl is two short members written by hand — a `TARGETS` const listing
each variant's `(INDEX, SCHEMA_HASH)` and a `decode` that matches on
`Type::physical_index()`.
### Building a child filter and merging it into the parent
Because the scope is part of the type, a query is a value you can build, name,
store, and reuse. A **nested** child struct (one whose `path` ends in a `nested`
array) carries its own field handles tagged with the child scope — so they produce
`Query<Order>`, not `Query<Root>`:
```rust
// Built from Order's own handles. Reusable — a plain function returning a query:
fn big_delivered() -> Query<Order> {
Order::status().eq("delivered")
.and(Order::total().gt(100.0))
}
```
To merge a child filter into a parent, **lift** it through the nesting that holds
it: `User::orders().any(child)` (or `.all(child)`) takes a `Query<Order>` and
returns a `Query<Root>` — a nested clause at the `orders` path — which composes
with parent-scope queries like any other:
```rust
let q = User::email().eq("ada@example.com")
.and(User::orders().any(big_delivered())); // Query<Order> → lifted → Query<Root>
User::query().filter(q).send(&client).await?;
```
The scope tag keeps this honest: `User::email().and(Order::status().eq(…))` **does
not compile** — you can't `and` a `Query<Root>` with a `Query<Order>`; the child
query has to be lifted through `User::orders()` first. A child constraint can never
be silently applied at the wrong level.
Lifting composes through depth: `Order::items().any(Item::quantity().gt(1))` is a
`Query<Order>`, which `User::orders().any(…)` then lifts the rest of the way to
`Query<Root>`.
### Optional filters
Callers build queries from optional inputs — request params, form fields — and the
`if let Some(x) = … { q = q.filter(…) }` dance breaks the fluent chain. The
primitive that fixes it: **`Option<Q>` is itself a `Query`**, where `None`
contributes nothing in any clause — `must_not(None)` excludes nothing, `and(None)`
is the identity. So every clause and combinator accepts an optional; you `.map` the
value into the handle:
```rust
User::query()
.filter(params.email.map(|e| User::email().eq(e))) // skipped when None
.filter(params.min_orders.map(|n| User::order_count().gte(n)))
.send(&client).await?;
// Composes inside and/or too — a None branch just drops out:
let q = User::email().eq("ada@example.com")
.and(params.tier.map(|t| User::account().tier().eq(t))); // None → just the email clause
```
A named `filter_some(value, |v| …)` sugar that drops the `.map` is an obvious
follow-on, left out of the first cut.
---
## Filtering nested collections
`orders` is a nested array, and there are two **independent** things you might
filter — flusso keeps them separate:
- **Filter *by* nested** — choose which *users* come back, based on their orders.
The `any`/`all` you've already seen: it's a `Query`, so it goes in
`filter`/`query`/etc. A matching user still carries its **whole** `orders` array.
- **Filter *of* nested** — shape the `orders` array each user comes back with,
without changing which users return. A separate clause, `filter_nested`.
They compose: use either alone, or both together (often with the same predicate).
### `filter_nested` — shaping the returned array
```rust
let page = User::query()
// filter BY: only users with a delivered order
.filter(User::orders().any(Order::status().eq("delivered")))
// filter OF: and within each, keep only the delivered orders, newest first, ≤5
.filter_nested(
User::orders()
.matching(Order::status().eq("delivered"))
.sort(Order::placed_at().desc())
.size(5),
)
.send(&client).await?;
for hit in &page.hits {
// `source.orders` IS the filtered subset — no extra accessor:
for order in &hit.source.orders { // delivered, newest first, ≤ 5
println!("{} — {}", order.total, order.status);
}
}
```
`User::orders().matching(q)` is a nested **projection**: `q` is a `Query<Order>`
built from `Order`'s handles, plus optional `.sort(Order::field().desc())`,
`.size(n)`, `.from(n)`. `matching` itself is optional — drop it to keep every child
but still sort or cap the array.
Because `filter_nested` does **not** touch which parents match, a user with no
delivered orders still comes back — with `orders: []`. Pair it with a
`filter(User::orders().any(…))` when you also want to drop those users.
`filter_nested` always **replaces** `hit.source.<path>` with the matched subset:
the client fetches the nested matches and substitutes them for that field before
deserializing.
> **Not yet built:** a `.keep_source()` opt-out that leaves the stored array intact
> and a typed `hit.nested(handle)` side accessor. Today `filter_nested` always
> replaces the array in `source`.
### Depth
`filter_nested` shapes one nested level — `orders`. You can still *match* on deeper
nesting from inside the predicate
(`Order::items().any(Item::quantity().gt(1))`), and the returned orders honor it;
returning a filtered `items` array *inside* each returned order is a deeper
inner-hits case left to the `raw` hatch for now.
---
## Results
```rust
pub struct SearchResponse<T> {
pub total: u64, // total matches (not the page size)
pub max_score: Option<f32>,
pub hits: Vec<Hit<T>>,
pub took: std::time::Duration,
}
pub struct Hit<T> {
pub id: String, // the document id (root primary key, stringified)
pub score: f32,
pub source: T, // the fully-typed document
}
```
`get` returns `Option<T>`; `search` returns `SearchResponse<T>`, where `T` is the
struct you wrote. There is no `serde_json::Value` in the common path.
---
## Binding to the schema
The macro validates against the **resolved mapping** — flusso's
[`IndexMapping`](https://github.com/alias2k/flusso/blob/main/libs/0-core/src/config/index_mapping.rs): every field with a
concrete type, whether it is **nullable**, its nested `children`, and the schema
`hash`.
flusso's schemas are **self-describing**: every leaf declares its `type` and
whether it's `required`, and joins/objects/aggregates have structural types — so the
mapping resolves with **no database**, exactly as `flusso build` does when it writes
`flusso.lock`. The client reuses that resolution. (See the
[schema guide](https://alias2k.github.io/flusso/guides/schema-authoring.html) for the
schema format and the [configuration guide](https://alias2k.github.io/flusso/guides/configuration.html)
for how the index is written.)
### The one input: the index name
You never point the macro at a file. You name the **index**, and the macro finds
the schema:
```rust
#[derive(serde::Deserialize, FlussoDocument)]
#[flusso(index = "users")]
pub struct User { /* … */ }
```
At compile time the macro:
1. **Locates `flusso.toml`** by walking up from the consuming crate's
`CARGO_MANIFEST_DIR`. (Override with `#[flusso(index = "users", config = "…")]`
or the `FLUSSO_CONFIG` env var — see the
[configuration guide](https://alias2k.github.io/flusso/guides/configuration.html).)
2. **Selects the `[[index]]`** whose `name` matches `"users"` — the reason an index
name is required, since one `flusso.toml` defines several.
3. **Loads that index's `schema:` file** (resolved relative to `flusso.toml`) and
**resolves the `IndexMapping`** in-process — the same resolution `flusso build`
performs. Hermetic: no database, no network.
4. **Tracks `flusso.toml` and every schema file it read** as build inputs (via
`include_bytes!`), so editing the config or a schema retriggers compilation and
a drifted struct fails the next build.
The resolved mapping's content hash is the binding's `SCHEMA_HASH` — the **same
hash** `flusso build` writes into `flusso.lock` and the engine folds into the
physical index name, so binding and index are provably the same schema version.
There is no `build.rs`, no generated `.rs` file to `include!`, and no committed
mapping artifact to keep in sync — the struct is the only file you maintain.
### A nested or object struct names its path
A same-row `object`, a to-one join, or a `nested` join is its own struct, validated
against a dotted **`path`** into the same index — `account`, `orders`,
`orders.items`. It declares the same `index` so the macro resolves the same config,
then walks to that path's `children`:
```rust
#[flusso(index = "users", path = "orders.items")]
pub struct Item { /* … */ }
```
These contribute field validation **and** their own field handles —
`Order::status()`, `Order::total()`, … — producing `Query<Order>` values you can
compose, store, and lift into a parent query (see [Building a child
filter](#building-a-child-filter-and-merging-it-into-the-parent)). Only the
**root** struct gets entry points (`get`/`query`) and `SCHEMA_HASH`.
### What the derive expands to
`#[derive(FlussoDocument)]` on the root `User` emits (roughly):
```rust
impl User {
// Entry points.
pub fn get(client: &Client, id: i32) -> impl Future<Output = Result<Option<User>>>;
pub fn query() -> Search<User>; // client-free: a plain, reusable value
// Field handles — one per *schema* field, carrying its type. These are what
// the query builder consumes. They exist for every field in the mapping,
// whether or not `User` projects it.
pub fn id() -> Number<kind::Integer> { /* … */ }
pub fn email() -> Keyword { /* … */ }
pub fn full_name() -> Text { /* … */ }
pub fn account() -> Object { /* … */ } // object/to-one join → `Object<Root>` (scope-only; `.exists()`)
pub fn addresses() -> Nested<Root, AddressFields> { /* … */ } // not projected — generated namespace
pub fn orders() -> Nested<Root, Order> { /* … */ } // projected — `Nested<enclosing scope, your struct>`
pub fn order_count() -> Number<kind::Long> { /* … */ }
pub fn lifetime_value() -> Number<kind::Double> { /* … */ }
pub fn avg_order_value() -> Number<kind::Double> { /* … */ } // not projected by `User`
pub fn last_order_at() -> Date { /* … */ } // not projected by `User`
// …one per schema field.
/// The physical index this binds to — `get`/`query` use it.
pub const INDEX: &str = "users_3f2a1b9c…";
/// The schema hash this binding was generated from (the `INDEX` suffix).
pub const SCHEMA_HASH: &str = "3f2a1b9c…";
}
// Each nested path has ONE handle namespace whose functions build a `Query` in
// that path's scope. When you wrote a struct for the path, that struct IS the
// namespace — its derive adds the handles, covering the full sub-schema (not just
// the fields it deserializes), producing `Query<Order>`. A `nested` array
// introduces its own scope: `Order`'s handles are tagged `<Order>` (the root and
// flattened objects stay `<Root>`); they must be lifted before joining a root query.
impl Order {
pub fn status() -> Keyword<Order> { /* … */ }
pub fn total() -> Number<kind::Decimal, Order> { /* … */ }
pub fn placed_at() -> Date<Order> { /* … */ }
pub fn items() -> Nested<Order, Item> { /* … */ } // deeper nested: enclosing scope `Order`, child `Item`
// …one per field at the `orders` path.
}
// For a nested path you DIDN'T give a struct, the root derive generates a
// handles-only namespace named `<Path>Fields`, so it's still queryable:
pub struct AddressFields;
impl AddressFields {
pub fn city() -> Keyword<AddressFields> { /* … */ } // nested scope, like `Order`
pub fn postal_code() -> Keyword<AddressFields> { /* … */ }
// …one per field at the `addresses` path.
}
```
### What the derive checks
For each field the struct declares, the macro resolves the matching schema field by
its **document key** — honoring `#[serde(rename = "…")]` and a container
`#[serde(rename_all = "…")]`, so the struct's serde config and flusso's validation
agree — then checks three things:
| **field exists** | the doc key is in the schema | `` no field `totl` in index `users` `` (span on the field) |
| **type matches** | leaf Rust type matches the field's `type` | `` email is `keyword` → expected `String`, found `i32` `` |
| **nullability matches** | `Option<_>` iff the field is nullable | `` email is required → expected `String`, found `Option<String>` `` |
The rules that make this **full control rather than a straitjacket**:
- **Partial projections are allowed.** Leaving schema fields off your struct is
fine — you only deserialize the subset you declare. Only the three checks above
fail.
- **Type matching is by leaf identifier + `Option` shape.** The macro can't resolve
arbitrary type aliases, so it compares the final path segment (`String`, `i32`,
`f64`, `OffsetDateTime`, …) and the `Option<_>` wrapper against the [type
table](#flusso-types--rust-types). For an `object` field it expects a struct,
for a `nested` field a `Vec<_>`, and defers the inner field checks to *that*
struct's own `FlussoDocument` derive.
- **Escape hatches.** A field typed `serde_json::Value` opts out of type checking.
`#[flusso(skip)]` drops a field from validation entirely — for a computed or
app-only field not backed by the index (pair it with `#[serde(skip)]` or
`#[serde(default)]`).
### flusso types → Rust types
The type the derive **expects** for each schema `type` (the same bridge the
[schema guide](https://alias2k.github.io/flusso/guides/schema-authoring.html) defines, with the Rust side added). Declare something
else and it won't compile (modulo the leaf-identifier rule above).
| `text` | `text` | `String` | `Text` |
| `identifier` | `text` | `String` | `Text` |
| `keyword` | `keyword` | `String` (or a `FlussoValue` newtype) | `Keyword` |
| `enum` | `keyword` | `String` or a `#[derive(FlussoValue)]` enum | `Keyword` |
| `uuid` | `keyword` | `String`, or `uuid::Uuid` (`uuid` feature) | `Keyword` |
| `boolean` | `boolean` | `bool` | `Bool` |
| `short` | `short` | `i16` | `Number<kind::Short>` |
| `integer` | `integer` | `i32` | `Number<kind::Integer>` |
| `long` | `long` | `i64` | `Number<kind::Long>` |
| `float` | `float` | `f32` | `Number<kind::Float>` |
| `double` | `double` | `f64` | `Number<kind::Double>` |
| `decimal` | `double` | `Decimal` (`decimal` feature) or `f64` | `Number<kind::Decimal>` |
| `date` | `date` | `time::Date` (feature) | `Date` |
| `timestamp` | `date` | `time::OffsetDateTime` (feature) | `Date` |
| `binary` | `binary` | `String` (base64) | `Binary` |
| `json` | `object` | `serde_json::Value` | `Json` |
| `geo_point` | `geo_point`| `GeoPoint` (`{ lat, lon }`) | `Geo` |
| `custom { opensearch }` | (given) | matching scalar, else `serde_json::Value` | by OS type |
| `object` | `object` | a struct | `Object` |
| join `belongs_to` / `has_one` | `object` | `Option<` a struct `>` | `Object` |
| join `has_many` / `many_to_many` | `nested` | `Vec<` a struct `>` | `Nested<S, T>` |
**Decimals query without a cast.** `type: decimal` maps to OpenSearch `double`
(lossy in storage) but gets a `Number<kind::Decimal>` handle, distinct from a
true `double`'s `Number<kind::Double>` — so it accepts a `Decimal` or a
losslessly-widening integer (`eq(5)`), but *not* an `f64` (that's lossy, a
compile error). To pass a `rust_decimal::Decimal`, **enable the `decimal`
feature** on `flusso-query` (it re-exports `Decimal` and implements
`FlussoValue<kind::Decimal>` for it); the document field can be `Decimal` or
`f64` (both deserialize from the stored double). Query precision is `f64`-bound
(serde_json has no arbitrary precision) — fine for a `double`-stored column; if
you need exact querying, declare a `custom` `scaled_float` (also `kind::Decimal`)
and note this limit.
**Dates** are behind a feature so a caller picks `time` or `chrono` (or `String`
for raw ISO-8601); the derive accepts whichever leaf type the chosen feature settles
on.
**Custom value types — `#[derive(FlussoValue)]`.** A newtype **inherits its inner
type's kinds** automatically, so `struct Money(Decimal)` is a `decimal` value and
`struct Sku(String)` a keyword + text value — *no kind tag needed*, and each is
queryable and rejected exactly where the inner type would be. `FlussoValue`
requires `serde::Serialize` (a supertrait), so derive it too. An **enum** has no
inner type, so it needs an explicit string kind — `#[flusso(keyword)]` (default)
or `#[flusso(text)]`; numeric/date tags don't exist (use a newtype, which
inherits). Enum keyword fields stay typed — never `#[flusso(skip)]` them.
```rust
// Newtype: inherits Decimal's kinds — a `decimal`/`scaled_float` value.
#[derive(Clone, Copy, serde::Serialize, serde::Deserialize, FlussoValue)]
struct Money(Decimal);
// Enum: string-valued, so a kind tag is required (keyword default).
#[derive(serde::Serialize, serde::Deserialize, FlussoValue)]
#[serde(rename_all = "camelCase")]
#[flusso(keyword)]
enum Tier { Pro, Enterprise, Free }
#[derive(serde::Deserialize, FlussoDocument)]
#[flusso(index = "customers")]
struct Customer { tier: Tier, balance: Money /* … */ }
Customer::tier().eq(Tier::Pro); // term against "pro"
Customer::balance().gte(Money(Decimal::ONE)); // a decimal value, no cast
```
**`uuid::Uuid` is a keyword value behind the `uuid` feature** — id / foreign-key
fields stay in the struct as `Uuid` (no `#[flusso(skip)]`, no
`Keyword::at("…")`), and `Customer::owner_id().eq(some_uuid)` works without
`.to_string()`.
### Nullability is declared, not guessed
A field is `T` or `Option<T>`, and the derive **checks** it against the resolved
mapping. Nullability comes straight from the schema with no database round-trip: a
leaf states it with `required`, and joins, objects, and aggregates carry it
structurally. `ResolvedField` records the resulting `nullable: bool`; the derive
requires `nullable: false → T`, `nullable: true → Option<T>`.
| root `primary_key` column | `false` | forced non-null — it backs the document id |
| join `primary_key` field | `false` | forced non-null, just like the root key |
| leaf column, `required: true` | `false` | declared non-null |
| leaf column, `required: false` | `true` | nullable by default |
| `object` | `false` | always assembled from the same row |
| join `belongs_to` / `has_one` (`object`) | `true` | there may be no related row |
| join `has_many` / `many_to_many` | `false` | a `Vec`, empty when there are none, never null |
| aggregate `count` | `false` | a non-null `long` — zero rows is `0`, not null |
| aggregate `avg` | `true` | a nullable `double` — null over zero rows |
| aggregate `sum` / `min` / `max` | `true` | null over zero rows; the result mirrors the column |
`required` is rejected by the schema on joins and aggregates precisely because
their nullability is structural — so there's nothing for the author to declare.
---
## The escape hatch
Anything the typed builder can't express stays reachable, and still deserializes
into the typed struct:
```rust
let page: SearchResponse<User> = User::query()
.raw(serde_json::json!({
"knn": { "embedding": { "vector": [/* … */], "k": 10 } }
}))
.send(&client)
.await?;
```
`raw` takes the OpenSearch query DSL verbatim — the pressure-release valve for the
few types with no flusso field (`knn`/vector, `geo_shape`, span and parent/child
queries) and for percolators — without dropping to an untyped client or losing
typed results. Most of what used to need it (`function_score`, `script`,
`constant_score`, `query_string`, `search_after`, …) is now in the typed surface.
---
## Resolving the index name
The **physical** index carries the hash suffix (`users_3f2a1b9c`) — exactly what
the OpenSearch sink writes — and rotates on a structural schema change.
Because the binding is **generated from the schema at compile time**, the derive
knows that hash and emits it as a `const`: `User::INDEX` is the physical name, and
`get`/`query` use it. So `User::query()` addresses the right index directly, with
the hash hidden from the caller — no convenience alias required.
This is self-correcting: a structural schema change rotates the hash *and* changes
the resolved mapping, so the next `cargo build` regenerates the binding against the
new physical index. (`User::INDEX` and `User::SCHEMA_HASH` are exposed for logging,
admin, or a hand-built `Search`.)
> The convenience alias (`users` → current generation) is still worthwhile for
> clients that *don't* recompile against the schema — dynamic/scripting use,
> dashboards. For a derived binding it's unnecessary: the compile-time hash is the
> stable name.
### Reading a prefixed deployment
If flusso runs with an [index prefix](https://alias2k.github.io/flusso/guides/configuration.html) (`FLUSSO_INDEX_PREFIX`,
e.g. `dev_` so it writes `dev_users_<hash>`), tell the client the same prefix:
```rust
let client = Client::connect("https://localhost:9200")?
.index_prefix(std::env::var("FLUSSO_INDEX_PREFIX").unwrap_or_default());
```
The prefix is applied **at runtime**, on the transport — the derive still bakes the
unprefixed `User::INDEX`/`SCHEMA_HASH`, and the client prepends the prefix to every
request path. So **one compiled consumer binary serves every environment**: point it
at dev or staging by setting `FLUSSO_INDEX_PREFIX`, no rebuild. It must match the
writer's prefix exactly, or queries hit an empty (or wrong) index.
---
## Out of scope for the first cut
- **Search aggregations** (facets, histograms, cardinality). The typed surface is
filter/query/sort + typed hits first; aggregations need their own typed result
tree, and the `raw` hatch covers them in the meantime.
- **Writes.** flusso owns the index; the client never upserts or deletes.
- **Correlating hits across indexes.** Both multi-index shapes ship — [one blended
result list](#one-blended-result-list-combined-search) and independent searches
via [`_msearch`](#several-searches-one-round-trip-_msearch) — but *correlating*
hits across indexes remains the caller's job.
- **Scroll pagination.** `from`/`size` and `search_after` (deep pagination) ship;
a scroll cursor is a follow-on.
- **Generating the document struct.** By design — the developer owns the struct.
---
## Where this lands in the workspace
| `flusso-query` | Runtime: the `Client` transport, the field-handle/`Query`/`Search` builder, `SearchResponse`. Generic over the developer's document types. Targets OpenSearch / Elasticsearch (shared DSL). Re-exports the derive behind a `derive` feature, so callers `use flusso_query::FlussoDocument`. |
| `flusso-query-derive` | The `#[derive(FlussoDocument)]` proc-macro crate. At compile time it discovers `flusso.toml`, resolves the named index's [`IndexMapping`](https://github.com/alias2k/flusso/blob/main/libs/0-core/src/config/index_mapping.rs) from the self-describing schema (no database), validates the annotated struct, and emits the field handles, entry points, and schema hash. Reuses `schema-config-toml`, `schema-index-yaml`, and `schema-core`. |
Both crates sit above `schema-core` and depend only downward — no dependency on the
engine, the sources, or the sinks. They share only the domain model, the one thing
the read and write sides must agree on.