Module select

S3 Select — server-side SQL filter on object body (v0.6 #41).

Implements the SelectObjectContent surface as a small, self-contained module. The primary entry points are run_select_csv and run_select_jsonlines, which take a SQL string and the in-memory body bytes. The caller is responsible for fetching, decompressing, and decrypting the object; at the handler level this is delegated to S4’s existing GET path, so SSE-C, SSE-S4, SSE-KMS, and the S4 codec all work transparently.

§Supported SQL subset

  • SELECT col1, col2 FROM s3object — projection by header name when the CSV has a header line.
  • SELECT _1, _3 FROM s3object — positional projection (1-based, AWS convention; _1 is the leftmost column).
  • SELECT * FROM s3object — all columns in input order.
  • WHERE col = 'value', WHERE col > 100, WHERE col LIKE 'foo%'.
  • AND / OR / NOT boolean composition.
  • String / integer / float literals.
  • Equality / inequality (=, <>, <, >, <=, >=) and LIKE.
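Two items from the list above lend themselves to a short illustration: the 1-based positional references (_1, _3) and LIKE patterns such as 'foo%'. The following is a minimal sketch, not this module's implementation. The names positional_index and like_match are hypothetical, and support for the single-character wildcard _ is an assumption taken from standard SQL rather than from this page.

```rust
/// Hypothetical helper: map an AWS-style positional reference such as `_3`
/// to a 0-based column index. Returns None for `_0`, for a non-digit
/// suffix, or for names that do not start with an underscore.
fn positional_index(name: &str) -> Option<usize> {
    let digits = name.strip_prefix('_')?;
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // `_1` is the leftmost column, so subtract one; `_0` yields None.
    digits.parse::<usize>().ok()?.checked_sub(1)
}

/// Hypothetical helper: SQL LIKE matching where `%` matches any run of
/// characters (including none) and `_` matches exactly one character.
/// Escape sequences are not handled in this sketch.
fn like_match(text: &str, pattern: &str) -> bool {
    let t: Vec<char> = text.chars().collect();
    let p: Vec<char> = pattern.chars().collect();
    let (mut ti, mut pi) = (0usize, 0usize);
    // Pattern index of the most recent `%` and the text index it was tried at.
    let mut backtrack: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == '%' {
            // Tentatively let `%` match nothing and remember where we were.
            backtrack = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '_' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((wild_pi, wild_ti)) = backtrack {
            // Mismatch: let the last `%` absorb one more character and retry.
            backtrack = Some((wild_pi, wild_ti + 1));
            pi = wild_pi + 1;
            ti = wild_ti + 1;
        } else {
            return false;
        }
    }
    // The text is exhausted; only trailing `%` may remain in the pattern.
    p[pi..].iter().all(|&c| c == '%')
}
```

With these helpers, positional_index("_1") yields Some(0), and like_match("foobar", "foo%") holds while like_match("xfoo", "foo%") does not.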

§Explicitly unsupported (rejected with SelectError::UnsupportedFeature)

  • Aggregates (COUNT, SUM, AVG, …) and GROUP BY / HAVING.
  • JOIN / subqueries.
  • ORDER BY / LIMIT (Select-on-S3 streams in input order; sorting the output would defeat the streaming model, and both clauses are outside this v0.6 scope).
  • Parquet input (Parquet decode is intentionally out of scope; CSV / JSON Lines are the v0.6 deliverables).

§Output framing

EventStreamWriter emits the AWS event-stream binary protocol: one Records frame per non-empty payload, an optional Stats frame, and a terminating End frame. Each frame is [total_len BE u32][headers_len BE u32][prelude CRC32][headers][payload][message CRC32] per the AWS appendix. The handler in service.rs feeds the produced events into s3s::dto::SelectObjectContentEventStream, which performs equivalent framing on the wire. EventStreamWriter exists primarily so that the frame format itself can be unit-tested and asserted on by the integration test without spinning up a full client.
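The frame layout described above can be sketched in a few lines of dependency-free Rust. This is an illustration of the byte layout only, not the EventStreamWriter API: encode_frame, string_header, and crc32 are hypothetical names. The sketch assumes the checksum is the standard CRC-32 (IEEE, as used by zlib), that the message CRC covers every byte preceding it, and that string headers are encoded as [name_len u8][name][type = 7][value_len BE u16][value], which matches my understanding of the AWS event-stream format but is not stated on this page.

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
/// Slow but dependency-free; a table-driven variant would be used in practice.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Encode one string-valued header (assumed layout, see the lead-in):
/// [name_len u8][name][value_type = 7][value_len BE u16][value].
fn string_header(name: &str, value: &str) -> Vec<u8> {
    let name_len = u8::try_from(name.len()).expect("header name too long");
    let value_len = u16::try_from(value.len()).expect("header value too long");
    let mut out = Vec::with_capacity(4 + name.len() + value.len());
    out.push(name_len);
    out.extend_from_slice(name.as_bytes());
    out.push(7); // value type 7 = string
    out.extend_from_slice(&value_len.to_be_bytes());
    out.extend_from_slice(value.as_bytes());
    out
}

/// Build one frame:
/// [total_len BE u32][headers_len BE u32][prelude CRC32][headers][payload][message CRC32].
fn encode_frame(headers: &[u8], payload: &[u8]) -> Vec<u8> {
    // 8-byte prelude + 4-byte prelude CRC + 4-byte message CRC = 16 bytes of overhead.
    let total_len = u32::try_from(16 + headers.len() + payload.len())
        .expect("frame too large");
    let mut frame = Vec::with_capacity(total_len as usize);
    frame.extend_from_slice(&total_len.to_be_bytes());
    frame.extend_from_slice(&(headers.len() as u32).to_be_bytes());
    // The prelude CRC covers the first 8 bytes only.
    let prelude_crc = crc32(&frame);
    frame.extend_from_slice(&prelude_crc.to_be_bytes());
    frame.extend_from_slice(headers);
    frame.extend_from_slice(payload);
    // The message CRC covers everything written so far.
    let message_crc = crc32(&frame);
    frame.extend_from_slice(&message_crc.to_be_bytes());
    frame
}
```

A test in the spirit of the one mentioned above would decode the first four bytes as a big-endian u32, compare it with the frame length, and recompute both checksums over the corresponding byte ranges.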

Structs§

CsvRow
CSV input row. Columns indexed by 0-based position OR by header name (when the InputFormat says has_header = true).
EventStreamWriter
Emits AWS event-stream binary frames for a Select response. Each frame is [total_len BE u32][headers_len BE u32][prelude CRC32][headers...][payload][message CRC32].
SelectQuery

Enums§

SelectError
SelectInputFormat
SelectOutputFormat

Functions§

evaluate_row
Apply WHERE + projection to a single row. Returns Ok(Some(values)) for matched rows (one String per SELECT item, in declaration order), Ok(None) if WHERE excluded the row, and Err(...) only on runtime evaluation problems (a projected column not in the row, etc.).
parse_select
Parse and validate an S3 Select SQL expression.
run_select_csv
Run a Select against an in-memory CSV body. Returns the concatenated output bytes in the requested output format (CSV: RFC 4180 rows, each terminated by a single CRLF; JSON: one JSON object per line).
run_select_jsonlines
Run a Select against a JSON-Lines body ({...}\n{...}\n...). One row per top-level JSON object. Nested values are stringified for CSV output; for JSON output, the projected fields are re-emitted with their original JSON literal.
select_gpu
GPU acceleration entry point — replaced the v0.6 #41 stub in v0.8 #51. Returns Some(filtered_csv_bytes) when the GPU path both can and did handle the query (single-column compare on header-bearing CSV); None means the caller should run the CPU path. The CPU path is always the source of truth for output formatting / projection / multi-condition WHERE / JSON Lines.
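
The Option-based contract of select_gpu (Some means the GPU handled the query, None means fall back to the CPU) suggests a simple dispatch pattern on the caller's side. The sketch below is an assumption about how such a caller might look, not the handler's actual code: QueryShape, gpu_eligible, and select_with_fallback are hypothetical, and the GPU and CPU paths are passed in as closures so the example stays self-contained.

```rust
/// Hypothetical summary of the properties that decide GPU eligibility.
struct QueryShape {
    is_csv: bool,
    has_header: bool,
    /// Number of comparisons in the WHERE clause.
    where_conditions: usize,
}

/// Mirrors the documented restriction: a single-column compare on
/// header-bearing CSV. Anything else belongs to the CPU path.
fn gpu_eligible(shape: &QueryShape) -> bool {
    shape.is_csv && shape.has_header && shape.where_conditions == 1
}

/// Try the accelerated path first and fall back to the CPU path whenever
/// the query is not eligible or the GPU path declines by returning None.
fn select_with_fallback<G, C>(shape: &QueryShape, body: &[u8], gpu: G, cpu: C) -> Vec<u8>
where
    G: FnOnce(&[u8]) -> Option<Vec<u8>>,
    C: FnOnce(&[u8]) -> Vec<u8>,
{
    let accelerated = if gpu_eligible(shape) { gpu(body) } else { None };
    // The CPU closure runs only when no accelerated result is available.
    accelerated.unwrap_or_else(|| cpu(body))
}
```

Because the CPU path is the source of truth for output formatting, a caller following this pattern can treat a None from the accelerated path as a routine outcome rather than an error.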