Expand description
S3 Select — server-side SQL filter on object body (v0.6 #41).
Implements the SelectObjectContent surface as a small,
self-contained module. The primary entry point is run_select_csv /
run_select_jsonlines which take a SQL string and the in-memory body
bytes (the caller is responsible for fetching + decompressing +
decrypting the object — at the handler level we delegate to S4’s
existing GET path so SSE-C / SSE-S4 / SSE-KMS / S4 codec all work
transparently).
§Supported SQL subset
SELECT col1, col2 FROM s3object— projection by header name when the CSV has a header line.SELECT _1, _3 FROM s3object— positional projection (1-based, AWS convention;_1is the leftmost column).SELECT * FROM s3object— all columns in input order.WHERE col = 'value',WHERE col > 100,WHERE col LIKE 'foo%'.AND/OR/NOTboolean composition.- String / integer / float literals.
- Equality / inequality (
=,<>,<,>,<=,>=) andLIKE.
§Explicitly unsupported (rejected with SelectError::UnsupportedFeature)
- Aggregates (
COUNT,SUM,AVG, …) andGROUP BY/HAVING. JOIN/ subqueries.ORDER BY/LIMIT(Select-on-S3 streams in input order; aggregating would defeat the streaming model and is outside this v0.6 scope).- Parquet input (Parquet decode is intentionally out of scope; CSV / JSON Lines are the v0.6 deliverables).
§Output framing
EventStreamWriter emits the AWS event-stream binary protocol —
one Records frame per non-empty payload, an optional Stats frame,
and a terminating End frame. Each frame is
[total_len BE u32][headers_len BE u32][prelude CRC32][headers][payload][message CRC32]
per the AWS appendix. The handler in service.rs feeds
the produced events into s3s::dto::SelectObjectContentEventStream,
which performs equivalent framing on the wire — EventStreamWriter
exists primarily so the frame format itself can be unit-tested and
asserted-on by the integration test without spinning up a full client.
Structs§
- CsvRow
- CSV input row. Columns indexed by 0-based position OR by header name
(when the InputFormat says
has_header = true). - Event
Stream Writer - Emits AWS event-stream binary frames for a Select response. Each frame
is
[total_len BE u32][headers_len BE u32][prelude CRC32][headers...][payload][message CRC32]. - Select
Query
Enums§
Functions§
- evaluate_
row - Apply WHERE + projection to a single row. Returns
Ok(Some(values))for matched rows (oneStringperSELECTitem, in declaration order),Ok(None)if WHERE excluded the row,Err(...)only on runtime evaluation problems (a projected column not in the row, etc). - parse_
select - Parse and validate a S3 Select SQL expression.
- run_
select_ csv - Run a Select against a CSV-bytes body in-memory. Returns the
concatenated output bytes in
outputformat (CSV: rfc4180 single CRLF rows / JSON: one JSON-object-per-line). - run_
select_ jsonlines - Run a Select against a JSON-Lines body (
{...}\n{...}\n...). One row per top-level JSON object. Nested values are stringified for CSV output; for JSON output, the projected fields are re-emitted with their original JSON literal. - select_
gpu - GPU acceleration entry point — replaced the v0.6 #41 stub in
v0.8 #51. Returns
Some(filtered_csv_bytes)when the GPU path both can and did handle the query (single-column compare on header-bearing CSV);Nonemeans the caller should run the CPU path. The CPU path is always the source of truth for output formatting / projection / multi-condition WHERE / JSON Lines.