Crate prqlc

§prqlc

Compiler for the PRQL language. Targets SQL and exposes PL and RQ abstract syntax trees.

You probably want to start with the compile wrapper function.

For more granular access, refer to this diagram:

           PRQL

   (parse) │ ▲
prql_to_pl │ │ pl_to_prql
           │ │
           ▼ │      json::from_pl
                  ────────►
          PL AST            PL JSON
                  ◄────────
           │        json::to_pl
           │
 (resolve) │
  pl_to_rq │
           │
           │
           ▼        json::from_rq
                  ────────►
          RQ AST            RQ JSON
                  ◄────────
           │        json::to_rq
           │
 rq_to_sql │
           ▼

           SQL

§prqlc Architecture

The PRQL compiler operates in the following stages:

stage     sub-stage      Abstract Syntax Tree (AST) Type used
parse     lexer          string -> LR (Lexer Representation)
parse     parser         LR -> PR (Parser Representation)
semantic  ast_expand     PR -> PL (Pipelined Language)
semantic  resolver       PL
semantic  flatten        PL
semantic  lowering       PL -> RQ (Resolved Query)
sql       preprocess     RQ
sql       pq-compiler    RQ -> PQ (Partitioned Query)
sql       postprocess    PQ
sql       sql-compiler   PQ -> sqlparser::ast
sql       codegen        sqlparser::ast -> string

  1. Lexing & Parsing: PRQL source text is split into tokens with the Chumsky parser named “lexer”. The stream of tokens, as Lexer Representation (LR), is then parsed into an Abstract Syntax Tree (AST) called Parser Representation (PR).

  2. Semantic Analysis: This stage resolves names (identifiers), extracts declarations, and determines frames (table columns in each step). A Context is declared containing the root module, which maps accessible names to their declarations.

    The resolving process involves the following operations:

    • Assign an ID to each node (Expr and Stmt).
    • Extract function declarations and variable definitions into the appropriate Module, accessible from Context::root_mod.
    • Look up identifiers in the module and find the associated declaration. The identifier is replaced with a fully qualified name that guarantees a unique name in root_mod. In some cases, Expr::target is also set.
    • Convert calls to transform functions (from, derive, filter) from FuncCall into TransformCall, which is more convenient for later processing.
    • Determine the type of expressions. If an expression is a reference to a table, use the frame of the table as the type. If it is a TransformCall, apply the transform to the input frame to obtain the resulting type. For simple expressions, try to infer from ExprKind.

    Lowering: The final sub-stage converts the PL into RQ, which is more strictly typed and contains less information but is convenient for translating into SQL or other backends.

  3. SQL Backend: This stage converts RQ into PQ, an intermediate AST, before finally converting to SQL. Each relation is transformed into an SQL query. Pipelines are analyzed and split at appropriate positions into “AtomicPipelines”, each of which can be represented by a single SELECT statement.

    Splitting is performed back-to-front. First, a list of all output columns is created. The pipeline is then traversed backwards, and a split occurs when a transform is encountered that is incompatible with those already present in the pipeline. A split can also be triggered by an expression that cannot be materialized where it is used (e.g., a window function in a WHERE clause).

    This process is also called anchoring, as it anchors a column definition to a specific location in the output query.

    During this process, sql::context keeps track of:

    • Table instances in the query (to prevent mixing up multiple instances of the same table)
    • Column definitions, whether computed or a reference to a table column
    • Column names, as defined in RQ or generated
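
The back-to-front splitting described for the SQL backend can be sketched with a toy model. This is not prqlc's implementation: the `Transform` enum, `clause_rank`, and `split_back_to_front` are hypothetical names, and the compatibility rule (clauses of one SELECT must appear in FROM, WHERE, GROUP BY, ORDER BY, LIMIT order, and only filters may repeat) is a deliberate simplification assumed for illustration. It uses only the standard library.

```rust
/// Simplified transforms (hypothetical; the real RQ transform set is larger).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transform {
    From,
    Filter,
    Aggregate,
    Sort,
    Take,
}

impl Transform {
    /// Position of the matching SQL clause inside one SELECT statement.
    /// Assumed toy ordering: FROM < WHERE < GROUP BY < ORDER BY < LIMIT.
    fn clause_rank(self) -> u8 {
        match self {
            Transform::From => 0,
            Transform::Filter => 1,
            Transform::Aggregate => 2,
            Transform::Sort => 3,
            Transform::Take => 4,
        }
    }
}

/// Split a pipeline into chunks that could each become a single SELECT.
/// The pipeline is walked from its last transform to its first. A transform
/// whose clause would come later than one already in the current chunk (or
/// that repeats a non-filter clause) cannot share that SELECT, so the
/// current chunk is closed and a new one is started.
pub fn split_back_to_front(pipeline: &[Transform]) -> Vec<Vec<Transform>> {
    let mut chunks: Vec<Vec<Transform>> = Vec::new();
    let mut current: Vec<Transform> = Vec::new();
    let mut lowest_rank = u8::MAX; // lowest clause rank in the current chunk

    for &t in pipeline.iter().rev() {
        let rank = t.clause_rank();
        let incompatible =
            rank > lowest_rank || (rank == lowest_rank && t != Transform::Filter);
        if incompatible {
            // Close the chunk; it was collected in reverse, so restore order.
            current.reverse();
            chunks.push(std::mem::take(&mut current));
            lowest_rank = u8::MAX;
        }
        current.push(t);
        lowest_rank = lowest_rank.min(rank);
    }
    if !current.is_empty() {
        current.reverse();
        chunks.push(current);
    }
    // Chunks were produced last-to-first; return them in pipeline order.
    chunks.reverse();
    chunks
}
```

For example, the pipeline from, filter, take, filter is split before the final filter, because a WHERE clause cannot be applied after LIMIT within the same SELECT. The real compiler additionally tracks column definitions and table instances while doing this.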

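The identifier lookup performed during Semantic Analysis can be illustrated in the same spirit. The sketch below resolves a bare identifier to a fully qualified name by searching a flat map of declarations. It is a toy model, not prqlc's resolver: `RootModule`, `declare`, and `resolve` are hypothetical names, and the real Module type is nested and carries far more information.

```rust
use std::collections::HashMap;

/// Toy stand-in for a root module (hypothetical type): maps fully
/// qualified names to declaration IDs.
#[derive(Default)]
pub struct RootModule {
    decls: HashMap<String, usize>,
    next_id: usize,
}

impl RootModule {
    /// Register a declaration under a fully qualified name and return its ID.
    pub fn declare(&mut self, fq_name: &str) -> usize {
        let id = self.next_id;
        self.next_id += 1;
        self.decls.insert(fq_name.to_string(), id);
        id
    }

    /// Resolve `ident` by trying each scope prefix in the given order.
    /// Returns the fully qualified name of the first declaration found.
    pub fn resolve(&self, ident: &str, scopes: &[&str]) -> Option<String> {
        scopes
            .iter()
            .map(|scope| format!("{}.{}", scope, ident))
            .find(|fq| self.decls.contains_key(fq))
    }
}
```

With declarations for std.sum and albums.title registered, resolving title against the scopes albums and std yields albums.title, and an unknown identifier yields None, which a real compiler would report as an error.
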
§Common use-cases

  • Compile PRQL queries to SQL at run time.

    let sql = prqlc::compile(
        "from albums | select {title, artist_id}",
        &prqlc::Options::default().no_format()
    )?;
    assert_eq!(&sql[..35], "SELECT title, artist_id FROM albums");
  • Compile PRQL queries to SQL at build time.

    For inline strings, use the prqlc-macros crate; for example:

    let sql: &str = prql_to_sql!("from albums | select {title, artist_id}");

    For compiling whole files (.prql to .sql), call prqlc from build.rs. See this example project.

  • Compile, format & debug PRQL from the command line.

    $ cargo install --locked prqlc
    $ prqlc compile query.prql

§Feature flags

The following feature flags are available:

  • cli: enables the prqlc CLI binary. This is enabled by default. When consuming this crate from another Rust library, it can be disabled.
  • test-dbs: enables the prqlc in-process test databases as part of the crate’s tests. This significantly increases compile times, so it is not enabled by default.
  • test-dbs-external: enables the prqlc external test databases, which require a Docker container with the test databases to be running. Check out the integration tests for more details.
  • serde_yaml: enables serialization and deserialization of ASTs to YAML.

§Large binary sizes

For Linux users, the binary size contributed by this crate will probably be quite large (>20MB) by default. That is because it includes a lot of debuginfo symbols from our parser. They can be removed by adding the following to Cargo.toml, reducing the contribution to around 7MB:

[profile.release.package.prqlc]
strip = "debuginfo"

Modules§

debug
internal
Debugging and unstable API functions
ir
Intermediate Representations of Abstract Syntax Tree
json
JSON serialization and deserialization functions
lr
parser
pr
PR, or “Parser Representation”, is an AST representation of parsed PRQL. It takes LR tokens and converts them into a more structured form that understands expressions, such as tuples & functions.
semantic
Semantic resolver (name resolution, type checking and lowering to RQ)
sql
Backend for translating RQ into SQL
utils

Structs§

Error
A prqlc error. Used internally, exposed as prqlc::ErrorMessage.
ErrorMessage
ErrorMessages
Errors
Multiple prqlc errors. Used internally, exposed as prqlc::ErrorMessages.
Options
Compilation options for SQL backend of the compiler.
SourceLocation
Location within the source file. Tuples contain:
SourceTree
All paths are relative to the project root.
Span

Enums§

DisplayOptions
ErrorSource
MessageKind
Compile message kind. Currently only Error is implemented.
Reason
Target

Traits§

WithErrorInfo

Functions§

compile
Compile a PRQL string into a SQL string.
compiler_version
Get the version of the compiler. This is determined by the first of:
pl_to_prql
Generate PRQL code from PL AST
pl_to_rq
Perform semantic analysis and convert PL to RQ.
pl_to_rq_tree
Perform semantic analysis and convert PL to RQ.
prql_to_pl
Parse PRQL into a PL AST
prql_to_pl_tree
Parse PRQL into a PL AST
prql_to_tokens
Lex PRQL source into Lexer Representation.
rq_to_sql
Generate SQL from RQ.

Type Aliases§

Result