Crate postgres_parser

postgres-parser is a safe wrapper around Postgres' SQL query parser.

Its primary purpose is to make it easy to syntax check SQL statements (individually or en masse) and to provide a parse tree that can be walked, examined, mutated, or transformed.

Technical Details

First things first: as part of its build process, this crate downloads the Postgres source code and compiles Postgres to LLVM IR (and then bitcode). That bitcode is ultimately statically linked into the resulting Rust library (rlib), relying on LLVM's "link time optimization" (LTO) features to reduce it to just the symbols and code required by this crate.

That's a lot of work, and it means that building this crate, or any crate that uses it as a dependency, requires the LLVM toolchain to be on the system $PATH.

The justification for this is that, despite the build complexity, we can always stay current with Postgres as it evolves its SQL support and, thus, its parser.

What's in the Box?

There are three primary things. The first two are safe interfaces for parsing SQL statements and evaluating the resulting parse trees. The third is the set of completely unsafe functions and structs upon which the first two are built.

parse_query() and the nodes module

The parse_query() function parses a string of SQL statements and returns either a Vec of parsed nodes or a parse error.

A quick example:

use postgres_parser::parse_query;
let parsetree = parse_query("SELECT * FROM my_table WHERE id = 42; SELECT 2;");
match parsetree {
    Ok(nodes) => {
        // one node for each statement parsed from the input string above
        for node in nodes {
            // debug-print the node for this query
            println!("{:#?}", node);
        }
    }

    // one possible error, for the first malformed SQL statement
    Err(e) => {
        // debug-print the error (assumes PgParserError implements Debug)
        panic!("{:?}", e);
    }
}

The nodes represented in the parse tree live in the postgres_parser::nodes module. The top-level node is simply called Node and is an enum with a variant for every possible node type.

An example of walking a parsetree and examining the expected Node:

use postgres_parser::{parse_query, Node, join_name_list};
let parsetree = parse_query("DROP TABLE my_schema.my_table;");
match parsetree {
    Ok(mut nodes) => {
        let node = nodes.pop().unwrap(); // we know we only have 1 node here
        match node {
            Node::DropStmt(dropstmt) => {
                // dropstmt.objects, once unwrapped, is a Vec<Node>, where each Node is a
                // Node::List of, ultimately, Node::Value entries that each hold a String
                for object in dropstmt.objects.unwrap() {
                    // join_name_list() will figure out the hard part for us
                    // this is a common pattern throughout Postgres' parsetree
                    let name = join_name_list(&object).unwrap();
                    assert_eq!(name, "my_schema.my_table");
                }
            }

            _ => panic!("unexpected node: {:#?}", node),
        }
    }

    // one possible error, for the first malformed SQL statement
    Err(e) => {
        // debug-print the error (assumes PgParserError implements Debug)
        panic!("{:?}", e);
    }
}
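
The name-joining step in the example above can be pictured with a small, self-contained sketch. It does not use this crate: SimpleNode and join_names are hypothetical stand-ins for the real Node enum and join_name_list(), reduced to the two shapes that matter here (a string value and a list), and the real function's return type may differ.

```rust
// Hypothetical, heavily simplified node type. The real postgres_parser::Node
// has a variant for every Postgres node type.
#[derive(Debug)]
enum SimpleNode {
    Str(String),
    List(Vec<SimpleNode>),
}

// Joins a list of string values with '.', or returns None if the node is not
// a list made up only of strings.
fn join_names(node: &SimpleNode) -> Option<String> {
    match node {
        SimpleNode::List(items) => {
            let mut parts = Vec::with_capacity(items.len());
            for item in items {
                match item {
                    SimpleNode::Str(s) => parts.push(s.as_str()),
                    SimpleNode::List(_) => return None,
                }
            }
            Some(parts.join("."))
        }
        SimpleNode::Str(_) => None,
    }
}

fn main() {
    // A schema-qualified name is a list of its individual parts.
    let qualified = SimpleNode::List(vec![
        SimpleNode::Str("my_schema".to_string()),
        SimpleNode::Str("my_table".to_string()),
    ]);
    assert_eq!(join_names(&qualified).as_deref(), Some("my_schema.my_table"));

    // A bare string is not a name list in this sketch.
    assert_eq!(join_names(&SimpleNode::Str("x".to_string())), None);
}
```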

The sys module

The sys module is a 100% "bindgen"-generated module from Postgres' header files. In general, it's not expected that users of this crate will interact with this module.

It is upon the items in this module that the rest of postgres-parser is built. The module is public for completeness only.

SqlStatementScanner

The SqlStatementScanner is a simple type that iterates over a single string containing multiple SQL statements, scanning and parsing them one at a time.

This is particularly useful for reporting statement-level parse errors, as opposed to the parse_query() function, which simply reports one error for the entire string.

A quick example:

use postgres_parser::SqlStatementScanner;
let mut scanner = SqlStatementScanner::new("SELECT 1;\nSELECT 2;").into_iter();

let first = scanner.next().expect("no first query");
assert_eq!(first.sql, "SELECT 1;\n"); // note trailing \n -- trailing whitespace after ';' is included
assert!(first.payload.is_none());
assert!(first.parsetree.is_ok());

let second = scanner.next().expect("no second query");
assert_eq!(second.sql, "SELECT 2;");
assert!(second.payload.is_none());
assert!(second.parsetree.is_ok());

assert!(scanner.next().is_none());
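
To make the "one statement at a time" behaviour concrete, here is a deliberately naive, standard-library-only sketch of an iterator that yields each statement together with the whitespace that follows its ';'. NaiveScanner and scan are hypothetical names, and this is not how SqlStatementScanner is implemented: the sketch splits on every ';' and knows nothing about quotes, comments, or COPY payloads.

```rust
// Hypothetical stand-in for a statement scanner. It splits on every ';' and
// keeps trailing whitespace with the statement. Unlike a real scanner it does
// NOT understand quotes, comments, or dollar-quoted strings.
struct NaiveScanner<'a> {
    rest: &'a str,
}

impl<'a> Iterator for NaiveScanner<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        if self.rest.is_empty() {
            return None;
        }
        // End of statement: just past the first ';', or the end of the input.
        let mut end = match self.rest.find(';') {
            Some(pos) => pos + 1,
            None => self.rest.len(),
        };
        // Extend the statement over any whitespace that follows the ';'.
        let trailing = &self.rest[end..];
        end += trailing.len() - trailing.trim_start().len();
        let (statement, rest) = self.rest.split_at(end);
        self.rest = rest;
        Some(statement)
    }
}

// Convenience wrapper that collects every scanned statement.
fn scan(sql: &str) -> Vec<&str> {
    NaiveScanner { rest: sql }.collect()
}

fn main() {
    let statements = scan("SELECT 1;\nSELECT 2;");
    // The first statement keeps its trailing newline, as in the example above.
    assert_eq!(statements, vec!["SELECT 1;\n", "SELECT 2;"]);
}
```

Because each statement is produced separately, a caller can parse and report errors per statement instead of once for the whole input.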

Serde Support

All supported parse tree Node structures implement Serialize and Deserialize and, as such, can be used directly with any of the serde serializers, including serde_json.

use postgres_parser::parse_query;
let as_json = serde_json::to_string_pretty(&parse_query("SELECT 1;")).expect("failed to convert to json");
println!("{}", as_json);

The above would output:

{"SelectStmt":{"targetList":[{"ResTarget":{"val":{"A_Const":{"val":{"int":1},"location":7}},"location":7}}],"op":"SETOP_NONE","all":false}}

Notes on Thread Safety

Postgres is, by design, not thread safe. Rust, on the other hand, is. Because we statically link against the compiled Postgres code, this presents an interesting problem.

The solution postgres-parser has taken is to guard the parse_query() function (which is also used by SqlStatementScanner) with a Rust Mutex. As such, only one query can be parsed at a time.
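
The general pattern looks roughly like the following sketch, which uses only the standard library. It is an illustration under stated assumptions, not this crate's source: parse_unsynchronized, parse_guarded, and PARSER_LOCK are hypothetical names, and the "parser" merely counts whitespace-separated tokens.

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical stand-in for a routine that is not safe to call concurrently.
// It is NOT postgres-parser's implementation; it only counts tokens.
fn parse_unsynchronized(sql: &str) -> usize {
    sql.split_whitespace().count()
}

// One process-wide lock: every caller must hold it while parsing.
static PARSER_LOCK: Mutex<()> = Mutex::new(());

// Safe entry point: takes the lock, then calls the non-thread-safe routine.
fn parse_guarded(sql: &str) -> usize {
    // Recover the guard even if another thread panicked while holding it.
    let _guard = PARSER_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    parse_unsynchronized(sql)
}

fn main() {
    // Both threads may call parse_guarded freely; the lock serializes them.
    let handles: Vec<_> = vec!["SELECT 1;", "SELECT a, b FROM t;"]
        .into_iter()
        .map(|sql| thread::spawn(move || parse_guarded(sql)))
        .collect();
    for handle in handles {
        println!("{}", handle.join().unwrap());
    }
}
```

The trade-off of a single global lock is simplicity over throughput: callers on different threads never run the guarded routine at the same time, so parsing does not scale across cores.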

Re-exports

pub use nodes::Node;

Modules

nodes

Generated types to represent a parse tree in a safe manner as returned from parse_query()

sys

Generated types and constants from Postgres' header files necessary to represent a parse tree as raw "C" structures. Also contains various enum types used by this module and the nodes module.

Structs

ScannedStatement

An individual SQL statement scanned from a larger set of SQL statements

SqlStatementScanner

The SqlStatementScanner allows for scanning a blob of SQL statements and ultimately iterating over each statement, one at a time, producing a ScannedStatement that includes the raw SQL, that SQL's parsetree, and optional "COPY ... FROM stdin;" payload data.

SqlStatementScannerIterator

Iterator for the SqlStatementScanner

Enums

PgParserError

Represents various errors that can occur while parsing a SQL statement.

Functions

join_name_list

A common pattern in Postgres' parse trees, when it needs to represent the name of a thing (a table, a field, a view, etc), especially when that name can be qualified, is to represent that name as a List of string Values.

parse_query

Parse a string of delimited SQL statements.