aqp3 0.2.5

Congress.gov legislation text query syntax parser.

Opinionated parser for Congress.gov's legislation text search query syntax.

Congress.gov's query parser is fairly permissive and allows some queries whose semantics are unclear. This library is opinionated in that it rejects some queries that Congress.gov will both parse and run. For example, double negatives, nested MUST and SHOULD queries, and MUST/SHOULD groups inside proximity queries will "work" on Congress.gov, but it's not clear what such queries are supposed to mean. Additionally, NOT queries inside of MUST/SHOULD groups will parse and run, but the Congress.gov parser appears to ignore or remove the ! in those cases. For that reason, this library only allows negating terms at the top level.
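The restrictions described above can be pictured as a validation pass over an already-parsed query. The following is a minimal sketch that follows the prose description only; it is not aqp3's code or API. The `Node` and `Rejection` types and the `validate` and `check` functions are hypothetical names invented for this illustration.

```rust
/// Hypothetical AST, for illustration only; these are not aqp3's types.
#[derive(Debug, PartialEq)]
enum Node {
    Term(String),
    Not(Box<Node>),
    Must(Vec<Node>),
    Should(Vec<Node>),
    /// A proximity query such as `n/5(...)`.
    Prox { distance: u32, args: Vec<Node> },
}

/// Hypothetical reasons for rejecting a query.
#[derive(Debug, PartialEq)]
enum Rejection {
    DoubleNegative,
    NegatedNonTerm,
    NestedNot,
    NestedBoolean,
    BooleanInProx,
}

/// Validates a whole query, given as its list of top-level nodes.
fn validate(query: &[Node]) -> Result<(), Rejection> {
    for node in query {
        match node {
            // Negation is accepted only here, at the top level,
            // and only when it wraps a plain term.
            Node::Not(inner) => match &**inner {
                Node::Term(_) => {}
                Node::Not(_) => return Err(Rejection::DoubleNegative),
                _ => return Err(Rejection::NegatedNonTerm),
            },
            other => check(other, false, false)?,
        }
    }
    Ok(())
}

/// Checks a node below the top level, tracking whether it sits inside
/// a MUST/SHOULD group (`in_bool`) or a proximity query (`in_prox`).
fn check(node: &Node, in_bool: bool, in_prox: bool) -> Result<(), Rejection> {
    match node {
        Node::Term(_) => Ok(()),
        // Any NOT that reaches this function is not at the top level.
        Node::Not(_) => Err(Rejection::NestedNot),
        Node::Must(children) | Node::Should(children) => {
            if in_prox {
                return Err(Rejection::BooleanInProx);
            }
            if in_bool {
                return Err(Rejection::NestedBoolean);
            }
            children.iter().try_for_each(|c| check(c, true, in_prox))
        }
        Node::Prox { args, .. } => {
            args.iter().try_for_each(|c| check(c, in_bool, true))
        }
    }
}
```

Under these assumptions, a top-level negated term passes, while a double negative, a NOT inside a MUST/SHOULD group, a MUST/SHOULD group nested in another, and a MUST/SHOULD group inside a proximity query are each rejected with a distinct reason.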

This library also has some built-in functionality for simplifying queries: it removes redundant terms, extraneous parentheses, and the MUST operator. MUST is dropped because the default connective for Congress.gov search is AND, so the operator is always unnecessary. Simplification also groups consecutive SHOULD terms, so that ~a ~b becomes ~(a b).
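To make the simplification rules concrete, here is a small sketch over a deliberately flat, hypothetical representation of a query. It is not aqp3's implementation or API: the `Item` type and the `simplify`, `push_term`, and `render` functions are invented for this example, and "redundant terms" is assumed to mean exact duplicates.

```rust
/// Hypothetical flat query item, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Term(String),
    Must(Vec<String>),   // written `+x` or `+(x y)`
    Should(Vec<String>), // written `~x` or `~(x y)`
}

/// Appends a bare term unless the same term is already present.
fn push_term(out: &mut Vec<Item>, term: String) {
    let item = Item::Term(term);
    if !out.contains(&item) {
        out.push(item);
    }
}

/// Drops MUST, removes duplicate terms, and merges consecutive SHOULD groups.
fn simplify(items: Vec<Item>) -> Vec<Item> {
    let mut out: Vec<Item> = Vec::new();
    for item in items {
        match item {
            Item::Term(t) => push_term(&mut out, t),
            // AND is the default connective, so a MUST group is
            // equivalent to listing its terms directly.
            Item::Must(terms) => {
                for t in terms {
                    push_term(&mut out, t);
                }
            }
            Item::Should(terms) => {
                if let Some(Item::Should(prev)) = out.last_mut() {
                    // The previous item is also SHOULD: merge into it.
                    for t in terms {
                        if !prev.contains(&t) {
                            prev.push(t);
                        }
                    }
                } else {
                    let mut group: Vec<String> = Vec::new();
                    for t in terms {
                        if !group.contains(&t) {
                            group.push(t);
                        }
                    }
                    out.push(Item::Should(group));
                }
            }
        }
    }
    out
}

/// Renders items back to query text, omitting parentheses around single terms.
fn render(items: &[Item]) -> String {
    let group = |op: &str, ts: &Vec<String>| {
        if ts.len() == 1 {
            format!("{}{}", op, ts[0])
        } else {
            format!("{}({})", op, ts.join(" "))
        }
    };
    items
        .iter()
        .map(|i| match i {
            Item::Term(t) => t.clone(),
            Item::Must(ts) => group("+", ts),
            Item::Should(ts) => group("~", ts),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

In this sketch, simplifying the items for ~a ~b and rendering the result gives ~(a b), and the items for +a b a reduce to a b. Only adjacent SHOULD groups are merged, so a bare term between two SHOULD groups keeps them separate.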

aqp stands for "Advanced Query Parser". Background on that is available from this Solr Jira ticket. The 3 in the crate name is there because this is my third attempt at putting together this crate.

Below is a grammar for the query syntax as implemented by this package, though the implementation may have drifted from it. For the purposes of this crate, the implementation should be considered the normative version of the syntax. Paste the grammar into the Ohm Editor to experiment with it and test example queries.

Query {

   Exp = ( ParenExp | Prox | Boolean | term | not )+
    
   ParenExp =  "(" Exp ")"
        
   Prox = ( "n" | "N" |  "w" | "W" ) "/" digit+ ParenProxArgs
    
   ParenProxArgs = "(" ( ProxArgs | ParenProxArgs ) ")"
        
   ProxArgs = literal+ | ( Prox | Boolean | nonliteral )+
        
   Boolean = ("+" | "~") ( BoolArgs | ParenBoolArgs )
    
   ParenBoolArgs = "(" ( BoolArgs+ | ParenBoolArgs+ ) ")"
    
   BoolArgs =  Prox | Boolean | term
    
   // tokens
        
   term = nonliteral | literal
        
   nonliteral = wildcard | phrase | bare
        
   phrase = "\"" ( bare | space )+ "\""
        
   literal = "'" bare "'"
        
   wildcard = bare "*"
        
   not = "!" term
    
   // may need to add more punctuation

   bare = ( alnum | space | "," | "." | "%" | "$" )+ ~"/"
        
}