flowscope-core 0.4.1

Core SQL lineage analysis engine

flowscope-core

Core SQL lineage analysis engine for FlowScope.

Overview

flowscope-core is a Rust library that performs static analysis on SQL queries to extract table and column-level lineage information. It serves as the foundation for the FlowScope ecosystem, powering the WebAssembly bindings and JavaScript packages.
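Column-level lineage records which source columns feed each output column. Table-level lineage can be derived from it by dropping the column part. The self-contained Rust sketch below illustrates this distinction. The types ColumnRef and LineageEdge, the helper functions, and the name `output` for the result set are hypothetical. They do not mirror flowscope-core's actual node and edge types. The join keys (u.id, o.user_id) are omitted for brevity.

```rust
use std::collections::BTreeSet;

/// A column qualified by its table, e.g. `users.name` (aliases already resolved).
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct ColumnRef {
    table: String,
    column: String,
}

impl ColumnRef {
    fn new(table: &str, column: &str) -> Self {
        ColumnRef { table: table.to_string(), column: column.to_string() }
    }
}

/// Data flows from `source` into `target`. Hypothetical type.
#[derive(Debug, Clone)]
struct LineageEdge {
    source: ColumnRef,
    target: ColumnRef,
}

/// Hand-written edges for
/// `SELECT u.name, o.id FROM users u JOIN orders o ON u.id = o.user_id`,
/// using the hypothetical table name `output` for the query's result set.
fn example_edges() -> Vec<LineageEdge> {
    vec![
        LineageEdge { source: ColumnRef::new("users", "name"), target: ColumnRef::new("output", "name") },
        LineageEdge { source: ColumnRef::new("orders", "id"), target: ColumnRef::new("output", "id") },
    ]
}

/// Columns that feed directly into `target`.
fn direct_sources(edges: &[LineageEdge], target: &ColumnRef) -> Vec<ColumnRef> {
    edges
        .iter()
        .filter(|e| &e.target == target)
        .map(|e| e.source.clone())
        .collect()
}

/// Table-level lineage: project each source column onto its table (sorted, deduplicated).
fn source_tables(edges: &[LineageEdge]) -> BTreeSet<String> {
    edges.iter().map(|e| e.source.table.clone()).collect()
}

fn main() {
    let edges = example_edges();
    let name = ColumnRef::new("output", "name");
    println!("{:?} <- {:?}", name, direct_sources(&edges, &name));
    println!("source tables: {:?}", source_tables(&edges));
}
```

Projecting column edges onto tables is only one simple design. A real analyzer may also track tables that appear only in JOIN or WHERE conditions as table-level dependencies.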

Features

  • Multi-Dialect Parsing: Built on sqlparser-rs, supporting PostgreSQL, Snowflake, BigQuery, DuckDB, Redshift, MySQL, SQLite, Databricks, ClickHouse, and generic ANSI SQL.
  • Deep Lineage Extraction:
    • Table-level dependencies (SELECT, INSERT, UPDATE, MERGE, COPY, UNLOAD, etc.)
    • Column-level data flow (including transformations)
    • Cross-statement lineage tracking (CREATE TABLE AS, INSERT INTO ... SELECT)
  • dbt/Jinja Templating: Preprocesses SQL written with Jinja or dbt-style templates before analysis, with built-in stubs for ref(), source(), config(), var(), and is_incremental().
  • Complex SQL Support: Handles common table expressions (CTEs), subqueries, joins, unions, window functions, and lateral column aliases.
  • Schema Awareness: Uses provided schema metadata to validate column references and resolve wildcards (SELECT *).
  • Type Inference: Infers expression types, with dialect-aware type compatibility checking.
  • SQL Linting: 72 lint rules across 9 families (AL, AM, CP, CV, JJ, LT, RF, ST, TQ) with AST-driven semantic checks and token-aware formatting checks. Rules include autofix metadata with safe/unsafe classification.
  • Diagnostics: Returns structured issues (errors, warnings) with source spans for precise highlighting.
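The sketch below shows roughly how wildcard resolution against provided schema metadata might work. It assumes a hypothetical Schema type that maps table names to ordered column lists. The Expansion enum and the expand_wildcard and column_exists helpers are illustrative only and are not part of flowscope-core's API.

```rust
use std::collections::HashMap;

/// Hypothetical schema metadata: table name -> columns in declaration order.
type Schema = HashMap<String, Vec<String>>;

#[derive(Debug, PartialEq)]
enum Expansion {
    /// The wildcard resolved to concrete, table-qualified columns.
    Columns(Vec<String>),
    /// No metadata for this table, so `*` cannot be resolved (a warning candidate).
    UnknownTable(String),
}

/// Expand `SELECT *` over the FROM-clause tables, preserving FROM-clause order.
fn expand_wildcard(schema: &Schema, from_tables: &[&str]) -> Expansion {
    let mut columns = Vec::new();
    for table in from_tables {
        match schema.get(*table) {
            Some(cols) => columns.extend(cols.iter().map(|c| format!("{table}.{c}"))),
            None => return Expansion::UnknownTable(table.to_string()),
        }
    }
    Expansion::Columns(columns)
}

/// Validate a single `table.column` reference against the schema.
fn column_exists(schema: &Schema, table: &str, column: &str) -> bool {
    schema
        .get(table)
        .map_or(false, |cols| cols.iter().any(|c| c == column))
}

fn main() {
    let mut schema: Schema = HashMap::new();
    schema.insert("users".to_string(), vec!["id".to_string(), "name".to_string()]);
    println!("{:?}", expand_wildcard(&schema, &["users"]));
    println!("{:?}", expand_wildcard(&schema, &["orders"]));
    println!("{}", column_exists(&schema, "users", "email"));
}
```

Returning early on the first unknown table keeps the sketch short. An analyzer that reports every problem would instead collect one diagnostic per unresolved table.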

Structure

src/
├── analyzer.rs              # Main analysis orchestration
├── analyzer/
│   ├── context.rs           # Per-statement state and scope management
│   ├── schema_registry.rs   # Schema metadata and name resolution
│   ├── visitor.rs           # AST visitor for lineage extraction
│   ├── query.rs             # Query analysis (SELECT, subqueries)
│   ├── expression.rs        # Expression and column lineage
│   ├── select_analyzer.rs   # SELECT clause analysis
│   ├── statements.rs        # Statement-level analysis
│   ├── ddl.rs               # DDL statement handling (CREATE, ALTER)
│   ├── cross_statement.rs   # Cross-statement lineage tracking
│   ├── diagnostics.rs       # Issue reporting
│   ├── input.rs             # Input merging and deduplication
│   └── helpers/             # Utility functions
├── linter/                  # SQL lint engine
│   ├── mod.rs               # Linter orchestration
│   ├── config.rs            # Rule configuration
│   ├── document.rs          # Document model (shared tokens)
│   ├── rule.rs              # Rule trait and context
│   ├── visit.rs             # AST visitor for rules
│   └── rules/               # 72 rule implementations
├── parser/                  # SQL dialect handling
├── types/                   # Request/response types
└── lineage/                 # Lineage graph construction
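The cross_statement.rs module handles cross-statement lineage. The sketch below gives a minimal version of the idea. It assumes each write statement (CREATE TABLE AS, INSERT INTO ... SELECT) has already been reduced to a target table plus its source tables. StatementLineage and upstream are hypothetical names, not the crate's API.

```rust
use std::collections::{BTreeSet, HashMap};

/// One write statement reduced to table-level lineage (hypothetical type):
/// `target` is written from `sources`.
struct StatementLineage {
    target: &'static str,
    sources: Vec<&'static str>,
}

/// All tables that `table` depends on, directly or through earlier statements.
fn upstream(statements: &[StatementLineage], table: &str) -> BTreeSet<String> {
    // Merge writes to the same target (e.g. several INSERT ... SELECT into one table).
    let mut deps: HashMap<&str, Vec<&str>> = HashMap::new();
    for stmt in statements {
        deps.entry(stmt.target).or_default().extend(stmt.sources.iter().copied());
    }

    // Iterative depth-first walk over the dependency map.
    let mut seen = BTreeSet::new();
    let mut stack = vec![table];
    while let Some(current) = stack.pop() {
        if let Some(srcs) = deps.get(current) {
            for &src in srcs {
                // `insert` returns false for already-visited tables, which also stops cycles.
                if seen.insert(src.to_string()) {
                    stack.push(src);
                }
            }
        }
    }
    seen
}

fn main() {
    let script = vec![
        // CREATE TABLE active_users AS SELECT ... FROM users
        StatementLineage { target: "active_users", sources: vec!["users"] },
        // INSERT INTO order_report SELECT ... FROM active_users JOIN orders ...
        StatementLineage { target: "order_report", sources: vec!["active_users", "orders"] },
    ];
    println!("{:?}", upstream(&script, "order_report"));
}
```

The visited set guarantees that the walk terminates even if a script writes tables in a cycle.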

Usage

use flowscope_core::{analyze, AnalyzeRequest, Dialect};

fn main() {
    let request = AnalyzeRequest {
        sql: "SELECT u.name, o.id FROM users u JOIN orders o ON u.id = o.user_id".to_string(),
        dialect: Dialect::Postgres,
        schema: None, // Optional schema metadata
        file_path: None,
    };

    let result = analyze(&request);

    // Inspect the lineage graph (nodes and edges) of each statement
    for statement in result.statements {
        println!("Nodes: {:?}", statement.nodes);
        println!("Edges: {:?}", statement.edges);
    }
}

Linting

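use flowscope_core::linter::{LintConfig, LintDocument, Linter};
use flowscope_core::Dialect;

fn main() {
    // SQL text to lint and the dialect used to parse it
    let sql = "SELECT id FROM users";
    let dialect = Dialect::Postgres;
    let linter = Linter::new(LintConfig::default());
    let document = LintDocument::new(sql, dialect);
    let issues = linter.check_document(&document);
}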
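The lint rules carry autofix metadata with a safe/unsafe classification, and diagnostics carry source spans. The sketch below illustrates how those pieces could fit together. The Rule trait, the Issue, Fix, and FixSafety types, the EX01 code, and the trailing-whitespace rule are all hypothetical. They are not flowscope-core's rule.rs interface or one of its 72 rules.

```rust
/// A byte range in the source SQL, used to highlight an issue.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Safe fixes can be applied automatically; unsafe ones may change semantics.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FixSafety {
    Safe,
    Unsafe,
}

#[derive(Debug, Clone, PartialEq)]
struct Fix {
    span: Span,
    replacement: String,
    safety: FixSafety,
}

#[derive(Debug, Clone, PartialEq)]
struct Issue {
    code: &'static str,
    message: String,
    span: Span,
    fix: Option<Fix>,
}

/// Hypothetical rule interface for illustration.
trait Rule {
    fn code(&self) -> &'static str;
    fn check(&self, sql: &str) -> Vec<Issue>;
}

/// Hypothetical formatting rule: flag spaces or tabs at the end of a line.
struct TrailingWhitespace;

impl Rule for TrailingWhitespace {
    fn code(&self) -> &'static str {
        "EX01"
    }

    fn check(&self, sql: &str) -> Vec<Issue> {
        let mut issues = Vec::new();
        let mut offset = 0; // byte offset of the current line's start
        for line in sql.split('\n') {
            let kept = line.trim_end_matches(|c: char| c == ' ' || c == '\t').len();
            if kept < line.len() {
                let span = Span { start: offset + kept, end: offset + line.len() };
                issues.push(Issue {
                    code: self.code(),
                    message: "trailing whitespace".to_string(),
                    span,
                    // Deleting trailing whitespace cannot change query semantics.
                    fix: Some(Fix { span, replacement: String::new(), safety: FixSafety::Safe }),
                });
            }
            offset += line.len() + 1; // +1 for the '\n' removed by split
        }
        issues
    }
}

/// Apply only safe fixes, from the highest start offset down, assuming no overlaps.
fn apply_safe_fixes(sql: &str, issues: &[Issue]) -> String {
    let mut fixes: Vec<&Fix> = issues
        .iter()
        .filter_map(|i| i.fix.as_ref())
        .filter(|f| f.safety == FixSafety::Safe)
        .collect();
    fixes.sort_by(|a, b| b.span.start.cmp(&a.span.start));
    let mut out = sql.to_string();
    for fix in fixes {
        out.replace_range(fix.span.start..fix.span.end, &fix.replacement);
    }
    out
}

fn main() {
    let sql = "SELECT id  \nFROM users\t\n";
    let issues = TrailingWhitespace.check(sql);
    println!("{issues:?}");
    println!("{:?}", apply_safe_fixes(sql, &issues));
}
```

Fixes are applied from the end of the text backward, so removing characters never shifts the offsets of spans that have not been applied yet.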

Testing

cargo test

License

Apache 2.0