Analysis Architecture
=====================
SqC uses a multi-pass analysis architecture::

    Source Files
         |
         v
    [Tree-sitter Parser] --> AST (per-file)
         |
         v
    [Pre-scan Pass]      --> Cross-file context (function defs, summaries, macros,
         |                   struct types, global states, call graph, call-site args)
         v
    [CFG Construction]   --> Per-function control-flow graphs
         |
         v
    [Dataflow Analysis]  --> Null state, value range, reaching defs, init state
         |
         v
    [Rule Evaluation]    --> 285 CERT C rules applied to AST + CFG + context
         |
         v
    [Suppression Filter] --> Hash-based + wildcard (glob/prefix) suppression
         |
         v
    [Export]             --> CSV, XLSX, JSON, SARIF

Analysis Modules
----------------
**Tree-sitter parsing** (``src/analyze/mod.rs``).
Fast, incremental, error-tolerant C parsing. Each ``.c`` file is parsed into
an AST; the orchestrator coordinates prescan, CFG construction, dataflow, and
per-rule evaluation with optional Rayon parallelism.
**Cross-file pre-scan** (``src/analyze/prescan.rs``, ``context.rs``).
Walks ``-d`` directories collecting function definitions, header prototypes,
function summaries, call graphs, macro constants/aliases, struct field types,
global constants, and global pointer null states. A second pass aggregates
call-site argument null states and propagates transitive frees through
parameter pass-through chains (at most 8 iterations). Results are stored in
``ProjectContext`` and can optionally be cached in binary form
(``--save-prescan`` / ``--load-prescan``). Consumed by 15+ rules.
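
The transitive free propagation described above can be pictured as a capped fixed-point iteration. The following is only a minimal sketch under stated assumptions: ``Summary``, ``passthroughs`` and ``propagate_frees`` are hypothetical names and shapes chosen for illustration, not the actual contents of ``prescan.rs``.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, simplified summary: which parameter indices a function
/// frees, and which of its parameters it forwards to other functions.
#[derive(Default, Clone)]
pub struct Summary {
    pub frees_params: BTreeSet<usize>,
    /// (own parameter index, callee name, callee parameter index)
    pub passthroughs: Vec<(usize, String, usize)>,
}

/// Propagate "frees parameter" facts from callees to callers until nothing
/// changes or `max_iters` rounds have run. Returns the number of rounds run.
pub fn propagate_frees(summaries: &mut HashMap<String, Summary>, max_iters: usize) -> usize {
    let mut rounds = 0;
    for _ in 0..max_iters {
        rounds += 1;
        // Read callee facts from a snapshot so each round advances one call level.
        let snapshot = summaries.clone();
        let mut changed = false;
        for summary in summaries.values_mut() {
            for (own, callee, callee_param) in summary.passthroughs.clone() {
                let callee_frees = snapshot
                    .get(&callee)
                    .map_or(false, |s| s.frees_params.contains(&callee_param));
                if callee_frees && summary.frees_params.insert(own) {
                    changed = true;
                }
            }
        }
        if !changed {
            break;
        }
    }
    rounds
}
```

With a cap of 8, a wrapper chain deeper than the cap would simply stop propagating early, which trades some recall for a bounded pre-scan time.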
**Function summaries** (``src/analyze/function_summary.rs``).
Lightweight inter-procedural summaries computed during prescan:
``frees_params``, ``can_return_null``, ``returns_allocation``,
``checks_null_params``, ``modifies_params``, ``dereferences_params``,
``never_returns``, ``callsite_param_null_states`` (aggregated from all call
sites), ``callsite_param_field_null_states`` (struct field propagation),
``callsite_param_pointee_null_states`` (pointer-to-pointer propagation),
``return_range`` (VRA inter-procedural), ``param_passthroughs`` (transitive
free tracking). Consumed by 7 rules.
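
To make the summary fields more concrete, here is a hedged sketch of how a rule might query a subset of them. The field names follow the list above; the field types and the two helper methods are assumptions made for illustration and may not match ``function_summary.rs``.

```rust
/// Hypothetical subset of a per-function summary. Field names follow the
/// documentation; the types are assumptions.
#[derive(Debug, Default, Clone)]
pub struct FunctionSummary {
    pub frees_params: Vec<usize>,
    pub checks_null_params: Vec<usize>,
    pub dereferences_params: Vec<usize>,
    pub can_return_null: bool,
    pub never_returns: bool,
}

impl FunctionSummary {
    /// Passing a possibly-null argument at `index` is risky when the callee
    /// dereferences that parameter and is not known to null-check it.
    pub fn unsafe_for_nullable_arg(&self, index: usize) -> bool {
        self.dereferences_params.contains(&index) && !self.checks_null_params.contains(&index)
    }

    /// The caller should check the result before dereferencing it.
    pub fn result_needs_null_check(&self) -> bool {
        self.can_return_null && !self.never_returns
    }
}
```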
**Control-flow graphs** (``src/analyze/cfg.rs``).
Per-function CFG with basic blocks, typed edges (Fallthrough, TrueBranch,
FalseBranch, BackEdge, Return, Break, Continue, Goto), and
``condition_range`` metadata for path-sensitive edge refinement. Optional
macro-constant-aware construction for dead-branch elimination. Consumed by
8 rules (INT30/31/32/33/34-C, EXP33/34-C, MEM01-C).
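
A CFG with typed edges can be sketched as below. The edge kinds are the ones listed above; everything else (``Cfg``, ``Edge``, the ``(i64, i64)`` shape of ``condition_range``, and ``reachable_from``) is a hypothetical simplification rather than the layout used in ``cfg.rs``. The reachability helper illustrates why dropping an edge for a constant-false branch makes its target block dead.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeKind {
    Fallthrough,
    TrueBranch,
    FalseBranch,
    BackEdge,
    Return,
    Break,
    Continue,
    Goto,
}

#[derive(Debug, Clone)]
pub struct Edge {
    pub from: usize,
    pub to: usize,
    pub kind: EdgeKind,
    /// Inclusive range implied by the branch condition (assumed shape).
    pub condition_range: Option<(i64, i64)>,
}

#[derive(Debug, Default)]
pub struct Cfg {
    pub block_count: usize,
    pub edges: Vec<Edge>,
}

impl Cfg {
    /// Allocate a new basic block and return its index.
    pub fn add_block(&mut self) -> usize {
        self.block_count += 1;
        self.block_count - 1
    }

    pub fn add_edge(&mut self, from: usize, to: usize, kind: EdgeKind, condition_range: Option<(i64, i64)>) {
        self.edges.push(Edge { from, to, kind, condition_range });
    }

    pub fn successors(&self, block: usize) -> impl Iterator<Item = &Edge> + '_ {
        self.edges.iter().filter(move |e| e.from == block)
    }

    /// Depth-first reachability from `entry`; unreachable blocks stay `false`.
    pub fn reachable_from(&self, entry: usize) -> Vec<bool> {
        let mut seen = vec![false; self.block_count];
        let mut stack = vec![entry];
        while let Some(b) = stack.pop() {
            if b >= self.block_count || seen[b] {
                continue;
            }
            seen[b] = true;
            stack.extend(self.successors(b).map(|e| e.to));
        }
        seen
    }
}
```

For ``if (x > 0)``, the true edge would carry ``Some((1, i64::MAX))`` and the false edge ``Some((i64::MIN, 0))`` under this representation.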
**Null state dataflow** (``src/analyze/null_state.rs``).
Forward dataflow on CFG with NullState lattice (Unknown → DefinitelyNull /
PossiblyNull / NotNull). Edge refinement on branch conditions supports
compound ``||`` / ``&&`` expressions. Seeded from global pointer states,
call-site parameter states, and function summaries. Primary consumer:
EXP34-C; also used by API00-C.
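
The lattice join and the edge refinement for compound conditions can be illustrated as follows. This is a sketch, not the code in ``null_state.rs``: the join rule (``Unknown`` as identity, disagreement becomes ``PossiblyNull``) and the ``Cond`` type are assumptions made for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NullState {
    Unknown,
    DefinitelyNull,
    PossiblyNull,
    NotNull,
}

/// Merge states where two CFG paths meet (assumed join: `Unknown` is the
/// identity, equal states are kept, any disagreement becomes `PossiblyNull`).
pub fn join(a: NullState, b: NullState) -> NullState {
    match (a, b) {
        (NullState::Unknown, x) | (x, NullState::Unknown) => x,
        (x, y) if x == y => x,
        _ => NullState::PossiblyNull,
    }
}

/// Hypothetical, simplified branch condition over pointer variables.
pub enum Cond {
    IsNull(String),
    NonNull(String),
    And(Box<Cond>, Box<Cond>),
    Or(Box<Cond>, Box<Cond>),
}

/// Refine `env` along the edge where `cond` evaluated to `taken`.
pub fn refine(cond: &Cond, taken: bool, env: &mut HashMap<String, NullState>) {
    match cond {
        Cond::IsNull(v) => {
            let s = if taken { NullState::DefinitelyNull } else { NullState::NotNull };
            env.insert(v.clone(), s);
        }
        Cond::NonNull(v) => {
            let s = if taken { NullState::NotNull } else { NullState::DefinitelyNull };
            env.insert(v.clone(), s);
        }
        // `a && b` being true means both operands were true.
        Cond::And(l, r) if taken => {
            refine(l, true, env);
            refine(r, true, env);
        }
        // `a || b` being false means both operands were false.
        Cond::Or(l, r) if !taken => {
            refine(l, false, env);
            refine(r, false, env);
        }
        // Otherwise either operand may be responsible, so nothing is learned.
        Cond::And(..) | Cond::Or(..) => {}
    }
}
```

For ``if (p == NULL || q == NULL) return;`` the false edge refines both ``p`` and ``q`` to ``NotNull``, which is the common early-return guard pattern.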
**Value range analysis** (``src/analyze/value_range.rs``).
CFG-based forward value-range dataflow for integer variables. Tracks
``TypedRange`` (interval + signedness/bit-width) per variable. Handles
sequential assignments, conditional narrowing, loop bounds, and early-return
guards. Inter-procedural return ranges from function summaries. Consumed
by INT30/31/32/33/34-C.
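
Conditional narrowing on a typed interval might look like the sketch below. ``TypedRange`` is named in the text, but its fields and every method here are assumptions for illustration; ``bits`` is assumed to be between 1 and 64.

```rust
/// Hypothetical interval with signedness and bit-width (bits in 1..=64).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TypedRange {
    pub lo: i128,
    pub hi: i128,
    pub signed: bool,
    pub bits: u32,
}

impl TypedRange {
    /// Full representable range of the type.
    pub fn full(signed: bool, bits: u32) -> Self {
        let (lo, hi) = if signed {
            (-(1i128 << (bits - 1)), (1i128 << (bits - 1)) - 1)
        } else {
            (0, (1i128 << bits) - 1)
        };
        TypedRange { lo, hi, signed, bits }
    }

    /// Range on the true edge of `x < bound`.
    pub fn narrow_lt(self, bound: i128) -> Self {
        TypedRange { hi: self.hi.min(bound - 1), ..self }
    }

    /// Range on the true edge of `x >= bound`.
    pub fn narrow_ge(self, bound: i128) -> Self {
        TypedRange { lo: self.lo.max(bound), ..self }
    }

    /// Merge ranges where two paths meet.
    pub fn join(self, other: Self) -> Self {
        TypedRange { lo: self.lo.min(other.lo), hi: self.hi.max(other.hi), ..self }
    }

    /// True when `self + other` can leave the representable range of the type.
    pub fn add_may_overflow(self, other: Self) -> bool {
        let full = Self::full(self.signed, self.bits);
        self.lo + other.lo < full.lo || self.hi + other.hi > full.hi
    }
}
```

Under this model, an ``int x`` guarded by ``if (x >= 0 && x < 100)`` narrows to ``[0, 99]``, so ``x + 1`` is not flagged, while the same addition on an unguarded ``x`` is.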
**Constant evaluation** (``src/analyze/const_eval.rs``).
Syntactic constant folding of ``#define`` macro constants and arithmetic
expressions. Includes built-in C99 ``<limits.h>``/``<stdint.h>`` macros
(LP64 model). ``try_evaluate_range()`` computes value ranges from
constants + variables + loop bounds via ancestor walks. Consumed by 11
rules (INT, ENV, ERR, FIO, FLP, STR families).
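
Syntactic folding of macro constants can be sketched as a small recursive evaluator. The ``Expr`` type, the ``fold`` function, and the depth guard are hypothetical and only illustrate the idea; they are not the API of ``const_eval.rs``.

```rust
use std::collections::HashMap;

/// Hypothetical expression form for `#define` bodies.
pub enum Expr {
    Num(i64),
    Macro(String),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Fold `expr` to a constant, resolving macro names through `macros`.
/// Returns `None` for unknown macros, arithmetic overflow, or when the
/// recursion budget `depth` runs out (e.g. a self-referential macro).
pub fn fold(expr: &Expr, macros: &HashMap<String, Expr>, depth: u32) -> Option<i64> {
    if depth == 0 {
        return None;
    }
    let d = depth - 1;
    match expr {
        Expr::Num(n) => Some(*n),
        Expr::Macro(name) => fold(macros.get(name)?, macros, d),
        Expr::Add(a, b) => fold(a, macros, d)?.checked_add(fold(b, macros, d)?),
        Expr::Sub(a, b) => fold(a, macros, d)?.checked_sub(fold(b, macros, d)?),
        Expr::Mul(a, b) => fold(a, macros, d)?.checked_mul(fold(b, macros, d)?),
    }
}
```

Built-in limits could be modeled by pre-seeding the table, for example ``INT_MAX`` as ``Expr::Num(2147483647)`` under the LP64 model.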
**Reaching definitions** (``src/analyze/dataflow.rs``).
Standard iterative worklist algorithm computing which definitions
(Declaration, Assignment, Parameter, NullAssignment, FreeCall, NullableCall)
reach each program point. Supports use-after-free and null dereference
queries. Primary consumer: MEM01-C.
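
For readers unfamiliar with the worklist formulation, the sketch below shows the textbook gen/kill version of reaching definitions. The ``Block`` layout and the use of plain integer definition ids are assumptions; ``dataflow.rs`` tracks richer definition kinds.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical basic block: definitions it creates, definitions it
/// overwrites, and the indices of its successor blocks.
pub struct Block {
    pub defs: BTreeSet<usize>,
    pub kills: BTreeSet<usize>,
    pub succs: Vec<usize>,
}

/// Returns, for every block, the set of definitions that reach its entry.
pub fn reaching_definitions(blocks: &[Block]) -> Vec<BTreeSet<usize>> {
    let mut in_sets: Vec<BTreeSet<usize>> = vec![BTreeSet::new(); blocks.len()];
    let mut worklist: VecDeque<usize> = (0..blocks.len()).collect();
    while let Some(b) = worklist.pop_front() {
        // OUT[b] = defs[b] union (IN[b] minus kills[b])
        let mut out: BTreeSet<usize> = in_sets[b].difference(&blocks[b].kills).copied().collect();
        out.extend(blocks[b].defs.iter().copied());
        for &s in &blocks[b].succs {
            let before = in_sets[s].len();
            in_sets[s].extend(out.iter().copied());
            // Re-queue a successor only when its IN set actually grew.
            if in_sets[s].len() != before && !worklist.contains(&s) {
                worklist.push_back(s);
            }
        }
    }
    in_sets
}
```

A use-after-free query then amounts to asking whether a free-call definition of a pointer reaches a block that dereferences it.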
**Initialization state** (``src/analyze/init_state.rs``).
Forward dataflow tracking initialization status with malloc-aware semantics
(Uninitialized, MaybeUninitialized, Initialized, MallocUninitialized,
MallocInitialized). Detects partial-init patterns in loops. Primary
consumer: EXP33-C.
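
The five states can be pictured with a small transfer function and join. The state names come from the text; the ``Event`` type, the transfer rules, and the join are assumptions for illustration. In particular, treating a single write through the pointer as fully initializing a ``malloc`` buffer is a deliberate simplification, not a claim about ``init_state.rs``.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InitState {
    Uninitialized,
    MaybeUninitialized,
    Initialized,
    MallocUninitialized,
    MallocInitialized,
}

/// Hypothetical events that affect one variable.
#[derive(Debug, Clone, Copy)]
pub enum Event {
    DeclareNoInit,
    Assign,
    Malloc,
    Calloc,
    WriteThrough,
}

/// State of the variable after `event` (assumed rules).
pub fn transfer(state: InitState, event: Event) -> InitState {
    use InitState::*;
    match (state, event) {
        (_, Event::DeclareNoInit) => Uninitialized,
        (_, Event::Assign) => Initialized,
        (_, Event::Malloc) => MallocUninitialized,
        (_, Event::Calloc) => MallocInitialized,
        (MallocUninitialized, Event::WriteThrough) => MallocInitialized,
        (other, Event::WriteThrough) => other,
    }
}

/// Merge states where two paths meet (assumed: disagreement is treated
/// conservatively as `MaybeUninitialized`).
pub fn join(a: InitState, b: InitState) -> InitState {
    use InitState::*;
    if a == b {
        return a;
    }
    match (a, b) {
        (Initialized, MallocInitialized) | (MallocInitialized, Initialized) => Initialized,
        _ => MaybeUninitialized,
    }
}
```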
**Standard function database** (``src/utility/cert_c/std_functions.rs``).
~370 C11, POSIX, and Windows API functions recognized to suppress false
positives on standard library calls (DCL31-C, DCL07-C).
**Suppression system** (``src/analyze/suppression.rs``).
Inline ``// SQC-SUPPRESS`` comments and ``.sqc-suppress.toml`` files.
SHA-256 hash-based point suppressions and glob/prefix wildcard suppressions.
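
Wildcard matching of the kind used for glob/prefix suppressions can be sketched as below. Only ``*`` is supported here, and the ``Suppression`` struct and ``is_suppressed`` helper are hypothetical; the real pattern syntax and the ``.sqc-suppress.toml`` schema are not reproduced.

```rust
/// Match `text` against `pattern`, where `*` matches any run of characters
/// (including none) and every other character matches itself.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Position of the last `*` seen and the text index it currently covers up to.
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Hypothetical wildcard suppression entry.
pub struct Suppression {
    pub rule: String,
    pub path: String,
}

/// A finding is suppressed when both its rule id and file path match an entry.
pub fn is_suppressed(rule_id: &str, file: &str, list: &[Suppression]) -> bool {
    list.iter()
        .any(|s| glob_match(&s.rule, rule_id) && glob_match(&s.path, file))
}
```

A prefix suppression is the special case of a pattern that ends in a single ``*``, such as ``INT3*``.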
Current Capabilities
--------------------
==================================== =====================================================
Capability                           Implementation
==================================== =====================================================
Local variable/type inference        Per-function ``collect_variable_types``
Preprocessor block traversal         ``preproc_*`` node recursion
Standard function database           ~370 C11/POSIX/Windows functions
Cross-file function scanning         ``-d`` flag pre-scan with binary cache
CFG construction                     Per-function with ``condition_range`` metadata
Reaching definitions                 Iterative worklist dataflow (MEM01-C)
Inter-procedural summaries           Null returns, freed params, no-return, return
                                     ranges, dereferences, pass-throughs
CFG-based null state dataflow        Forward dataflow with NullState lattice, compound
                                     condition support, global/call-site seeding
Value range analysis                 CFG-based forward dataflow, inter-procedural
                                     return ranges, type-aware intervals
Initialization state analysis        Forward dataflow with malloc-aware semantics
Constant evaluation                  Macro resolution, built-in limits, sizeof types
Call-site null propagation           Aggregated argument states across all callers
Transitive free propagation          Parameter pass-through chains (MEM31-C)
Global pointer null state            Cross-file extern pointer tracking (EXP34-C)
Struct field type resolution         Prescan-collected struct definitions
Taint tracking                       Intra-function (FIO30-C, STR02-C)
Dead-branch elimination              Macro-constant-aware CFG construction
==================================== =====================================================
Known Limitations
-----------------
============================== ====================================================
Gap                            Impact
============================== ====================================================
No preprocessor expansion      Macros appear as function calls; partially mitigated
                               by ``collect_macro_aliases``
No alias analysis              Pointer aliasing unresolved; field-scoped alias
                               collection causes cross-function issues
No symbolic execution          Complex path conditions not evaluated
No SSA form                    No use-def chains beyond reaching definitions
VRA intra-procedural only      Inter-procedural argument ranges and field-sensitive
                               VRA not implemented; return ranges available
Limited taint tracking         Intra-function only (STR02-C, FIO30-C);
                               cross-function taint for injection CWEs planned
Struct field tracking limited  Prescan-visible structs only (INT32-C/INT30-C);
                               no field-level free or null tracking
No ownership model             Cross-function memory ownership untracked;
                               limits MEM31-C/MEM30-C precision
============================== ====================================================
Architectural Ceiling
---------------------
Current TP rate: **67.5%** (Juliet, v0.3.119, 74 CWEs). The remaining gaps
are concentrated in CWEs requiring deeper analysis:
- **CWE-190/191** (integer overflow/underflow): 60.9%/55.3% vs clang-tidy 94%.
Requires more complete value-range propagation and bounds-check recognition.
- **CWE-369** (divide by zero): 56.0% vs clang-tidy 94.7%.
Requires stronger zero-value tracking through assignments.
- **CWE-476** (null dereference): 61.9% vs clang-tidy 94.3%.
Requires deeper inter-procedural null propagation and alias analysis.
- **CWE-121** (stack buffer overflow): 57.5% vs clang-tidy 86.6%.
Requires symbolic buffer size tracking across assignments.
Alias analysis and field-sensitive value tracking are the two capabilities
most likely to lift the ceiling. Each would require significant
architectural investment but could push TP rate toward 75%+.
Competitor Landscape
--------------------
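Five-tool comparison on 15 overlapping Juliet CWEs (28,488 files). The SqC row
reports the full 74-CWE benchmark instead, so it is not a strict like-for-like
comparison: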
============== ============== ========= ==================================== ===========
Tool           Detection Rate FP Rate   Analysis Depth                       Price
============== ============== ========= ==================================== ===========
clang-tidy     91.6%          0.8%      AST + path-sensitive                 Free
**SqC**        67.5%          32.5%     AST + CFG + inter-procedural         --
Frama-C        61.0%          39.0%     Abstract interpretation              Free
Infer          43.6%          56.4%     Separation logic                     Free
cppcheck       36.4%          63.6%     Data-flow                            Free
============== ============== ========= ==================================== ===========
*SqC results from v0.3.119 Juliet benchmark (74 CWEs). Competitor figures from
prior study on 15 overlapping CWEs.*
SqC achieves 100% precision (zero FP) on 34 CWEs, including CWE-690, CWE-761,
CWE-78, and CWE-416. It also has the broadest CWE coverage in the comparison
(74+ CWEs benchmarked vs clang-tidy's 15).
**Key context**: Tools on average find ~20% of weaknesses in Juliet
(ISSTA2022). Even commercial tools miss 27% (Goseva2015). Industry FP target
for adoption is 10--20%. See :doc:`bibliography` for full references.