# ADA_Standards Library - Complete Documentation Guide
This guide provides a comprehensive overview of the ADA_Standards library's architecture, usage patterns, and best practices.
## Table of Contents
1. [Architecture Overview](#architecture-overview)
2. [Core Concepts](#core-concepts)
3. [Complete Workflow](#complete-workflow)
4. [API Reference Quick Guide](#api-reference-quick-guide)
5. [Common Patterns](#common-patterns)
6. [Performance Considerations](#performance-considerations)
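7. [Troubleshooting](#troubleshooting)
8. [Best Practices](#best-practices)
9. [Extending the Library](#extending-the-library)
10. [Conclusion](#conclusion)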
## Architecture Overview
### Design Philosophy
The library uses a **three-phase approach**:
1. **Extraction Phase**: Regex-based pattern matching identifies Ada constructs
2. **Building Phase**: Stack-based algorithm establishes parent-child relationships
3. **Post-Processing Phase**: Enriches the tree with derived data
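To make the Building Phase concrete, here is a minimal std-only sketch of a stack-based nesting pass. It assumes each construct has already been reduced to a hypothetical `Span` (a name plus start and end line, i.e. after end statements have been associated) and that the spans are sorted by start line. It illustrates the general technique, not the library's actual `build()` code.

```rust
/// Hypothetical, simplified view of an extracted construct: just its
/// line range. The real library works on `NodeData` instead.
#[derive(Debug, Clone)]
pub struct Span {
    pub name: &'static str,
    pub start_line: usize,
    pub end_line: usize,
}

/// Returns, for each span, the index of its parent span (None = top level).
/// Assumes `spans` is sorted by `start_line` and that ranges nest properly.
pub fn assign_parents(spans: &[Span]) -> Vec<Option<usize>> {
    let mut parents = vec![None; spans.len()];
    // Indices of spans that are still "open" at the current position.
    let mut stack: Vec<usize> = Vec::new();

    for (i, span) in spans.iter().enumerate() {
        // Close every open span that ends before this one starts.
        while let Some(&top) = stack.last() {
            if spans[top].end_line < span.start_line {
                stack.pop();
            } else {
                break;
            }
        }
        // Whatever remains on top of the stack encloses the current span.
        parents[i] = stack.last().copied();
        stack.push(i);
    }
    parents
}
```

After sorting, the whole pass is a single linear scan: each span is pushed and popped at most once.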
### Key Design Decisions
- **Regex over Full Parsing**: Lightweight and fast for analysis tasks
- **Position Preservation**: All cleaning operations maintain character positions
- **Flat-then-Hierarchical**: Extract flat list first, build tree second
- **indextree Arena**: Efficient tree storage with O(1) node access
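Position preservation is what lets line numbers and byte offsets computed on cleaned text point straight back at the original source. The following is a minimal std-only sketch of the idea, assuming only Ada `--` comments and double-quoted string literals need blanking; the library's `clean_code()` also normalizes tabs and may differ in detail. The function name `blank_comments_and_strings` is hypothetical.

```rust
/// Blank out Ada `--` comments and the contents of string literals while
/// preserving byte offsets: each removed character becomes as many spaces
/// as it occupied in UTF-8, and newlines are always kept.
/// Simplification: character literals such as '"' are not recognized.
pub fn blank_comments_and_strings(src: &str) -> String {
    let mut out = String::with_capacity(src.len());
    let mut chars = src.chars().peekable();
    let mut in_string = false;
    let mut in_comment = false;

    while let Some(c) = chars.next() {
        if c == '\n' {
            // Newlines survive, so line numbers are unchanged. Ada comments
            // and string literals cannot span lines, so both states end here.
            in_comment = false;
            in_string = false;
            out.push('\n');
        } else if in_comment {
            push_blank(&mut out, c);
        } else if in_string {
            if c == '"' {
                if chars.peek() == Some(&'"') {
                    // `""` is an escaped quote inside an Ada string.
                    chars.next();
                    out.push_str("  ");
                } else {
                    // Keep the closing quote so the literal stays visible.
                    in_string = false;
                    out.push('"');
                }
            } else {
                push_blank(&mut out, c);
            }
        } else if c == '"' {
            in_string = true;
            out.push('"');
        } else if c == '-' && chars.peek() == Some(&'-') {
            chars.next();
            in_comment = true;
            out.push_str("  ");
        } else {
            out.push(c);
        }
    }
    out
}

/// Replace one character with spaces covering its full UTF-8 width.
fn push_blank(out: &mut String, c: char) {
    for _ in 0..c.len_utf8() {
        out.push(' ');
    }
}
```

Because byte length and newline positions are unchanged, an offset found by a regex on the cleaned text is also valid in the original file.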
## Core Concepts
### 1. NodeData - The Universal Node
`NodeData` is designed to represent **any** Ada construct through flexible fields:
```rust
pub struct NodeData {
    // Common fields (always used)
    name: String,                         // Identifier
    node_type: String,                    // Classification
    start_line: Option<usize>,
    end_line: Option<usize>,
    start_index: Option<usize>,
    end_index: Option<usize>,

    // Specialized fields (used by specific node types)
    is_body: Option<bool>,                // For packages, procedures
    arguments: Option<Vec<ArgumentData>>, // For subprograms
    conditions: Option<ConditionExpr>,    // For control flow
    cases: Option<Vec<String>>,           // For case statements
    // ... and many more
}
```
**Key Insight**: Not all fields are relevant for all node types. A `VariableDeclaration` won't use `arguments`, and an `IfStatement` won't use `tuple_values`.
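Because the specialized fields are `Option`s, analysis code should treat a missing value as normal rather than calling `unwrap()`. The sketch below uses `MiniNode`, a hypothetical, trimmed-down stand-in for `NodeData` (three fields only, with `arguments` simplified to `Vec<String>` instead of `Vec<ArgumentData>`), to show the pattern.

```rust
/// Hypothetical stand-in for `NodeData`; the real struct has many more
/// fields and stores arguments as `ArgumentData`.
pub struct MiniNode {
    pub node_type: String,
    pub is_body: Option<bool>,
    pub arguments: Option<Vec<String>>,
}

/// Number of arguments, treating "field not populated" as zero
/// instead of panicking.
pub fn argument_count(node: &MiniNode) -> usize {
    node.arguments.as_ref().map_or(0, |args| args.len())
}

/// Describe a node, distinguishing spec, body, and "not applicable".
pub fn describe(node: &MiniNode) -> String {
    match node.is_body {
        Some(true) => format!("{} (body)", node.node_type),
        Some(false) => format!("{} (spec)", node.node_type),
        None => node.node_type.clone(),
    }
}
```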
### 2. Expression Trees
Conditions are parsed into hierarchical trees that respect precedence:
```
"X > 10 and Y < 20"
         |
         v
       [AND]
       /   \
      /     \
    [>]     [<]
    / \     / \
   X  10   Y  20
```
Each node in the tree is an `Expression` enum variant:
- `Expression::Binary` - Binary operations
- `Expression::Unary` - Unary operations
- `Expression::Membership` - Membership tests
- `Expression::Literal` - Atomic values
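To make the precedence example concrete, here is a self-contained sketch of an expression tree for `X > 10 and Y < 20`. The variant names mirror the list above, but their payloads (operator strings, boxed operands, a list of choices for membership tests) are assumptions for illustration; the library's actual `Expression` definition may differ.

```rust
/// Hypothetical expression tree: variant names follow this guide,
/// field layouts are assumptions.
#[derive(Debug)]
pub enum Expression {
    Binary { op: String, left: Box<Expression>, right: Box<Expression> },
    Unary { op: String, operand: Box<Expression> },
    Membership { value: Box<Expression>, negated: bool, choices: Vec<String> },
    Literal(String),
}

fn lit(s: &str) -> Box<Expression> {
    Box::new(Expression::Literal(s.to_string()))
}

fn bin(op: &str, left: Box<Expression>, right: Box<Expression>) -> Box<Expression> {
    Box::new(Expression::Binary { op: op.to_string(), left, right })
}

/// Render the tree fully parenthesized, which makes precedence explicit.
pub fn render(e: &Expression) -> String {
    match e {
        Expression::Binary { op, left, right } => {
            format!("({} {} {})", render(left), op, render(right))
        }
        Expression::Unary { op, operand } => format!("({} {})", op, render(operand)),
        Expression::Membership { value, negated, choices } => {
            let kw = if *negated { "not in" } else { "in" };
            format!("({} {} {})", render(value), kw, choices.join(" | "))
        }
        Expression::Literal(s) => s.clone(),
    }
}

/// Tree for "X > 10 and Y < 20": `and` binds more loosely than the
/// relational operators, so it becomes the root.
pub fn example_tree() -> Box<Expression> {
    bin("and", bin(">", lit("X"), lit("10")), bin("<", lit("Y"), lit("20")))
}
```

`render(&example_tree())` produces `((X > 10) and (Y < 20))`, matching the diagram above.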
### 3. The Arena Pattern
The AST uses `indextree::Arena` for efficient tree storage:
```rust
pub struct AST {
    arena: Arena<NodeData>, // Owns all nodes
    root_id: NodeId,        // Entry point
    // ...
}
```
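**Benefits**:
- O(1) node access by ID
- Automatic memory management: all nodes are freed together when the arena is dropped
- Safe parent-child traversal without `Rc`/`RefCell` or raw pointers
- No per-node boxing: nodes live in one contiguous vector instead of individually allocated, recursively linked structures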
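The arena idea itself can be shown without `indextree`. The following std-only sketch is deliberately simpler than `indextree` (which also tracks siblings and supports detaching and removing nodes): nodes live in one `Vec`, an `Id` is just an index into it, and parent links are stored indices, so lookup by ID is a plain O(1) vector access. `Id`, `Node`, and `Arena` here are illustrative types, not the crate's.

```rust
/// Index into the arena; a simplified stand-in for `indextree::NodeId`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Id(usize);

pub struct Node<T> {
    pub data: T,
    pub parent: Option<Id>,
    pub children: Vec<Id>,
}

/// Minimal arena: every node lives in one Vec owned by the arena.
pub struct Arena<T> {
    nodes: Vec<Node<T>>,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Arena { nodes: Vec::new() }
    }

    /// Add a detached node and return its id.
    pub fn new_node(&mut self, data: T) -> Id {
        self.nodes.push(Node { data, parent: None, children: Vec::new() });
        Id(self.nodes.len() - 1)
    }

    /// Attach `child` under `parent` (both ids must come from this arena).
    pub fn append(&mut self, parent: Id, child: Id) {
        self.nodes[child.0].parent = Some(parent);
        self.nodes[parent.0].children.push(child);
    }

    /// O(1) lookup by id; None for an id this arena never issued.
    pub fn get(&self, id: Id) -> Option<&Node<T>> {
        self.nodes.get(id.0)
    }

    /// Pre-order depth-first traversal starting at, and including, `root`.
    pub fn descendants(&self, root: Id) -> Vec<Id> {
        let mut out = Vec::new();
        let mut stack = vec![root];
        while let Some(id) = stack.pop() {
            out.push(id);
            // Push children in reverse so the first child is visited first.
            for &child in self.nodes[id.0].children.iter().rev() {
                stack.push(child);
            }
        }
        out
    }
}
```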
## Complete Workflow
### Step-by-Step Process
```rust
use ADA_Standards::{AST, ASTError};
use std::fs;

fn analyze_ada_file(path: &str) -> Result<(), ASTError> {
    // STEP 1: Read source file
    // (I/O errors are not ASTErrors, so this example simply panics on them)
    let raw_code = fs::read_to_string(path)
        .expect("Failed to read file");

    // STEP 2: Clean the code
    // - Removes comments (replaced with spaces)
    // - Cleans string contents (replaced with spaces)
    // - Normalizes tabs to spaces
    let cleaned_code = AST::clean_code(&raw_code);

    // STEP 3: Extract all nodes
    // Runs all extractor functions to find constructs
    let nodes = AST::extract_all_nodes(&cleaned_code)?;

    // STEP 4: Create AST instance
    // Nodes are stored, but the tree isn't built yet
    let mut ast = AST::new(nodes);

    // STEP 5: Build the tree
    // - Sorts nodes by position
    // - Populates arena
    // - Associates end statements
    // - Establishes parent-child relationships
    ast.build(&cleaned_code)?;

    // STEP 6: Post-processing (optional but recommended)
    // - Extracts case alternatives
    // - Parses exit-when conditions
    ast.populate_cases(&cleaned_code)?;
    ast.populate_simple_loop_conditions(&cleaned_code)?;

    // STEP 7: Analyze!
    // Now you can traverse and query the tree
    ast.print_tree();

    Ok(())
}
```
### Why This Order?
1. **Clean First**: Prevents regex from matching inside strings/comments
2. **Extract Before Build**: Allows sorting and validation
3. **Build Before Post-Process**: Tree structure needed for context
4. **Post-Process Last**: Requires accurate node positions
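Items 3 and 4 can be made concrete with a hypothetical sketch of case-alternative extraction, the kind of work `populate_cases()` does. It needs a `CaseStatement`'s start and end lines, which only exist once `build()` has associated end statements, and it should run on cleaned code so a `when` inside a comment is never picked up. `when_choices` is an illustrative helper, not the library's implementation.

```rust
/// Collect the choice lists of `when ... =>` alternatives between
/// `start_line` and `end_line` (1-based, inclusive).
/// Simplifications: one alternative per line, lowercase `when` only
/// (Ada keywords are case-insensitive), and nested case statements
/// inside the range are not excluded.
pub fn when_choices(code: &str, start_line: usize, end_line: usize) -> Vec<String> {
    code.lines()
        .enumerate()
        // enumerate() is 0-based; line numbers in the AST are 1-based.
        .filter(|(i, _)| (start_line..=end_line).contains(&(i + 1)))
        .filter_map(|(_, line)| {
            let rest = line.trim_start().strip_prefix("when ")?;
            let arrow = rest.find("=>")?;
            Some(rest[..arrow].trim().to_string())
        })
        .collect()
}
```

Without a correct `end_line`, the range would either miss alternatives or swallow ones belonging to a later `case`, which is why post-processing runs after the tree is built.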
## API Reference Quick Guide
### AST Construction
| Function | Purpose | When to Use |
|---|---|---|
| `AST::clean_code(text)` | Preprocess source | Always, before extraction |
| `AST::extract_all_nodes(text)` | Find all constructs | Standard workflow |
| `AST::extract_packages(text)` | Find only packages | Custom extraction |
| `AST::new(nodes)` | Create AST | After extraction |
| `ast.build(text)` | Build tree | After `new()`, before analysis |
### Post-Processing
| Method | Purpose | When to Use |
|---|---|---|
| `populate_cases(text)` | Extract `when` clauses | For case analysis |
| `populate_simple_loop_conditions(text)` | Extract `exit when` conditions | For loop analysis |
### Querying
| Method | Purpose | Returns |
|---|---|---|
| `find_node_by_name_and_type(name, type)` | Search by criteria | `Option<NodeId>` |
| `root_id()` | Get root node | `NodeId` |
| `arena()` | Access node storage | `&Arena<NodeData>` |
### Traversal Patterns
```rust
// Depth-first (pre-order) traversal from the root; the root itself is yielded first
for node_id in ast.root_id().descendants(ast.arena()) {
    let node = ast.arena().get(node_id).unwrap().get();
    println!("{}", node.name);
}

// Get direct children only
for child_id in some_node_id.children(ast.arena()) {
    // Process child
}

// Get parent (nth(0) is the node itself, nth(1) is its parent)
if let Some(parent_id) = node_id.ancestors(ast.arena()).nth(1) {
    // Process parent
}
// Equivalent: ast.arena().get(node_id).and_then(|n| n.parent())
```
## Common Patterns
### Pattern 1: Find All Procedures
```rust
use ADA_Standards::AST;
use indextree::NodeId;

fn find_all_procedures(ast: &AST) -> Vec<NodeId> {
    ast.root_id()
        .descendants(ast.arena())
        .filter(|&id| {
            let node = ast.arena().get(id).unwrap().get();
            node.node_type == "ProcedureNode"
        })
        .collect()
}
```
### Pattern 2: Check Coding Standard
```rust
fn check_procedure_naming(ast: &AST) {
    for node_id in ast.root_id().descendants(ast.arena()) {
        let node = ast.arena().get(node_id).unwrap().get();
        if node.node_type == "ProcedureNode" && node.is_body == Some(true) {
            // Check that the name starts with an uppercase letter
            // (map_or avoids a panic on an empty name)
            let starts_upper = node.name.chars().next().map_or(false, char::is_uppercase);
            if !starts_upper {
                eprintln!("Warning: Procedure '{}' should start with uppercase", node.name);
            }
        }
    }
}
```
### Pattern 3: Calculate Metrics
```rust
fn calculate_complexity(ast: &AST, node_id: NodeId) -> usize {
    let mut complexity = 1; // Base complexity
    // descendants() also yields node_id itself
    for child_id in node_id.descendants(ast.arena()) {
        let child = ast.arena().get(child_id).unwrap().get();
        match child.node_type.as_str() {
            "IfStatement" | "ElsifStatement" => complexity += 1,
            "CaseStatement" => {
                // One point per `when` alternative (requires populate_cases());
                // McCabe-style counting would add cases.len() - 1 instead
                if let Some(cases) = &child.cases {
                    complexity += cases.len();
                }
            }
            "WhileLoop" | "ForLoop" => complexity += 1,
            _ => {}
        }
    }
    complexity
}
```
### Pattern 4: Extract Documentation
```rust
fn extract_procedure_signatures(ast: &AST) {
    for node_id in ast.root_id().descendants(ast.arena()) {
        let node = ast.arena().get(node_id).unwrap().get();
        // Specs only (is_body == Some(false))
        if node.node_type == "ProcedureNode" && node.is_body == Some(false) {
            print!("procedure {}", node.name);
            if let Some(args) = &node.arguments {
                print!(" (");
                for (i, arg) in args.iter().enumerate() {
                    if i > 0 { print!("; "); }
                    print!("{} : {} {}", arg.name, arg.mode, arg.data_type);
                }
                print!(")");
            }
            println!(";");
        }
    }
}
```
## Performance Considerations
### Memory Usage
- **NodeData Size**: ~800 bytes per node (many `Option` fields)
- **Arena Overhead**: Minimal (a handful of parent, child, and sibling indices per node; no per-node heap allocation)
- **Typical File**: 1000 lines → 100-200 nodes → roughly 80-160 KB of node data
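The per-node figure is an estimate; `std::mem::size_of::<NodeData>()` reports the exact inline size on your platform (assuming the type is importable from the crate), and heap buffers owned by `String`s and `Vec`s come on top of that. The sketch below applies the same measurement to `NodeSample`, a hypothetical subset of the fields listed earlier, to show where the bytes go: on current compilers an `Option<usize>` takes twice the size of a `usize` (16 bytes on 64-bit targets), while `Option<Vec<_>>` fits inside the `Vec` itself thanks to the niche optimization.

```rust
use std::mem::size_of;

/// Hypothetical subset of `NodeData`'s fields, used only to measure layout.
#[allow(dead_code)]
pub struct NodeSample {
    name: String,
    node_type: String,
    start_line: Option<usize>,
    end_line: Option<usize>,
    start_index: Option<usize>,
    end_index: Option<usize>,
    is_body: Option<bool>,
    cases: Option<Vec<String>>,
}

/// Inline bytes for `n` values of type `T` (heap data not counted).
pub fn inline_bytes<T>(n: usize) -> usize {
    n * size_of::<T>()
}
```

As a check on the figures above, `inline_bytes::<[u8; 800]>(200)` gives 160,000 bytes, the upper end of the typical-file range.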
### Time Complexity
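| Operation | Complexity | Notes |
|---|---|---|
| `clean_code` | O(n) | Single regex pass; n = characters |
| `extract_all_nodes` | O(n × m) | n = characters, m = patterns |
| `build` | O(n log n) | Sort + linear scan; n = nodes |
| `find_node_by_name_and_type` | O(n) | Linear search; n = nodes |
| `descendants` | O(n) | Tree traversal; n = nodes in the subtree |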
### Optimization Tips
1. **Reuse Cleaned Code**: Don't call `clean_code()` multiple times
2. **Extract Once**: Call `extract_all_nodes()` once, not individual extractors multiple times
3. **Build Once**: The tree is immutable after building (except post-processing)
4. **Cache Lookups**: If searching repeatedly, cache NodeIds
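Tip 4 can look like this: walk the nodes once, index them by `(name, node_type)`, and answer later lookups from a `HashMap` instead of repeating an O(n) search. To stay self-contained, the sketch takes a slice of `(name, node_type)` pairs and uses each pair's position as a stand-in for `NodeId`; with the real library you would fill the map while iterating `ast.root_id().descendants(ast.arena())`. `NodeIndex` is a hypothetical helper, and lowercasing names is an assumption based on Ada identifiers being case-insensitive.

```rust
use std::collections::HashMap;

/// Lookup table from (lowercased name, node_type) to node ids.
/// `usize` stands in for `indextree::NodeId` in this sketch.
pub struct NodeIndex {
    map: HashMap<(String, String), Vec<usize>>,
}

impl NodeIndex {
    /// Build the index in one O(n) pass; each node's id is its position.
    pub fn from_pairs(nodes: &[(&str, &str)]) -> Self {
        let mut map: HashMap<(String, String), Vec<usize>> = HashMap::new();
        for (id, (name, node_type)) in nodes.iter().enumerate() {
            // Ada identifiers are case-insensitive, so normalize the name.
            map.entry((name.to_lowercase(), node_type.to_string()))
                .or_default()
                .push(id);
        }
        NodeIndex { map }
    }

    /// Average O(1) lookup. Returns every matching id, since overloaded
    /// Ada subprograms can share a name and type.
    pub fn find(&self, name: &str, node_type: &str) -> &[usize] {
        self.map
            .get(&(name.to_lowercase(), node_type.to_string()))
            .map(|ids| ids.as_slice())
            .unwrap_or(&[])
    }
}
```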
## Troubleshooting
### Common Issues
#### Issue: "NodeId not found in arena"
**Cause**: Using a NodeId from a previous AST instance
**Solution**: Rebuild the AST or store node data, not IDs
#### Issue: "Unmatched end statements"
**Cause**: Malformed code or regex not matching properly
**Solution**: Check that code is valid Ada; warnings are usually harmless
#### Issue: "Missing nodes in tree"
**Cause**: Nodes not being pushed onto stack during build
**Solution**: Check if node has `end_line` set (should be None for containers)
#### Issue: "Wrong parent-child relationships"
**Cause**: Incorrect line numbering or end line association
**Solution**: Verify `clean_code()` was called and line counts are accurate
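A quick sanity check for this issue: position-preserving cleaning should leave the byte length and the newline offsets unchanged, so an offset maps to the same line in both texts. The helper names below are illustrative, not part of the library; `line_of` uses the same offset-to-line formula as the extractor example in this guide.

```rust
/// 1-based line number of a byte offset (same formula as
/// `code_text[..start].matches('\n').count() + 1`).
/// Panics if `offset` is past the end or not on a char boundary.
pub fn line_of(text: &str, offset: usize) -> usize {
    text[..offset].matches('\n').count() + 1
}

/// Byte offsets of every newline in `s`.
fn newline_offsets(s: &str) -> Vec<usize> {
    s.match_indices('\n').map(|(i, _)| i).collect()
}

/// True if cleaning preserved positions: same byte length and
/// newlines at exactly the same offsets.
pub fn positions_preserved(raw: &str, cleaned: &str) -> bool {
    raw.len() == cleaned.len() && newline_offsets(raw) == newline_offsets(cleaned)
}
```

If `positions_preserved(&raw_code, &cleaned_code)` is false, line numbers computed on the cleaned text will drift from the source, and parent-child assignment will drift with them.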
### Debugging Techniques
```rust
// Print the entire tree
ast.print_tree();

// Print detailed node info, plus each condition's expression tree
for node_id in ast.root_id().descendants(ast.arena()) {
    let node = ast.arena().get(node_id).unwrap().get();
    node.print_info();

    // `albero` (Italian for "tree") holds the root of the parsed condition
    if let Some(cond) = &node.conditions {
        if let Some(root) = &cond.albero {
            AST::leggitree(root, 0, "Root: ");
        }
    }
}

// Check extraction counts (run on the cleaned code)
let nodes = AST::extract_all_nodes(&cleaned_code)?;
println!("Found {} nodes", nodes.len());
for node in &nodes {
    println!("  {} at line {}", node.node_type, node.start_line.unwrap_or(0));
}
```
## Best Practices
### Do's
- **Always clean code first** before extraction
- **Check for errors** after each major step
- **Use the arena reference** for traversal (don't clone nodes unnecessarily)
- **Cache frequently-used NodeIds** if doing repeated searches
- **Test with real Ada code** to validate your analysis logic
### Don'ts
- **Don't modify nodes after building** (arena is the source of truth)
- **Don't assume all fields are populated** (use pattern matching or if-let)
- **Don't repeat linear searches** (n lookups at O(n) each add up to O(n²) on large files)
- **Don't forget post-processing** if you need cases or loop conditions
- **Don't rely on node_ids vector** after build (use arena instead)
## Extending the Library
### Adding a New Extractor
```rust
pub fn extract_my_construct(code_text: &str) -> Result<Vec<NodeData>, ASTError> {
    // The pattern must define the named group `keyword` used below;
    // (?i) because Ada keywords are case-insensitive
    let pattern = Reg::new(r"(?i)\b(?P<keyword>my_keyword)\b")
        .map_err(|_| ASTError::RegexError)?;
    let mut nodes = Vec::new();

    // `Reg` is assumed to be the regex type the library's extractors use;
    // its captures_iter yields a Result per match
    for cap in pattern.captures_iter(code_text) {
        let captures = cap.map_err(|_| ASTError::RegexError)?;
        let keyword = captures.name("keyword").ok_or(ASTError::InvalidCapture)?;

        // 1-based line number and byte offset of the keyword
        let start_line = code_text[..keyword.start()].matches('\n').count() + 1;
        let start_index = keyword.start();

        let node = NodeData::new(
            "MyNode".to_string(),
            "MyNodeType".to_string(),
            Some(start_line),
            Some(start_index),
            false,
        );
        nodes.push(node);
    }
    Ok(nodes)
}
```
### Adding a New Node Type
1. Add a new `node_type` string constant
2. Update `get_end_keyword()` if it has an end statement
3. Add extractor function
4. Call from `extract_all_nodes()`
5. Document the new fields used in `NodeData`
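For step 2, the kind of mapping an end-keyword lookup performs can be sketched as follows. The function name `end_keyword_for`, the `EndKind` enum, and node-type strings not used elsewhere in this guide (`SimpleLoop`, `PackageNode`, `FunctionNode`, plus the closing phrase for `MyNodeType`) are all hypothetical, and the library's `get_end_keyword()` may have a different signature. The point is that each container type must know which Ada `end` form closes it, while constructs without an end statement return `None`.

```rust
/// Which `end` form closes a construct.
#[derive(Debug, PartialEq)]
pub enum EndKind {
    /// Closed by a fixed phrase such as `end if`.
    Keyword(&'static str),
    /// Closed by `end <name>` (or plain `end`), e.g. packages and bodies.
    Named,
}

/// Hypothetical mapping from node_type to its closing form.
pub fn end_keyword_for(node_type: &str, is_body: bool) -> Option<EndKind> {
    match node_type {
        "IfStatement" => Some(EndKind::Keyword("end if")),
        "CaseStatement" => Some(EndKind::Keyword("end case")),
        "WhileLoop" | "ForLoop" | "SimpleLoop" => Some(EndKind::Keyword("end loop")),
        // Package specs and bodies both end with `end Name;`
        "PackageNode" => Some(EndKind::Named),
        // Subprogram specs (`procedure P;`) have no end statement
        "ProcedureNode" | "FunctionNode" if is_body => Some(EndKind::Named),
        // Step 2 for a new construct: register its closing phrase
        "MyNodeType" => Some(EndKind::Keyword("end my_construct")),
        // Declarations, elsif branches, etc. have no end of their own
        _ => None,
    }
}
```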
## Conclusion
The ADA_Standards library provides a flexible, efficient framework for Ada code analysis. By understanding its three-phase architecture and leveraging the indextree arena pattern, you can build powerful linting, metrics, and refactoring tools.
For more examples, see the test suite in `tests/test.rs`, which demonstrates end-to-end usage on a complex Ada file.