Code Analysis and Code2Vec Embeddings
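This module provides statistical approximation techniques for code analysis. Instead of exhaustively exploring program states, which leads to state explosion, it represents code as embeddings of AST paths, following the code2vec approach.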
Architecture
          Source Code
               │
               ▼
        ┌──────────────┐
        │  AST Parser  │
        └──────────────┘
               │
               ▼
┌──────────────────────────────┐
│        Path Extractor        │
│     (leaf-to-leaf paths)     │
└──────────────────────────────┘
               │
               ▼
┌──────────────────────────────┐
│       Code2Vec Encoder       │
│  (path → vector embedding)   │
└──────────────────────────────┘
               │
               ▼
┌──────────────────────────────┐
│     GNN for Code Graphs      │
│ (type/lifetime propagation)  │
└──────────────────────────────┘
Features
- AST Representation: Lightweight AST node types for code analysis
- Path Extraction: Extract paths between terminal nodes (code2vec style)
- Embedding Encoder: Map paths to dense vector representations
- Code Graph Processing: Use GNN layers for type inference
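To make the path-extraction step concrete, here is a minimal sketch of code2vec-style leaf-to-leaf path extraction. It assumes a hypothetical arena-based tree; Node, PathContext, extract_paths and leaf_path are illustrative names, not this module's AstNode or PathExtractor API. Each path climbs from one terminal to the lowest common ancestor and descends to the other, using the ↑/↓ notation of the code2vec paper, and paths longer than max_len edges are dropped.

```rust
/// Hypothetical arena-based AST node for this sketch (not the module's AstNode).
#[derive(Debug, Clone)]
pub struct Node {
    pub label: String,         // node type for internal nodes, token text for leaves
    pub parent: Option<usize>, // index of the parent in the arena (None for the root)
    pub children: Vec<usize>,  // indices of the children (empty for terminals)
}

/// One path context: (source terminal, path through internal nodes, target terminal).
#[derive(Debug, Clone, PartialEq)]
pub struct PathContext {
    pub source: String,
    pub path: String,
    pub target: String,
}

/// Node indices from `n` (inclusive) up to the root.
fn ancestors(nodes: &[Node], mut n: usize) -> Vec<usize> {
    let mut chain = vec![n];
    while let Some(p) = nodes[n].parent {
        chain.push(p);
        n = p;
    }
    chain
}

/// Path string between two distinct terminals, or None if it exceeds `max_len` edges.
fn leaf_path(nodes: &[Node], s: usize, t: usize, max_len: usize) -> Option<String> {
    let up = ancestors(nodes, s);
    let down = ancestors(nodes, t);
    // Lowest common ancestor = first ancestor of s that is also an ancestor of t.
    let i = up.iter().position(|a| down.contains(a))?;
    let lca = up[i];
    let j = down.iter().position(|&a| a == lca)?;
    if i == 0 || j == 0 || i + j > max_len {
        return None; // not a proper leaf pair, or the path is too long
    }
    // Upward half: parent of s ... LCA, joined with ↑.
    let mut path = up[1..=i]
        .iter()
        .map(|&n| nodes[n].label.as_str())
        .collect::<Vec<_>>()
        .join(" ↑ ");
    // Downward half: child of the LCA ... parent of t, each prefixed with ↓.
    for &n in down[1..j].iter().rev() {
        path.push_str(" ↓ ");
        path.push_str(&nodes[n].label);
    }
    Some(path)
}

/// Extracts all leaf-to-leaf path contexts of at most `max_len` edges.
pub fn extract_paths(nodes: &[Node], max_len: usize) -> Vec<PathContext> {
    let leaves: Vec<usize> = (0..nodes.len())
        .filter(|&i| nodes[i].children.is_empty())
        .collect();
    let mut out = Vec::new();
    for (a, &s) in leaves.iter().enumerate() {
        for &t in &leaves[a + 1..] {
            if let Some(path) = leaf_path(nodes, s, t, max_len) {
                out.push(PathContext {
                    source: nodes[s].label.clone(),
                    path,
                    target: nodes[t].label.clone(),
                });
            }
        }
    }
    out
}
```

For x = y + 1 modelled as Assign(x, BinOp+(y, 1)), this yields (x, Assign ↓ BinOp+, y), (x, Assign ↓ BinOp+, 1) and (y, BinOp+, 1). Only internal nodes appear in the path string here; the terminals are carried separately as source and target.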
References
- Alon et al. (2019), “code2vec: Learning distributed representations of code”
- Allamanis et al. (2018), “A survey of machine learning for big code and naturalness”
Modules
- pooling - Global pooling operations for code graphs
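Global pooling collapses the per-node embeddings of a code graph into one fixed-size vector for the whole graph. A minimal sketch of three common readouts (sum, mean, max), assuming node embeddings are plain Vec<f32> rows of equal length; the function names are hypothetical and need not match the pooling module's API.

```rust
/// Element-wise sum of all node embeddings; None for an empty graph.
pub fn sum_pool(nodes: &[Vec<f32>]) -> Option<Vec<f32>> {
    let first = nodes.first()?;
    let mut acc = vec![0.0f32; first.len()];
    for h in nodes {
        assert_eq!(h.len(), acc.len(), "all node embeddings must share one dimension");
        for (a, x) in acc.iter_mut().zip(h) {
            *a += x;
        }
    }
    Some(acc)
}

/// Element-wise mean: the sum divided by the number of nodes.
pub fn mean_pool(nodes: &[Vec<f32>]) -> Option<Vec<f32>> {
    let mut acc = sum_pool(nodes)?;
    let n = nodes.len() as f32;
    acc.iter_mut().for_each(|x| *x /= n);
    Some(acc)
}

/// Element-wise maximum over all nodes.
pub fn max_pool(nodes: &[Vec<f32>]) -> Option<Vec<f32>> {
    let mut acc = nodes.first()?.clone();
    for h in &nodes[1..] {
        assert_eq!(h.len(), acc.len(), "all node embeddings must share one dimension");
        for (a, &x) in acc.iter_mut().zip(h) {
            *a = (*a).max(x);
        }
    }
    Some(acc)
}
```

Mean pooling keeps the result independent of graph size, while sum pooling preserves it; max pooling highlights the strongest feature seen anywhere in the graph.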
Structs
- AstNode - A node in the Abstract Syntax Tree
- AstPath - A path context representing a connection between two terminals
- Code2VecEncoder - Code2Vec encoder for generating embeddings from AST paths
- CodeEmbedding - Code embedding representation
- CodeGraph - Code graph representation
- CodeGraphEdge - An edge in the code graph
- CodeGraphNode - A node in the code graph
- CodeMPNN - Stack of MPNN layers for deep code analysis
- CodeMPNNLayer - Message Passing Neural Network layer for code graphs
- PathContext - Context for a path, including positional information
- PathExtractor - Extracts paths from an AST following the code2vec approach
- Token - A token (terminal node) in the AST
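In Alon et al. (2019), each path context (x_s, p, x_t) is combined as c̃ = tanh(W · [x_s; p; x_t]), scored against a global attention vector a, and the code vector is the softmax-weighted sum of the c̃ vectors. The sketch below implements that aggregation on plain vectors with hand-supplied weights; combine and code_vector are hypothetical names, not the Code2VecEncoder API, and embedding lookup and training are omitted.

```rust
/// c̃ = tanh(W · [x_s; p; x_t]) for one path context (W has d rows of length 3d).
fn combine(w: &[Vec<f32>], xs: &[f32], p: &[f32], xt: &[f32]) -> Vec<f32> {
    let ctx: Vec<f32> = xs.iter().chain(p).chain(xt).copied().collect();
    w.iter()
        .map(|row| {
            assert_eq!(row.len(), ctx.len(), "W must have 3d columns");
            row.iter().zip(&ctx).map(|(a, b)| a * b).sum::<f32>().tanh()
        })
        .collect()
}

/// Attention-weighted code vector v = Σ α_i c̃_i with α = softmax(c̃_i · a).
/// Returns (v, α), or None when there are no path contexts.
pub fn code_vector(
    w: &[Vec<f32>],
    attn: &[f32],
    contexts: &[(Vec<f32>, Vec<f32>, Vec<f32>)],
) -> Option<(Vec<f32>, Vec<f32>)> {
    if contexts.is_empty() {
        return None;
    }
    let combined: Vec<Vec<f32>> = contexts.iter().map(|(s, p, t)| combine(w, s, p, t)).collect();
    // Attention logits: dot product of each combined vector with the global vector a.
    let logits: Vec<f32> = combined
        .iter()
        .map(|c| c.iter().zip(attn).map(|(x, a)| x * a).sum::<f32>())
        .collect();
    // Softmax, shifted by the max logit for numerical stability.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let z: f32 = exps.iter().sum();
    let alpha: Vec<f32> = exps.iter().map(|e| e / z).collect();
    // Weighted sum of the combined context vectors.
    let mut v = vec![0.0f32; w.len()];
    for (c, a) in combined.iter().zip(&alpha) {
        for (vi, ci) in v.iter_mut().zip(c) {
            *vi += a * ci;
        }
    }
    Some((v, alpha))
}
```

Returning the attention weights alongside the code vector makes it possible to inspect which paths dominated an embedding, which is one of the interpretability points made in the code2vec paper.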
Enums
- AstNodeType - Types of AST nodes for code analysis
- CodeEdgeType - Edge types in a code graph
- TokenType - Types of tokens (terminal nodes in the AST)
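A rough sketch of how an edge-type enum can drive one message-passing step: each edge u → v of kind k sends W_k · h_u to v, and node states update as h_v' = ReLU(h_v + Σ messages). EdgeKind, Edge, matvec and mpnn_step are hypothetical; the actual CodeEdgeType variants and the CodeMPNNLayer update rule are not documented on this page and may differ (for example, by using a gated update).

```rust
/// Hypothetical edge kinds; the module's real CodeEdgeType variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeKind {
    AstChild = 0,  // parent -> child in the AST
    NextToken = 1, // token -> following token
}

/// A directed, typed edge u -> v in the code graph.
pub struct Edge {
    pub src: usize,
    pub dst: usize,
    pub kind: EdgeKind,
}

/// Dense matrix-vector product W · x.
fn matvec(w: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    w.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// One message-passing step with one square weight matrix per edge kind:
/// h_v' = ReLU(h_v + Σ_{(u -> v, k)} W_k · h_u).
pub fn mpnn_step(h: &[Vec<f32>], edges: &[Edge], weights: &[Vec<Vec<f32>>; 2]) -> Vec<Vec<f32>> {
    // Aggregate incoming messages per destination node (sum aggregation).
    let mut msg: Vec<Vec<f32>> = h.iter().map(|hv| vec![0.0; hv.len()]).collect();
    for e in edges {
        let m = matvec(&weights[e.kind as usize], &h[e.src]);
        for (acc, x) in msg[e.dst].iter_mut().zip(m) {
            *acc += x;
        }
    }
    // Update: residual sum followed by ReLU.
    h.iter()
        .zip(&msg)
        .map(|(hv, mv)| hv.iter().zip(mv).map(|(a, b)| (a + b).max(0.0)).collect::<Vec<f32>>())
        .collect()
}
```

Keeping a separate weight matrix per edge kind lets structural edges (AST parent/child) and sequential edges (token order) contribute differently to each node's update; stacking several such steps, as CodeMPNN does with its layers, widens each node's receptive field by one hop per step.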
Constants
- DEFAULT_EMBEDDING_DIM - Default embedding dimension
- MAX_PATHS_PER_METHOD - Maximum number of paths to sample per method
- MAX_PATH_LENGTH - Maximum path length for code2vec paths
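A sketch of how limits like MAX_PATH_LENGTH and MAX_PATHS_PER_METHOD are typically applied: discard over-long paths, then sample a bounded bag of the remaining ones without replacement. The values 8 and 200 below are placeholders, since this page does not show the module's actual constants, and select_paths and XorShift64 are hypothetical helpers. A tiny xorshift generator keeps the example dependency-free; a real implementation would more likely use the rand crate.

```rust
// Placeholder values for illustration; this page does not show the module's actual constants.
pub const MAX_PATH_LENGTH: usize = 8;
pub const MAX_PATHS_PER_METHOD: usize = 200;

/// Minimal xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Drops candidates longer than `max_len`, then samples at most `max_paths`
/// of the rest without replacement (partial Fisher-Yates shuffle).
/// Each candidate is (payload, path length in edges).
pub fn select_paths<T: Clone>(
    candidates: &[(T, usize)],
    max_len: usize,
    max_paths: usize,
    seed: u64,
) -> Vec<T> {
    let mut pool: Vec<T> = candidates
        .iter()
        .filter(|(_, len)| *len <= max_len)
        .map(|(p, _)| p.clone())
        .collect();
    let k = max_paths.min(pool.len());
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    for i in 0..k {
        // Pick from the not-yet-chosen tail (modulo bias is negligible for small pools).
        let j = i + (rng.next_u64() % (pool.len() - i) as u64) as usize;
        pool.swap(i, j);
    }
    pool.truncate(k);
    pool
}
```

A call such as select_paths(&candidates, MAX_PATH_LENGTH, MAX_PATHS_PER_METHOD, seed) returns the same sample for the same seed, which keeps training and evaluation runs reproducible.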