Module code

Code Analysis and Code2Vec Embeddings

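This module provides statistical approximation techniques for code analysis, following the code2vec approach (Alon et al., 2019). Rather than exhaustively exploring program states, which leads to state explosion, it embeds sampled AST paths as dense vectors.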

§Architecture

Source Code
     │
     ▼
┌──────────────┐
│  AST Parser  │
└──────────────┘
     │
     ▼
┌──────────────────────────────┐
│  Path Extractor              │
│  (leaf-to-leaf paths)        │
└──────────────────────────────┘
     │
     ▼
┌──────────────────────────────┐
│  Code2Vec Encoder            │
│  (path → vector embedding)   │
└──────────────────────────────┘
     │
     ▼
┌──────────────────────────────┐
│  GNN for Code Graphs         │
│  (type/lifetime propagation) │
└──────────────────────────────┘
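The path-extraction stage in the diagram can be illustrated with a small, self-contained sketch. The `Node` type and the `collect_leaves`, `leaf_paths`, and `sample_ast` functions are hypothetical names chosen for this example; they are not the signatures of `AstNode` or `PathExtractor` in this module. The sketch enumerates every pair of leaves, finds their lowest common ancestor, and renders the path as the node kinds walked up from the first leaf and down to the second.

```rust
// Hypothetical types for illustration only; not this module's actual API.
enum Node {
    Leaf(String),             // terminal: identifier or literal token
    Inner(String, Vec<Node>), // non-terminal: node kind plus children
}

// One entry per leaf: its token plus the (id, kind) chain of ancestors from the root.
type LeafTrail<'a> = (&'a str, Vec<(usize, &'a str)>);

fn collect_leaves<'a>(
    node: &'a Node,
    next_id: &mut usize,
    trail: &mut Vec<(usize, &'a str)>,
    out: &mut Vec<LeafTrail<'a>>,
) {
    match node {
        Node::Leaf(tok) => out.push((tok.as_str(), trail.clone())),
        Node::Inner(kind, children) => {
            // Preorder ids distinguish nodes that share the same kind.
            let id = *next_id;
            *next_id += 1;
            trail.push((id, kind.as_str()));
            for child in children {
                collect_leaves(child, next_id, trail, out);
            }
            trail.pop();
        }
    }
}

/// Returns (source token, path, target token) for every pair of leaves.
fn leaf_paths(root: &Node) -> Vec<(String, String, String)> {
    let mut leaves = Vec::new();
    let mut next_id = 0;
    collect_leaves(root, &mut next_id, &mut Vec::new(), &mut leaves);
    let mut paths = Vec::new();
    for i in 0..leaves.len() {
        for j in (i + 1)..leaves.len() {
            let (a, ta) = &leaves[i];
            let (b, tb) = &leaves[j];
            // Length of the shared ancestor prefix; its last element is the LCA.
            let common = ta.iter().zip(tb.iter()).take_while(|(x, y)| x.0 == y.0).count();
            if common == 0 {
                continue; // no shared ancestor (cannot happen under a single root)
            }
            let mut path = String::new();
            for n in ta[common..].iter().rev() {
                path.push_str(n.1);
                path.push('↑');
            }
            path.push_str(ta[common - 1].1);
            for n in &tb[common..] {
                path.push('↓');
                path.push_str(n.1);
            }
            paths.push((a.to_string(), path, b.to_string()));
        }
    }
    paths
}

/// AST for `x = y + 1`, using made-up node kinds.
fn sample_ast() -> Node {
    let leaf = |kind: &str, tok: &str| Node::Inner(kind.into(), vec![Node::Leaf(tok.into())]);
    Node::Inner(
        "Assign".into(),
        vec![
            leaf("Name", "x"),
            Node::Inner("BinaryOp".into(), vec![leaf("Name", "y"), leaf("Literal", "1")]),
        ],
    )
}
```

For the sample tree, `leaf_paths(&sample_ast())` yields three path contexts, the first being `("x", "Name↑Assign↓BinaryOp↓Name", "y")`. Enumerating all pairs is quadratic in the number of leaves, which is one reason a real extractor would bound path length and sample paths.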

§Features

  • AST Representation: Lightweight AST node types for code analysis
  • Path Extraction: Extract paths between terminal nodes (code2vec style)
  • Embedding Encoder: Map paths to dense vector representations
  • Code Graph Processing: Use GNN layers for type inference
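As a rough illustration of the embedding step, code2vec combines the vectors of all path contexts into a single code vector using soft attention: each context is scored against a learned attention vector, the scores are normalized with softmax, and the contexts are summed with those weights. The sketch below assumes the context vectors have already been computed (in the paper, each is a learned transformation of the two token embeddings and the path embedding). The function name `attention_pool` is hypothetical and does not describe the interface of `Code2VecEncoder`.

```rust
/// Attention-weighted sum of context vectors (hypothetical helper).
/// `contexts` holds one vector per path context; `attention` is the
/// attention vector. All vectors are assumed to share the same length.
fn attention_pool(contexts: &[Vec<f32>], attention: &[f32]) -> Vec<f32> {
    let dim = attention.len();
    if contexts.is_empty() {
        return vec![0.0; dim];
    }
    // Score each context by its dot product with the attention vector.
    let scores: Vec<f32> = contexts
        .iter()
        .map(|c| c.iter().zip(attention).map(|(x, a)| x * a).sum::<f32>())
        .collect();
    // Softmax, subtracting the maximum score for numerical stability.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let total: f32 = exps.iter().sum();
    // Weighted sum of the context vectors.
    let mut out = vec![0.0; dim];
    for (c, e) in contexts.iter().zip(&exps) {
        let weight = e / total;
        for (o, x) in out.iter_mut().zip(c) {
            *o += weight * x;
        }
    }
    out
}
```

When all scores are equal, the weights are uniform and the result is the plain average of the contexts; a context that aligns more strongly with the attention vector contributes proportionally more.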

§References

  • Alon et al. (2019), “code2vec: Learning distributed representations of code”
  • Allamanis et al. (2018), “A survey of machine learning for big code and naturalness”

Modules§

pooling
Global pooling operations for code graphs

Structs§

AstNode
A node in the Abstract Syntax Tree
AstPath
A path context representing a connection between two terminals
Code2VecEncoder
Code2Vec encoder for generating embeddings from AST paths
CodeEmbedding
Code embedding representation
CodeGraph
Code graph representation
CodeGraphEdge
An edge in the code graph
CodeGraphNode
A node in the code graph
CodeMPNN
Stack of MPNN layers for deep code analysis
CodeMPNNLayer
Message Passing Neural Network layer for code graphs
PathContext
Context for a path including positional information
PathExtractor
Extracts paths from AST following the code2vec approach
Token
A token (terminal node) in the AST
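
To make the role of a message-passing layer concrete, here is a minimal sketch of one propagation step over a directed graph. Everything in it is an assumption for illustration: the `Edge` struct, the `message_pass` function, and the use of two scalar weights in place of learned weight matrices are not taken from `CodeMPNNLayer` or `CodeGraph`. Each node sums the features of its incoming neighbours, mixes that message with its own features, and applies ReLU.

```rust
/// Directed edge between node indices (hypothetical type).
struct Edge {
    src: usize,
    dst: usize,
}

/// One message-passing step: h'_v = ReLU(w_self * h_v + w_msg * sum of h_u over edges u -> v).
/// Scalar weights stand in for learned matrices. Panics if an edge index is
/// out of range or if feature vectors differ in length.
fn message_pass(features: &[Vec<f32>], edges: &[Edge], w_self: f32, w_msg: f32) -> Vec<Vec<f32>> {
    let dim = features.first().map_or(0, |f| f.len());
    // Aggregate: sum the source features into each destination node.
    let mut messages = vec![vec![0.0f32; dim]; features.len()];
    for e in edges {
        for k in 0..dim {
            messages[e.dst][k] += features[e.src][k];
        }
    }
    // Update: combine own features with the aggregated message, then ReLU.
    features
        .iter()
        .zip(&messages)
        .map(|(h, m)| {
            h.iter()
                .zip(m)
                .map(|(hv, mv)| (w_self * hv + w_msg * mv).max(0.0))
                .collect()
        })
        .collect()
}
```

Stacking several such steps lets information travel further along the graph: after k steps, a node's features can depend on nodes up to k edges away.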

Enums§

AstNodeType
Types of AST nodes for code analysis
CodeEdgeType
Edge types in a code graph
TokenType
Types of tokens (terminal nodes in the AST)

Constants§

DEFAULT_EMBEDDING_DIM
Default embedding dimension
MAX_PATHS_PER_METHOD
Maximum number of paths to sample per method
MAX_PATH_LENGTH
Maximum path length for code2vec paths
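
The two path limits suggest a filter-then-sample step. The sketch below shows one way such bounds could be applied; the function name `bound_paths` and its parameters are hypothetical, the limits are passed as arguments because the constants' values are not shown on this page, and evenly spaced selection is used here in place of the random sampling described in the code2vec paper so that the example stays deterministic.

```rust
/// Drops paths longer than `max_path_length`, then keeps at most `max_paths`
/// of the remainder by taking evenly spaced entries (hypothetical helper).
/// Each path is represented as a list of node kinds.
fn bound_paths(
    paths: Vec<Vec<String>>,
    max_path_length: usize,
    max_paths: usize,
) -> Vec<Vec<String>> {
    let kept: Vec<Vec<String>> = paths
        .into_iter()
        .filter(|p| p.len() <= max_path_length)
        .collect();
    let n = kept.len();
    if n <= max_paths {
        return kept;
    }
    // n > max_paths here, so the indices i * n / max_paths are distinct and in range.
    (0..max_paths).map(|i| kept[i * n / max_paths].clone()).collect()
}
```

Bounding both quantities keeps the cost per method fixed regardless of how large the method's AST is.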