codegraph-c 0.1.0

C parser for CodeGraph - extracts code entities and relationships from C source files
Documentation

codegraph-c

C language parser for the CodeGraph framework.

Overview

codegraph-c provides robust C source code parsing with specialized support for Linux kernel and system-level code. It uses tree-sitter for fault-tolerant AST generation and includes a sophisticated preprocessing pipeline that handles:

  • GCC extensions (__attribute__, __asm__, typeof, etc.)
  • Linux kernel macros (__init, __exit, container_of, etc.)
  • Preprocessor conditionals (#if 0, #ifdef, etc.)
  • Platform-specific code (Linux, FreeBSD, Darwin)

Features

  • Fault-Tolerant Parsing: Extracts entities even from code with syntax errors
  • Layered Pipeline: Multi-stage preprocessing for kernel code
  • Platform Detection: Automatic detection of target platform with confidence scoring
  • Entity Extraction: Functions, structs, unions, enums, typedefs
  • Relationship Tracking: Include directives, function calls
  • Complexity Metrics: Cyclomatic complexity calculation
  • Graph Integration: Full integration with codegraph for code analysis

Usage

Basic Parsing

use codegraph_c::CParser;
use codegraph_parser_api::CodeParser;
use codegraph::CodeGraph;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut graph = CodeGraph::in_memory()?;
    let parser = CParser::new();

    let file_info = parser.parse_file(Path::new("main.c"), &mut graph)?;
    println!("Parsed {} functions", file_info.functions.len());
    Ok(())
}

Kernel Code Parsing

For code with kernel-specific constructs:

use codegraph_c::{CParser, ExtractionOptions};
use codegraph_c::extractor::extract_with_options;
use codegraph_parser_api::ParserConfig;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let source = r#"
        #include <linux/module.h>

        static __init int my_init(void) {
            return 0;
        }
    "#;

    let options = ExtractionOptions::for_kernel_code();
    let result = extract_with_options(
        source,
        Path::new("driver.c"),
        &ParserConfig::default(),
        &options
    )?;

    println!("Extracted {} functions", result.ir.functions.len());
    Ok(())
}

Pipeline Processing

For maximum control over preprocessing:

use codegraph_c::pipeline::{Pipeline, PipelineConfig};

fn main() {
    let pipeline = Pipeline::new();
    let config = PipelineConfig::for_kernel_code();

    let source = r#"
        #include <linux/module.h>
        MODULE_LICENSE("GPL");
        static __init int my_init(void) { return 0; }
    "#;

    let result = pipeline.process(source, &config);
    println!("Platform: {} (confidence: {:.0}%)",
        result.platform.platform_id,
        result.platform.confidence * 100.0);
    println!("Processed source ready for parsing");
}

Architecture

The parser uses a layered processing pipeline:

Source Code
    │
    ▼
┌─────────────────────────────────────┐
│  1. Platform Detection              │
│     Detect Linux/FreeBSD/Darwin     │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  2. Conditional Evaluation          │
│     Strip #if 0 blocks              │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  3. GCC Neutralization              │
│     Handle __attribute__, etc.      │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  4. Macro Expansion                 │
│     Expand kernel macros            │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  5. tree-sitter Parsing             │
│     Fault-tolerant AST              │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  6. Entity Extraction               │
│     Functions, structs, calls       │
└─────────────────────────────────────┘

Supported Constructs

Type Definitions

  • u8, u16, u32, u64, s8, s16, s32, s64
  • size_t, ssize_t, uintptr_t, intptr_t
  • __le16, __le32, __be16, __be32 (kernel types)
  • Boolean types: bool, _Bool

Attributes (stripped)

  • __init, __exit, __user, __kernel
  • __iomem, __force, __percpu, __rcu
  • __must_check, __always_inline, __noinline
  • __section(...), __aligned(...)

GCC Extensions (neutralized)

  • __attribute__((...))
  • __extension__
  • __asm__, __asm volatile
  • typeof(), __typeof__()
  • Statement expressions ({ ... })

Macros (expanded or neutralized)

  • container_of(), offsetof()
  • likely(), unlikely()
  • BUILD_BUG_ON(), WARN_ON()
  • list_for_each() and iterator macros

Testing

# Run all tests
cargo test -p codegraph-c

# Run integration tests
cargo test -p codegraph-c --test integration_tests

Performance

The parser achieves good parsing rates on real-world kernel code:

Codebase Files Clean Parse Rate
ICE Driver 84 75%+
i915 Graphics 526 70%+
NVMe CLI 156 85%+
netperf 47 94%+

License

MIT OR Apache-2.0