Crate astchunk


§astchunk

AST-based code chunking, implementing the algorithm from the cAST paper.
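As a rough illustration of what chunking along syntax-tree boundaries means, the sketch below packs sibling nodes into chunks under a size budget and recurses into any node that is too large on its own. It is a toy model, not the crate's implementation: `Node`, `size`, `text`, and `chunk_nodes` are hypothetical names, and measuring size in non-whitespace characters is an assumption made for this example.

```rust
// Toy illustration only: `Node` and `chunk_nodes` are hypothetical and are
// not part of the astchunk API.

/// A simplified syntax-tree node: a leaf carries source text,
/// an inner node carries child nodes in source order.
#[derive(Debug, Clone)]
enum Node {
    Leaf(String),
    Inner(Vec<Node>),
}

impl Node {
    /// Size of the node, counted in non-whitespace characters (an assumption).
    fn size(&self) -> usize {
        match self {
            Node::Leaf(text) => text.chars().filter(|c| !c.is_whitespace()).count(),
            Node::Inner(children) => children.iter().map(Node::size).sum(),
        }
    }

    /// Concatenated text of all leaves below this node, in source order.
    fn text(&self) -> String {
        match self {
            Node::Leaf(text) => text.clone(),
            Node::Inner(children) => children.iter().map(Node::text).collect(),
        }
    }
}

/// Greedily merge sibling nodes into chunks of at most `max_size`,
/// splitting any oversized inner node by recursing into its children.
fn chunk_nodes(nodes: &[Node], max_size: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_size = 0;

    for node in nodes {
        let size = node.size();
        if size > max_size {
            // Flush the pending chunk so source order is preserved.
            if !current.is_empty() {
                chunks.push(std::mem::take(&mut current));
                current_size = 0;
            }
            match node {
                Node::Inner(children) => chunks.extend(chunk_nodes(children, max_size)),
                // An oversized leaf cannot be split further in this sketch.
                Node::Leaf(text) => chunks.push(text.clone()),
            }
        } else if current_size + size > max_size {
            // The node fits on its own but not in the pending chunk.
            chunks.push(std::mem::take(&mut current));
            current = node.text();
            current_size = size;
        } else {
            // The node still fits: merge it into the pending chunk.
            current.push_str(&node.text());
            current_size += size;
        }
    }

    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

With a budget of 10, a tree holding an import, a two-line function, and an assignment splits the function across two chunks because the function node exceeds the budget; with a budget large enough for the whole tree, everything merges into a single chunk. In both cases the chunks concatenate back to the original source.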

§Quick start

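A typical pipeline chunks a Document into AstChunks, formats those into TextChunks, and builds JsonRecords from the result: Document -> AstChunk -> TextChunk -> JsonRecord.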

use astchunk::chunker::{Chunker, CastChunker, CastChunkerOptions};
use astchunk::formatter::{CanonicalFormatter, Formatter};
use astchunk::output::JsonRecord;
use astchunk::types::{Document, DocumentId, Origin};
use astchunk::lang::Language;

// Wrap the source text in a Document tagged with its language.
let source = "def hello():\n    print('hi')\n";
let document = Document {
    document_id: DocumentId(0),
    language: Language::Python,
    source: source.into(),
    origin: Origin::default(),
};

// Document -> AstChunk: chunk the document with default options.
let chunker = CastChunker::new(CastChunkerOptions::default());
let ast_chunks = chunker.chunk(&document).unwrap();

// AstChunk -> TextChunk: format each AST chunk as text.
let formatter = CanonicalFormatter::default();
let text_chunks = formatter.format(&document, &ast_chunks).unwrap();

// TextChunk -> JsonRecord: build the output records.
let records = JsonRecord::build(&document, &ast_chunks, &text_chunks);

// Each stage yields one item per chunk.
assert!(!ast_chunks.is_empty());
assert_eq!(text_chunks.len(), ast_chunks.len());
assert_eq!(records.len(), text_chunks.len());

§Modules

§Feature flags

Feature    Description
cli        Build the command-line interface

Modules§

chunker
Chunking traits and concrete implementations for producing AST chunks.
error
Error types returned by the public astchunk pipeline APIs.
formatter
Text formatting traits and implementations built on top of AST chunks.
lang
Language definitions and tree-sitter bindings used by the chunking pipeline.
output
Output record types for serializing formatted chunks into downstream formats.
types
Core data types shared across the chunking, formatting, and output pipeline.