thread-flow
Thread's dataflow integration for incremental code analysis, using CocoIndex for content-addressed caching and dependency tracking.
Overview
thread-flow bridges Thread's imperative AST analysis engine with CocoIndex's declarative dataflow framework, enabling persistent incremental updates and multi-backend storage. It provides:
- ✅ Content-Addressed Caching: 50x+ performance gains via automatic incremental updates
- ✅ Dependency Tracking: File-level and symbol-level dependency graph management
- ✅ Multi-Backend Storage: Postgres (CLI), D1 (Edge), and in-memory (testing)
- ✅ Dual Deployment: Single codebase compiles to CLI (Rayon parallelism) and Edge (tokio async)
- ✅ Language Extractors: Built-in support for Rust, Python, TypeScript, and Go
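The caching idea in the list above can be illustrated with a minimal sketch: fingerprint each file's content and skip re-analysis when the fingerprint has not changed. This is not the crate's API. `fingerprint`, `FingerprintCache`, and `needs_reanalysis` are hypothetical names, and the standard library's `DefaultHasher` stands in for Blake3 so the example has no dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in fingerprint using std's DefaultHasher (an assumption for this
/// sketch; thread-flow itself reports using Blake3).
fn fingerprint(content: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical cache that remembers the last fingerprint seen per path.
#[derive(Default)]
struct FingerprintCache {
    seen: HashMap<String, u64>,
}

impl FingerprintCache {
    /// Records the current fingerprint and returns true when the file is
    /// new or its content differs from the previous call.
    fn needs_reanalysis(&mut self, path: &str, content: &[u8]) -> bool {
        let current = fingerprint(content);
        match self.seen.insert(path.to_string(), current) {
            Some(previous) => previous != current,
            None => true,
        }
    }
}
```

A file whose bytes are unchanged between runs is reported as not needing re-analysis, which is where the incremental savings would come from.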
Architecture
┌─────────────────────────────────────────────────────────────┐
│ thread-flow Crate │
├─────────────────────────────────────────────────────────────┤
│ Incremental System │
│ ├─ Analyzer: Change detection & invalidation │
│ ├─ Extractors: Language-specific dependency parsing │
│ │ ├─ Rust: use declarations, pub use re-exports │
│ │ ├─ Python: import/from...import statements │
│ │ ├─ TypeScript: ES6 imports, CommonJS requires │
│ │ └─ Go: import blocks, module path resolution │
│ ├─ Graph: BFS traversal, topological sort, cycles │
│ └─ Storage: Backend abstraction with factory pattern │
│ ├─ Postgres: Connection pooling, prepared statements │
│ ├─ D1: Cloudflare REST API, HTTP client │
│ └─ InMemory: Testing and development │
├─────────────────────────────────────────────────────────────┤
│ CocoIndex Integration │
│ ├─ Bridge: Adapts Thread → CocoIndex operators │
│ ├─ Flows: Declarative analysis pipeline builder │
│ └─ Runtime: CLI (Rayon) vs Edge (tokio) strategies │
└─────────────────────────────────────────────────────────────┘
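The Graph component in the diagram lists topological sort and cycle detection. As a rough sketch of how the two can be combined, the following uses Kahn's algorithm over a file-level dependency map. The function name `analysis_order` and the map shape are assumptions for illustration and may not match the crate's graph types.

```rust
use std::collections::{BTreeMap, VecDeque};

/// `deps` maps each file to the files it depends on (hypothetical shape).
/// Returns the files with dependencies first, or `None` if a cycle exists.
fn analysis_order(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // Number of not-yet-processed dependencies per file.
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    // Reverse edges: dependency -> files that depend on it.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();

    for (&file, file_deps) in deps {
        pending.entry(file).or_insert(0);
        for &dep in file_deps {
            pending.entry(dep).or_insert(0);
            *pending.get_mut(file).unwrap() += 1;
            dependents.entry(dep).or_default().push(file);
        }
    }

    // Start with files that have no dependencies.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(&file, _)| file)
        .collect();

    let mut order = Vec::with_capacity(pending.len());
    while let Some(file) = ready.pop_front() {
        order.push(file.to_string());
        if let Some(users) = dependents.get(file) {
            for &user in users {
                let count = pending.get_mut(user).unwrap();
                *count -= 1;
                if *count == 0 {
                    ready.push_back(user);
                }
            }
        }
    }

    // Files left with pending dependencies are part of a cycle.
    if order.len() == pending.len() {
        Some(order)
    } else {
        None
    }
}
```

Returning `None` on a cycle keeps the sketch short; a fuller version would report which files form the cycle.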
Quick Start
Add to your Cargo.toml:
[dependencies]
thread-flow = { version = "0.1", features = ["postgres-backend", "parallel"] }
Basic Usage
use ;
use std::path::PathBuf;
async
Dependency Extraction
use ;
use std::path::Path;
async
Invalidation and Re-analysis
use IncrementalAnalyzer;
use std::path::PathBuf;
async
Feature Flags
| Feature | Description | Default |
|---|---|---|
| `postgres-backend` | Postgres storage with connection pooling | ✅ |
| `d1-backend` | Cloudflare D1 backend for edge deployment | ❌ |
| `parallel` | Rayon-based parallelism (CLI only) | ✅ |
| `caching` | Query result caching with Moka | ❌ |
| `recoco-minimal` | Local file source for CocoIndex | ✅ |
| `recoco-postgres` | PostgreSQL target for CocoIndex | ✅ |
| `worker` | Edge deployment optimizations | ❌ |
Feature Combinations
CLI Deployment (recommended):
thread-flow = { version = "0.1", features = ["postgres-backend", "parallel"] }
Edge Deployment (Cloudflare Workers):
thread-flow = { version = "0.1", features = ["d1-backend", "worker"] }
Testing:
[dev-dependencies]
thread-flow = "0.1" # InMemory backend always available
Deployment Modes
CLI Deployment
Uses Postgres for persistent storage with Rayon for CPU-bound parallelism:
use ;
let backend = create_backend.await?;
// Configure for CLI
// - Rayon parallel processing enabled via `parallel` feature
// - Connection pooling via deadpool-postgres
// - Batch operations for improved throughput
Performance targets:
- Storage latency: <10ms p95
- Cache hit rate: >90%
- Parallel speedup: 3-4x on quad-core
Edge Deployment
Uses Cloudflare D1 for distributed storage with tokio async I/O:
use ;
let backend = create_backend.await?;
// Configure for Edge
// - HTTP API client for D1 REST API
// - Async-first with tokio runtime
// - No filesystem access (worker feature)
Performance targets:
- Storage latency: <50ms p95
- Cache hit rate: >90%
- Horizontal scaling across edge locations
API Documentation
Comprehensive API docs and integration guides:
- Incremental System: See incremental module docs
- D1 Integration: See docs/api/D1_INTEGRATION_API.md
- CocoIndex Bridge: See bridge module docs
- Language Extractors: See extractors module docs
Examples
Run examples with:
# Observability instrumentation
# D1 local testing (requires D1 emulator)
# D1 integration testing (requires D1 credentials)
Testing
# Run all tests
# Run incremental system tests
# Run backend-specific tests
# Run performance regression tests
Benchmarking
# Fingerprint performance
# D1 profiling (requires credentials)
# Load testing
Performance Characteristics
Incremental Updates
- Fingerprint computation: <5µs per file (Blake3)
- Dependency extraction: 1-10ms per file (language-dependent)
- Graph traversal: O(V+E) for BFS invalidation
- Cache hit rate: >90% typical, >95% ideal
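The O(V+E) bound for BFS invalidation follows from visiting each file and each dependency edge at most once. A minimal sketch, assuming a reverse dependency map from each file to the files that import it (the name `invalidated_files` and the map shape are illustrative, not the crate's API):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// `dependents` maps a file to the files that import it (reverse edges).
/// Returns the changed files plus everything that transitively depends
/// on them.
fn invalidated_files(
    dependents: &HashMap<String, Vec<String>>,
    changed: &[String],
) -> HashSet<String> {
    let mut affected: HashSet<String> = changed.iter().cloned().collect();
    let mut queue: VecDeque<String> = changed.iter().cloned().collect();

    while let Some(file) = queue.pop_front() {
        if let Some(users) = dependents.get(&file) {
            for user in users {
                // `insert` returns false for files already visited, so
                // each file is enqueued at most once.
                if affected.insert(user.clone()) {
                    queue.push_back(user.clone());
                }
            }
        }
    }
    affected
}
```

Files outside the returned set keep their cached results, which is what makes a high cache hit rate possible when only a few files change.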
Storage Backends
| Backend | Read Latency (p95) | Write Latency (p95) | Throughput |
|---|---|---|---|
| InMemory | <1ms | <1ms | 10K+ ops/sec |
| Postgres | <10ms | <15ms | 1K+ ops/sec |
| D1 | <50ms | <100ms | 100+ ops/sec |
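The three backends in the table sit behind a common abstraction selected through a factory. The sketch below shows that general pattern with an in-memory implementation only. The trait `DependencyStore`, the enum `BackendKind`, and the function `make_store` are hypothetical names chosen for illustration; the crate's actual trait and factory signatures may differ.

```rust
use std::collections::HashMap;

/// Hypothetical storage trait: persists the dependency edges of each file.
trait DependencyStore {
    fn save_edges(&mut self, file: &str, deps: Vec<String>);
    fn load_edges(&self, file: &str) -> Option<Vec<String>>;
}

/// In-memory implementation, suitable for tests.
#[derive(Default)]
struct InMemoryStore {
    edges: HashMap<String, Vec<String>>,
}

impl DependencyStore for InMemoryStore {
    fn save_edges(&mut self, file: &str, deps: Vec<String>) {
        self.edges.insert(file.to_string(), deps);
    }

    fn load_edges(&self, file: &str) -> Option<Vec<String>> {
        self.edges.get(file).cloned()
    }
}

/// Hypothetical backend selector; a real factory would also cover
/// Postgres and D1 behind their feature flags.
enum BackendKind {
    InMemory,
}

/// Factory returning a trait object so callers stay backend-agnostic.
fn make_store(kind: BackendKind) -> Box<dyn DependencyStore> {
    match kind {
        BackendKind::InMemory => Box::new(InMemoryStore::default()),
    }
}
```

Returning `Box<dyn DependencyStore>` lets calling code stay the same whichever backend is compiled in.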
Language Extractors
| Language | Parse Time (p95) | Complexity |
|---|---|---|
| Rust | 2-5ms | High (macros, visibility) |
| TypeScript | 1-3ms | Medium (ESM + CJS) |
| Python | 1-2ms | Low (simple imports) |
| Go | 1-3ms | Medium (module resolution) |
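To give a feel for what an extractor produces, here is a deliberately simplified, line-based sketch for Python `import` and `from ... import` statements. It is an assumption-laden illustration, not the crate's extractor: the function `python_imports` is hypothetical, and it ignores multi-line imports, comments, and string literals that a real parser-based extractor would handle.

```rust
/// Returns the module names referenced by top-level import statements.
/// Line-based and approximate; for illustration only.
fn python_imports(source: &str) -> Vec<String> {
    let mut modules = Vec::new();
    for line in source.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("from ") {
            // `from pkg.mod import name` depends on `pkg.mod`.
            if let Some((module, _names)) = rest.split_once(" import ") {
                modules.push(module.trim().to_string());
            }
        } else if let Some(rest) = line.strip_prefix("import ") {
            // `import a, b as c` depends on `a` and `b`; the alias is dropped.
            for item in rest.split(',') {
                if let Some(name) = item.split_whitespace().next() {
                    modules.push(name.to_string());
                }
            }
        }
    }
    modules
}
```

The returned module names would still need to be resolved to file paths before they can become edges in the dependency graph.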
Contributing
Development Setup
# Install development tools
# Run tests
# Run linting
# Format code
Architecture Principles
- Service-Library Dual Architecture: Features consider both library API design AND service deployment
- Test-First Development: Tests → Approve → Fail → Implement (mandatory)
- Constitutional Compliance: All changes must adhere to Thread Constitution v2.0.0
See CLAUDE.md for complete development guidelines.
License
AGPL-3.0-or-later
Related Crates
- `thread-ast-engine`: Core AST parsing and pattern matching
- `thread-language`: Language definitions and tree-sitter parsers
- `thread-services`: High-level service interfaces
- `recoco`: CocoIndex dataflow engine
Status: Production-ready (Phase 5 complete)
Maintainer: Knitli Inc. <knitli@knit.li>
Contributors: Claude Sonnet 4.5 <noreply@anthropic.com>