Uncomment: Tree-sitter Based Comment Removal Tool
A fast, accurate comment removal tool that uses tree-sitter for parsing, ensuring 100% accuracy in comment identification across multiple programming languages.
Features
- 100% Accurate: Uses tree-sitter AST parsing to correctly identify comments
- No False Positives: Never removes comment-like content from strings
- Smart Preservation: Keeps important metadata, TODOs, FIXMEs, and language-specific patterns
- Parallel Processing: Multi-threaded processing for improved performance
- Extensible: Easy to add new languages through configuration
- Fast: Leverages tree-sitter's optimized parsing
- Safe: Dry-run mode to preview changes
- Built-in Benchmarking: Performance analysis and profiling tools
Supported Languages
- Python (.py, .pyw, .pyi, .pyx, .pxd)
- JavaScript (.js, .jsx, .mjs, .cjs)
- TypeScript (.ts, .tsx, .mts, .cts, .d.ts, .d.mts, .d.cts)
- Rust (.rs)
- Go (.go)
- Java (.java)
- C (.c, .h)
- C++ (.cpp, .cc, .cxx, .hpp, .hxx)
- Ruby (.rb, .rake, .gemspec)
- YAML (.yml, .yaml)
- HCL/Terraform (.hcl, .tf, .tfvars)
- Makefile (Makefile, .mk)
- Zig (.zig)
Installation
Via Package Managers
Cargo (Rust)
npm (Node.js)
pip (Python)
From source
Requirements
- For building from source: Rust 1.70+
- For npm/pip packages: Pre-built binaries are downloaded automatically
Usage
# Remove comments from a single file
# Preview changes without modifying files
# Process multiple files
# Remove documentation comments/docstrings
# Remove TODO and FIXME comments
# Add custom patterns to preserve
# Process entire directory recursively
# Use parallel processing with 8 threads
# Benchmark performance on a large codebase
# Profile performance with detailed analysis
Default Preservation Rules
Always Preserved
- Comments containing ~keep
- TODO comments (unless --remove-todo)
- FIXME comments (unless --remove-fixme)
- Documentation comments (unless --remove-doc)
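The default rules above can be read as a small decision function. The following is a minimal sketch, not the tool's actual code: `PreserveOptions` and `should_preserve` are hypothetical names, the `is_doc` flag is assumed to come from the parser, and the "keep if any rule matches" precedence is an assumption.

```rust
/// Hypothetical options mirroring the flags named in this section.
#[derive(Default)]
pub struct PreserveOptions {
    pub remove_todo: bool,  // corresponds to --remove-todo
    pub remove_fixme: bool, // corresponds to --remove-fixme
    pub remove_doc: bool,   // corresponds to --remove-doc
}

/// Returns true if a comment should be kept under the default rules.
/// `is_doc` is assumed to be determined by the parser, not by this function.
/// Assumption: a comment is kept as soon as any single rule matches.
pub fn should_preserve(text: &str, is_doc: bool, opts: &PreserveOptions) -> bool {
    // The ~keep marker always wins.
    if text.contains("~keep") {
        return true;
    }
    // Documentation comments survive unless removal was requested.
    if is_doc && !opts.remove_doc {
        return true;
    }
    // TODO and FIXME comments survive unless their flag is set.
    if text.contains("TODO") && !opts.remove_todo {
        return true;
    }
    if text.contains("FIXME") && !opts.remove_fixme {
        return true;
    }
    false
}
```

With default options, `should_preserve("// TODO: refactor", false, &PreserveOptions::default())` returns true, while an ordinary comment with none of the markers is removed.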
Language-Specific Preservation
Python:
- Type hints: # type:, # mypy:
- Linting: # noqa, # pylint:, # flake8:, # ruff:
- Formatting: # fmt:, # isort:
- Other: # pragma:, # NOTE:
JavaScript/TypeScript:
- Type checking: @flow, @ts-ignore, @ts-nocheck
- Linting: eslint-disable, eslint-enable, biome-ignore
- Formatting: prettier-ignore
- Coverage: v8 ignore, c8 ignore, istanbul ignore
- Other: @jsx, @license, @preserve
Rust:
- Attributes and directives (preserved in comment form)
- Doc comments /// and //! (unless --remove-doc)
- Clippy directives: clippy::
Haskell:
- Comments: --
- Haddock: -- |, {-^ ... -}, {-| ... -} (unless --remove-doc)
YAML/HCL/Makefile:
- Standard comment removal while preserving file structure
- Supports both # and // style comments in HCL/Terraform
How It Works
Unlike regex-based tools, uncomment uses tree-sitter to build a proper Abstract Syntax Tree (AST) of your code. This means it understands the difference between:
- Real comments vs comment-like content in strings
- Documentation comments vs regular comments
- Inline comments vs standalone comments
- Language-specific metadata that should be preserved
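To make the first distinction concrete, here is a toy scanner that strips `#` line comments while leaving `#` characters inside quoted strings alone. It is a hand-written stand-in for a real parser and an assumption-laden sketch: it is not how uncomment or tree-sitter works internally, the function name `strip_hash_comments` is hypothetical, and it ignores triple-quoted strings and all preservation rules.

```rust
/// Removes `#` line comments from Python-like source text, skipping any `#`
/// that appears inside a single- or double-quoted string.
/// Toy scanner only: no triple-quoted strings, no preservation rules.
pub fn strip_hash_comments(src: &str) -> String {
    let mut out = String::with_capacity(src.len());
    let mut in_string: Option<char> = None; // quote char that opened the current string
    let mut in_comment = false;
    let mut escaped = false; // previous char inside a string was a backslash

    for c in src.chars() {
        if in_comment {
            // Drop everything up to the newline, which is kept.
            if c == '\n' {
                in_comment = false;
                out.push(c);
            }
            continue;
        }
        match in_string {
            Some(quote) => {
                out.push(c);
                if escaped {
                    escaped = false;
                } else if c == '\\' {
                    escaped = true;
                } else if c == quote || c == '\n' {
                    // Closing quote, or an unterminated string ending at the line break.
                    in_string = None;
                }
            }
            None => match c {
                '#' => in_comment = true,
                '"' | '\'' => {
                    in_string = Some(c);
                    out.push(c);
                }
                _ => out.push(c),
            },
        }
    }
    out
}
```

A naive pattern such as "delete from `#` to end of line" would truncate the URL in `url = "http://x#frag"  # real comment`; the scanner above removes only the trailing comment because it tracks whether it is inside a string.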
Architecture
The tool is built with a generic, extensible architecture:
- Language Registry: Dynamically loads language configurations
- AST Visitor: Traverses the tree-sitter AST to find comments
- Preservation Engine: Applies rules to determine what to keep
- Output Generator: Produces clean code with comments removed
Adding New Languages
Languages are configured through the registry. To add a new language:
- Add the tree-sitter parser dependency
- Register the language in src/languages/registry.rs
- Define comment node types and preservation patterns
- That's it! No other code changes needed
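As a rough picture of what such a registration might involve, the sketch below models a registry keyed by file extension. Everything here is hypothetical: `LanguageConfig`, `Registry`, `register`, `detect`, and `example_registry` are illustrative names, the node type strings are illustrative values, and the real contents of src/languages/registry.rs may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical per-language configuration.
#[derive(Debug, Clone)]
pub struct LanguageConfig {
    pub name: &'static str,
    pub extensions: &'static [&'static str],
    /// AST node types treated as comments (illustrative values).
    pub comment_nodes: &'static [&'static str],
    /// Substrings that mark a comment as one to keep.
    pub preserve_patterns: &'static [&'static str],
}

/// Hypothetical registry mapping file extensions to configurations.
#[derive(Default)]
pub struct Registry {
    by_extension: HashMap<&'static str, LanguageConfig>,
}

impl Registry {
    /// Registers a language under each of its extensions.
    pub fn register(&mut self, config: LanguageConfig) {
        for ext in config.extensions {
            self.by_extension.insert(*ext, config.clone());
        }
    }

    /// Looks up a configuration by the text after the last `.` in the file name.
    pub fn detect(&self, file_name: &str) -> Option<&LanguageConfig> {
        let ext = file_name.rsplit_once('.')?.1;
        self.by_extension.get(ext)
    }
}

/// Builds a registry with a single illustrative entry.
pub fn example_registry() -> Registry {
    let mut registry = Registry::default();
    registry.register(LanguageConfig {
        name: "rust",
        extensions: &["rs"],
        comment_nodes: &["line_comment", "block_comment"],
        preserve_patterns: &["clippy::"],
    });
    registry
}
```

In this model, adding a language is one more `register` call, which matches the claim that no other code changes are needed; files without a known extension simply yield `None` from `detect`.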
Git Hooks
Pre-commit
Add to your .pre-commit-config.yaml:
repos:
  - repo: https://github.com/Goldziher/uncomment
    rev: v2.2.0
    hooks:
      - id: uncomment
Lefthook
Add to your lefthook.yml:
pre-commit:
  commands:
    uncomment:
      run: uncomment {staged_files}
      stage_fixed: true
For both hooks, install uncomment via pip:
Performance
Parsing adds some overhead compared with regex-based approaches, but the tool remains fast, and throughput improves with parallel processing:
Single-threaded Performance
- Small files (<1000 lines): ~20-30ms
- Large files (>10000 lines): ~100-200ms
Parallel Processing Benchmarks
Throughput increases with thread count, though less than linearly:
| Thread Count | Files/Second | Speedup |
|---|---|---|
| 1 thread | 1,500 | 1.0x |
| 4 threads | 3,900 | 2.6x |
| 8 threads | 5,100 | 3.4x |
Benchmarks were run on a large enterprise codebase of 5,000 mixed-language files.
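The Speedup column is the measured throughput divided by the single-threaded throughput. Dividing that speedup by the thread count gives a parallel efficiency figure, which the table does not report. The helper names below are hypothetical and exist only to show the arithmetic.

```rust
/// Speedup relative to the single-threaded baseline (both in files/second).
pub fn speedup(baseline: f64, measured: f64) -> f64 {
    measured / baseline
}

/// Parallel efficiency: speedup divided by the number of threads used.
pub fn efficiency(baseline: f64, measured: f64, threads: u32) -> f64 {
    speedup(baseline, measured) / f64::from(threads)
}
```

Using the table's figures, `speedup(1500.0, 3900.0)` is 2.6 and `speedup(1500.0, 5100.0)` is 3.4, so efficiency works out to 0.65 at 4 threads and 0.425 at 8 threads.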
Built-in Benchmarking
Use the built-in tools to measure performance on your specific codebase:
# Basic benchmark
# Detailed benchmark with multiple iterations
# Memory and performance profiling
The accuracy gained through AST parsing is worth the small performance cost, and parallel processing makes it suitable for even the largest codebases.
License
MIT