Uncomment: Tree-sitter Based Comment Removal Tool
A fast, accurate, and extensible comment removal tool that uses tree-sitter for parsing, ensuring 100% accuracy in comment identification. Originally created to clean up AI-generated code with excessive comments, it now supports any language with a tree-sitter grammar through its flexible configuration system.
Support This Project
If you find uncomment helpful, please consider sponsoring the development:
Your support helps maintain and improve this tool for the community! 🚀
Features
- 100% Accurate: Uses tree-sitter AST parsing to correctly identify comments
- No False Positives: Never removes comment-like content from strings
- Smart Preservation: Keeps important metadata, TODOs, FIXMEs, and language-specific patterns
- Parallel Processing: Multi-threaded processing for improved performance
- Extensible: Supports any language with a tree-sitter grammar through configuration
- Dynamic Grammar Loading: Load grammars from Git, local paths, or pre-compiled libraries
- Configuration System: TOML-based configuration for project-specific settings
- Smart Init Command: Automatically generate configuration based on your project
- Fast: Leverages tree-sitter's optimized parsing
- Safe: Dry-run mode to preview changes
- Built-in Benchmarking: Performance analysis and profiling tools
Supported Languages
Built-in Languages
- Python (.py, .pyw, .pyi, .pyx, .pxd)
- JavaScript (.js, .jsx, .mjs, .cjs)
- TypeScript (.ts, .tsx, .mts, .cts, .d.ts, .d.mts, .d.cts)
- Rust (.rs)
- Go (.go)
- Java (.java)
- C (.c, .h)
- C++ (.cpp, .cc, .cxx, .hpp, .hxx)
- Ruby (.rb, .rake, .gemspec)
- YAML (.yml, .yaml)
- HCL/Terraform (.hcl, .tf, .tfvars)
- Makefile (Makefile, .mk)
- Shell/Bash (.sh, .bash, .zsh, .bashrc, .zshrc)
- Haskell (.hs, .lhs)
- JSON with Comments (.jsonc)
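Resolving a path to one of these languages can be pictured as a suffix lookup that checks bare filenames (such as Makefile) first and then tries longer extensions before shorter ones. The Python sketch below is illustrative only and is not the tool's registry: `EXTENSIONS`, `FILENAMES`, and `detect_language` are hypothetical names, and the table is a small subset of the list above.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical subset of the built-in table above.
EXTENSIONS = {
    ".py": "python",
    ".ts": "typescript",
    ".d.ts": "typescript",
    ".rs": "rust",
    ".mk": "makefile",
    ".jsonc": "jsonc",
}
FILENAMES = {"Makefile": "makefile"}


def detect_language(path: str) -> Optional[str]:
    """Return a language name for the path, or None if it is not recognized."""
    name = PurePosixPath(path).name
    if name in FILENAMES:
        return FILENAMES[name]
    # Try longer suffixes first so multi-part extensions are matched whole.
    for suffix in sorted(EXTENSIONS, key=len, reverse=True):
        if name.endswith(suffix):
            return EXTENSIONS[suffix]
    return None
```

Files that match nothing return None, which a caller could treat as "skip this file".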
Extensible to Any Language
Through the configuration system, you can add support for any language with a tree-sitter grammar, including:
- Vue, Svelte, Astro (Web frameworks)
- Swift, Kotlin, Dart (Mobile development)
- Zig, Nim (Systems programming)
- Elixir, Clojure, Julia (Functional/Scientific)
- And many more...
Installation
Via Package Managers
Homebrew (macOS/Linux)
Cargo (Rust)
npm (Node.js)
pip (Python)
From source
Requirements
- For building from source: Rust 1.70+
- For npm/pip packages: Pre-built binaries are downloaded automatically
Quick Start
Run Without Installing
Add --dry-run to either command to preview changes before writing.
Install Locally
# Generate a configuration file for your project
# Remove comments from files
# Preview changes without modifying files
Usage
Configuration
# Generate a smart configuration based on your project
# Generate a comprehensive configuration with all supported languages
# Interactive configuration setup
# Use a custom configuration file
Init Command Examples
The init command intelligently detects languages in your project:
# Smart detection - analyzes your project and includes only detected languages
# Comprehensive mode - includes configurations for 25+ languages
# Specify output location
# Force overwrite existing configuration
Basic Usage
# Remove comments from a single file
# Preview changes without modifying files
# Process multiple files
# Remove documentation comments/docstrings
# Remove TODO and FIXME comments
# Add custom patterns to preserve
# Process entire directory recursively
# Use parallel processing with 8 threads
Optional Benchmarking Tools
The crate ships development binaries for benchmarking and profiling, but they are gated behind the bench-tools feature so they are not installed for regular users.
- Install from crates.io with the extras:
- Run locally without installing:
Contributing
See CONTRIBUTING.md for local development, automation hooks, and release procedures.
Default Preservation Rules
Always Preserved
- Comments containing ~keep
- TODO comments (unless --remove-todo)
- FIXME comments (unless --remove-fixme)
- Documentation comments (unless --remove-doc)
Linting Tool Directives (Always Preserved)
The tool preserves all linting and formatting directives to ensure your CI/CD pipelines and development workflows remain intact:
Go:
//nolint, //nolint:gosec, //golangci-lint, //staticcheck, //go:generate
Python:
# noqa, # type: ignore, # mypy:, # pyright:, # ruff:, # pylint:, # flake8:, # fmt: off/on, # black:, # isort:, # bandit:, # pyre-ignore
JavaScript/TypeScript:
- ESLint: eslint-disable, eslint-enable, eslint-disable-next-line
- TypeScript: @ts-ignore, @ts-expect-error, @ts-nocheck, @ts-check
- Triple-slash: /// <reference, /// <amd-module, /// <amd-dependency
- Formatters: prettier-ignore, biome-ignore, deno-lint-ignore
- Coverage: v8 ignore, c8 ignore, istanbul ignore
Rust:
- Attributes: #[allow], #[deny], #[warn], #[forbid], #[cfg]
- Clippy: clippy::, #[rustfmt::skip]
Java:
@SuppressWarnings, @SuppressFBWarnings, //noinspection, // checkstyle:
C/C++:
// NOLINT, // NOLINTNEXTLINE, #pragma, // clang-format off/on
Shell/Bash:
# shellcheck disable, # hadolint ignore
YAML:
# yamllint disable/enable
HCL/Terraform:
# tfsec:ignore, # checkov:skip, # trivy:ignore, # tflint-ignore
Ruby:
# rubocop:disable/enable, # reek:, # standard:disable/enable
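The preservation rules above boil down to a per-comment decision. The following Python sketch shows one way such a check could look; it assumes plain substring matching, which may differ from what the tool actually does. `ALWAYS_KEEP` is a hypothetical, shortened marker list and `should_preserve` is an illustrative name, with parameters that mirror the --remove-todo and --remove-fixme flags.

```python
# Hypothetical, shortened marker list; the real tables are much larger.
ALWAYS_KEEP = (
    "~keep",
    "noqa",
    "type: ignore",
    "eslint-disable",
    "nolint",
    "shellcheck disable",
)


def should_preserve(comment: str, remove_todo: bool = False,
                    remove_fixme: bool = False) -> bool:
    """Return True if the comment text should be kept."""
    # Linting directives and ~keep markers are kept unconditionally.
    if any(marker in comment for marker in ALWAYS_KEEP):
        return True
    # TODO and FIXME comments are kept unless the matching flag is set.
    if "TODO" in comment and not remove_todo:
        return True
    if "FIXME" in comment and not remove_fixme:
        return True
    return False
```

A comment such as `# increment counter` matches nothing and is removed, while `# TODO: refactor` survives unless `remove_todo=True` is passed.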
Configuration
Uncomment uses a flexible TOML-based configuration system that allows you to customize behavior for your project.
Configuration File Discovery
Uncomment searches for configuration files in the following order:
- Command-line specified config: --config path/to/config.toml
- .uncommentrc.toml in the current directory
- .uncommentrc.toml in parent directories (up to git root or filesystem root)
- ~/.config/uncomment/config.toml (global configuration)
- Built-in defaults
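The search order can be sketched as a walk up the directory tree. This Python version is a simplified illustration and not the tool's implementation: it returns only the nearest file rather than merging several, `discover_config` and its parameters are hypothetical, and the `home` argument exists only to make the function easy to test.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = ".uncommentrc.toml"


def discover_config(start: Path, explicit: Optional[Path] = None,
                    home: Optional[Path] = None) -> Optional[Path]:
    """Return the first config in the documented order, or None for built-in defaults."""
    # 1. A path given with --config wins outright.
    if explicit is not None:
        return explicit
    # 2-3. Current directory, then each parent in turn.
    current = start.resolve()
    while True:
        candidate = current / CONFIG_NAME
        if candidate.is_file():
            return candidate
        # Stop at the git root or the filesystem root.
        if (current / ".git").exists() or current.parent == current:
            break
        current = current.parent
    # 4. Global configuration under the home directory.
    global_config = (home or Path.home()) / ".config" / "uncomment" / "config.toml"
    return global_config if global_config.is_file() else None
```

Returning None stands for the last step in the list, falling back to built-in defaults.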
Basic Configuration Example
[]
= false
= false
= false
= ["IMPORTANT", "NOTE", "WARNING"]
= true
= true
[]
= ["py", "pyw", "pyi"]
= ["noqa", "type:", "pragma:", "pylint:"]
[]
# Keep all comments in test files
= false
= false
= false
Dynamic Grammar Loading
You can extend support to any language with a tree-sitter grammar:
# Add Swift support via Git
[]
= "Swift"
= ["swift"]
= ["comment", "multiline_comment"]
= ["MARK:", "TODO:", "FIXME:", "swiftlint:"]
[]
= { = "git", = "https://github.com/alex-pinkus/tree-sitter-swift", = "main" }
# Use a local grammar
[]
= "Custom Language"
= ["custom"]
= ["comment"]
[]
= { = "local", = "/path/to/tree-sitter-custom" }
# Use a pre-compiled library
[]
= "Proprietary Language"
= ["prop"]
= ["comment"]
[]
= { = "library", = "/usr/local/lib/libtree-sitter-proprietary.so" }
Configuration Merging
When multiple configuration files are found, they are merged with the following precedence (highest to lowest):
- Command-line flags
- Local .uncommentrc.toml files (closer to the file being processed wins)
- Global configuration (~/.config/uncomment/config.toml)
- Built-in defaults
Pattern-specific configurations override language configurations for matching files.
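The precedence order amounts to layering settings so that higher-priority sources overwrite lower ones. The Python sketch below illustrates the idea with a shallow merge; the key names (`remove_todo`, `remove_doc`, `threads`) are hypothetical and do not necessarily match the real configuration schema, and nested tables would need a recursive merge.

```python
def merge_configs(*layers: dict) -> dict:
    """Merge layers passed from lowest to highest precedence; later layers win."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical settings from each source, lowest precedence first.
defaults = {"remove_todo": False, "remove_doc": False, "threads": 1}
global_cfg = {"remove_doc": True}
local_cfg = {"remove_todo": True}
cli_flags = {"remove_doc": False}

effective = merge_configs(defaults, global_cfg, local_cfg, cli_flags)
print(effective)
```

Here the command-line value for `remove_doc` overrides the global file, while `remove_todo` comes from the local file and `threads` falls through to the default.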
How It Works
Unlike regex-based tools, uncomment uses tree-sitter to build a proper Abstract Syntax Tree (AST) of your code. This means it understands the difference between:
- Real comments vs comment-like content in strings
- Documentation comments vs regular comments
- Inline comments vs standalone comments
- Language-specific metadata that should be preserved
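The difference between pattern matching and syntax-aware detection is easy to demonstrate. The sketch below does not use tree-sitter; it swaps in Python's standard `tokenize` module as a stand-in, which is enough to show why a regex misreads a `#` inside a string while a real tokenizer does not.

```python
import io
import re
import tokenize

source = 'url = "http://example.com"  # fetch endpoint\nprint("# not a comment")\n'

# Naive regex: treats everything after '#' as a comment, even inside strings.
naive = re.findall(r"#.*", source)

# Token-aware: only genuine COMMENT tokens are reported.
tokens = tokenize.generate_tokens(io.StringIO(source).readline)
real = [tok.string for tok in tokens if tok.type == tokenize.COMMENT]

print(len(naive), "regex matches;", len(real), "real comment")
```

The regex reports two matches, one of which is part of a string literal; the tokenizer reports only `# fetch endpoint`.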
Architecture
The tool is built with a modular, extensible architecture:
- Language Registry: Manages both built-in and dynamically loaded languages
- Grammar Manager: Handles loading grammars from Git, local paths, or compiled libraries
- Configuration System: TOML-based hierarchical configuration with merging
- AST Visitor: Traverses the tree-sitter AST to find comments
- Preservation Engine: Applies rules to determine what to keep
- Output Generator: Produces clean code with comments removed
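The last stage, producing output once the comments to drop are known, can be sketched as cutting ranges out of the source text. This is an assumption about the general approach rather than a description of the actual code: `strip_spans` is a hypothetical helper, it uses character offsets for simplicity, and it expects non-overlapping spans.

```python
from typing import List, Tuple


def strip_spans(source: str, spans: List[Tuple[int, int]]) -> str:
    """Remove half-open [start, end) character ranges from source."""
    # Work from the end of the file so earlier offsets stay valid.
    for start, end in sorted(spans, reverse=True):
        source = source[:start] + source[end:]
    return source


code = "x = 1  # set x\ny = 2\n"
comment = "  # set x"
start = code.index(comment)
cleaned = strip_spans(code, [(start, start + len(comment))])
print(repr(cleaned))
```

Processing spans in reverse order avoids having to recompute offsets after each removal.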
Key Components
- Dynamic Grammar Loading: Automatically downloads and compiles tree-sitter grammars
- Grammar Caching: Caches compiled grammars for performance
- Configuration Discovery: Searches for configs in project hierarchy
- Pattern Matching: File-pattern-specific configuration overrides
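File-pattern overrides can be illustrated with a small glob match. The sketch below uses Python's `fnmatch` module, whose `*` also matches path separators, so its behaviour may differ from the glob syntax the tool accepts. The settings, the `tests/*.py` pattern, and `settings_for` are all hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical settings; key names are illustrative.
language_cfg = {"remove_doc": True}
pattern_overrides = {"tests/*.py": {"remove_doc": False}}


def settings_for(path: str) -> dict:
    """Start from the language settings, then apply every matching pattern override."""
    settings = dict(language_cfg)
    for pattern, override in pattern_overrides.items():
        if fnmatch(path, pattern):
            settings.update(override)
    return settings
```

With these values, files under `tests/` keep their documentation comments while other Python files do not.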
Adding New Languages
With the new configuration system, you can add languages without modifying code:
Method 1: Using Configuration (Recommended)
Add to your .uncommentrc.toml:
[]
= "My Language"
= ["ml", "mli"]
= ["comment"]
= ["TODO", "FIXME"]
[]
= { = "git", = "https://github.com/tree-sitter/tree-sitter-mylang", = "main" }
Method 2: Built-in Support
For frequently used languages:
- Add the tree-sitter parser dependency to Cargo.toml
- Register the language in src/grammar/mod.rs
- Add language configuration in src/languages/registry.rs
Git Hooks
Pre-commit
Add to your .pre-commit-config.yaml:
repos:
  - repo: https://github.com/Goldziher/uncomment
    rev: v2.5.0
    hooks:
      - id: uncomment
Lefthook
Add to your lefthook.yml:
pre-commit:
  commands:
    uncomment:
      run: uncomment {staged_files}
      stage_fixed: true
For both hooks, install uncomment via pip:
Performance
While slightly slower than regex-based approaches due to parsing overhead, the tool is very fast and scales well with parallel processing:
Single-threaded Performance
- Small files (<1000 lines): ~20-30ms
- Large files (>10000 lines): ~100-200ms
Parallel Processing Benchmarks
Throughput increases with additional threads, although the gain is sub-linear:
| Thread Count | Files/Second | Speedup |
|---|---|---|
| 1 thread | 1,500 | 1.0x |
| 4 threads | 3,900 | 2.6x |
| 8 threads | 5,100 | 3.4x |
Benchmarks were run on a large enterprise codebase with 5,000 mixed-language files.
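The speedup column follows directly from the throughput figures, and dividing speedup by thread count gives the parallel efficiency. The short Python calculation below uses only the numbers from the table; the efficiency figures are derived here and are not part of the original benchmark report.

```python
# Figures copied from the table above: threads -> files per second.
throughput = {1: 1500, 4: 3900, 8: 5100}
baseline = throughput[1]

# Speedup relative to one thread, and speedup per thread (efficiency).
speedup = {t: fps / baseline for t, fps in throughput.items()}
efficiency = {t: s / t for t, s in speedup.items()}

for t in throughput:
    print(f"threads={t}: speedup {speedup[t]:.1f}x, efficiency {efficiency[t]:.2f}")
```

Efficiency drops as threads are added (about 0.65 at 4 threads), which is typical when part of the work, such as file I/O, does not parallelize.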
Built-in Benchmarking
Use the built-in tools to measure performance on your specific codebase:
# Basic benchmark
# Detailed benchmark with multiple iterations
# Memory and performance profiling
The accuracy gained through AST parsing is worth the small performance cost, and parallel processing makes it suitable for even the largest codebases.
Development
Project Structure
uncomment/
├── src/ # Source code
├── tests/ # Integration tests
├── fixtures/ # Test fixtures
│ ├── languages/ # Language-specific test files
│ └── repos/ # Repository test configurations
├── bench/ # CLI benchmarking tool
├── benchmarks/ # Rust micro-benchmarks
├── test-repos/ # Manual testing scripts
└── scripts/ # Build and release scripts
Benchmarking
The project includes two types of benchmarking tools:
- bench/: Custom CLI benchmarking tool for testing real-world performance on large codebases. Use via cargo run --release --bin benchmark.
- benchmarks/: Standard Rust micro-benchmarks for testing specific functions and components. Run with cargo bench.
Testing
# Run all tests
# Run tests with output
# Run integration tests (including network-dependent)
License
MIT