# big-code-analysis-cli
`bca` analyzes source code and emits per-file structured metrics,
aggregated reports, AST dumps, node lookups, and more.
> **Migrating from the flag-style CLI?** The CLI is now subcommand-driven.
> See the [migration guide](../big-code-analysis-book/src/migration.md)
> for old-form -> new-form mappings of every flag.
## Installation
```sh
cd big-code-analysis-cli/
cargo build
```
## Usage
```sh
bca [GLOBAL OPTIONS] <COMMAND> [COMMAND OPTIONS]
```
The global options describe *what to walk* (paths, includes/excludes,
parallelism, language overrides). The command picks *what to do* with each
file, with command-specific options as needed.
## Commands
| Command | Description |
|---|---|
| `metrics` | Per-file metric output (`-O json/yaml/toml/cbor`, `-o DIR`). |
| `ops` | Per-file operand/operator output (same formats as `metrics`). |
| `report <FORMAT>` | Aggregated report. `markdown` today; `html` reserved. |
| `dump` | AST dump to stdout. |
| `find <NODE>...` | Find nodes of one or more types. |
| `count <NODE>...` | Count nodes of one or more types. |
| `functions` | List functions/methods and their spans. |
| `strip-comments` | Remove comments from source files (`--in-place`). |
| `preproc` | Build preprocessor-data JSON for C/C++ analysis. |
| `list-metrics [names\|descriptions]` | List computable metrics. |

Run `bca <COMMAND> --help` for command-specific options.
## Global options
- `-p, --paths <FILE>...` — input files or directories.
- `-I, --include [<GLOB>...]` — include files matching pattern.
- `-X, --exclude [<GLOB>...]` — exclude files matching pattern.
- `-j, --num-jobs <N>` — worker threads.
- `-l, --language-type <LANG>` — force a language instead of inferring.
- `--ls <LINE_START>` / `--le <LINE_END>` — line range (used by `dump`,
`find`).
- `-w, --warning` — print warnings (skipped files, unrecognized
languages).
- `--no-skip-generated` — disable auto-skip of files marked as generated
(see [Skipping generated code](#skipping-generated-code)).
- `--report-skipped` — log a `skipped (generated): <path>` line to stderr
for every file the generated-code detector excludes.
- `--preproc-data <FILE>` — consume an existing preproc JSON during C/C++
analysis. Build one with `bca preproc`.

Global options work both before and after the subcommand.
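
As a rough illustration of that placement rule, the two wrapper functions below issue the same `dump` request with the global options before and after the subcommand. Both functions are hypothetical helpers written for this example (they do not ship with `bca`), and the sketch assumes the two spellings behave identically, as the sentence above states.

```sh
# Hypothetical helpers for illustration only; neither ships with bca.
# Usage: dump_range_before FILE LINE_START LINE_END

# Global options placed before the subcommand.
dump_range_before() {
  bca --paths "$1" --ls "$2" --le "$3" dump
}

# The same global options placed after the subcommand.
dump_range_after() {
  bca dump --paths "$1" --ls "$2" --le "$3"
}
```

With `bca` on `PATH`, `dump_range_before ./file.rs 10 40` and `dump_range_after ./file.rs 10 40` should print the same AST slice; the file name and line numbers are placeholders.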
## Building with a subset of languages
The shipped `bca` binary compiles in every supported tree-sitter
grammar. The `big-code-analysis-cli` crate pins the library's
`all-languages` feature set explicitly, so passing
`--no-default-features` or a custom `--features` list to
`cargo build -p big-code-analysis-cli` does **not** drop grammars
from the resulting binary: feature selection on the CLI crate is
not honoured. See [#252][issue-252] for the rationale: silently
dropping a grammar from a user-facing binary would surface as
"language X stopped working" rather than a build error.

Consumers who need a reduced feature set should embed the
`big-code-analysis` library in their own Rust code and control
feature selection in their own `Cargo.toml`. See the library's
[per-language Cargo features][cargo-features] chapter for the full
list of features and a worked example.

[cargo-features]: https://dekobon.github.io/big-code-analysis/library/cargo-features.html
[issue-252]: https://github.com/dekobon/big-code-analysis/issues/252
## Examples
Per-file JSON metrics:
```sh
bca --paths ./src metrics -O json -o ./out/
```
Aggregated markdown quality report:
```sh
bca --paths "$PWD" --num-jobs "$(nproc)" \
  report markdown --top 20 --strip-prefix "$PWD/"
```
AST dump for one file:
```sh
bca --paths ./file.rs dump
```
List all metrics with one-line descriptions:
```sh
bca list-metrics descriptions
```
## Skipping generated code
Generated bindings (protobuf stubs, OpenAPI clients, lex/yacc output,
build-system plumbing) inflate metrics for code no human will refactor.
By default, `bca` scans the first ~50 lines / 5 KiB of each file for a
generated-code marker and skips matches before parsing.
Recognized markers (case-insensitive):
- `@generated` — Facebook / Meta convention; also emitted by buck2,
rustfmt, prettier, and many code generators.
- `DO NOT EDIT` — Go's `// Code generated by … DO NOT EDIT.` is the
canonical form; the bare phrase is also widely copied (Bazel, protoc,
OpenAPI clients).
- `GENERATED CODE` — Lizard's marker, recognized for compatibility.

A marker phrase that appears only deep in the file body (past the scan
window) does not trigger the skip.
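
The scan-window behavior can be approximated in plain shell. This is only a sketch, not `bca`'s implementation: `looks_generated` is a hypothetical helper, and it assumes the window ends at whichever limit is reached first (50 lines or 5120 bytes) and that markers are matched as plain case-insensitive substrings.

```sh
# Hypothetical approximation of the generated-code check; not bca's actual code.
# Assumption: the scan window is the first 50 lines, capped at 5 KiB (5120 bytes).
looks_generated() {
  head -n 50 "$1" | head -c 5120 |
    grep -qiE '@generated|DO NOT EDIT|GENERATED CODE'
}

# Demo on throwaway files: a marker on line 1 versus a marker on line 61.
tmp=$(mktemp -d)
printf '// Code generated by protoc-gen-go. DO NOT EDIT.\npackage pb\n' > "$tmp/early.go"
{
  awk 'BEGIN { for (i = 0; i < 60; i++) print "// filler" }'
  echo '// @generated'
} > "$tmp/late.go"

for f in "$tmp/early.go" "$tmp/late.go"; do
  if looks_generated "$f"; then
    echo "${f##*/}: marker in scan window"
  else
    echo "${f##*/}: no marker in scan window"
  fi
done
rm -rf "$tmp"
```

Under these assumptions `early.go` is reported as containing a marker, while `late.go` is not, because its marker sits past line 50.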
To restore the previous behavior and analyze everything, pass
`--no-skip-generated`. To audit which files were excluded, pass
`--report-skipped`; the CLI logs `skipped (generated): <path>` to stderr
for each file.
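
To post-process that audit trail, the stderr lines can be filtered with standard tools. A minimal sketch, assuming stderr was redirected to a file and that each line starts with the `skipped (generated): ` prefix with nothing in front of it; the helper `skipped_paths` and the file name `bca.stderr.log` are made up for this example.

```sh
# Hypothetical helper: print only the paths from
# `skipped (generated): <path>` lines in a captured stderr log.
skipped_paths() {
  sed -n 's/^skipped (generated): //p' "$1"
}

# Example (requires bca on PATH):
#   bca --paths ./src --report-skipped metrics -O json -o ./out/ 2> bca.stderr.log
#   skipped_paths bca.stderr.log | sort
```

Lines that do not carry the prefix, such as warnings printed by `-w`, are ignored by the filter.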