pgcrab 0.2.0

Linting and documentation generation tool for Postgres database schemas
Documentation
<!--
SPDX-FileCopyrightText: 2025 Olivier 'reivilibre'

SPDX-License-Identifier: GPL-3.0-or-later
-->

# pgCrab

**Note: this is beta-grade software, which should be usable but has plenty of room to improve.**

A simple but helpful linter and documentation tool for Postgres database schemas.

Helps cure crabbiness around schema exploration and some avoidable performance production issues.
Don't get woken up in the middle of the night for something you could have avoided with a bit of static analysis.

## What can pgCrab do?

### Generate documentation for your schema

pgCrab can connect to a database loaded with your schema and generate documentation for it.

Comments for each table and column will be collected from the database's built-in description
fields (set using `COMMENT ON`), but you can also harvest missing comments
from SQL comments (`--`) in your schema files.

You can provide a custom template, but by default Markdown will be emitted, suitable for embedding
in mdBook or other Markdown-based tools.

The documentation page includes all tables, with their:
  - comments;
  - columns (with datatypes and comments);
  - indices;
  - primary keys;
  - foreign keys; and
  - check constraints.

Foreign key relationships are documented in both directions, so it's easy to navigate via backreferences.

```
# uses libpq-compatible environment variables
pgcrab doc gen [--template <jinja template file>] [--harvest <path to SQL directory or files>]

# Postgres connection URI
pgcrab [-C postgres://user:pass@host/dbname] doc gen [--template <jinja template file>] [--harvest <path to SQL directory or files>]
```

A basic Markdown template is included in the repository: `basic_md_template.md.j2`.

### List uncommented items

Follows the same command structure as `pgcrab doc gen` but instead provides a list of uncommented items
in your database.

Exit code 1 when there is at least one uncommented item.

```
# uses libpq-compatible environment variables
pgcrab doc uncommented [--from-sql <path to SQL directory or files>]

# Postgres connection URI
pgcrab [-C postgres://user:pass@host/dbname] doc uncommented [--from-sql <path to SQL directory or files>] [postgres://user:pass@host/dbname]
```

### Lint your database schema for problems and suspicious things

You will need to set up a local Postgres instance with your schema loaded for linting.

```
# uses libpq-compatible environment variables
pgcrab lint-schema [--add-concessions]

# Postgres connection URI
pgcrab [-C postgres://user:pass@host/dbname] lint-schema [--add-concessions]
```

Many thanks to [pg-index-health-sql](https://github.com/mfvanek/pg-index-health-sql/), from which most of the lint rules were taken!

<details>
<summary>

**List of schema lint rules (click to expand):**

</summary>

| **Lint ID** | **Description** | **References** |
|:------------|-----------------|----------------|
| `dont_use_timestamp_without_time_zone` | Don't use `timestamp (without time zone)` - use `timestamptz` instead | [Postgres Wiki]https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_timestamp_.28without_time_zone.29 |
| `dont_use_money` | Don't use the `money` type; there are usually more appropriate types. | [Postgres Wiki]https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_money |
| `dont_use_serial` | Don't use `serial` types - prefer `GENERATED BY DEFAULT AS IDENTITY`. | [Postgres Wiki]https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_serial |
| `dont_use_varchar_n_by_default` | Don't use `VARCHAR(n)` by default (no benefits in Postgres) | [Postgres Wiki]https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_varchar.28n.29_by_default |
| `column_requires_quotation` | Column name always requires quotation due to naming convention | [Postgres Wiki]https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_upper_case_table_or_column_names |
| `object_requires_quotation` | Object name always requires quotation due to naming convention | [Postgres Wiki]https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_upper_case_table_or_column_names |
| `possible_object_name_truncation` | Object name may have been truncated due to length limits | [Postgres Docs]https://www.postgresql.org/docs/current/runtime-config-preset.html#GUC-MAX-IDENTIFIER-LENGTH |
| `duplicate_indexes` | Duplicate indexes: waste of space and disk bandwidth | . |
| `duplicate_foreign_keys` | Duplicate foreign keys: identical constraints | . |
| `foreign_key_without_index` | Foreign key without index: can cause poor delete performance | . |
| `foreign_key_with_unmatched_column_type` | Foreign key with mismatched column types: causes type coercions | . |
| `index_with_redundant_where_clause` | Index with redundant `WHERE IS NOT NULL` clause on `NOT NULL` column | . |
| `overlapping_indexes` | Overlapping indexes: one index is a prefix of another. | . |
| `overlapping_foreign_keys` | Overlapping foreign keys: one constraint is made redundant by another. | . |
| `table_without_primary_key` | Table without primary key: should almost always have one. | . |
| `btree_index_on_array_column` | B-tree index on array column: consider GIN index instead. | . |
| `index_with_boolean` | Non-unique index on just a boolean column: low selectivity. | . |
| `table_without_replica_identity` | Table without replica identity: affects logical replication. | [Postgres Docs]https://www.postgresql.org/docs/17/logical-replication-publication.html |

</details>

Have a lot of triggered lint rules from your existing schema and won't change them?
Use `--add-concessions` to add concession rules to the pgCrab config file.

### Convert SQL comments (`--`) to real schema `COMMENT`s (descriptions)

It's quite painful to use the `COMMENT ON` feature in Postgres, because you
have to write a statement for every item you are describing.
You also can't do it in the same statement as creating every item.

To help with this, you can have pgCrab harvest comments from your SQL migrations
and convert them to `COMMENT ON` statements.

This is not as portable as using `COMMENT ON` but it might help some projects
that currently don't use `COMMENT ON` anyway because it is verbose and not
interoperable with SQLite.

```
pgcrab doc convert path/to/schema/migration/directory/ > comment.sql
```

If you want to skip lines that would apply a COMMENT that already exists, then
use the `--compare` flag and ensure to either pass a database connection via `-C`
or by libpq environment variables.

```
pgcrab [-C postgres://user:pass@host/dbname] doc convert path/to/schema/migration/directory/ --compare > comments.sql
```

# Installation



## Nix

### via Nix flake

There is a Nix flake in this repository, with a `pgcrab` package in it.

`nix build` will produce a build of pgCrab at `./result/bin/pgcrab`.

TODO How to add to a NixOS system.

## Docker/Podman/Containers

### from container registry

- `docker.io/pgcrab/pgcrab:latest`
- `ghcr.io/pgcrab/pgcrab:latest`

Versioned images will follow soon.

### self-build

You can build with:
```
<podman|docker> build -t localhost/pgcrab .
```

Run using:
```
<podman|docker> run localhost/pgcrab --help
```

## Use in GitHub Actions

TODO, but for now I suggest just using the container image from GHCR directly.

## Build Locally from source

With `cargo` installed, you can choose one of the following:

- `cargo install --path .` to install to your user;
- or `cargo build --release` to just get a release build in your target directory.

You can use `cargo run -- <args>` to run from the source checkout.


# Configuration

Normally you should keep your configuration file in the root of your project repository, under `.config/pgcrab.toml`.
However, pgCrab will look for a configuration file by ascending directories, looking for `.config/pgcrab.toml` at any level
until hitting either a repository root, filesystem root or your user's home directory.

```
[schema.concessions]
# You can ignore diagnostics produced by a rule on some tables, indexes or foreign keys.
# The `--add-concessions` flag will automatically produce these rules for you, if desired.
rule_id = [
  "my_table",
  # globs are supported
  "legacy_table_*"
]
rule_id2 = [
  # to specify an index, use `.`
  "my_table.my_index"
  # to specify all indexes on a table, use `.*`
  "my_table.*",
  # to specify all indexes whatsoever, use `**`
  "**"
]
# you can also disable all rules for a given object:
"*" = [
  "my_table"
]
```

# Licence

pgCrab is available under the GPL 3.0 or later; see attached licence source.

Some of the SQL lint rules are available under Apache 2.0.


# Contributions

TODO

# Thanks

- [pg-index-health-sql]https://github.com/mfvanek/pg-index-health-sql/, from which many lint rules were takens