clockpipe 0.4.0

ClickHouse Data Synchronization Pipeline

clockpipe

  • An alternative to ClickPipes for on-premise ClickHouse users.
  • Uses change data capture (CDC) to write data from the source database to ClickHouse.

Supported Sources

  • PostgreSQL

Install

Build from source code

git clone https://github.com/myyrakle/clockpipe
cd clockpipe
cargo install --path .

Using Cargo

cargo install clockpipe

Using Docker

sudo docker run -v $(pwd)/clockpipe-config.json:/app/config.json --network host myyrakle/clockpipe:v0.3.0

PostgreSQL Setup

  • Synchronization is implemented through PostgreSQL publications (logical replication).
  • Modify postgresql.conf and restart the PostgreSQL server. To locate the file:
postgres=# SHOW config_file;
                   config_file
-------------------------------------------------
 /opt/homebrew/var/postgresql@14/postgresql.conf

Enable logical replication

sudo vim /opt/homebrew/var/postgresql@14/postgresql.conf
wal_level=logical

max_slot_wal_keep_size=-1
max_wal_size=10240
sudo systemctl restart postgresql
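
Before restarting, it can help to confirm that the edit actually took effect in the file. The following is a minimal sketch, not part of clockpipe: `check_wal_level` is a hypothetical helper that reads one postgresql.conf file and reports whether its last uncommented `wal_level` entry is `logical`. It assumes a plain `name = value` line and does not follow include directives or settings made with ALTER SYSTEM.

```shell
#!/bin/sh
# Hypothetical helper (not part of clockpipe): report whether a
# postgresql.conf file sets wal_level to logical.
# Usage: check_wal_level /path/to/postgresql.conf
check_wal_level() {
    conf_file=$1
    if [ ! -r "$conf_file" ]; then
        echo "cannot read $conf_file" >&2
        return 2
    fi
    # Pick up uncommented wal_level lines, tolerating optional spaces,
    # an optional "=", and an optional opening quote. When a parameter
    # appears more than once, PostgreSQL uses the last entry.
    value=$(sed -n "s/^[[:space:]]*wal_level[[:space:]]*=\{0,1\}[[:space:]]*'\{0,1\}\([a-z]*\).*/\1/p" "$conf_file" | tail -n 1)
    if [ "$value" = "logical" ]; then
        echo "wal_level is logical"
        return 0
    fi
    echo "wal_level is '${value:-unset}', expected logical" >&2
    return 1
}
```

Pass it the path reported by `SHOW config_file;`, for example `check_wal_level /opt/homebrew/var/postgresql@14/postgresql.conf`. After the restart, running `SHOW wal_level;` in psql is the authoritative check.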

How to Run

  • Prepare a config file (example)
  • Enter the information about the PostgreSQL table you want to synchronize.
    "tables": [
        {
            "schema_name": "public",
            "table_name": "foo"
        },
        {
            "schema_name": "public",
            "table_name": "nc_usr_account"
        }
    ]
  • Then run it:
clockpipe run --config-file ./clockpipe-config.json
  • The pipe queries the source table information, then automatically creates and synchronizes the corresponding tables in ClickHouse.

  • If you don't want the initial synchronization, use the skip_copy option. (CDC-based synchronization still works.)

    "tables": [
        {
            "schema_name": "public",
            "table_name": "user_table",
            "skip_copy": true
        }
    ]
  • You can also adjust the log level by setting the RUST_LOG environment variable to a value such as error, warn, info, or debug.
RUST_LOG=debug clockpipe run --config-file ./clockpipe-config.json
  • Columns added to a source table after the initial table link are also synchronized automatically (after a restart).
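
When many tables are involved, the "tables" entries shown in this section can be generated rather than typed by hand. The sketch below is an illustration only, not a clockpipe feature: `tables_fragment` is a hypothetical helper, and the `:skip` suffix is a notation made up here to mark entries that should get "skip_copy": true. It assumes each argument has the form schema.table and contains no characters that need JSON escaping. It prints only the "tables" fragment, which still has to be placed into the full config file.

```shell
#!/bin/sh
# Hypothetical helper (not part of clockpipe): print a "tables" fragment
# for the given schema.table names. A ":skip" suffix (a notation invented
# for this sketch) adds "skip_copy": true to that entry.
tables_fragment() {
    printf '    "tables": [\n'
    first=1
    for spec in "$@"; do
        skip=0
        case $spec in
            *:skip) skip=1; spec=${spec%:skip} ;;
        esac
        schema=${spec%%.*}   # text before the first dot
        table=${spec#*.}     # text after the first dot
        # Separate entries with a comma, but not before the first one.
        [ "$first" -eq 1 ] || printf ',\n'
        first=0
        printf '        {\n'
        printf '            "schema_name": "%s",\n' "$schema"
        if [ "$skip" -eq 1 ]; then
            printf '            "table_name": "%s",\n' "$table"
            printf '            "skip_copy": true\n'
        else
            printf '            "table_name": "%s"\n' "$table"
        fi
        printf '        }'
    done
    printf '\n    ]\n'
}
```

For example, `tables_fragment public.foo public.user_table:skip` prints one plain entry for public.foo and one entry for public.user_table with skip_copy enabled.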