mongo_sync 0.1.0

instant coding answers via the command line(just like howdoi)
Documentation
# mongo_sync
Mongodb realtime synchronizer, which is similar to [py-mongo-sync](https://github.com/caosiyang/py-mongo-sync)

# Features
- Support database level and collection full and concurrent synchronize.  You can use `--collection-concurrent` to define how many threads to sync a database, use `--doc-concurrent` to define how many threads to sync a collection.
- Support oplog based synchronize, so we can synchronize incremental data in realtime.
- Support daily rotation log, you can use it through `--log-path` option.  Or else log information will be output to stdout.

# Support
Mongodb 3.6+ (because of official mongodb driver only support mongodb 3.6+)

# Intall
The recommended way to install mongo_sync is using `cargo`:
```shell
cargo +nightly install mongo_sync
```

You can download [released binary](https://github.com/WindSoilder/mongo_sync/releases/) as well.

# Running tests?
To running integration tests, you need to config `SYNCER_TEST_SOURCE` to a testing mongodb uri, or `mongodb://localhost:27017` will be used.

# Example
To run synchronizer, you need to start oplog_syncer to make a realtime mongodb oplog sync first.

Then you can run `db_sync` to sync database in realtime.

## oplog_syncer
```shell
./target/release/oplog_syncer --src-uri "mongodb://localhost:27017" --oplog-storage-uri "mongodb://localhost:27018/"
```

## db_sync
```shell
db_sync --src-uri "mongodb://localhost:27017/?authSource=admin" --oplog-storage-uri  "mongodb://localhost:27018/?authSource=admin" --target-uri "mongodb://localhost:27019" --db test_db
```

Note that the `--oplog-storage-uri` in oplog_syncer and db_sync must be the same.

# Usage help
## oplog_syncer
```shell
USAGE:
    oplog_syncer [OPTIONS] --src-uri <src-uri> --oplog-storage-uri <oplog-storage-uri>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --log-path <log-path>
            log file path, if not specified, all log information will be output to stdout

    -o, --oplog-storage-uri <oplog-storage-uri>    target oplog storage uri
    -s, --src-uri <src-uri>                        source database uri, must be a mongodb cluster
```

## db_sync
```shell
USAGE:
    db_sync [OPTIONS] --src-uri <src-uri> --target-uri <target-uri> --oplog-storage-uri <oplog-storage-uri> --db <db>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --collection-concurrent <collection-concurrent>    how many threads to sync a database
    -c, --colls <colls>...
            collections to sync, default sync all collections inside a database

    -d, --db <db>                                          database to sync
        --doc-concurrent <doc-concurrent>                  how many threads to sync a collection
        --log-path <log-path>
            log file path, if no specified, all log information will be output to stdout

    -o, --oplog-storage-uri <oplog-storage-uri>
            mongodb uri which save oplogs, it's saved by `oplog_syncer` binary

    -s, --src-uri <src-uri>                                source mongodb uri
    -t, --target-uri <target-uri>                          target mongodb uri
```

# The basic arthitecture diagram
```
      ┌───────────────┐
      │ target db     │
      └───┬───────────┘
     xxxx ▼  xxxxxxx
    xx  ┌───────┐  x
   xx   │db_sync│
  x     └───────┘  x
  xxxxxx  ▲  ▲     x
       xx │  │ xxxx
          │  │
          │  │
Full dump │  │Incr dump
          │  │(Real time)
          │  │
          │  │
          │  │
          │  │         ┌─────────────────────┐
          │  └─────────┤oplog storage db     │
          │            └──────▲──────────────┘
          │          xxxxxxx  │   xxxxx
          │          x ┌──────┴──────┐x
          │          x │Oplog syncer │x   Sync oplog from source cluster
          │          x └──────▲──────┘x   to oplog storage in real time
          │          xxxxxxxxx│ xxxxxxx
          │                   │
          │                   │
          │                   │
          │            ┌──────┴───────┐
          └────────────┤Source cluster│
                       └──────────────┘
```

According to the diagram, you can find that there are 2 basic programs provided by mongo_sync
1. oplog syncer: sync mongodb cluster's oplog to target `oplog storage db`.
2. db sync: sync data from `source cluster` to `target db`.

# Benchmark
It's not strictly benchmark test, I just test it manually.

Scenario:

When source cluster insert 50,000 records, how long the `target_db` can synchronizer these new 50,000 insert.

My testing result:

`db_sync` takes about 50 seconds to sync these update, and `py-mongo-sync` takes about 225 seconds to sync these update.  In general, it's about 3.5x faster than `py-mongo-sync`.

And, please note that `50 seconds` is not accrutely, it highly depends on your database and running machine performance.

# How does the core work?
- oplog_syncer just use [tailable cursor]https://docs.mongodb.com/manual/core/tailable-cursors/ to read mongodb oplogs in realtime.
- db_sync full dump just use multithread with [find]https://docs.mongodb.com/manual/reference/method/db.collection.find/ to read data.
- db_sync incr dump use [oplog]https://docs.mongodb.com/manual/core/replica-set-oplog/ to sync incremental update (CRUD, database command.), and this is why source database must be a cluster.

# Notes
1. In incremental state, for now it only support the following command to be sync (which enough for my personal use.):
- rename collection
- dorp collection
- create collection
- drop indexes
- create indexes
2. I haven't test mongo sharding as target, but it should be ok to work.
3. during running `oplog_syncer`, `oplog storage db` will create and using databse named `source_oplog`, and create and using collection named `source_oplog`.  For now this is hardcoded.
4. during running `db_sync`, target databse will create a new collection named `oplog_records`, it saves the latest oplog timestamp applied to the database.