Module tabin_plugins::scripts

Documentation about the various scripts contained herein.

check-graphite
check-cpu
check-container-cpu
check-load
check-ram
check-container-ram
check-procs
check-fs-writeable
check-disk

check-graphite

Cross-platform; only requires access to a graphite instance.

$ check-graphite --help
check-graphite (part of tabin-plugins) 0.3.0
Brandon W Maister <quodlibetor@gmail.com>
Query graphite and exit based on predicates

USAGE:
    check-graphite [FLAGS] [OPTIONS] <URL> <PATH> <ASSERTION>...

FLAGS:
    -h, --help                 Prints help information
        --print-url            Unconditionally print the graphite url queried
    -V, --version              Prints version information
        --verify-assertions    Just check assertion syntax, do not query urls

OPTIONS:
        --graphite-error <GRAPHITE_ERROR_STATUS>
            What to say if graphite returns a 500 or invalid JSON. Default: unknown. [possible values: ok, warning,
            critical, unknown]
        --no-data <NO_DATA_STATUS>
            What to do with no data. This is the value to use for the assertion 'if all values are null' Default: warn.
            [possible values: ok, warning, critical, unknown]
        --retries <COUNT>                           How many times to retry reaching graphite. Default 4.
    -w, --window <MINUTES>                          How many minutes of data to test. Default 10.
        --window-start <MINUTES_IN_PAST>            How far back to start the window. Default is now.

ARGS:
    <URL>             The domain to query graphite. Must include scheme (http/s)
    <PATH>            The graphite path to query. For example: "collectd.*.cpu"
    <ASSERTION>...    The assertion to make against the PATH. See Below.

About Assertions:

    Assertions look like 'critical if any point in any series is > 5'.

    They describe what you care about in your graphite data. The structure of
    an assertion is as follows:

        <errorkind> if <point spec> [in <series spec>] is|are [not] <operator> <threshold>

    Where:

        - `errorkind` is either `critical` or `warning`
        - `point spec` can be one of:
            - `any point`
            - `all points`
            - `at least <N>% of points`
            - `most recent point`
        - `series spec` (optional) can be one of:
            - `any series`
            - `all series`
            - `at least <N>% of series`
        - `not` is optional, and inverts the following operator
        - `operator` is one of: `==` `!=` `<` `>` `<=` `>=`
        - `threshold` is a floating-point value (e.g. 100, 78.0)

    Here are some example assertions:

        - `critical if any point is > 0`
        - `critical if any point in at least 40% of series is > 0`
        - `critical if any point is not > 0`
        - `warning if any point is == 9`
        - `critical if all points are > 100.0`
        - `critical if at least 20% of points are > 100`
        - `critical if most recent point is > 5`
        - `critical if most recent point in all series are == 0`
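The assertion semantics above can be sketched in Rust as a small evaluator over already-fetched series. This is a hypothetical illustration, not tabin-plugins' implementation: the type names (`Assertion`, `PointSpec`, `SeriesSpec`, `Op`) are invented, parsing is not shown, null points are skipped, an omitted series spec is assumed to mean `any series`, and `most recent point` is assumed to be the last non-null value.

```rust
/// Comparison operator: `==` `!=` `<` `>` `<=` `>=`.
#[derive(Clone, Copy)]
enum Op { Eq, Ne, Lt, Gt, Le, Ge }

/// `<point spec>`: which points of one series must satisfy the comparison.
#[derive(Clone, Copy)]
enum PointSpec { Any, All, AtLeastPercent(f64), MostRecent }

/// `<series spec>`: how many series must match (assumed `Any` when omitted).
#[derive(Clone, Copy)]
enum SeriesSpec { Any, All, AtLeastPercent(f64) }

/// A parsed assertion (hypothetical type; parsing is not shown).
struct Assertion { point: PointSpec, series: SeriesSpec, negate: bool, op: Op, threshold: f64 }

impl Assertion {
    /// True if a single value satisfies `[not] <operator> <threshold>`.
    fn hit(&self, v: f64) -> bool {
        let t = self.threshold;
        let result = match self.op {
            Op::Eq => v == t,
            Op::Ne => v != t,
            Op::Lt => v < t,
            Op::Gt => v > t,
            Op::Le => v <= t,
            Op::Ge => v >= t,
        };
        result != self.negate // `not` inverts the operator
    }

    /// Does one series satisfy the point spec? Null points are skipped (assumption).
    fn series_matches(&self, points: &[Option<f64>]) -> bool {
        let values: Vec<f64> = points.iter().flatten().copied().collect();
        if values.is_empty() {
            return false; // all-null series: the real tool applies --no-data instead
        }
        match self.point {
            PointSpec::Any => values.iter().any(|&v| self.hit(v)),
            PointSpec::All => values.iter().all(|&v| self.hit(v)),
            PointSpec::AtLeastPercent(p) => {
                let n = values.iter().filter(|&&v| self.hit(v)).count();
                100.0 * n as f64 / values.len() as f64 >= p
            }
            PointSpec::MostRecent => self.hit(values[values.len() - 1]),
        }
    }

    /// True if the whole assertion fires for the given set of series.
    fn triggers(&self, series: &[Vec<Option<f64>>]) -> bool {
        let matching = series.iter().filter(|s| self.series_matches(s)).count();
        match self.series {
            SeriesSpec::Any => matching > 0,
            SeriesSpec::All => !series.is_empty() && matching == series.len(),
            SeriesSpec::AtLeastPercent(p) => {
                !series.is_empty() && 100.0 * matching as f64 / series.len() as f64 >= p
            }
        }
    }
}
```

For example, `critical if most recent point in all series are == 0` corresponds to `PointSpec::MostRecent` with `SeriesSpec::All`, `Op::Eq` and a threshold of `0.0`.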

check-cpu

Linux-only.

$ check-cpu --help
check-cpu  (part of tabin-plugins) 0.3.0
Brandon W Maister <quodlibetor@gmail.com>
Check cpu usage of the current computer

USAGE:
    check-cpu [FLAGS] [OPTIONS]

FLAGS:
    -h, --help       Prints help information
        --per-cpu    Gauge values per-cpu instead of across the entire machine
    -V, --version    Prints version information

OPTIONS:
        --show-hogs <count>        Show <count> most cpu-intensive processes on this computer. [default: 0]
        --cpu-count <cpu_count>    If --per-cpu is specified, this is how many CPUs need to be at a threshold to
                                   trigger. [default: 1]
    -c, --crit <crit>              Percent to go critical at [default: 80]
    -s, --sample <seconds>         Seconds to take sample over [default: 5]
    -w, --warn <warn>              Percent to warn at [default: 80]
        --type <work_type>...      See 'CPU Work Types' below [default: active]

CPU Work Types:

    Specifying one of the CPU kinds via `--type` checks that kind of
    utilization. The default is to check total utilization. Specifying this
    multiple times alerts if *any* of the CPU usage types are critical.

    There are three CPU type groups: `active`, `activeplusiowait`, and
    `activeminusnice`. `activeplusiowait` counts time spent waiting for IO
    as busy time. This makes alerts align more closely with overall system
    load, but it differs from the CPU usage reported by `top`, since the CPU
    isn't actually *busy* during this time.

    --type=<usage>           Some of:
                                active activeplusiowait activeminusnice
                                user nice system irq softirq steal guest
                                idle iowait [default: active]
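The sketch below shows how the three groups could be computed from two samples of a `cpu` line in /proc/stat, whose first eight fields are user, nice, system, idle, iowait, irq, softirq and steal (in jiffies). The grouping formulas are assumptions for illustration: `active` is taken as everything except idle and iowait, `activeplusiowait` adds iowait back, and `activeminusnice` drops nice time. The guest fields are ignored because Linux already counts guest time inside user and nice. `CpuTimes`, `WorkType` and `percent` are hypothetical names.

```rust
/// Cumulative jiffy counters from one `cpu`/`cpuN` line of /proc/stat.
#[derive(Clone, Copy, Debug)]
struct CpuTimes { user: u64, nice: u64, system: u64, idle: u64, iowait: u64, irq: u64, softirq: u64, steal: u64 }

/// Assumed groupings, mirroring the `--type` group names.
#[derive(Clone, Copy)]
enum WorkType { Active, ActivePlusIowait, ActiveMinusNice }

impl CpuTimes {
    /// Parse a line such as "cpu  4705 356 584 3699 23 23 0 0 0 0".
    fn parse(line: &str) -> Option<CpuTimes> {
        let mut fields = line.split_whitespace();
        if !fields.next()?.starts_with("cpu") {
            return None;
        }
        let v = fields.map(|f| f.parse::<u64>().ok()).collect::<Option<Vec<u64>>>()?;
        if v.len() < 8 {
            return None;
        }
        Some(CpuTimes { user: v[0], nice: v[1], system: v[2], idle: v[3], iowait: v[4], irq: v[5], softirq: v[6], steal: v[7] })
    }

    /// Everything except idle and iowait (assumed definition of `active`).
    fn active(&self) -> u64 {
        self.user + self.nice + self.system + self.irq + self.softirq + self.steal
    }

    fn total(&self) -> u64 {
        self.active() + self.idle + self.iowait
    }
}

/// Busy jiffies for the given work type.
fn busy(t: &CpuTimes, kind: WorkType) -> u64 {
    match kind {
        WorkType::Active => t.active(),
        WorkType::ActivePlusIowait => t.active() + t.iowait,
        WorkType::ActiveMinusNice => t.active() - t.nice,
    }
}

/// Percent utilization between two samples taken `--sample` seconds apart.
fn percent(before: &CpuTimes, after: &CpuTimes, kind: WorkType) -> f64 {
    let total_delta = after.total().saturating_sub(before.total());
    if total_delta == 0 {
        return 0.0;
    }
    let busy_delta = busy(after, kind).saturating_sub(busy(before, kind));
    100.0 * busy_delta as f64 / total_delta as f64
}
```

Because the kernel counters are cumulative, only the difference between the two samples reflects usage during the sample window.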

check-container-cpu

Linux-only. Can only be run from inside a cgroup.

$ check-container-cpu --help
check-container-cpu (part of tabin-plugins) 0.3.0
Brandon W Maister <quodlibetor@gmail.com>
Check the cpu usage of the currently-running container.

This must be run from inside the container to be checked.

USAGE:
    check-container-cpu [OPTIONS]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --show-hogs <count>                  Show <count> most cpu-intensive processes in this container. [default: 0]
    -c, --crit <crit>                        Percent to go critical at [default: 80]
    -s, --sample <seconds>                   Seconds to take sample over [default: 5]
        --shares-per-cpu <shares_per_cpu>    The number of CPU shares given to a cgroup when it has exactly one CPU
                                             allocated to it.
    -w, --warn <warn>                        Percent to warn at [default: 80]

About usage percentages:

    If you don't specify '--shares-per-cpu', percentages should be specified
    relative to a single CPU's usage. So if you have a process that you want to
    be allowed to use 4 CPUs worth of processor time, and you were planning on
    going critical at 90%, you should specify something like '--crit 360'.

    However, if you are using a container orchestrator such as Mesos, you often
    tell it that you want this container to have '2 CPUs' worth of hardware.
    Your scheduler is responsible for deciding how many cgroup CPU shares one
    CPU's worth of time is, keeping track of how many shares it has doled out,
    and then scheduling your containers to run with 2 CPUs' worth of CPU
    shares. Assuming that your scheduler uses the default number of shares
    (1024) as 'one CPU', this means that you have given that cgroup 2048
    shares.

    If you do specify --shares-per-cpu then the percentage that you give will
    be scaled by the number of CPUs worth of shares that this container has
    been given, and CPU usage will be compared to the total percent of the CPUs
    that it has been allocated.

    In other words, if you specify --shares-per-cpu, you should always specify
    your warn/crit percentages out of 100%, because this script will scale
    them correctly for your process.

    Here are some examples, where 'shares granted' is the value in
    /sys/fs/cgroup/cpu/cpu.shares:

        * args: --shares-per-cpu 1024 --crit 90
          shares granted: 1024
          percent of one CPU to alert at: 90
        * args: --shares-per-cpu 1024 --crit 90
          shares granted: 2048
          percent of one CPU to alert at: 180
        * args: --shares-per-cpu 1024 --crit 90
          shares granted: 102
          percent of one CPU to alert at: 9
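The scaling rule described above amounts to multiplying the threshold by the number of CPUs' worth of shares granted. A minimal sketch, assuming that usage is measured as a percentage of one CPU from two readings of a cumulative CPU-time counter in nanoseconds (such as cgroup v1's `cpuacct.usage`); `effective_threshold` and `usage_percent` are hypothetical names.

```rust
/// Convert a --warn/--crit percentage into "percent of one CPU".
/// `shares_per_cpu` is `None` when --shares-per-cpu is not given.
fn effective_threshold(pct: f64, shares_per_cpu: Option<u64>, shares_granted: u64) -> f64 {
    match shares_per_cpu {
        // Without --shares-per-cpu, the value is already relative to one CPU.
        None => pct,
        // Scale by the number of CPUs' worth of shares the cgroup was granted.
        Some(per_cpu) => pct * shares_granted as f64 / per_cpu as f64,
    }
}

/// Percent of one CPU used over a sample, from two readings of a cumulative
/// CPU-time counter in nanoseconds (assumed source: cgroup v1 `cpuacct.usage`).
fn usage_percent(before_ns: u64, after_ns: u64, sample_secs: f64) -> f64 {
    let used_ns = after_ns.saturating_sub(before_ns) as f64;
    100.0 * used_ns / (sample_secs * 1e9)
}
```

With `--shares-per-cpu 1024 --crit 90` and 2048 shares granted, `effective_threshold` yields 180, matching the second example; a container that used 10 CPU-seconds during a 5-second sample reports 200%, so it would go critical.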

check-load

Linux-only.

$ check-load --help
check-load (part of tabin-plugins) 0.3.0
Brandon W Maister <quodlibetor@gmail.com>
Check the load average of the system

Load average is the number of processes *waiting* to do work in a queue, either due to IO or CPU constraints. The
numbers checked are the load averages over 1, 5 and 15 minutes, respectively.

USAGE:
    check-load [FLAGS] [OPTIONS]

FLAGS:
    -h, --help       Prints help information
        --per-cpu    Divide the load average by the number of processors on the system.
    -V, --version    Prints version information
    -v, --verbose    Print info even if everything is okay

OPTIONS:
    -c, --crit <crit>    Averages to go critical at [default: 10,5,3]
    -w, --warn <warn>    Averages to warn at [default: 5,3.5,2.5]
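The `--warn`/`--crit` triples and the `--per-cpu` division can be sketched as follows. The assumptions are that a status fires when *any* of the three averages is strictly greater than its threshold, and that critical takes precedence over warning; `parse_triple`, `check_load` and `Status` are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Status { Ok, Warning, Critical }

/// Parse a threshold triple such as "10,5,3" (1, 5 and 15 minute averages).
fn parse_triple(s: &str) -> Result<[f64; 3], String> {
    let values = s
        .split(',')
        .map(|p| p.trim().parse::<f64>().map_err(|e| format!("{p:?}: {e}")))
        .collect::<Result<Vec<f64>, String>>()?;
    match values.as_slice() {
        [a, b, c] => Ok([*a, *b, *c]),
        _ => Err(format!("expected 3 comma-separated values, got {}", values.len())),
    }
}

/// Compare the 1/5/15-minute load averages against the thresholds.
/// `cpus` is `Some(n)` when --per-cpu is given.
fn check_load(loads: [f64; 3], cpus: Option<usize>, warn: [f64; 3], crit: [f64; 3]) -> Status {
    // --per-cpu: divide each average by the processor count.
    let divisor = cpus.map_or(1.0, |n| n as f64);
    // True if any average exceeds its corresponding limit (assumed rule).
    let exceeds = |limits: [f64; 3]| loads.iter().zip(limits).any(|(load, limit)| load / divisor > limit);
    if exceeds(crit) {
        Status::Critical
    } else if exceeds(warn) {
        Status::Warning
    } else {
        Status::Ok
    }
}
```

With the defaults, a 1-minute load of 12 is critical, but with `--per-cpu` on a 4-CPU machine it becomes 3.0 and passes.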

check-ram

Linux-only.

$ check-ram --help
check-ram (part of tabin-plugins) 0.3.0
Brandon W Maister <quodlibetor@gmail.com>
Check the ram usage of the current computer

USAGE:
    check-ram [OPTIONS]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --show-hogs <count>    Show <count> most ram-intensive processes in this computer. [default: 0]
    -c, --crit <crit>          Percent to go critical at [default: 95]
    -w, --warn <warn>          Percent to warn at [default: 85]

check-container-ram

Linux-only. Can only be run from inside a cgroup.

$ check-container-ram --help
check-container-ram (part of tabin-plugins) 0.3.0
Brandon W Maister <quodlibetor@gmail.com>
Check the RAM usage of the currently-running container.

This must be run from inside the container to be checked.

This checks usage as a ratio of the cgroup memory limit. If there is no limit set (or the limit is greater than the
total memory available on the system), it checks against the total system memory.

USAGE:
    check-container-ram [OPTIONS]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --show-hogs <count>                Show <count> most ram-intensive processes in this container. [default: 0]
    -c, --crit <crit>                      Percent to go critical at [default: 95]
        --invalid-limit <invalid_limit>    Status to report if the cgroup limit is greater than the system RAM
                                           [default: ok]
    -w, --warn <warn>                      Percent to warn at [default: 85]
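The choice of denominator described above can be sketched like this. The limit is assumed to come from the cgroup's memory limit file (for example cgroup v1's `memory.limit_in_bytes`, which reports a very large value when no limit is set), and `Basis`, `choose_basis` and `used_percent` are hypothetical names. Per the help text, `--invalid-limit` sets the status for the `LimitAboveSystem` case.

```rust
/// What container RAM usage is measured against.
#[derive(Debug, PartialEq)]
enum Basis { CgroupLimit, NoLimit, LimitAboveSystem }

/// Pick the denominator in bytes: the cgroup limit if it is usable,
/// otherwise total system memory.
fn choose_basis(cgroup_limit: Option<u64>, system_total: u64) -> (u64, Basis) {
    match cgroup_limit {
        None => (system_total, Basis::NoLimit),
        // An "unlimited" cgroup typically shows a limit far above physical RAM.
        Some(limit) if limit > system_total => (system_total, Basis::LimitAboveSystem),
        Some(limit) => (limit, Basis::CgroupLimit),
    }
}

/// Usage as a percentage of the chosen denominator.
fn used_percent(usage: u64, denominator: u64) -> f64 {
    100.0 * usage as f64 / denominator as f64
}
```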

check-procs

Linux-only. Reads running processes.

$ check-procs --help
check-procs (part of tabin-plugins) 0.3.0
Brandon W Maister <quodlibetor@gmail.com>
Check that an expected number of processes are running.

Optionally, kill unwanted processes.

USAGE:
    check-procs [FLAGS] [OPTIONS] [--] [pattern]

FLAGS:
        --allow-unparseable-procs    In combination with --crit-over M this will not alert if any processes cannot be
                                     parsed
    -h, --help                       Prints help information
    -V, --version                    Prints version information

OPTIONS:
        --crit-over <M>                               Error if there are more than <M> procs matching <pattern>
        --crit-under <N>                              Error if there are fewer than <N> procs matching <pattern>
        --kill-parents-of-matching <PARENT_SIGNAL>
            If *any* processes match, then kill their parents with the provided signal which can be either an integer or
            a name like KILL or SIGTERM. This has the same exit status behavior as kill-matching.
        --kill-matching <SIGNAL>
            If *any* processes match, then kill them with the provided signal, which can be either an integer or a name
            like KILL or SIGTERM. This option does not affect the exit status: all matches are always killed, and if
            --crit-under/over are violated then this will still exit critical.
        --state <states>...
            Filter to only processes in these states. If passed multiple times, processes matching any state are
            included.
            Choices: running sleeping uninterruptible-sleep waiting stopped zombie

ARGS:
    <pattern>    Regex that command and its arguments must match

Examples:

    Ensure at least two nginx processes are running:

        check-procs --crit-under 2 nginx

    Ensure there are not more than 30 zombie processes on the system:

        check-procs --crit-over 30 --state zombie

    Ensure that there are not more than 5 java processes running MyMainClass
    that are in the zombie *or* waiting states. Note that since `--state` can
    take multiple values, the regex must come before the `--state` flag:

        check-procs 'java.*MyMainClass' --crit-over 5 --state zombie waiting

    Ensure that there are at least three (running or waiting) (cassandra or
    postgres) processes:

        check-procs --crit-under 3 --state=running --state=waiting 'cassandra|postgres'
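The counting logic behind these examples can be sketched as below. The real tool matches `<pattern>` as a regex against the command and its arguments; to stay within the standard library, this sketch takes an arbitrary predicate instead, and `Proc`, `State`, `Status` and `check_procs` are hypothetical names. An empty state list means no state filter, and multiple states are OR-ed together as the help text describes.

```rust
/// Process states accepted by --state.
#[derive(Clone, Copy, Debug, PartialEq)]
enum State { Running, Sleeping, UninterruptibleSleep, Waiting, Stopped, Zombie }

/// A process as seen by the check: full command line plus state.
struct Proc { cmdline: String, state: State }

#[derive(Debug, PartialEq)]
enum Status { Ok, Critical }

/// Count processes matching `is_match` (stand-in for the regex) whose state is
/// in `states`, then apply --crit-under / --crit-over.
fn check_procs(
    procs: &[Proc],
    is_match: impl Fn(&str) -> bool,
    states: &[State],
    crit_under: Option<usize>,
    crit_over: Option<usize>,
) -> (usize, Status) {
    let n = procs
        .iter()
        .filter(|p| states.is_empty() || states.contains(&p.state))
        .filter(|p| is_match(p.cmdline.as_str()))
        .count();
    let too_few = crit_under.map_or(false, |min| n < min); // "fewer than <N>"
    let too_many = crit_over.map_or(false, |max| n > max); // "more than <M>"
    (n, if too_few || too_many { Status::Critical } else { Status::Ok })
}
```

The last example above corresponds to `check_procs(&procs, |c| c.contains("cassandra") || c.contains("postgres"), &[State::Running, State::Waiting], Some(3), None)`.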

check-fs-writeable

$ check-fs-writeable --help
check-fs-writeable (part of tabin-plugins) 0.3.0
Brandon W Maister <quodlibetor@gmail.com>
Check that we can write to a filesystem by writing a byte to a file.

Does not try to create the directory, or do anything else. Just writes a single byte to a file, errors if it cannot, and
then deletes the file.

USAGE:
    check-fs-writeable <filename>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <filename>    The file to write to

check-disk

Unix-only.

$ check-disk --help
check-disk (part of tabin-plugins) 0.3.0
Brandon W Maister <quodlibetor@gmail.com>
Check all mounted file systems for disk usage.

For some reason this check generally generates values that are between 1% and 3% higher than `df`, even though AFAICT
we're both just calling statvfs a bunch of times.

USAGE:
    check-disk [FLAGS] [OPTIONS]

FLAGS:
    -h, --help       Prints help information
        --info       Print information of all known filesystems. Similar to df.
    -V, --version    Prints version information

OPTIONS:
        --inaccessible-status <STATUS>       If any filesystems are inaccessible print a warning and exit with STATUS.
                                             Choices: [critical, warning, ok]
    -c, --crit <crit>                        Percent to go critical at [default: 90]
    -C, --crit-inodes <crit_inodes>          Percent of inode usage to go critical at [default: 90]
        --exclude-type <exclude-fs-type>     Do not check filesystems that are of this type.
        --exclude-pattern <exclude-regex>    Do not check filesystems that match this regex
        --type <fs-type>                     Only check filesystems that are of this type, e.g. ext4 or tmpfs. See 'man
                                             8 mount' for more examples.
        --pattern <regex>                    Only check filesystems that match this regex
    -w, --warn <warn>                        Percent to warn at [default: 80]
    -W, --warn-inodes <warn_inodes>          Percent of inode usage to warn at [default: 80]
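Disk and inode percentages can be derived from the fields that statvfs(3) returns. The sketch takes those fields as plain numbers rather than calling statvfs (which would need the `libc` or `nix` crate); `FsStats` and the function names are hypothetical. It shows two common block-usage formulas: the ratio GNU `df` reports as Use% (used / (used + available), before rounding) and one that treats reserved blocks as used. Which formula check-disk uses is not stated above, so whether this accounts for the 1% to 3% difference is unconfirmed.

```rust
/// The statvfs(3) fields this sketch needs, all in block or inode counts.
struct FsStats { blocks: u64, bfree: u64, bavail: u64, files: u64, ffree: u64 }

/// df-style usage: used / (used + blocks available to unprivileged users).
fn df_percent(s: &FsStats) -> f64 {
    let used = s.blocks.saturating_sub(s.bfree);
    let denom = used + s.bavail;
    if denom == 0 {
        return 0.0;
    }
    100.0 * used as f64 / denom as f64
}

/// Alternative: share of all blocks not available to unprivileged users,
/// i.e. reserved blocks count as used.
fn reserved_as_used_percent(s: &FsStats) -> f64 {
    if s.blocks == 0 {
        return 0.0;
    }
    100.0 * s.blocks.saturating_sub(s.bavail) as f64 / s.blocks as f64
}

/// Inode usage percent, for --warn-inodes / --crit-inodes.
fn inode_percent(s: &FsStats) -> f64 {
    if s.files == 0 {
        return 0.0; // some filesystems report no inode limits
    }
    100.0 * s.files.saturating_sub(s.ffree) as f64 / s.files as f64
}
```

On a filesystem with 1000 blocks, 100 free and 50 available, the first formula gives about 94.7% and the second 95%.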