§grok
The grok library allows you to quickly parse and match potentially
unstructured data into a structured result. It is especially helpful when parsing
logfiles of all kinds. This Rust version is mainly a port of the Java version,
which in turn drew inspiration from the original Ruby version.
§Usage
Add this to your Cargo.toml:
[dependencies]
grok = "2.3"
Here is a simple example which stores a pattern, compiles it and then matches a line on it:
use grok::Grok;

fn main() {
    // Instantiate Grok
    let mut grok = Grok::default();

    // Add a pattern which might be a regex or an alias
    grok.add_pattern("USERNAME", r"[a-zA-Z0-9._-]+");

    // Compile the definitions into the pattern you want
    let pattern = grok
        .compile("%{USERNAME}", false)
        .expect("Error while compiling!");

    // Match the compiled pattern against a string
    match pattern.match_against("root") {
        Some(m) => println!("Found username {:?}", m.get("USERNAME")),
        None => println!("No matches found!"),
    }
}
Note that compiling the pattern is an expensive operation. As with plain regex
handling, the compile operation should be performed once, and the match_against
method on the pattern can then be called repeatedly in a loop or iterator. The
returned pattern is not bound to the lifetime of the original grok instance, so
it can be passed around freely. For performance reasons the Matches returned is
bound to the pattern lifetime, so keep them close together or clone/copy out the
contained results as needed.
§Pattern Syntax
A grok pattern is a standard regular expression string with grok pattern placeholders embedded in it.
The grok pattern placeholders are of the form %{name:alias:extract=definition}, where name is the name of the pattern, alias is the alias of the pattern, extract is the extract of the pattern, and definition is the definition of the pattern.
- name is the name of the pattern and is required. It may contain any alphanumeric character, or _.
- alias is the alias of the pattern and is optional. It may contain any alphanumeric character, or any of _-[]. (underscore, hyphen, square brackets, and period). If extract is provided, alias may be empty.
- extract is the extract of the pattern and is optional. It may contain any alphanumeric character, or any of _-[]. (the same set as for alias).
- definition is the definition of the pattern and is optional. It may contain any character other than { or }.
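The rules above can be sketched as a small stand-alone parser. This is only an illustration of the placeholder grammar as described here, not the crate's own implementation: the Placeholder type and the parse_placeholder, is_name, and is_field helpers are hypothetical names, "alphanumeric" is assumed to mean ASCII, and an empty alias or extract is treated as absent.

```rust
/// The four components of a `%{name:alias:extract=definition}` placeholder.
/// Hypothetical illustration type; not part of the grok crate.
#[derive(Debug, PartialEq)]
struct Placeholder {
    name: String,
    alias: Option<String>,
    extract: Option<String>,
    definition: Option<String>,
}

/// A name is required and may contain alphanumerics or `_` (ASCII assumed).
fn is_name(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Alias and extract may contain alphanumerics or any of `_-[].`.
fn is_field(s: &str) -> bool {
    s.chars().all(|c| c.is_ascii_alphanumeric() || "_-[].".contains(c))
}

/// Splits one placeholder into its parts, or returns `None` if it does not
/// follow the rules listed above.
fn parse_placeholder(text: &str) -> Option<Placeholder> {
    let inner = text.strip_prefix("%{")?.strip_suffix('}')?;

    // Everything after the first `=` is the definition; braces are not allowed in it.
    let (head, definition) = match inner.split_once('=') {
        Some((head, def)) => (head, Some(def)),
        None => (inner, None),
    };
    if definition.map_or(false, |d| d.contains('{') || d.contains('}')) {
        return None;
    }

    // The head holds up to three colon-separated fields: name, alias, extract.
    let mut fields = head.splitn(3, ':');
    let name = fields.next()?;
    let alias = fields.next().filter(|s| !s.is_empty());
    let extract = fields.next().filter(|s| !s.is_empty());

    if !is_name(name) || !alias.map_or(true, is_field) || !extract.map_or(true, is_field) {
        return None;
    }

    Some(Placeholder {
        name: name.to_string(),
        alias: alias.map(str::to_string),
        extract: extract.map(str::to_string),
        definition: definition.map(str::to_string),
    })
}
```

Under these assumptions, parse_placeholder("%{IPV4:ip}") yields the name IPV4 with the alias ip, while a string that is not wrapped in %{ and } is rejected.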
A literal % character may appear in a grok pattern as long as it is not
followed by {. To match a literal % directly before a {, you can surround the
percent with a capturing group (%){..}, a non-capturing group (?:%){..}, or use
the \x25 escape sequence, i.e. \x25{..}.
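As a small sketch of the \x25 option, the helper below rewrites every %{ in a piece of literal text so that it is no longer read as the start of a placeholder. The function name escape_literal_percent is hypothetical and not part of the crate. It only handles the percent sign; any other regex metacharacters in the text still need their own escaping.

```rust
/// Replaces each `%` that is directly followed by `{` with the `\x25`
/// escape sequence, leaving all other `%` characters untouched.
/// Hypothetical helper for illustration; not provided by the grok crate.
fn escape_literal_percent(text: &str) -> String {
    text.replace("%{", r"\x25{")
}
```

The result is meant to be spliced into a larger pattern string before compiling it. A % that is not followed by { is left as it is, since it needs no escaping.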
For example, to match log messages like so:
2016-09-19T18:19:00 [8.8.8.8:prd] DEBUG this is an example log message
… the following pattern could be used:
%{TIMESTAMP_ISO8601:timestamp} \[%{IPV4:ip}:%{WORD:environment}\] %{LOGLEVEL:log_level} %{GREEDYDATA:message}
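To see which fields such a pattern would produce, the following sketch lists the key of each placeholder. It assumes that a capture is keyed by its alias when one is given and by the pattern name otherwise. The field_names helper is hypothetical and does no validation. It is not how the crate itself resolves placeholders.

```rust
/// Collects one key per `%{...}` placeholder: the alias if present,
/// otherwise the pattern name. Illustrative only; not part of the grok crate.
fn field_names(pattern: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut rest = pattern;

    while let Some(start) = rest.find("%{") {
        let after = &rest[start + 2..];
        // Stop at an unterminated placeholder instead of guessing.
        let end = match after.find('}') {
            Some(end) => end,
            None => break,
        };

        // Drop an optional `=definition` suffix, then split name and alias.
        let head = after[..end].split('=').next().unwrap_or("");
        let mut parts = head.split(':');
        let name = parts.next().unwrap_or("");
        let alias = parts.next().filter(|a| !a.is_empty());

        names.push(alias.unwrap_or(name).to_string());
        rest = &after[end + 1..];
    }

    names
}
```

Applied to the pattern above, this returns timestamp, ip, environment, log_level and message, in that order.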
§Further Information
This library supports multiple regex engines through feature flags. By default, it uses onig, which is a Rust binding for the powerful Oniguruma regex library. You can also use pcre2, fancy-regex, or the standard Rust regex engine by enabling the respective feature.
The default engine is onig, for compatibility with previous 2.x releases:
[dependencies]
grok = { version = "2.3", features = ["onig"] }
The pcre2 engine supports backtracking and JIT compilation, and is the fastest
engine for most use cases:
[dependencies]
grok = { version = "2.3", default-features = false, features = ["pcre2"] }
The fancy-regex engine is a more complete Rust regex library supporting
backtracking:
[dependencies]
grok = { version = "2.3", default-features = false, features = ["fancy-regex"] }
The regex engine is supported, but it does not support backtracking, so many
patterns are unusable. It is not recommended for most use cases:
[dependencies]
grok = { version = "2.3", default-features = false, features = ["regex"] }
§License
grok is distributed under the terms of the Apache License (Version 2.0).
See LICENSE for details.
Modules§
Structs§
- Grok
- The Grok struct is the main entry point into using this library.
- Matches
- The Matches represent matched results from a Pattern against a provided text.
- MatchesIter
- An Iterator over all matches, accessible via Matches.
- Pattern
- The Pattern represents a compiled regex, ready to be matched against arbitrary text.
Enums§
- Error
- Errors that can occur when using this library.
Functions§
- patterns
- Returns the default patterns, also used by the default constructor of Grok.