Categorizes semi-structured text using the Drain algorithm (https://arxiv.org/pdf/1806.04356.pdf). The main implementation is a fixed-depth prefix tree. Consequently, it assumes that the tokens carrying the most information, and therefore giving the most useful splits, appear earlier in the text.
This may not be optimal for text formats whose distinguishing tokens appear late in a line.
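The fixed-depth assumption can be illustrated with a minimal Python sketch of how a line is routed through the tree. It assumes whitespace tokenization, treats any token containing a digit as a wildcard, and counts `depth` as token levels below the token-count node (the paper counts levels differently). The names `normalize` and `route` are hypothetical, not part of this library. Two lines that differ only after the first `depth` tokens follow the same path.

```python
WILDCARD = "<*>"


def normalize(token: str) -> str:
    # Assumption: any token containing a digit is a variable value.
    return WILDCARD if any(ch.isdigit() for ch in token) else token


def route(line: str, depth: int) -> tuple:
    """Hypothetical tree path: the token count, then the first `depth` tokens."""
    tokens = [normalize(t) for t in line.split()]
    return (len(tokens), tuple(tokens[:depth]))


# Only the first `depth` tokens decide the path; later tokens are ignored.
print(route("Node 2 is online", depth=3))   # (4, ('Node', '<*>', 'is'))
print(route("Node 4 is offline", depth=3))  # (4, ('Node', '<*>', 'is'))
```

Both lines reach the same leaf, so telling them apart is left entirely to the similarity comparison inside that leaf.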
Given the log lines:
    Node 2 is online
    Node 4 going offline
With a fixed tree depth of 3, we get the following splits:
    4                     // root split: the number of tokens in the line
    "Node"                // first prefix node, with value "Node"
    "<*>"                 // numbers are assumed to be variable and are replaced with a wildcard
    "is"   "going"        // last level: one node for "is", one for "going"
    [Node <*> is online]   [Node <*> going offline]   // the resulting text templates for this simple case
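The tree in this example can be reproduced with a short Python sketch that stores it as nested dictionaries. It assumes whitespace tokenization, digit-containing tokens becoming `<*>`, and `depth` counting token levels below the token-count node. `build_tree` is a hypothetical helper, and unlike Drain it keeps every distinct template at a leaf without any similarity-based merging.

```python
WILDCARD = "<*>"


def normalize(token):
    # Assumption: any token containing a digit is a variable value.
    return WILDCARD if any(ch.isdigit() for ch in token) else token


def build_tree(lines, depth):
    """Nested dicts: token count -> first `depth` tokens -> list of templates."""
    root = {}
    for line in lines:
        tokens = [normalize(t) for t in line.split()]
        if not tokens:
            continue  # skip empty lines
        node = root.setdefault(len(tokens), {})
        path = tokens[:depth]
        for token in path[:-1]:
            node = node.setdefault(token, {})
        # The last token on the path maps to the templates stored at that leaf.
        templates = node.setdefault(path[-1], [])
        template = " ".join(tokens)
        if template not in templates:
            templates.append(template)
    return root


tree = build_tree(["Node 2 is online", "Node 4 going offline"], depth=3)
print(tree)
# {4: {'Node': {'<*>': {'is': ['Node <*> is online'], 'going': ['Node <*> going offline']}}}}
```

Lines with the same token count always produce paths of the same length, so a given level is consistently either a dictionary or a leaf list.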
Main Drain algorithm implementation. Contains the structure of the Drain prefix tree along with its configuration options.
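Drain is commonly configured with a tree depth, a similarity threshold, and a cap on the number of children per node. The following Python sketch shows how such options might shape the tree. `DrainConfig`, `PrefixTree`, `leaf_for`, the field names, and the default values are all hypothetical and illustrative, not this library's API. It assumes that once a node is full, new tokens overflow into a shared `<*>` child, with one slot reserved for that child.

```python
from dataclasses import dataclass, field

WILDCARD = "<*>"


@dataclass
class DrainConfig:
    # Hypothetical option names and defaults, for illustration only.
    depth: int = 3              # token levels below the token-count node
    sim_threshold: float = 0.4  # minimum similarity to join a cluster (used at leaves, not shown)
    max_children: int = 100     # cap on children per token node


@dataclass
class Node:
    children: dict = field(default_factory=dict)
    lines: list = field(default_factory=list)  # leaf payload: lines routed here


@dataclass
class PrefixTree:
    config: DrainConfig = field(default_factory=DrainConfig)
    root: Node = field(default_factory=Node)

    def leaf_for(self, tokens):
        """Walk to (creating as needed) the leaf for `tokens`, honoring max_children."""
        node = self.root.children.setdefault(len(tokens), Node())
        for token in tokens[: self.config.depth]:
            if any(ch.isdigit() for ch in token):
                token = WILDCARD  # numbers are treated as variable
            full = len(node.children) >= self.config.max_children - 1
            if token not in node.children and full:
                token = WILDCARD  # overflow into the reserved wildcard child
            node = node.children.setdefault(token, Node())
        return node


tree = PrefixTree(DrainConfig(depth=3, max_children=2))
for line in ["Node 2 is online", "Node 4 going offline", "Node 5 was rebooted"]:
    tree.leaf_for(line.split()).lines.append(line)

level = tree.root.children[4].children["Node"].children[WILDCARD]
print(list(level.children))  # ['is', '<*>']
```

With `max_children=2`, the tokens "going" and "was" both overflow into the wildcard child and share one leaf. This caps memory per node at the cost of coarser routing. The token-count level is not capped in this sketch.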
Represents a cluster of log lines that share a common template.
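A log cluster can be sketched in Python as a token template that is matched by positional similarity and generalized when it absorbs a new line. `LogCluster`, `similarity`, `absorb`, and `best_match` are hypothetical names. The similarity measure (the fraction of equal tokens at each position) is a simplified reading of Drain's sequence similarity, and the 0.4 threshold is illustrative only.

```python
from dataclasses import dataclass, field

WILDCARD = "<*>"


@dataclass
class LogCluster:
    """Hypothetical cluster: a token template plus the ids of the lines it absorbed."""
    template: list
    line_ids: list = field(default_factory=list)

    def similarity(self, tokens):
        # Fraction of positions where template and line agree. Assumes equal
        # lengths, which the tree's token-count level guarantees.
        if not self.template:
            return 1.0
        same = sum(1 for a, b in zip(self.template, tokens) if a == b)
        return same / len(self.template)

    def absorb(self, tokens, line_id):
        # Every position that disagrees becomes a wildcard in the template.
        self.template = [a if a == b else WILDCARD for a, b in zip(self.template, tokens)]
        self.line_ids.append(line_id)


def best_match(clusters, tokens, sim_threshold):
    """Return the most similar cluster, or None if none reaches the threshold."""
    best = max(clusters, key=lambda c: c.similarity(tokens), default=None)
    if best is not None and best.similarity(tokens) >= sim_threshold:
        return best
    return None


clusters = [LogCluster(template="Node <*> is online".split(), line_ids=[0])]
tokens = "Node <*> is offline".split()  # digits already replaced by the wildcard
match = best_match(clusters, tokens, sim_threshold=0.4)
if match is None:
    clusters.append(LogCluster(template=tokens, line_ids=[1]))
else:
    match.absorb(tokens, line_id=1)
print(" ".join(clusters[0].template))  # Node <*> is <*>
```

Here three of four tokens agree (similarity 0.75), so the line joins the existing cluster and the differing last token is generalized to `<*>`.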