STAM Tools
A collection of command-line tools for working with STAM.
Various tools are grouped under the stam tool, and invoked with a subcommand:
- stam annotate - Add an annotation from a JSON file
- stam info - Return information regarding a STAM model.
- stam init - Initialize a new STAM annotationstore
- stam to-text - Print the text of any resources in the model.
- stam to-tsv - Convert STAM to a simple TSV (Tab Separated Values) format. This is not lossless but provides a decent view on the data.
- stam validate - Validate a STAM model.
- stam save - Write a STAM model to file(s). This can be used to switch between STAM JSON and STAM CSV output, based on the extension.
- stam tag - Regular-expression based tagger on plain text.
For many of these, you can set --verbose for extra details in the output.
Installation
From source
$ cargo install stam-tools
Usage
Add the --help flag after the subcommand for extensive usage instructions.
Most tools take as input a STAM JSON file containing an annotation store. Any
files mentioned via the @include mechanism are loaded automatically.
Instead of passing STAM JSON files, you can read from stdin and/or write to
stdout by setting the filename to -. This works in many places.
These tools also support reading and writing STAM CSV.
Tools
stam tag
The stam tag tool matches regular expressions in text and associates annotations with the matches it finds. It can be used for tokenization or other tagging tasks, for example.
The stam tag command takes a TSV file (example) containing regular expression rules for the tagger.
The file contains the following columns:
- The regular expression, written in the syntax supported by the tagger. The expression may contain one or more capture groups containing the items that will be tagged; in that case anything else is considered context and will not be tagged.
- The ID of the annotation data set
- The ID of the data key
- The value to set. If this follows the syntax $1, $2, etc., it will assign the value of that capture group (1-indexed).
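The capture-group and $1 behaviour described above can be illustrated with a small Python sketch. This is not the stam tag implementation: it uses Python's re module rather than the tool's own regular expression engine, the function name apply_rule is hypothetical, and it assumes that every participating capture group is tagged and that a value of exactly $N resolves to the text of group N.

```python
import re

def apply_rule(text, expression, value):
    """Apply one rule and return (start, end, value) tuples.

    Illustrative only; the name and return shape are assumptions.
    """
    pattern = re.compile(expression)
    # A value such as "$1" refers to a capture group (1-indexed).
    ref = re.fullmatch(r"\$(\d+)", value)
    tagged = []
    for match in pattern.finditer(text):
        if pattern.groups == 0:
            # No capture groups: the whole match is tagged.
            spans = [match.span()]
        else:
            # Capture groups: only the groups are tagged,
            # the rest of the match is treated as context.
            spans = [
                match.span(i)
                for i in range(1, pattern.groups + 1)
                if match.group(i) is not None
            ]
        resolved = match.group(int(ref.group(1))) if ref else value
        for start, end in spans:
            tagged.append((start, end, resolved))
    return tagged

# "price: " is context; only the digits are tagged, with their own text as value.
print(apply_rule("price: 42 EUR", r"price: (\d+)", "$1"))
```

With the sample call, the only tagged span covers the digits, and the context "price: " is left untagged.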
Example:
#EXPRESSION #ANNOTATIONSET #DATAKEY #DATAVALUE
\w+(?:[-_]\w+)* simpletokens type word
[\.\?,/]+ simpletokens type punctuation
[0-9]+(?:[,\.][0-9]+) simpletokens type number
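To see how such a rule file reads as data, the following Python sketch parses the four tab-separated columns and applies each rule to a sample text. It is only an illustration under stated assumptions: the helper names parse_rules and tag are hypothetical, lines starting with # are assumed to be skipped as a header, Python's re module stands in for the tool's own engine, and every rule is applied independently, so spans from different rules may overlap.

```python
import re

# The example rules, joined with real tab characters as in a TSV file.
RULES = [
    (r"\w+(?:[-_]\w+)*", "simpletokens", "type", "word"),
    (r"[\.\?,/]+", "simpletokens", "type", "punctuation"),
    (r"[0-9]+(?:[,\.][0-9]+)", "simpletokens", "type", "number"),
]
RULES_TSV = (
    "#EXPRESSION\t#ANNOTATIONSET\t#DATAKEY\t#DATAVALUE\n"
    + "\n".join("\t".join(rule) for rule in RULES)
    + "\n"
)

def parse_rules(tsv):
    """Parse rule lines into (compiled pattern, set ID, key ID, value)."""
    rules = []
    for line in tsv.splitlines():
        # Assumption: empty lines and lines starting with '#' carry no rule.
        if not line or line.startswith("#"):
            continue
        expression, dataset, key, value = line.split("\t")
        rules.append((re.compile(expression), dataset, key, value))
    return rules

def tag(text, rules):
    """Return sorted (start, end, set ID, key ID, value) tuples."""
    results = []
    for pattern, dataset, key, value in rules:
        for match in pattern.finditer(text):
            results.append((match.start(), match.end(), dataset, key, value))
    return sorted(results)

text = "Hello, world 3.14"
for start, end, dataset, key, value in tag(text, parse_rules(RULES_TSV)):
    print(f"{start}-{end}\t{text[start:end]!r}\t{dataset}\t{key}\t{value}")
```

Each printed row pairs a character offset range with the data set, key, and value taken from the matching rule, which mirrors the stand-off style of annotation: the text itself is never modified.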