Crate stam

§Introduction

STAM is a standalone data model for stand-off text annotation. This is a software library to work with the model from Rust, and is the primary library/reference implementation for STAM. It aims to implement the full model as per the STAM specification and most of the extensions.

What can you do with this library?

Keep, build and manipulate an efficient in-memory store of texts and annotations on texts
Search in annotations, data and text, either programmatically or via the STAM Query Language.
- Search annotations by data, textual content, relations between text fragments (overlap, embedding, adjacency, etc).
- Search in text (incl. via regular expressions) and find annotations targeting found text selections.
- Elementary text operations with regard for text offsets (splitting text on a delimiter, stripping text).
- Search in data (set,key,value) and find annotations that use the data.
- Convert between different kinds of offsets (absolute, relative to other structures, UTF-8 bytes vs. Unicode codepoints, etc.).
Read and write resources and annotations from/to STAM JSON, STAM CSV, or an optimised binary (CBOR) representation.
- The underlying STAM model aims to be clear and simple. It is flexible and does not commit to any vocabulary or annotation paradigm other than stand-off annotation.
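
To illustrate the offset conversion mentioned in the list above, here is a minimal, standard-library-only sketch of translating between UTF-8 byte offsets and Unicode codepoint offsets. The function names `codepoint_to_byte` and `byte_to_codepoint` are hypothetical and are not part of this library's API; the library's own conversion methods may differ.

```rust
/// Convert a Unicode codepoint offset into a UTF-8 byte offset.
/// Returns None if the codepoint offset lies beyond the end of the text.
/// (Hypothetical helper for illustration, not a function of this crate.)
fn codepoint_to_byte(text: &str, cp_offset: usize) -> Option<usize> {
    // char_indices yields the byte offset of every character;
    // the end of the text is appended because it is also a valid position.
    text.char_indices()
        .map(|(byte, _)| byte)
        .chain(std::iter::once(text.len()))
        .nth(cp_offset)
}

/// Convert a UTF-8 byte offset into a Unicode codepoint offset.
/// Returns None if the byte offset is out of range or falls inside a
/// multi-byte character.
fn byte_to_codepoint(text: &str, byte_offset: usize) -> Option<usize> {
    if !text.is_char_boundary(byte_offset) {
        return None;
    }
    Some(text[..byte_offset].chars().count())
}
```

In the text "héllo", the character "é" occupies two bytes, so codepoint offset 2 corresponds to byte offset 3, and byte offset 2 is not a valid position at all.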

This STAM library is intended as a foundation upon which further applications can be built that deal with stand-off annotations on text. We implement all the low-level logic for dealing with this, so you no longer have to and can focus on your actual application. The library is written with performance in mind.

This is the root module for the STAM library. The STAM library consists of two APIs, a low-level API and a high-level API. The latter is of most interest to end users and is implemented in api/*.rs.

§Table of Contents (abridged)

AnnotationStore - The main annotation store that holds everything together.
Result items - These encapsulate the underlying primary structures and are the main way in which things are returned throughout the high-level API.
Values and Operators:
- DataValue - Encapsulates an actual value and its type.
- DataOperator - Defines a test done on a DataValue
- TextSelectionOperator - Performs a particular comparison of text selections (e.g. overlap, embedding, adjacency, etc.)
Iterators:
- AnnotationIterator - Iterator trait to iterate over annotations, typically produced by an annotations() method.
- DataIterator - Iterator trait to iterate over annotation data, typically produced by a data() method.
- TextSelectionIterator - iterator (trait), typically produced by a textselections() or related_text() method.
- ResourcesIterator - iterator (trait), typically produced by a resources() method.
- KeyIterator - iterator (trait), typically produced by a keys() method.
- TextIter - iterator over actual text, typically produced by a text() method.
Text operations:
- FindText - Trait available on textresources and text selections to provide text-searching methods
- Text - Lower-level API trait to obtain text.
Collections:
- Annotations == Handles<Annotation> - Arbitrary collection of Annotation (by reference)
- Data == Handles<AnnotationData> - Arbitrary collection of AnnotationData (by reference)
- Resources == Handles<TextResource> - Arbitrary collection of TextResource (by reference).
- Keys == Handles<DataKey> - Arbitrary collection of DataKey (by reference).
Querying:
- Query - Holds a query, may be parsed from STAMQL.
- QueryResultItems
- QueryResultItem
Referencing Text (both high and low-level API):
- Cursor - Points to a text position, position may be relative.
- Offset - Range (two cursors) that can be used to select a text, positions may be relative.
Primary structures (low level API):

Structs§

Annotation
Annotation represents a particular instance of annotation and is the central concept of the model. Annotations can be considered the primary nodes of the graph model. The instance of annotation is strictly decoupled from the data or key/value of the annotation (AnnotationData). After all, multiple instances can be annotated with the same label (multiple annotations may share the same annotation data). Moreover, an Annotation can have multiple annotation data associated. The result is that multiple annotations with the exact same content require less storage space, and searching and indexing is facilitated.
AnnotationBuilder
This is the builder that builds Annotation. The actual building is done by passing this structure to AnnotationStore::annotate(), there is no build() method for this builder.
AnnotationData
AnnotationData holds the actual content of an annotation: a key/value pair (the term feature is regularly seen for this in certain annotation paradigms). Annotation data is deliberately decoupled from the actual Annotation instances so multiple annotation instances can point to the same content without causing any overhead in storage. Moreover, it facilitates indexing and searching. The annotation data is part of an AnnotationDataSet, which effectively defines a certain user-defined vocabulary.
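
The storage benefit of decoupling content from annotation instances can be sketched with a small interning pool. This is only an illustration of the general idea under simplified assumptions (string keys and values); `DataPool` and `intern` are hypothetical names and do not describe how this library stores data internally.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a store that deduplicates key/value pairs.
#[derive(Default)]
struct DataPool {
    /// Each distinct key/value pair is stored exactly once.
    items: Vec<(String, String)>,
    /// Lookup from pair to its position in `items`.
    index: HashMap<(String, String), usize>,
}

impl DataPool {
    /// Return the index of the key/value pair, inserting it only if it is new.
    fn intern(&mut self, key: &str, value: &str) -> usize {
        let pair = (key.to_string(), value.to_string());
        if let Some(&idx) = self.index.get(&pair) {
            return idx;
        }
        let idx = self.items.len();
        self.items.push(pair.clone());
        self.index.insert(pair, idx);
        idx
    }
}
```

Two annotations that both carry the pair ("pos", "noun") would then hold the same index, and the pair itself is stored only once.
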
AnnotationDataBuilder
This is the builder for AnnotationData. It contains public IDs or handles that will be resolved. This structure is usually not instantiated directly but via the AnnotationBuilder.with_data(), AnnotationDataSet.insert_data() or AnnotationDataSet.with_data() or AnnotationDataSet.build_insert_data() methods. It also does not have its own build() method but is resolved via the aforementioned methods.
AnnotationDataHandle
Handle to an instance of AnnotationData in the store (AnnotationDataSet).
AnnotationDataSet
An AnnotationDataSet stores the keys DataKey and values AnnotationData (which in turn encapsulates DataValue) that are used by annotations. It effectively defines a certain vocabulary, i.e. key/value pairs. The AnnotationDataSet does not store the Annotation instances, those are in the AnnotationStore. The datasets themselves are also held by the AnnotationStore.
AnnotationDataSetHandle
AnnotationHandle
Handle to an instance of Annotation in the store.
AnnotationStore
An Annotation Store is a collection of annotations, resources and annotation data sets. It can be seen as the root of the graph model and the glue that holds everything together. It is the entry point for any STAM model.
Config
This holds the configuration. It is not limited to configuring a single part of the model, but unifies all in a single configuration.
DataKey
The DataKey structure defines a vocabulary field or feature, as it is called in some annotation paradigms. It belongs to a certain AnnotationDataSet. An AnnotationData instance in turn makes reference to a DataKey and assigns it a value, producing a full key/value pair.
DataKeyHandle
Handle to an instance of DataKey in the store (AnnotationDataSet)
FilterAllIter
FilteredAnnotations
An iterator that applies a filter to constrain annotations. This iterator implements AnnotationIterator and is itself produced by the various filter_*() methods on that trait.
FilteredData
An iterator that applies a filter to constrain annotation data. This iterator implements DataIterator and is itself produced by the various filter_*() methods on that trait.
FilteredKeys
An iterator that applies a filter to constrain keys. This iterator implements KeyIterator and is itself produced by the various filter_*() methods on that trait.
FilteredResources
An iterator that applies a filter to constrain resources. This iterator implements ResourcesIterator and is itself produced by the various filter_*() methods on that trait.
FilteredTextSelections
An iterator that applies a filter to constrain text selections. This iterator implements TextSelectionIterator and is itself produced by the various filter_*() methods on that trait.
FindNoCaseTextIter
This iterator is produced by FindText::find_text_nocase() and searches a text for a single fragment, without regard for casing. It has more overhead than the exact (case sensitive) variant FindTextIter.
FindRegexIter
This iterator is produced by FindText::find_text_regex() and searches a text based on regular expressions.
FindRegexMatch
This match structure is returned by the FindRegexIter iterator, which is in turn produced by FindText::find_text_regex() and searches a text based on regular expressions. This structure represents a single regular-expression match of the iterator on the text.
FindTextIter
This iterator is produced by FindText::find_text() and searches a text for a single fragment. The search is case sensitive. See FindNoCaseTextIter for a case-insensitive variant. The iterator yields ResultTextSelection items (each of which encapsulates a TextSelection).
FromHandles
Iterator that turns iterators over full handles into ResultItem<T>, holds a reference to the AnnotationStore
Handles
Holds a collection of items. The collection may be either owned or borrowed from the store (usually from a reverse index).
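
A collection that may be either owned or borrowed can be modelled in Rust with `std::borrow::Cow`. The sketch below shows that general pattern only; `IndexCollection` is a hypothetical type, and whether Handles is implemented this way is an assumption not confirmed by this page.

```rust
use std::borrow::Cow;

/// Hypothetical collection of item indices that is either borrowed
/// (for example from an existing index) or owned.
struct IndexCollection<'a> {
    indices: Cow<'a, [usize]>,
}

impl<'a> IndexCollection<'a> {
    /// Wrap an existing slice without copying it.
    fn borrowed(slice: &'a [usize]) -> Self {
        Self { indices: Cow::Borrowed(slice) }
    }

    /// Append an index; a borrowed collection is copied into an owned
    /// one on the first modification.
    fn push(&mut self, index: usize) {
        self.indices.to_mut().push(index);
    }

    /// True while the collection still borrows its data.
    fn is_borrowed(&self) -> bool {
        matches!(self.indices, Cow::Borrowed(_))
    }

    /// Number of indices in the collection.
    fn len(&self) -> usize {
        self.indices.len()
    }
}
```

Reading from a borrowed collection costs no allocation; only a modification triggers the copy.
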
Offset
Text selection offset. Specifies begin and end offsets to select a range of a text, via two Cursor instances. The end-point is non-inclusive.
OwnedHandlesIter
Query
This represents a query that can be performed on an AnnotationStore via AnnotationStore::query() to obtain anything in the store. A query can be formulated in STAMQL, a dedicated query language (via Query::parse()), or it can be instantiated programmatically via Query::new().
QueryIter
Iterator over the results of a Query. Querying will be performed as the iterator is consumed (lazy evaluation). If it is not consumed, no actual querying will be done. See AnnotationStore::query() for an example.
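
Lazy evaluation is a general property of Rust iterators. The following standalone sketch makes it observable with a counter; `lazy_matches` and the `examined` counter are hypothetical and unrelated to the actual query engine.

```rust
use std::cell::Cell;

/// Build a lazy iterator over `items` that yields those containing `needle`.
/// `examined` counts how many items were actually inspected, purely to make
/// the laziness visible (hypothetical helper, not part of this crate).
fn lazy_matches<'a>(
    items: &'a [&'a str],
    needle: &'a str,
    examined: &'a Cell<usize>,
) -> impl Iterator<Item = &'a str> + 'a {
    items.iter().copied().filter(move |item| {
        // This closure only runs when the iterator is advanced.
        examined.set(examined.get() + 1);
        item.contains(needle)
    })
}
```

Constructing the iterator leaves the counter at zero; it only increases as results are pulled with `next()` or a consuming method such as `count()`.
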
QueryNames
This is a simple hashmap that can resolve all variable names used in the query to the internally used index numbers. See AnnotationStore::query() for an example.
QueryResultItems
Represents an entire result row, each result stems from a query
Regex
A compiled regular expression for searching Unicode haystacks.
RegexSet
Match multiple, possibly overlapping, regexes in a single search.
ResultItem
This is a smart pointer that encapsulates both the item and the store that owns it. It allows the item to have some more introspection as it knows who its immediate parent is. It is heavily used as a return type all throughout the higher-level API. Most API traits are implemented for a particular variant of this type.
ResultIter
An iterator that may be sorted or not and knows a-priori whether it is or not, it may also be a completely empty iterator.
ResultTextSelections
Iterator that turns iterators over ResultItem<TextSelection> into ResultTextSelection.
SelectorIter
Iterator that returns the selector itself, plus all selectors under it (recursively)
SplitTextIter
This iterator is produced by FindText::split_text() and splits a text based on a delimiter. The iterator yields ResultTextSelection (which encapsulates TextSelection).
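
The idea of splitting while retaining offsets can be sketched with the standard library alone. `split_offsets` below is a hypothetical function that returns codepoint ranges rather than text selections, and it assumes a non-empty delimiter; it is not the library's implementation.

```rust
/// Split `text` on `delimiter` and return (begin, end) offsets in Unicode
/// codepoints, with a non-inclusive end. Assumes `delimiter` is not empty.
/// (Hypothetical helper for illustration, not a function of this crate.)
fn split_offsets(text: &str, delimiter: &str) -> Vec<(usize, usize)> {
    let mut result = Vec::new();
    let mut begin_byte = 0;
    // Byte span of each delimiter, followed by the end of the text.
    let stops = text
        .match_indices(delimiter)
        .map(|(pos, m)| (pos, pos + m.len()))
        .chain(std::iter::once((text.len(), text.len())));
    for (stop, resume) in stops {
        // Translate byte positions to codepoint positions by counting characters.
        let begin = text[..begin_byte].chars().count();
        let end = begin + text[begin_byte..stop].chars().count();
        result.push((begin, end));
        begin_byte = resume;
    }
    result
}
```

For the text "één twee" and a single space as delimiter, this yields the ranges (0, 3) and (4, 8), even though "één" occupies five bytes.
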
TextIter
An iterator over the actual text of text selections. This iterator yields &str instances and is typically produced by a .text() method when there may be multiple text slices.
TextResource
This holds the textual resource to be annotated. It holds the full text in memory.
TextResourceBuilder
This is a helper structure to build TextResource instances in a builder pattern. Example:
TextResourceHandle
Handle to an instance of TextResource in the store (AnnotationStore).
TextSelection
Corresponds to a slice of the text. This only contains minimal information, i.e. the begin offset, the end offset and optionally a handle if the text selection is already known in the model. This is similar to Offset, but that one uses cursors which may be relative. TextSelection specifies an offset in more absolute terms.
TextSelectionHandle
Handle to an instance of TextSelection in the store (TextResource).
TextSelectionIter
This iterator is used for iterating over TextSelections in a resource in a sorted fashion using the so-called position index.
TextSelectionSet
A TextSelectionSet holds one or more TextSelection items and a reference to the TextResource from which they’re drawn. All textselections in a set must reference the same resource, which implies they are comparable.
TextSelectionSetIntoIter
TextSelectionSetIter

Enums§

AnnotationDepth
This determines how far to look up or down in an annotation hierarchy tree formed by AnnotationSelectors.
BuildItem
BuildItem offers various ways of referring to a data structure of type T in the core STAM model. It abstracts over public IDs (both owned and borrowed), handles, and references.
Constraint
A constraint is a part of a Query that poses specific selection criteria that must be met. A query can have multiple constraints which must all be satisfied. See the documentation for Query for examples.
Cursor
A cursor points to a specific point in a text. It is used to select offsets. Units are Unicode codepoints (not bytes!) and are 0-indexed.
DataFormat
Data formats for serialisation and deserialisation supported by the library.
DataOperator
This type defines a test that can be done on a DataValue (via DataValue::test()). The operator does not merely consist of the operator-part, but also holds the value that is tested against, which may be one of various types, hence the many variants of this type.
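
The design in which an operator carries the value it tests against can be sketched as follows. `Value`, `Operator` and `test_value` are hypothetical, heavily simplified stand-ins; the variants of the real types are not listed on this page and are not reproduced here.

```rust
/// Hypothetical value type for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
}

/// Hypothetical operator: each variant holds the value it tests against.
enum Operator<'a> {
    EqualsText(&'a str),
    GreaterThan(i64),
    Not(Box<Operator<'a>>),
}

/// Apply the operator to a value; a type mismatch simply fails the test.
fn test_value(value: &Value, op: &Operator<'_>) -> bool {
    match op {
        Operator::EqualsText(expected) => {
            matches!(value, Value::Text(t) if t.as_str() == *expected)
        }
        Operator::GreaterThan(bound) => matches!(value, Value::Int(n) if n > bound),
        Operator::Not(inner) => !test_value(value, inner),
    }
}
```

Because the operand lives inside the operator, a single operator value is a complete, reusable test.
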
DataValue
This type encapsulates a value and its type. It is held by AnnotationData alongside a reference to a DataKey, resulting in a key/value pair.
FilterMode
This determines how a filter is applied when the filter is provided with multiple reference instances to match against. It determines if the filter requires a match with any of the instances (default), or with all of them.
OffsetMode
The offset mode represents the ways in which the user can specify an Offset, it expresses whether the cursors (Cursor) for the begin and end positions of the offset are specified as begin-aligned or end-aligned.
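
Resolving begin-aligned and end-aligned cursors to absolute positions can be sketched as below. `SketchCursor`, `resolve` and `resolve_offset` are hypothetical names, and the convention that an end-aligned value of 0 means the end of the text is an assumption made for this sketch.

```rust
/// Hypothetical cursor type: a position counted from the beginning of the
/// text, or counted back from its end (0 = end of text, -1 = one before).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SketchCursor {
    BeginAligned(usize),
    EndAligned(isize),
}

/// Resolve a cursor to an absolute codepoint position in a text that is
/// `text_len` codepoints long. Returns None if it points outside the text.
fn resolve(cursor: SketchCursor, text_len: usize) -> Option<usize> {
    match cursor {
        SketchCursor::BeginAligned(pos) if pos <= text_len => Some(pos),
        SketchCursor::EndAligned(delta) if delta <= 0 => {
            text_len.checked_sub(delta.unsigned_abs())
        }
        _ => None,
    }
}

/// Resolve a (begin, end) pair into an absolute range with a non-inclusive end.
fn resolve_offset(
    begin: SketchCursor,
    end: SketchCursor,
    text_len: usize,
) -> Option<(usize, usize)> {
    let b = resolve(begin, text_len)?;
    let e = resolve(end, text_len)?;
    if b <= e { Some((b, e)) } else { None }
}
```

With a text of five codepoints, a begin-aligned 0 paired with an end-aligned -1 selects the range (0, 4), that is, everything except the last codepoint.
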
PositionMode
Used as a parameter for TextResource::positions()
QueryResultItem
This structure encapsulates the different kinds of result items that can be returned from queries. See AnnotationStore::query() for an example of it in use.
QueryResultIter
This type abstracts over all the main iterators. This abstraction uses dynamic dispatch, so it comes with a small performance cost.
QueryType
Holds the type of a Query.
ResultTextSelection
This structure holds a TextSelection, along with references to its TextResource and the AnnotationStore and provides a high-level API on it.
SelectionQualifier
This determines whether a query Constraint is applied normally or with a particular altered meaning.
Selector
A Selector identifies the target of an annotation and the part of the target that the annotation applies to. Selectors can be considered the labelled edges of the graph model, tying all nodes together. There are multiple types of selectors, all captured in this enum.
SelectorBuilder
A SelectorBuilder is a recipe that, when applied, identifies the target of an annotation and the part of the target that the annotation applies to. They produce a Selector. You turn a SelectorBuilder into a Selector using AnnotationStore::selector.
SelectorKind
See Selector, this is a simplified variant that carries only the type, not the target.
StamError
This enum groups the different kinds of errors that this STAM library can produce.
TextMode
Determines whether a text search is exact (case sensitive) or case insensitive.
TextSelectionOperator
The TextSelectionOperator, simply put, allows comparison of two TextSelection instances. It allows testing for all kinds of spatial relations (as embodied by this enum) in which two TextSelection instances can be, such as overlap, embedding, adjacency, etc…
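
The spatial relations named above reduce to simple comparisons on half-open ranges. The sketch below uses plain tuples and hypothetical function names, assumes non-empty spans, and covers only three relations; it does not reproduce the variants of the actual enum.

```rust
/// A half-open range of codepoint positions: (begin, end), end non-inclusive.
type Span = (usize, usize);

/// True if the two spans share at least one position.
fn overlaps(a: Span, b: Span) -> bool {
    a.0 < b.1 && b.0 < a.1
}

/// True if `a` fully contains `b`.
fn embeds(a: Span, b: Span) -> bool {
    a.0 <= b.0 && b.1 <= a.1
}

/// True if `a` ends exactly where `b` begins.
fn precedes_adjacent(a: Span, b: Span) -> bool {
    a.1 == b.0
}
```

Because the end is non-inclusive, the spans (0, 5) and (5, 8) are adjacent but do not overlap.
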
Type
An enumeration of STAM data types. This is used for introspection via TypeInfo.

Traits§

AnnotationIterator
Trait for iteration over annotations (ResultItem<Annotation>; encapsulation over Annotation). Implements numerous filter methods to further constrain the iterator, as well as methods to map from annotations to other items.
AssociatedFile
Configurable
DataIterator
Trait for iteration over annotation data (ResultItem<AnnotationData>; encapsulation over AnnotationData). Implements numerous filter methods to further constrain the iterator, as well as methods to map from annotation data to other items.
FindText
This trait provides text-searching methods that operate on structures that hold or represent text content. It builds upon the lower-level Text trait.
FromCsv
FromJson
Handle
The handle trait is implemented for various handle types. They have in common that they refer to the internal id of a Storable item in a struct implementing StoreFor by index. Types implementing this are lightweight and do not borrow anything, so they can be passed and copied freely. This is a sealed trait, not implementable outside this crate.
KeyIterator
Trait for iteration over keys (ResultItem<DataKey>; encapsulation over DataKey). Implements numerous filter methods to further constrain the iterator, as well as methods to map from keys to other items.
MaybeSortedIterator
An iterator that may be sorted or not and knows a-priori whether it is or not.
Request
This trait is implemented for types that can serve as a request for a specific item of type T from the store. It is typically implemented on strings (both owned and borrowed) in which case the request is for a particular public identifier, or it is implemented on handles.
ResourcesIterator
Trait for iteration over resources (ResultItem<TextResource>; encapsulation over TextResource). Implements numerous filter methods to further constrain the iterator, as well as methods to map from resources to other items.
SelfSelector
This trait is implemented by types that can return a Selector to themselves
SortTextualOrder
This trait allows sorting a collection in textual order, meaning that items are returned in the same order as they appear in the original text.
StamResult
This trait defines the Self::or_fail method that is used to turn an Option<T> into Result<T,StamError>.
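
Turning an Option<T> into a Result through an extension trait can be sketched as follows. `SketchError` and `OrFail` are hypothetical, and the signature shown (taking the error as an argument) is an assumption rather than the documented one.

```rust
/// Hypothetical error type standing in for the library's error enum.
#[derive(Debug, PartialEq)]
enum SketchError {
    NotFound(&'static str),
}

/// Extension trait that adds an `or_fail` method to `Option<T>`.
trait OrFail<T> {
    fn or_fail(self, err: SketchError) -> Result<T, SketchError>;
}

impl<T> OrFail<T> for Option<T> {
    fn or_fail(self, err: SketchError) -> Result<T, SketchError> {
        // Same effect as Option::ok_or, under a more descriptive name.
        self.ok_or(err)
    }
}

/// Example lookup that propagates a missing item with `?`.
fn first_word_len(text: &str) -> Result<usize, SketchError> {
    let word = text
        .split_whitespace()
        .next()
        .or_fail(SketchError::NotFound("word"))?;
    Ok(word.len())
}
```

The benefit is that lookups returning Option can be chained with `?` inside functions that return a Result.
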
Storable
This is a low-level trait that is implemented on the various STAM data structures that are held in a store, such as Annotation, AnnotationData, TextResource, etc. All storable elements have a Handle, defined by the associated Self::HandleType. It corresponds directly to their index in a vector, so this type is a simple wrapper around usize. This is a sealed trait, not implementable outside this crate.
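
A handle that wraps a vector index can be sketched like this. `ItemHandle` and `ItemStore` are hypothetical types for illustration; the real handle types and stores carry more machinery than shown here.

```rust
/// Hypothetical handle type: a thin, copyable wrapper around a vector index.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ItemHandle(usize);

/// Hypothetical store that hands out handles on insertion.
#[derive(Default)]
struct ItemStore {
    items: Vec<String>,
}

impl ItemStore {
    /// Add an item; its handle is simply its position in the vector.
    fn insert(&mut self, item: &str) -> ItemHandle {
        self.items.push(item.to_string());
        ItemHandle(self.items.len() - 1)
    }

    /// Look up an item by handle. The handle itself borrows nothing,
    /// so it can be kept around while the store is modified.
    fn get(&self, handle: ItemHandle) -> Option<&str> {
        self.items.get(handle.0).map(String::as_str)
    }
}
```

Wrapping the index in a dedicated type prevents accidentally using a handle for one kind of item to index a store of another kind.
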
StoreFor
This trait is implemented on types that provide storage for a certain other generic type (T). It belongs to the low-level API. It is a sealed trait, not implementable outside this crate.
TestTextSelection
This trait defines the test() methods for testing relations between two text selections (or sets thereof).
TestableIterator
This iterator implements a simple .test() method that just checks whether an iterator is empty or yields results. It is implemented alongside traits like AnnotationIterator, DataIterator, etc…
Text
This trait provides methods that operate on structures that hold or represent text content. They are fairly low-level methods but are exposed in the public API. The FindText trait subsequently builds upon this one with high-level search methods.
TextSelectionIterator
Trait for iteration over text selections (ResultTextSelection; encapsulation over TextSelection). Implements numerous filter methods to further constrain the iterator, as well as methods to map from text selections to other items.
ToCsv
ToHandles
This trait is implemented on iterators over ResultItem<T> and effectively collects these items, by only their handles and a reference to a store, as Handles<T>. It is implemented alongside traits like AnnotationIterator, DataIterator, etc…
ToJson
TypeInfo
This trait provides some introspection on STAM data types. It is a sealed trait that cannot be implemented outside this crate.
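
Several traits on this page are described as sealed. One common way to seal a trait in Rust is a public supertrait inside a private module, sketched below with hypothetical names (`sealed`, `Describe`, `Note`); whether this crate uses exactly this layout is not stated here.

```rust
mod sealed {
    /// Supertrait in a private module: code outside the crate cannot name
    /// it, so it cannot implement any trait that requires it.
    pub trait Sealed {}
}

/// Public trait that downstream code can call but not implement.
pub trait Describe: sealed::Sealed {
    fn describe(&self) -> &'static str;
}

/// A type inside the crate that opts in to the sealed trait.
pub struct Note;

impl sealed::Sealed for Note {}

impl Describe for Note {
    fn describe(&self) -> &'static str {
        "note"
    }
}
```

Sealing lets a crate add methods to the trait later without breaking downstream implementations, since none can exist.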

Functions§

compare_annotation_textual_order

Type Aliases§

Annotations
Holds a collection of Annotation (by reference to an AnnotationStore and handles). This structure is produced by calling ToHandles::to_handles(), which is available on all iterators over annotations (ResultItem<Annotation>).
Data
Holds a collection of AnnotationData (by reference to an AnnotationStore and handles). This structure is produced by calling ToHandles::to_handles(), which is available on all iterators over data.
HandlesIter
Iterator over the handles in a Handles<T> collection.
Keys
Holds a collection of DataKey (by reference to an AnnotationStore and handles). This structure is produced by calling ToHandles::to_handles(), which is available on all iterators over keys (ResultItem<DataKey>).
Resources
Holds a collection of TextResource (by reference to an AnnotationStore and handles). This structure is produced by calling ToHandles::to_handles(), which is available on all iterators over resources (ResultItem<TextResource>).
Store
Type for Store elements. The struct that owns a field of this type should implement the trait StoreFor<T>. This is a low-level construct. Do not confuse with AnnotationStore.
TextSelections
Holds a collection of TextSelection (by reference to an AnnotationStore and handles). This structure is produced by calling ToHandles::to_handles(), which is available on all iterators over text selections.