Crate stam

§Introduction

STAM is a standalone data model for stand-off text annotation. This is a software library to work with the model from Rust, and is the primary library/reference implementation for STAM. It aims to implement the full model as per the STAM specification and most of the extensions.

What can you do with this library?

  • Keep, build and manipulate an efficient in-memory store of texts and annotations on texts
  • Search in annotations, data and text, either programmatically or via the STAM Query Language.
    • Search annotations by data, textual content, relations between text fragments (overlap, embedding, adjacency, etc).
    • Search in text (incl. via regular expressions) and find annotations targeting found text selections.
    • Elementary text operations with regard for text offsets (splitting text on a delimiter, stripping text).
    • Search in data (set,key,value) and find annotations that use the data.
    • Convert between different kinds of offsets (absolute, relative to other structures, UTF-8 bytes vs Unicode codepoints, etc.).
  • Read and write resources and annotations from/to STAM JSON, STAM CSV, or an optimised binary (CBOR) representation.
    • The underlying STAM model aims to be clear and simple. It is flexible and does not commit to any vocabulary or annotation paradigm other than stand-off annotation.
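
To make the distinction between UTF-8 byte offsets and Unicode codepoint offsets concrete, here is a minimal sketch using only the Rust standard library. It does not use this library's API; the function names `codepoint_to_byte` and `byte_to_codepoint` are hypothetical and chosen for illustration only.

```rust
/// Convert a Unicode codepoint offset into a UTF-8 byte offset.
/// Returns None if the offset lies beyond the end of the text.
/// (Hypothetical helper, not part of the stam API.)
fn codepoint_to_byte(text: &str, cp_offset: usize) -> Option<usize> {
    text.char_indices()
        .map(|(byte_pos, _)| byte_pos)
        // the end of the text is a valid (non-inclusive) offset too
        .chain(std::iter::once(text.len()))
        .nth(cp_offset)
}

/// Convert a UTF-8 byte offset into a Unicode codepoint offset.
/// Returns None if the byte offset is out of range or falls inside
/// a multi-byte character.
/// (Hypothetical helper, not part of the stam API.)
fn byte_to_codepoint(text: &str, byte_offset: usize) -> Option<usize> {
    if !text.is_char_boundary(byte_offset) {
        return None;
    }
    Some(text[..byte_offset].chars().count())
}

fn main() {
    let text = "café au lait";
    // 'é' occupies two bytes in UTF-8, so the two kinds of offsets
    // diverge for everything that follows it.
    let byte = codepoint_to_byte(text, 5).unwrap();
    println!("codepoint 5 -> byte {}", byte);
    println!("byte {} -> codepoint {:?}", byte, byte_to_codepoint(text, byte));
}
```

Both helpers scan the text linearly. A real implementation that performs many conversions would typically maintain an index instead.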

This STAM library is intended as a foundation upon which further applications can be built that deal with stand-off annotations on text. We implement all the low-level logic for dealing with this, so you no longer have to and can focus on your actual application. The library is written with performance in mind.

This is the root module for the STAM library. The STAM library consists of two APIs: a low-level API and a high-level API. The latter is of most interest to end users and is implemented in api/*.rs.

§Table of Contents (abridged)

Structs§

Annotation
Annotation represents a particular instance of annotation and is the central concept of the model. Annotations can be considered the primary nodes of the graph model. The instance of annotation is strictly decoupled from the data or key/value of the annotation (AnnotationData). After all, multiple instances can be annotated with the same label (multiple annotations may share the same annotation data). Moreover, an Annotation can have multiple annotation data associated with it. The result is that multiple annotations with the exact same content require less storage space, and searching and indexing is facilitated.
AnnotationBuilder
This is the builder that builds Annotation. The actual building is done by passing this structure to AnnotationStore::annotate(); there is no build() method for this builder.
AnnotationData
AnnotationData holds the actual content of an annotation: a key/value pair (the term feature is regularly seen for this in certain annotation paradigms). Annotation data is deliberately decoupled from the actual Annotation instances so multiple annotation instances can point to the same content without causing any overhead in storage. Moreover, it facilitates indexing and searching. The annotation data is part of an AnnotationDataSet, which effectively defines a certain user-defined vocabulary.
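
The decoupling of annotations from their content can be pictured with the following minimal sketch. It is not this library's implementation: `DataHandle`, `DataStore`, `SketchAnnotation` and `intern` are hypothetical names, and keys and values are simplified to plain strings. It only illustrates how several annotations can point to one stored key/value pair.

```rust
use std::collections::HashMap;

/// Hypothetical handle: an index into the vector of stored pairs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DataHandle(usize);

/// Hypothetical store that keeps each distinct key/value pair only once.
#[derive(Default)]
struct DataStore {
    data: Vec<(String, String)>,
    index: HashMap<(String, String), DataHandle>,
}

impl DataStore {
    /// Return the handle of an existing key/value pair, or store a new one.
    fn intern(&mut self, key: &str, value: &str) -> DataHandle {
        let pair = (key.to_string(), value.to_string());
        if let Some(handle) = self.index.get(&pair) {
            return *handle;
        }
        let handle = DataHandle(self.data.len());
        self.data.push(pair.clone());
        self.index.insert(pair, handle);
        handle
    }
}

/// Hypothetical annotation: it holds only handles, never the content itself.
struct SketchAnnotation {
    data: Vec<DataHandle>,
}

fn main() {
    let mut store = DataStore::default();
    let a = SketchAnnotation {
        data: vec![store.intern("pos", "noun")],
    };
    let b = SketchAnnotation {
        data: vec![store.intern("pos", "noun"), store.intern("lemma", "cat")],
    };
    // Both annotations refer to the same stored ("pos", "noun") pair.
    println!(
        "shared: {}, distinct pairs stored: {}",
        a.data[0] == b.data[0],
        store.data.len()
    );
}
```

The reverse lookup table (`index`) is what makes deduplication cheap here, and the same idea also helps when searching annotations by their data.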
AnnotationDataBuilder
This is the builder for AnnotationData. It contains public IDs or handles that will be resolved. This structure is usually not instantiated directly but via the AnnotationBuilder.with_data(), AnnotationDataSet.insert_data() or AnnotationDataSet.with_data() or AnnotationDataSet.build_insert_data() methods. It also does not have its own build() method but is resolved via the aforementioned methods.
AnnotationDataHandle
Handle to an instance of AnnotationData in the store (AnnotationDataSet).
AnnotationDataSet
An AnnotationDataSet stores the keys DataKey and values AnnotationData (which in turn encapsulates DataValue) that are used by annotations. It effectively defines a certain vocabulary, i.e. key/value pairs. The AnnotationDataSet does not store the Annotation instances, those are in the AnnotationStore. The datasets themselves are also held by the AnnotationStore.
AnnotationDataSetBuilder
AnnotationDataSetHandle
AnnotationHandle
Handle to an instance of Annotation in the store.
AnnotationStore
An Annotation Store is a collection of annotations, resources and annotation data sets. It can be seen as the root of the graph model and the glue that holds everything together. It is the entry point for any stam model.
AnnotationSubStore
A substore is a sub-collection of annotations that is serialised as an independent AnnotationStore. The actual contents are still defined and kept by the parent AnnotationStore. This structure only holds references used for serialisation purposes.
AnnotationSubStoreHandle
Config
This holds the configuration. It is not limited to configuring a single part of the model, but unifies all in a single configuration.
DataKey
The DataKey structure defines a vocabulary field or feature, as it is called in some annotation paradigms. It belongs to a certain AnnotationDataSet. An AnnotationData instance in turn makes reference to a DataKey and assigns it a value, producing a full key/value pair.
DataKeyHandle
Handle to an instance of DataKey in the store (AnnotationDataSet)
DateTime
ISO 8601 combined date and time with time zone.
FilterAllIter
FilteredAnnotations
An iterator that applies a filter to constrain annotations. This iterator implements AnnotationIterator and is itself produced by the various filter_*() methods on that trait.
FilteredData
An iterator that applies a filter to constrain annotation data. This iterator implements DataIterator and is itself produced by the various filter_*() methods on that trait.
FilteredDataSets
An iterator that applies a filter to constrain datasets. This iterator implements DataSetIterator and is itself produced by the various filter_*() methods on that trait.
FilteredKeys
An iterator that applies a filter to constrain keys. This iterator implements KeyIterator and is itself produced by the various filter_*() methods on that trait.
FilteredResources
An iterator that applies a filter to constrain resources. This iterator implements ResourcesIterator and is itself produced by the various filter_*() methods on that trait.
FilteredTextSelections
An iterator that applies a filter to constrain text selections. This iterator implements TextSelectionIterator and is itself produced by the various filter_*() methods on that trait.
FindNoCaseTextIter
This iterator is produced by FindText::find_text_nocase() and searches a text for a single fragment, without regard for casing. It has more overhead than the exact (case sensitive) variant FindTextIter.
FindRegexIter
This iterator is produced by FindText::find_text_regex() and searches a text based on regular expressions.
FindRegexMatch
This match structure is returned by the FindRegexIter iterator, which is in turn produced by FindText::find_text_regex() and searches a text based on regular expressions. This structure represents a single regular-expression match of the iterator on the text.
FindTextIter
This iterator is produced by FindText::find_text() and searches a text for a single fragment. The search is case sensitive. See FindNoCaseTextIter for a case-insensitive variant. The iterator yields ResultTextSelection items (which encapsulate TextSelection).
FixedOffset
The time zone with fixed offset, from UTC-23:59:59 to UTC+23:59:59.
FromHandles
Iterator that turns iterators over full handles into ResultItem<T>. It holds a reference to the AnnotationStore.
Handles
Holds a collection of items. The collection may be either owned or borrowed from the store (usually from a reverse index).
LimitIter
Local
The local timescale.
Offset
Text selection offset. Specifies begin and end offsets to select a range of a text, via two Cursor instances. The end-point is non-inclusive.
OwnedHandlesIter
Query
This represents a query that can be performed on an AnnotationStore via AnnotationStore::query() to obtain anything in the store. A query can be formulated in STAMQL, a dedicated query language (via Query::parse()), or it can be instantiated programmatically via Query::new().
QueryIter
Iterator over the results of a Query. Querying will be performed as the iterator is consumed (lazy evaluation). If it is not consumed, no actual querying will be done. See AnnotationStore::query() for an example.
QueryResultItems
Represents an entire result row; each result stems from a query.
Regex
A compiled regular expression for searching Unicode haystacks.
RegexBuilder
A configurable builder for a Regex.
RegexSet
Match multiple, possibly overlapping, regexes in a single search.
ResultItem
This is a smart pointer that encapsulates both the item and the store that owns it. It allows the item to have some more introspection as it knows who its immediate parent is. It is heavily used as a return type all throughout the higher-level API. Most API traits are implemented for a particular variant of this type.
ResultIter
An iterator that may be sorted or not and knows a priori whether it is or not. It may also be a completely empty iterator.
ResultTextSelectionSet
A TextSelectionSet holds one or more TextSelection items and a reference to the TextResource from which they’re drawn. This structure encapsulates such a TextSelectionSet and contains a reference to the underlying AnnotationStore.
ResultTextSelections
Iterator that turns iterators over ResultItem<TextSelection> into ResultTextSelection.
SegmentationIter
SelectorIter
Iterator that returns the selector itself, plus all selectors under it (recursively)
SplitTextIter
This iterator is produced by FindText::split_text() and splits a text based on a delimiter. The iterator yields ResultTextSelection (which encapsulates TextSelection).
TextIter
An iterator over the actual text of text selections. This iterator yields &str instances and is typically produced by a .text() method when there may be multiple text slices.
TextResource
This holds the textual resource to be annotated. It holds the full text in memory.
TextResourceBuilder
This is a helper structure to build TextResource instances in a builder pattern. This structure can be passed to AnnotationStore::add_resource() or AnnotationStore::with_resource().
TextResourceHandle
Handle to an instance of TextResource in the store (AnnotationStore).
TextSelection
Corresponds to a slice of the text. This only contains minimal information, i.e. the begin offset, end offset and optionally a handle, if the text selection is already known in the model. This is similar to Offset, but that one uses cursors which may be relative. TextSelection specifies an offset in more absolute terms.
TextSelectionHandle
Handle to an instance of TextSelection in the store (TextResource).
TextSelectionIter
This iterator is used for iterating over TextSelections in a resource in a sorted fashion using the so-called position index.
TextSelectionSet
A TextSelectionSet holds one or more TextSelection items and a reference to the TextResource from which they’re drawn. All textselections in a set must reference the same resource, which implies they are comparable.
TextSelectionSetIntoIter
TextSelectionSetIter
TextValidationResult
TranslateConfig
TransposeConfig
Utc
The UTC time zone. This is the most efficient time zone when you don’t need the local time. It is also used as an offset (which is also a dummy type).
WebAnnoConfig

Enums§

AnnotationDepth
This determines how far to look up or down in an annotation hierarchy tree formed by AnnotationSelectors.
Assignment
An assignment is a part of an ADD Query that assigns data to a new annotation.
BuildItem
BuildItem offers various ways of referring to a data structure of type T in the core STAM model. It abstracts over public IDs (both owned and borrowed), handles, and references.
Constraint
A constraint is a part of a Query that poses specific selection criteria that must be met. A query can have multiple constraints which must all be satisfied. See the documentation for Query for examples.
Cursor
A cursor points to a specific point in a text. It is used to select offsets. Units are Unicode codepoints (not bytes!) and are 0-indexed.
DataFormat
Data formats for serialisation and deserialisation supported by the library.
DataOperator
This type defines a test that can be done on a DataValue (via DataValue::test()). The operator does not merely consist of the operator-part, but also holds the value that is tested against, which may be one of various types, hence the many variants of this type.
DataValue
This type encapsulates a value and its type. It is held by AnnotationData alongside a reference to a DataKey, resulting in a key/value pair.
FilterMode
This determines how a filter is applied when the filter is provided with multiple reference instances to match against. It determines if the filter requires a match with any of the instances (default), or with all of them.
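
The difference between matching any and matching all reference instances can be sketched as follows. This is an illustration only and not the library's filter machinery: the `Mode` enum and `filter_matches` function are hypothetical, and items are simplified to integers.

```rust
/// Hypothetical stand-in for a filter mode with two variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    /// A match with at least one reference instance suffices.
    Any,
    /// Every reference instance must match.
    All,
}

/// Test whether `candidate` satisfies the filter formed by `references`
/// under the given mode. (Hypothetical helper.)
fn filter_matches(candidate: &[u32], references: &[u32], mode: Mode) -> bool {
    match mode {
        Mode::Any => references.iter().any(|r| candidate.contains(r)),
        Mode::All => references.iter().all(|r| candidate.contains(r)),
    }
}

fn main() {
    let candidate = [1, 2, 3];
    // 3 is present in the candidate but 9 is not.
    println!("any: {}", filter_matches(&candidate, &[3, 9], Mode::Any));
    println!("all: {}", filter_matches(&candidate, &[3, 9], Mode::All));
}
```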
IdStrategy
OffsetMode
The offset mode represents the ways in which the user can specify an Offset, it expresses whether the cursors (Cursor) for the begin and end positions of the offset are specified as begin-aligned or end-aligned.
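
As a rough illustration of begin-aligned versus end-aligned cursors, the sketch below resolves both kinds to an absolute codepoint position and then selects text with a non-inclusive end. The `SketchCursor` type and the `resolve` and `select` functions are hypothetical and do not reproduce this library's types. The convention that an end-aligned value of 0 denotes the end of the text, with negative values counting backwards, is an assumption made for this example.

```rust
/// Hypothetical cursor: a position counted in Unicode codepoints,
/// either from the begin or from the end of the text.
#[derive(Debug, Clone, Copy)]
enum SketchCursor {
    /// 0 is the first codepoint.
    BeginAligned(usize),
    /// Assumed convention: 0 is the end of the text,
    /// -1 is one codepoint before the end.
    EndAligned(isize),
}

/// Resolve a cursor to an absolute, begin-aligned codepoint position.
/// Returns None if the cursor points outside the text.
fn resolve(cursor: SketchCursor, text_len: usize) -> Option<usize> {
    match cursor {
        SketchCursor::BeginAligned(pos) if pos <= text_len => Some(pos),
        SketchCursor::EndAligned(pos) if pos <= 0 => {
            text_len.checked_sub(pos.unsigned_abs())
        }
        _ => None,
    }
}

/// Select the codepoints from `begin` up to, but not including, `end`.
fn select(text: &str, begin: SketchCursor, end: SketchCursor) -> Option<String> {
    let len = text.chars().count();
    let (b, e) = (resolve(begin, len)?, resolve(end, len)?);
    if b > e {
        return None;
    }
    Some(text.chars().skip(b).take(e - b).collect())
}

fn main() {
    let text = "Hello wörld";
    let word = select(text, SketchCursor::BeginAligned(6), SketchCursor::EndAligned(0));
    println!("{:?}", word);
}
```

Counting in codepoints rather than bytes keeps the selection valid for multi-byte characters such as the 'ö' in the example text.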
PositionMode
Used as a parameter for TextResource::positions()
QueryQualifier
This determines whether a query is applied normally or with a particular altered meaning.
QueryResultItem
This structure encapsulates the different kinds of result items that can be returned from queries. See AnnotationStore::query() for an example of it in use.
QueryResultIter
This type abstracts over all the main iterators. This abstraction uses dynamic dispatch so comes with a small performance cost
QueryType
Holds the type of a Query.
ReannotateMode
How to handle data in a reannotation
ResultTextSelection
This structure holds a TextSelection, along with references to its TextResource and the AnnotationStore and provides a high-level API on it.
SelectionQualifier
This determines whether a query Constraint is applied normally or with a particular altered meaning.
Selector
A Selector identifies the target of an annotation and the part of the target that the annotation applies to. Selectors can be considered the labelled edges of the graph model, tying all nodes together. There are multiple types of selectors, all captured in this enum.
SelectorBuilder
A SelectorBuilder is a recipe that, when applied, identifies the target of an annotation and the part of the target that the annotation applies to. They produce a Selector. You turn a SelectorBuilder into a Selector using AnnotationStore::selector.
SelectorKind
See Selector, this is a simplified variant that carries only the type, not the target.
StamError
This enum groups the different kinds of errors that this STAM library can produce.
TextMode
Determines whether a text search is exact (case sensitive) or case insensitive.
TextSelectionOperator
The TextSelectionOperator, simply put, allows comparison of two TextSelection instances. It allows testing for all kinds of spatial relations (as embodied by this enum) in which two TextSelection instances can be, such as overlap, embedding, adjacency, etc…
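
The spatial relations mentioned here (overlap, embedding, adjacency) reduce to simple comparisons on begin and end offsets. The sketch below shows this on a hypothetical `Span` type with a non-inclusive end. The method names are chosen for illustration and are not the variants of this enum.

```rust
/// Hypothetical half-open span [begin, end), measured in codepoints.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    begin: usize,
    end: usize,
}

impl Span {
    /// The two spans share at least one position.
    fn overlaps(&self, other: &Span) -> bool {
        self.begin < other.end && other.begin < self.end
    }

    /// `other` lies entirely within `self`.
    fn embeds(&self, other: &Span) -> bool {
        self.begin <= other.begin && other.end <= self.end
    }

    /// `self` ends exactly where `other` begins (adjacency).
    fn directly_precedes(&self, other: &Span) -> bool {
        self.end == other.begin
    }
}

fn main() {
    let sentence = Span { begin: 0, end: 20 };
    let word = Span { begin: 6, end: 11 };
    let next_word = Span { begin: 11, end: 15 };
    println!("sentence embeds word: {}", sentence.embeds(&word));
    println!("word overlaps next_word: {}", word.overlaps(&next_word));
    println!("word directly precedes next_word: {}", word.directly_precedes(&next_word));
}
```

Because the end is non-inclusive, two adjacent spans do not overlap.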
TextValidationMode
TranslationSide
TranspositionSide
Type
An enumeration of STAM data types. This is used for introspection via TypeInfo.

Traits§

AnnotationIterator
Trait for iteration over annotations (ResultItem<Annotation>; encapsulation over Annotation). Implements numerous filter methods to further constrain the iterator, as well as methods to map from annotations to other items.
AssociateSubStore
AssociatedFile
Configurable
DataIterator
Trait for iteration over annotation data (ResultItem<AnnotationData>; encapsulation over AnnotationData). Implements numerous filter methods to further constrain the iterator, as well as methods to map from annotation data to other items.
DataSetIterator
Trait for iteration over datasets (ResultItem<AnnotationDataSet>; encapsulation over AnnotationDataSet). Implements numerous filter methods to further constrain the iterator, as well as methods to map from datasets to other items.
FindText
This trait provides text-searching methods that operate on structures that hold or represent text content. It builds upon the lower-level Text trait.
FromCsv
FromJson
Handle
The handle trait is implemented for various handle types. They have in common that they refer to the internal id of a Storable item in a struct implementing StoreFor by index. Types implementing this are lightweight and do not borrow anything; they can be passed and copied freely. This is a sealed trait, not implementable outside this crate.
IRI
IteratorToValue
An iterator that grabs the first available data value
KeyIterator
Trait for iteration over data keys (ResultItem<DataKey>; encapsulation over DataKey). Implements numerous filter methods to further constrain the iterator, as well as methods to map from keys to other items.
LimitIterator
An iterator that can extract an arbitrary subrange, even with relative coordinates (at which point it will allocate a buffer)
MaybeSortedIterator
An iterator that may be sorted or not and knows a-priori whether it is or not.
Request
This trait is implemented for types that can serve as a request for a specific item of type T from the store. It is typically implemented on strings (both owned and borrowed) in which case the request is for a particular public identifier, or it is implemented on handles.
ResourcesIterator
Trait for iteration over resources (ResultItem<TextResource>; encapsulation over TextResource). Implements numerous filter methods to further constrain the iterator, as well as methods to map from resources to other items.
SelfSelector
This trait is implemented by types that can return a Selector to themselves
SortTextualOrder
This trait allows sorting a collection in textual order, meaning that items are returned in the same order as they appear in the original text.
StamResult
This trait defines the Self::or_fail method that is used to turn an Option<T> into Result<T,StamError>.
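
The general pattern of an extension trait that turns an Option<T> into a Result can be sketched as below. This is not the library's definition: the trait `OrFail` and the error type `SketchError` are hypothetical stand-ins, and the real error type is StamError.

```rust
/// Hypothetical error type standing in for the library's error enum.
#[derive(Debug, PartialEq)]
enum SketchError {
    NotFound,
}

/// Hypothetical extension trait adding `or_fail` to Option<T>.
trait OrFail<T> {
    fn or_fail(self) -> Result<T, SketchError>;
}

impl<T> OrFail<T> for Option<T> {
    fn or_fail(self) -> Result<T, SketchError> {
        self.ok_or(SketchError::NotFound)
    }
}

/// Example caller: the `?` operator can be used because the Option
/// has been converted into a Result.
fn first_even(values: &[i32]) -> Result<i32, SketchError> {
    let found = values.iter().copied().find(|v| v % 2 == 0).or_fail()?;
    Ok(found)
}

fn main() {
    println!("{:?}", first_even(&[1, 4, 5]));
    println!("{:?}", first_even(&[1, 3]));
}
```

The benefit of such a method is that lookups returning an Option can be chained with `?` in functions that return a Result.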
Storable
This is a low-level trait that is implemented on the various STAM data structures that are held in a store, such as Annotation, AnnotationData, TextResource, etc. All storable elements have a Handle, defined by the associated Self::HandleType. It corresponds directly to their index in a vector, so this type is a simple wrapper around usize. This is a sealed trait, not implementable outside this crate.
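
The idea of a handle as a thin wrapper around a vector index can be sketched as follows. `ItemHandle` and `ItemStore` are hypothetical names and not this crate's types. Keeping removed slots as None, so that other handles stay valid, is an assumption made for this example.

```rust
/// Hypothetical handle: a small, copyable wrapper around a vector index.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ItemHandle(usize);

/// Hypothetical store. A removed item leaves a None slot behind,
/// so existing handles never shift to a different item.
struct ItemStore<T> {
    items: Vec<Option<T>>,
}

impl<T> ItemStore<T> {
    fn new() -> Self {
        ItemStore { items: Vec::new() }
    }

    /// Store an item and return the handle (its index) for later lookup.
    fn insert(&mut self, item: T) -> ItemHandle {
        self.items.push(Some(item));
        ItemHandle(self.items.len() - 1)
    }

    /// Look up an item by handle: a constant-time index operation.
    fn get(&self, handle: ItemHandle) -> Option<&T> {
        self.items.get(handle.0)?.as_ref()
    }

    /// Remove an item, leaving its slot empty.
    fn remove(&mut self, handle: ItemHandle) -> Option<T> {
        self.items.get_mut(handle.0)?.take()
    }
}

fn main() {
    let mut store = ItemStore::new();
    let a = store.insert("alpha");
    let b = store.insert("beta");
    store.remove(a);
    // The handle for "beta" is still valid after "alpha" was removed.
    println!("{:?} {:?}", store.get(a), store.get(b));
}
```

Because a handle carries no borrow of the store, it can be copied and passed around freely, at the cost of being meaningful only for the store that issued it.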
StoreFor
This trait is implemented on types that provide storage for a certain other generic type (T). It belongs to the low-level API. It is a sealed trait, not implementable outside this crate.
TestTextSelection
This trait defines the test() methods for testing relations between two text selections (or sets thereof).
TestableIterator
This iterator implements a simple .test() method that just checks whether an iterator is empty or yields results. It is implemented alongside traits like AnnotationIterator, DataIterator, etc…
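
A `.test()` method of this kind can be sketched as a blanket extension trait over iterators. The trait name `SketchTestable` is hypothetical, and the assumption here is that the method returns true when the iterator yields at least one item.

```rust
/// Hypothetical extension trait: `test()` consumes the iterator and
/// reports whether it yields at least one item.
trait SketchTestable: Iterator {
    fn test(mut self) -> bool
    where
        Self: Sized,
    {
        // Only the first item is requested, so a lazy iterator stops early.
        self.next().is_some()
    }
}

/// Blanket implementation for every iterator.
impl<I: Iterator> SketchTestable for I {}

fn main() {
    let numbers = vec![1, 2, 3];
    println!("{}", numbers.iter().filter(|n| **n > 2).test());
    println!("{}", numbers.iter().filter(|n| **n > 5).test());
}
```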
Text
This trait provides methods that operate on structures that hold or represent text content. They are fairly low-level methods but are exposed in the public API. The FindText trait subsequently builds upon this one with high-level search methods.
TextSelectionIterator
Trait for iteration over text selections (ResultTextSelection; encapsulation over TextSelection). Implements numerous filter methods to further constrain the iterator, as well as methods to map from text selections to other items.
ToCsv
ToHandles
This trait is implemented on iterators over ResultItem<T> and effectively collects these items, by only their handles and a reference to a store, as Handles<T>. It is implemented alongside traits like AnnotationIterator, DataIterator, etc…
ToJson
Translatable
Transposable
TypeInfo
This trait provides some introspection on STAM data types. It is a sealed trait that cannot be implemented outside this crate.

Functions§

compare_annotation_textual_order
generate_id
Generate an ID with a random 21-byte and ID/URI-safe component. This does no collision check (but collisions will be extremely unlikely).
is_iri
Tests whether a string is a valid IRI
regenerate_id
Take an existing ID and apply an update strategy to create a derived new ID.

Type Aliases§

AnnotationDataSets
Holds a collection of AnnotationDataSet (by reference to an AnnotationStore and handles). This structure is produced by calling ToHandles::to_handles(), which is available on all iterators over datasets (ResultItem<AnnotationDataSet>).
Annotations
Holds a collection of Annotation (by reference to an AnnotationStore and handles). This structure is produced by calling ToHandles::to_handles(), which is available on all iterators over annotations (ResultItem<Annotation>).
Data
Holds a collection of AnnotationData (by reference to an AnnotationStore and handles). This structure is produced by calling ToHandles::to_handles(), which is available on all iterators over data.
HandlesIter
Iterator over the handles in a Handles<T> collection.
Keys
Holds a collection of DataKey (by reference to an AnnotationStore and handles). This structure is produced by calling ToHandles::to_handles(), which is available on all iterators over keys (ResultItem<DataKey>).
QueryPath
This points to a particular subquery inside a query
QueryPathRef
Resources
Holds a collection of TextResource (by reference to an AnnotationStore and handles). This structure is produced by calling ToHandles::to_handles(), which is available on all iterators over resources (ResultItem<TextResource>).
Store
Type for Store elements. The struct that owns a field of this type should implement the trait StoreFor<T>. This is a low-level construct. Do not confuse with AnnotationStore.
TextSelections
Holds a collection of TextSelection (by reference to an AnnotationStore and handles). This structure is produced by calling ToHandles::to_handles(), which is available on all iterators over text selections.