Crate stam

§Introduction

STAM is a standalone data model for stand-off text annotation. This is a software library to work with the model from Rust, and is the primary library/reference implementation for STAM. It aims to implement the full model as per the STAM specification and most of the extensions.

What can you do with this library?

  • Keep, build and manipulate an efficient in-memory store of texts and annotations on texts.
  • Search in annotations, data and text, either programmatically or via the STAM Query Language.
    • Search annotations by data, textual content, relations between text fragments (overlap, embedding, adjacency, etc).
    • Search in text (incl. via regular expressions) and find annotations targeting found text selections.
    • Elementary text operations with regard for text offsets (splitting text on a delimiter, stripping text).
    • Search in data (set,key,value) and find annotations that use the data.
    • Convert between different kinds of offsets (absolute, relative to other structures, UTF-8 bytes vs Unicode codepoints, etc).
  • Read and write resources and annotations from/to STAM JSON, STAM CSV, or an optimised binary (CBOR) representation.
    • The underlying STAM model aims to be clear and simple. It is flexible and does not commit to any vocabulary or annotation paradigm other than stand-off annotation.
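
To make the distinction between UTF-8 byte offsets and Unicode codepoint offsets concrete, here is a minimal sketch that uses only the Rust standard library. The function names `codepoint_to_byte` and `byte_to_codepoint` are hypothetical and chosen for illustration; they are not part of this crate's API, which exposes its own conversion methods.

```rust
/// Convert a Unicode codepoint offset into a UTF-8 byte offset.
/// Returns None if the codepoint offset lies beyond the end of the text.
/// (Hypothetical helper for illustration, not a function of this crate.)
fn codepoint_to_byte(text: &str, cp_offset: usize) -> Option<usize> {
    text.char_indices()
        .map(|(byte_pos, _)| byte_pos)
        // The position just past the last character is also a valid offset.
        .chain(std::iter::once(text.len()))
        .nth(cp_offset)
}

/// Convert a UTF-8 byte offset into a Unicode codepoint offset.
/// Returns None if the byte offset is out of bounds or falls inside
/// a multi-byte character.
/// (Hypothetical helper for illustration, not a function of this crate.)
fn byte_to_codepoint(text: &str, byte_offset: usize) -> Option<usize> {
    if !text.is_char_boundary(byte_offset) {
        return None;
    }
    Some(text[..byte_offset].chars().count())
}
```

For the text "héllo", the character "é" occupies two bytes, so codepoint offset 2 corresponds to byte offset 3, and byte offset 2 is not a valid position at all. Stand-off annotation depends on getting this mapping right, because an offset that is off by one byte no longer points at the intended text.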

This STAM library is intended as a foundation upon which further applications can be built that deal with stand-off annotations on text. We implement all the low-level logic for dealing with this, so you no longer have to and can focus on your actual application. The library is written with performance in mind.

This is the root module for the STAM library. The STAM library consists of two APIs: a low-level API and a high-level API. The latter is of most interest to end users and is implemented in api/*.rs.

§Table of Contents (abridged)

Structs§

Enums§

  • This determines how far to look up or down in an annotation hierarchy tree formed by AnnotationSelectors.
  • BuildItem offers various ways of referring to a data structure of type T in the core STAM model. It abstracts over public IDs (both owned and borrowed), handles, and references.
  • A constraint is a part of a Query that poses specific selection criteria that must be met. A query can have multiple constraints which must all be satisfied. See the documentation for Query for examples.
  • A cursor points to a specific point in a text. It is used to select offsets. Units are Unicode codepoints (not bytes!) and are 0-indexed.
  • Data formats for serialisation and deserialisation supported by the library.
  • This type defines a test that can be done on a DataValue (via DataValue::test()). The operator does not merely consist of the operator-part, but also holds the value that is tested against, which may be one of various types, hence the many variants of this type.
  • This type encapsulates a value and its type. It is held by AnnotationData alongside a reference to a DataKey, resulting in a key/value pair.
  • This determines how a filter is applied when the filter is provided with multiple reference instances to match against. It determines if the filter requires a match with any of the instances (default), or with all of them.
  • The offset mode represents the ways in which the user can specify an Offset. It expresses whether the cursors (Cursor) for the begin and end positions of the offset are specified as begin-aligned or end-aligned.
  • Used as a parameter for TextResource::positions()
  • This structure encapsulates the different kinds of result items that can be returned from queries. See AnnotationStore::query() for an example of it in use.
  • This type abstracts over all the main iterators. This abstraction uses dynamic dispatch, so it comes with a small performance cost.
  • Holds the type of a Query.
  • This structure holds a TextSelection, along with references to its TextResource and the AnnotationStore and provides a high-level API on it.
  • This determines whether a query Constraint is applied normally or with a particular altered meaning.
  • A Selector identifies the target of an annotation and the part of the target that the annotation applies to. Selectors can be considered the labelled edges of the graph model, tying all nodes together. There are multiple types of selectors, all captured in this enum.
  • A SelectorBuilder is a recipe that, when applied, identifies the target of an annotation and the part of the target that the annotation applies to. They produce a Selector. You turn a SelectorBuilder into a Selector using AnnotationStore::selector.
  • See Selector, this is a simplified variant that carries only the type, not the target.
  • This enum groups the different kinds of errors that this STAM library can produce.
  • Determines whether a text search is exact (case sensitive) or case insensitive.
  • The TextSelectionOperator, simply put, allows comparison of two TextSelection instances. It allows testing for all kinds of spatial relations (as embodied by this enum) in which two TextSelection instances can be, such as overlap, embedding, adjacency, etc…
  • An enumeration of STAM data types. This is used for introspection via TypeInfo.
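
The cursor, offset mode and text selection operator descriptions above can be illustrated with a small standalone sketch. It shows how a begin-aligned or end-aligned cursor might be resolved to an absolute codepoint position, and how a few spatial relations between two spans can be tested. Everything here is a simplified stand-in: `SketchCursor`, `Span`, `resolve`, `overlaps`, `embeds` and `adjacent_before` are hypothetical names, and the convention that an end-aligned value of 0 denotes the end of the text (with negative values counting backwards) is an assumption of this sketch rather than a statement about the crate's own types.

```rust
/// Simplified stand-in for a cursor; units are Unicode codepoints, 0-indexed.
/// (Hypothetical type for illustration, not this crate's Cursor.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum SketchCursor {
    /// Counted forwards from the start of the text.
    BeginAligned(usize),
    /// Counted backwards from the end of the text. Assumption of this sketch:
    /// 0 is the end itself, -1 is one codepoint before the end.
    EndAligned(isize),
}

/// Resolve a cursor to an absolute, begin-aligned codepoint position.
/// Returns None if the cursor points outside a text of `text_len` codepoints.
fn resolve(cursor: SketchCursor, text_len: usize) -> Option<usize> {
    match cursor {
        SketchCursor::BeginAligned(pos) if pos <= text_len => Some(pos),
        SketchCursor::EndAligned(dist) if dist <= 0 => {
            text_len.checked_sub(dist.unsigned_abs())
        }
        _ => None,
    }
}

/// A half-open range [begin, end) of codepoints.
/// (Hypothetical type for illustration, not this crate's TextSelection.)
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    begin: usize,
    end: usize,
}

/// True if the two spans share at least one codepoint.
fn overlaps(a: Span, b: Span) -> bool {
    a.begin < b.end && b.begin < a.end
}

/// True if `b` lies entirely within `a`.
fn embeds(a: Span, b: Span) -> bool {
    a.begin <= b.begin && b.end <= a.end
}

/// True if `a` ends exactly where `b` begins.
fn adjacent_before(a: Span, b: Span) -> bool {
    a.end == b.begin
}
```

For the 11-codepoint text "Hello world", `resolve(SketchCursor::EndAligned(-5), 11)` and `resolve(SketchCursor::BeginAligned(6), 11)` both yield position 6, the start of "world". Using half-open ranges keeps the relation tests free of off-by-one adjustments.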

Traits§

  • Trait for iteration over annotations (ResultItem<Annotation>; encapsulation over Annotation). Implements numerous filter methods to further constrain the iterator, as well as methods to map from annotations to other items.
  • Trait for iteration over annotation data (ResultItem<AnnotationData>; encapsulation over AnnotationData). Implements numerous filter methods to further constrain the iterator, as well as methods to map from annotation data to other items.
  • Trait for iteration over datasets (ResultItem<AnnotationDataSet>; encapsulation over AnnotationDataSet). Implements numerous filter methods to further constrain the iterator, as well as methods to map from datasets to other items.
  • This trait provides text-searching methods that operate on structures that hold or represent text content. It builds upon the lower-level Text trait.
  • The handle trait is implemented for various handle types. They have in common that they refer to the internal id of a Storable item in a struct implementing StoreFor by index. Types implementing this are lightweight and do not borrow anything; they can be passed and copied freely. This is a sealed trait, not implementable outside this crate.
  • Trait for iteration over data keys (ResultItem<DataKey>; encapsulation over DataKey). Implements numerous filter methods to further constrain the iterator, as well as methods to map from keys to other items.
  • An iterator that may be sorted or not and knows a-priori whether it is or not.
  • This trait is implemented for types that can serve as a request for a specific item of type T from the store. It is typically implemented on strings (both owned and borrowed) in which case the request is for a particular public identifier, or it is implemented on handles.
  • Trait for iteration over resources (ResultItem<TextResource>; encapsulation over TextResource). Implements numerous filter methods to further constrain the iterator, as well as methods to map from resources to other items.
  • This trait is implemented by types that can return a Selector to themselves.
  • This trait allows sorting a collection in textual order, meaning that items are returned in the same order as they appear in the original text.
  • This trait defines the Self::or_fail method that is used to turn an Option<T> into Result<T,StamError>.
  • This is a low-level trait that is implemented on the various STAM data structures that are held in a store, such as Annotation, AnnotationData, TextResource, etc. All storable elements have a Handle, defined by the associated Self::HandleType. It corresponds directly to their index in a vector, so this type is a simple wrapper around usize. This is a sealed trait, not implementable outside this crate.
  • This trait is implemented on types that provide storage for a certain other generic type (T). It belongs to the low-level API. It is a sealed trait, not implementable outside this crate.
  • This trait defines the test() methods for testing relations between two text selections (or sets thereof).
  • This trait implements a simple .test() method that just checks whether an iterator is empty or yields results. It is implemented alongside traits like AnnotationIterator, DataIterator, etc…
  • This trait provides methods that operate on structures that hold or represent text content. They are fairly low-level methods but are exposed in the public API. The FindText trait subsequently builds upon this one with high-level search methods.
  • Trait for iteration over text selections (ResultTextSelection; encapsulation over TextSelection). Implements numerous filter methods to further constrain the iterator, as well as methods to map from text selections to other items.
  • This trait is implemented on iterators over ResultItem<T> and effectively collects these items, by only their handles and a reference to a store, as Handles<T>. It is implemented alongside traits like AnnotationIterator, DataIterator, etc…
  • This trait provides some introspection on STAM data types. It is a sealed trait that cannot be implemented outside this crate.
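
Two of the patterns described in this list, a handle as a lightweight wrapper around a vector index and an or_fail method that turns an Option<T> into a Result, can be sketched with the standard library alone. The names `ItemHandle`, `SketchStore`, `SketchError` and `OrFail` are hypothetical and exist only for this illustration; the crate's own traits are sealed and its error type is StamError.

```rust
/// Simplified stand-in for an error type.
/// (Hypothetical, not this crate's StamError.)
#[derive(Debug, PartialEq)]
enum SketchError {
    NotFound(&'static str),
}

/// A handle is a small copyable wrapper around an index into a vector.
/// It borrows nothing, so it can be passed around and copied freely.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ItemHandle(usize);

/// A minimal store that hands out handles for the items it holds.
struct SketchStore {
    items: Vec<String>,
}

impl SketchStore {
    /// Insert an item and return the handle (its index) for later lookup.
    fn insert(&mut self, item: String) -> ItemHandle {
        self.items.push(item);
        ItemHandle(self.items.len() - 1)
    }

    /// Look up an item by handle; None if the handle is out of range.
    fn get(&self, handle: ItemHandle) -> Option<&String> {
        self.items.get(handle.0)
    }
}

/// Extension trait that converts an Option into a Result,
/// mirroring the idea behind the or_fail method described above.
trait OrFail<T> {
    fn or_fail(self) -> Result<T, SketchError>;
}

impl<T> OrFail<T> for Option<T> {
    fn or_fail(self) -> Result<T, SketchError> {
        self.ok_or(SketchError::NotFound("item not found"))
    }
}
```

With this in place, `store.get(handle).or_fail()?` propagates a missing item as an error instead of requiring an explicit match on the Option, which keeps lookup chains short in code that already returns a Result.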

Functions§

Type Aliases§