Module oscar_io::oscar_doc


OSCAR Schema v2 (OSCAR 22.01) types, readers and writers.

Each document is represented by a Document, which holds Metadata, WarcHeaders and the content (a String).
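As a rough illustration of the layout described above, the sketch below models a document as content plus headers plus metadata. It is not the crate's actual definition: the field names, the `Metadata` fields, the constructor, and the `example_document` helper are all hypothetical, and `WarcHeaders` is assumed to be the `HashMap<String, String>` alias listed under Type Definitions.

```rust
use std::collections::HashMap;

/// Assumed to correspond to the alias listed under Type Definitions.
type WarcHeaders = HashMap<String, String>;

/// Hypothetical stand-in for the OSCAR metadata; the real fields are not
/// listed on this page, so these two are illustrative only.
#[derive(Debug, Clone, PartialEq, Default)]
struct Metadata {
    identification: Option<String>,
    annotations: Vec<String>,
}

/// Hypothetical stand-in for a document: content, WARC headers, metadata.
#[derive(Debug, Clone, PartialEq)]
struct Document {
    content: String,
    warc_headers: WarcHeaders,
    metadata: Metadata,
}

impl Document {
    /// Builds a document from its three parts.
    fn new(content: String, warc_headers: WarcHeaders, metadata: Metadata) -> Self {
        Self { content, warc_headers, metadata }
    }

    /// Borrows the textual content.
    fn content(&self) -> &str {
        &self.content
    }
}

/// Builds a small sample document (all values are made up).
fn example_document() -> Document {
    let mut headers = WarcHeaders::new();
    headers.insert("warc-type".to_string(), "conversion".to_string());

    let metadata = Metadata {
        identification: Some("fr".to_string()),
        annotations: vec![],
    };

    Document::new("Bonjour le monde".to_string(), headers, metadata)
}
```

Keeping the headers in a plain string-to-string map keeps the sketch free of dependencies; the crate's own types should be preferred in real code.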

Structs

  • A Document is a structure holding content, WARC headers and OSCAR-specific metadata.
  • OSCAR Metadata. Contains document identification, annotations and sentence-level identifications.
  • Document reader. The inner type has to implement BufRead.
  • Used when a given subcorpus is divided into multiple splits.
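To illustrate why a reader would require its inner type to implement BufRead, here is a minimal sketch of a line-oriented reader that is generic over any `BufRead` source. The names `RecordReader` and `read_splits` are hypothetical and are not this module's API. The sketch also assumes one record per line and returns each line as a raw `String` without parsing it into a document.

```rust
use std::io::{self, BufRead};

/// Hypothetical reader yielding one raw record (line) per iteration.
struct RecordReader<R: BufRead> {
    inner: R,
}

impl<R: BufRead> RecordReader<R> {
    /// Wraps any buffered source (file, in-memory cursor, decoder, ...).
    fn new(inner: R) -> Self {
        Self { inner }
    }
}

impl<R: BufRead> Iterator for RecordReader<R> {
    type Item = io::Result<String>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut line = String::new();
        // `read_line` comes from `BufRead`, hence the trait bound.
        match self.inner.read_line(&mut line) {
            Ok(0) => None, // end of input
            Ok(_) => {
                // Drop the trailing line terminator, if any.
                let record = line
                    .trim_end_matches(|c| c == '\n' || c == '\r')
                    .to_string();
                Some(Ok(record))
            }
            Err(e) => Some(Err(e)),
        }
    }
}

/// Hypothetical helper for the multi-split case: reads each split in order
/// and exposes the records as one continuous iterator.
fn read_splits<R: BufRead>(splits: Vec<R>) -> impl Iterator<Item = io::Result<String>> {
    splits.into_iter().flat_map(RecordReader::new)
}
```

Because the bound is only `BufRead`, the same reader works over a `BufReader<File>` or an in-memory `std::io::Cursor`, which is convenient for tests.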

Type Definitions

  • Simple alias for HashMap<String, String>, used to simplify the source code.