OSCAR Schema v2 (OSCAR 22.01) types, readers and writers.
Each document is represented by a Document, which holds Metadata, WarcHeaders and the
content (a String).
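To make the layout above concrete, here is a minimal sketch of how the three parts could fit together. The struct and field names below are hypothetical stand-ins chosen for illustration (including the field types and the `sample_document` helper); they are not the crate's actual definitions, which may differ.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the WarcHeaders alias: header name -> header value.
type WarcHeaders = HashMap<String, String>;

// Hypothetical stand-in for the OSCAR-specific metadata.
// Field names and types are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Metadata {
    // Document-level identification (for example, a language label).
    identification: String,
    // Free-form annotations attached to the document.
    annotations: Vec<String>,
    // One optional identification per line of content.
    sentence_identifications: Vec<Option<String>>,
}

// Hypothetical stand-in for a document: content plus headers plus metadata.
#[derive(Debug, Clone, PartialEq)]
struct Document {
    content: String,
    warc_headers: WarcHeaders,
    metadata: Metadata,
}

// Builds a small two-line document to show how the parts fit together.
fn sample_document() -> Document {
    let mut warc_headers = WarcHeaders::new();
    warc_headers.insert("warc-type".to_string(), "conversion".to_string());

    Document {
        content: "Hello world.\nBonjour.".to_string(),
        warc_headers,
        metadata: Metadata {
            identification: "en".to_string(),
            annotations: vec!["short".to_string()],
            sentence_identifications: vec![Some("en".to_string()), Some("fr".to_string())],
        },
    }
}
```

In this sketch the sentence-level identifications are kept parallel to the lines of `content`, so the two can be zipped together when inspecting a document.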
Structs
- A Document is a structure holding content, WARC headers and OSCAR-specific metadata.
- OSCAR Metadata. Contains document identification, annotations and sentence-level identifications.
- Document reader. The inner type has to implement BufRead.
- Used when a given subcorpus is stored as multiple splits.
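The BufRead requirement on the reader's inner type can be illustrated with a small generic iterator. This is only a sketch under the assumption that records are stored one per line; `LineRecordReader` and `read_all_records` are hypothetical names, not part of the crate, and the sketch returns raw lines rather than parsed documents.

```rust
use std::io::{self, BufRead};

// Hypothetical reader: wraps any BufRead source and yields one raw record
// (one line, without its line terminator) per iteration.
struct LineRecordReader<R: BufRead> {
    inner: R,
}

impl<R: BufRead> LineRecordReader<R> {
    fn new(inner: R) -> Self {
        Self { inner }
    }
}

impl<R: BufRead> Iterator for LineRecordReader<R> {
    type Item = io::Result<String>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = String::new();
        match self.inner.read_line(&mut buf) {
            // Zero bytes read means end of input.
            Ok(0) => None,
            // Strip the trailing "\n" or "\r\n" before returning the record.
            Ok(_) => Some(Ok(buf.trim_end_matches(&['\n', '\r'][..]).to_string())),
            Err(e) => Some(Err(e)),
        }
    }
}

// Convenience helper for the sketch: reads every record from an in-memory string.
fn read_all_records(input: &str) -> io::Result<Vec<String>> {
    LineRecordReader::new(io::Cursor::new(input)).collect()
}
```

Because the wrapper is generic over BufRead, the same code works with a `BufReader<File>`, standard input, or an in-memory `Cursor`, which is convenient for testing.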
Type Definitions
- Simple alias for HashMap<String, String>, used to simplify the source code.