§oscar-io
Types and IO (Reader/Writer) for OSCAR Corpus processing and generation.
The crate provides basic abstractions around Corpus items, along with generic readers and writers usable with OSCAR Corpus files. Eventually, it should replace the reader implementations in both Ungoliant and oscar-tools.
§Features
oscar-io aims to provide readers/writers for numerous types of OSCAR Corpora.
§OSCAR v2
- Reader
  - Uncompressed oscar_doc::Reader::new
  - GZipped oscar_doc::Reader::from_gzip
  - Parquet
- Writer
  - Uncompressed oscar_doc::Writer::new
  - GZipped oscar_doc::Writer::new (using a [GzEncoder] reader, from_gzip not yet implemented)
  - Parquet
- SplitReader (Should be unified with SplitReader with split_size: Option<u64>)
  - Uncompressed
  - GZipped
- SplitWriter (Same)
  - Uncompressed
  - GZipped
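The Reader entries above pair a plain constructor with a gzip one. A minimal sketch of how such a pairing can be layered over a single generic type is shown here, using only the standard library. `LineReader`, `from_read`, and `read_records` are hypothetical names for illustration and are not the oscar-io API; the sketch also assumes newline-delimited records, which the page does not state.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical stand-in for a corpus reader (NOT `oscar_doc::Reader`).
/// It yields one record per line from any buffered source.
struct LineReader<R: BufRead> {
    src: R,
}

impl<R: BufRead> LineReader<R> {
    /// Reads from a source that is already buffered and uncompressed.
    fn new(src: R) -> Self {
        Self { src }
    }
}

impl<R: Read> LineReader<BufReader<R>> {
    /// Buffers any `Read` source. A gzip constructor could be built the same
    /// way by passing a decompressing adapter here (assumption: something
    /// like a gzip decoder that implements `Read`).
    fn from_read(src: R) -> Self {
        Self::new(BufReader::new(src))
    }
}

impl<R: BufRead> Iterator for LineReader<R> {
    type Item = io::Result<String>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut line = String::new();
        match self.src.read_line(&mut line) {
            Ok(0) => None, // end of input
            Ok(_) => {
                // Drop the trailing line terminator, if any.
                let record = line.trim_end_matches(&['\n', '\r'][..]).to_string();
                Some(Ok(record))
            }
            Err(e) => Some(Err(e)),
        }
    }
}

/// Collects every record from an in-memory byte slice.
fn read_records(input: &[u8]) -> io::Result<Vec<String>> {
    LineReader::from_read(input).collect()
}
```

Because the iterator logic only depends on `BufRead`, the compressed and uncompressed paths differ only in how the source is constructed.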
§OSCAR v1.1
- Reader
- Writer
- SplitReader (Should be unified with SplitReader with split_size: Option<u64>)
- SplitWriter (Same)
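The note about unifying the split and non-split variants through `split_size: Option<u64>` can be illustrated with a small sketch. `PartWriter` and its methods are hypothetical and not part of oscar-io. In-memory buffers stand in for output files, and the rotation rule (start a new part once the current one has reached the limit) is an assumption, not the crate's documented behaviour.

```rust
use std::io::{self, Write};

/// Hypothetical writer covering both the plain and the split case.
/// `split_size: None` keeps everything in a single part; `Some(n)` starts a
/// new part once the current one already holds at least `n` bytes.
struct PartWriter {
    split_size: Option<u64>,
    parts: Vec<Vec<u8>>, // in-memory stand-ins for output files
}

impl PartWriter {
    fn new(split_size: Option<u64>) -> Self {
        Self {
            split_size,
            parts: vec![Vec::new()],
        }
    }

    /// Writes one record plus a newline. Records are never cut in half:
    /// rotation is only checked before a record is written.
    fn write_record(&mut self, record: &str) -> io::Result<()> {
        if let Some(limit) = self.split_size {
            let current_len = self.parts.last().map_or(0, |p| p.len() as u64);
            // Rotate only if the current part is non-empty and has hit the limit.
            if current_len > 0 && current_len >= limit {
                self.parts.push(Vec::new());
            }
        }
        let part = self
            .parts
            .last_mut()
            .expect("at least one part always exists");
        part.write_all(record.as_bytes())?;
        part.write_all(b"\n")
    }

    /// Number of parts produced so far.
    fn part_count(&self) -> usize {
        self.parts.len()
    }
}
```

With this shape, a caller that never wants splitting passes `None` and gets the behaviour of a plain writer, so a separate split type is not needed.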
§OSCAR v1
- Reader
- Writer
- SplitReader
- SplitWriter
§Re-exports
pub use error::Error;