  • ungoliant-2.0.0
    • ungoliant 2.0.0
    • Apache-2.0
    • Owners
    • pjox
    • Uinelj
    • Dependencies
      • avro-rs ^0.13.0 normal
      • bytes ^1 normal
      • csv ^1.1.6 normal
      • ctclib-pp ^0.2.0 normal optional
      • env_logger ^0.8.3 normal
      • fasttext ^0.7.6 normal
      • flate2 ^1.0.20 normal
      • futures ^0.3 normal
      • futures-core ^0.3 normal
      • futures-util ^0.3 normal
      • glob ^0.3.0 normal
      • itertools ^0.10.0 normal
      • language-tags ^0.3.2 normal
      • lazy_static ^1.4.0 normal
      • log ^0.4.14 normal
      • oscar-io ^0.2.2 normal
      • oxilangtag ^0.1.3 normal
      • rand ^0.8.4 normal
      • rayon ^1 normal
      • reqwest ^0.11 normal
      • runiq-lib ^1.2.2 normal
      • schemars ^0.8.3 normal
      • serde ^1 normal
      • serde_json ^1 normal
      • sha2 ^0.9.5 normal
      • structopt ^0.3.21 normal
      • tlsh-fixed ^0.1.1 normal
      • tokio ^1 normal
      • tokio-util ^0.6.6 normal
      • twox-hash ^1.6 normal
      • unic-ucd ^0.9.0 normal
      • unicode-script ^0.5.4 normal
      • unicode-segmentation ^1.8.0 normal
      • url ^2.2.2 normal
      • ut1_blocklist ^0.3.0 normal
      • warc ^0.3.0 normal
      • criterion ^0.3 dev
      • rand_distr ^0.4.2 dev
      • serial_test ^0.5.1 dev
      • sha-1 ^0.9 dev
      • tempfile ^3.2.0 dev
      • test-log ^0.2.11 dev
    • 58.43% of the crate is documented

ungoliant 2.0.0

List of all items

Structs

  • filtering::record::PFilter
  • filtering::sentence::Length
  • filtering::sentence::MeanLength
  • identifiers::Multilingual
  • identifiers::StrictMultilingual
  • io::LangFilesDoc
  • pipelines::oscardoc::OscarDoc
  • pipelines::oscardoc::types::Document
  • pipelines::oscardoc::types::IncompleteLocation
  • pipelines::oscardoc::types::Location
  • pipelines::oscardoc::types::LocationBuilder
  • pipelines::oscardoc::types::Metadata
  • pipelines::oscardoc::types::RebuildInformation
  • pipelines::oscardoc::types::RebuildWriters
  • pipelines::oscardoc::types::ShardResult
  • processing::check::Zipf
  • processing::check::ZipfEntry
  • processing::rebuild::Rebuilder
  • processing::rebuild::RecordIterator
  • processing::rebuild::SRIterator
  • sources::commoncrawl::Wet
  • transformers::Annotator
  • transformers::ContentDetector
  • transformers::Conv
  • transformers::Header
  • transformers::LSH
  • transformers::Noisy
  • transformers::RemoveShortSentences
  • transformers::ShortSentences
  • transformers::TinyDocument

Enums

  • error::Error
  • filtering::record::FilterKind

Traits

  • filtering::Filter
  • filtering::FilterMut
  • pipelines::pipeline::Pipeline
  • transformers::Annotate
  • transformers::Transform

Functions

  • processing::check::check