ngram_rs 0.1.1

Facilitates creating n-grams in Rust for use in a Polars plugin.
Repository: ericqu/ngram-rs (crates.io owner: ericqu)

N-Gram Generation Toolkit

A high-performance n-gram generation library with a Rust core and Polars plugin integration.

Features

  • Fast: Optimized Rust implementation for n-gram generation, with specialized paths for unigrams and bigrams
  • Memory Efficient: Uses Cow (Copy-on-Write) for minimal allocations
  • Flexible N-Ranges: Generate n-grams for multiple values of n simultaneously
  • Custom Delimiters: Support for any string delimiter between tokens
  • Polars Integration: Seamless integration with Polars DataFrames
  • Iterator Support: Lazy n-gram generation for memory-constrained environments
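
To make the Cow, multi-n, and delimiter features concrete, here is a minimal standard-library sketch. It is not the crate's implementation: `ngrams_cow` is a hypothetical helper written for illustration, and it assumes that unigrams can be borrowed from the input while longer n-grams must be allocated because they join several tokens.

```rust
use std::borrow::Cow;

/// Hypothetical helper, not part of ngram_rs: builds n-grams for each n in `ns`.
/// Unigrams borrow from the input (no allocation); longer n-grams are joined
/// with `delimiter` into owned strings.
fn ngrams_cow<'a>(tokens: &'a [String], ns: &[usize], delimiter: &str) -> Vec<Cow<'a, str>> {
    let mut out = Vec::new();
    for &n in ns {
        // Skip sizes for which no n-gram exists (`windows` also panics on 0).
        if n == 0 || n > tokens.len() {
            continue;
        }
        if n == 1 {
            out.extend(tokens.iter().map(|t| Cow::Borrowed(t.as_str())));
        } else {
            out.extend(tokens.windows(n).map(|w| Cow::Owned(w.join(delimiter))));
        }
    }
    out
}

fn main() {
    let words: Vec<String> = ["the", "quick", "brown", "fox"]
        .iter()
        .map(|s| s.to_string())
        .collect();

    // Unigrams and bigrams in one call, with "_" as the delimiter.
    for gram in ngrams_cow(&words, &[1, 2], "_") {
        println!("{gram}");
    }
}
```

Returning `Cow<str>` lets the caller treat borrowed and owned results uniformly, so only n-grams that span more than one token pay for an allocation.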

Components

ngram_rs (Core Library)

The core Rust library providing:

  • Three different APIs for various use cases
  • Optimized implementations for common cases (unigrams, bigrams)
  • Iterator-based lazy generation
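
As an illustration of what iterator-based lazy generation can look like, the sketch below uses only the standard library's `slice::windows`. The function `lazy_ngrams` is a hypothetical name chosen for this example and is not one of the crate's three APIs.

```rust
/// Hypothetical helper, not part of ngram_rs: lazily yields n-grams of size `n`.
/// Each n-gram is built only when the iterator is advanced.
fn lazy_ngrams<'a>(
    tokens: &'a [String],
    n: usize,
    delimiter: &'a str,
) -> impl Iterator<Item = String> + 'a {
    // `windows` panics on a window size of 0, so reject it up front.
    assert!(n > 0, "n must be at least 1");
    tokens.windows(n).map(move |w| w.join(delimiter))
}

fn main() {
    let words: Vec<String> = "to be or not to be".split(' ').map(String::from).collect();

    // Only the first two trigrams are ever constructed; `take` stops the iterator early.
    let first_two: Vec<String> = lazy_ngrams(&words, 3, " ").take(2).collect();
    println!("{first_two:?}");
}
```

Because nothing is materialized until the iterator is consumed, a caller can stream n-grams into a counter or a file without holding the full result in memory.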

Quick Start

use ngram_rs::generate_ngrams;

let words = vec!["the", "quick", "brown", "fox"]
    .into_iter()
    .map(String::from)
    .collect::<Vec<_>>();

// Generate unigrams, bigrams, and trigrams, joining tokens with a single space.
let ngrams = generate_ngrams(&words, &[1, 2, 3], Some(" "));

Performance

The library is optimized for:

  • Minimal memory allocations through Cow
  • Specialized implementations for unigrams and bigrams
  • Efficient windowing algorithms for higher-order n-grams
  • Zero-copy operations where possible
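
The difference between a specialized bigram path and a general windowing path can be sketched as follows. This is an assumption about the kind of optimization meant, not the crate's actual code; `bigrams` and `ngrams_general` are hypothetical names used only here.

```rust
/// Hypothetical specialized path (not the crate's code): pairs adjacent tokens
/// and sizes each output String once, so it never has to grow while being built.
fn bigrams(tokens: &[String], delimiter: &str) -> Vec<String> {
    let mut out = Vec::with_capacity(tokens.len().saturating_sub(1));
    for pair in tokens.windows(2) {
        let mut s = String::with_capacity(pair[0].len() + delimiter.len() + pair[1].len());
        s.push_str(&pair[0]);
        s.push_str(delimiter);
        s.push_str(&pair[1]);
        out.push(s);
    }
    out
}

/// General path for any n >= 1, using sliding windows over the token slice.
fn ngrams_general(tokens: &[String], n: usize, delimiter: &str) -> Vec<String> {
    assert!(n > 0, "n must be at least 1");
    tokens.windows(n).map(|w| w.join(delimiter)).collect()
}

fn main() {
    let words: Vec<String> = ["the", "quick", "brown", "fox"]
        .iter()
        .map(|s| s.to_string())
        .collect();

    // Both paths produce the same bigrams; the specialized one avoids the generic join.
    assert_eq!(bigrams(&words, " "), ngrams_general(&words, 2, " "));
    println!("{:?}", bigrams(&words, " "));
}
```

Whether a hand-written bigram path is measurably faster than the generic join depends on the workload, so any such specialization is best confirmed with a benchmark.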