sklears-feature-extraction 0.1.0-beta.1

Feature extraction from raw data (text, images)

sklears-feature-extraction

Latest release: 0.1.0-beta.1 (January 1, 2026). See the workspace release notes for highlights and upgrade guidance.

Overview

sklears-feature-extraction contains text, signal, and image feature transformers designed to mirror scikit-learn’s feature extraction API with Rust-first performance.

Key Features

  • Text Processing: CountVectorizer, TfidfVectorizer, HashingVectorizer, N-gram analyzers, character models.
  • Image Features: Patch extraction, HOG descriptors, SIFT-like outlines, and GPU pipelines.
  • Signal Features: Windowed statistics, spectrograms, wavelet transforms, and FFT-based descriptors.
  • Pipeline Support: Integrates with sklears preprocessing, selection, and model selection crates.
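To make the text-processing entries more concrete, here is a minimal, standard-library-only sketch of two ideas named above: word n-gram extraction and the hashing trick behind a HashingVectorizer-style transformer. The function names `word_ngrams` and `hash_features` are hypothetical and are not part of the sklears API. The sketch also uses Rust's `DefaultHasher` as a stand-in, whereas scikit-learn uses a signed MurmurHash3, so bucket assignments will differ from either library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Split `text` on whitespace and return all contiguous word n-grams.
/// Hypothetical helper for illustration; not the sklears analyzer.
fn word_ngrams(text: &str, n: usize) -> Vec<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    if n == 0 || words.len() < n {
        return Vec::new();
    }
    words.windows(n).map(|w| w.join(" ")).collect()
}

/// Hashing trick: map each token to one of `n_features` buckets and count
/// occurrences. No vocabulary is stored, so memory use is fixed up front,
/// at the cost of possible collisions between distinct tokens.
/// Assumption: `DefaultHasher` replaces the MurmurHash3 used by scikit-learn.
fn hash_features(tokens: &[&str], n_features: usize) -> Vec<f64> {
    assert!(n_features > 0, "n_features must be non-zero");
    let mut counts = vec![0.0; n_features];
    for token in tokens {
        let mut hasher = DefaultHasher::new();
        token.hash(&mut hasher);
        // Reduce the 64-bit hash to a column index.
        let idx = (hasher.finish() % n_features as u64) as usize;
        counts[idx] += 1.0;
    }
    counts
}
```

For example, `word_ngrams("machine learning in rust", 2)` yields the three bigrams of that sentence. `hash_features(&["rust", "is", "fast", "rust"], 16)` returns a 16-element vector whose entries sum to 4, with both occurrences of "rust" landing in the same bucket.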

Quick Start

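use sklears_feature_extraction::text::TfidfVectorizer;

// The `?` operator needs an enclosing function that returns `Result`.
// Assumes the crate's error type implements `std::error::Error`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let docs = vec![
        "Rust brings fearless concurrency",
        "Machine learning in Rust is fast",
    ];

    // Unigrams and bigrams, terms kept if they appear in at least one
    // document, vocabulary capped at 4096 features.
    let vectorizer = TfidfVectorizer::builder()
        .ngram_range((1, 2))
        .min_df(1)
        .max_features(Some(4096))
        .build();

    // Learn the vocabulary and IDF weights, then transform the same documents.
    let tfidf = vectorizer.fit_transform(&docs)?;

    Ok(())
}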
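
As a rough guide to what a TF-IDF matrix like the one above contains, the following standard-library-only sketch computes TF-IDF by hand. It assumes scikit-learn's default weighting (smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 row normalization) and a simplified lowercase, whitespace-based tokenizer with unigrams only. Whether sklears uses exactly these defaults is an assumption, and the function name `tfidf_sketch` is hypothetical.

```rust
use std::collections::BTreeSet;

/// Return (sorted vocabulary, dense TF-IDF matrix with one row per document).
/// Illustrative only: unigrams, lowercase, whitespace tokenization.
fn tfidf_sketch(docs: &[&str]) -> (Vec<String>, Vec<Vec<f64>>) {
    let tokenized: Vec<Vec<String>> = docs
        .iter()
        .map(|d| d.split_whitespace().map(|w| w.to_lowercase()).collect())
        .collect();

    // A sorted vocabulary gives each term a stable column index.
    let vocab: Vec<String> = tokenized
        .iter()
        .flatten()
        .cloned()
        .collect::<BTreeSet<String>>()
        .into_iter()
        .collect();

    // Raw term counts per document, plus document frequency per term.
    let mut matrix = vec![vec![0.0_f64; vocab.len()]; docs.len()];
    let mut df = vec![0.0_f64; vocab.len()];
    for (row, tokens) in matrix.iter_mut().zip(&tokenized) {
        for token in tokens {
            if let Ok(j) = vocab.binary_search(token) {
                row[j] += 1.0;
            }
        }
        for (j, &count) in row.iter().enumerate() {
            if count > 0.0 {
                df[j] += 1.0;
            }
        }
    }

    // Smoothed IDF weighting, then L2-normalize each row.
    let n = docs.len() as f64;
    for row in matrix.iter_mut() {
        for (j, value) in row.iter_mut().enumerate() {
            *value *= ((1.0 + n) / (1.0 + df[j])).ln() + 1.0;
        }
        let norm = row.iter().map(|v| v * v).sum::<f64>().sqrt();
        if norm > 0.0 {
            row.iter_mut().for_each(|v| *v /= norm);
        }
    }

    (vocab, matrix)
}
```

With the two documents "rust is fast" and "rust is safe", the terms shared by both documents ("rust", "is") receive a lower weight than the terms unique to one document ("fast", "safe"), and every row has unit L2 norm.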

Status

  • Covered by the workspace test suite: 11,292 tests pass as of 0.1.0-beta.1.
  • Offers >99% parity with scikit-learn’s feature extraction module, plus GPU paths.
  • Additional work (streaming text ingestion, audio-specific transforms) documented in TODO.md.