Crate lrtc

Rust implementation of low-resource text classification

This crate is a Rust implementation of the method of Jiang et al. (2023), which uses text compressors to classify text snippets efficiently via k-nearest neighbors.

Full method citation: Zhiying Jiang, Matthew Yang, Mikhail Tsirlin, Raphael Tang, Yiqin Dai, and Jimmy Lin. 2023. “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6810–6828, Toronto, Canada. Association for Computational Linguistics. https://aclanthology.org/2023.findings-acl.426

§Examples

use lrtc::{CompressionAlgorithm, classify};

let training = vec!["some normal sentence".to_string(), "godzilla ate mars in June".to_string()];
let training_labels = vec!["normal".to_string(), "godzilla".to_string()];
let queries = vec!["another normal sentence".to_string(), "godzilla eats marshes in August".to_string()];
// Using a compression level of 3, gzip compression, and 1 nearest neighbor:
let predictions = classify(&training, &training_labels, &queries, 3i32, CompressionAlgorithm::Gzip, 1usize);
println!("{:?}", predictions);
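
The last argument to classify is the number of nearest neighbors. As a rough illustration of what a k-nearest-neighbor vote involves, here is a minimal sketch that picks a label for one query from precomputed distances. The helper knn_vote, its signature, and its tie-breaking rule are hypothetical and are not taken from lrtc, whose internals may differ.

```rust
/// Hypothetical helper, not part of lrtc: majority vote among the `k`
/// training items with the smallest distance to a single query.
/// `distances[i]` is the distance from the query to training item `i`,
/// and `labels[i]` is that item's label.
fn knn_vote(distances: &[f64], labels: &[&str], k: usize) -> Option<String> {
    assert_eq!(distances.len(), labels.len());
    if distances.is_empty() || k == 0 {
        return None;
    }
    // Indices of the training items, ordered by increasing distance.
    // The sort is stable, so equal distances keep their training order.
    let mut order: Vec<usize> = (0..distances.len()).collect();
    order.sort_by(|&a, &b| distances[a].total_cmp(&distances[b]));

    // Labels of the k nearest items, nearest first.
    let nearest: Vec<&str> = order.iter().take(k).map(|&i| labels[i]).collect();

    // Pick the most frequent label; on a tie, the label that appears
    // nearest to the query wins (an assumption made for this sketch).
    let mut best: Option<(&str, usize)> = None;
    for &label in &nearest {
        let count = nearest.iter().filter(|&&l| l == label).count();
        if best.map_or(true, |(_, c)| count > c) {
            best = Some((label, count));
        }
    }
    best.map(|(label, _)| label.to_string())
}
```

With made-up distances [0.42, 0.35, 0.40] and labels ["normal", "godzilla", "normal"], this sketch returns "godzilla" for k = 1 and "normal" for k = 3, since two of the three nearest items carry the label "normal".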

Structs§

NCD
NCD struct
TrainingData
Training data struct

Enums§

CompressionAlgorithm
Available compression algorithms

Functions§

classify
Classify sentences based on their distance from a set of labeled training data.
compressed_length
Calculate the length of an input string once compressed
ncd
Calculate a vector of NCD (normalized compression distance) values for a given query
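
To make the distance concrete: Jiang et al. (2023) define the normalized compression distance as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length and xy is the two texts joined together. The sketch below computes that quantity for one query against each training text. It is an illustration under stated assumptions, not the crate's code: toy_compressed_length is a deliberately crude stand-in for a real compressor (it counts phrases in an LZ78-style parse), and both function names are hypothetical.

```rust
use std::collections::HashSet;

/// Toy stand-in for a real compressor (an assumption for illustration):
/// the number of phrases in an LZ78-style parse of `data`.
fn toy_compressed_length(data: &[u8]) -> usize {
    let mut seen: HashSet<Vec<u8>> = HashSet::new();
    let mut current: Vec<u8> = Vec::new();
    let mut phrases = 0;
    for &byte in data {
        current.push(byte);
        // A phrase ends as soon as it has not been seen before.
        if seen.insert(current.clone()) {
            phrases += 1;
            current.clear();
        }
    }
    // Count a trailing phrase that matched an earlier one.
    if !current.is_empty() {
        phrases += 1;
    }
    phrases
}

/// Hypothetical helper, not part of lrtc: NCD between `query` and each
/// training text, using the toy length above. Texts are joined with a
/// space, as in the paper's reference code. Assumes non-empty inputs.
fn ncd_vector_sketch(training: &[&str], query: &str) -> Vec<f64> {
    let c_query = toy_compressed_length(query.as_bytes()) as f64;
    training
        .iter()
        .map(|text| {
            let c_text = toy_compressed_length(text.as_bytes()) as f64;
            let joined = format!("{} {}", query, text);
            let c_joined = toy_compressed_length(joined.as_bytes()) as f64;
            (c_joined - c_query.min(c_text)) / c_query.max(c_text)
        })
        .collect()
}
```

Smaller values mean the query and the training text share more structure. On very short inputs a toy length function like this one is noisy, so the resulting values are only meaningful as a demonstration of the formula.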