lrtc: Low-resource text classification
This crate is a Rust implementation of the low-resource text classification method introduced in Jiang et al. (2023). This implementation allows you to choose from gzip, zstd, zlib, or deflate compression algorithms, at various levels of compression.
use lrtc::{classify, ...};

let training = vec![...];
let training_labels = vec![...];
let queries = vec![...];
// Using a compression level of 3, and 1 nearest neighbor:
println!(...);
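For intuition, the method in Jiang et al. (2023) ranks training texts by normalized compression distance (NCD) to each query and takes a k-nearest-neighbor vote. The following is a rough, standard-library-only sketch of that idea, not this crate's implementation: `compressed_len`, `ncd`, and `classify_ncd` are hypothetical names, and a toy LZW-style code count stands in for a real compressor such as gzip or zstd.

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for a real compressor (assumption: NOT gzip/zstd).
/// Returns the number of codes an LZW-style parse would emit.
fn compressed_len(data: &[u8]) -> usize {
    let mut seen: HashSet<Vec<u8>> = HashSet::new();
    let mut current: Vec<u8> = Vec::new();
    let mut codes = 0;
    for &byte in data {
        current.push(byte);
        // Single bytes are treated as always known; longer phrases are learned.
        if current.len() > 1 && !seen.contains(&current) {
            seen.insert(current.clone());
            codes += 1; // emit a code for the phrase before this byte
            current.clear();
            current.push(byte);
        }
    }
    if !current.is_empty() {
        codes += 1; // emit the trailing phrase
    }
    codes
}

/// Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
fn ncd(x: &str, y: &str) -> f64 {
    let cx = compressed_len(x.as_bytes());
    let cy = compressed_len(y.as_bytes());
    let joined = format!("{} {}", x, y);
    let cxy = compressed_len(joined.as_bytes());
    // max(1) avoids dividing by zero when both inputs are empty.
    (cxy as f64 - cx.min(cy) as f64) / cx.max(cy).max(1) as f64
}

/// Hypothetical helper: label each query by majority vote among its k nearest
/// training texts under NCD. Ties in the vote are broken arbitrarily.
fn classify_ncd(training: &[&str], labels: &[&str], queries: &[&str], k: usize) -> Vec<String> {
    queries
        .iter()
        .map(|query| {
            let mut scored: Vec<(f64, &str)> = training
                .iter()
                .zip(labels.iter())
                .map(|(text, label)| (ncd(query, text), *label))
                .collect();
            scored.sort_by(|a, b| a.0.total_cmp(&b.0));
            let mut votes: HashMap<&str, usize> = HashMap::new();
            for (_, label) in scored.iter().take(k) {
                *votes.entry(*label).or_insert(0) += 1;
            }
            votes
                .into_iter()
                .max_by_key(|(_, n)| *n)
                .map(|(label, _)| label.to_string())
                .unwrap_or_default() // empty string if there is nothing to vote on
        })
        .collect()
}
```

Calling `classify_ncd(&training, &labels, &queries, 1)` returns one predicted label per query. A real compressor captures far more redundancy than this toy parse, so the sketch is only meant to show the shape of the computation.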
This method appears to perform reasonably well on relatively sparse training sets, and it requires less tuning than neural network methods.
use csv::Reader;
use lrtc::{classify, ...};
use std::fs::File;

let imdb = File::open(...).unwrap();
let mut reader = Reader::from_reader(imdb);
let mut content = Vec::with_capacity(...);
let mut label = Vec::with_capacity(...);
for record in reader.records() {
    ...
}
let predictions = classify(...);
let correct = predictions
    .iter()
    .zip(...)
    .filter(...)
    .count();
println!(...);
// 0.623
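The figure printed above is an accuracy: the share of predictions that match the held-out labels. A minimal sketch of that calculation follows, using a hypothetical helper named `accuracy` that is not part of this crate; it assumes both slices have the same length.

```rust
/// Hypothetical helper: fraction of predictions equal to the true label.
/// Returns 0.0 for empty input so the division is always defined.
fn accuracy(predictions: &[String], labels: &[String]) -> f64 {
    if predictions.is_empty() {
        return 0.0;
    }
    let correct = predictions
        .iter()
        .zip(labels.iter())
        .filter(|(predicted, actual)| predicted == actual)
        .count();
    correct as f64 / predictions.len() as f64
}
```

Dividing by the number of predictions, rather than a hard-coded constant, keeps the result correct if the size of the held-out slice changes.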
References
Zhiying Jiang, Matthew Yang, Mikhail Tsirlin, Raphael Tang, Yiqin Dai, and Jimmy Lin. 2023. “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6810–6828, Toronto, Canada. Association for Computational Linguistics. https://aclanthology.org/2023.findings-acl.426