Rust TF-IDF
Library to calculate TF-IDF (Term Frequency - Inverse Document Frequency) for generic documents. The library provides strategies to act on objects that implement certain document traits (NaiveDocument, ProcessedDocument, ExpandableDocument).
For more information on the strategies that were implemented, see the Wikipedia article on tf-idf.
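For orientation, one common textbook form of the weighting is written out below. This is only a sketch of a widely used variant, with N denoting the number of documents in the collection D; the strategies shipped with the library may use different term-frequency or inverse-document-frequency formulas.

```latex
\[
\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D),
\qquad
\mathrm{idf}(t, D) = \log \frac{N}{\lvert \{ d' \in D : t \in d' \} \rvert},
\qquad
N = \lvert D \rvert
\]
```

Under this variant, a term that appears in every document has an idf of log(1) = 0, so its tf-idf is zero regardless of how often it occurs.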
Document Types
A document is defined as a collection of terms. The documents don't make assumptions about the term types (the terms are not normalized in any way).
These document types are of my own design. The terminology isn't standard, but the types are fairly straightforward to understand.
- NaiveDocument - A document is 'naive' if it only knows whether a term is contained within it, but does not know how many instances of the term it contains.
- ProcessedDocument - A document is 'processed' if it knows how many instances of each term are contained within it.
- ExpandableDocument - A document is 'expandable' if it provides a way to access each term contained within it.
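The difference between the three kinds of document can be sketched with standard-library types alone. The functions below (naive_contains, processed_count, expand_to_counts) are hypothetical helpers written for illustration; they are not the crate's traits or methods, and they assume string terms.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the three document kinds; not the crate's API.

/// 'Naive' view: only membership is known, not the number of occurrences.
fn naive_contains(doc: &HashSet<&str>, term: &str) -> bool {
    doc.contains(term)
}

/// 'Processed' view: the number of occurrences of each term is stored.
/// Returns 0 when the term is absent.
fn processed_count(doc: &[(&str, usize)], term: &str) -> usize {
    doc.iter()
        .find(|(t, _)| *t == term)
        .map(|(_, n)| *n)
        .unwrap_or(0)
}

/// 'Expandable' view: every term occurrence can be visited,
/// so per-term counts can be derived by walking the document.
fn expand_to_counts<'a>(doc: &[&'a str]) -> HashMap<&'a str, usize> {
    let mut counts = HashMap::new();
    for term in doc {
        *counts.entry(*term).or_insert(0) += 1;
    }
    counts
}
```

In this sketch an expandable document can always be reduced to a processed one (by counting), and a processed one to a naive one (by discarding the counts), but not the other way around.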
Example
The simplest way to calculate the TF-IDF of a document is with the default implementation. Note that the library provides an implementation of ProcessedDocument for Vec<(T, usize)>.
use ;
let mut docs = Vec::new();
let doc1 = vec!;
let doc2 = vec!;
docs.push(doc1);
docs.push(doc2);
assert_eq!;
assert!;
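To make the calculation itself concrete, here is a minimal standard-library sketch that does not use the crate at all. It assumes documents shaped like Vec<(&str, usize)>, raw counts as the term frequency, and ln(N / n_t) as the inverse document frequency; the function names tf, idf and tfidf and the sample data in the usage note are illustrative assumptions, and the crate's default strategy may weight terms differently.

```rust
/// Raw term frequency: the stored count for `term`, or 0.0 if absent.
fn tf(term: &str, doc: &[(&str, usize)]) -> f64 {
    doc.iter()
        .find(|(t, _)| *t == term)
        .map_or(0.0, |(_, n)| *n as f64)
}

/// Plain inverse document frequency: ln(N / n_t), where n_t is the
/// number of documents containing `term`. Returns 0.0 when no
/// document contains the term, to avoid dividing by zero.
fn idf(term: &str, docs: &[Vec<(&str, usize)>]) -> f64 {
    let containing = docs
        .iter()
        .filter(|d| d.iter().any(|(t, _)| *t == term))
        .count();
    if containing == 0 {
        return 0.0;
    }
    (docs.len() as f64 / containing as f64).ln()
}

/// tf-idf as the product of the two weights above.
fn tfidf(term: &str, doc: &[(&str, usize)], docs: &[Vec<(&str, usize)>]) -> f64 {
    tf(term, doc) * idf(term, docs)
}
```

With two made-up documents, [("a", 3), ("b", 2), ("c", 4)] and [("a", 2), ("d", 5)], the term "a" occurs in both, so its idf is ln(2/2) = 0 and its tf-idf is zero, while "c" occurs only in the first and receives a positive weight.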
You can also define your own strategy for calculating TF-IDF by combining strategies included in the library.
use ;
use ;
use ;
;
let mut docs = Vec::new();
let doc1 = vec!;
let doc2 = vec!;
docs.push(doc1);
docs.push(doc2);
assert!;
assert!;
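The idea of assembling a strategy from interchangeable parts can be sketched independently of the crate. Everything below is hypothetical: the traits TfStrategy and IdfStrategy, the types RawCount and SmoothIdf, and the smoothing formula ln(1 + N / n_t) are assumptions chosen for illustration, not the library's own traits or definitions.

```rust
/// Hypothetical term-frequency strategy (not the crate's trait).
trait TfStrategy {
    fn tf(term: &str, doc: &[(&str, usize)]) -> f64;
}

/// Hypothetical inverse-document-frequency strategy (not the crate's trait).
trait IdfStrategy {
    fn idf(term: &str, docs: &[Vec<(&str, usize)>]) -> f64;
}

/// Term frequency as the raw stored count.
struct RawCount;

impl TfStrategy for RawCount {
    fn tf(term: &str, doc: &[(&str, usize)]) -> f64 {
        doc.iter()
            .find(|(t, _)| *t == term)
            .map_or(0.0, |(_, n)| *n as f64)
    }
}

/// Smoothed idf, assumed here to be ln(1 + N / n_t), so that a term
/// present in every document still gets a positive weight.
struct SmoothIdf;

impl IdfStrategy for SmoothIdf {
    fn idf(term: &str, docs: &[Vec<(&str, usize)>]) -> f64 {
        let containing = docs
            .iter()
            .filter(|d| d.iter().any(|(t, _)| *t == term))
            .count();
        if containing == 0 {
            return 0.0;
        }
        (1.0 + docs.len() as f64 / containing as f64).ln()
    }
}

/// Combine any tf strategy with any idf strategy.
fn tfidf<T: TfStrategy, I: IdfStrategy>(
    term: &str,
    doc: &[(&str, usize)],
    docs: &[Vec<(&str, usize)>],
) -> f64 {
    T::tf(term, doc) * I::idf(term, docs)
}
```

A call such as tfidf::<RawCount, SmoothIdf>("a", &docs[0], &docs) selects the pair of strategies at compile time. With the smoothed idf assumed here, a term that appears in every document no longer scores zero, which is the main practical difference from the unsmoothed ln(N / n_t) form.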