Crate text_analysis
Text_Analysis
Analyzes text stored in *.txt or *.pdf files at the provided path, which may be a single file or a directory. Files in subdirectories are not read. The tool first counts all words, then, for every unique word, counts the words that appear in its vicinity (+-5 words). Results are stored in the file [date/time]results_word_analysis.txt in the given directory.
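The vicinity search described above can be illustrated with a small std-only sketch. This is not the crate's implementation: the helper name `neighbours` and the `radius` parameter are hypothetical. The sketch also assumes that the occurrence itself is skipped while other occurrences of the same word inside the window are counted.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of text_analysis): for every occurrence of
/// `target` in `words`, count the words at most `radius` positions before or
/// after it. The occurrence itself is skipped; other occurrences of `target`
/// inside the window are counted like any other word.
fn neighbours(words: &[String], target: &str, radius: usize) -> HashMap<String, u32> {
    let mut counts: HashMap<String, u32> = HashMap::new();
    for (i, word) in words.iter().enumerate() {
        if word.as_str() != target {
            continue;
        }
        // Clamp the window to the bounds of the slice.
        let start = i.saturating_sub(radius);
        let end = (i + radius).min(words.len() - 1);
        for j in start..=end {
            if j != i {
                *counts.entry(words[j].clone()).or_insert(0) += 1;
            }
        }
    }
    counts
}

fn main() {
    let words: Vec<String> = "an example phrase including two times the word two"
        .split_whitespace()
        .map(str::to_string)
        .collect();
    // Words within +-5 positions of each occurrence of "two".
    let near_two = neighbours(&words, "two", 5);
    println!("{:?}", near_two);
}
```

With a radius of 5, both occurrences of "two" see each other, so "two" is counted twice in its own neighbourhood under this assumption.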
Usage: text_analysis path/to/directory_or_file
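A minimal sketch of how such a command line could select its input files, assuming a single positional argument and matching the rule that subdirectories are not read. The helper `collect_inputs` is hypothetical; the crate's own file discovery may differ (for example in how it treats extension case).

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the files to analyze for `path`.
/// A single file is returned as-is; for a directory, only its direct
/// children with a `txt` or `pdf` extension are returned (no recursion).
fn collect_inputs(path: &Path) -> io::Result<Vec<PathBuf>> {
    if path.is_file() {
        return Ok(vec![path.to_path_buf()]);
    }
    let mut files = Vec::new();
    for entry in fs::read_dir(path)? {
        let p = entry?.path();
        let wanted = p
            .extension()
            .and_then(|ext| ext.to_str())
            .map(|ext| ext.eq_ignore_ascii_case("txt") || ext.eq_ignore_ascii_case("pdf"))
            .unwrap_or(false);
        // Subdirectories are skipped because `is_file` is false for them.
        if p.is_file() && wanted {
            files.push(p);
        }
    }
    files.sort();
    Ok(files)
}

fn main() -> io::Result<()> {
    // Expect exactly one argument: path/to/directory_or_file.
    let arg = match env::args().nth(1) {
        Some(a) => a,
        None => {
            eprintln!("Usage: text_analysis path/to/directory_or_file");
            std::process::exit(1);
        }
    };
    for file in collect_inputs(Path::new(&arg))? {
        println!("{}", file.display());
    }
    Ok(())
}
```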
Example
```rust
use text_analysis::{count_words, save_file, sort_map_to_vec, trim_to_words, words_near};
use std::collections::HashMap;

// Create an example string. It would normally be read from a file.
// Words consisting of just one character will be ignored.
let content_string: String = "An example phrase including two times the word two".to_string();
let content_vec: Vec<String> = trim_to_words(content_string).unwrap();

// Count word frequencies in a HashMap, then sort the HashMap into a Vec by frequency.
let word_frequency = count_words(&content_vec).unwrap();
let words_sorted = sort_map_to_vec(word_frequency).unwrap();

// For each unique word, search for words within +-5 positions, count them and insert into a HashMap.
let mut index_rang: usize = 0;
let mut words_near_map: HashMap<String, HashMap<String, u32>> = HashMap::new();
for word in &words_sorted {
    words_near_map.extend(words_near(&word, index_rang, &content_vec, &words_sorted).unwrap());
    index_rang += 1;
}

// Prepare the output as a String. Afterwards you may, e.g., write this String to a file.
let mut result_as_string = String::new();

// Fill the String with word, frequency and words near.
for word in words_sorted {
    let (word_only, frequency) = &word;
    let words_near = &words_near_map[word_only];
    let combined = format!(
        "Word: {:?}, Frequency: {:?},\nWords near: {:?} \n\n",
        word_only,
        frequency,
        sort_map_to_vec(words_near.to_owned()).unwrap()
    );
    result_as_string.push_str(&combined);
}

// Print the resulting String (Display, so line breaks are rendered rather than escaped).
println!("{}", result_as_string);
```
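The example leaves writing `result_as_string` to a file up to the caller. The sketch below does this with the standard library only, rather than with the crate's `save_file`, whose signature is not shown here. The helper `write_results` is hypothetical, and the Unix-seconds prefix is an assumed stand-in for the crate's [date/time] prefix, chosen to avoid a date library.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: writes `content` to `<dir>/<secs>results_word_analysis.txt`,
/// where `<secs>` is the current Unix time in seconds (a std-only stand-in
/// for a human-readable date/time prefix).
fn write_results(dir: &Path, content: &str) -> io::Result<PathBuf> {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    let path = dir.join(format!("{}results_word_analysis.txt", secs));
    fs::write(&path, content)?;
    Ok(path)
}

fn main() -> io::Result<()> {
    // In the example above, this would be the filled `result_as_string`.
    let result_as_string = String::from("Word: \"two\", Frequency: 2,\n");
    let written = write_results(&std::env::temp_dir(), &result_as_string)?;
    println!("wrote {}", written.display());
    Ok(())
}
```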
Functions
| Function | Description |
|---|---|
| count_words | Counts the words contained in the given &Vec. |
| save_file | Saves a file to the given path. Returns a Result. |
| sort_map_to_vec | Sorts the words in a HashMap<Word, Frequency> by frequency into a Vec. Returns a Result. |
| trim_to_words | Splits the content of a file into single words, returned as a Vec. |
| words_near | Searches for words within +-5 positions around the given word. Returns a Result. |
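To make the tokenize, count and sort steps concrete, here is a std-only sketch of what `trim_to_words`, `count_words` and `sort_map_to_vec` do conceptually. The names `split_words`, `count` and `sort_by_frequency` are hypothetical stand-ins, not the crate's functions. Lowercasing, splitting on non-alphanumeric characters, descending order and alphabetical tie-breaking are all assumptions; only the rule that one-character words are ignored comes from the example above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `trim_to_words`: splits on anything that is not
/// alphanumeric, drops words made of a single character, and lowercases.
fn split_words(content: &str) -> Vec<String> {
    content
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| w.chars().count() > 1)
        .map(|w| w.to_lowercase())
        .collect()
}

/// Hypothetical stand-in for `count_words`: word -> number of occurrences.
fn count(words: &[String]) -> HashMap<String, u32> {
    let mut freq = HashMap::new();
    for w in words {
        *freq.entry(w.clone()).or_insert(0) += 1;
    }
    freq
}

/// Hypothetical stand-in for `sort_map_to_vec`: highest frequency first,
/// ties broken alphabetically so the order is deterministic.
fn sort_by_frequency(map: HashMap<String, u32>) -> Vec<(String, u32)> {
    let mut v: Vec<(String, u32)> = map.into_iter().collect();
    v.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    v
}

fn main() {
    let words = split_words("An example phrase including two times the word two");
    let sorted = sort_by_frequency(count(&words));
    println!("{:?}", sorted);
}
```

Sorting into a Vec is needed because a HashMap has no defined iteration order; the explicit tie-break keeps output stable between runs.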