pub fn load_vocabulary() -> &'static HashSet<String>
Loads the vocabulary used for compound-word splitting. This is a simplified version that could be expanded with a real dictionary.
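The function body is not shown here, so the following is only a minimal sketch of one way a function with this signature could be written. It assumes the set is built once and cached in a `std::sync::OnceLock`. The `SAMPLE_WORDS` list, the `VOCABULARY` static, and the `split_compound` helper are hypothetical names introduced for illustration and are not taken from the original code.

```rust
use std::collections::HashSet;
use std::sync::OnceLock;

// Hypothetical placeholder word list; a real dictionary would replace it.
const SAMPLE_WORDS: &[&str] = &["book", "shelf", "sun", "flower", "rain", "coat"];

// Assumed caching strategy: the set is built on first use and then shared.
static VOCABULARY: OnceLock<HashSet<String>> = OnceLock::new();

/// Returns a reference to the shared vocabulary, building it on the first call.
pub fn load_vocabulary() -> &'static HashSet<String> {
    VOCABULARY.get_or_init(|| SAMPLE_WORDS.iter().map(|w| (*w).to_owned()).collect())
}

/// Hypothetical helper: splits `word` into two parts at the first position
/// where both parts are in the vocabulary, or returns `None` if there is none.
pub fn split_compound(word: &str) -> Option<(&str, &str)> {
    let vocab = load_vocabulary();
    word.char_indices()
        .skip(1) // skip index 0 so the head is never empty
        .map(|(i, _)| word.split_at(i))
        .find(|(head, tail)| vocab.contains(*head) && vocab.contains(*tail))
}
```

Under these assumptions, repeated calls to `load_vocabulary()` return the same set without rebuilding it, which is what makes the `&'static` return type workable. Swapping in a real dictionary would only require changing how the set is populated inside the initializer.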