pub struct CommitMessageProcessor { /* private fields */ }
Commit message preprocessor that applies NLP transformations.
This processor applies a standard NLP pipeline:
- Tokenization (word-level with punctuation handling)
- Lowercasing
- Stop words filtering (with custom software engineering stop words)
- Stemming (Porter stemmer)
§Examples
use organizational_intelligence_plugin::nlp::CommitMessageProcessor;
let processor = CommitMessageProcessor::new();
let message = "fix: race condition in mutex lock";
let tokens = processor.preprocess(message).unwrap();
assert!(tokens.contains(&"race".to_string()));
assert!(tokens.contains(&"condit".to_string())); // Stemmed "condition"
Implementations§
impl CommitMessageProcessor
pub fn new() -> Self
Create a new commit message processor with default settings.
Uses:
- WordTokenizer for tokenization
- English stop words with custom software engineering adjustments
- Porter stemmer for normalization
§Examples
use organizational_intelligence_plugin::nlp::CommitMessageProcessor;
let processor = CommitMessageProcessor::new();
pub fn with_custom_stop_words<I, S>(custom_stop_words: I) -> Self
Create a processor with custom stop words.
Useful for domain-specific filtering (e.g., transpiler development).
§Arguments
custom_stop_words - Additional stop words to filter
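The type parameters `I` and `S` suggest the constructor accepts any iterable of string-like values. The sketch below shows one way such a signature can work. The function name `sketch_merge_stop_words`, the `IntoIterator`/`AsRef<str>` bounds, the base word list, and the lowercasing of custom entries are all assumptions for illustration, not the crate's documented behaviour.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: merge caller-supplied stop words into a base set.
/// The bounds on `I` and `S` are assumed, not taken from the crate.
fn sketch_merge_stop_words<I, S>(base: &[&str], custom: I) -> HashSet<String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut merged: HashSet<String> = base.iter().map(|w| w.to_string()).collect();
    // Assumption: custom entries are lowercased so they match lowercased tokens.
    merged.extend(custom.into_iter().map(|w| w.as_ref().to_lowercase()));
    merged
}

fn main() {
    let base = ["the", "in"];
    // Works with string slices...
    let from_strs = sketch_merge_stop_words(&base, vec!["depyler", "internal"]);
    // ...and with owned strings, thanks to the `AsRef<str>` bound.
    let from_strings = sketch_merge_stop_words(&base, vec![String::from("Depyler")]);
    assert!(from_strs.contains("depyler"));
    assert!(from_strings.contains("depyler"));
    assert_eq!(from_strs.len(), 4);
}
```

Accepting `AsRef<str>` items lets callers pass `Vec<&str>` or `Vec<String>` without converting first.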
§Examples
use organizational_intelligence_plugin::nlp::CommitMessageProcessor;
let processor = CommitMessageProcessor::with_custom_stop_words(vec!["depyler", "internal"]);
pub fn preprocess(&self, message: &str) -> Result<Vec<String>>
Preprocess a commit message into normalized tokens.
Pipeline:
- Tokenize into words
- Lowercase
- Filter stop words
- Stem to root forms
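The first three stages of the pipeline can be sketched with the standard library alone. This is a simplified illustration, not the processor's implementation: `sketch_preprocess` and its stop-word list are hypothetical, splitting on non-alphanumeric characters is an assumed stand-in for the real tokenizer, and the stemming stage is left out.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for tokenize -> lowercase -> stop-word filter.
/// Stemming is omitted; the real processor applies a Porter stemmer afterwards.
fn sketch_preprocess(message: &str, stop_words: &HashSet<&str>) -> Vec<String> {
    message
        // Split on anything that is not alphanumeric, so "Fix:" yields "Fix".
        .split(|c: char| !c.is_alphanumeric())
        .filter(|word| !word.is_empty())
        // Lowercase before the stop-word lookup so "In" would match "in".
        .map(|word| word.to_lowercase())
        .filter(|word| !stop_words.contains(word.as_str()))
        .collect()
}

fn main() {
    // Assumed stop-word list, for illustration only.
    let stop_words = HashSet::from(["in", "the", "a"]);
    let tokens = sketch_preprocess("Fix: memory leak in the parser", &stop_words);
    assert_eq!(tokens, vec!["fix", "memory", "leak", "parser"]);
}
```

A Porter stemmer applied to this output would then reduce "memory" to "memori", matching the stemmed form used in this method's example.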
§Arguments
message - Raw commit message
§Returns
Ok(Vec<String>) - Normalized tokens
Err - If preprocessing fails
§Examples
use organizational_intelligence_plugin::nlp::CommitMessageProcessor;
let processor = CommitMessageProcessor::new();
let tokens = processor.preprocess("fix: memory leak in parser").unwrap();
assert!(tokens.contains(&"memori".to_string())); // Stemmed "memory"
assert!(tokens.contains(&"leak".to_string()));
assert!(tokens.len() >= 2); // At least "memori" and "leak"
pub fn extract_ngrams(&self, tokens: &[String], n: usize) -> Result<Vec<String>>
Extract n-grams from a list of tokens.
N-grams are contiguous sequences of n tokens. Useful for detecting multi-word patterns like “null pointer” or “race condition”.
§Arguments
tokens - Input tokens
n - Size of n-grams (1 = unigrams, 2 = bigrams, 3 = trigrams)
§Returns
Ok(Vec<String>) - N-grams joined with underscores
Err - If n is 0 or greater than token count
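The contract described above (a sliding window of `n` tokens, joined with underscores, with an error for out-of-range `n`) can be sketched with `slice::windows`. The function name `sketch_ngrams` and the `String` error type are assumptions; the crate's own error type is not shown here.

```rust
/// Hypothetical sketch of sliding-window n-gram extraction.
/// Uses a plain `String` error as a placeholder for the crate's error type.
fn sketch_ngrams(tokens: &[String], n: usize) -> Result<Vec<String>, String> {
    // Guard first: `windows(0)` would panic, and n > len yields no n-grams.
    if n == 0 || n > tokens.len() {
        return Err(format!("n must be between 1 and {}, got {}", tokens.len(), n));
    }
    // Each window of n adjacent tokens becomes one underscore-joined n-gram.
    Ok(tokens.windows(n).map(|w| w.join("_")).collect())
}

fn main() {
    let tokens: Vec<String> = ["fix", "race", "condition", "mutex"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let bigrams = sketch_ngrams(&tokens, 2).unwrap();
    assert_eq!(bigrams, vec!["fix_race", "race_condition", "condition_mutex"]);
    assert!(sketch_ngrams(&tokens, 0).is_err());
    assert!(sketch_ngrams(&tokens, 5).is_err());
}
```

For `k` tokens and a valid `n`, this produces `k - n + 1` n-grams.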
§Examples
use organizational_intelligence_plugin::nlp::CommitMessageProcessor;
let processor = CommitMessageProcessor::new();
let tokens: Vec<String> = vec![
"fix".to_string(),
"race".to_string(),
"condition".to_string(),
"mutex".to_string(),
];
let bigrams = processor.extract_ngrams(&tokens, 2).unwrap();
assert!(bigrams.contains(&"fix_race".to_string()));
assert!(bigrams.contains(&"race_condition".to_string()));
pub fn preprocess_with_ngrams(&self, message: &str) -> Result<(Vec<String>, Vec<String>)>
Preprocess and extract both unigrams and bigrams.
Convenience method that combines preprocessing with n-gram extraction. Useful for feature extraction in ML models.
§Arguments
message - Raw commit message
§Returns
Ok((Vec<String>, Vec<String>)) - (unigrams, bigrams)
Err - If preprocessing fails
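One common way to turn the returned pair into model features is a bag-of-terms count. The sketch below is an illustration only: `sketch_feature_counts` is a hypothetical helper, and the token values are hand-written rather than produced by the processor.

```rust
use std::collections::HashMap;

/// Hypothetical helper: count how often each unigram and bigram occurs.
fn sketch_feature_counts(unigrams: &[String], bigrams: &[String]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    // Treat unigrams and bigrams as one shared vocabulary of terms.
    for term in unigrams.iter().chain(bigrams.iter()) {
        *counts.entry(term.clone()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Hand-written tokens standing in for the (unigrams, bigrams) pair.
    let unigrams: Vec<String> = vec!["memori".into(), "leak".into(), "memori".into()];
    let bigrams: Vec<String> = vec!["memori_leak".into(), "leak_memori".into()];

    let counts = sketch_feature_counts(&unigrams, &bigrams);
    assert_eq!(counts["memori"], 2);
    assert_eq!(counts["leak"], 1);
    assert_eq!(counts["memori_leak"], 1);
    assert_eq!(counts.len(), 4);
}
```

Because bigrams are joined with underscores, they cannot collide with single-word unigrams in the same map unless a token itself contains an underscore.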
§Examples
use organizational_intelligence_plugin::nlp::CommitMessageProcessor;
let processor = CommitMessageProcessor::new();
let (unigrams, bigrams) = processor.preprocess_with_ngrams("fix: memory leak defect").unwrap();
assert!(unigrams.contains(&"memori".to_string())); // Stemmed "memory"
assert!(unigrams.contains(&"leak".to_string()));
assert!(!bigrams.is_empty()); // Should have bigrams
Trait Implementations§
impl Clone for CommitMessageProcessor
fn clone(&self) -> CommitMessageProcessor
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for CommitMessageProcessor
Auto Trait Implementations§
impl Freeze for CommitMessageProcessor
impl RefUnwindSafe for CommitMessageProcessor
impl Send for CommitMessageProcessor
impl Sync for CommitMessageProcessor
impl Unpin for CommitMessageProcessor
impl UnwindSafe for CommitMessageProcessor
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<T> PolicyExt for T
where
    T: ?Sized,
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.