String Kernel Approximations
This module implements various string kernel approximation methods for sequence and text analysis. String kernels measure similarity between sequences of symbols (characters, words, etc.) by counting shared subsequences or n-grams.
§Key Features
- N-gram Kernels: Count shared n-grams between sequences
- Spectrum Kernels: Fixed-length contiguous substring kernels
- Subsequence Kernels: Count all shared subsequences with gaps
- Edit Distance Approximations: Approximate edit distance kernels
- Mismatch Kernels: Allow for mismatches in n-gram comparisons
- Weighted Subsequence Kernels: Weight subsequences by length and gaps
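As a rough illustration of the subsequence idea in the list above, the following sketch counts every pair of matching (possibly gapped) subsequences with a standard dynamic program. The function name `all_subsequences_kernel` is hypothetical and is not part of this module's API. The sketch is exact and unweighted, and it counts the empty subsequence once, which the module's own kernels may not do.

```rust
/// Hypothetical helper (not this crate's API): exact all-subsequences kernel.
/// Returns the number of index-tuple pairs selecting the same subsequence
/// from `s` and from `t`; the empty subsequence contributes 1.
fn all_subsequences_kernel(s: &str, t: &str) -> u64 {
    let s: Vec<char> = s.chars().collect();
    let t: Vec<char> = t.chars().collect();
    // dp[i][j] holds the kernel value for the prefixes s[..i] and t[..j].
    // Any prefix paired with an empty prefix shares only the empty subsequence.
    let mut dp = vec![vec![1u64; t.len() + 1]; s.len() + 1];
    for i in 1..=s.len() {
        for j in 1..=t.len() {
            // Inclusion-exclusion over matches that skip s[i-1] or skip t[j-1].
            dp[i][j] = dp[i - 1][j] + dp[i][j - 1] - dp[i - 1][j - 1];
            // If the last symbols agree, every shared subsequence of the
            // shorter prefixes can be extended by that symbol.
            if s[i - 1] == t[j - 1] {
                dp[i][j] += dp[i - 1][j - 1];
            }
        }
    }
    dp[s.len()][t.len()]
}
```

For `"ab"` against `"ab"` the shared subsequences are the empty one, `a`, `b`, and `ab`, so the sketch returns 4. The table costs O(|s|·|t|) time and memory, and the counts grow quickly with sequence length, so `u64` can overflow on long inputs.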
§Mathematical Background
String kernel between sequences s and t: K(s, t) = Σ_u φ(s)[u] * φ(t)[u]
Here φ is the feature map, and φ(s)[u] counts the occurrences of substring u in s. The sum runs over every substring u in the feature space.
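The formula above can be made concrete with a minimal sketch that takes the feature space to be all contiguous character n-grams of a fixed length, which is the spectrum-kernel case. The names `ngram_counts` and `spectrum_kernel` are hypothetical and are not this module's API. The sketch computes the kernel exactly, without any approximation.

```rust
use std::collections::HashMap;

/// Hypothetical feature map φ(s): counts of each contiguous n-gram in `s`.
fn ngram_counts(s: &str, n: usize) -> HashMap<String, u64> {
    let chars: Vec<char> = s.chars().collect();
    let mut counts = HashMap::new();
    // `windows(0)` panics, and a sequence shorter than n has no n-grams.
    if n == 0 || chars.len() < n {
        return counts;
    }
    for window in chars.windows(n) {
        *counts.entry(window.iter().collect::<String>()).or_insert(0) += 1;
    }
    counts
}

/// K(s, t) = Σ_u φ(s)[u] * φ(t)[u], summed over the n-grams u present in `s`.
/// N-grams absent from `t` contribute zero.
fn spectrum_kernel(s: &str, t: &str, n: usize) -> u64 {
    let phi_s = ngram_counts(s, n);
    let phi_t = ngram_counts(t, n);
    phi_s
        .iter()
        .map(|(u, &count)| count * phi_t.get(u).copied().unwrap_or(0))
        .sum()
}
```

With n = 2, `"abab"` has the counts {ab: 2, ba: 1} and `"ab"` has {ab: 1}, so the kernel value is 2 * 1 = 2. Working on `char` values rather than bytes keeps multi-byte UTF-8 characters intact.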
§References
- Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis
- Lodhi, H., et al. (2002). Text classification using string kernels
Structs§
- EditDistanceKernel: Edit distance approximation kernel
- FittedEditDistanceKernel: Fitted edit distance kernel
- FittedMismatchKernel: Fitted mismatch kernel
- FittedNGramKernel: Fitted n-gram kernel
- FittedSpectrumKernel: Fitted spectrum kernel
- FittedSubsequenceKernel: Fitted subsequence kernel (computes full kernel matrix)
- MismatchKernel: Mismatch kernel that allows k mismatches in n-grams
- NGramKernel: N-gram kernel for sequences
- SpectrumKernel: Spectrum kernel for fixed-length contiguous substrings
- SubsequenceKernel: Subsequence kernel that counts all shared subsequences (with gaps)
Enums§
- NGramMode: N-gram extraction mode