Module string_kernels

String Kernel Approximations

This module implements various string kernel approximation methods for sequence and text analysis. String kernels measure similarity between sequences of symbols (characters, words, etc.) by counting shared subsequences or n-grams.

§Key Features

N-gram Kernels: Count shared n-grams between sequences
Spectrum Kernels: Fixed-length contiguous substring kernels
Subsequence Kernels: Count all shared subsequences with gaps
Edit Distance Approximations: Approximate edit distance kernels
Mismatch Kernels: Allow for mismatches in n-gram comparisons
Weighted Subsequence Kernels: Weight subsequences by length and gaps
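
To illustrate what "allowing mismatches in n-gram comparisons" can mean, here is a minimal sketch that counts pairs of n-grams, one from each sequence, that differ in at most k positions. It is a simplified pairwise variant written for illustration, not the module's implementation and not the standard mismatch feature map. The names `hamming` and `mismatch_similarity` are hypothetical.

```rust
/// Number of positions at which two equal-length n-grams differ.
/// (Hypothetical helper for illustration; not part of this module's API.)
fn hamming(a: &[char], b: &[char]) -> usize {
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}

/// Counts pairs of contiguous n-grams (one from `s`, one from `t`) whose
/// Hamming distance is at most `k`. This is an assumed, simplified
/// similarity, not the module's actual mismatch kernel.
fn mismatch_similarity(s: &str, t: &str, n: usize, k: usize) -> usize {
    let s: Vec<char> = s.chars().collect();
    let t: Vec<char> = t.chars().collect();
    // `windows(0)` panics, and sequences shorter than n have no n-grams.
    if n == 0 || s.len() < n || t.len() < n {
        return 0;
    }
    let mut total = 0;
    for a in s.windows(n) {
        for b in t.windows(n) {
            if hamming(a, b) <= k {
                total += 1;
            }
        }
    }
    total
}
```

With k = 0 this reduces to counting pairs of identical n-grams, which is the same value a spectrum kernel of length n would give. The quadratic loop over n-gram pairs keeps the sketch short; it is not meant to be efficient.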

§Mathematical Background

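The string kernel between sequences s and t is the inner product of their feature vectors: K(s, t) = Σ_u φ(s)[u] * φ(t)[u]

Here the sum runs over all substrings u, and φ(s)[u] is the entry of the feature map that counts the occurrences of u in s.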
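
The formula can be made concrete with a minimal sketch, assuming the feature map counts contiguous n-grams of a fixed length n (the spectrum-kernel case). The function names `ngram_counts` and `spectrum_kernel` are hypothetical and chosen for illustration; they are not this module's API.

```rust
use std::collections::HashMap;

/// Feature map φ(s): for each contiguous substring u of length `n`
/// (measured in chars), the number of times u occurs in `s`.
/// (Hypothetical helper for illustration.)
fn ngram_counts(s: &str, n: usize) -> HashMap<String, u64> {
    let chars: Vec<char> = s.chars().collect();
    let mut counts = HashMap::new();
    // `windows(0)` panics, and a sequence shorter than n has no n-grams.
    if n == 0 || chars.len() < n {
        return counts;
    }
    for window in chars.windows(n) {
        let gram: String = window.iter().collect();
        *counts.entry(gram).or_insert(0) += 1;
    }
    counts
}

/// K(s, t) = Σ_u φ(s)[u] * φ(t)[u], restricted to n-grams u of length `n`.
/// Only n-grams present in `s` can contribute, so iterating over φ(s) suffices.
fn spectrum_kernel(s: &str, t: &str, n: usize) -> u64 {
    let phi_s = ngram_counts(s, n);
    let phi_t = ngram_counts(t, n);
    phi_s
        .iter()
        .map(|(u, &count_s)| count_s * phi_t.get(u).copied().unwrap_or(0))
        .sum()
}
```

For example, with n = 3 the strings "banana" and "ananas" share the trigrams "ana" (twice in each) and "nan" (once in each), so the kernel value is 2 * 2 + 1 * 1 = 5.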

§References

Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis
Lodhi, H., et al. (2002). Text classification using string kernels

Structs§

EditDistanceKernel: Edit distance approximation kernel
FittedEditDistanceKernel: Fitted edit distance kernel
FittedMismatchKernel: Fitted mismatch kernel
FittedNGramKernel: Fitted n-gram kernel
FittedSpectrumKernel: Fitted spectrum kernel
FittedSubsequenceKernel: Fitted subsequence kernel (computes full kernel matrix)
MismatchKernel: Mismatch kernel that allows k mismatches in n-grams
NGramKernel: N-gram kernel for sequences
SpectrumKernel: Spectrum kernel for fixed-length contiguous substrings
SubsequenceKernel: Subsequence kernel that counts all shared subsequences (with gaps)

Enums§

NGramMode: N-gram extraction mode
