§text-transcripts
Transcript parsing, ASR command adapters, and native whisper.cpp support for moritzbrantner-video-analysis.
§Feature flags
- external-tests: enables ignored CLI-backed smoke tests.
- native: builds whisper.cpp support for offline transcription. Repository builds use vendor/whisper.cpp; crates.io builds must set WHISPER_CPP_SOURCE_DIR to a local whisper.cpp source checkout.
§Stable contract
The stable surface is transcript contracts, segment/word normalization,
SRT/WebVTT/plain/Whisper JSON parsing, formatting, conversion to
TextSegmentContract, and transcript-specific text pipeline analyzers.
§Quality and limits
Default package operations parse and format text only. ASR command adapters and native whisper.cpp transcription remain explicit runtime paths and are not invoked by default package-surface operations.
§Example
use text_transcripts::{parse_whisper_json, TranscriptionContract};
let parsed = parse_whisper_json(include_bytes!("../../../../tests/fixtures/whisper-sample.json"))?;
let transcript = TranscriptionContract::from(parsed).normalized()?;
assert!(!transcript.text_or_joined().is_empty());
§Package surface
- Primary workflow: transcripts.parse parses plain text, Whisper JSON, SRT, or WebVTT into the normalized transcript contract.
- Workflow operations: transcripts.parse, transcripts.normalize, transcripts.formatSrt, transcripts.formatWebVtt, and transcripts.toTextSegments.
- Debug operations: describe inspects package metadata and operation support.
- Runtime support: pure Rust parsing/formatting package-surface operations are available through library, CLI, server, and WASM wrappers.
- Sample output includes title, message, summary, result, and operation-specific fields such as segments, text, srt, or webVtt.
- Package-surface operations do not invoke whisper.cpp or external ASR tools; native transcription remains feature-gated.
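To make the parse step concrete, here is a minimal, self-contained sketch of turning SRT text into timed cues. It is not this crate's parse_srt implementation: CueSketch, parse_srt_timestamp_sketch, and parse_srt_sketch are hypothetical names introduced only for illustration, and the sketch assumes well-formed `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines with blank lines between cues.

```rust
/// Hypothetical cue type for illustration; not the crate's transcript contract.
#[derive(Debug, PartialEq)]
struct CueSketch {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Parses an `HH:MM:SS,mmm` timestamp into milliseconds.
/// Returns `None` when the shape does not match.
fn parse_srt_timestamp_sketch(s: &str) -> Option<u64> {
    let (hms, ms) = s.trim().split_once(',')?;
    let mut parts = hms.split(':');
    let hours: u64 = parts.next()?.parse().ok()?;
    let mins: u64 = parts.next()?.parse().ok()?;
    let secs: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three colon-separated fields
    }
    let ms: u64 = ms.parse().ok()?;
    Some(((hours * 60 + mins) * 60 + secs) * 1_000 + ms)
}

/// Splits SRT text into blank-line-separated blocks and keeps the blocks
/// that contain a valid timing line. Malformed blocks are skipped.
fn parse_srt_sketch(input: &str) -> Vec<CueSketch> {
    let normalized = input.replace("\r\n", "\n");
    normalized
        .split("\n\n")
        .filter_map(|block| {
            let mut lines = block.lines().filter(|l| !l.trim().is_empty());
            let first = lines.next()?;
            // The numeric counter line is treated as optional in this sketch.
            let timing = if first.contains("-->") { first } else { lines.next()? };
            let (start, end) = timing.split_once("-->")?;
            let start_ms = parse_srt_timestamp_sketch(start)?;
            let end_ms = parse_srt_timestamp_sketch(end)?;
            // Remaining lines form the cue text.
            let text = lines.collect::<Vec<_>>().join("\n");
            Some(CueSketch { start_ms, end_ms, text })
        })
        .collect()
}
```

The sketch drops malformed blocks silently to stay short; a real parser would more likely report them as errors.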
§Native whisper.cpp
The transcript parsers are loadable in default builds. whisper.cpp catalog and
model-store validation is available behind native; transcription only runs
when the requested model file is present or an opt-in setup flow downloads it.
cargo test -p text-transcripts --features native,external-tests -- --ignored
Browser benchmarks cover parse, normalize, and SRT formatting workflows through
bun run text-wasm:bench:all.
§Related crates
- text-core
- video-analysis-ingest
- video-analysis-use-cases
Re-exports§
pub use contracts::text_segment_contract_with_source;
pub use contracts::TranscriptCharContract;
pub use contracts::TranscriptSegmentContract;
pub use contracts::TranscriptWordContract;
pub use contracts::TranscriptionContract;
Modules§
Structs§
- CommandTranscriber - Data type for command transcriber.
- CommandTranscriberOptions - Options for command transcriber construction.
- NativeWhisperCppTranscriber - Re-exports the text transcript native whisper.cpp API. Data type for whisper cpp transcriber.
- SubtitleNormalizationOptions - Options for subtitle text normalization.
- TranscriptHeuristicAnalyzer - Transcript-specific deterministic analyzer.
- TranscriptSegment - Data type for transcript segment.
- TranscriptSegmentSource - Data type for transcript segment source.
- TranscriptWord - Optional word-level transcript timing.
- TranscriptionResult - Data type for transcription result.
- WhisperCliTranscriber - Data type for whisper cli transcriber.
- WhisperCliTranscriberOptions - Options for whisper CLI transcriber construction.
- WhisperCppCatalog - Re-exports the text transcript native whisper.cpp API. Data type for whisper cpp catalog.
- WhisperCppConfig - Re-exports the text transcript native whisper.cpp API. Data type for whisper cpp config.
- WhisperCppModelStatus - Re-exports the text transcript native whisper.cpp API. Data type for whisper cpp model status.
- WhisperCppModelStore - Re-exports the text transcript native whisper.cpp API. Data type for model store.
- WhisperCppProgressEvent - Re-exports the text transcript native whisper.cpp API. Data type for whisper cpp progress event.
- WhisperCppSegment - Re-exports the text transcript native whisper.cpp API. Data type for whisper cpp segment.
- WhisperCppTranscriber - Data type for whisper cpp transcriber.
Enums§
- TranscriptFormat - Variants describing transcript format.
- TranscriptionError - Variants describing transcription error.
- WhisperCppError - Re-exports the text transcript native whisper.cpp API. Variants describing whisper cpp error.
- WhisperCppModel - Re-exports the text transcript native whisper.cpp API. Variants describing whisper cpp model.
- WhisperCppPhase - Re-exports the text transcript native whisper.cpp API. Variants describing whisper cpp phase.
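As an illustration of how a format enum can drive extension-based detection (the behaviour parse_transcript_file is described as providing), the following sketch maps a file extension to a format. FormatSketch and infer_format_sketch are hypothetical, and the extension-to-format mapping is an assumption; the variants of the crate's TranscriptFormat and its actual lookup rules are not shown on this page.

```rust
use std::path::Path;

/// Hypothetical stand-in for a transcript format enum; the real
/// `TranscriptFormat` variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FormatSketch {
    Srt,
    WebVtt,
    WhisperJson,
    Plain,
}

/// Infers a format from the file extension, ignoring ASCII case.
/// The mapping below is an assumption made for this example.
fn infer_format_sketch(path: &Path) -> Option<FormatSketch> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "srt" => Some(FormatSketch::Srt),
        "vtt" => Some(FormatSketch::WebVtt),
        "json" => Some(FormatSketch::WhisperJson),
        "txt" => Some(FormatSketch::Plain),
        _ => None, // unknown extension: the caller must pick a format explicitly
    }
}
```

Returning `None` for unknown extensions keeps the guess explicit, so a caller can fall back to a user-supplied format instead of silently misparsing.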
Traits§
- Transcriber - Trait for transcriber implementations.
Functions§
- format_srt - Returns SRT-formatted output.
- format_srt_timestamp - Returns an SRT-formatted timestamp.
- format_webvtt - Returns WebVTT-formatted output.
- normalize_imported_segments - Builds and normalizes a transcription contract from imported transcript segments.
- normalize_subtitle_text - Normalizes subtitle cue text without parsing timing blocks.
- normalize_transcription_contract - Normalizes an existing transcription contract.
- parse_normalized_transcript_file - Parses and normalizes a transcript file into the stable transcript contract.
- parse_plain_lines - Parses plain lines.
- parse_srt - Parses SRT.
- parse_transcript_file - Parses a transcript file by inferring the format from its extension.
- parse_webvtt - Parses WebVTT.
- parse_whisper_json - Parses Whisper JSON.
- parse_whisperx_json - Parses WhisperX JSON into the shared transcription contract.
- segment_to_owned_text_segment - Converts a segment to an owned text segment.
- transcribe_waveform_batch - Transcribes a waveform batch.
- whisper_cpp_catalog - Re-exports the text transcript native whisper.cpp API. Returns transcription catalog.
- whisper_cpp_system_info - Re-exports the text transcript native whisper.cpp API. Returns whisper cpp system info.
- write_srt - Writes SRT.
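For readers unfamiliar with the SRT timestamp layout, the sketch below shows the `HH:MM:SS,mmm` form and how one numbered cue is rendered. These helpers are hypothetical and are not the signatures of format_srt_timestamp or format_srt; in particular, taking the offset as whole milliseconds in a `u64` is an assumption made for the example.

```rust
/// Formats a millisecond offset as an SRT timestamp (`HH:MM:SS,mmm`).
/// Hypothetical helper; the crate's own function may take a different input type.
fn srt_timestamp_sketch(total_ms: u64) -> String {
    let ms = total_ms % 1_000;
    let total_secs = total_ms / 1_000;
    let secs = total_secs % 60;
    let mins = (total_secs / 60) % 60;
    let hours = total_secs / 3_600;
    // SRT uses a comma before the milliseconds; WebVTT uses a period.
    format!("{hours:02}:{mins:02}:{secs:02},{ms:03}")
}

/// Renders one numbered SRT cue: counter line, timing line, then text.
fn srt_cue_sketch(index: usize, start_ms: u64, end_ms: u64, text: &str) -> String {
    format!(
        "{index}\n{} --> {}\n{text}\n",
        srt_timestamp_sketch(start_ms),
        srt_timestamp_sketch(end_ms)
    )
}
```

For example, an offset of 3,723,004 ms renders as `01:02:03,004`. Joining cues with a blank line between them yields a complete SRT document.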
Type Aliases§
- Result - Type alias for result.