Crate text_transcripts

§text-transcripts

Transcript parsing, ASR command adapters, and native whisper.cpp support for moenarch-video-analysis.

§Feature flags

  • external-tests: enables ignored CLI-backed smoke tests
  • native: builds whisper.cpp support for offline transcription. By default, builds use the vendored whisper.cpp source in vendor/whisper.cpp; set WHISPER_CPP_SOURCE_DIR to build against another local whisper.cpp checkout instead.

§Stable contract

The stable surface is transcript contracts, segment/word normalization, SRT/WebVTT/plain/Whisper JSON parsing, formatting, conversion to TextSegmentContract, and transcript-specific text pipeline analyzers.
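To make "segment normalization" concrete, here is a minimal, self-contained sketch. The Segment type and the normalize_segments and joined_text helpers are hypothetical stand-ins, not the crate's TranscriptSegmentContract or its actual rules; the sketch only assumes that normalization collapses whitespace, repairs inverted time ranges, drops empty segments, and orders segments by start time.

```rust
/// Hypothetical stand-in for a transcript segment (not the crate's contract type).
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Sketch of segment normalization under the assumptions stated above.
pub fn normalize_segments(mut segments: Vec<Segment>) -> Vec<Segment> {
    for seg in &mut segments {
        // Collapse runs of whitespace (including newlines) into single spaces.
        seg.text = seg.text.split_whitespace().collect::<Vec<_>>().join(" ");
        // Repair inverted ranges by clamping the end to the start.
        if seg.end_ms < seg.start_ms {
            seg.end_ms = seg.start_ms;
        }
    }
    // Drop segments whose text is empty after cleanup.
    segments.retain(|seg| !seg.text.is_empty());
    // Order by start time, breaking ties by end time.
    segments.sort_by_key(|seg| (seg.start_ms, seg.end_ms));
    segments
}

/// Joins segment texts with single spaces.
pub fn joined_text(segments: &[Segment]) -> String {
    segments.iter().map(|s| s.text.as_str()).collect::<Vec<_>>().join(" ")
}
```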

§Quality and limits

Default package-surface operations only parse and format text. ASR command adapters and native whisper.cpp transcription are explicit runtime paths: they run only when called directly and are never invoked by package-surface operations.

§Example

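use text_transcripts::{parse_whisper_json, TranscriptionContract};

// Parse a Whisper JSON fixture embedded at compile time.
let parsed = parse_whisper_json(include_bytes!("../../../../tests/fixtures/whisper-sample.json"))?;
// Convert the parsed output into the stable transcript contract, then normalize it.
let transcript = TranscriptionContract::from(parsed).normalized()?;

assert!(!transcript.text_or_joined().is_empty());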

§Package surface

  • Primary workflow: transcripts.parse parses plain text, Whisper JSON, SRT, or WebVTT into the normalized transcript contract.
  • Workflow operations: transcripts.parse, transcripts.normalize, transcripts.formatSrt, transcripts.formatWebVtt, and transcripts.toTextSegments.
  • Debug operations: describe inspects package metadata and operation support.
  • Runtime support: pure Rust parsing/formatting package-surface operations are available through library, CLI, server, and WASM wrappers.
  • Sample output includes title, message, summary, result, and operation-specific fields such as segments, text, srt, or webVtt.
  • Package-surface operations do not invoke whisper.cpp or external ASR tools; native transcription remains feature-gated.
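As an illustration of what the SRT branch of transcripts.parse has to handle, here is a minimal SRT parser sketch. Cue, parse_timestamp, and parse_srt_sketch are hypothetical names, not the crate's parse_srt; the sketch assumes blank-line-separated blocks, an optional numeric index line, and HH:MM:SS,mmm timestamps, and it silently skips malformed blocks rather than returning an error.

```rust
/// Hypothetical cue type for this sketch; field names are illustrative.
#[derive(Debug, PartialEq)]
pub struct Cue {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Parses an SRT timestamp such as `01:02:03,004` into milliseconds.
fn parse_timestamp(ts: &str) -> Option<u64> {
    let (hms, millis) = ts.trim().split_once(',')?;
    let mut parts = hms.split(':');
    let h: u64 = parts.next()?.parse().ok()?;
    let m: u64 = parts.next()?.parse().ok()?;
    let s: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three fields: not an SRT timestamp
    }
    let ms: u64 = millis.parse().ok()?;
    Some(((h * 60 + m) * 60 + s) * 1_000 + ms)
}

/// Splits SRT input into blank-line-separated blocks and parses each one.
pub fn parse_srt_sketch(input: &str) -> Vec<Cue> {
    let normalized = input.replace("\r\n", "\n");
    let mut cues = Vec::new();
    for block in normalized.split("\n\n") {
        let mut lines = block.lines().filter(|line| !line.trim().is_empty());
        let Some(first) = lines.next() else { continue };
        // The numeric index line is optional; the timing line contains "-->".
        let timing = if first.contains("-->") {
            first
        } else {
            match lines.next() {
                Some(line) => line,
                None => continue,
            }
        };
        let Some((start, end)) = timing.split_once("-->") else { continue };
        let (Some(start_ms), Some(end_ms)) = (parse_timestamp(start), parse_timestamp(end)) else {
            continue;
        };
        // Remaining lines form the cue text, joined with newlines.
        let text = lines.map(str::trim).collect::<Vec<_>>().join("\n");
        cues.push(Cue { start_ms, end_ms, text });
    }
    cues
}
```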

§Native whisper.cpp

The transcript parsers are available in default builds. The whisper.cpp catalog and model-store validation require the native feature, and transcription runs only when the requested model file is already present or an opt-in setup flow has downloaded it.
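The gating rule above (transcribe only if the model file exists, or if the caller opted into a download) can be sketched as a small planning function. plan_model, ModelPlan, and ModelGateError are hypothetical names; the crate's WhisperCppModelStore and its validation logic are not reproduced here.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical outcome of checking for a model file.
#[derive(Debug, PartialEq)]
pub enum ModelPlan {
    /// The model is on disk; transcription can start immediately.
    Ready(PathBuf),
    /// The model is missing, but the caller opted into a download step first.
    DownloadFirst(PathBuf),
}

/// Hypothetical error for this sketch (not the crate's WhisperCppError).
#[derive(Debug, PartialEq)]
pub enum ModelGateError {
    /// The model is missing and no download was requested.
    Missing(PathBuf),
}

/// Decides whether transcription may proceed for `file_name` inside `dir`.
pub fn plan_model(dir: &Path, file_name: &str, allow_download: bool) -> Result<ModelPlan, ModelGateError> {
    let path = dir.join(file_name);
    if path.is_file() {
        Ok(ModelPlan::Ready(path))
    } else if allow_download {
        Ok(ModelPlan::DownloadFirst(path))
    } else {
        Err(ModelGateError::Missing(path))
    }
}
```

Returning a plan instead of downloading inside the check keeps network access an explicit, caller-driven step.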

Run the ignored CLI-backed smoke tests, including the native whisper.cpp path, with:

cargo test -p text-transcripts --features native,external-tests -- --ignored

Browser benchmarks for the parse, normalize, and SRT formatting workflows run with bun run text-wasm:bench:all.

  • text-core
  • video-analysis-ingest
  • video-analysis-use-cases

Re-exports§

pub use contracts::text_segment_contract_with_source;
pub use contracts::TranscriptCharContract;
pub use contracts::TranscriptSegmentContract;
pub use contracts::TranscriptWordContract;
pub use contracts::TranscriptionContract;

Modules§

contracts
surface
Library-owned runtime surface for text-transcripts.

Structs§

CommandTranscriber
Transcriber that runs an external ASR command.
CommandTranscriberOptions
Options for command transcriber construction.
NativeWhisperCppTranscriber
Native whisper.cpp transcriber, re-exported from the native whisper.cpp API.
SubtitleNormalizationOptions
Options for subtitle text normalization.
TranscriptHeuristicAnalyzer
Transcript-specific deterministic analyzer.
TranscriptSegment
A single transcript segment.
TranscriptSegmentSource
Source information for a transcript segment.
TranscriptWord
Optional word-level transcript timing.
TranscriptionResult
Result of a transcription.
WhisperCliTranscriber
Transcriber that runs the whisper CLI.
WhisperCliTranscriberOptions
Options for whisper CLI transcriber construction.
WhisperCppCatalog
Catalog of whisper.cpp models, re-exported from the native whisper.cpp API.
WhisperCppConfig
Configuration for whisper.cpp transcription, re-exported from the native whisper.cpp API.
WhisperCppModelStatus
Status of a whisper.cpp model, re-exported from the native whisper.cpp API.
WhisperCppModelStore
Local store for whisper.cpp model files, re-exported from the native whisper.cpp API.
WhisperCppProgressEvent
Progress event for whisper.cpp operations, re-exported from the native whisper.cpp API.
WhisperCppSegment
Segment produced by whisper.cpp transcription, re-exported from the native whisper.cpp API.
WhisperCppTranscriber
Transcriber backed by whisper.cpp.
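normalize_subtitle_text cleans cue text without parsing timing blocks. Here is a minimal sketch under stated assumptions: CueTextOptions and normalize_cue_text are hypothetical, and their two switches (tag stripping and whitespace collapsing) are guesses at the kind of settings SubtitleNormalizationOptions holds, not its actual fields. The tag stripper is deliberately naive and does not decode entities such as &amp;.

```rust
/// Hypothetical options; the real SubtitleNormalizationOptions fields may differ.
#[derive(Debug, Clone, Copy)]
pub struct CueTextOptions {
    /// Remove inline markup such as `<i>`, `</b>`, or `<c.yellow>`.
    pub strip_tags: bool,
    /// Collapse whitespace and line breaks into single spaces.
    pub collapse_whitespace: bool,
}

impl Default for CueTextOptions {
    fn default() -> Self {
        Self { strip_tags: true, collapse_whitespace: true }
    }
}

/// Normalizes one cue's text according to `options`.
pub fn normalize_cue_text(text: &str, options: CueTextOptions) -> String {
    let mut out = String::with_capacity(text.len());
    let mut in_tag = false;
    for ch in text.chars() {
        match ch {
            // Enter a tag; characters are skipped until the closing '>'.
            '<' if options.strip_tags => in_tag = true,
            '>' if options.strip_tags && in_tag => in_tag = false,
            _ if in_tag => {}
            _ => out.push(ch),
        }
    }
    if options.collapse_whitespace {
        out.split_whitespace().collect::<Vec<_>>().join(" ")
    } else {
        out.trim().to_string()
    }
}
```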

Enums§

TranscriptFormat
Supported transcript formats.
TranscriptionError
Errors returned by transcription operations.
WhisperCppError
Errors from the native whisper.cpp backend, re-exported from the native whisper.cpp API.
WhisperCppModel
Available whisper.cpp models, re-exported from the native whisper.cpp API.
WhisperCppPhase
Phases of a whisper.cpp operation, re-exported from the native whisper.cpp API.
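parse_transcript_file infers the format from the file extension. A hedged sketch of that mapping follows. The Format enum and infer_format are hypothetical; the real TranscriptFormat variants and extension table may differ, and this sketch maps .json to Whisper JSON even though WhisperX JSON shares that extension.

```rust
use std::path::Path;

/// Hypothetical mirror of the supported formats (not the crate's TranscriptFormat).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    PlainText,
    WhisperJson,
    Srt,
    WebVtt,
}

/// Maps a file extension (case-insensitively) to a format, if recognized.
pub fn infer_format(path: &Path) -> Option<Format> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "txt" => Some(Format::PlainText),
        "json" => Some(Format::WhisperJson),
        "srt" => Some(Format::Srt),
        "vtt" => Some(Format::WebVtt),
        _ => None,
    }
}
```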

Traits§

Transcriber
Trait for transcriber implementations.
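The Transcriber trait's signature is not shown on this page. The sketch below assumes a hypothetical shape (TranscriberSketch, taking mono f32 samples and a sample rate) to show how a fake backend lets tests avoid invoking any ASR tool, and how a batch helper similar in spirit to transcribe_waveform_batch can be written generically over any backend.

```rust
/// Hypothetical timed output segment for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct TimedText {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Hypothetical transcriber interface (not the crate's Transcriber trait).
pub trait TranscriberSketch {
    type Error;
    /// Transcribes mono PCM samples recorded at `sample_rate` Hz.
    fn transcribe(&self, samples: &[f32], sample_rate: u32) -> Result<Vec<TimedText>, Self::Error>;
}

/// Fake backend: emits one segment spanning the whole input.
pub struct FixedTextTranscriber {
    pub text: String,
}

impl TranscriberSketch for FixedTextTranscriber {
    type Error = String;

    fn transcribe(&self, samples: &[f32], sample_rate: u32) -> Result<Vec<TimedText>, String> {
        if sample_rate == 0 {
            return Err("sample rate must be non-zero".to_string());
        }
        // Duration in milliseconds, rounded down.
        let end_ms = samples.len() as u64 * 1_000 / u64::from(sample_rate);
        Ok(vec![TimedText { start_ms: 0, end_ms, text: self.text.clone() }])
    }
}

/// Transcribes each waveform in order, stopping at the first error.
pub fn transcribe_all<T: TranscriberSketch>(
    backend: &T,
    batch: &[Vec<f32>],
    sample_rate: u32,
) -> Result<Vec<Vec<TimedText>>, T::Error> {
    batch.iter().map(|w| backend.transcribe(w, sample_rate)).collect()
}
```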

Functions§

format_srt
Formats a transcript as SRT subtitle text.
format_srt_timestamp
Formats a timestamp in SRT form (HH:MM:SS,mmm).
format_webvtt
Formats a transcript as WebVTT subtitle text.
normalize_imported_segments
Builds and normalizes a transcription contract from imported transcript segments.
normalize_subtitle_text
Normalizes subtitle cue text without parsing timing blocks.
normalize_transcription_contract
Normalizes an existing transcription contract.
parse_normalized_transcript_file
Parses and normalizes a transcript file into the stable transcript contract.
parse_plain_lines
Parses plain text lines.
parse_srt
Parses SRT subtitles.
parse_transcript_file
Parses a transcript file by inferring the format from its extension.
parse_webvtt
Parses WebVTT subtitles.
parse_whisper_json
Parses Whisper JSON output.
parse_whisperx_json
Parses WhisperX JSON into the shared transcription contract.
segment_to_owned_text_segment
Converts a transcript segment into an owned text segment.
transcribe_waveform_batch
Transcribes a batch of waveforms.
whisper_cpp_catalog
Returns the whisper.cpp catalog, re-exported from the native whisper.cpp API.
whisper_cpp_system_info
Returns whisper.cpp system information, re-exported from the native whisper.cpp API.
write_srt
Writes SRT output.
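format_srt_timestamp, format_srt, and format_webvtt differ mainly in timestamp punctuation and framing: SRT uses a comma before the milliseconds and numbered cues, while WebVTT uses a period and begins with a WEBVTT header. The sketch below illustrates those differences with hypothetical helpers that operate on (start_ms, end_ms, text) tuples; it is not the crate's implementation.

```rust
/// Formats milliseconds as HH:MM:SS<sep>mmm; SRT uses ',' and WebVTT uses '.'.
pub fn format_timestamp(ms: u64, sep: char) -> String {
    let hours = ms / 3_600_000;
    let minutes = (ms / 60_000) % 60;
    let seconds = (ms / 1_000) % 60;
    let millis = ms % 1_000;
    format!("{hours:02}:{minutes:02}:{seconds:02}{sep}{millis:03}")
}

/// Renders cues as SRT: 1-based index, timing line, text, blank line.
pub fn format_srt_sketch(cues: &[(u64, u64, &str)]) -> String {
    let mut out = String::new();
    for (i, (start, end, text)) in cues.iter().enumerate() {
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            i + 1,
            format_timestamp(*start, ','),
            format_timestamp(*end, ','),
            text
        ));
    }
    out
}

/// Renders cues as WebVTT: WEBVTT header, '.' separators, no cue indices.
pub fn format_webvtt_sketch(cues: &[(u64, u64, &str)]) -> String {
    let mut out = String::from("WEBVTT\n\n");
    for (start, end, text) in cues {
        out.push_str(&format!(
            "{} --> {}\n{}\n\n",
            format_timestamp(*start, '.'),
            format_timestamp(*end, '.'),
            text
        ));
    }
    out
}
```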

Type Aliases§

Result
Crate-level Result type alias.