srt_subtitles_parser 0.1.1

A Rust-based parser that processes .srt (SubRip Subtitle) files and converts them into a structured data format

Srt Subtitles Parser

Brief Description

Srt Subtitles Parser is a Rust-based parser that processes .srt (SubRip Subtitle) files. The parser reads .srt files, validates their structure, and extracts subtitle entries, each consisting of an index number, a start timestamp, an end timestamp, and one or more lines of subtitle text. The parser converts the file into a structured data format, which can be used for:

  • Converting subtitles to other formats such as WebVTT, JSON, or CSV
  • Performing time-based analysis (total duration, reading speed, gap detection)
  • Validating subtitle file consistency (sequential numbering, non-overlapping timestamps)
  • Filtering, searching, or manipulating subtitle text
  • Synchronizing subtitles by shifting timecodes

Parsing Process

What is Being Parsed

The parser processes SRT subtitle files with the following structure:

1
00:00:01,000 --> 00:00:04,000
First line

2
00:00:05,500 --> 00:00:08,000
Second line

Each subtitle entry consists of:

  • Index: sequential number identifying the subtitle
  • Timecode: start and end times in the format HH:MM:SS,mmm --> HH:MM:SS,mmm
    • hours: 00-99
    • minutes: 00-59
    • seconds: 00-59
    • milliseconds: 000-999
  • Text: one or more lines of text content
  • Separator: empty line between entries
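
The timecode format above can be checked and converted to milliseconds with a few lines of standard-library Rust. This is only a minimal sketch: the names parse_timestamp, parse_timecode, and all_digits are hypothetical helpers chosen for illustration and are not taken from the crate's API.

```rust
/// Hypothetical helper: true if `t` is exactly `n` ASCII digits.
fn all_digits(t: &str, n: usize) -> bool {
    t.len() == n && t.bytes().all(|b| b.is_ascii_digit())
}

/// Hypothetical helper: converts `HH:MM:SS,mmm` into total milliseconds.
/// Returns `None` if the shape or a field range is invalid.
fn parse_timestamp(s: &str) -> Option<u64> {
    // Split "HH:MM:SS,mmm" into the clock part and the millisecond part.
    let (clock, millis) = s.trim().split_once(',')?;
    let mut fields = clock.split(':');
    let (h, m, sec) = (fields.next()?, fields.next()?, fields.next()?);
    if fields.next().is_some() {
        return None; // more than three colon-separated fields
    }
    // Enforce the fixed widths: two digits each, three for milliseconds.
    if !(all_digits(h, 2) && all_digits(m, 2) && all_digits(sec, 2) && all_digits(millis, 3)) {
        return None;
    }
    let h: u64 = h.parse().ok()?;
    let m: u64 = m.parse().ok()?;
    let sec: u64 = sec.parse().ok()?;
    let millis: u64 = millis.parse().ok()?;
    // Hours (00-99) and milliseconds (000-999) are already bounded by width.
    if m > 59 || sec > 59 {
        return None;
    }
    Some(((h * 60 + m) * 60 + sec) * 1000 + millis)
}

/// Hypothetical helper: parses a full timecode line into (start, end) milliseconds.
fn parse_timecode(line: &str) -> Option<(u64, u64)> {
    let (start, end) = line.split_once("-->")?;
    Some((parse_timestamp(start)?, parse_timestamp(end)?))
}
```

Storing both times as a single millisecond count is one possible design choice; it makes later comparisons and offsets plain integer arithmetic.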

Parsing Steps

The parsing process includes:

  1. Reading: loading the .srt text input
  2. Tokenization: splitting the input into subtitle blocks using Pest grammar rules
  3. Extracting: parsing each block to obtain the index, the start and end timestamps, and the text content
  4. Validating: checking timestamps (format, valid time ranges, non-overlapping) and block order
  5. Transforming: converting the extracted data into a structured format (Subtitle structs)
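
The five steps above can be sketched end to end as follows. The crate performs the splitting with Pest grammar rules; to stay dependency-free, this sketch substitutes plain standard-library string splitting, so it illustrates the flow and not the crate's actual implementation. The Subtitle field names and the functions parse_srt and to_ms are assumptions made for the example.

```rust
/// Assumed shape of a parsed entry; the crate's real struct may differ.
#[derive(Debug, Clone, PartialEq)]
struct Subtitle {
    index: u32,
    start_ms: u64,
    end_ms: u64,
    text: Vec<String>,
}

/// Hypothetical helper: `HH:MM:SS,mmm` to milliseconds (less strict than a grammar).
fn to_ms(s: &str) -> Option<u64> {
    let (clock, ms) = s.trim().split_once(',')?;
    let parts: Vec<&str> = clock.split(':').collect();
    if parts.len() != 3 || ms.len() != 3 {
        return None;
    }
    let h: u64 = parts[0].parse().ok()?;
    let m: u64 = parts[1].parse().ok()?;
    let sec: u64 = parts[2].parse().ok()?;
    let ms: u64 = ms.parse().ok()?;
    if m > 59 || sec > 59 {
        return None;
    }
    Some(((h * 60 + m) * 60 + sec) * 1000 + ms)
}

/// Hypothetical entry point: parses and validates SRT text.
fn parse_srt(input: &str) -> Result<Vec<Subtitle>, String> {
    // Step 1 (reading): normalize line endings so blank lines are plain "\n\n".
    let normalized = input.replace("\r\n", "\n");
    let mut subtitles: Vec<Subtitle> = Vec::new();

    // Step 2 (splitting): a blank line separates entries.
    for block in normalized.split("\n\n").map(str::trim).filter(|b| !b.is_empty()) {
        // Step 3 (extracting): index line, timecode line, then text lines.
        let mut lines = block.lines();
        let index_line = lines.next().unwrap_or("");
        let index: u32 = index_line
            .trim()
            .parse()
            .map_err(|_| format!("invalid index: {:?}", index_line))?;
        let timecode = lines
            .next()
            .ok_or_else(|| format!("entry {}: missing timecode", index))?;
        let (start, end) = timecode
            .split_once("-->")
            .ok_or_else(|| format!("entry {}: missing '-->'", index))?;
        let start_ms = to_ms(start).ok_or_else(|| format!("entry {}: bad start time", index))?;
        let end_ms = to_ms(end).ok_or_else(|| format!("entry {}: bad end time", index))?;
        let text: Vec<String> = lines.map(str::to_string).collect();

        // Step 4 (validating): text present, time range, order, and overlap.
        if text.is_empty() {
            return Err(format!("entry {}: no text", index));
        }
        if start_ms >= end_ms {
            return Err(format!("entry {}: start is not before end", index));
        }
        if let Some(prev) = subtitles.last() {
            if index != prev.index + 1 {
                return Err(format!("entry {}: index is not sequential", index));
            }
            if start_ms < prev.end_ms {
                return Err(format!("entry {}: overlaps previous entry", index));
            }
        }

        // Step 5 (transforming): store the entry as a struct.
        subtitles.push(Subtitle { index, start_ms, end_ms, text });
    }
    Ok(subtitles)
}
```

Calling parse_srt on the two-entry sample shown earlier would yield two Subtitle values, while an entry whose end time precedes its start time, or whose timestamps overlap the previous entry, is rejected with an error message.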

How Results Are Used

The structured subtitle data enables:

  • Format Conversion: export to JSON, WebVTT, or CSV formats
  • Text Analysis: extract text for translation or word count
  • Quality Control: detect timing errors, missing indices, or overlapping subtitles
  • Statistics: calculate total duration, average subtitle length, reading speed
  • Timecode Manipulation: shift all timestamps by a fixed offset
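
Three of the uses listed above (timecode shifting, a duration statistic, and WebVTT export) can be illustrated on already-parsed data. This is a rough sketch under assumptions: the Subtitle struct and the functions shift, total_display_ms, and to_webvtt are hypothetical and do not reflect the crate's API. The only format fact relied on is that WebVTT files begin with a WEBVTT line and write cue times with a dot before the milliseconds.

```rust
/// Assumed shape of a parsed entry; the crate's real struct may differ.
#[allow(dead_code)] // `index` is kept for completeness but not read here
struct Subtitle {
    index: u32,
    start_ms: u64,
    end_ms: u64,
    text: Vec<String>,
}

/// Applies a signed offset to one time value, clamping at zero.
fn shift_ms(t: u64, offset_ms: i64) -> u64 {
    if offset_ms >= 0 {
        t.saturating_add(offset_ms as u64)
    } else {
        t.saturating_sub(offset_ms.unsigned_abs())
    }
}

/// Hypothetical helper: shifts every timestamp by a fixed offset in milliseconds.
fn shift(subs: &mut [Subtitle], offset_ms: i64) {
    for s in subs.iter_mut() {
        s.start_ms = shift_ms(s.start_ms, offset_ms);
        s.end_ms = shift_ms(s.end_ms, offset_ms);
    }
}

/// Hypothetical helper: total time during which any subtitle is displayed,
/// assuming the entries do not overlap.
fn total_display_ms(subs: &[Subtitle]) -> u64 {
    subs.iter().map(|s| s.end_ms - s.start_ms).sum()
}

/// Formats milliseconds as a WebVTT timestamp (`HH:MM:SS.mmm`).
fn format_vtt_time(ms: u64) -> String {
    let (h, m, s, ms) = (ms / 3_600_000, ms / 60_000 % 60, ms / 1000 % 60, ms % 1000);
    format!("{:02}:{:02}:{:02}.{:03}", h, m, s, ms)
}

/// Hypothetical helper: renders the entries as WebVTT text.
fn to_webvtt(subs: &[Subtitle]) -> String {
    let mut out = String::from("WEBVTT\n");
    for s in subs {
        out.push('\n'); // blank line before each cue
        out.push_str(&format!(
            "{} --> {}\n",
            format_vtt_time(s.start_ms),
            format_vtt_time(s.end_ms)
        ));
        for line in &s.text {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

Clamping negative results at zero is one possible policy for shifts that would move a subtitle before the start of the video; rejecting such a shift with an error would be an equally reasonable choice.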