clockwords 0.3.1

Find and resolve natural-language time expressions across multiple languages

clockwords

Find and resolve natural-language time expressions in text.

clockwords scans free-form text for relative time expressions like "last Friday from 9 to eleven", "yesterday at 3pm", or "letzten Freitag von 9 bis 12 Uhr" and returns their byte-offset spans together with resolved DateTime<Utc> values. It supports English, German, French, and Spanish out of the box.

Built for real-time GUI applications (time-tracking, note-taking, calendars) where the user types naturally and the app highlights detected time references as they appear. It is timezone-aware: times the user enters are interpreted in a configurable timezone, which defaults to UTC.

Features

  • Four languages: English, German, French, Spanish
  • Timezone-aware: User input is interpreted in a configurable timezone (defaults to UTC for backward compatibility)
  • Byte-offset spans: Directly usable for text highlighting in any GUI framework
  • Resolved times: Every match resolves to a concrete DateTime<Utc> point or range
  • Incremental typing support: Detects partial matches (e.g. "yester" while the user is still typing "yesterday")
  • Accent-tolerant: Handles días/dias, à/a, mañana/manana, dernière/derniere
  • Fast rejection: Aho-Corasick keyword prefilter skips text with no time-related words in sub-microsecond time
  • Zero allocations on rejection: If no keywords are found, scan() returns immediately
  • No unsafe code
  • Defensive: All internal date arithmetic returns Option — no panics from edge-case dates

Quick Start

Add to your Cargo.toml:

[dependencies]
clockwords = "0.3"

Basic Usage

use clockwords::{default_scanner, ResolvedTime};
use chrono::Utc;

fn main() {
    // Create a scanner with all four languages enabled
    let scanner = default_scanner();
    let now = Utc::now();

    let text = "The last hour I coded the initial code for the time library";
    let matches = scanner.scan(text, now);

    for m in &matches {
        println!(
            "Found '{}' at bytes {}..{} ({:?})",
            &text[m.span.as_range()],
            m.span.start,
            m.span.end,
            m.kind,
        );

        match &m.resolved {
            ResolvedTime::Point(dt) => println!("  Resolved to: {dt}"),
            ResolvedTime::Range { start, end } => {
                println!("  Resolved to: {start} .. {end}")
            }
        }
    }
}

Example output (the timestamps depend on the time of the call):

Found 'The last hour' at bytes 0..13 (TimeRange)
  Resolved to: 2026-02-08T12:30:00Z .. 2026-02-08T13:30:00Z

Select Specific Languages

use clockwords::scanner_for_languages;

// Only English and German
let scanner = scanner_for_languages(&["en", "de"]);

Timezone Support

By default, all times are interpreted in UTC. To interpret user input in a specific timezone, configure ParserConfig::timezone or use scan_with_tz():

use clockwords::{ParserConfig, TimeExpressionScanner, Tz, default_scanner};
use chrono::Utc;

// Option 1: Set timezone in config
let config = ParserConfig {
    timezone: Tz::Europe__Berlin,
    ..Default::default()
};
// Pass config when constructing the scanner (e.g. via TimeExpressionScanner::new)

// Option 2: Override per scan call
let scanner = default_scanner();
let matches = scanner.scan_with_tz("yesterday at 3pm", Utc::now(), Tz::Europe__Berlin);
// "3pm" is interpreted as 15:00 Berlin time → resolves to 14:00 UTC (in winter)

When a timezone is set, all day boundaries (midnight), time-of-day values, and weekday calculations use the user's local timezone. The resolved output always remains in UTC. For example, with Europe/Berlin (CET, UTC+1 in winter):

  • "today" at 23:30 UTC (= 00:30 CET next day) → the range covers the next calendar day in Berlin
  • "at 3pm" → resolves to 14:00 UTC (not 15:00 UTC)
  • "the last hour" → unchanged (duration-based, timezone-independent)
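The Berlin examples above can be checked with plain offset arithmetic. The following is a minimal sketch, not the crate's implementation: it assumes a fixed UTC offset in seconds (so it ignores daylight-saving transitions, which chrono-tz handles), represents instants as Unix seconds, and uses the hypothetical helper names local_day_bounds_utc and local_time_to_utc.

```rust
/// Seconds in one day.
const DAY: i64 = 86_400;

/// Hypothetical helper: UTC bounds (Unix seconds) of the local calendar day
/// containing `now_utc`, for a fixed offset east of UTC in seconds.
fn local_day_bounds_utc(now_utc: i64, offset_secs: i64) -> (i64, i64) {
    // Shift into local wall-clock time, floor to local midnight, shift back.
    let local = now_utc + offset_secs;
    let local_midnight = local.div_euclid(DAY) * DAY;
    let start_utc = local_midnight - offset_secs;
    (start_utc, start_utc + DAY)
}

/// Hypothetical helper: a wall-clock time on the local day that starts at
/// `day_start_utc`, expressed as a UTC instant.
fn local_time_to_utc(day_start_utc: i64, hour: i64, minute: i64) -> i64 {
    day_start_utc + hour * 3_600 + minute * 60
}
```

With now at 23:30 UTC on day 0 (84,600 s) and an offset of 3,600 s, the local day runs from 23:00 UTC on day 0 to 23:00 UTC on day 1, and 3pm local time lands at 14:00 UTC on day 1.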

Supported Expressions

Relative Days

Language  Examples
English   today, tomorrow, yesterday
German    heute, morgen, gestern
French    aujourd'hui, demain, hier
Spanish   hoy, mañana, ayer

Resolves to a full-day Range (midnight to midnight in the configured timezone).

Relative Weekdays

Language  Examples
English   last Friday, next Monday, this Wednesday
German    letzten Freitag, nächsten Montag, diesen Mittwoch
French    vendredi dernier, lundi prochain, ce mercredi
Spanish   el viernes pasado, el próximo lunes, este miércoles

Resolves to a full-day Range (midnight to midnight in the configured timezone). French and Spanish support both pre- and post-positive word order (e.g. lundi prochain and prochain lundi). Spanish also supports el viernes que viene.
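As a rough illustration of the weekday arithmetic, the sketch below computes the day offset for "last <weekday>" and "next <weekday>". It is built on assumptions rather than on the crate's logic: weekdays are numbered 0 = Monday to 6 = Sunday, both functions are hypothetical, and they always pick a day strictly before or after today (so "last Friday" typed on a Friday means seven days ago). How the crate treats that same-day case, and how it resolves "this <weekday>", is not stated here.

```rust
/// Hypothetical helper: how many days back the most recent `target` weekday
/// lies, strictly before `today`. Weekdays are 0 = Monday .. 6 = Sunday.
fn days_back_to_last(today: i64, target: i64) -> i64 {
    // rem_euclid keeps the result in 0..=6 even when the difference is negative.
    (today - target - 1).rem_euclid(7) + 1
}

/// Hypothetical helper: how many days ahead the next `target` weekday lies,
/// strictly after `today`.
fn days_ahead_to_next(today: i64, target: i64) -> i64 {
    (target - today - 1).rem_euclid(7) + 1
}
```

On a Wednesday (2), "last Friday" (4) is 5 days back and "next Monday" (0) is 5 days ahead; the resulting date would then be expanded to a midnight-to-midnight range in the configured timezone.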

Day Offsets

Language  Examples
English   in 4 days, two days ago, in three days
German    in 3 Tagen, vor zwei Tagen
French    dans 3 jours, il y a deux jours
Spanish   en 3 días, hace 2 dias

Supports both digits and written-out number words (1–30).
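A minimal sketch of how digits and number words might both map to a signed day offset, for English only. The helper names parse_count and day_offset are hypothetical, the word table is cut off at "twelve" for brevity, and the crate itself uses regex-based grammar rules rather than token matching like this.

```rust
/// Hypothetical helper: parse a count given as digits or as an English
/// number word. Only "one" to "twelve" are listed to keep the sketch short.
fn parse_count(token: &str) -> Option<u32> {
    const WORDS: [&str; 12] = [
        "one", "two", "three", "four", "five", "six",
        "seven", "eight", "nine", "ten", "eleven", "twelve",
    ];
    if let Ok(n) = token.parse::<u32>() {
        return Some(n);
    }
    // The word's position in the table is its value minus one.
    let index = WORDS.iter().position(|w| *w == token)?;
    Some(index as u32 + 1)
}

/// Hypothetical helper: signed day offset for "in N day(s)" or "N day(s) ago".
fn day_offset(phrase: &str) -> Option<i64> {
    let lower = phrase.to_lowercase();
    let tokens: Vec<&str> = lower.split_whitespace().collect();
    match tokens.as_slice() {
        ["in", n, "day" | "days"] => Some(i64::from(parse_count(n)?)),
        [n, "day" | "days", "ago"] => Some(-i64::from(parse_count(n)?)),
        _ => None,
    }
}
```

Returning Option keeps unknown words (for example "in many days") from producing a bogus offset.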

Time Specifications

Language  Examples
English   at 3pm, at 3 am, 13 o'clock, at 3:30pm, 11:30am, at 15:30
German    um 15 Uhr, um 15:30 Uhr, um 15:30
French    à 13h, à 13h30, à 13:30
Spanish   a las 3, a las 15:30

Colon-delimited minutes (H:MM) are supported in all languages. In English, am/pm is optional — bare H:MM with at is treated as 24-hour time. French supports both h and : as separators (13h30 and 13:30).

Resolves to a Point in time.
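The English am/pm rule can be sketched as a small normalisation step to 24-hour time. This is an illustration under assumptions, not the crate's parser: parse_clock is a hypothetical name, it only sees the clock part of the phrase (without "at" or "o'clock"), and it treats a value without am/pm as 24-hour time.

```rust
/// Hypothetical helper: turn "3pm", "3 am", "3:30pm" or "15:30" into
/// (hour, minute) on a 24-hour clock. Returns None for out-of-range values.
fn parse_clock(input: &str) -> Option<(u32, u32)> {
    let s = input.trim().to_ascii_lowercase();
    // Split off an optional am/pm suffix; Some(true) means pm.
    let (body, meridiem) = if let Some(b) = s.strip_suffix("am") {
        (b.trim_end(), Some(false))
    } else if let Some(b) = s.strip_suffix("pm") {
        (b.trim_end(), Some(true))
    } else {
        (s.as_str(), None)
    };
    let (hour, minute) = match body.split_once(':') {
        Some((h, m)) => (h.parse::<u32>().ok()?, m.parse::<u32>().ok()?),
        None => (body.parse::<u32>().ok()?, 0),
    };
    if minute > 59 {
        return None;
    }
    let hour = match meridiem {
        // 12am is midnight and 12pm is noon, hence the modulo.
        Some(is_pm) if (1..=12).contains(&hour) => hour % 12 + if is_pm { 12 } else { 0 },
        Some(_) => return None,
        None if hour <= 23 => hour,
        None => return None,
    };
    Some((hour, minute))
}
```

The resulting hour and minute would then be placed on the relevant day in the configured timezone to produce the Point.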

Time Ranges

Language  Examples
English   the last hour, last minute, between 9 and 12, from 9 to 12
German    die letzte Stunde, von 9 bis 12 Uhr, zwischen 9 und 12
French    la dernière heure, entre 9 et 12 heures
Spanish   la última hora, entre las 9 y las 12

English supports both between X and Y and from X to Y with number words (from nine to five).

Combined Expressions

Any day reference (relative day, weekday, or day offset) can be combined with a time specification or time range in a single expression. The entire phrase is detected as one match:

Relative day + time:

Language  Examples
English   yesterday at 3pm, yesterday at 3:30pm, yesterday at 15:30, tomorrow between 9 and 12, yesterday from 9 to 11
German    gestern um 15 Uhr, gestern um 15:30 Uhr, gestern um 15:30, gestern von 9 bis 12 Uhr
French    hier à 13h, hier à 13h30, hier à 13:30, hier entre 9 et 12 heures
Spanish   ayer a las 3, ayer a las 15:30, ayer entre las 9 y las 12

Weekday + time:

Language  Examples
English   last Friday at 3pm, last Friday at 3:30pm, last Friday at 15:30, last Friday from 9 to eleven, next Monday between 9 and 12
German    letzten Freitag um 15 Uhr, letzten Freitag um 15:30 Uhr, nächsten Montag um 9:15, diesen Mittwoch zwischen 9 und 11
French    vendredi dernier à 13h, vendredi dernier à 13h30, vendredi dernier à 13:30, ce lundi à 14h30, ce mercredi entre 9 et 11 heures
Spanish   el viernes pasado a las 3, el viernes pasado a las 3:30, el próximo lunes a las 9:30, el pasado viernes entre las 9 y las 12

Combined expressions resolve to either a Point (day + time spec) or a Range (day + time range) on the specified day.
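The Point-versus-Range rule can be sketched as follows. The types Resolved and TimePart and the function combine are hypothetical stand-ins (the crate's own result type is ResolvedTime with DateTime<Utc> values); instants are plain Unix seconds here, and day_start_utc is assumed to be the already-resolved local midnight of the referenced day.

```rust
/// Stand-in for the crate's ResolvedTime, using Unix seconds.
#[derive(Debug, PartialEq)]
enum Resolved {
    Point(i64),
    Range { start: i64, end: i64 },
}

/// Hypothetical description of the time half of a combined expression.
enum TimePart {
    At { hour: i64, minute: i64 },
    Between { from_hour: i64, to_hour: i64 },
}

/// Place a time spec or time range on the day starting at `day_start_utc`.
/// Returns None instead of panicking on out-of-range input.
fn combine(day_start_utc: i64, part: TimePart) -> Option<Resolved> {
    match part {
        TimePart::At { hour, minute } => {
            if !(0..24).contains(&hour) || !(0..60).contains(&minute) {
                return None;
            }
            Some(Resolved::Point(day_start_utc + hour * 3_600 + minute * 60))
        }
        TimePart::Between { from_hour, to_hour } => {
            if !(0..24).contains(&from_hour) || !(0..=24).contains(&to_hour) || from_hour >= to_hour {
                return None;
            }
            Some(Resolved::Range {
                start: day_start_utc + from_hour * 3_600,
                end: day_start_utc + to_hour * 3_600,
            })
        }
    }
}
```

In this sketch "yesterday at 15:30" becomes a Point 55,800 s after yesterday's local midnight, while "yesterday from 9 to 12" becomes a three-hour Range on the same day.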

Architecture

How Scanning Works

Input text
    │
    ▼
┌─────────────────────┐
│ Aho-Corasick        │  Fast keyword check (~ns)
│ Prefilter           │  Rejects text with no time words
└─────────┬───────────┘
          │ keywords found
          ▼
┌─────────────────────┐
│ Per-Language         │  Regex rules with resolver closures
│ Grammar Rules       │  Run for each enabled language
└─────────┬───────────┘
          │ raw matches
          ▼
┌─────────────────────┐
│ Deduplication       │  Prefer Complete > Partial, longer > shorter
│ & Sorting           │  Remove overlapping inferior matches
└─────────┬───────────┘
          │
          ▼
     Vec<TimeMatch>
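The deduplication stage in the diagram could look roughly like the sketch below. It is one possible reading of "prefer Complete > Partial, longer > shorter", not the crate's code: RawMatch, Confidence and dedup are hypothetical names, and the greedy strategy and the final sort by start offset are assumptions.

```rust
/// Derived ordering makes Partial < Complete.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Confidence {
    Partial,
    Complete,
}

/// Hypothetical raw match: a byte span plus its confidence.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RawMatch {
    start: usize,
    end: usize,
    confidence: Confidence,
}

fn overlaps(a: &RawMatch, b: &RawMatch) -> bool {
    a.start < b.end && b.start < a.end
}

/// Keep the best match of every overlapping group, then order by position.
fn dedup(mut raw: Vec<RawMatch>) -> Vec<RawMatch> {
    // Best candidates first: Complete before Partial, then longer spans.
    raw.sort_by(|a, b| {
        b.confidence
            .cmp(&a.confidence)
            .then((b.end - b.start).cmp(&(a.end - a.start)))
            .then(a.start.cmp(&b.start))
    });
    let mut kept: Vec<RawMatch> = Vec::new();
    for m in raw {
        // A candidate survives only if nothing better already covers part of it.
        if !kept.iter().any(|k| overlaps(k, &m)) {
            kept.push(m);
        }
    }
    kept.sort_by_key(|m| m.start);
    kept
}
```

Given raw matches for "yesterday" (0..9), "yesterday at 3pm" (0..16) and a partial "yester" (0..6), only the 0..16 match survives; a non-overlapping partial match elsewhere in the text is kept alongside it.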

Buffer-Rescan Strategy

Rather than maintaining an incremental parser state machine, clockwords re-scans the full text buffer on every call to scan(). This is the right trade-off for GUI text input:

  • Input buffers are typically < 1 KB
  • Full regex scan of a short buffer completes in microseconds
  • Dramatically simpler than maintaining parser state across edits
  • No edge cases around cursor position, insertions, or deletions

Type Overview

Type                   Description
TimeExpressionScanner  Main entry point; holds language parsers and prefilter
TimeMatch              A single match result: span + confidence + resolved time + kind
Span                   Byte-offset range (start..end) for slicing the original text
ResolvedTime           Point(DateTime<Utc>) or Range { start, end }
MatchConfidence        Partial (user still typing) or Complete
ExpressionKind         RelativeDay, RelativeDayOffset, TimeSpecification, TimeRange, Combined
ParserConfig           Settings: report_partial (default true), max_matches (default 10), timezone (default Tz::UTC)
Tz                     Re-exported from chrono-tz: IANA timezone (e.g. Tz::Europe__Berlin, Tz::US__Eastern)

GUI Integration

clockwords is designed for real-time text highlighting. Here's how to wire it up:

use clockwords::{default_scanner, MatchConfidence, TimeExpressionScanner};
use chrono::Utc;

struct App {
    scanner: TimeExpressionScanner,
}

impl App {
    fn new() -> Self {
        Self {
            scanner: default_scanner(),
        }
    }

    /// Call this on every keystroke
    fn on_text_changed(&self, text: &str) {
        let matches = self.scanner.scan(text, Utc::now());

        for m in &matches {
            let range = m.span.start..m.span.end;
            let style = match m.confidence {
                MatchConfidence::Complete => "solid_underline",
                MatchConfidence::Partial  => "dotted_underline",
            };
            // Apply `style` to this byte range of `text` in your text widget
            println!("Highlight bytes {range:?} with {style}");
        }
    }
}
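Spans are byte offsets, so a widget that addresses text by character index needs a conversion whenever the buffer contains non-ASCII input such as "mañana". The sketch below uses only the standard library; byte_span_to_char_span is a hypothetical helper, not part of clockwords, and widgets that count UTF-16 code units would need a different conversion.

```rust
/// Hypothetical helper: convert a byte span into a span counted in Unicode
/// scalar values (Rust `char`s). Returns None if the span is reversed, out
/// of bounds, or does not fall on character boundaries.
fn byte_span_to_char_span(text: &str, start: usize, end: usize) -> Option<(usize, usize)> {
    // is_char_boundary is also false for indices past the end of the string.
    if start > end || !text.is_char_boundary(start) || !text.is_char_boundary(end) {
        return None;
    }
    let char_start = text[..start].chars().count();
    let char_len = text[start..end].chars().count();
    Some((char_start, char_start + char_len))
}
```

In "Reunión mañana a las 3" the word "mañana" occupies bytes 9..16 but characters 8..14, because "ó" and "ñ" each take two bytes in UTF-8.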

Partial Match Highlighting

When the user types "I worked yester", the scanner returns a Partial match on "yester". Your GUI can show a dimmed or dotted underline to hint that a time expression is being formed. Once the user completes "yesterday", the match upgrades to Complete with a fully resolved time.
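One way to picture partial detection is as a prefix check on the word at the end of the buffer. The sketch below is an assumption about the general idea, not the crate's algorithm: partial_at_end and the three-word KEYWORDS list are hypothetical, and the minimum length of 3 simply mirrors the prefix-length rule for keyword_prefixes() listed under "Adding a New Language".

```rust
/// Illustrative keyword list; the crate's per-language lists are not shown here.
const KEYWORDS: &[&str] = &["today", "tomorrow", "yesterday"];

/// Hypothetical helper: if the buffer ends in a word that is a proper prefix
/// (at least 3 bytes long) of a keyword, return that word's byte span.
fn partial_at_end(text: &str) -> Option<(usize, usize)> {
    // The trailing word is everything after the last whitespace character.
    let token = text.rsplit(char::is_whitespace).next()?;
    if token.len() < 3 {
        return None;
    }
    let is_proper_prefix = KEYWORDS.iter().any(|k| {
        k.len() > token.len()
            && k.as_bytes()[..token.len()].eq_ignore_ascii_case(token.as_bytes())
    });
    is_proper_prefix.then_some((text.len() - token.len(), text.len()))
}
```

For "I worked yester" this yields the span 9..15; for "I worked yesterday" it yields nothing, since the word is no longer a proper prefix and would be handled as a complete match instead.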

To disable partial matching:

use clockwords::{ParserConfig, TimeExpressionScanner};

let config = ParserConfig {
    report_partial: false,
    ..Default::default()
};
// Pass config when constructing the scanner (e.g. via TimeExpressionScanner::new)

Adding a New Language

  1. Create src/lang/xx.rs (copy an existing language file as a template)
  2. Implement the LanguageParser trait:
    • lang_id() — return the ISO 639-1 code (e.g. "it")
    • keywords() — return Aho-Corasick trigger words
    • keyword_prefixes() — return typing prefixes (length >= 3)
    • parse() — call apply_rules() with your GrammarRule list
  3. Add number-word mappings to src/lang/numbers.rs
  4. Register the language in scanner_for_languages() in src/lib.rs
  5. Add tests in tests/

Each GrammarRule is a compiled regex paired with a resolver closure:

GrammarRule {
    pattern: Regex::new(r"(?i)\b(?P<day>oggi|domani|ieri)\b").unwrap(),
    kind: ExpressionKind::RelativeDay,
    resolver: |caps, now, tz| {
        let offset = match caps.name("day")?.as_str().to_lowercase().as_str() {
            "oggi" => 0,
            "domani" => 1,
            "ieri" => -1,
            _ => return None,
        };
        resolve::resolve_relative_day(offset, now, tz)
    },
}

Performance

Scenario                              Approximate Time
No keywords in text (fast rejection)  ~1 µs
Short sentence with 1 match           ~10 µs
Paragraph with multiple matches       ~10 µs

Because of the Aho-Corasick prefilter, text without any time-related words is rejected in about a microsecond, and the regex engine is never invoked for it.
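The gating idea can be illustrated without the aho-corasick crate. In the sketch below a naive substring loop stands in for the automaton (Aho-Corasick checks all keywords in a single pass over the text, which this loop does not), and has_keyword, scan_sketch and the KEYWORDS list are hypothetical. The point is only the control flow: the expensive stage runs only when a keyword is present, and the rejection path returns an empty Vec, which does not allocate.

```rust
/// Illustrative trigger words; the crate's real lists are per language.
const KEYWORDS: &[&str] = &["today", "tomorrow", "yesterday", "last", "next", "hour", "ago"];

/// Naive stand-in for the prefilter: ASCII case-insensitive substring search
/// over the raw bytes, without allocating.
fn has_keyword(text: &str) -> bool {
    let hay = text.as_bytes();
    KEYWORDS.iter().any(|k| {
        hay.windows(k.len())
            .any(|w| w.eq_ignore_ascii_case(k.as_bytes()))
    })
}

/// Run `grammar_stage` (a stand-in for the regex rules) only if the
/// prefilter finds at least one keyword.
fn scan_sketch<F>(text: &str, mut grammar_stage: F) -> Vec<(usize, usize)>
where
    F: FnMut(&str) -> Vec<(usize, usize)>,
{
    if !has_keyword(text) {
        // Vec::new() performs no heap allocation.
        return Vec::new();
    }
    grammar_stage(text)
}
```

A prefilter like this may let through text that contains no real time expression (for example "ago" inside "Chicago"); that is acceptable, because the grammar stage makes the final decision and the prefilter only has to avoid false negatives.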

Running Tests

cargo test

The test suite includes 141 integration tests + 1 doctest covering:

  • All four languages with various expression types
  • Combined weekday + time expressions across all languages
  • Timezone-aware resolution (Europe/Berlin, US/Eastern, UTC)
  • Cross-midnight timezone boundary handling
  • Accent-tolerant variants (with and without diacritics)
  • Embedded expressions in longer sentences
  • Colon-delimited time parsing (3:30pm, 15:30, 13h30, 13:30)
  • from X to Y with number words (nine to five)
  • Incremental/partial matching
  • Edge cases (empty input, no false positives)
  • Cross-language default scanner

Running the TUI Demo

An interactive terminal demo is included:

cargo run --example tui_demo

Type time expressions and watch them get parsed in real time. Press ESC to quit.

Dependencies

Crate         Purpose
chrono        Date/time types and arithmetic
chrono-tz     IANA timezone database for timezone-aware resolution
regex         Per-language grammar patterns
aho-corasick  Fast multi-keyword prefilter

License

Licensed under the Apache License, Version 2.0 (LICENSE or http://www.apache.org/licenses/LICENSE-2.0).