clockwords

Find and resolve natural-language time expressions in text.

clockwords scans free-form text for relative time expressions like "last Friday from 9 to eleven", "yesterday at 3pm", or "letzten Freitag von 9 bis 12 Uhr" and returns their byte-offset spans together with resolved DateTime<Utc> values. It supports English, German, French, and Spanish out of the box.

Built for real-time GUI applications (time-tracking, note-taking, calendars) where the user types naturally and the app highlights detected time references as they appear. Timezone-aware: times the user enters are interpreted in a configurable timezone, which defaults to UTC and can be set to the user's local zone.

Features

Four languages: English, German, French, Spanish
Timezone-aware: User input is interpreted in a configurable timezone (defaults to UTC for backward compatibility)
Byte-offset spans: Directly usable for text highlighting in any GUI framework
Resolved times: Every match resolves to a concrete DateTime<Utc> point or range
Incremental typing support: Detects partial matches (e.g. "yester" while the user is still typing "yesterday")
Accent-tolerant: Handles días/dias, à/a, mañana/manana, dernière/derniere
Fast rejection: An Aho-Corasick keyword prefilter rejects text with no time-related words in a few microseconds, before any regex runs
Zero allocations on rejection: If no keywords are found, scan() returns immediately
No unsafe code
Defensive: All internal date arithmetic returns Option — no panics from edge-case dates
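
One way to get the accent tolerance listed above is to fold accented letters to their base form before matching. The sketch below is an assumption about how this could be done, not the crate's actual code; `fold_char` and `fold_accents` are hypothetical names, and the mapping only covers a few letters used in the examples.

```rust
/// Map a handful of accented Latin letters to their unaccented form.
/// Illustrative only: a real table would cover more letters.
fn fold_char(c: char) -> char {
    match c {
        'á' | 'à' | 'â' => 'a',
        'é' | 'è' | 'ê' => 'e',
        'í' | 'î' => 'i',
        'ó' | 'ô' => 'o',
        'ú' | 'ù' | 'û' => 'u',
        'ñ' => 'n',
        'ç' => 'c',
        other => other,
    }
}

/// Lowercase `input`, then strip the accents known to `fold_char`.
fn fold_accents(input: &str) -> String {
    input
        .chars()
        .flat_map(char::to_lowercase)
        .map(fold_char)
        .collect()
}
```

Note that folding changes byte lengths ("días" is 5 bytes, "dias" is 4), so an implementation that matches on folded text has to map spans back to the original string. Matching accent variants directly in the pattern avoids that bookkeeping.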

Quick Start

Add to your Cargo.toml:

[dependencies]
clockwords = "0.3"

Basic Usage

use clockwords::{default_scanner, ResolvedTime};
use chrono::Utc;

fn main() {
    // Create a scanner with all four languages enabled
    let scanner = default_scanner();
    let now = Utc::now();

    let text = "The last hour I coded the initial code for the time library";
    let matches = scanner.scan(text, now);

    for m in &matches {
        println!(
            "Found '{}' at bytes {}..{} ({:?})",
            &text[m.span.as_range()],
            m.span.start,
            m.span.end,
            m.kind,
        );

        match &m.resolved {
            ResolvedTime::Point(dt) => println!("  Resolved to: {dt}"),
            ResolvedTime::Range { start, end } => {
                println!("  Resolved to: {start} .. {end}")
            }
        }
    }
}

Example output (timestamps depend on the current time; exact formatting may differ):

Found 'The last hour' at bytes 0..13 (TimeRange)
  Resolved to: 2026-02-08T12:30:00Z .. 2026-02-08T13:30:00Z

Select Specific Languages

use clockwords::scanner_for_languages;

// Only English and German
let scanner = scanner_for_languages(&["en", "de"]);

Timezone Support

By default, all times are interpreted in UTC. To interpret user input in a specific timezone, configure ParserConfig::timezone or use scan_with_tz():

use clockwords::{ParserConfig, TimeExpressionScanner, Tz, default_scanner};
use chrono::Utc;

// Option 1: Set timezone in config
let config = ParserConfig {
    timezone: Tz::Europe__Berlin,
    ..Default::default()
};
// Pass config when constructing the scanner (e.g. via TimeExpressionScanner::new)

// Option 2: Override per scan call
let scanner = default_scanner();
let matches = scanner.scan_with_tz("yesterday at 3pm", Utc::now(), Tz::Europe__Berlin);
// "3pm" is interpreted as 15:00 Berlin time → resolves to 14:00 UTC (in winter)

When a timezone is set, all day boundaries (midnight), time-of-day values, and weekday calculations use the user's local timezone. The resolved output always remains in UTC. For example, with Europe/Berlin (CET, UTC+1 in winter):

"today" at 23:30 UTC (= 00:30 CET next day) → the range covers the next calendar day in Berlin
"at 3pm" → resolves to 14:00 UTC (not 15:00 UTC)
"the last hour" → unchanged (duration-based, timezone-independent)
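
The day-boundary arithmetic behind these examples can be sketched with plain Unix seconds. This is a simplified illustration that assumes a fixed UTC offset (3600 seconds for CET); real IANA zones have DST transitions, which is what chrono-tz handles. The function names are hypothetical and not part of the crate.

```rust
const DAY: i64 = 86_400;

/// Return the UTC range `[start, end)` of the local calendar day that
/// contains `now_utc`. All values are seconds since the Unix epoch;
/// `offset_secs` is the fixed UTC offset of the user's zone.
fn local_day_in_utc(now_utc: i64, offset_secs: i64) -> (i64, i64) {
    let local = now_utc + offset_secs;
    // Floor to local midnight (div_euclid also works before 1970).
    let local_midnight = local.div_euclid(DAY) * DAY;
    let start = local_midnight - offset_secs;
    (start, start + DAY)
}

/// Convert a local wall-clock time on the day starting at `day_start_utc`
/// (as returned by `local_day_in_utc`) to UTC seconds.
fn local_time_in_utc(day_start_utc: i64, hour: i64, minute: i64) -> i64 {
    day_start_utc + hour * 3600 + minute * 60
}
```

With `now_utc` at 23:30 UTC and a +1 hour offset, the local day starts at 23:00 UTC, so "today" already refers to the next calendar day in Berlin, and 15:00 local on that day lands at 14:00 UTC.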

Supported Expressions

Relative Days

Language	Examples
English	`today`, `tomorrow`, `yesterday`
German	`heute`, `morgen`, `gestern`
French	`aujourd'hui`, `demain`, `hier`
Spanish	`hoy`, `mañana`, `ayer`

Resolves to a full-day Range (midnight to midnight in the configured timezone).

Relative Weekdays

Language	Examples
English	`last Friday`, `next Monday`, `this Wednesday`
German	`letzten Freitag`, `nächsten Montag`, `diesen Mittwoch`
French	`vendredi dernier`, `lundi prochain`, `ce mercredi`
Spanish	`el viernes pasado`, `el próximo lunes`, `este miércoles`

Resolves to a full-day Range (midnight to midnight in the configured timezone). French and Spanish support both pre- and post-positive word order (e.g. "lundi prochain" and "prochain lundi"). Spanish also supports "el viernes que viene".
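
Resolving a relative weekday comes down to a day offset from today. The sketch below assumes that "last X" means the most recent X strictly before today and "next X" the first X strictly after today; the crate may interpret these differently (for example, as the X of the previous or following week). The function names are hypothetical.

```rust
/// Weekdays are numbered 0 = Monday .. 6 = Sunday.
/// Number of days to go back from `today` to reach the most recent
/// `target` weekday strictly before today (always 1..=7).
fn days_back_to_last(today: i64, target: i64) -> i64 {
    (today - target - 1).rem_euclid(7) + 1
}

/// Number of days to go forward from `today` to reach the first
/// `target` weekday strictly after today (always 1..=7).
fn days_forward_to_next(today: i64, target: i64) -> i64 {
    (target - today - 1).rem_euclid(7) + 1
}
```

Under these assumptions, on a Monday "last Friday" is 3 days back, and on a Friday "last Friday" is a full week back.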

Day Offsets

Language	Examples
English	`in 4 days`, `two days ago`, `in three days`
German	`in 3 Tagen`, `vor zwei Tagen`
French	`dans 3 jours`, `il y a deux jours`
Spanish	`en 3 días`, `hace 2 dias`

Supports both digits and written-out number words (1–30).
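
As an illustration of accepting both digits and number words, here is a minimal English-only sketch limited to the 1–30 range mentioned above. `parse_count` and the `SMALL` table are hypothetical; the crate keeps its own mappings in src/lang/numbers.rs, whose layout is not shown here.

```rust
/// English words for 1..=20, indexed by value minus one.
const SMALL: [&str; 20] = [
    "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten",
    "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
    "eighteen", "nineteen", "twenty",
];

/// Parse a count given as digits ("4") or as an English word ("three",
/// "twenty-one"). Returns `None` for anything outside 1..=30.
fn parse_count(token: &str) -> Option<u32> {
    let token = token.trim().to_lowercase();
    let value = if let Ok(n) = token.parse::<u32>() {
        n
    } else if let Some(i) = SMALL.iter().position(|w| *w == token) {
        i as u32 + 1
    } else if token == "thirty" {
        30
    } else {
        // "twenty-one" .. "twenty-nine", written with a hyphen or a space.
        let unit = token
            .strip_prefix("twenty-")
            .or_else(|| token.strip_prefix("twenty "))?;
        let i = SMALL[..9].iter().position(|w| *w == unit)?;
        20 + i as u32 + 1
    };
    (1..=30).contains(&value).then_some(value)
}
```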

Time Specifications

Language	Examples
English	`at 3pm`, `at 3 am`, `13 o'clock`, `at 3:30pm`, `11:30am`, `at 15:30`
German	`um 15 Uhr`, `um 15:30 Uhr`, `um 15:30`
French	`à 13h`, `à 13h30`, `à 13:30`
Spanish	`a las 3`, `a las 15:30`

Colon-delimited minutes (H:MM) are supported in all languages. In English, am/pm is optional: a bare H:MM after "at" is treated as 24-hour time. French supports both h and : as separators (13h30 and 13:30).

Resolves to a Point in time.
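
To make the am/pm and 24-hour rules concrete, the following sketch normalizes an English clock string to a 24-hour `(hour, minute)` pair. It is a standalone illustration with a hypothetical `parse_clock` function; the crate itself uses per-language regex rules, and its handling of edge cases such as "12am" is assumed here, not confirmed.

```rust
/// Parse "3pm", "3 am", "3:30pm", "11:30am", or "15:30" into a 24-hour
/// `(hour, minute)` pair. Returns `None` for out-of-range values.
fn parse_clock(input: &str) -> Option<(u32, u32)> {
    let s = input.trim().to_lowercase();
    // Split off an optional am/pm suffix.
    let (body, meridiem) = if let Some(b) = s.strip_suffix("am") {
        (b.trim_end(), Some(false))
    } else if let Some(b) = s.strip_suffix("pm") {
        (b.trim_end(), Some(true))
    } else {
        (s.as_str(), None)
    };
    // Split hour and optional minutes at the colon.
    let (h, m) = match body.split_once(':') {
        Some((h, m)) => (h.parse::<u32>().ok()?, m.parse::<u32>().ok()?),
        None => (body.parse::<u32>().ok()?, 0),
    };
    if m > 59 {
        return None;
    }
    let hour = match meridiem {
        // 12-hour clock: 12am is 00:00 and 12pm is 12:00.
        Some(pm) if (1..=12).contains(&h) => (h % 12) + if pm { 12 } else { 0 },
        Some(_) => return None,
        // No suffix: treat as 24-hour time.
        None if h <= 23 => h,
        None => return None,
    };
    Some((hour, m))
}
```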

Time Ranges

Language	Examples
English	`the last hour`, `last minute`, `between 9 and 12`, `from 9 to 12`
German	`die letzte Stunde`, `von 9 bis 12 Uhr`, `zwischen 9 und 12`
French	`la dernière heure`, `entre 9 et 12 heures`
Spanish	`la última hora`, `entre las 9 y las 12`

English supports both "between X and Y" and "from X to Y", including number words ("from nine to five").

Combined Expressions

Any day reference (relative day, weekday, or day offset) can be combined with a time specification or time range in a single expression. The entire phrase is detected as one match:

Relative day + time:

Language	Examples
English	`yesterday at 3pm`, `yesterday at 3:30pm`, `yesterday at 15:30`, `tomorrow between 9 and 12`, `yesterday from 9 to 11`
German	`gestern um 15 Uhr`, `gestern um 15:30 Uhr`, `gestern um 15:30`, `gestern von 9 bis 12 Uhr`
French	`hier à 13h`, `hier à 13h30`, `hier à 13:30`, `hier entre 9 et 12 heures`
Spanish	`ayer a las 3`, `ayer a las 15:30`, `ayer entre las 9 y las 12`

Weekday + time:

Language	Examples
English	`last Friday at 3pm`, `last Friday at 3:30pm`, `last Friday at 15:30`, `last Friday from 9 to eleven`, `next Monday between 9 and 12`
German	`letzten Freitag um 15 Uhr`, `letzten Freitag um 15:30 Uhr`, `nächsten Montag um 9:15`, `diesen Mittwoch zwischen 9 und 11`
French	`vendredi dernier à 13h`, `vendredi dernier à 13h30`, `vendredi dernier à 13:30`, `ce lundi à 14h30`, `ce mercredi entre 9 et 11 heures`
Spanish	`el viernes pasado a las 3`, `el viernes pasado a las 3:30`, `el próximo lunes a las 9:30`, `el pasado viernes entre las 9 y las 12`

Combined expressions resolve to either a Point (day + time spec) or a Range (day + time range) on the specified day.

Architecture

How Scanning Works

Input text
    │
    ▼
┌─────────────────────┐
│ Aho-Corasick        │  Fast keyword check (~µs)
│ Prefilter           │  Rejects text with no time words
└─────────┬───────────┘
          │ keywords found
          ▼
┌─────────────────────┐
│ Per-Language        │  Regex rules with resolver closures
│ Grammar Rules       │  Run for each enabled language
└─────────┬───────────┘
          │ raw matches
          ▼
┌─────────────────────┐
│ Deduplication       │  Prefer Complete > Partial, longer > shorter
│ & Sorting           │  Remove overlapping inferior matches
└─────────┬───────────┘
          │
          ▼
     Vec<TimeMatch>
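
The deduplication stage in the diagram can be approximated as a greedy pass over candidates sorted from best to worst. This is a sketch of the stated policy (Complete over Partial, longer over shorter, drop overlapping inferior matches), not the crate's code; `Confidence`, `Candidate`, and `dedup` are hypothetical stand-ins for the real types.

```rust
/// Declaration order gives `Partial < Complete` via the derived `Ord`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Confidence {
    Partial,
    Complete,
}

#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    start: usize,
    end: usize,
    confidence: Confidence,
}

/// Two half-open byte ranges overlap if each starts before the other ends.
fn overlaps(a: &Candidate, b: &Candidate) -> bool {
    a.start < b.end && b.start < a.end
}

/// Keep the best candidate in every group of overlapping candidates and
/// return the survivors sorted by start offset.
fn dedup(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    // Best first: Complete before Partial, then longer before shorter.
    candidates.sort_by(|a, b| {
        b.confidence
            .cmp(&a.confidence)
            .then((b.end - b.start).cmp(&(a.end - a.start)))
    });
    let mut kept: Vec<Candidate> = Vec::new();
    for c in candidates {
        // A candidate survives only if nothing better already covers it.
        if !kept.iter().any(|k| overlaps(k, &c)) {
            kept.push(c);
        }
    }
    kept.sort_by_key(|c| c.start);
    kept
}
```

For "yesterday at 3pm", separate rules might report "yesterday", "at 3pm", and the combined phrase; under this policy only the combined match survives.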

Buffer-Rescan Strategy

Rather than maintaining an incremental parser state machine, clockwords re-scans the full text buffer on every call to scan(). This is the right trade-off for GUI text input:

Input buffers are typically < 1 KB
Full regex scan of a short buffer completes in microseconds
Dramatically simpler than maintaining parser state across edits
No edge cases around cursor position, insertions, or deletions

Type Overview

Type	Description
`TimeExpressionScanner`	Main entry point — holds language parsers and prefilter
`TimeMatch`	A single match result: span + confidence + resolved time + kind
`Span`	Byte-offset range (`start..end`) for slicing the original text
`ResolvedTime`	`Point(DateTime<Utc>)` or `Range { start, end }`
`MatchConfidence`	`Partial` (user still typing) or `Complete`
`ExpressionKind`	`RelativeDay`, `RelativeDayOffset`, `TimeSpecification`, `TimeRange`, `Combined`
`ParserConfig`	Settings: `report_partial` (default `true`), `max_matches` (default `10`), `timezone` (default `Tz::UTC`)
`Tz`	Re-exported from `chrono-tz` — IANA timezone (e.g. `Tz::Europe__Berlin`, `Tz::US__Eastern`)

GUI Integration

clockwords is designed for real-time text highlighting. Here's how to wire it up:

use clockwords::{default_scanner, MatchConfidence, TimeExpressionScanner};
use chrono::Utc;

struct App {
    scanner: TimeExpressionScanner,
}

impl App {
    fn new() -> Self {
        Self {
            scanner: default_scanner(),
        }
    }

    /// Call this on every keystroke
    fn on_text_changed(&self, text: &str) {
        let matches = self.scanner.scan(text, Utc::now());

        for m in &matches {
            let range = m.span.start..m.span.end;
            let style = match m.confidence {
                MatchConfidence::Complete => "solid_underline",
                MatchConfidence::Partial  => "dotted_underline",
            };
            // Apply `style` to this byte range in your text widget
            println!("Highlight bytes {range:?} with {style}");
        }
    }
}
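
The spans are byte offsets, but some text widgets address content by character index instead. In that case a conversion along the following lines may be needed. `byte_range_to_char_range` is a hypothetical helper built only on the standard library; widgets that use UTF-16 offsets would need a different conversion.

```rust
use std::ops::Range;

/// Convert a byte range in `text` to the equivalent range of `char`
/// indices. Returns `None` if the range is reversed, out of bounds, or
/// does not fall on UTF-8 character boundaries.
fn byte_range_to_char_range(text: &str, bytes: Range<usize>) -> Option<Range<usize>> {
    if bytes.start > bytes.end
        || !text.is_char_boundary(bytes.start)
        || !text.is_char_boundary(bytes.end)
    {
        return None;
    }
    // Count the chars before the span, then the chars inside it.
    let start = text[..bytes.start].chars().count();
    let len = text[bytes.start..bytes.end].chars().count();
    Some(start..start + len)
}
```

For example, in "café mañana" the word "mañana" occupies bytes 6..13 but characters 5..11.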

Partial Match Highlighting

When the user types "I worked yester", the scanner returns a Partial match on "yester". Your GUI can show a dimmed or dotted underline to hint that a time expression is being formed. Once the user completes "yesterday", the match upgrades to Complete with a fully resolved time.
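
The idea behind partial detection can be sketched as a prefix check on the word currently being typed. This is a simplified illustration, not the crate's implementation: it only looks at the last whitespace-delimited word and a flat keyword list, and `partial_keyword_at_end` is a hypothetical name. The minimum length of 3 mirrors the prefix length mentioned for keyword_prefixes().

```rust
/// If the last whitespace-delimited word of `text` is a strict prefix of
/// one of `keywords` and at least `min_len` bytes long, return its byte
/// span as `(start, end)`.
fn partial_keyword_at_end(
    text: &str,
    keywords: &[&str],
    min_len: usize,
) -> Option<(usize, usize)> {
    // `rsplit` always yields at least one item (possibly an empty word).
    let word = text.rsplit(char::is_whitespace).next()?;
    if word.len() < min_len {
        return None;
    }
    let lower = word.to_lowercase();
    // Strict prefix: a fully typed keyword is a complete match, not a partial one.
    let is_prefix = keywords
        .iter()
        .any(|k| k.len() > lower.len() && k.starts_with(lower.as_str()));
    is_prefix.then_some((text.len() - word.len(), text.len()))
}
```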

To disable partial matching:

use clockwords::{ParserConfig, TimeExpressionScanner};

let config = ParserConfig {
    report_partial: false,
    ..Default::default()
};
// Pass `config` when constructing the scanner (e.g. via TimeExpressionScanner::new)

Adding a New Language

1. Create src/lang/xx.rs (copy an existing language file as a template).
2. Implement the LanguageParser trait:
   - lang_id(): return the ISO 639-1 code (e.g. "it")
   - keywords(): return Aho-Corasick trigger words
   - keyword_prefixes(): return typing prefixes (length >= 3)
   - parse(): call apply_rules() with your GrammarRule list
3. Add number-word mappings to src/lang/numbers.rs.
4. Register the language in src/lib.rs → scanner_for_languages().
5. Add tests in tests/.

Each GrammarRule is a compiled regex paired with a resolver closure:

GrammarRule {
    pattern: Regex::new(r"(?i)\b(?P<day>oggi|domani|ieri)\b").unwrap(),
    kind: ExpressionKind::RelativeDay,
    resolver: |caps, now, tz| {
        let offset = match caps.name("day")?.as_str().to_lowercase().as_str() {
            "oggi" => 0,
            "domani" => 1,
            "ieri" => -1,
            _ => return None,
        };
        resolve::resolve_relative_day(offset, now, tz)
    },
}

Performance

Scenario	Approximate Time
No keywords in text (fast rejection)	~8 µs
Short sentence with 1 match	~17 µs
Paragraph with multiple matches	~18 µs

The Aho-Corasick prefilter means that text without any time-related words is rejected in microseconds — the regex engine is never invoked.
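
The gating idea can be shown with a naive standard-library stand-in. This sketch is not how the crate does it: it checks keywords one by one and allocates a lowercase copy of the text, whereas an Aho-Corasick automaton searches for all keywords in a single pass. `may_contain_time_expression` is a hypothetical name.

```rust
/// Naive stand-in for a multi-keyword prefilter: returns true if any of
/// the (lowercase) `keywords` occurs anywhere in `text`, ignoring case.
fn may_contain_time_expression(text: &str, keywords: &[&str]) -> bool {
    let lower = text.to_lowercase();
    keywords.iter().any(|k| lower.contains(*k))
}
```

A scanner built on this would return an empty result as soon as the check fails and run its grammar rules only otherwise.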

Running Tests

cargo test

The test suite includes 141 integration tests + 1 doctest covering:

All four languages with various expression types
Combined weekday + time expressions across all languages
Timezone-aware resolution (Europe/Berlin, US/Eastern, UTC)
Cross-midnight timezone boundary handling
Accent-tolerant variants (with and without diacritics)
Embedded expressions in longer sentences
Colon-delimited time parsing (3:30pm, 15:30, 13h30, 13:30)
from X to Y with number words (nine to five)
Incremental/partial matching
Edge cases (empty input, no false positives)
Cross-language default scanner

Running the TUI Demo

An interactive terminal demo is included:

cargo run --example tui_demo

Type time expressions and watch them get parsed in real time. Press ESC to quit.

Dependencies

Crate	Purpose
`chrono`	Date/time types and arithmetic
`chrono-tz`	IANA timezone database for timezone-aware resolution
`regex`	Per-language grammar patterns
`aho-corasick`	Fast multi-keyword prefilter

License

Licensed under the Apache License, Version 2.0 (LICENSE or http://www.apache.org/licenses/LICENSE-2.0).

clockwords 0.3.0