# clockwords
**Find and resolve natural-language time expressions in text.**
[CI](https://github.com/hg8496/clockwords/actions/workflows/ci.yml) · [crates.io](https://crates.io/crates/clockwords) · [docs.rs](https://docs.rs/clockwords) · [License](LICENSE)
`clockwords` scans free-form text for relative time expressions like *"last Friday from 9 to eleven"*, *"yesterday at 3pm"*, or *"letzten Freitag von 9 bis 12 Uhr"* and returns their byte-offset spans together with resolved `DateTime<Utc>` values. It supports **English**, **German**, **French**, and **Spanish** out of the box.
Built for **real-time GUI applications** (time-tracking, note-taking, calendars) where the user types naturally and the app highlights detected time references as they appear. It is timezone-aware: the times a user enters are interpreted in a configurable timezone, typically the user's local one (defaults to UTC).
## Features
- **Four languages**: English, German, French, Spanish
- **Timezone-aware**: User input is interpreted in a configurable timezone (defaults to UTC for backward compatibility)
- **Byte-offset spans**: Directly usable for text highlighting in any GUI framework
- **Resolved times**: Every match resolves to a concrete `DateTime<Utc>` point or range
- **Incremental typing support**: Detects partial matches (e.g. `"yester"` while the user is still typing `"yesterday"`)
- **Accent-tolerant**: Handles `días`/`dias`, `à`/`a`, `mañana`/`manana`, `dernière`/`derniere`
- **Fast rejection**: Aho-Corasick keyword prefilter skips text with no time-related words in sub-microsecond time
- **Zero allocations on rejection**: If no keywords are found, `scan()` returns immediately
- **No unsafe code**
- **Defensive**: All internal date arithmetic returns `Option` — no panics from edge-case dates
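Accent tolerance can be pictured as comparing words after folding their diacritics. The std-only sketch below shows one common way to do that; it is an illustration, not a description of how `clockwords` matches internally (its grammar rules may simply use regex alternations such as `d[ií]as`). `fold_accents` and `eq_ignoring_accents` are hypothetical helpers.

```rust
/// Lowercase and strip common Spanish/French/German diacritics so that,
/// e.g., "días" and "dias" compare equal. Hypothetical helper, not crate API.
fn fold_accents(input: &str) -> String {
    input
        .chars()
        .flat_map(|c| c.to_lowercase())
        .map(|c| match c {
            'á' | 'à' | 'â' | 'ä' => 'a',
            'é' | 'è' | 'ê' | 'ë' => 'e',
            'í' | 'ì' | 'î' | 'ï' => 'i',
            'ó' | 'ò' | 'ô' | 'ö' => 'o',
            'ú' | 'ù' | 'û' | 'ü' => 'u',
            'ñ' => 'n',
            'ç' => 'c',
            other => other,
        })
        .collect()
}

/// Case- and accent-insensitive equality of two words.
fn eq_ignoring_accents(a: &str, b: &str) -> bool {
    fold_accents(a) == fold_accents(b)
}

fn main() {
    let pairs = [("días", "dias"), ("mañana", "manana"), ("dernière", "derniere"), ("à", "a")];
    for (a, b) in pairs {
        println!("{a} ~ {b}: {}", eq_ignoring_accents(a, b));
    }
}
```

Fold only for comparison: the folded string can be shorter in bytes (`á` is two bytes, `a` is one), so match spans must always be computed against the original text.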
## Quick Start
Add to your `Cargo.toml`:
```toml
[dependencies]
clockwords = "0.3"
```
### Basic Usage
```rust
use clockwords::{default_scanner, ResolvedTime};
use chrono::Utc;

fn main() {
    // Create a scanner with all four languages enabled
    let scanner = default_scanner();
    let now = Utc::now();

    let text = "The last hour I coded the initial code for the time library";
    let matches = scanner.scan(text, now);

    for m in &matches {
        println!(
            "Found '{}' at bytes {}..{} ({:?})",
            &text[m.span.as_range()],
            m.span.start,
            m.span.end,
            m.kind,
        );
        match &m.resolved {
            ResolvedTime::Point(dt) => println!("  Resolved to: {dt}"),
            ResolvedTime::Range { start, end } => {
                println!("  Resolved to: {start} .. {end}")
            }
        }
    }
}
```
**Output** (example run; the resolved times depend on the current time):
```
Found 'The last hour' at bytes 0..13 (TimeRange)
Resolved to: 2026-02-08T12:30:00Z .. 2026-02-08T13:30:00Z
```
### Select Specific Languages
```rust
use clockwords::scanner_for_languages;
// Only English and German
let scanner = scanner_for_languages(&["en", "de"]);
```
### Timezone Support
By default, all times are interpreted in UTC. To interpret user input in a specific timezone, configure `ParserConfig::timezone` or use `scan_with_tz()`:
```rust
use clockwords::{ParserConfig, TimeExpressionScanner, Tz, default_scanner};
use chrono::Utc;
// Option 1: Set timezone in config
let config = ParserConfig {
    timezone: Tz::Europe__Berlin,
    ..Default::default()
};
// Pass config when constructing the scanner (e.g. via TimeExpressionScanner::new)
// Option 2: Override per scan call
let scanner = default_scanner();
let matches = scanner.scan_with_tz("yesterday at 3pm", Utc::now(), Tz::Europe__Berlin);
// "3pm" is interpreted as 15:00 Berlin time → resolves to 14:00 UTC (in winter)
```
When a timezone is set, all day boundaries (midnight), time-of-day values, and weekday calculations use the user's local timezone. The resolved output always remains in UTC. For example, with `Europe/Berlin` (CET, UTC+1 in winter):
- `"today"` at 23:30 UTC (= 00:30 CET the next day) → the range covers the Berlin calendar day that has just begun, i.e. from 23:00 UTC on the current UTC date to 23:00 UTC on the next
- `"at 3pm"` → resolves to 14:00 UTC (not 15:00 UTC)
- `"the last hour"` → unchanged (duration-based, timezone-independent)
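The day-boundary rule can be checked with plain integer arithmetic. This std-only sketch assumes a zone with a *fixed* UTC offset and Unix-second timestamps; real IANA zones such as `Europe/Berlin` change offset for DST, which the crate delegates to `chrono-tz`. `relative_day_range` is a hypothetical helper, not part of the `clockwords` API.

```rust
const SECS_PER_DAY: i64 = 86_400;

/// Resolve a relative day (today = 0, tomorrow = 1, yesterday = -1) to a
/// half-open UTC range [start, end) covering that calendar day in a zone
/// with a fixed UTC offset. Assumption: no DST transitions.
fn relative_day_range(now_utc: i64, offset_secs: i64, day_offset: i64) -> (i64, i64) {
    // Shift into local wall-clock seconds and floor to local midnight.
    let local_now = now_utc + offset_secs;
    let local_midnight = local_now - local_now.rem_euclid(SECS_PER_DAY);
    // Move by whole days, then convert the boundary back to UTC.
    let start_utc = local_midnight + day_offset * SECS_PER_DAY - offset_secs;
    (start_utc, start_utc + SECS_PER_DAY)
}

fn main() {
    // 2026-02-08T23:30:00Z as Unix seconds (20_492 days after 1970-01-01).
    let now = 20_492 * SECS_PER_DAY + 23 * 3600 + 30 * 60;
    let cet = 3600; // UTC+1, Berlin in winter

    // Berlin is already at 00:30 on 2026-02-09, so "today" starts at
    // 2026-02-08T23:00:00Z and lasts 24 hours.
    let (start, end) = relative_day_range(now, cet, 0);
    println!("today (Berlin): {start}..{end}");
}
```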
## Supported Expressions
### Relative Days
| Language | Examples |
|---|---|
| English | `today`, `tomorrow`, `yesterday` |
| German | `heute`, `morgen`, `gestern` |
| French | `aujourd'hui`, `demain`, `hier` |
| Spanish | `hoy`, `mañana`, `ayer` |
Resolves to a full-day `Range` (midnight to midnight in the configured timezone).
### Relative Weekdays
| Language | Examples |
|---|---|
| English | `last Friday`, `next Monday`, `this Wednesday` |
| German | `letzten Freitag`, `nächsten Montag`, `diesen Mittwoch` |
| French | `vendredi dernier`, `lundi prochain`, `ce mercredi` |
| Spanish | `el viernes pasado`, `el próximo lunes`, `este miércoles` |
Resolves to a full-day `Range` (midnight to midnight in the configured timezone). French and Spanish support both pre- and post-positive word order (e.g. `lundi prochain` and `prochain lundi`). Spanish also supports `el viernes que viene`.
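Resolving a weekday phrase comes down to finding a signed day offset from today; the full-day range is then built the same way as for `today` or `tomorrow`. The rules encoded below ("last" and "next" never mean today, "this" stays within a Monday-based week) are assumptions for illustration; this README does not specify how `clockwords` treats, say, "last Friday" said on a Friday. `Modifier` and `weekday_offset` are hypothetical.

```rust
/// How a weekday phrase is qualified ("last Friday", "next Monday", "this Wednesday").
#[derive(Clone, Copy, Debug)]
enum Modifier {
    Last,
    Next,
    This,
}

/// Signed day offset from today to the referenced weekday.
/// Weekdays are numbered Monday = 0 through Sunday = 6.
///
/// Assumed semantics (sketch only):
/// - Last: most recent such weekday strictly before today (1 to 7 days back)
/// - Next: first such weekday strictly after today (1 to 7 days ahead)
/// - This: that weekday within the current Monday-based week (may be in the past)
fn weekday_offset(today: i64, target: i64, modifier: Modifier) -> i64 {
    match modifier {
        Modifier::Last => {
            let back = (today - target).rem_euclid(7);
            -(if back == 0 { 7 } else { back })
        }
        Modifier::Next => {
            let ahead = (target - today).rem_euclid(7);
            if ahead == 0 { 7 } else { ahead }
        }
        Modifier::This => target - today,
    }
}

fn main() {
    const NAMES: [&str; 7] = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"];
    let today = 2; // Wednesday
    for (modifier, target) in [(Modifier::Last, 4), (Modifier::Next, 0), (Modifier::This, 2)] {
        let offset = weekday_offset(today, target, modifier);
        println!("{modifier:?} {} -> {offset:+} days", NAMES[target as usize]);
    }
}
```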
### Day Offsets
| Language | Examples |
|---|---|
| English | `in 4 days`, `two days ago`, `in three days` |
| German | `in 3 Tagen`, `vor zwei Tagen` |
| French | `dans 3 jours`, `il y a deux jours` |
| Spanish | `en 3 días`, `hace 2 dias` |
Supports both digits and written-out number words (1–30).
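Accepting either digits or number words can be sketched as a single lookup that tries a numeric parse first and falls back to a word table. This std-only example covers English only; `english_number` and `parse_count` are hypothetical (the crate keeps its own tables in `src/lang/numbers.rs`), and applying the 1–30 limit to digits as well as words is an assumption.

```rust
/// Map an English number word ("one" through "thirty") to its value.
fn english_number(word: &str) -> Option<u32> {
    const UNITS: [&str; 19] = [
        "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten",
        "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
        "eighteen", "nineteen",
    ];
    let w = word.to_ascii_lowercase();
    if let Some(i) = UNITS.iter().position(|&u| u == w) {
        return Some(i as u32 + 1);
    }
    match w.as_str() {
        "twenty" => Some(20),
        "thirty" => Some(30),
        _ => {
            // "twenty-one" .. "twenty-nine", written with a hyphen or a space.
            let rest = w
                .strip_prefix("twenty")?
                .trim_start_matches(|c: char| c == '-' || c == ' ');
            let unit = UNITS[..9].iter().position(|&u| u == rest)?;
            Some(21 + unit as u32)
        }
    }
}

/// Parse a day count given as digits or as a word, limited to 1..=30.
fn parse_count(token: &str) -> Option<u32> {
    let n = token.parse::<u32>().ok().or_else(|| english_number(token))?;
    (1..=30).contains(&n).then_some(n)
}

fn main() {
    for token in ["4", "two", "twenty-one", "thirty", "31", "several"] {
        println!("{token:>10} -> {:?}", parse_count(token));
    }
}
```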
### Time Specifications
| Language | Examples |
|---|---|
| English | `at 3pm`, `at 3 am`, `13 o'clock`, `at 3:30pm`, `11:30am`, `at 15:30` |
| German | `um 15 Uhr`, `um 15:30 Uhr`, `um 15:30` |
| French | `à 13h`, `à 13h30`, `à 13:30` |
| Spanish | `a las 3`, `a las 15:30` |
Colon-delimited minutes (`H:MM`) are supported in all languages. In English, am/pm is optional — bare `H:MM` with `at` is treated as 24-hour time. French supports both `h` and `:` as separators (`13h30` and `13:30`).
Resolves to a `Point` in time.
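The English clock-time rules above (optional `:MM`, optional am/pm, bare `H:MM` read as 24-hour) can be sketched as a small std-only parser. `parse_clock` is a hypothetical helper returning `(hour, minute)` on a 24-hour clock; it does not handle `o'clock` or the French `h` separator, and its exact rejection rules are assumptions.

```rust
/// Parse "3pm", "3 am", "3:30pm", "11:30am" or "15:30" into (hour, minute).
fn parse_clock(input: &str) -> Option<(u32, u32)> {
    let s = input.trim().to_ascii_lowercase();
    // Split off an optional am/pm suffix (Some(true) = pm).
    let (body, meridiem) = if let Some(b) = s.strip_suffix("am") {
        (b.trim_end(), Some(false))
    } else if let Some(b) = s.strip_suffix("pm") {
        (b.trim_end(), Some(true))
    } else {
        (s.as_str(), None)
    };
    // Hour with optional two-digit ":MM".
    let (h, m) = match body.split_once(':') {
        Some((h, m)) if m.len() == 2 => (h.parse::<u32>().ok()?, m.parse::<u32>().ok()?),
        Some(_) => return None,
        None => (body.parse::<u32>().ok()?, 0),
    };
    if m > 59 {
        return None;
    }
    let hour = match meridiem {
        // 12am is 00:xx and 12pm is 12:xx; other pm hours add 12.
        Some(true) if (1..=12).contains(&h) => h % 12 + 12,
        Some(false) if (1..=12).contains(&h) => h % 12,
        Some(_) => return None,
        // No am/pm: treat as a 24-hour clock.
        None if h <= 23 => h,
        None => return None,
    };
    Some((hour, m))
}

fn main() {
    for t in ["3pm", "3 am", "3:30pm", "11:30am", "15:30", "12am", "13pm"] {
        println!("{t:>8} -> {:?}", parse_clock(t));
    }
}
```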
### Time Ranges
| Language | Examples |
|---|---|
| English | `the last hour`, `last minute`, `between 9 and 12`, `from 9 to 12` |
| German | `die letzte Stunde`, `von 9 bis 12 Uhr`, `zwischen 9 und 12` |
| French | `la dernière heure`, `entre 9 et 12 heures` |
| Spanish | `la última hora`, `entre las 9 y las 12` |
English supports both `between X and Y` and `from X to Y` with number words (`from nine to five`).
### Combined Expressions
Any day reference (relative day, weekday, or day offset) can be combined with a time specification or time range in a single expression. The entire phrase is detected as one match:
**Relative day + time:**
| Language | Examples |
|---|---|
| English | `yesterday at 3pm`, `yesterday at 3:30pm`, `yesterday at 15:30`, `tomorrow between 9 and 12`, `yesterday from 9 to 11` |
| German | `gestern um 15 Uhr`, `gestern um 15:30 Uhr`, `gestern um 15:30`, `gestern von 9 bis 12 Uhr` |
| French | `hier à 13h`, `hier à 13h30`, `hier à 13:30`, `hier entre 9 et 12 heures` |
| Spanish | `ayer a las 3`, `ayer a las 15:30`, `ayer entre las 9 y las 12` |
**Weekday + time:**
| Language | Examples |
|---|---|
| English | `last Friday at 3pm`, `last Friday at 3:30pm`, `last Friday at 15:30`, `last Friday from 9 to eleven`, `next Monday between 9 and 12` |
| German | `letzten Freitag um 15 Uhr`, `letzten Freitag um 15:30 Uhr`, `nächsten Montag um 9:15`, `diesen Mittwoch zwischen 9 und 11` |
| French | `vendredi dernier à 13h`, `vendredi dernier à 13h30`, `vendredi dernier à 13:30`, `ce lundi à 14h30`, `ce mercredi entre 9 et 11 heures` |
| Spanish | `el viernes pasado a las 3`, `el viernes pasado a las 3:30`, `el próximo lunes a las 9:30`, `el pasado viernes entre las 9 y las 12` |
Combined expressions resolve to either a `Point` (day + time spec) or a `Range` (day + time range) on the specified day.
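Once the day is known, a combined expression is plain offset arithmetic from that day's local midnight: a time specification adds hours and minutes to produce a `Point`, and a time range yields two such points. The std-only sketch below assumes a fixed UTC offset (no DST) and Unix-second timestamps; `local_day_start`, `day_at` and `day_between` are hypothetical names.

```rust
const SECS_PER_DAY: i64 = 86_400;

/// UTC start (Unix seconds) of the local calendar day `day_offset` days from
/// `now_utc`, for a zone with a fixed UTC offset (assumption: no DST).
fn local_day_start(now_utc: i64, offset_secs: i64, day_offset: i64) -> i64 {
    let local_now = now_utc + offset_secs;
    let local_midnight = local_now - local_now.rem_euclid(SECS_PER_DAY);
    local_midnight + day_offset * SECS_PER_DAY - offset_secs
}

/// "<day> at H:MM" -> a single point in time.
fn day_at(day_start: i64, hour: i64, minute: i64) -> i64 {
    day_start + hour * 3600 + minute * 60
}

/// "<day> between A and B" / "<day> from A to B" -> half-open range [A, B).
/// Inverted ranges are rejected rather than reinterpreted.
fn day_between(day_start: i64, from_hour: i64, to_hour: i64) -> Option<(i64, i64)> {
    (from_hour < to_hour).then(|| (day_at(day_start, from_hour, 0), day_at(day_start, to_hour, 0)))
}

fn main() {
    // 2026-02-08T12:00:00Z as Unix seconds (20_492 days after 1970-01-01).
    let now = 20_492 * SECS_PER_DAY + 12 * 3600;
    let berlin_winter = 3600; // CET, UTC+1

    // "yesterday at 3pm" in Berlin: 15:00 local, i.e. 14:00 UTC.
    let yesterday = local_day_start(now, berlin_winter, -1);
    println!("point: {}", day_at(yesterday, 15, 0));

    // "yesterday from 9 to 11" in Berlin: 08:00..10:00 UTC.
    println!("range: {:?}", day_between(yesterday, 9, 11));
}
```

How `clockwords` itself interprets ranges whose end hour is smaller than the start hour (such as `from nine to five`) is not covered by this sketch.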
## Architecture
### How Scanning Works
```
     Input text
          │
          ▼
┌─────────────────────┐
│ Aho-Corasick        │  Fast keyword check (~ns)
│ Prefilter           │  Rejects text with no time words
└─────────┬───────────┘
          │ keywords found
          ▼
┌─────────────────────┐
│ Per-Language        │  Regex rules with resolver closures
│ Grammar Rules       │  Run for each enabled language
└─────────┬───────────┘
          │ raw matches
          ▼
┌─────────────────────┐
│ Deduplication       │  Prefer Complete > Partial, longer > shorter
│ & Sorting           │  Remove overlapping inferior matches
└─────────┬───────────┘
          │
          ▼
   Vec<TimeMatch>
```
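The deduplication stage can be read as a greedy selection: rank candidates by the stated preferences, then keep each candidate that does not overlap one already kept, and finally sort by position. This std-only sketch is our reading of the diagram, not the crate's code; `RawMatch` and `dedup` are hypothetical, and ranking Complete strictly before length (so a short Complete match beats a longer overlapping Partial) is an assumption.

```rust
#[derive(Clone, Debug, PartialEq)]
struct RawMatch {
    start: usize, // byte offset, inclusive
    end: usize,   // byte offset, exclusive
    complete: bool,
}

/// Half-open byte ranges overlap if each starts before the other ends.
fn overlaps(a: &RawMatch, b: &RawMatch) -> bool {
    a.start < b.end && b.start < a.end
}

/// Keep the best candidate in every overlapping cluster, sorted by start.
fn dedup(mut raw: Vec<RawMatch>) -> Vec<RawMatch> {
    raw.sort_by(|a, b| {
        b.complete
            .cmp(&a.complete) // Complete (true) before Partial (false)
            .then((b.end - b.start).cmp(&(a.end - a.start))) // longer first
            .then(a.start.cmp(&b.start)) // earlier first on ties
    });
    let mut kept: Vec<RawMatch> = Vec::new();
    for m in raw {
        if !kept.iter().any(|k| overlaps(k, &m)) {
            kept.push(m);
        }
    }
    kept.sort_by_key(|m| m.start);
    kept
}

fn main() {
    // Candidates a grammar pass might yield for "yesterday at 3pm":
    // "yesterday", "at 3pm", and the combined "yesterday at 3pm".
    let raw = vec![
        RawMatch { start: 0, end: 9, complete: true },
        RawMatch { start: 10, end: 16, complete: true },
        RawMatch { start: 0, end: 16, complete: true },
    ];
    println!("{:?}", dedup(raw));
}
```

The overlap check is quadratic in the number of candidates, which is harmless for GUI-sized buffers.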
### Buffer-Rescan Strategy
Rather than maintaining an incremental parser state machine, `clockwords` re-scans the full text buffer on every call to `scan()`. This is the right trade-off for GUI text input:
- Input buffers are typically < 1 KB
- Full regex scan of a short buffer completes in microseconds
- Dramatically simpler than maintaining parser state across edits
- No edge cases around cursor position, insertions, or deletions
### Type Overview
| Type | Description |
|---|---|
| `TimeExpressionScanner` | Main entry point; holds the language parsers and the prefilter |
| `TimeMatch` | A single match result: span + confidence + resolved time + kind |
| `Span` | Byte-offset range (`start..end`) for slicing the original text |
| `ResolvedTime` | `Point(DateTime<Utc>)` or `Range { start, end }` |
| `MatchConfidence` | `Partial` (user still typing) or `Complete` |
| `ExpressionKind` | `RelativeDay`, `RelativeDayOffset`, `TimeSpecification`, `TimeRange`, `Combined` |
| `ParserConfig` | Settings: `report_partial` (default `true`), `max_matches` (default `10`), `timezone` (default `Tz::UTC`) |
| `Tz` | Re-exported from `chrono-tz`: IANA timezone (e.g. `Tz::Europe__Berlin`, `Tz::US__Eastern`) |
## GUI Integration
`clockwords` is designed for real-time text highlighting. Here's how to wire it up:
```rust
use clockwords::{default_scanner, MatchConfidence, TimeExpressionScanner};
use chrono::Utc;

struct App {
    scanner: TimeExpressionScanner,
}

impl App {
    fn new() -> Self {
        Self {
            scanner: default_scanner(),
        }
    }

    /// Call this on every keystroke
    fn on_text_changed(&self, text: &str) {
        let matches = self.scanner.scan(text, Utc::now());
        for m in &matches {
            // Byte offsets into `text`; convert them if your widget
            // indexes by chars or UTF-16 code units instead.
            let range = m.span.start..m.span.end;
            let style = match m.confidence {
                MatchConfidence::Complete => "solid_underline",
                MatchConfidence::Partial => "dotted_underline",
            };
            // Apply `style` to this range in your text widget
            println!("Highlight bytes {range:?} with {style}");
        }
    }
}
```
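Spans are byte offsets, while many widgets index text by characters or by UTF-16 code units (JavaScript strings, for example). The std-only helpers below are a hypothetical sketch of that conversion; they return `None` instead of panicking when a bound does not fall on a UTF-8 character boundary.

```rust
/// Convert a byte range into a char-index range.
fn byte_range_to_char_range(text: &str, start: usize, end: usize) -> Option<(usize, usize)> {
    if start > end || !text.is_char_boundary(start) || !text.is_char_boundary(end) {
        return None;
    }
    let char_start = text[..start].chars().count();
    let char_len = text[start..end].chars().count();
    Some((char_start, char_start + char_len))
}

/// Convert a byte range into a UTF-16 code-unit range.
fn byte_range_to_utf16_range(text: &str, start: usize, end: usize) -> Option<(usize, usize)> {
    if start > end || !text.is_char_boundary(start) || !text.is_char_boundary(end) {
        return None;
    }
    let u16_start = text[..start].encode_utf16().count();
    let u16_len = text[start..end].encode_utf16().count();
    Some((u16_start, u16_start + u16_len))
}

fn main() {
    // "😀" is 4 bytes, 1 char and 2 UTF-16 units; "hoy" sits at bytes 5..8.
    let text = "😀 hoy";
    println!("chars: {:?}", byte_range_to_char_range(text, 5, 8));
    println!("utf16: {:?}", byte_range_to_utf16_range(text, 5, 8));
}
```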
### Partial Match Highlighting
When the user types `"I worked yester"`, the scanner returns a **Partial** match on `"yester"`. Your GUI can show a dimmed or dotted underline to hint that a time expression is being formed. Once the user completes `"yesterday"`, the match upgrades to **Complete** with a fully resolved time.
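One way to detect such partial input is to check whether the last word of the buffer is a proper prefix, at least three characters long, of a known keyword. The std-only sketch below is hypothetical: `trailing_partial` is not a `clockwords` function, and only looking at the end of the buffer (where the cursor usually is) is an assumption.

```rust
/// Byte span of a trailing partial keyword, e.g. "yester" in "I worked yester".
/// Keywords are assumed to be lowercase.
fn trailing_partial(text: &str, keywords: &[&str]) -> Option<(usize, usize)> {
    // Byte offset just after the last whitespace character (0 if none).
    let start = text
        .char_indices()
        .rev()
        .find(|&(_, c)| c.is_whitespace())
        .map_or(0, |(i, c)| i + c.len_utf8());
    let lower = text[start..].to_lowercase();
    if lower.chars().count() < 3 {
        return None;
    }
    // Proper prefix only: a full keyword would be a Complete match instead.
    let is_partial = keywords
        .iter()
        .any(|k| k.len() > lower.len() && k.starts_with(lower.as_str()));
    is_partial.then_some((start, text.len()))
}

fn main() {
    let keywords = ["yesterday", "today", "tomorrow"];
    for text in ["I worked yester", "I worked yesterday", "I worked ye"] {
        println!("{text:?} -> {:?}", trailing_partial(text, &keywords));
    }
}
```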
To disable partial matching:
```rust
use clockwords::{ParserConfig, TimeExpressionScanner};

let config = ParserConfig {
    report_partial: false,
    ..Default::default()
};
```
## Adding a New Language
1. Create `src/lang/xx.rs` (copy an existing language file as a template)
2. Implement the `LanguageParser` trait:
   - `lang_id()`: return the ISO 639-1 code (e.g. `"it"`)
   - `keywords()`: return Aho-Corasick trigger words
   - `keyword_prefixes()`: return typing prefixes (length >= 3)
   - `parse()`: call `apply_rules()` with your `GrammarRule` list
3. Add number-word mappings to `src/lang/numbers.rs`
4. Register the language in `src/lib.rs` → `scanner_for_languages()`
5. Add tests in `tests/`
Each `GrammarRule` is a compiled regex paired with a resolver closure:
```rust
GrammarRule {
    pattern: Regex::new(r"(?i)\b(?P<day>oggi|domani|ieri)\b").unwrap(),
    kind: ExpressionKind::RelativeDay,
    resolver: |caps, now, tz| {
        let offset = match caps.name("day")?.as_str().to_lowercase().as_str() {
            "oggi" => 0,
            "domani" => 1,
            "ieri" => -1,
            _ => return None,
        };
        resolve::resolve_relative_day(offset, now, tz)
    },
}
```
## Performance
| Scenario | Typical time |
|---|---|
| No keywords in text (fast rejection) | ~1 µs |
| Short sentence with 1 match | ~10 µs |
| Paragraph with multiple matches | ~10 µs |
The Aho-Corasick prefilter means that text without any time-related words is rejected in microseconds — the regex engine is never invoked.
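The prefilter idea can be sketched without dependencies. This version is a naive stand-in, not the crate's implementation: it checks each keyword separately (O(keywords × text length)) with ASCII-only case folding, whereas Aho-Corasick finds all keywords in one pass. `may_contain_time` and the keyword list are hypothetical. False positives such as `hoy` inside `Ahoy` are harmless because the word-boundary-anchored regex rules do the real matching; false negatives are not, so the keyword list must cover every rule.

```rust
/// Cheap gate before running any regex: does `text` contain at least one
/// trigger keyword? ASCII case-insensitive; allocates nothing.
fn may_contain_time(text: &str, keywords: &[&str]) -> bool {
    let hay = text.as_bytes();
    keywords.iter().any(|k| {
        let needle = k.as_bytes();
        // `windows(0)` would panic, so skip empty keywords.
        !needle.is_empty() && hay.windows(needle.len()).any(|w| w.eq_ignore_ascii_case(needle))
    })
}

fn main() {
    let keywords = ["yesterday", "today", "tomorrow", "gestern", "heute", "hier", "ayer", "hoy"];
    for text in ["Fixed the build", "Worked on it yesterday", "Ahoy there"] {
        println!("{text:?}: run regex rules = {}", may_contain_time(text, &keywords));
    }
}
```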
## Running Tests
```bash
cargo test
```
The test suite includes **141 integration tests + 1 doctest** covering:
- All four languages with various expression types
- Combined weekday + time expressions across all languages
- Timezone-aware resolution (Europe/Berlin, US/Eastern, UTC)
- Cross-midnight timezone boundary handling
- Accent-tolerant variants (with and without diacritics)
- Embedded expressions in longer sentences
- Colon-delimited time parsing (`3:30pm`, `15:30`, `13h30`, `13:30`)
- `from X to Y` with number words (`nine to five`)
- Incremental/partial matching
- Edge cases (empty input, no false positives)
- Cross-language default scanner
## Running the TUI Demo
An interactive terminal demo is included:
```bash
cargo run --example tui_demo
```
Type time expressions and watch them get parsed in real time. Press **ESC** to quit.
## Dependencies
| Crate | Purpose |
|---|---|
| [`chrono`](https://crates.io/crates/chrono) | Date/time types and arithmetic |
| [`chrono-tz`](https://crates.io/crates/chrono-tz) | IANA timezone database for timezone-aware resolution |
| [`regex`](https://crates.io/crates/regex) | Per-language grammar patterns |
| [`aho-corasick`](https://crates.io/crates/aho-corasick) | Fast multi-keyword prefilter |
## License
Licensed under the Apache License, Version 2.0 ([LICENSE](LICENSE) or <http://www.apache.org/licenses/LICENSE-2.0>).