# markdown-org-extract
[](https://crates.io/crates/markdown-org-extract)
[](https://github.com/VitalyOstanin/markdown-org-extract/actions/workflows/ci.yml?query=branch%3Amaster)
[](https://github.com/VitalyOstanin/markdown-org-extract/blob/master/LICENSE)
CLI utility for extracting tasks from markdown files with support for Emacs Org-mode markers.
## Table of contents
- [Installation and build](#installation-and-build)
- [Usage](#usage)
- [Example files](#example-files)
- [Agenda modes](#agenda-modes)
- [Supported markers](#supported-markers)
- [Locale support](#locale-support)
- [Output format](#output-format)
- [Repeating tasks](#repeating-tasks)
- [Project layout](#project-layout)
- [Dependencies](#dependencies)
- [License](#license)
## Installation and build
### Requirements
- Rust 1.85 or newer (the `comrak` 0.50+ upgrade requires the 2024 edition)
- Cargo
### Install from crates.io
If you only need the binary and do not want to clone the repository:
```bash
cargo install markdown-org-extract
```
After installation the binary lands in `~/.cargo/bin/markdown-org-extract`
(this path must be on your `PATH`).
### Building the project
Debug build:
```bash
cargo build
```
Optimised release build:
```bash
cargo build --release
```
The resulting binary appears in:
- Debug: `target/debug/markdown-org-extract`
- Release: `target/release/markdown-org-extract`
### Running
After building, run the utility:
```bash
# Debug build
./target/debug/markdown-org-extract [OPTIONS]
# Release build
./target/release/markdown-org-extract [OPTIONS]
```
Or use cargo to run it without an explicit build step:
```bash
cargo run -- [OPTIONS]
```
### Testing
Run the test suite:
```bash
cargo test
```
Run with verbose output:
```bash
cargo test -- --nocapture
```
Static checks:
```bash
cargo check
cargo clippy
```
#### Workday-handling test coverage
`holidays` module:
- Loading the holiday calendar
- Distinguishing regular weekends and working days
- 2025 New Year holidays (1–8 January) and 2026 (1–9 January)
- 2026 holiday shifts (8 March → 9 March, 9 May → 11 May)
- Skipping weekends and holidays when locating the next working day
`timestamp::repeater` module:
- Parsing repeaters `+1wd`, `+2wd`, `++1wd`, `.+1wd`
- Computing the next occurrence over working days
- Skipping holidays in repeater arithmetic
`timestamp::parser` module:
- Parsing timestamps that carry `+1wd` and `+2wd`
## Usage
```bash
markdown-org-extract [OPTIONS]
```
### Options
- `--dir <DIR>` — directory to scan (default: `.`)
- `--glob <GLOB>` — file filter pattern (default: `*.md`)
- `--format <FORMAT>` — output format: `json`, `md`, `html` (default: `json`)
- `--output <OUTPUT>` — file to write the result to; `-` means stdout (default: stdout)
- `--locale <LOCALE>` — weekday locales, comma-separated (default: `ru,en`)
- `--agenda <MODE>` — agenda mode: `day`, `week`, `month`, `tasks` (default: `day`)
- `--tasks` — show all TODO tasks sorted by priority (alias for `--agenda tasks`)
- `--date <DATE>` — date for `day` mode in `YYYY-MM-DD` (default: today)
- `--from <DATE>` — start date for `week`/`month` mode in `YYYY-MM-DD` (default: Monday of the current week / first day of the current month)
- `--to <DATE>` — end date for `week`/`month` mode in `YYYY-MM-DD` (default: Sunday of the current week / last day of the current month)
- `--tz <TIMEZONE>` — IANA timezone for determining the current date (default: `Europe/Moscow`)
- `--current-date <DATE>` — explicit current date for overdue calculation in `YYYY-MM-DD` (default: today in the configured timezone)
- `--holidays <YEAR>` — print the holiday list for the given year (1900–2100) as JSON
- `--absolute-paths` — emit absolute file paths instead of paths relative to `--dir`
- `--max-tasks <N>` — task limit (1..=10_000_000, default 10_000). Applied both as a per-file cap and as a global ceiling
- `-v`, `--verbose` — verbose stderr log (`-v` = info, `-vv` = debug, `-vvv` = trace). Mutually exclusive with `--quiet`
- `-q`, `--quiet` — suppress all diagnostic messages except critical errors
- `--color <MODE>` — control ANSI colour in logs: `auto` (default), `always`, `never`
- `--no-color` — disable ANSI colour in logs; equivalent to `--color never`. The `NO_COLOR` environment variable has the same effect (see [no-color.org](https://no-color.org))
### Examples
Extract tasks from the current directory as JSON:
```bash
markdown-org-extract
```
Extract tasks from a specific directory:
```bash
markdown-org-extract --dir ./notes
```
Save the result to an HTML file:
```bash
markdown-org-extract --dir ./notes --format html --output agenda.html
```
Emit markdown:
```bash
markdown-org-extract --dir ./notes --format md
```
Run against the bundled examples:
```bash
markdown-org-extract --dir ./examples
markdown-org-extract --dir ./examples --format md
markdown-org-extract --dir ./examples --format html --output examples-agenda.html
```
Use only Russian weekday names:
```bash
markdown-org-extract --dir ./notes --locale ru
```
Use only English weekday names:
```bash
markdown-org-extract --dir ./notes --locale en
```
#### Agenda examples
Today's tasks (default):
```bash
markdown-org-extract --dir ./notes
```
Tasks for a specific date:
```bash
markdown-org-extract --dir ./notes --agenda day --date 2025-12-10
```
Retrieve the holiday list for a year:
```bash
markdown-org-extract --holidays 2025
markdown-org-extract --holidays 2026
```
Sample holiday output:
```json
[
"2025-01-01",
"2025-01-02",
"2025-01-03",
"2025-01-04",
"2025-01-05",
"2025-01-06",
"2025-01-07",
"2025-01-08",
"2025-02-23",
"2025-03-08",
"2025-05-01",
"2025-05-09",
"2025-06-12",
"2025-11-04"
]
```
Tasks for the current week:
```bash
markdown-org-extract --dir ./notes --agenda week
```
Tasks for the current month:
```bash
markdown-org-extract --dir ./notes --agenda month
```
Tasks across a date range:
```bash
markdown-org-extract --dir ./notes --agenda week --from 2025-12-01 --to 2025-12-07
markdown-org-extract --dir ./notes --agenda month --from 2025-12-01 --to 2025-12-31
```
All TODO tasks sorted by priority:
```bash
markdown-org-extract --dir ./notes --tasks
```
Use a different timezone:
```bash
markdown-org-extract --dir ./notes --tz UTC
markdown-org-extract --dir ./notes --tz America/New_York
```
Use an explicit current date (useful for tests and deterministic output):
```bash
markdown-org-extract --dir ./notes --agenda week --current-date 2024-12-05
```
Cap the number of extracted tasks (useful for batch processing of very large trees):
```bash
markdown-org-extract --dir ./notes --max-tasks 1000
```
Enable verbose processing logs on stderr:
```bash
markdown-org-extract --dir ./notes -v
```
## Example files
The `examples/` directory contains markdown files with various markers.
The integration tests in `tests/cli.rs` exercise the same files.
General scenarios:
- `project-tasks.md` — project development tasks
- `personal-notes.md` — personal notes and tasks
- `meeting-notes.md` — meeting notes
- `work-log.md` — mixed log with SCHEDULED, DEADLINE, and CLOCK entries
Org-mode marker demonstrations:
- `priorities.md` — tasks with priorities `[#A]`, `[#B]`, `[#C]`
- `org-mode-timestamps.md` — timestamp forms, ranges, and repeaters
- `created-test.md` — using `CREATED:` for the creation date
- `workdays-test.md` — workday repeaters (`+1wd`, `+2wd`) interacting
with the holiday calendar
CLOCK-block demonstrations (time tracking):
- `clock-formats.md` — every supported CLOCK line form
- `clock-inline.md` — CLOCK inside inline code (`` `CLOCK: ...` ``)
- `clock-test.md` — closed CLOCK intervals with `=> HH:MM`
- `simple-clock.md` — CLOCK inside fenced code blocks
- `done-clock.md` — CLOCK attached to a DONE task (post-completion accounting)
Try running:
```bash
./target/release/markdown-org-extract --dir ./examples --format md
```
## Agenda modes
The utility supports four task-listing modes, mirroring Emacs Org-mode:
### day — tasks for a single day
Shows tasks whose timestamps (SCHEDULED, DEADLINE) fall on the given date.
The default is today in the configured timezone.
```bash
# Today's tasks
markdown-org-extract --agenda day
# Tasks for a specific date
markdown-org-extract --agenda day --date 2025-12-10
```
### week — tasks for a week
Shows tasks whose timestamps fall within a date range. The default is the
current week (Monday–Sunday).
Each day lists:
- Tasks scheduled for that day (scheduled)
- Upcoming tasks relative to that day (upcoming)
- Overdue tasks (overdue) — only for the current date
```bash
# Current week
markdown-org-extract --agenda week
# Explicit range
markdown-org-extract --agenda week --from 2025-12-01 --to 2025-12-07
```
### month — tasks for a month
Shows tasks whose timestamps fall within a date range. The default is the
current month (first to last day).
Behaves the same way as `week` — each day surfaces scheduled, upcoming,
and overdue tasks.
```bash
# Current month
markdown-org-extract --agenda month
# Explicit range
markdown-org-extract --agenda month --from 2025-12-01 --to 2025-12-31
```
### tasks — all TODO tasks
Lists every task whose state is TODO, sorted by priority
(A → B → C → no priority). Timestamps are ignored.
```bash
# All TODO tasks by priority
markdown-org-extract --tasks
```
### Timezones
The `--tz` option controls which timezone is used to derive the current
date and current week. All standard IANA timezones are accepted.
```bash
# Moscow time (default)
markdown-org-extract --agenda day --tz Europe/Moscow
# UTC
markdown-org-extract --agenda day --tz UTC
# New York
markdown-org-extract --agenda day --tz America/New_York
```
## Supported markers
### Task markers
The utility recognises TODO and DONE markers in headings:
```markdown
### TODO Implement feature
### DONE Complete task
```
### Task priorities
Priorities follow the org-mode convention (letters A–Z inside square brackets):
```markdown
### TODO [#A] Critical task
### TODO [#B] Important task
### TODO [#C] Regular task
### DONE [#A] Completed high-priority task
```
The priority appears after the TODO/DONE marker and before the task text.
The most common priorities are:
- `[#A]` — high priority (critical tasks)
- `[#B]` — medium priority (important tasks)
- `[#C]` — low priority (regular tasks)
Priority is optional.
### Timestamps
Timestamps must be wrapped in backticks:
**Simple timestamp:**
```markdown
`<2024-12-10 Mon 10:00-12:00>`
```
**Planning markers:**
```markdown
`CREATED: <2024-12-01 Mon>`
`DEADLINE: <2024-12-15 Sun>`
`SCHEDULED: <2024-12-05 Wed>`
`CLOSED: <2024-12-01 Mon>`
```
**Date range:**
```markdown
`<2024-12-20 Mon>--<2024-12-22 Wed>`
```
**Inactive timestamps (NOT extracted):**
```markdown
`[2024-12-10 Mon]` — square brackets denote an inactive timestamp
```
**Note:** `CREATED` is extracted separately from the other timestamps and
stored in the `created` field. This lets consumers track the task
creation date independently of SCHEDULED, DEADLINE, and CLOSED.
### Time tracking (CLOCK)
The utility supports CLOCK entries for tracking time spent on tasks,
mirroring Emacs Org-mode.
**CLOCK format inside backticks (same as timestamps):**
```markdown
### TODO Implement feature
`SCHEDULED: <2024-12-10 Tue>`
`CLOCK: <2024-12-09 Mon 10:00>--<2024-12-09 Mon 12:30> => 2:30`
`CLOCK: <2024-12-09 Mon 14:00>--<2024-12-09 Mon 16:15> => 2:15`
```
**Alternative format inside code blocks (as in org-mode):**
```markdown
### TODO Implement feature
`SCHEDULED: <2024-12-10 Tue>`
```
CLOCK: [2024-12-09 Mon 10:00]--[2024-12-09 Mon 12:30] => 2:30
CLOCK: [2024-12-09 Mon 14:00]--[2024-12-09 Mon 16:15] => 2:15
```
```
**Open CLOCK entry (active work):**
```markdown
`CLOCK: <2024-12-10 Tue 09:00>`
```
**Features:**
- Automatic extraction of every CLOCK entry under a heading
- Total time (`total_clock_time`) summed across all entries
- Open (active) CLOCK entries without a close time
- Rendering in JSON, Markdown, and HTML
- Both square `[...]` (org-mode style) and angle `<...>` brackets are accepted
**Sample JSON output:**
```json
{
"heading": "Implement feature",
"clocks": [
{
"start": "2024-12-09 Mon 10:00",
"end": "2024-12-09 Mon 12:30",
"duration": "2:30"
},
{
"start": "2024-12-09 Mon 14:00",
"end": "2024-12-09 Mon 16:15",
"duration": "2:15"
}
],
"total_clock_time": "4:45"
}
```
**Sample Markdown output:**
```markdown
## Implement feature
**Total Time:** 4:45
**Clock:**
- 2024-12-09 Mon 10:00 → 2024-12-09 Mon 12:30 (2:30)
- 2024-12-09 Mon 14:00 → 2024-12-09 Mon 16:15 (2:15)
```
## Locale support
The utility recognises weekday names in different languages via the
`--locale` option.
### Supported locales
- `en` — English (Mon, Tue, Wed, Thu, Fri, Sat, Sun, Monday, Tuesday, ...)
- `ru` — Russian (Пн, Вт, Ср, Чт, Пт, Сб, Вс, Понедельник, Вторник, ...)
The default is both locales: `--locale ru,en`.
### Russian-weekday examples
```markdown
### TODO Встреча
`<2024-12-10 Пн 10:00>`
### Конференция
`<2024-12-20 Понедельник>--<2024-12-22 Среда>`
### TODO Задача
`DEADLINE: <2024-12-15 Вс>`
```
Russian weekday names are normalised to the English form during extraction.
## Output format
The output format depends on the agenda mode.
### `--tasks` mode (task list)
#### JSON
Optional fields (`priority`, `created`, `timestamp_time`,
`timestamp_end_time`, `clocks`, `total_clock_time`, `task_type`) are
omitted when absent rather than serialised as `null`. This matches the
`#[serde(skip_serializing_if = "Option::is_none")]` convention used in
`src/types.rs`.
Example below is the actual output of
`--dir examples --glob 'project-tasks.md' --tasks --max-tasks 1
--current-date 2025-12-05`.
```json
[
{
"file": "project-tasks.md",
"line": 5,
"heading": "Design database schema",
"content": "Need to finalize the database structure before implementation.",
"task_type": "TODO",
"priority": "A",
"timestamp": "SCHEDULED: <2024-12-05 Wed>",
"timestamp_type": "SCHEDULED",
"timestamp_date": "2024-12-05"
}
]
```
#### Markdown
```markdown
# Tasks
## Design database schema
**File:** `project-tasks.md:5`
**Type:** TODO
**Priority:** A
**Time:** `SCHEDULED: <2024-12-05 Wed>`
Need to finalize the database structure before implementation.
```
### `--agenda day` and `--agenda week` modes (day-grouped agenda)
In these modes tasks are grouped by day. Each day contains task
categories (in display order):
1. **Overdue** (only for the current date) — overdue tasks, oldest first
2. **Scheduled (with time)** — that day's tasks with a time, earliest first
3. **Scheduled (no time)** — that day's tasks without a time
4. **Upcoming** — upcoming tasks relative to that day, nearest first
**Important:** Each day shows upcoming tasks relative to that day, not
relative to a global reference date.
#### JSON
File paths are emitted relative to `--dir` (or absolute when
`--absolute-paths` is set). Optional fields are omitted when absent, as
in `--tasks` mode.
```json
[
{
"date": "2025-12-05",
"overdue": [
{
"file": "project-tasks.md",
"line": 5,
"heading": "Design database schema",
"content": "Need to finalize the database structure before implementation.",
"task_type": "TODO",
"priority": "A",
"timestamp": "SCHEDULED: <2024-12-05 Wed>",
"timestamp_type": "SCHEDULED",
"timestamp_date": "2024-12-05",
"days_offset": -365
}
],
"scheduled_timed": [],
"scheduled_no_time": [],
"upcoming": [
{
"file": "project-tasks.md",
"line": 47,
"heading": "Review pull request #42",
"content": "Critical bug fix needs review.",
"task_type": "TODO",
"timestamp": "DEADLINE: <2025-12-06 Sat>",
"timestamp_type": "DEADLINE",
"timestamp_date": "2025-12-06",
"days_offset": 1
}
]
}
]
```
The `days_offset` field encodes:
- Positive number — days until the deadline (upcoming)
- Negative number — days the task is overdue
- Absent for tasks belonging to the day itself (scheduled)
#### Markdown
File paths and timestamps are wrapped in inline code (`` `...` ``) to
preserve formatting. `Type:` uses `TODO` / `DONE` (not `Todo` / `Done`);
`Priority:` is shown as a bare letter without the `[#]` wrapper.
```markdown
# Agenda
## 2025-12-05
### Overdue
#### Design database schema (365 days ago)
**File:** `project-tasks.md:5`
**Type:** TODO
**Priority:** A
**Time:** `SCHEDULED: <2024-12-05 Wed>`
Need to finalize the database structure before implementation.
### Scheduled
#### Daily standup
**File:** `project-tasks.md:33`
**Time:** `<2025-12-05 Friday 09:00-09:15>`
Daily standup meeting.
### Upcoming
#### Review pull request \#42 (in 1 days)
**File:** `project-tasks.md:47`
**Type:** TODO
**Time:** `DEADLINE: <2025-12-06 Sat>`
Critical bug fix needs review.
```
#### Parsed timestamp fields
To let downstream consumers render agendas without re-parsing the
`timestamp` string, the timestamp is split into structured fields:
- `timestamp_type` — `SCHEDULED`, `DEADLINE`, `CLOSED`, or `PLAIN`
- `timestamp_date` — date as `YYYY-MM-DD`
- `timestamp_time` — start time, e.g. `10:00` (when present)
- `timestamp_end_time` — end time, e.g. `12:00` (when a range was given)
## Repeating tasks
The utility honours org-mode repeater syntax for automatically scheduling
follow-up occurrences.
### Repeater kinds
Every standard org-mode unit is supported:
- `+Nh` — every N hours
- `+Nd` — every N days (strict; preserves the original date offset)
- `+Nw` — every N weeks
- `+Nm` — every N months
- `+Ny` — every N years
- `+Nwd` — **every N working days** (project extension; honours RF
holidays and weekends)
Repeater modifiers:
- `+` — strict (cumulative); preserves the date offset
- `++` — catch-up (smart); preserves the weekday
- `.+` — restart-from-completion (relative to the close date)
### Working days
Repeaters with the `wd` (workday) suffix take into account:
- Regular weekends (Saturday, Sunday)
- Official RF holidays
- Holiday shifts
Holiday data lives in `holidays_ru.json`. At build time (`build.rs`) the
data is compiled into static Rust constants — the JSON is parsed once
during compilation rather than at runtime.
### Examples
```markdown
### TODO Hourly check
`SCHEDULED: <2025-12-05 Thu 10:00 +1h>`
### TODO Daily task
`SCHEDULED: <2025-12-05 Thu +1d>`
### TODO Weekly meeting
`SCHEDULED: <2025-12-05 Thu +1w>`
### TODO Monthly report
`SCHEDULED: <2025-12-05 Thu +1m>`
### TODO Annual review
`SCHEDULED: <2025-12-05 Thu +1y>`
### TODO Workday-only task
`SCHEDULED: <2025-12-05 Thu +1wd>`
### TODO Every two working days
`SCHEDULED: <2025-12-05 Thu +2wd>`
```
## Project layout
```
markdown-org-extract/
├── src/
│ ├── main.rs # CLI entry point, file walker, file I/O
│ ├── cli.rs # Argument parsing (clap), tracing init
│ ├── agenda.rs # Agenda logic (day/week/month), repeaters
│ ├── parser.rs # Task extraction from the markdown AST
│ ├── render.rs # Markdown/HTML rendering
│ ├── format.rs # OutputFormat (clap ValueEnum)
│ ├── error.rs # AppError
│ ├── types.rs # Task / Priority / DayAgenda / ProcessingStats
│ ├── clock.rs # CLOCK parsing and time aggregation
│ ├── holidays.rs # RF workday calendar (singleton, binary search)
│ ├── regex_limits.rs # `compile_bounded`: regex with size/DFA caps
│ └── timestamp/ # Org-mode timestamp parsing
│ ├── parser.rs # <2024-12-05 Thu 10:00 +1d> → ParsedTimestamp
│ ├── extract.rs # pull timestamp/CREATED out of arbitrary text
│ ├── repeater.rs # parsing and arithmetic of repeaters (+1d, ++2w, .+1wd…)
│ └── weekdays.rs # normalisation of Russian weekday names
├── tests/
│ └── cli.rs # CLI integration tests (assert_cmd)
├── examples/ # Sample markdown files
├── docs/ # Supplementary documentation
├── holidays_ru.json # RF holiday / workday calendar
├── build.rs # Generates holidays_data.rs at build time
├── rustfmt.toml # Formatter settings (edition 2021, width 100)
├── rust-toolchain.toml # Pinned channel = stable, components rustfmt+clippy
├── .github/workflows/
│ ├── ci.yml # PR/push CI: lint + test matrix (Linux/macOS/Windows) + cargo audit
│ ├── release.yml # Publish to crates.io on tag v* (+ workflow_dispatch)
│ └── outdated.yml # Weekly non-blocking `cargo outdated`
├── Cargo.toml
├── CHANGELOG.md
├── TODO.md # Deferred technical tasks
├── LICENSE # MIT
└── README.md
```
See also:
- [docs/CLOCK_IMPLEMENTATION.md](docs/CLOCK_IMPLEMENTATION.md) — CLOCK marker implementation details
- [docs/org-mode-keywords.md](docs/org-mode-keywords.md) — supported-keyword reference
- [CHANGELOG.md](CHANGELOG.md) — version history
- [TODO.md](TODO.md) — deferred technical tasks
## Dependencies
- `clap` — command-line argument parsing
- `comrak` — markdown parsing (without onig/syntect: `default-features = false`)
- `regex` — regular expressions (with size/DFA caps)
- `serde` / `serde_json` — data serialisation
- `chrono` / `chrono-tz` — dates and timezones
- `grep-regex` / `grep-searcher` — fast pre-filter over keywords
- `ignore` — directory tree walk that honours `.gitignore`
- `globset` — glob compilation for `--glob`
- `tracing` / `tracing-subscriber` — structured diagnostic logging (`--verbose`, `--quiet`, `--color`, `--no-color`)
Lazily initialised `static` regular expressions use `std::sync::LazyLock`
from the standard library (Rust 1.80+; the project itself requires 1.85).
## License
MIT — see the [LICENSE](LICENSE) file.
### Provenance of `holidays_ru.json`
The holiday calendar and weekend-shift table in `holidays_ru.json` was
compiled by the project author from the official RF government decrees
on weekend rescheduling. This is public factual information and is not
subject to copyright. For packaging convenience the file is distributed
under the same MIT licence as the rest of the code.
Attribution and a schema description are duplicated inside the file
itself under the `_meta` key (`build.rs` ignores underscore-prefixed
top-level keys, so the block has no effect on the compiled output).