markdown-org-extract
CLI utility for extracting tasks from markdown files with support for Emacs Org-mode markers.
Table of contents
- Installation and build
- Usage
- Example files
- Agenda modes
- Supported markers
- Locale support
- Output format
- Repeating tasks
- Project layout
- Dependencies
- License
Installation and build
Requirements
- Rust 1.85 or newer (the
comrak0.50+ upgrade requires the 2024 edition) - Cargo
Install from crates.io
If you only need the binary and do not want to clone the repository:
After installation the binary lands in ~/.cargo/bin/markdown-org-extract
(this path must be on your PATH).
Building the project
Debug build:
Optimised release build:
The resulting binary appears in:
- Debug:
target/debug/markdown-org-extract - Release:
target/release/markdown-org-extract
Running
After building, run the utility:
# Debug build
# Release build
Or use cargo to run it without an explicit build step:
Testing
Run the test suite:
Run with verbose output:
Static checks:
Workday-handling test coverage
holidays module:
- Loading the holiday calendar
- Distinguishing regular weekends and working days
- 2025 New Year holidays (1–8 January) and 2026 (1–9 January)
- 2026 holiday shifts (8 March → 9 March, 9 May → 11 May)
- Skipping weekends and holidays when locating the next working day
timestamp::repeater module:
- Parsing repeaters
+1wd,+2wd,++1wd,.+1wd - Computing the next occurrence over working days
- Skipping holidays in repeater arithmetic
timestamp::parser module:
- Parsing timestamps that carry
+1wdand+2wd
Usage
Options
--dir <DIR>— directory to scan (default:.)--glob <GLOB>— file filter pattern (default:*.md)--format <FORMAT>— output format:json,md,html(default:json)--output <OUTPUT>— file to write the result to;-means stdout (default: stdout)--locale <LOCALE>— weekday locales, comma-separated (default:ru,en)--agenda <MODE>— agenda mode:day,week,month,tasks(default:day)--tasks— show all TODO tasks sorted by priority (alias for--agenda tasks)--date <DATE>— date fordaymode inYYYY-MM-DD(default: today)--from <DATE>— start date forweek/monthmode inYYYY-MM-DD(default: Monday of the current week / first day of the current month)--to <DATE>— end date forweek/monthmode inYYYY-MM-DD(default: Sunday of the current week / last day of the current month)--tz <TIMEZONE>— IANA timezone for determining the current date (default:Europe/Moscow)--current-date <DATE>— explicit current date for overdue calculation inYYYY-MM-DD(default: today in the configured timezone)--holidays <YEAR>— print the holiday list for the given year (1900–2100) as JSON--absolute-paths— emit absolute file paths instead of paths relative to--dir--max-tasks <N>— task limit (1..=10_000_000, default 10_000). Applied both as a per-file cap and as a global ceiling-v,--verbose— verbose stderr log (-v= info,-vv= debug,-vvv= trace). Mutually exclusive with--quiet-q,--quiet— suppress all diagnostic messages except critical errors--color <MODE>— control ANSI colour in logs:auto(default),always,never--no-color— disable ANSI colour in logs; equivalent to--color never. TheNO_COLORenvironment variable has the same effect (see no-color.org)
Examples
Extract tasks from the current directory as JSON:
Extract tasks from a specific directory:
Save the result to an HTML file:
Emit markdown:
Run against the bundled examples:
Use only Russian weekday names:
Use only English weekday names:
Agenda examples
Today's tasks (default):
Tasks for a specific date:
Retrieve the holiday list for a year:
Sample holiday output:
Tasks for the current week:
Tasks for the current month:
Tasks across a date range:
All TODO tasks sorted by priority:
Use a different timezone:
Use an explicit current date (useful for tests and deterministic output):
Cap the number of extracted tasks (useful for batch processing of very large trees):
Enable verbose processing logs on stderr:
Example files
The examples/ directory contains markdown files with various markers.
The integration tests in tests/cli.rs exercise the same files.
General scenarios:
project-tasks.md— project development taskspersonal-notes.md— personal notes and tasksmeeting-notes.md— meeting noteswork-log.md— mixed log with SCHEDULED, DEADLINE, and CLOCK entries
Org-mode marker demonstrations:
priorities.md— tasks with priorities[#A],[#B],[#C]org-mode-timestamps.md— timestamp forms, ranges, and repeaterscreated-test.md— usingCREATED:for the creation dateworkdays-test.md— workday repeaters (+1wd,+2wd) interacting with the holiday calendar
CLOCK-block demonstrations (time tracking):
clock-formats.md— every supported CLOCK line formclock-inline.md— CLOCK inside inline code (`CLOCK: ...`)clock-test.md— closed CLOCK intervals with=> HH:MMsimple-clock.md— CLOCK inside fenced code blocksdone-clock.md— CLOCK attached to a DONE task (post-completion accounting)
Try running:
Agenda modes
The utility supports four task-listing modes, mirroring Emacs Org-mode:
day — tasks for a single day
Shows tasks whose timestamps (SCHEDULED, DEADLINE) fall on the given date. The default is today in the configured timezone.
# Today's tasks
# Tasks for a specific date
week — tasks for a week
Shows tasks whose timestamps fall within a date range. The default is the current week (Monday–Sunday).
Each day lists:
- Tasks scheduled for that day (scheduled)
- Upcoming tasks relative to that day (upcoming)
- Overdue tasks (overdue) — only for the current date
# Current week
# Explicit range
month — tasks for a month
Shows tasks whose timestamps fall within a date range. The default is the current month (first to last day).
Behaves the same way as week — each day surfaces scheduled, upcoming,
and overdue tasks.
# Current month
# Explicit range
tasks — all TODO tasks
Lists every task whose state is TODO, sorted by priority (A → B → C → no priority). Timestamps are ignored.
# All TODO tasks by priority
Timezones
The --tz option controls which timezone is used to derive the current
date and current week. All standard IANA timezones are accepted.
# Moscow time (default)
# UTC
# New York
Supported markers
Task markers
The utility recognises TODO and DONE markers in headings:
Task priorities
Priorities follow the org-mode convention (letters A–Z inside square brackets):
The priority appears after the TODO/DONE marker and before the task text. The most common priorities are:
[#A]— high priority (critical tasks)[#B]— medium priority (important tasks)[#C]— low priority (regular tasks)
Priority is optional.
Timestamps
Timestamps must be wrapped in backticks:
Simple timestamp:
`<2024-12-10 Mon 10:00-12:00>`
Planning markers:
`CREATED: <2024-12-01 Mon>`
`DEADLINE: <2024-12-15 Sun>`
`SCHEDULED: <2024-12-05 Wed>`
`CLOSED: <2024-12-01 Mon>`
Date range:
`<2024-12-20 Mon>--<2024-12-22 Wed>`
Inactive timestamps (NOT extracted):
`[2024-12-10 Mon]` — square brackets denote an inactive timestamp
Note: CREATED is extracted separately from the other timestamps and
stored in the created field. This lets consumers track the task
creation date independently of SCHEDULED, DEADLINE, and CLOSED.
Time tracking (CLOCK)
The utility supports CLOCK entries for tracking time spent on tasks, mirroring Emacs Org-mode.
CLOCK format inside backticks (same as timestamps):
`SCHEDULED: <2024-12-10 Tue>`
`CLOCK: <2024-12-09 Mon 10:00>--<2024-12-09 Mon 12:30> => 2:30`
`CLOCK: <2024-12-09 Mon 14:00>--<2024-12-09 Mon 16:15> => 2:15`
Alternative format inside code blocks (as in org-mode):
`SCHEDULED: <2024-12-10 Tue>`
CLOCK: [2024-12-09 Mon 10:00]--[2024-12-09 Mon 12:30] => 2:30 CLOCK: [2024-12-09 Mon 14:00]--[2024-12-09 Mon 16:15] => 2:15
Open CLOCK entry (active work):
`CLOCK: <2024-12-10 Tue 09:00>`
Features:
- Automatic extraction of every CLOCK entry under a heading
- Total time (
total_clock_time) summed across all entries - Open (active) CLOCK entries without a close time
- Rendering in JSON, Markdown, and HTML
- Both square
[...](org-mode style) and angle<...>brackets are accepted
Sample JSON output:
Sample Markdown output:
**Total Time:** 4:45
**Clock:**
- -
Locale support
The utility recognises weekday names in different languages via the
--locale option.
Supported locales
en— English (Mon, Tue, Wed, Thu, Fri, Sat, Sun, Monday, Tuesday, ...)ru— Russian (Пн, Вт, Ср, Чт, Пт, Сб, Вс, Понедельник, Вторник, ...)
The default is both locales: --locale ru,en.
Russian-weekday examples
`<2024-12-10 Пн 10:00>`
`<2024-12-20 Понедельник>--<2024-12-22 Среда>`
`DEADLINE: <2024-12-15 Вс>`
Russian weekday names are normalised to the English form during extraction.
Output format
The output format depends on the agenda mode.
--tasks mode (task list)
JSON
Optional fields (priority, created, timestamp_time,
timestamp_end_time, clocks, total_clock_time, task_type) are
omitted when absent rather than serialised as null. This matches the
#[serde(skip_serializing_if = "Option::is_none")] convention used in
src/types.rs.
Example below is the actual output of
--dir examples --glob 'project-tasks.md' --tasks --max-tasks 1 --current-date 2025-12-05.
Markdown
**File:** `project-tasks.md:5`
**Type:** TODO
**Priority:** A
**Time:** `SCHEDULED: <2024-12-05 Wed>`
Need to finalize the database structure before implementation.
--agenda day and --agenda week modes (day-grouped agenda)
In these modes tasks are grouped by day. Each day contains task categories (in display order):
- Overdue (only for the current date) — overdue tasks, oldest first
- Scheduled (with time) — that day's tasks with a time, earliest first
- Scheduled (no time) — that day's tasks without a time
- Upcoming — upcoming tasks relative to that day, nearest first
Important: Each day shows upcoming tasks relative to that day, not relative to a global reference date.
JSON
File paths are emitted relative to --dir (or absolute when
--absolute-paths is set). Optional fields are omitted when absent, as
in --tasks mode.
The days_offset field encodes:
- Positive number — days until the deadline (upcoming)
- Negative number — days the task is overdue
- Absent for tasks belonging to the day itself (scheduled)
Markdown
File paths and timestamps are wrapped in inline code (`...`) to
preserve formatting. Type: uses TODO / DONE (not Todo / Done);
Priority: is shown as a bare letter without the [#] wrapper.
**File:** `project-tasks.md:5`
**Type:** TODO
**Priority:** A
**Time:** `SCHEDULED: <2024-12-05 Wed>`
Need to finalize the database structure before implementation.
**File:** `project-tasks.md:33`
**Time:** `<2025-12-05 Friday 09:00-09:15>`
Daily standup meeting.
**File:** `project-tasks.md:47`
**Type:** TODO
**Time:** `DEADLINE: <2025-12-06 Sat>`
Critical bug fix needs review.
Parsed timestamp fields
To let downstream consumers render agendas without re-parsing the
timestamp string, the timestamp is split into structured fields:
timestamp_type—SCHEDULED,DEADLINE,CLOSED, orPLAINtimestamp_date— date asYYYY-MM-DDtimestamp_time— start time, e.g.10:00(when present)timestamp_end_time— end time, e.g.12:00(when a range was given)
Repeating tasks
The utility honours org-mode repeater syntax for automatically scheduling follow-up occurrences.
Repeater kinds
Every standard org-mode unit is supported:
+Nh— every N hours+Nd— every N days (strict; preserves the original date offset)+Nw— every N weeks+Nm— every N months+Ny— every N years+Nwd— every N working days (project extension; honours RF holidays and weekends)
Repeater modifiers:
+— strict (cumulative); preserves the date offset++— catch-up (smart); preserves the weekday.+— restart-from-completion (relative to the close date)
Working days
Repeaters with the wd (workday) suffix take into account:
- Regular weekends (Saturday, Sunday)
- Official RF holidays
- Holiday shifts
Holiday data lives in holidays_ru.json. At build time (build.rs) the
data is compiled into static Rust constants — the JSON is parsed once
during compilation rather than at runtime.
Examples
`SCHEDULED: <2025-12-05 Thu 10:00 +1h>`
`SCHEDULED: <2025-12-05 Thu +1d>`
`SCHEDULED: <2025-12-05 Thu +1w>`
`SCHEDULED: <2025-12-05 Thu +1m>`
`SCHEDULED: <2025-12-05 Thu +1y>`
`SCHEDULED: <2025-12-05 Thu +1wd>`
`SCHEDULED: <2025-12-05 Thu +2wd>`
Project layout
markdown-org-extract/
├── src/
│ ├── main.rs # CLI entry point, file walker, file I/O
│ ├── cli.rs # Argument parsing (clap), tracing init
│ ├── agenda.rs # Agenda logic (day/week/month), repeaters
│ ├── parser.rs # Task extraction from the markdown AST
│ ├── render.rs # Markdown/HTML rendering
│ ├── format.rs # OutputFormat (clap ValueEnum)
│ ├── error.rs # AppError
│ ├── types.rs # Task / Priority / DayAgenda / ProcessingStats
│ ├── clock.rs # CLOCK parsing and time aggregation
│ ├── holidays.rs # RF workday calendar (singleton, binary search)
│ ├── regex_limits.rs # `compile_bounded`: regex with size/DFA caps
│ └── timestamp/ # Org-mode timestamp parsing
│ ├── parser.rs # <2024-12-05 Thu 10:00 +1d> → ParsedTimestamp
│ ├── extract.rs # pull timestamp/CREATED out of arbitrary text
│ ├── repeater.rs # parsing and arithmetic of repeaters (+1d, ++2w, .+1wd…)
│ └── weekdays.rs # normalisation of Russian weekday names
├── tests/
│ └── cli.rs # CLI integration tests (assert_cmd)
├── examples/ # Sample markdown files
├── docs/ # Supplementary documentation
├── holidays_ru.json # RF holiday / workday calendar
├── build.rs # Generates holidays_data.rs at build time
├── rustfmt.toml # Formatter settings (edition 2021, width 100)
├── rust-toolchain.toml # Pinned channel = stable, components rustfmt+clippy
├── .github/workflows/
│ ├── ci.yml # PR/push CI: lint + test matrix (Linux/macOS/Windows) + cargo audit
│ ├── release.yml # Publish to crates.io on tag v* (+ workflow_dispatch)
│ └── outdated.yml # Weekly non-blocking `cargo outdated`
├── Cargo.toml
├── CHANGELOG.md
├── TODO.md # Deferred technical tasks
├── LICENSE # MIT
└── README.md
See also:
- docs/CLOCK_IMPLEMENTATION.md — CLOCK marker implementation details
- docs/org-mode-keywords.md — supported-keyword reference
- CHANGELOG.md — version history
- TODO.md — deferred technical tasks
Dependencies
clap— command-line argument parsingcomrak— markdown parsing (without onig/syntect:default-features = false)regex— regular expressions (with size/DFA caps)serde/serde_json— data serialisationchrono/chrono-tz— dates and timezonesgrep-regex/grep-searcher— fast pre-filter over keywordsignore— directory tree walk that honours.gitignoreglobset— glob compilation for--globtracing/tracing-subscriber— structured diagnostic logging (--verbose,--quiet,--color,--no-color)
Lazily initialised static regular expressions use std::sync::LazyLock
from the standard library (Rust 1.80+; the project itself requires 1.85).
License
MIT — see the LICENSE file.
Provenance of holidays_ru.json
The holiday calendar and weekend-shift table in holidays_ru.json was
compiled by the project author from the official RF government decrees
on weekend rescheduling. This is public factual information and is not
subject to copyright. For packaging convenience the file is distributed
under the same MIT licence as the rest of the code.
Attribution and a schema description are duplicated inside the file
itself under the _meta key (build.rs ignores underscore-prefixed
top-level keys, so the block has no effect on the compiled output).