markdown-org-extract 0.3.1

CLI utility for extracting tasks from markdown files with Emacs Org-mode support
markdown-org-extract-0.3.1 is not a library.

markdown-org-extract

crates.io CI license

CLI utility for extracting tasks from markdown files with support for Emacs Org-mode markers.

Table of contents

Installation and build

Requirements

  • Rust 1.85 or newer (the comrak 0.50+ upgrade requires the 2024 edition)
  • Cargo

Install from crates.io

If you only need the binary and do not want to clone the repository:

cargo install markdown-org-extract

After installation the binary lands in ~/.cargo/bin/markdown-org-extract (this path must be on your PATH).

Building the project

Debug build:

cargo build

Optimised release build:

cargo build --release

The resulting binary appears in:

  • Debug: target/debug/markdown-org-extract
  • Release: target/release/markdown-org-extract

Running

After building, run the utility:

# Debug build
./target/debug/markdown-org-extract [OPTIONS]

# Release build
./target/release/markdown-org-extract [OPTIONS]

Or use cargo to run it without an explicit build step:

cargo run -- [OPTIONS]

Testing

Run the test suite:

cargo test

Run with verbose output:

cargo test -- --nocapture

Static checks:

cargo check
cargo clippy

Workday-handling test coverage

holidays module:

  • Loading the holiday calendar
  • Distinguishing regular weekends and working days
  • 2025 New Year holidays (1–8 January) and 2026 (1–9 January)
  • 2026 holiday shifts (8 March → 9 March, 9 May → 11 May)
  • Skipping weekends and holidays when locating the next working day

timestamp::repeater module:

  • Parsing repeaters +1wd, +2wd, ++1wd, .+1wd
  • Computing the next occurrence over working days
  • Skipping holidays in repeater arithmetic

timestamp::parser module:

  • Parsing timestamps that carry +1wd and +2wd

Usage

markdown-org-extract [OPTIONS]

Options

  • --dir <DIR> — directory to scan (default: .)
  • --glob <GLOB> — file filter pattern (default: *.md)
  • --format <FORMAT> — output format: json, md, html (default: json)
  • --output <OUTPUT> — file to write the result to; - means stdout (default: stdout)
  • --locale <LOCALE> — weekday locales, comma-separated (default: ru,en)
  • --agenda <MODE> — agenda mode: day, week, month, tasks (default: day)
  • --tasks — show all TODO tasks sorted by priority (alias for --agenda tasks)
  • --date <DATE> — date for day mode in YYYY-MM-DD (default: today)
  • --from <DATE> — start date for week/month mode in YYYY-MM-DD (default: Monday of the current week / first day of the current month)
  • --to <DATE> — end date for week/month mode in YYYY-MM-DD (default: Sunday of the current week / last day of the current month)
  • --tz <TIMEZONE> — IANA timezone for determining the current date (default: Europe/Moscow)
  • --current-date <DATE> — explicit current date for overdue calculation in YYYY-MM-DD (default: today in the configured timezone)
  • --holidays <YEAR> — print the holiday list for the given year (1900–2100) as JSON
  • --absolute-paths — emit absolute file paths instead of paths relative to --dir
  • --max-tasks <N> — task limit (1..=10_000_000, default 10_000). Applied both as a per-file cap and as a global ceiling
  • -v, --verbose — verbose stderr log (-v = info, -vv = debug, -vvv = trace). Mutually exclusive with --quiet
  • -q, --quiet — suppress all diagnostic messages except critical errors
  • --color <MODE> — control ANSI colour in logs: auto (default), always, never
  • --no-color — disable ANSI colour in logs; equivalent to --color never. The NO_COLOR environment variable has the same effect (see no-color.org)

Examples

Extract tasks from the current directory as JSON:

markdown-org-extract

Extract tasks from a specific directory:

markdown-org-extract --dir ./notes

Save the result to an HTML file:

markdown-org-extract --dir ./notes --format html --output agenda.html

Emit markdown:

markdown-org-extract --dir ./notes --format md

Run against the bundled examples:

markdown-org-extract --dir ./examples
markdown-org-extract --dir ./examples --format md
markdown-org-extract --dir ./examples --format html --output examples-agenda.html

Use only Russian weekday names:

markdown-org-extract --dir ./notes --locale ru

Use only English weekday names:

markdown-org-extract --dir ./notes --locale en

Agenda examples

Today's tasks (default):

markdown-org-extract --dir ./notes

Tasks for a specific date:

markdown-org-extract --dir ./notes --agenda day --date 2025-12-10

Retrieve the holiday list for a year:

markdown-org-extract --holidays 2025
markdown-org-extract --holidays 2026

Sample holiday output:

[
  "2025-01-01",
  "2025-01-02",
  "2025-01-03",
  "2025-01-04",
  "2025-01-05",
  "2025-01-06",
  "2025-01-07",
  "2025-01-08",
  "2025-02-23",
  "2025-03-08",
  "2025-05-01",
  "2025-05-09",
  "2025-06-12",
  "2025-11-04"
]

Tasks for the current week:

markdown-org-extract --dir ./notes --agenda week

Tasks for the current month:

markdown-org-extract --dir ./notes --agenda month

Tasks across a date range:

markdown-org-extract --dir ./notes --agenda week --from 2025-12-01 --to 2025-12-07
markdown-org-extract --dir ./notes --agenda month --from 2025-12-01 --to 2025-12-31

All TODO tasks sorted by priority:

markdown-org-extract --dir ./notes --tasks

Use a different timezone:

markdown-org-extract --dir ./notes --tz UTC
markdown-org-extract --dir ./notes --tz America/New_York

Use an explicit current date (useful for tests and deterministic output):

markdown-org-extract --dir ./notes --agenda week --current-date 2024-12-05

Cap the number of extracted tasks (useful for batch processing of very large trees):

markdown-org-extract --dir ./notes --max-tasks 1000

Enable verbose processing logs on stderr:

markdown-org-extract --dir ./notes -v

Example files

The examples/ directory contains markdown files with various markers. The integration tests in tests/cli.rs exercise the same files.

General scenarios:

  • project-tasks.md — project development tasks
  • personal-notes.md — personal notes and tasks
  • meeting-notes.md — meeting notes
  • work-log.md — mixed log with SCHEDULED, DEADLINE, and CLOCK entries

Org-mode marker demonstrations:

  • priorities.md — tasks with priorities [#A], [#B], [#C]
  • org-mode-timestamps.md — timestamp forms, ranges, and repeaters
  • created-test.md — using CREATED: for the creation date
  • workdays-test.md — workday repeaters (+1wd, +2wd) interacting with the holiday calendar

CLOCK-block demonstrations (time tracking):

  • clock-formats.md — every supported CLOCK line form
  • clock-inline.md — CLOCK inside inline code (`CLOCK: ...`)
  • clock-test.md — closed CLOCK intervals with => HH:MM
  • simple-clock.md — CLOCK inside fenced code blocks
  • done-clock.md — CLOCK attached to a DONE task (post-completion accounting)

Try running:

./target/release/markdown-org-extract --dir ./examples --format md

Agenda modes

The utility supports four task-listing modes, mirroring Emacs Org-mode:

day — tasks for a single day

Shows tasks whose timestamps (SCHEDULED, DEADLINE) fall on the given date. The default is today in the configured timezone.

# Today's tasks
markdown-org-extract --agenda day

# Tasks for a specific date
markdown-org-extract --agenda day --date 2025-12-10

week — tasks for a week

Shows tasks whose timestamps fall within a date range. The default is the current week (Monday–Sunday).

Each day lists:

  • Tasks scheduled for that day (scheduled)
  • Upcoming tasks relative to that day (upcoming)
  • Overdue tasks (overdue) — only for the current date
# Current week
markdown-org-extract --agenda week

# Explicit range
markdown-org-extract --agenda week --from 2025-12-01 --to 2025-12-07

month — tasks for a month

Shows tasks whose timestamps fall within a date range. The default is the current month (first to last day).

Behaves the same way as week — each day surfaces scheduled, upcoming, and overdue tasks.

# Current month
markdown-org-extract --agenda month

# Explicit range
markdown-org-extract --agenda month --from 2025-12-01 --to 2025-12-31

tasks — all TODO tasks

Lists every task whose state is TODO, sorted by priority (A → B → C → no priority). Timestamps are ignored.

# All TODO tasks by priority
markdown-org-extract --tasks

Timezones

The --tz option controls which timezone is used to derive the current date and current week. All standard IANA timezones are accepted.

# Moscow time (default)
markdown-org-extract --agenda day --tz Europe/Moscow

# UTC
markdown-org-extract --agenda day --tz UTC

# New York
markdown-org-extract --agenda day --tz America/New_York

Supported markers

Task markers

The utility recognises TODO and DONE markers in headings:

### TODO Implement feature
### DONE Complete task

Task priorities

Priorities follow the org-mode convention (letters A–Z inside square brackets):

### TODO [#A] Critical task
### TODO [#B] Important task
### TODO [#C] Regular task
### DONE [#A] Completed high-priority task

The priority appears after the TODO/DONE marker and before the task text. The most common priorities are:

  • [#A] — high priority (critical tasks)
  • [#B] — medium priority (important tasks)
  • [#C] — low priority (regular tasks)

Priority is optional.

Timestamps

Timestamps must be wrapped in backticks:

Simple timestamp:

`<2024-12-10 Mon 10:00-12:00>`

Planning markers:

`CREATED: <2024-12-01 Mon>`
`DEADLINE: <2024-12-15 Sun>`
`SCHEDULED: <2024-12-05 Wed>`
`CLOSED: <2024-12-01 Mon>`

Date range:

`<2024-12-20 Mon>--<2024-12-22 Wed>`

Inactive timestamps (NOT extracted):

`[2024-12-10 Mon]` — square brackets denote an inactive timestamp

Note: CREATED is extracted separately from the other timestamps and stored in the created field. This lets consumers track the task creation date independently of SCHEDULED, DEADLINE, and CLOSED.

Time tracking (CLOCK)

The utility supports CLOCK entries for tracking time spent on tasks, mirroring Emacs Org-mode.

CLOCK format inside backticks (same as timestamps):

### TODO Implement feature

`SCHEDULED: <2024-12-10 Tue>`
`CLOCK: <2024-12-09 Mon 10:00>--<2024-12-09 Mon 12:30> => 2:30`
`CLOCK: <2024-12-09 Mon 14:00>--<2024-12-09 Mon 16:15> => 2:15`

Alternative format inside code blocks (as in org-mode):

### TODO Implement feature

`SCHEDULED: <2024-12-10 Tue>`

CLOCK: [2024-12-09 Mon 10:00]--[2024-12-09 Mon 12:30] => 2:30 CLOCK: [2024-12-09 Mon 14:00]--[2024-12-09 Mon 16:15] => 2:15

Open CLOCK entry (active work):

`CLOCK: <2024-12-10 Tue 09:00>`

Features:

  • Automatic extraction of every CLOCK entry under a heading
  • Total time (total_clock_time) summed across all entries
  • Open (active) CLOCK entries without a close time
  • Rendering in JSON, Markdown, and HTML
  • Both square [...] (org-mode style) and angle <...> brackets are accepted

Sample JSON output:

{
  "heading": "Implement feature",
  "clocks": [
    {
      "start": "2024-12-09 Mon 10:00",
      "end": "2024-12-09 Mon 12:30",
      "duration": "2:30"
    },
    {
      "start": "2024-12-09 Mon 14:00",
      "end": "2024-12-09 Mon 16:15",
      "duration": "2:15"
    }
  ],
  "total_clock_time": "4:45"
}

Sample Markdown output:

## Implement feature
**Total Time:** 4:45

**Clock:**
- 2024-12-09 Mon 10:00 → 2024-12-09 Mon 12:30 (2:30)
- 2024-12-09 Mon 14:00 → 2024-12-09 Mon 16:15 (2:15)

Locale support

The utility recognises weekday names in different languages via the --locale option.

Supported locales

  • en — English (Mon, Tue, Wed, Thu, Fri, Sat, Sun, Monday, Tuesday, ...)
  • ru — Russian (Пн, Вт, Ср, Чт, Пт, Сб, Вс, Понедельник, Вторник, ...)

The default is both locales: --locale ru,en.

Russian-weekday examples

### TODO Встреча
`<2024-12-10 Пн 10:00>`

### Конференция
`<2024-12-20 Понедельник>--<2024-12-22 Среда>`

### TODO Задача
`DEADLINE: <2024-12-15 Вс>`

Russian weekday names are normalised to the English form during extraction.

Output format

The output format depends on the agenda mode.

--tasks mode (task list)

JSON

Optional fields (priority, created, timestamp_time, timestamp_end_time, clocks, total_clock_time, task_type) are omitted when absent rather than serialised as null. This matches the #[serde(skip_serializing_if = "Option::is_none")] convention used in src/types.rs.

Example below is the actual output of --dir examples --glob 'project-tasks.md' --tasks --max-tasks 1 --current-date 2025-12-05.

[
  {
    "file": "project-tasks.md",
    "line": 5,
    "heading": "Design database schema",
    "content": "Need to finalize the database structure before implementation.",
    "task_type": "TODO",
    "priority": "A",
    "timestamp": "SCHEDULED: <2024-12-05 Wed>",
    "timestamp_type": "SCHEDULED",
    "timestamp_date": "2024-12-05"
  }
]

Markdown

# Tasks

## Design database schema
**File:** `project-tasks.md:5`
**Type:** TODO
**Priority:** A
**Time:** `SCHEDULED: <2024-12-05 Wed>`

Need to finalize the database structure before implementation.

--agenda day and --agenda week modes (day-grouped agenda)

In these modes tasks are grouped by day. Each day contains task categories (in display order):

  1. Overdue (only for the current date) — overdue tasks, oldest first
  2. Scheduled (with time) — that day's tasks with a time, earliest first
  3. Scheduled (no time) — that day's tasks without a time
  4. Upcoming — upcoming tasks relative to that day, nearest first

Important: Each day shows upcoming tasks relative to that day, not relative to a global reference date.

JSON

File paths are emitted relative to --dir (or absolute when --absolute-paths is set). Optional fields are omitted when absent, as in --tasks mode.

[
  {
    "date": "2025-12-05",
    "overdue": [
      {
        "file": "project-tasks.md",
        "line": 5,
        "heading": "Design database schema",
        "content": "Need to finalize the database structure before implementation.",
        "task_type": "TODO",
        "priority": "A",
        "timestamp": "SCHEDULED: <2024-12-05 Wed>",
        "timestamp_type": "SCHEDULED",
        "timestamp_date": "2024-12-05",
        "days_offset": -365
      }
    ],
    "scheduled_timed": [],
    "scheduled_no_time": [],
    "upcoming": [
      {
        "file": "project-tasks.md",
        "line": 47,
        "heading": "Review pull request #42",
        "content": "Critical bug fix needs review.",
        "task_type": "TODO",
        "timestamp": "DEADLINE: <2025-12-06 Sat>",
        "timestamp_type": "DEADLINE",
        "timestamp_date": "2025-12-06",
        "days_offset": 1
      }
    ]
  }
]

The days_offset field encodes:

  • Positive number — days until the deadline (upcoming)
  • Negative number — days the task is overdue
  • Absent for tasks belonging to the day itself (scheduled)

Markdown

File paths and timestamps are wrapped in inline code (`...`) to preserve formatting. Type: uses TODO / DONE (not Todo / Done); Priority: is shown as a bare letter without the [#] wrapper.

# Agenda

## 2025-12-05

### Overdue

#### Design database schema (365 days ago)
**File:** `project-tasks.md:5`
**Type:** TODO
**Priority:** A
**Time:** `SCHEDULED: <2024-12-05 Wed>`

Need to finalize the database structure before implementation.

### Scheduled

#### Daily standup
**File:** `project-tasks.md:33`
**Time:** `<2025-12-05 Friday 09:00-09:15>`

Daily standup meeting.

### Upcoming

#### Review pull request \#42 (in 1 days)
**File:** `project-tasks.md:47`
**Type:** TODO
**Time:** `DEADLINE: <2025-12-06 Sat>`

Critical bug fix needs review.

Parsed timestamp fields

To let downstream consumers render agendas without re-parsing the timestamp string, the timestamp is split into structured fields:

  • timestamp_typeSCHEDULED, DEADLINE, CLOSED, or PLAIN
  • timestamp_date — date as YYYY-MM-DD
  • timestamp_time — start time, e.g. 10:00 (when present)
  • timestamp_end_time — end time, e.g. 12:00 (when a range was given)

Repeating tasks

The utility honours org-mode repeater syntax for automatically scheduling follow-up occurrences.

Repeater kinds

Every standard org-mode unit is supported:

  • +Nh — every N hours
  • +Nd — every N days (strict; preserves the original date offset)
  • +Nw — every N weeks
  • +Nm — every N months
  • +Ny — every N years
  • +Nwdevery N working days (project extension; honours RF holidays and weekends)

Repeater modifiers:

  • + — strict (cumulative); preserves the date offset
  • ++ — catch-up (smart); preserves the weekday
  • .+ — restart-from-completion (relative to the close date)

Working days

Repeaters with the wd (workday) suffix take into account:

  • Regular weekends (Saturday, Sunday)
  • Official RF holidays
  • Holiday shifts

Holiday data lives in holidays_ru.json. At build time (build.rs) the data is compiled into static Rust constants — the JSON is parsed once during compilation rather than at runtime.

Examples

### TODO Hourly check
`SCHEDULED: <2025-12-05 Thu 10:00 +1h>`

### TODO Daily task
`SCHEDULED: <2025-12-05 Thu +1d>`

### TODO Weekly meeting
`SCHEDULED: <2025-12-05 Thu +1w>`

### TODO Monthly report
`SCHEDULED: <2025-12-05 Thu +1m>`

### TODO Annual review
`SCHEDULED: <2025-12-05 Thu +1y>`

### TODO Workday-only task
`SCHEDULED: <2025-12-05 Thu +1wd>`

### TODO Every two working days
`SCHEDULED: <2025-12-05 Thu +2wd>`

Project layout

markdown-org-extract/
├── src/
│   ├── main.rs             # CLI entry point, file walker, file I/O
│   ├── cli.rs              # Argument parsing (clap), tracing init
│   ├── agenda.rs           # Agenda logic (day/week/month), repeaters
│   ├── parser.rs           # Task extraction from the markdown AST
│   ├── render.rs           # Markdown/HTML rendering
│   ├── format.rs           # OutputFormat (clap ValueEnum)
│   ├── error.rs            # AppError
│   ├── types.rs            # Task / Priority / DayAgenda / ProcessingStats
│   ├── clock.rs            # CLOCK parsing and time aggregation
│   ├── holidays.rs         # RF workday calendar (singleton, binary search)
│   ├── regex_limits.rs     # `compile_bounded`: regex with size/DFA caps
│   └── timestamp/          # Org-mode timestamp parsing
│       ├── parser.rs       #   <2024-12-05 Thu 10:00 +1d> → ParsedTimestamp
│       ├── extract.rs      #   pull timestamp/CREATED out of arbitrary text
│       ├── repeater.rs     #   parsing and arithmetic of repeaters (+1d, ++2w, .+1wd…)
│       └── weekdays.rs     #   normalisation of Russian weekday names
├── tests/
│   └── cli.rs              # CLI integration tests (assert_cmd)
├── examples/               # Sample markdown files
├── docs/                   # Supplementary documentation
├── holidays_ru.json        # RF holiday / workday calendar
├── build.rs                # Generates holidays_data.rs at build time
├── rustfmt.toml            # Formatter settings (edition 2021, width 100)
├── rust-toolchain.toml     # Pinned channel = stable, components rustfmt+clippy
├── .github/workflows/
│   ├── ci.yml              # PR/push CI: lint + test matrix (Linux/macOS/Windows) + cargo audit
│   ├── release.yml         # Publish to crates.io on tag v* (+ workflow_dispatch)
│   └── outdated.yml        # Weekly non-blocking `cargo outdated`
├── Cargo.toml
├── CHANGELOG.md
├── TODO.md                 # Deferred technical tasks
├── LICENSE                 # MIT
└── README.md

See also:

Dependencies

  • clap — command-line argument parsing
  • comrak — markdown parsing (without onig/syntect: default-features = false)
  • regex — regular expressions (with size/DFA caps)
  • serde / serde_json — data serialisation
  • chrono / chrono-tz — dates and timezones
  • grep-regex / grep-searcher — fast pre-filter over keywords
  • ignore — directory tree walk that honours .gitignore
  • globset — glob compilation for --glob
  • tracing / tracing-subscriber — structured diagnostic logging (--verbose, --quiet, --color, --no-color)

Lazily initialised static regular expressions use std::sync::LazyLock from the standard library (Rust 1.80+; the project itself requires 1.85).

License

MIT — see the LICENSE file.

Provenance of holidays_ru.json

The holiday calendar and weekend-shift table in holidays_ru.json was compiled by the project author from the official RF government decrees on weekend rescheduling. This is public factual information and is not subject to copyright. For packaging convenience the file is distributed under the same MIT licence as the rest of the code.

Attribution and a schema description are duplicated inside the file itself under the _meta key (build.rs ignores underscore-prefixed top-level keys, so the block has no effect on the compiled output).