# `dsc topic pull` - full thread export
> **Status: Phase 1 implemented in v0.10.11.** `dsc topic pull <discourse>
> <topic_id> --full` ships and writes a YAML-frontmatter + per-post-headings
> Markdown file. `PostStream::stream` and `Post::username` added to the
> model. Batch-fetch via `/t/{id}/posts.json?post_ids[]=…&include_raw=1`.
> Default (no `--full`) behaviour unchanged. Phase 2 (`--since`,
> `--format json`) remains planned.

Spec for pulling an entire topic thread (all posts, not just the OP) to a local Markdown file. Goal: make `dsc topic pull` useful for reading, archiving, and summarising a thread - not just for the pull/push OP-editing workflow. Driver: real-world use by an LLM agent reading a forum thread to draft a response.
## Motivation
`dsc topic pull <discourse> <topic_id>` currently writes only the first post (the OP) to a local file. This is the right behaviour for the pull → edit → push workflow, but it makes the command useless for any read-oriented use case: reading a long thread, archiving a discussion, feeding a full conversation to an LLM, or producing a human-readable snapshot.
The workaround today is two sequential `curl` calls against the Discourse JSON API with manual pagination. `include_raw=1` has to be passed on every page request; otherwise later pages return only the rendered HTML in `cooked`, which then has to be stripped:
```
GET /t/364.json?include_raw=1 # posts 1-20 with raw
GET /t/364.json?page=2&include_raw=1 # posts 21+ with raw
```
This workaround is fiddly, requires knowing the pagination boundary, and gives cooked HTML rather than raw Markdown for later-page posts unless `include_raw=1` is set consistently.
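To make the pagination boundary concrete: assuming the fixed page size of 20 implied by the calls above, and that every post counted in `posts_count` is served by the topic endpoint (deleted or hidden posts may make this untrue), the number of page requests is a ceiling division. A minimal Rust sketch; `PAGE_SIZE` and `pages_needed` are illustrative names, not part of `dsc`:

```rust
/// Posts per page returned by `/t/{id}.json` (assumed fixed at 20, as in the calls above).
const PAGE_SIZE: u64 = 20;

/// Number of `?page=N` requests needed to cover `posts_count` posts (ceiling division).
fn pages_needed(posts_count: u64) -> u64 {
    (posts_count + PAGE_SIZE - 1) / PAGE_SIZE
}

fn main() {
    let posts_count = 27; // topic 364 in the example above
    for page in 1..=pages_needed(posts_count) {
        println!("GET /t/364.json?page={page}&include_raw=1");
    }
}
```

For 27 posts this prints the two requests shown above; a 41-post topic would need a third.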
## Current state (as of 2026-06-10)
`dsc topic pull` calls `client.fetch_topic(topic_id, true)`, which hits `/t/{id}.json?include_raw=1` and returns the `TopicResponse`. It then writes only the OP's raw content to the local file. The `PostStream` model has `posts: Vec<Post>` but does not capture `post_stream.stream` (the flat array of all post IDs that Discourse includes in the response). Posts beyond the first page (Discourse returns 20 per page) are not fetched at all.
`topic push` similarly targets only the OP - that behaviour is correct and should not change.
## Proposed CLI surface
```text
dsc topic pull <discourse> <topic_id> [local_path] [--full]
```
- **Without `--full`** (current behaviour): write only the OP raw Markdown to `<local_path>`. No change.
- **With `--full`**: fetch all posts in the thread, write a single Markdown file containing all posts in order, each demarcated by a heading with post number, username, and timestamp.
Output format with `--full`:
```markdown
---
title: Sitekit, eRedBook and Harris Health Alliance Acquisition 24-03-2026
topic_id: 364
url: https://forum.rcpch.tech/t/sitekit-eredbook-and-harris-health-alliance-acquisition-24-03-2026/364
posts_count: 27
pulled_at: 2026-06-10T11:34:00Z
---

## Post 1 · pacharanero · 2026-03-24

[raw content of post 1]

---

## Post 2 · pacharanero · 2026-03-25

[raw content of post 2]

---
```
No `push` counterpart is needed or in scope for `--full`. A full-thread file is read-only output.
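A minimal Rust sketch of rendering this format, assuming the posts have already been fetched and ordered. `PostView` and `render_full_thread` are hypothetical names for illustration, not the `dsc` implementation; `post_number` is assumed to be available per post, `posts_count` is taken from the number of posts written, and the sample data in `main` is invented. The sketch puts a blank line before each `---` separator, because in Markdown a `---` line directly under a paragraph turns that paragraph into a setext heading.

```rust
/// Hypothetical, simplified post record used only for this sketch.
struct PostView {
    post_number: u64,
    username: String,
    created_at: String, // ISO 8601 timestamp, e.g. "2026-03-24T10:00:00Z"
    raw: String,
}

/// Renders YAML frontmatter followed by one `## Post N · user · date` section per post.
fn render_full_thread(title: &str, topic_id: u64, url: &str, pulled_at: &str, posts: &[PostView]) -> String {
    let mut out = String::new();
    out.push_str("---\n");
    out.push_str(&format!("title: {title}\n"));
    out.push_str(&format!("topic_id: {topic_id}\n"));
    out.push_str(&format!("url: {url}\n"));
    // Assumption: count the posts actually written, not the API's `posts_count`.
    out.push_str(&format!("posts_count: {}\n", posts.len()));
    out.push_str(&format!("pulled_at: {pulled_at}\n"));
    out.push_str("---\n");
    for post in posts {
        // Keep only the YYYY-MM-DD part of the timestamp; fall back to the whole string.
        let date = post.created_at.get(..10).unwrap_or(post.created_at.as_str());
        out.push_str(&format!(
            "\n## Post {} · {} · {}\n\n{}\n\n---\n",
            post.post_number,
            post.username,
            date,
            post.raw.trim_end()
        ));
    }
    out
}

fn main() {
    // Invented sample data, for illustration only.
    let posts = vec![
        PostView { post_number: 1, username: "alice".into(), created_at: "2026-03-24T10:00:00Z".into(), raw: "First post.".into() },
        PostView { post_number: 2, username: "bob".into(), created_at: "2026-03-25T08:30:00Z".into(), raw: "A reply.\n".into() },
    ];
    let md = render_full_thread("Example topic", 364, "https://forum.example/t/example-topic/364", "2026-06-10T11:34:00Z", &posts);
    print!("{md}");
}
```

The title is written unquoted, as in the example above; a title containing `: ` or ` #` would need to be quoted to stay valid YAML.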
## Reference: API calls observed in the field
Tested against forum.rcpch.tech (Discourse 3.x), topic 364, 27 posts.
```
GET /t/364.json?include_raw=1
Api-Key: <redacted>
Api-Username: pacharanero
→ 200 OK
{
"title": "Sitekit, eRedBook and Harris Health Alliance Acquisition 24-03-2026",
"slug": "sitekit-eredbook-and-harris-health-alliance-acquisition-24-03-2026",
"posts_count": 27,
"post_stream": {
"posts": [ /* first 20 posts, each with "raw" field present */ ],
"stream": [1111, 1112, 1113, ..., 1137] /* all 27 post IDs */
}
}
```
```
GET /t/364.json?page=2&include_raw=1
Api-Key: <redacted>
Api-Username: pacharanero
→ 200 OK
{
"post_stream": {
"posts": [ /* posts 21-27, each with "raw" field present */ ]
}
}
```
The `stream` array (all post IDs) is present on page 1 only. Page size is 20. The `?page=N` parameter is 1-indexed; omitting it is equivalent to `page=1`. Alternatively, specific posts can be fetched by ID via:
```
GET /t/{id}/posts.json?post_ids[]=1111&post_ids[]=1112&include_raw=1
```
This avoids page arithmetic and is preferable when `stream` is available: take all IDs from `stream`, drop those already returned on page 1, split the rest into batches of ~20, and request each batch. The Discourse JS client uses the same approach internally.
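The ID-based approach can be sketched in Rust as below. This is an illustration under assumptions, not the `dsc` code: `remaining_ids` and `batch_urls` are hypothetical helpers, the batch size is a parameter, and `main` assumes, for illustration only, that topic 364's stream is the contiguous range 1111-1137.

```rust
use std::collections::HashSet;

/// IDs from `stream` that were not already returned on page 1, in stream order.
fn remaining_ids(stream: &[u64], already_have: &[u64]) -> Vec<u64> {
    let have: HashSet<u64> = already_have.iter().copied().collect();
    stream.iter().copied().filter(|id| !have.contains(id)).collect()
}

/// One `/t/{topic_id}/posts.json` URL per batch of at most `batch_size` post IDs.
fn batch_urls(base_url: &str, topic_id: u64, ids: &[u64], batch_size: usize) -> Vec<String> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    ids.chunks(batch_size)
        .map(|batch| {
            let params: Vec<String> = batch.iter().map(|id| format!("post_ids[]={id}")).collect();
            format!("{base_url}/t/{topic_id}/posts.json?{}&include_raw=1", params.join("&"))
        })
        .collect()
}

fn main() {
    // Illustrative only: assume the stream is 1111..=1137 and page 1 returned the first 20.
    let stream: Vec<u64> = (1111..=1137).collect();
    let todo = remaining_ids(&stream, &stream[..20]);
    for url in batch_urls("https://forum.rcpch.tech", 364, &todo, 20) {
        println!("{url}");
    }
}
```

The query string is built by hand with literal `[]`. If the pairs go through an HTTP client's query builder instead, the brackets will usually be percent-encoded as `%5B%5D`, which Rails-based servers such as Discourse decode to the same parameter.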
### Model changes needed
`PostStream` needs a new optional field:
```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
pub struct PostStream {
    pub posts: Vec<Post>,
    // All post IDs in thread order; present on the first-page response only.
    #[serde(default)]
    pub stream: Vec<u64>,
}
```
`Post` needs `username` and `created_at` for the output heading:
```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
pub struct Post {
    pub id: u64,
    #[serde(default)]
    pub username: Option<String>,
    #[serde(default)]
    pub raw: Option<String>,
    #[serde(default)]
    pub updated_at: Option<String>,
    #[serde(default)]
    pub created_at: Option<String>,
}
```
(`username` is already returned by the API but not currently captured in the model.)
## Phases
### Phase 1 - blocking
- [x] Add `stream: Vec<u64>` to `PostStream` model
- [x] Add `username: Option<String>` to `Post` model (already in API response, just not modelled)
- [x] Add `fetch_topic_all_posts(topic_id)` to `DiscourseClient`: fetch page 1, extract `stream`, chunk remaining IDs, batch-fetch via `/t/{id}/posts.json?post_ids[]=…&include_raw=1`, merge into ordered `Vec<Post>`
- [x] Add `--full` flag to `dsc topic pull` CLI
- [x] Write full-thread Markdown output (YAML frontmatter + `## Post N · username · date` headings + raw body + `---` separators)
- [x] `topic_pull` without `--full`: no behaviour change
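The merge step of `fetch_topic_all_posts` can be sketched independently of HTTP: index page-1 posts by ID, fetch the IDs from `stream` that are still missing in batches, then walk `stream` to restore thread order. In this sketch `Post` is trimmed to two fields, `collect_all_posts` is a hypothetical name, and `fetch_batch` is a caller-supplied closure standing in for the `/t/{id}/posts.json` request; it is not the actual `DiscourseClient` method.

```rust
use std::collections::HashMap;

/// Stand-in for the spec's `Post` model, trimmed to the fields this sketch needs.
#[derive(Clone, Debug, PartialEq)]
struct Post {
    id: u64,
    raw: Option<String>,
}

/// Merge page-1 posts with batch-fetched ones and return them in `stream` order.
/// `fetch_batch` is a placeholder for the `/t/{id}/posts.json?post_ids[]=…&include_raw=1` call.
fn collect_all_posts<F>(page_one: Vec<Post>, stream: &[u64], batch_size: usize, mut fetch_batch: F) -> Vec<Post>
where
    F: FnMut(&[u64]) -> Vec<Post>,
{
    // Assumption: if the response carried no `stream`, keep page 1 as-is.
    if stream.is_empty() {
        return page_one;
    }
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut by_id: HashMap<u64, Post> = page_one.into_iter().map(|p| (p.id, p)).collect();
    let missing: Vec<u64> = stream.iter().copied().filter(|id| !by_id.contains_key(id)).collect();
    for batch in missing.chunks(batch_size) {
        for post in fetch_batch(batch) {
            by_id.insert(post.id, post);
        }
    }
    // Walk `stream` in order; IDs the server did not return are skipped.
    stream.iter().filter_map(|id| by_id.remove(id)).collect()
}

fn main() {
    let stream: Vec<u64> = (1..=5).collect();
    let page_one = vec![Post { id: 1, raw: Some("op".into()) }, Post { id: 2, raw: Some("r1".into()) }];
    // Fake fetcher simulating the HTTP call: one post per requested ID.
    let posts = collect_all_posts(page_one, &stream, 2, |ids| {
        ids.iter().map(|&id| Post { id, raw: Some(format!("post {id}")) }).collect()
    });
    let ids: Vec<u64> = posts.iter().map(|p| p.id).collect();
    println!("{ids:?}");
}
```

Keying by ID and then walking `stream` keeps the output in thread order regardless of how batches come back, and an ID missing from a batch response is skipped rather than treated as an error.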
### Phase 2 - iteration ergonomics
- [ ] `--since <post_number>` - pull only posts from post N onwards (useful for following a thread over time)
- [ ] `--format json` - emit structured JSON (array of `{post_number, username, created_at, raw}`) for piping to other tools / LLMs
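For `--since`, one option is to filter on each post's own post number rather than on its position in the fetched list, so that gaps left by deleted posts do not shift the cutoff. A minimal sketch; `NumberedPost` and `filter_since` are hypothetical, and `post_number` assumes the per-post `post_number` field of the Discourse API response, which the `Post` model in this spec does not capture.

```rust
/// Hypothetical post record for this sketch; `post_number` is assumed to come from the API.
#[derive(Debug, PartialEq)]
struct NumberedPost {
    post_number: u64,
    raw: String,
}

/// `--since N`: keep only posts whose post number is N or later, preserving order.
fn filter_since(posts: Vec<NumberedPost>, since: u64) -> Vec<NumberedPost> {
    posts.into_iter().filter(|p| p.post_number >= since).collect()
}

fn main() {
    // Post 3 is missing (e.g. deleted), so list positions no longer match post numbers.
    let posts: Vec<NumberedPost> = [1u64, 2, 4, 5]
        .iter()
        .map(|&n| NumberedPost { post_number: n, raw: format!("post {n}") })
        .collect();
    let kept = filter_since(posts, 4);
    let numbers: Vec<u64> = kept.iter().map(|p| p.post_number).collect();
    println!("{numbers:?}"); // [4, 5]
}
```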
## Backward compatibility
No change to the default `dsc topic pull` behaviour. `--full` is additive. The model changes (`stream`, `username`) add optional fields with `#[serde(default)]` and cannot break existing deserialisation.
## Out of scope
- `dsc topic push --full`: a full-thread file is a read-only snapshot; replies are handled by `dsc topic reply`.
- Fetching rendered HTML (`cooked`) - raw Markdown is sufficient and more useful for LLM and editing workflows.
- Streaming output for very large threads - fetching in batches of ~20 posts is good enough.
- Diffing thread snapshots over time - that is a separate archiving concern.