//! Source-side shared types. `Document` is the unit yielded by every source.
//!
//! Mirrors `python/src/chunkshop/sources/base.py`. Per-source impls live in
//! sibling files (files.rs, json_corpus.rs, pg_table.rs, http.rs, s3.rs).
//!
//! RM-B Task 1 adds the SP-1 sync primitives that Python landed in 0.6.0:
//! `SyncMode`, `IncrementalSource`, `PrunableSource`, `StaleCursorError`, and
//! the `fingerprint: Option<String>` field on `Document`.
use ;
use Future;
use Error;
/// A document emitted by a Source.
///
/// `metadata` is `serde_json::Value` so it can round-trip rich nested types
/// (vs. Python's flat `dict[str, Any]`). `fingerprint` (RM-B / SP-1) is an
/// opaque per-document signature (ETag, content-hash, mtime, etc.) used by
/// consumers running `SyncMode::Fingerprint` to detect changes without a
/// per-source cursor.
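///
/// A minimal sketch of fingerprint-based change detection, assuming the
/// consumer keeps an id -> fingerprint map from its previous run. The helper
/// `is_changed` and the rule that a missing fingerprint counts as "changed"
/// are illustrative assumptions, not part of this crate's API:
///
/// ```rust
/// use std::collections::HashMap;
///
/// // Hypothetical consumer-side check. `seen` maps document id -> fingerprint
/// // recorded on the previous run.
/// pub fn is_changed(
///     seen: &HashMap<String, String>,
///     id: &str,
///     fingerprint: Option<&str>,
/// ) -> bool {
///     match (seen.get(id), fingerprint) {
///         // Known document with a fingerprint: changed only if it differs.
///         (Some(old), Some(new)) => old.as_str() != new,
///         // Unseen document, or no fingerprint to compare: reprocess it.
///         _ => true,
///     }
/// }
/// ```
///
/// Treating a missing fingerprint as changed is the conservative choice here;
/// a consumer could equally decide to skip such documents.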
/// How a Source detects changes between runs. Mirrors
/// `chunkshop.sources.base.SyncMode` exactly (snake_case wire format).
/// Returned by `iter_changes_since` when a server-side cursor is too old to honor.
///
/// Consumers should treat this as a signal to fall back to a full resync:
/// re-call with `empty_cursor()` (or call the source's `iter_documents()` path
/// directly).
///
/// Constructed via `StaleCursorError::new(msg)`. Auto-converts into
/// `anyhow::Error`, so impls returning `anyhow::Result` can simply
/// `return Err(StaleCursorError::new("...").into())`. Consumers detect it via
/// `err.downcast_ref::<StaleCursorError>()`.
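///
/// The fallback described above can be sketched with the standard library
/// alone. This is an illustration under assumptions: it uses
/// `Box<dyn Error + Send + Sync>` in place of `anyhow::Error`, and
/// `StaleCursor`, `changes_since`, `full_scan`, and `sync` are hypothetical
/// stand-ins rather than this crate's items:
///
/// ```rust
/// use std::error::Error;
/// use std::fmt;
///
/// // Hypothetical stand-in for `StaleCursorError`.
/// #[derive(Debug)]
/// pub struct StaleCursor(pub String);
///
/// impl StaleCursor {
///     pub fn new(msg: impl Into<String>) -> Self {
///         StaleCursor(msg.into())
///     }
/// }
///
/// impl fmt::Display for StaleCursor {
///     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
///         write!(f, "stale cursor: {}", self.0)
///     }
/// }
///
/// impl Error for StaleCursor {}
///
/// pub type BoxError = Box<dyn Error + Send + Sync + 'static>;
///
/// // Hypothetical incremental path: rejects cursors older than `oldest_kept`.
/// pub fn changes_since(cursor: u64, oldest_kept: u64) -> Result<Vec<String>, BoxError> {
///     if cursor < oldest_kept {
///         return Err(StaleCursor::new("cursor predates retained history").into());
///     }
///     Ok(vec![format!("doc-{}", cursor + 1)])
/// }
///
/// // Hypothetical full-resync path.
/// pub fn full_scan() -> Vec<String> {
///     vec!["doc-1".to_string(), "doc-2".to_string()]
/// }
///
/// // Returns the documents plus a flag saying whether a full resync ran.
/// pub fn sync(cursor: u64, oldest_kept: u64) -> Result<(Vec<String>, bool), BoxError> {
///     match changes_since(cursor, oldest_kept) {
///         Ok(docs) => Ok((docs, false)),
///         // Stale cursor: fall back to a full resync instead of failing.
///         Err(e) if e.downcast_ref::<StaleCursor>().is_some() => Ok((full_scan(), true)),
///         // Any other error is propagated unchanged.
///         Err(e) => Err(e),
///     }
/// }
/// ```
///
/// With `anyhow`, the same `downcast_ref` check applies to `anyhow::Error`.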
;
/// Sources that support cursor-based incremental sync implement this.
///
/// Cursor shape is source-specific:
/// - S3: `BTreeMap<String, String>` (key → ETag)
/// - pg_table: `{ after_ts: String, after_id: String }` tuple cursor
/// - http: `BTreeMap<String, HttpUrlCursor>` (url → ETag + Last-Modified)
///
/// Consumers treat the cursor as opaque. chunkshop never stores it — the
/// consumer (orchestrator, application) persists it between runs.
///
/// `cursor_from(last)` returns a per-doc DELTA. Consumers build the next
/// cursor by starting from the previous cursor and merging each emitted
/// doc's delta in iteration order:
///
/// ```ignore
/// let mut next = prev.clone();
/// for d in docs { next.extend(source.cursor_from(d)); }
/// ```
///
/// For map-style cursors (S3, http) this accumulates the full manifest and
/// preserves unchanged keys. For monotonic cursors (pg_table) the last doc
/// wins.
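///
/// A self-contained sketch of that merge, under stated assumptions: `Doc` is a
/// cut-down hypothetical document, the delta key is the document id, and the
/// ETag is taken from `fingerprint`. None of these names are this crate's API:
///
/// ```rust
/// use std::collections::BTreeMap;
///
/// // Hypothetical document with only the fields this sketch needs.
/// pub struct Doc {
///     pub id: String,
///     pub fingerprint: Option<String>,
/// }
///
/// pub type MapCursor = BTreeMap<String, String>;
///
/// // Per-doc delta for a map-style cursor: at most one (id -> ETag) entry.
/// pub fn cursor_from(doc: &Doc) -> MapCursor {
///     let mut delta = MapCursor::new();
///     if let Some(etag) = &doc.fingerprint {
///         delta.insert(doc.id.clone(), etag.clone());
///     }
///     delta
/// }
///
/// // Start from the previous cursor and merge deltas in iteration order.
/// // Keys that no emitted doc touches keep their previous value.
/// pub fn next_cursor(prev: &MapCursor, docs: &[Doc]) -> MapCursor {
///     let mut next = prev.clone();
///     for d in docs {
///         next.extend(cursor_from(d));
///     }
///     next
/// }
///
/// // Monotonic cursor, modelled as an (after_ts, after_id) pair: the last
/// // delta wins, and the previous cursor survives if nothing was emitted.
/// pub fn next_monotonic(
///     prev: Option<(String, String)>,
///     deltas: &[(String, String)],
/// ) -> Option<(String, String)> {
///     deltas.last().cloned().or(prev)
/// }
/// ```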
///
/// **Async-in-trait convention.** Matches `BackendConn`'s pattern: explicit
/// `impl Future<Output = …> + Send` rather than the bare `async fn` sugar,
/// so the auto-`Send` bound on the returned future is captured explicitly
/// and resilient across compiler versions.
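///
/// A small sketch of the convention, assuming Rust 1.75 or newer (return
/// position `impl Trait` in traits). `ChangeFeed`, `InMemory`, `run`, and the
/// toy `block_on` executor are hypothetical names for illustration only:
///
/// ```rust
/// use std::future::Future;
/// use std::pin::pin;
/// use std::sync::Arc;
/// use std::task::{Context, Poll, Wake, Waker};
///
/// // Hypothetical trait: the `Send` bound is part of the signature, so
/// // generic callers can rely on it.
/// pub trait ChangeFeed {
///     fn changed_ids(&self, since: u64) -> impl Future<Output = Vec<u64>> + Send;
/// }
///
/// pub struct InMemory {
///     pub ids: Vec<u64>,
/// }
///
/// impl ChangeFeed for InMemory {
///     fn changed_ids(&self, since: u64) -> impl Future<Output = Vec<u64>> + Send {
///         async move { self.ids.iter().copied().filter(|id| *id > since).collect() }
///     }
/// }
///
/// // Waker that does nothing; enough for futures that never return Pending.
/// struct NoopWake;
///
/// impl Wake for NoopWake {
///     fn wake(self: Arc<Self>) {}
/// }
///
/// // Toy executor: polls in a loop. Only suitable for this sketch.
/// pub fn block_on<F: Future>(fut: F) -> F::Output {
///     let waker = Waker::from(Arc::new(NoopWake));
///     let mut cx = Context::from_waker(&waker);
///     let mut fut = pin!(fut);
///     loop {
///         if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
///             return out;
///         }
///     }
/// }
///
/// // Generic over any `ChangeFeed`: `require_send` compiles only because the
/// // trait signature guarantees a `Send` future.
/// pub fn run<S: ChangeFeed>(src: &S, since: u64) -> Vec<u64> {
///     fn require_send<T: Send>(_: &T) {}
///     let fut = src.changed_ids(since);
///     require_send(&fut);
///     block_on(fut)
/// }
/// ```
///
/// Had the trait method been declared with bare `async fn`, the `require_send`
/// call in the generic `run` would not compile, because the trait signature
/// would carry no `Send` bound.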
/// Sources that can enumerate source-side deletions implement this.
///
/// Typically called at a lower cadence than `iter_changes_since` because
/// prune detection often requires walking the full source manifest. Returns
/// source-IDs (the `Document.id` field), not full `Document` objects.
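///
/// One way a source might derive those IDs is a set difference between what
/// the consumer has indexed and the current source manifest. This is a sketch
/// under that assumption; `deleted_ids` is a hypothetical helper, not a method
/// of this trait:
///
/// ```rust
/// use std::collections::BTreeSet;
///
/// // IDs present in the consumer's index but absent from the source manifest.
/// // Returns IDs only, in sorted order, never full documents.
/// pub fn deleted_ids(indexed: &BTreeSet<String>, manifest: &BTreeSet<String>) -> Vec<String> {
///     indexed.difference(manifest).cloned().collect()
/// }
/// ```
///
/// Because `manifest` must list every live ID, building it means a full walk
/// of the source, which is why pruning tends to run less often than
/// incremental sync.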