//! `DocumentLoader` — async source-side trait.
//!
//! Loaders stream `Document`s so ingestion pipelines stay
//! memory-bounded over arbitrarily large corpora. A backend that
//! produces millions of records (S3 bucket walk, Confluence space
//! enumeration) must not buffer the whole catalogue — the
//! `BoxStream` return forces incremental yield.
//!
//! Loaders are *async* because most sources involve network IO or
//! paginated APIs. A `DocumentLoader` impl that wraps a sync source
//! instead pre-loads the payload eagerly inside `load(...)` and
//! yields the buffered documents from there.

use std::pin::Pin;

use async_trait::async_trait;
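// NOTE: the paths below are assumptions. `ExecutionContext` is the only
// item the docs in this file confirm as coming from `entelix_core`.
use entelix_core::{ExecutionContext, Result};
use futures::Stream;

use crate::Document;
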
/// Boxed stream type alias for documents produced by a
/// [`DocumentLoader`]. Items are `Result` so a partial-success
/// stream can yield successful documents while reporting per-item
/// errors — a single mid-walk failure does not abort the whole
/// ingestion run.
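///
/// A minimal, std-only sketch of the partial-success idea, assuming a
/// plain `Iterator` as a synchronous stand-in for the stream. `Doc`
/// and `drain_partial` are hypothetical names used for illustration,
/// not part of this crate.
///
/// ```rust
/// // Hypothetical stand-in for the crate's `Document`.
/// #[derive(Debug, PartialEq)]
/// struct Doc {
///     id: u32,
/// }
///
/// // Drains a sequence of per-item results, keeping the successes and
/// // recording the failures instead of aborting on the first error.
/// fn drain_partial<I>(items: I) -> (Vec<Doc>, Vec<String>)
/// where
///     I: IntoIterator<Item = Result<Doc, String>>,
/// {
///     let mut docs = Vec::new();
///     let mut errors = Vec::new();
///     for item in items {
///         match item {
///             Ok(doc) => docs.push(doc),
///             Err(err) => errors.push(err),
///         }
///     }
///     (docs, errors)
/// }
/// ```
///
/// A consumer of the real stream would likely follow the same shape,
/// matching on each item as it arrives rather than collecting first.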
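// NOTE: right-hand side is an assumption inferred from the imports and the
// docs above (boxed, `Result` items, `BoxStream`-like).
pub type DocumentStream<'a> = Pin<Box<dyn Stream<Item = Result<Document>> + Send + 'a>>;
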
/// Source-side trait the ingestion pipeline pulls documents from.
///
/// Implementations cover network sources (HTTP, REST APIs, GraphQL),
/// SaaS connectors (Notion, Confluence, GDrive, Slack), object
/// stores (S3, GCS, Azure Blob), and filesystem walkers. The last
/// group sits behind the invariant 9 sandbox exemption and typically
/// lives in a coding-agent companion crate, not this surface.
///
/// The cancellation token on the supplied
/// [`ExecutionContext`](entelix_core::ExecutionContext) gates the
/// walk; long-running loaders poll `ctx.is_cancelled()` between
/// pages so an abandoned ingestion run releases backend resources
/// promptly.
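///
/// A minimal, std-only sketch of polling for cancellation between
/// pages. `Ctx` is a hypothetical stand-in for `ExecutionContext`
/// (only the `is_cancelled()` name comes from the text above), and
/// `walk_pages` is likewise illustrative rather than part of this
/// trait.
///
/// ```rust
/// use std::sync::atomic::{AtomicBool, Ordering};
///
/// // Hypothetical stand-in for the cancellation check on the context.
/// struct Ctx {
///     cancelled: AtomicBool,
/// }
///
/// impl Ctx {
///     fn is_cancelled(&self) -> bool {
///         self.cancelled.load(Ordering::Relaxed)
///     }
/// }
///
/// // Walks `pages` in order and checks for cancellation before each
/// // page, returning the ids gathered before the walk stopped.
/// fn walk_pages(ctx: &Ctx, pages: &[Vec<u32>]) -> Vec<u32> {
///     let mut seen = Vec::new();
///     for page in pages {
///         if ctx.is_cancelled() {
///             break; // stop early so no further pages are requested
///         }
///         seen.extend(page.iter().copied());
///     }
///     seen
/// }
/// ```
///
/// In a real loader the check would sit just before each paginated
/// request, so a cancelled run stops issuing backend calls after the
/// page already in flight.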