//! chunkshop-rs — Rust port of chunkshop.
//!
//! Implements sources (files / HTTP / S3 / DB tables), chunkers, a fastembed
//! embedder, and a modular sink/backend layer (PG / MariaDB / SQLite /
//! ClickHouse). The YAML config schema and target table shape match the
//! Python reference so vectors are interchangeable across implementations.
//!
//! ## Cargo features
//!
//! `default = ["full"]` — preserves backward compatibility with `chunkshop = "0.3"`.
//!
//! Library consumers who want only the chunker structs (e.g. an embedded
//! Postgres extension) can opt into the slim build:
//!
//! ```toml
//! chunkshop = { version = "0.4", default-features = false, features = ["chunkers"] }
//! ```
//!
//! Available features:
//! - `chunkers` — chunker structs + their config types (no fastembed/ort/sqlx).
//! - `embedder-core` — fastembed (BYO `try_new_from_user_defined`) + ORT.
//!   No `hf-hub`, no auto-download. Caller supplies model bytes directly via
//!   [`embedder::FastembedEmbedder::from_user_defined_files`].
//! - `embedder-hub` — adds `hf-hub` for runtime auto-download. Enables
//!   [`embedder::FastembedEmbedder::new`] (stock variants + Xenova int8 BGE
//!   bit-near-exact) and the [`chunker::SemanticChunker::new`] convenience.
//! - `embedder` — historical alias = `embedder-core` + `embedder-hub`.
//!   Existing consumers see no change.
//! - `extractor` — language detection + entity extractor.
//! - `source` — files / HTTP / S3 source loaders.
//! - `sink` — the full modular sink/backend layer (PG/MariaDB/SQLite/ClickHouse).
//! - `pipeline` — high-level `Pipeline` + `run_cell` glue.
//! - `bakeoff` — chunker × embedder matrix evaluator.
//! - `full` — all of the above (default).
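//!
//! As a rough sketch of combining features (an illustration, not a tested
//! recipe; it assumes the flags compose additively, as Cargo features normally
//! do), a consumer that ships its own model files and wants no `hf-hub`
//! download path might request:
//!
//! ```toml
//! chunkshop = { version = "0.4", default-features = false, features = ["chunkers", "embedder-core"] }
//! ```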
// The entire modular sink/backend layer is folded under the `sink` feature
// (deliberate v4 design decision — no per-backend features). DB-table sources
// reuse this backend layer, so their fetchers are additionally gated.
// `hf_cache` is the network-fetch path (HuggingFace download via hf-hub).
// Slim consumers on `embedder-core` alone never compile this module.
pub
// `sources` is always declared so the `Document` struct is always available
// (chunkers consume `&Document`). The heavy fetcher impls inside this module
// are themselves cfg-gated behind the `source` (and, for DB-table sources,
// `sink`) features.
// RM-B Task 5: pluggable raw-bytes storage (filesystem + S3). Always-on for
// LocalRawStore; the S3 backend is `source`-feature-gated for the
// object_store dep.
// RM-A: zero-network Rust consolidator default + trait. Always-on (only the
// staging/source/sink layer is `memory`-feature-gated).
/// RM-A: agent-memory staging API — chunkshop-owned append-only session
/// staging table with deterministic event_id derivation (byte-identical
/// to Python `chunkshop.memory.staging`).
pub use ;
pub use ;
pub use ;
pub use ;
pub use FastembedEmbedder;
pub use Pipeline;
pub use ;
pub use ;
// `Document` is always available; the fetcher sources are gated.
pub use Document;
pub use ;
pub use ;