//! # scankit — walk + watch + filter directory trees.
//!
//! `scankit` is the shared scanner that Tauri / Iced / native
//! desktop apps reach for when they need to enumerate user files.
//! Its job is small but easy to get wrong:
//!
//! 1. Walk a directory tree (`walkdir` under the hood).
//! 2. Skip what the user said to skip: `.DS_Store`, `node_modules`,
//!    `.git`, `*.log`, anything matching the configured glob set.
//! 3. Drop oversized files before you ever read them; a rogue
//!    50 GB SQLite database shouldn't take your indexer offline.
//! 4. (Future, behind the `watch` feature) keep watching the tree and
//!    emit change events as files are added / modified / removed.
//!
//! What `scankit` deliberately does NOT do:
//!
//! - Parse files. Use [`mdkit`](https://crates.io/crates/mdkit) or
//!   bring your own. `scankit` hands you `ScanEntry`s and gets out
//!   of the way.
//! - Schema extraction, search indexing, embedding generation.
//!   Those are the layers that consume `scankit`'s output.
//! - PII redaction, secrets scanning. Privacy policy is the
//!   embedding application's concern.
//!
//! ## Quick start
//!
//! ```no_run
//! use scankit::{Scanner, ScanConfig};
//! use std::path::Path;
//!
//! let scanner = Scanner::new(
//!     ScanConfig::default()
//!         .max_file_size_bytes(50 * 1024 * 1024) // 50 MB cap
//!         .add_exclude("**/.git/**")?
//!         .add_exclude("**/node_modules/**")?
//!         .add_exclude("**/.DS_Store")?,
//! )?;
//!
//! for result in scanner.walk(Path::new("/Users/me/Documents")) {
//!     match result {
//!         Ok(entry) => println!("{}: {} bytes", entry.path.display(), entry.size_bytes),
//!         Err(e) => eprintln!("scan error: {e}"),
//!     }
//! }
//! # Ok::<(), scankit::Error>(())
//! ```
//!
//! ## Why a separate crate
//!
//! Every "index files on the user's machine" project rebuilds the
//! same five hundred lines of walkdir-with-excludes-and-size-cap
//! glue, and every project gets it slightly wrong. `scankit` ships
//! it once, with the edge cases (symlink loops, permission denials,
//! mid-walk concurrent deletes) handled in one place.
use std::path::PathBuf;
use std::time::SystemTime;
pub use ;
pub use ;
// ---------------------------------------------------------------------------
// ScanEntry — the unit of output
// ---------------------------------------------------------------------------
/// One file produced by a successful walk. Directories are not
/// surfaced — `Scanner` recurses into them silently. Symlinks are
/// dereferenced when [`ScanConfig::follow_symlinks`] is true and
/// emitted as the target file; otherwise they're skipped.
///
/// `#[non_exhaustive]` so we can grow the struct (e.g. add inode /
/// content hash) in minor versions: external crates cannot build it
/// with a struct literal, so a new field is not a breaking change.
// ---------------------------------------------------------------------------
// ScanConfig — the policy
// ---------------------------------------------------------------------------
/// Configuration for a [`Scanner`]. Construct via [`ScanConfig::default`]
/// then layer on options with the `with_*` / `add_*` builder methods,
/// or build from a struct literal within this crate.
///
/// `#[non_exhaustive]` — same forward-compat reasoning as
/// [`ScanEntry`].