file_declutter/
lib.rs

//! # File Declutter
//!
//! [![badge github]][url github]
//! [![badge crates.io]][url crates.io]
//! [![badge docs.rs]][url docs.rs]
//! [![badge license]][url license]
//!
//! [badge github]: https://img.shields.io/badge/github-FloGa%2Ffile--declutter-green
//! [badge crates.io]: https://img.shields.io/crates/v/file-declutter
//! [badge docs.rs]: https://img.shields.io/docsrs/file-declutter
//! [badge license]: https://img.shields.io/crates/l/file-declutter
//!
//! [url github]: https://github.com/FloGa/file-declutter
//! [url crates.io]: https://crates.io/crates/file-declutter
//! [url docs.rs]: https://docs.rs/file-declutter
//! [url license]: https://github.com/FloGa/file-declutter/blob/develop/LICENSE
//!
//! > Reorganizes files into nested folders based on their filenames.
//!
//! *File Declutter* is a little command line tool that helps you bring order to
//! your large directories. It can "declutter" a flat list of files by
//! redistributing them into nested subdirectories based on their filenames. It's
//! particularly useful for organizing large numbers of files in a single
//! directory (for example images, documents, etc.) into a more manageable
//! structure.
//!
//! This crate is split into an [Application](#application) part and a
//! [Library](#library) part.
//!
//! ## Motivation
//!
//! The need for this little tool arose from a situation where I was confronted
//! with a flat directory of more than 500k files. Well, to be frank, it was one of
//! my other creations, namely [*DedupeFS*][dedupefs github], which presents files
//! as 1MB chunks, named after their checksums. By using this over my media hard
//! drive, I ended up with so many files in one directory that my file manager
//! just refused to work with it.
//!
//! Since the file names are SHA-1 hashes, they consist of a somewhat evenly
//! distributed sequence of the hexadecimal digits 0 to f. So, by putting all
//! files that start with 0 into a separate folder, and all that start with 1 into
//! a different folder, and so on, I can already divide the number of files per
//! subdirectory by 16. If I repeat this step in each subdirectory, I can divide
//! it again by 16.
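//!
//! As a rough back-of-the-envelope sketch of that arithmetic, assuming perfectly
//! uniform hexadecimal names (which real data only approximates), the expected
//! number of files per leaf directory can be computed as follows. The function
//! name `files_per_leaf` is hypothetical and not part of this crate:
//!
//! ```rust
//! /// Expected number of files per leaf directory when `total` files with
//! /// uniformly distributed hexadecimal names are spread over `levels` levels.
//! fn files_per_leaf(total: u64, levels: u32) -> u64 {
//!     // Each additional level multiplies the number of leaf directories by 16.
//!     total / 16u64.pow(levels)
//! }
//! ```
//!
//! For 500,000 files this gives roughly 31,250 files per directory after one
//! level, 1,953 after two levels, and 122 after three levels.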
//!
//! This is one possible scenario where *File Declutter* really comes in handy.
//!
//! [dedupefs github]: https://github.com/FloGa/dedupefs
//!
//! ## Application
//!
//! ### Installation
//!
//! This tool can be installed easily through Cargo via `crates.io`:
//!
//! ```shell
//! cargo install --locked file-declutter
//! ```
//!
//! Please note that the `--locked` flag is necessary here to have the exact same
//! dependencies as when the application was tagged and tested. Without it, you
//! might get more up-to-date versions of dependencies, but you risk unexpected
//! behavior if the dependencies changed some functionality. The application
//! might even fail to build if the public API of a dependency changed too much.
//!
//! Alternatively, pre-built binaries can be downloaded from the [GitHub
//! releases][gh-releases] page.
//!
//! [gh-releases]: https://github.com/FloGa/file-declutter/releases
//!
//! ### Usage
//!
//! ```text
//! Usage: file-declutter [OPTIONS] <PATH>
//!
//! Arguments:
//!   <PATH>  Directory to declutter
//!
//! Options:
//!   -l, --levels <LEVELS>           Number of nested subdirectory levels [default: 3]
//!   -r, --remove-empty-directories  Remove empty directories after moving files
//!   -h, --help                      Print help
//!   -V, --version                   Print version
//! ```
//!
//! To declutter a directory into three levels, you would go with:
//!
//! ```shell
//! file-declutter --levels 3 path/to/large-directory
//! ```
//!
//! To "restore" the directory, or rather, flatten a list of files in
//! subdirectories, you can use:
//!
//! ```shell
//! file-declutter --levels 0 --remove-empty-directories path/to/decluttered-directory
//! ```
//!
//! **Warning:** Please note that flattening a directory tree in this way will
//! result in **data loss** when there are files with the same names! So if you
//! have two files `dir/a/123.txt` and `dir/b/123.txt` and you run the above
//! command over `dir`, you will end up with `dir/123.txt`, where the last file
//! move has overwritten the previous ones. The file listing could be in arbitrary
//! order, so you cannot even tell beforehand which file will "win".
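//!
//! One way to guard against this is to look for duplicate file names before
//! flattening. The following is a minimal sketch that uses only the standard
//! library. The helper `find_name_collisions` is hypothetical and not part of
//! this crate:
//!
//! ```rust
//! use std::collections::BTreeMap;
//! use std::ffi::OsString;
//! use std::path::PathBuf;
//!
//! /// Groups paths by file name and returns every name that occurs more than
//! /// once, together with the paths that share it.
//! fn find_name_collisions(
//!     paths: impl IntoIterator<Item = PathBuf>,
//! ) -> Vec<(OsString, Vec<PathBuf>)> {
//!     let mut by_name: BTreeMap<OsString, Vec<PathBuf>> = BTreeMap::new();
//!
//!     for path in paths {
//!         // Paths without a final component (such as `..`) have no name to
//!         // collide on, so they are skipped.
//!         let name = path.file_name().map(|name| name.to_os_string());
//!         if let Some(name) = name {
//!             by_name.entry(name).or_default().push(path);
//!         }
//!     }
//!
//!     // Keep only the names that are shared by more than one path.
//!     by_name
//!         .into_iter()
//!         .filter(|(_, paths)| paths.len() > 1)
//!         .collect()
//! }
//! ```
//!
//! For `dir/a/123.txt`, `dir/b/123.txt` and `dir/a/456.txt`, this sketch reports
//! only `123.txt`, along with the two paths that would overwrite each other.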
//!
//! ## Library
//!
//! ### Installation
//!
//! To add the `file-declutter` library to your project, you can use:
//!
//! ```shell
//! cargo add file-declutter
//! ```
//!
//! ### Usage
//!
//! The following is a short summary of how this library is intended to be used.
//! The actual functionality lies in a custom Iterator object
//! `FileDeclutterIterator`, but it is not intended to be instantiated directly.
//! Instead, the wrapper `FileDeclutter` should be used.
//!
//! Once the iterator is created, you have two options. You can iterate over its
//! items to receive `(source, target)` tuples, with `source` being the original
//! path and `target` the "decluttered" one, and apply your own logic to them. Or
//! you can call the `declutter_files` method to do the actual moving of the
//! files.
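//!
//! As an illustration of such custom logic, a dry run could simply describe what
//! would be moved without touching the file system. This is a minimal sketch.
//! The helper `describe_moves` is hypothetical and not part of this crate. It
//! accepts any iterator of `(source, target)` pairs, which is the item type that
//! `FileDeclutterIterator` yields:
//!
//! ```rust
//! use std::path::PathBuf;
//!
//! /// Formats each `(source, target)` pair as one line of a move plan.
//! fn describe_moves(pairs: impl IntoIterator<Item = (PathBuf, PathBuf)>) -> Vec<String> {
//!     pairs
//!         .into_iter()
//!         .map(|(source, target)| format!("{} -> {}", source.display(), target.display()))
//!         .collect()
//! }
//! ```
//!
//! Because nothing is moved, the plan can be inspected or logged first, and the
//! iterator can then be created again to call `declutter_files`.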
//!
//! #### Create Iterator from Path
//!
//! This method can be used if you have an actual directory that you want to
//! completely declutter recursively.
//!
//! ```rust no_run
//! use std::path::PathBuf;
//!
//! fn main() {
//!     let mut files_decluttered = file_declutter::FileDeclutter::new_from_path("/tmp/path")
//!         .levels(1)
//!         .collect::<Vec<_>>();
//!
//!     // The directory walk does not guarantee any particular order, so sort first.
//!     files_decluttered.sort();
//!
//!     // If the specified directory contains the files 13.txt and 23.txt, the following tuples
//!     // will be produced. Both sides include the base directory:
//!     let files_expected = vec![
//!         (PathBuf::from("/tmp/path/13.txt"), PathBuf::from("/tmp/path/1/13.txt")),
//!         (PathBuf::from("/tmp/path/23.txt"), PathBuf::from("/tmp/path/2/23.txt")),
//!     ];
//!
//!     assert_eq!(files_expected, files_decluttered);
//! }
//! ```
//!
//! #### Create Iterator from Existing Iterator
//!
//! This method can be used if you have a specific list of files you want to
//! process.
//!
//! ```rust
//! use std::path::PathBuf;
//!
//! fn main() {
//!     let files = vec!["13.txt", "23.txt"];
//!     let files_decluttered = file_declutter::FileDeclutter::new_from_iter(files.into_iter())
//!         .levels(1)
//!         .collect::<Vec<_>>();
//!
//!     let files_expected = vec![
//!         (PathBuf::from("13.txt"), PathBuf::from("1/13.txt")),
//!         (PathBuf::from("23.txt"), PathBuf::from("2/23.txt")),
//!     ];
//!
//!     assert_eq!(files_expected, files_decluttered);
//! }
//! ```
//!
//! #### Oneshot
//!
//! This method can be used if you have a single file which you want to have a
//! decluttered name for.
//!
//! ```rust
//! use std::path::PathBuf;
//!
//! fn main() {
//!     let file = "123456.txt";
//!     let file_decluttered = file_declutter::FileDeclutter::oneshot(file, 3);
//!
//!     let file_expected = PathBuf::from("1/2/3/123456.txt");
//!
//!     assert_eq!(file_expected, file_decluttered);
//! }
//! ```
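//!
//! The path derivation itself is small enough to sketch with the standard
//! library alone. The function below is a hypothetical stand-in
//! (`sketch_decluttered_path` is not part of this crate) that mirrors the logic
//! of the iterator, except that it returns `None` instead of panicking when a
//! path has no file name:
//!
//! ```rust
//! use std::path::{Path, PathBuf};
//!
//! /// Builds `c1/c2/.../file_name` from the first `levels` characters of the
//! /// file name. Returns `None` for paths without a final component.
//! fn sketch_decluttered_path(file: &Path, levels: usize) -> Option<PathBuf> {
//!     let file_name = file.file_name()?;
//!     let name = file_name.to_string_lossy();
//!
//!     let mut target = PathBuf::new();
//!     // Each of the first `levels` characters becomes one directory level.
//!     for c in name.chars().take(levels) {
//!         target.push(c.to_string());
//!     }
//!     target.push(file_name);
//!
//!     Some(target)
//! }
//! ```
//!
//! Under this sketch, `123456.txt` with three levels maps to `1/2/3/123456.txt`,
//! while a name shorter than the requested depth, such as `ab`, only yields as
//! many levels as it has characters (`a/b/ab`).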

use std::path::PathBuf;

/// An iterator that transforms a list of file paths into (source, target) pairs, where the target
/// path is a decluttered version based on the filename's characters.
pub struct FileDeclutterIterator<I> {
    inner: I,
    base: PathBuf,
    levels: usize,
}

impl<I> FileDeclutterIterator<I>
where
    I: Iterator<Item = PathBuf>,
{
    /// Sets the base directory into which files will be moved.
    pub fn base<P: Into<PathBuf>>(mut self, base: P) -> Self {
        self.base = base.into();
        self
    }

    /// Sets the number of directory levels to create based on the filename.
    ///
    /// For example, with `levels = 2` and a file named `abcdef.txt`, the target path would include
    /// two subdirectories: `a/b/abcdef.txt`.
    pub fn levels(mut self, levels: usize) -> Self {
        self.levels = levels;
        self
    }

    /// Moves all files to their decluttered target paths.
    ///
    /// An existing file at a target path is overwritten, so files that share a name can silently
    /// replace each other.
    ///
    /// If `remove_empty_directories` is `true`, the function will attempt to remove any now-empty
    /// directories after the move operation.
    ///
    /// # Errors
    ///
    /// Returns an error if directory creation or file renaming fails, or if a directory cannot be
    /// read while looking for empty directories.
    pub fn declutter_files(self, remove_empty_directories: bool) -> anyhow::Result<()> {
        let base = self.base.clone();

        for (source, target) in self {
            // The target always ends in a file name, so it always has a parent.
            std::fs::create_dir_all(target.parent().unwrap())?;
            std::fs::rename(source, target)?;
        }

        if remove_empty_directories {
            // Visit children before their parents, so that nested empty directories are removed
            // from the inside out.
            for dir in walkdir::WalkDir::new(base)
                .min_depth(1)
                .contents_first(true)
                .into_iter()
                .filter_entry(|f| f.file_type().is_dir())
                .flatten()
            {
                let dir = dir.into_path();

                if dir.read_dir()?.count() == 0 {
                    // Ignore result, it doesn't matter if deletion fails.
                    let _ = std::fs::remove_dir(dir);
                }
            }
        }

        Ok(())
    }
}

impl<I> Iterator for FileDeclutterIterator<I>
where
    I: Iterator<Item = PathBuf>,
{
    type Item = (PathBuf, PathBuf);

    /// Returns the next `(source, target)` file path pair.
    ///
    /// The target path is derived from the file name by taking the first `levels` characters and
    /// using them as nested directories.
    ///
    /// # Panics
    ///
    /// Panics if a path has no final file name component (for example `..`).
    fn next(&mut self) -> Option<Self::Item> {
        self.inner.next().map(move |entry| {
            // Each of the first `levels` characters of the file name becomes one directory level.
            let sub_dirs = entry.file_name().unwrap().to_string_lossy();
            let sub_dirs = sub_dirs.chars().take(self.levels).map(String::from);

            let mut target_path = self.base.clone();
            for sub_dir in sub_dirs {
                target_path.push(sub_dir);
            }
            // The file keeps its original name below the generated directories.
            target_path.push(entry.file_name().unwrap());

            (entry, target_path)
        })
    }
}

/// Entry point for creating decluttering iterators or computing decluttered paths.
pub struct FileDeclutter;

impl FileDeclutter {
    /// Creates a `FileDeclutterIterator` from an arbitrary iterator over file paths.
    ///
    /// # Examples
    ///
    /// ```rust
    /// # use std::path::PathBuf;
    /// let files = vec!["13.txt", "23.txt"];
    /// let files_decluttered = file_declutter::FileDeclutter::new_from_iter(files.into_iter())
    ///     .levels(1)
    ///     .collect::<Vec<_>>();
    ///
    /// let files_expected = vec![
    ///     (PathBuf::from("13.txt"), PathBuf::from("1/13.txt")),
    ///     (PathBuf::from("23.txt"), PathBuf::from("2/23.txt")),
    /// ];
    ///
    /// assert_eq!(files_expected, files_decluttered);
    /// ```
    pub fn new_from_iter(
        iter: impl Iterator<Item = impl Into<PathBuf>>,
    ) -> FileDeclutterIterator<impl Iterator<Item = PathBuf>> {
        FileDeclutterIterator {
            inner: iter.map(Into::into),
            base: Default::default(),
            levels: Default::default(),
        }
    }

    /// Creates a `FileDeclutterIterator` by recursively collecting all files under a given
    /// directory and setting this directory as the base.
    ///
    /// # Examples
    ///
    /// ```rust no_run
    /// # use std::path::PathBuf;
    /// let mut files_decluttered = file_declutter::FileDeclutter::new_from_path("/tmp/path")
    ///     .levels(1)
    ///     .collect::<Vec<_>>();
    ///
    /// // The directory walk does not guarantee any particular order, so sort first.
    /// files_decluttered.sort();
    ///
    /// // If the specified directory contains the files 13.txt and 23.txt, the following tuples
    /// // will be produced. Both sides include the base directory:
    /// let files_expected = vec![
    ///     (PathBuf::from("/tmp/path/13.txt"), PathBuf::from("/tmp/path/1/13.txt")),
    ///     (PathBuf::from("/tmp/path/23.txt"), PathBuf::from("/tmp/path/2/23.txt")),
    /// ];
    ///
    /// assert_eq!(files_expected, files_decluttered);
    /// ```
    pub fn new_from_path(
        base: impl Into<PathBuf>,
    ) -> FileDeclutterIterator<impl Iterator<Item = PathBuf>> {
        let base = base.into();

        let iter = walkdir::WalkDir::new(&base)
            .min_depth(1)
            .into_iter()
            .flatten()
            .filter(|f| f.file_type().is_file())
            .map(|entry| entry.into_path());

        FileDeclutter::new_from_iter(iter).base(base)
    }

    /// Computes the decluttered path of a single file without moving it.
    ///
    /// # Arguments
    ///
    /// - `file`: Path to the input file.
    /// - `levels`: Number of subdirectory levels to use.
    ///
    /// # Returns
    ///
    /// A `PathBuf` representing the target decluttered location.
    ///
    /// # Examples
    ///
    /// ```rust
    /// # use std::path::PathBuf;
    /// let file = "123456.txt";
    /// let file_decluttered = file_declutter::FileDeclutter::oneshot(file, 3);
    ///
    /// let file_expected = PathBuf::from("1/2/3/123456.txt");
    ///
    /// assert_eq!(file_expected, file_decluttered);
    /// ```
    pub fn oneshot(file: impl Into<PathBuf>, levels: usize) -> PathBuf {
        let iter = std::iter::once(file.into());
        FileDeclutter::new_from_iter(iter)
            .levels(levels)
            .next()
            .unwrap()
            .1
    }
}

#[cfg(test)]
mod tests {
    use assert_fs::TempDir;
    use assert_fs::prelude::*;
    use rand::Rng;

    use super::*;

    #[test]
    fn decluttered_from_path_file_names_same() -> anyhow::Result<()> {
        let temp_dir = TempDir::new()?;

        let mut rng = rand::rng();
        for _ in 0..100 {
            let mut file_name = rng
                .random_range(1_000_000_000u64..10_000_000_000u64)
                .to_string();

            if rng.random_bool(0.25) {
                file_name = format!("subdir/{file_name}");
            }

            let child = temp_dir.child(file_name);
            child.touch()?;
        }

        for (source, target) in FileDeclutter::new_from_path(temp_dir.to_path_buf()).levels(1) {
            assert_ne!(source.parent(), target.parent());
            assert_eq!(source.file_name(), target.file_name());
        }

        Ok(())
    }

    #[test]
    fn decluttered_from_iter_file_names_same() -> anyhow::Result<()> {
        let mut rng = rand::rng();
        let files = (0..100).map(move |_| {
            let mut file_name = rng
                .random_range(1_000_000_000u64..10_000_000_000u64)
                .to_string();

            if rng.random_bool(0.25) {
                file_name = format!("subdir/{file_name}");
            }

            file_name
        });

        for (source, target) in FileDeclutter::new_from_iter(files).levels(1) {
            assert_ne!(source.parent(), target.parent());
            assert_eq!(source.file_name(), target.file_name());
        }

        Ok(())
    }

    #[test]
    fn oneshot() -> anyhow::Result<()> {
        let source = PathBuf::from("123456");
        let target_expected = PathBuf::from("1/2/3/123456");

        let target_actual = FileDeclutter::oneshot(&source, 3);

        assert_eq!(target_actual, target_expected);

        Ok(())
    }
}
452        Ok(())
453    }
454}