//! # SCFS – SplitCatFS
//!
//! A convenient splitting and concatenating filesystem.
//!
//! ## Motivation
//!
//! ### History
//!
//! While setting up a cloud-based backup and archive solution, I encountered the
//! following phenomenon: Many small files would get uploaded quite fast and –
//! depending on the actual cloud storage provider – highly concurrently, while
//! big files tended to slow down the whole process. The explanation is simple:
//! many cloud storage providers do not support concurrent or chunked uploads of
//! a single file, and some do not even support resuming a partial upload. You
//! have to upload the file in one go, sequentially from the first byte to the
//! last; it's all or nothing.
//!
//! Now consider a scenario where you upload a huge file, like a mirror of your
//! Raspberry Pi's SD card with the system and configuration on it. I have such a
//! file; it is about 4 GB in size. While backing up my system, this was the last
//! file to be uploaded. According to the ETA calculation, it would take several
//! hours, so I let it run overnight. The next morning I found out that at around
//! 95% of the upload, my internet connection had dropped for just a few seconds,
//! but long enough for the transfer tool to abort the upload. The temporary file
//! got deleted from the cloud storage, so I had to start from zero again.
//! Several hours of uploading were wasted.
//!
//! I thought of a way to split big files so that I could upload them more
//! efficiently, but I came to the conclusion that manually splitting files,
//! uploading the parts, and deleting them locally afterwards is not a very
//! scalable solution.
//!
//! So I came up with the idea of a special filesystem: a filesystem that would
//! present big files as if they were many small chunks in separate files. In
//! reality, the chunks would all point to the same physical file, only with
//! different offsets. This way I could upload chunked files in parallel without
//! losing too much progress, even if the upload gets aborted midway.
//!
//! *SplitFS* was born.
//!
//! If I download such chunked file parts, I need to call `cat * >file`
//! afterwards to re-create the actual file. This is a hassle similar to
//! manually splitting files. That's why I also had *CatFS* in mind when
//! developing SCFS. CatFS concatenates chunked files transparently and
//! presents them as complete files again.
//!
//! ### Why Rust?
//!
//! I am relatively new to Rust, and I thought the best way to deepen my
//! understanding of Rust was to take on a project that would require dedication
//! and a certain knowledge of the language.
//!
//! ## Installation
//!
//! SCFS can be installed easily through Cargo via `crates.io`:
//!
//! ```shell script
//! cargo install scfs
//! ```
//!
//! ## Usage
//!
//! ```text
//! Usage: scfs <COMMAND>
//!
//! Commands:
//!   split  Create a splitting file system
//!   cat    Create a concatenating file system
//!   help   Print this message or the help of the given subcommand(s)
//!
//! Options:
//!   -h, --help     Print help
//!   -V, --version  Print version
//! ```
//!
//! ### SplitFS
//!
//! ```text
//! Usage: scfs split [OPTIONS] <MIRROR> <MOUNTPOINT> [-- <FUSE_OPTIONS_EXTRA>...]
//!
//! Arguments:
//!   <MIRROR>                 Defines the directory that will be mirrored
//!   <MOUNTPOINT>             Defines the mountpoint, where the mirror will be accessible
//!   [FUSE_OPTIONS_EXTRA]...  Additional options, which are passed down to FUSE
//!
//! Options:
//!   -b, --blocksize <BLOCKSIZE>        Sets the desired blocksize [default: 2097152]
//!   -o, --fuse-options <FUSE_OPTIONS>  Additional options, which are passed down to FUSE
//!   -d, --daemon                       Run program in background
//!       --mkdir                        Create mountpoint directory if it does not exist already
//!   -h, --help                         Print help
//!   -V, --version                      Print version
//! ```
//!
//! To mount a directory with SplitFS, use the following form:
//!
//! ```shell script
//! scfs split <base directory> <mount point>
//! ```
//!
//! This can be simplified by using the dedicated `splitfs` binary:
//!
//! ```shell script
//! splitfs <base directory> <mount point>
//! ```
//!
//! The directory specified as `mount point` will now reflect the content of `base
//! directory`, replacing each regular file with a directory that contains
//! enumerated chunks of that file as separate files.
//!
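//! The mapping from a chunk to its byte range in the underlying file can be
//! sketched as follows. This is only an illustration, not the actual SCFS
//! implementation: the helper names `chunk_count` and `chunk_range` and the
//! zero-based chunk index are assumptions made for this example.
//!
//! ```rust
//! // Number of chunks needed for a file of `file_size` bytes (ceiling division).
//! // A block size of zero is treated as "no chunks" to avoid dividing by zero.
//! fn chunk_count(file_size: u64, blocksize: u64) -> u64 {
//!     if blocksize == 0 {
//!         return 0;
//!     }
//!     file_size / blocksize + u64::from(file_size % blocksize != 0)
//! }
//!
//! // Half-open byte range `[start, end)` covered by chunk `part` (zero-based).
//! // Returns `None` if the chunk lies beyond the end of the file.
//! fn chunk_range(file_size: u64, blocksize: u64, part: u64) -> Option<(u64, u64)> {
//!     if blocksize == 0 {
//!         return None;
//!     }
//!     let start = part.checked_mul(blocksize)?;
//!     if start >= file_size {
//!         return None;
//!     }
//!     // The last chunk may be shorter than the block size.
//!     let end = start.saturating_add(blocksize).min(file_size);
//!     Some((start, end))
//! }
//! ```
//!
//! Under these assumptions, a 5 MB file with the default block size of 2 MB
//! yields three chunks, the last one being 1 MB long.
//!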
//! It is possible to use a custom block size for the file fragments. For example,
//! to use 1&nbsp;MB chunks instead of the default size of 2&nbsp;MB, you would go
//! with:
//!
//! ```shell script
//! splitfs --blocksize=1048576 <base directory> <mount point>
//! ```
//!
//! Here, 1048576 is 1024 * 1024, that is, one megabyte expressed in bytes.
//!
//! You can even leverage the arithmetic of your shell, for example in Bash:
//!
//! ```shell script
//! splitfs --blocksize=$((1024 * 1024)) <base directory> <mount point>
//! ```
//!
//! New since v0.9.0: The block size may now also be given with a symbolic
//! quantifier. Allowed quantifiers are "K", "M", "G", and "T", each a factor of
//! 1024 larger than the one before ("K" is 1024, "M" is 1024 * 1024, and so
//! on). So, to set the block size to 1&nbsp;MB like in the example above, you
//! can now use:
//!
//! ```shell script
//! splitfs --blocksize=1M <base directory> <mount point>
//! ```
//!
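//! How such a value could be turned into a byte count is sketched below. The
//! function `parse_blocksize` is hypothetical and not the parser SCFS itself
//! uses; it assumes upper-case quantifiers only and rejects anything it does
//! not understand, including values that would overflow a `u64`.
//!
//! ```rust
//! // Parses values like "2097152", "4K" or "1M" into a number of bytes.
//! fn parse_blocksize(input: &str) -> Option<u64> {
//!     // Each quantifier stands for a power of 1024.
//!     const SUFFIXES: [(char, u32); 4] = [('K', 1), ('M', 2), ('G', 3), ('T', 4)];
//!
//!     let input = input.trim();
//!     // Split off a trailing quantifier, if there is one.
//!     let (digits, exponent) = SUFFIXES
//!         .iter()
//!         .find_map(|&(suffix, exp)| input.strip_suffix(suffix).map(|d| (d, exp)))
//!         .unwrap_or((input, 0));
//!
//!     let base: u64 = digits.parse().ok()?;
//!     let factor = 1024u64.checked_pow(exponent)?;
//!     base.checked_mul(factor)
//! }
//! ```
//!
//! With this sketch, `parse_blocksize("1M")` and `parse_blocksize("1048576")`
//! both yield the same block size.
//!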
//! You can actually go as far as to set a block size of one byte, but be prepared
//! for a ridiculous amount of overhead or maybe even a system freeze because the
//! metadata table grows too large.
//!
//! ### CatFS
//!
//! ```text
//! Usage: scfs cat [OPTIONS] <MIRROR> <MOUNTPOINT> [-- <FUSE_OPTIONS_EXTRA>...]
//!
//! Arguments:
//!   <MIRROR>                 Defines the directory that will be mirrored
//!   <MOUNTPOINT>             Defines the mountpoint, where the mirror will be accessible
//!   [FUSE_OPTIONS_EXTRA]...  Additional options, which are passed down to FUSE
//!
//! Options:
//!   -o, --fuse-options <FUSE_OPTIONS>  Additional options, which are passed down to FUSE
//!   -d, --daemon                       Run program in background
//!       --mkdir                        Create mountpoint directory if it does not exist already
//!   -h, --help                         Print help
//!   -V, --version                      Print version
//! ```
//!
//! To mount a directory with CatFS, use the following form:
//!
//! ```shell script
//! scfs cat <base directory> <mount point>
//! ```
//!
//! This can be simplified by using the dedicated `catfs` binary:
//!
//! ```shell script
//! catfs <base directory> <mount point>
//! ```
//!
//! Please note that `base directory` needs to be a directory structure that has
//! been generated by SplitFS. CatFS will refuse to mount the directory otherwise.
//!
//! The directory specified as `mount point` will now reflect the content of `base
//! directory`, replacing each directory of chunked files with a single file.
//!
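//! Reading from a concatenated file means translating a byte range of the whole
//! file into reads on individual chunk files. The following sketch illustrates
//! that translation. `plan_read` is a hypothetical helper, not part of SCFS,
//! and it assumes that every chunk except possibly the last one holds exactly
//! `blocksize` bytes and that chunks are indexed from zero.
//!
//! ```rust
//! // Splits a read of `size` bytes at `offset` in the concatenated file into
//! // per-chunk reads of the form (chunk index, offset inside chunk, length).
//! fn plan_read(offset: u64, size: u64, blocksize: u64, file_size: u64) -> Vec<(u64, u64, u64)> {
//!     let mut plan = Vec::new();
//!     if blocksize == 0 {
//!         return plan;
//!     }
//!     // Never read past the end of the concatenated file.
//!     let end = offset.saturating_add(size).min(file_size);
//!     let mut pos = offset;
//!     while pos < end {
//!         let part = pos / blocksize;
//!         let within = pos % blocksize;
//!         // Read up to the end of this chunk or the end of the request.
//!         let len = (blocksize - within).min(end - pos);
//!         plan.push((part, within, len));
//!         pos += len;
//!     }
//!     plan
//! }
//! ```
//!
//! A read that crosses chunk boundaries results in several entries, one per
//! chunk touched; a read starting at or beyond the end of the file results in
//! an empty plan.
//!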
//! ### Additional FUSE mount options
//!
//! It is possible to pass additional mount options to the underlying FUSE
//! library.
//!
//! SCFS supports two ways of specifying options, either via the "-o" option or
//! via additional arguments after a "--" separator. This is in line with
//! other FUSE-based filesystems like EncFS.
//!
//! These two calls are equivalent:
//!
//! ```shell script
//! scfs split -o nonempty mirror mountpoint
//! scfs split mirror mountpoint -- nonempty
//! ```
//!
//! Of course, these methods also work in the `splitfs` and `catfs` binaries.
//!
//! ### Daemon mode
//!
//! Originally, SCFS was meant to be run in the foreground. This proved to be
//! annoying if you want to use the same terminal for further work. Granted, you
//! could always use features of your shell to send the process to the
//! background, but then you have a background process that might accidentally be
//! killed when you close the terminal. Furthermore, SCFS originally did not
//! terminate cleanly if the user unmounted it by external means.
//!
//! Since v0.9.0, SCFS natively supports daemon mode, in which the program changes
//! its working directory to `"/"` and then forks itself into a true daemon
//! process, independent of the running terminal.
//!
//! ```shell script
//! splitfs --daemon mirror mountpoint
//! ```
//!
//! Note that `mirror` and `mountpoint` are resolved *before* changing the working
//! directory, so they can still be given relative to the current working
//! directory.
//!
//! To unmount, `fusermount` can be used:
//!
//! ```shell script
//! fusermount -u mountpoint
//! ```
//!
//! ## Limitations
//!
//! I no longer consider this project a "raw prototype", and I am eating my own
//! dog food, meaning I use it in my own backup strategies and create features
//! based on my personal needs.
//!
//! However, this might not meet the needs of the typical user, and without
//! feedback I might not even think of some scenarios to begin with.
//!
//! Specifically, these are the current limitations of SCFS:
//!
//! -   It should work on all UNIX-based systems, like Linux and maybe some macOS
//!     versions, however without macOS-specific file attributes. It definitely
//!     does not work on Windows, since this would need special handling of
//!     system calls, which I haven't had time to take care of yet.
//!
//! -   It can only work with directories, regular files, and symlinks. All
//!     other file types (device files, pipes, and so on) will be silently
//!     ignored.
//!
//! -   The base directory will be mounted read-only in the new mount point, and
//!     SCFS expects that the base directory will not be altered while mounted.

use std::ffi::{OsStr, OsString};
use std::fs;
use std::fs::Metadata;
use std::os::unix::ffi::OsStrExt;
use std::os::unix::ffi::OsStringExt;
use std::os::unix::fs::MetadataExt;
use std::path::Path;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

use fuser::{BackgroundSession, FileAttr, FileType, Filesystem, MountOption};
use rusqlite::Row;
use serde::{Deserialize, Serialize};

pub use cli::Cli;

pub(crate) use catfs::CatFS;
pub(crate) use shared::Shared;
pub(crate) use splitfs::SplitFS;

mod catfs;
mod cli;
mod shared;
mod splitfs;

const TTL: Duration = Duration::from_secs(60 * 60 * 24);

const STMT_CREATE: &str = "
    CREATE TABLE Files (
        ino INTEGER PRIMARY KEY,
        parent_ino INTEGER,
        path TEXT UNIQUE,
        file_name TEXT,
        part INTEGER,
        vdir INTEGER,
        symlink INTEGER
    )
";
const STMT_CREATE_INDEX_PARENT_INO_FILE_NAME: &str = "
    CREATE INDEX idx_parent_ino_file_name
    ON Files (parent_ino, file_name)
";
const STMT_INSERT: &str = "
    INSERT INTO Files (ino, parent_ino, path, file_name, part, vdir, symlink)
    VALUES (?, ?, ?, ?, ?, ?, ?)
";
const STMT_QUERY_BY_INO: &str = "
    SELECT *
    FROM Files
    WHERE ino = ?
";
const STMT_QUERY_BY_PARENT_INO: &str = "
    SELECT *
    FROM Files
    WHERE parent_ino = ?
    LIMIT -1 OFFSET ?
";
const STMT_QUERY_BY_PARENT_INO_AND_FILENAME: &str = "
    SELECT *
    FROM Files
    WHERE parent_ino = ?
    AND file_name = ?
";

const CONFIG_FILE_NAME: &str = ".scfs_config";

const CONFIG_DEFAULT_BLOCKSIZE: u64 = 2 * 1024 * 1024;

const INO_OUTSIDE: u64 = 0;
const INO_ROOT: u64 = 1;
const INO_CONFIG: u64 = 2;

const INO_FIRST_FREE: u64 = 10;

type DropHookFn = Box<dyn Fn() + Send + 'static>;

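fn system_time_from_time(secs: i64, nsecs: i64) -> SystemTime {
    // The nanosecond part always counts forward from `secs`, even for
    // timestamps before the epoch, so it is added in both branches.
    let nanos = Duration::new(0, nsecs as u32);
    if secs >= 0 {
        UNIX_EPOCH + Duration::from_secs(secs as u64) + nanos
    } else {
        // `unsigned_abs` avoids the overflow of `-secs` for `i64::MIN`.
        UNIX_EPOCH - Duration::from_secs(secs.unsigned_abs()) + nanos
    }
}
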
fn convert_filetype(ft: fs::FileType) -> Option<FileType> {
    if ft.is_dir() {
        Some(FileType::Directory)
    } else if ft.is_file() {
        Some(FileType::RegularFile)
    } else if ft.is_symlink() {
        Some(FileType::Symlink)
    } else {
        None
    }
}

fn convert_metadata_to_attr(meta: Metadata, ino: Option<u64>) -> FileAttr {
    FileAttr {
        // Use the given inode number if present, else the one from the metadata.
        ino: ino.unwrap_or_else(|| meta.ino()),
        size: meta.size(),
        blocks: meta.blocks(),
        atime: system_time_from_time(meta.atime(), meta.atime_nsec()),
        mtime: system_time_from_time(meta.mtime(), meta.mtime_nsec()),
        ctime: system_time_from_time(meta.ctime(), meta.ctime_nsec()),
        // Fall back to the epoch where the creation time is unavailable.
        crtime: meta.created().unwrap_or(UNIX_EPOCH),
        kind: convert_filetype(meta.file_type()).expect("Filetype not supported"),
        perm: meta.mode() as u16,
        nlink: meta.nlink() as u32,
        uid: meta.uid(),
        gid: meta.gid(),
        rdev: meta.rdev() as u32,
        blksize: meta.blksize() as u32,
        flags: 0,
    }
}

// Copied from fuser, mount_options.rs. When this becomes part of their public API, delete this function.
fn mount_option_from_str(s: &str) -> MountOption {
    match s {
        "auto_unmount" => MountOption::AutoUnmount,
        "allow_other" => MountOption::AllowOther,
        "allow_root" => MountOption::AllowRoot,
        "default_permissions" => MountOption::DefaultPermissions,
        "dev" => MountOption::Dev,
        "nodev" => MountOption::NoDev,
        "suid" => MountOption::Suid,
        "nosuid" => MountOption::NoSuid,
        "ro" => MountOption::RO,
        "rw" => MountOption::RW,
        "exec" => MountOption::Exec,
        "noexec" => MountOption::NoExec,
        "atime" => MountOption::Atime,
        "noatime" => MountOption::NoAtime,
        "dirsync" => MountOption::DirSync,
        "sync" => MountOption::Sync,
        "async" => MountOption::Async,
        x if x.starts_with("fsname=") => MountOption::FSName(x[7..].into()),
        x if x.starts_with("subtype=") => MountOption::Subtype(x[8..].into()),
        x => MountOption::CUSTOM(x.into()),
    }
}

fn mount<'a, 'b, FS, P, I>(filesystem: FS, mountpoint: &P, fuse_options: I) -> BackgroundSession
where
    FS: Filesystem + Send + 'static + 'a,
    P: AsRef<Path>,
    I: IntoIterator<Item = &'b OsStr>,
{
    // Options are expected to be valid UTF-8; anything else panics here.
    let fuse_options = fuse_options
        .into_iter()
        .map(|x| mount_option_from_str(x.to_str().unwrap()));

    // The mount is always read-only; user-supplied options are appended.
    let mut options = vec![MountOption::RO, MountOption::FSName(String::from("scfs"))];
    options.extend(fuse_options);

    fuser::spawn_mount2(filesystem, &mountpoint, options.as_ref()).unwrap()
}

struct FileHandle {
    file: OsString,
    start: u64,
    end: u64,
}

#[derive(Clone, Debug, Default, Eq, PartialEq)]
struct FileInfo {
    ino: u64,
    parent_ino: u64,
    path: OsString,
    file_name: OsString,
    part: u64,
    vdir: bool,
    symlink: bool,
}

impl FileInfo {
    fn with_ino(ino: u64) -> Self {
        FileInfo {
            ino,
            parent_ino: Default::default(),
            path: Default::default(),
            file_name: Default::default(),
            part: 0,
            vdir: false,
            symlink: false,
        }
    }

    fn with_parent_ino(parent_ino: u64) -> Self {
        FileInfo {
            ino: Default::default(),
            parent_ino,
            path: Default::default(),
            file_name: Default::default(),
            part: 0,
            vdir: false,
            symlink: false,
        }
    }

    fn file_name<S: Into<OsString>>(mut self, file_name: S) -> Self {
        self.file_name = file_name.into();
        self
    }

    fn into_file_info_row(self) -> FileInfoRow {
        FileInfoRow::from(self)
    }
}

impl From<&Row<'_>> for FileInfo {
    fn from(row: &Row) -> Self {
        FileInfoRow::from(row).into()
    }
}

#[derive(Debug)]
struct FileInfoRow {
    ino: i64,
    parent_ino: i64,
    path: Vec<u8>,
    file_name: Vec<u8>,
    part: i64,
    vdir: bool,
    symlink: bool,
}

impl From<&Row<'_>> for FileInfoRow {
    fn from(row: &Row) -> Self {
        FileInfoRow {
            ino: row.get_unwrap(0),
            parent_ino: row.get_unwrap(1),
            path: row.get_unwrap(2),
            file_name: row.get_unwrap(3),
            part: row.get_unwrap(4),
            vdir: row.get_unwrap(5),
            symlink: row.get_unwrap(6),
        }
    }
}

impl From<FileInfoRow> for FileInfo {
    fn from(f: FileInfoRow) -> Self {
        FileInfo {
            ino: f.ino as u64,
            parent_ino: f.parent_ino as u64,
            path: OsString::from_vec(f.path),
            file_name: OsString::from_vec(f.file_name),
            part: f.part as u64,
            vdir: f.vdir,
            symlink: f.symlink,
        }
    }
}

impl From<FileInfo> for FileInfoRow {
    fn from(f: FileInfo) -> Self {
        FileInfoRow {
            ino: f.ino as i64,
            parent_ino: f.parent_ino as i64,
            path: f.path.as_bytes().to_vec(),
            file_name: f.file_name.as_bytes().to_vec(),
            part: f.part as i64,
            vdir: f.vdir,
            symlink: f.symlink,
        }
    }
}

#[derive(Clone, Copy, Debug, Deserialize, Serialize)]
struct Config {
    blocksize: u64,
}

impl Config {
    fn blocksize(mut self, blocksize: u64) -> Self {
        self.blocksize = blocksize;
        self
    }
}

impl Default for Config {
    fn default() -> Self {
        Config {
            blocksize: CONFIG_DEFAULT_BLOCKSIZE,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn convert_fileinfo_to_fileinforow_and_back() {
        // `path` holds the full path, `file_name` only the last component.
        let file_info = FileInfo {
            ino: 7,
            parent_ino: 3,
            path: OsString::from("/path/to/files/File.Name"),
            file_name: OsString::from("File.Name"),
            part: 5,
            vdir: true,
            symlink: false,
        };

        let file_info_row = FileInfoRow::from(file_info.clone());

        assert_eq!(file_info, file_info_row.into());
    }
}