scfs/lib.rs
//! # SCFS – SplitCatFS
//!
//! A convenient splitting and concatenating filesystem.
//!
//! ## Motivation
//!
//! ### History
//!
//! While setting up a cloud-based backup and archive solution, I encountered the
//! following phenomenon: many small files would get uploaded quite fast and,
//! depending on the actual cloud storage provider, highly concurrently, while
//! big files tend to slow down the whole process. The explanation is simple: many
//! cloud storage providers do not support concurrent or chunked uploads of a
//! single file, and some do not even support resuming a partial upload. You
//! have to upload the file in one go, sequentially from the first byte to the
//! last; it's all or nothing.
//!
//! Now consider a scenario where you upload a huge file, like an image of your
//! Raspberry Pi's SD card with the system and configuration on it. I have such a
//! file; it is about 4 GB in size. While backing up my system, this was the last
//! file to be uploaded. According to ETA calculations, it would have taken
//! several hours, so I let it run overnight. The next morning I found out that
//! at around 95% of the upload, my internet connection had dropped for just a
//! few seconds, but long enough for the transfer tool to abort the upload. The
//! temporary file got deleted from the cloud storage, so I had to start from zero
//! again. Several hours of uploading wasted.
//!
//! I thought of a way to split big files so that I could upload them more
//! efficiently, but I came to the conclusion that manually splitting files,
//! uploading them, and deleting them locally afterwards is not a very scalable
//! solution.
//!
//! So I came up with the idea of a special filesystem: a filesystem that would
//! present big files as if they were many small chunks in separate files. In
//! reality, the chunks would all point to the same physical file, only with
//! different offsets. This way I could upload chunked files in parallel without
//! losing too much progress, even if the upload gets aborted midway.
//!
//! *SplitFS* was born.
//!
//! If I download such chunked file parts, I need to call `cat * >file`
//! afterwards to re-create the actual file. This is a similar hassle to
//! manually splitting files. That's why I also had *CatFS* in mind when
//! developing SCFS. CatFS concatenates chunked files transparently and
//! presents them as complete files again.
//!
//! ### Why Rust?
//!
//! I am relatively new to Rust, and I thought the best way to deepen my
//! understanding of Rust was to take on a project that would require dedication
//! and a certain knowledge of the language.
//!
//! ## Installation
//!
//! SCFS can be installed easily through Cargo via `crates.io`:
//!
//! ```shell script
//! cargo install scfs
//! ```
//!
//! ## Usage
//!
//! ```text
//! Usage: scfs <COMMAND>
//!
//! Commands:
//!   split  Create a splitting file system
//!   cat    Create a concatenating file system
//!   help   Print this message or the help of the given subcommand(s)
//!
//! Options:
//!   -h, --help     Print help
//!   -V, --version  Print version
//! ```
//!
//! ### SplitFS
//!
//! ```text
//! Usage: scfs split [OPTIONS] <MIRROR> <MOUNTPOINT> [-- <FUSE_OPTIONS_EXTRA>...]
//!
//! Arguments:
//!   <MIRROR>                 Defines the directory that will be mirrored
//!   <MOUNTPOINT>             Defines the mountpoint, where the mirror will be accessible
//!   [FUSE_OPTIONS_EXTRA]...  Additional options, which are passed down to FUSE
//!
//! Options:
//!   -b, --blocksize <BLOCKSIZE>        Sets the desired blocksize [default: 2097152]
//!   -o, --fuse-options <FUSE_OPTIONS>  Additional options, which are passed down to FUSE
//!   -d, --daemon                       Run program in background
//!       --mkdir                        Create mountpoint directory if it does not exist already
//!   -h, --help                         Print help
//!   -V, --version                      Print version
//! ```
//!
//! To mount a directory with SplitFS, use the following form:
//!
//! ```shell script
//! scfs split <base directory> <mount point>
//! ```
//!
//! This can be simplified by using the dedicated `splitfs` binary:
//!
//! ```shell script
//! splitfs <base directory> <mount point>
//! ```
//!
//! The directory specified as `mount point` will now reflect the content of
//! `base directory`, replacing each regular file with a directory that contains
//! enumerated chunks of that file as separate files.
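//!
//! To make the chunk layout more concrete, here is a minimal sketch of how the
//! byte range of each chunk could be derived from the file size and the block
//! size. It is an illustration only, not the actual SCFS code: the helper
//! `chunk_ranges` is hypothetical, and how SCFS treats empty files is not
//! covered here.
//!
//! ```rust
//! /// Hypothetical helper: returns the half-open byte range `[start, end)`
//! /// of every chunk of a file that is `file_size` bytes long.
//! fn chunk_ranges(file_size: u64, blocksize: u64) -> Vec<(u64, u64)> {
//!     assert!(blocksize > 0, "blocksize must be positive");
//!     let mut ranges = Vec::new();
//!     let mut start = 0;
//!     while start < file_size {
//!         // The last chunk may be shorter than `blocksize`.
//!         let end = file_size.min(start.saturating_add(blocksize));
//!         ranges.push((start, end));
//!         start = end;
//!     }
//!     ranges
//! }
//! ```
//!
//! Under this scheme, a 5 MB file with the default block size of 2 MB yields
//! three ranges: two full chunks and a final chunk of 1 MB.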
//!
//! It is possible to use a custom block size for the file fragments. For example,
//! to use 1 MB chunks instead of the default size of 2 MB, you would use:
//!
//! ```shell script
//! splitfs --blocksize=1048576 <base directory> <mount point>
//! ```
//!
//! Here, 1048576 is 1024 * 1024, that is, one megabyte in bytes.
//!
//! You can even leverage the arithmetic expansion of your shell, for example in
//! Bash:
//!
//! ```shell script
//! splitfs --blocksize=$((1024 * 1024)) <base directory> <mount point>
//! ```
//!
//! New since v0.9.0: The block size may now also be given with a symbolic
//! quantifier. Allowed quantifiers are "K", "M", "G", and "T", each one a
//! further factor of 1024 (so "K" means 1024 and "M" means 1024 * 1024). So,
//! to set the block size to 1 MB like in the example above, you can now use:
//!
//! ```shell script
//! splitfs --blocksize=1M <base directory> <mount point>
//! ```
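//!
//! The quantifier rule can be sketched in a few lines of Rust. This is a
//! hedged illustration rather than the parser SCFS actually uses: the function
//! name `parse_blocksize` is hypothetical, and it assumes that only the
//! uppercase letters listed above are accepted.
//!
//! ```rust
//! /// Hypothetical parser for block sizes such as `2097152`, `512K`, or `1M`.
//! /// Returns `None` for malformed input or if the result overflows `u64`.
//! fn parse_blocksize(input: &str) -> Option<u64> {
//!     let input = input.trim();
//!     // Each quantifier stands for one more factor of 1024.
//!     let (digits, exponent) = match input.chars().last()? {
//!         'K' => (&input[..input.len() - 1], 1),
//!         'M' => (&input[..input.len() - 1], 2),
//!         'G' => (&input[..input.len() - 1], 3),
//!         'T' => (&input[..input.len() - 1], 4),
//!         _ => (input, 0),
//!     };
//!     let base: u64 = digits.parse().ok()?;
//!     base.checked_mul(1024u64.checked_pow(exponent)?)
//! }
//! ```
//!
//! With this sketch, `parse_blocksize("1M")` and `parse_blocksize("1048576")`
//! both yield `Some(1048576)`, while an input like `"abc"` yields `None`.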
//!
//! You can actually go as far as to set a block size of one byte, but be prepared
//! for a ridiculous amount of overhead or maybe even a system freeze because the
//! metadata table grows too large.
//!
//! ### CatFS
//!
//! ```text
//! Usage: scfs cat [OPTIONS] <MIRROR> <MOUNTPOINT> [-- <FUSE_OPTIONS_EXTRA>...]
//!
//! Arguments:
//!   <MIRROR>                 Defines the directory that will be mirrored
//!   <MOUNTPOINT>             Defines the mountpoint, where the mirror will be accessible
//!   [FUSE_OPTIONS_EXTRA]...  Additional options, which are passed down to FUSE
//!
//! Options:
//!   -o, --fuse-options <FUSE_OPTIONS>  Additional options, which are passed down to FUSE
//!   -d, --daemon                       Run program in background
//!       --mkdir                        Create mountpoint directory if it does not exist already
//!   -h, --help                         Print help
//!   -V, --version                      Print version
//! ```
//!
//! To mount a directory with CatFS, use the following form:
//!
//! ```shell script
//! scfs cat <base directory> <mount point>
//! ```
//!
//! This can be simplified by using the dedicated `catfs` binary:
//!
//! ```shell script
//! catfs <base directory> <mount point>
//! ```
//!
//! Please note that `base directory` needs to be a directory structure that has
//! been generated by SplitFS. CatFS will refuse to mount the directory otherwise.
//!
//! The directory specified as `mount point` will now reflect the content of
//! `base directory`, replacing each directory of chunk files with a single file.
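//!
//! Presenting chunks as one file means that a read on the concatenated file
//! has to be mapped back onto the chunks. The following sketch shows one way
//! to do that mapping. It is not taken from the SCFS sources: `split_read` is
//! a hypothetical helper, and it assumes that chunks are counted from zero and
//! that every chunk except possibly the last has exactly `blocksize` bytes.
//!
//! ```rust
//! /// Hypothetical helper: splits a read of `len` bytes at `offset` into
//! /// pieces of the form `(chunk index, offset inside chunk, length)`.
//! fn split_read(offset: u64, len: u64, blocksize: u64) -> Vec<(u64, u64, u64)> {
//!     assert!(blocksize > 0, "blocksize must be positive");
//!     let mut pieces = Vec::new();
//!     let mut pos = offset;
//!     let end = offset.saturating_add(len);
//!     while pos < end {
//!         let chunk = pos / blocksize;
//!         let within = pos % blocksize;
//!         // Read up to the end of this chunk or the end of the request.
//!         let take = (blocksize - within).min(end - pos);
//!         pieces.push((chunk, within, take));
//!         pos += take;
//!     }
//!     pieces
//! }
//! ```
//!
//! A read that stays inside one chunk produces a single piece; a read that
//! crosses chunk boundaries produces one piece per chunk it touches.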
//!
//! ### Additional FUSE mount options
//!
//! It is possible to pass additional mount options to the underlying FUSE
//! library.
//!
//! SCFS supports two ways of specifying options: either via the `-o` option or
//! via additional arguments after a `--` separator. This is in line with
//! other FUSE-based filesystems like EncFS.
//!
//! These two calls are equivalent:
//!
//! ```shell script
//! scfs split -o nonempty mirror mountpoint
//! scfs split mirror mountpoint -- nonempty
//! ```
//!
//! Of course, these methods also work with the `splitfs` and `catfs` binaries.
//!
//! ### Daemon mode
//!
//! Originally, SCFS was meant to be run in the foreground. This proved to be
//! annoying if one wants to use the same terminal for further work. Granted, one
//! could always use features of the shell to send the process to the
//! background, but then there is a background process that might accidentally be
//! killed if the user closes the terminal. Furthermore, SCFS originally did not
//! terminate cleanly if the user unmounted it by external means.
//!
//! Since v0.9.0, SCFS natively supports daemon mode: the program changes
//! its working directory to `"/"` and then forks itself into a true daemon
//! process, independent of the running terminal.
//!
//! ```shell script
//! splitfs --daemon mirror mountpoint
//! ```
//!
//! Note that `mirror` and `mountpoint` are resolved *before* changing the working
//! directory, so they can still be given relative to the current working
//! directory.
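//!
//! The order of operations matters here, which a small sketch can show. It
//! only covers path resolution and the directory change; the fork into a
//! daemon is left out. The helper `resolve_then_chdir` is hypothetical and not
//! part of the SCFS API.
//!
//! ```rust
//! use std::env;
//! use std::io;
//! use std::path::{Path, PathBuf};
//!
//! /// Hypothetical helper: makes both paths absolute first and only then
//! /// changes the working directory to "/".
//! fn resolve_then_chdir(mirror: &Path, mountpoint: &Path) -> io::Result<(PathBuf, PathBuf)> {
//!     // Resolve while the original working directory is still in effect.
//!     let mirror = mirror.canonicalize()?;
//!     let mountpoint = mountpoint.canonicalize()?;
//!     env::set_current_dir("/")?;
//!     Ok((mirror, mountpoint))
//! }
//! ```
//!
//! If the two steps were swapped, a relative path like `mirror` would be
//! looked up below `/` instead of below the directory the user started in.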
//!
//! To unmount, `fusermount` can be used:
//!
//! ```shell script
//! fusermount -u mountpoint
//! ```
//!
//! ## Limitations
//!
//! I no longer consider this project a "raw prototype", and I am eating my own
//! dog food, meaning I use it in my own backup strategies and create features
//! based on my personal needs.
//!
//! However, this might not meet the needs of the typical user, and without
//! feedback I might not even think of some scenarios to begin with.
//!
//! Specifically, these are the current limitations of SCFS:
//!
//! - It should work on all UNIX-based systems, like Linux and maybe some macOS
//!   versions, although without macOS-specific file attributes. It definitely
//!   does not work on Windows, since this would need special handling of system
//!   calls, which I haven't had time to take care of yet.
//!
//! - It can only work with directories, regular files, and symlinks. All
//!   other file types (device files, pipes, and so on) will be silently
//!   ignored.
//!
//! - The base directory will be mounted read-only in the new mount point, and
//!   SCFS expects that the base directory will not be altered while mounted.

use std::ffi::{OsStr, OsString};
use std::fs;
use std::fs::Metadata;
use std::os::unix::ffi::OsStrExt;
use std::os::unix::ffi::OsStringExt;
use std::os::unix::fs::MetadataExt;
use std::path::Path;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

use fuser::{BackgroundSession, FileAttr, FileType, Filesystem, MountOption};
use rusqlite::Row;
use serde::{Deserialize, Serialize};

pub use cli::Cli;

pub(crate) use catfs::CatFS;
pub(crate) use shared::Shared;
pub(crate) use splitfs::SplitFS;

mod catfs;
mod cli;
mod shared;
mod splitfs;

const TTL: Duration = Duration::from_secs(60 * 60 * 24);

const STMT_CREATE: &str = "
    CREATE TABLE Files (
        ino INTEGER PRIMARY KEY,
        parent_ino INTEGER,
        path TEXT UNIQUE,
        file_name TEXT,
        part INTEGER,
        vdir INTEGER,
        symlink INTEGER
    )
";
const STMT_CREATE_INDEX_PARENT_INO_FILE_NAME: &str = "
    CREATE INDEX idx_parent_ino_file_name
    ON Files (parent_ino, file_name)
";
const STMT_INSERT: &str = "
    INSERT INTO Files (ino, parent_ino, path, file_name, part, vdir, symlink)
    VALUES (?, ?, ?, ?, ?, ?, ?)
";
const STMT_QUERY_BY_INO: &str = "
    SELECT *
    FROM Files
    WHERE ino = ?
";
const STMT_QUERY_BY_PARENT_INO: &str = "
    SELECT *
    FROM Files
    WHERE parent_ino = ?
    LIMIT -1 OFFSET ?
";
const STMT_QUERY_BY_PARENT_INO_AND_FILENAME: &str = "
    SELECT *
    FROM Files
    WHERE parent_ino = ?
    AND file_name = ?
";

const CONFIG_FILE_NAME: &str = ".scfs_config";

const CONFIG_DEFAULT_BLOCKSIZE: u64 = 2 * 1024 * 1024;

const INO_OUTSIDE: u64 = 0;
const INO_ROOT: u64 = 1;
const INO_CONFIG: u64 = 2;

const INO_FIRST_FREE: u64 = 10;

type DropHookFn = Box<dyn Fn() + Send + 'static>;

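/// Converts a POSIX timestamp (seconds plus nanoseconds) into a `SystemTime`.
///
/// The nanosecond part always counts forward in time, even for timestamps
/// before the epoch, so it is added after the whole seconds are applied.
fn system_time_from_time(secs: i64, nsecs: i64) -> SystemTime {
    let nanos = Duration::new(0, nsecs as u32);
    if secs >= 0 {
        UNIX_EPOCH + Duration::from_secs(secs as u64) + nanos
    } else {
        // `unsigned_abs` avoids the overflow of `-secs` for `i64::MIN`.
        UNIX_EPOCH - Duration::from_secs(secs.unsigned_abs()) + nanos
    }
}
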
/// Maps a `std::fs::FileType` to the FUSE file type.
///
/// Returns `None` for anything other than directories, regular files, and
/// symlinks.
fn convert_filetype(ft: fs::FileType) -> Option<FileType> {
    if ft.is_dir() {
        Some(FileType::Directory)
    } else if ft.is_file() {
        Some(FileType::RegularFile)
    } else if ft.is_symlink() {
        Some(FileType::Symlink)
    } else {
        None
    }
}

/// Builds a FUSE `FileAttr` from filesystem metadata.
///
/// If `ino` is given, it overrides the inode number found in `meta`.
///
/// # Panics
///
/// Panics if `meta` describes a file type that `convert_filetype` does not
/// support.
fn convert_metadata_to_attr(meta: Metadata, ino: Option<u64>) -> FileAttr {
    FileAttr {
        ino: ino.unwrap_or_else(|| meta.ino()),
        size: meta.size(),
        blocks: meta.blocks(),
        atime: system_time_from_time(meta.atime(), meta.atime_nsec()),
        mtime: system_time_from_time(meta.mtime(), meta.mtime_nsec()),
        ctime: system_time_from_time(meta.ctime(), meta.ctime_nsec()),
        // Fall back to the epoch if the creation time is not available.
        crtime: meta.created().unwrap_or(UNIX_EPOCH),
        kind: convert_filetype(meta.file_type()).expect("Filetype not supported"),
        perm: meta.mode() as u16,
        nlink: meta.nlink() as u32,
        uid: meta.uid(),
        gid: meta.gid(),
        rdev: meta.rdev() as u32,
        blksize: meta.blksize() as u32,
        flags: 0,
    }
}

// Copied from fuser, mount_options.rs. When this becomes part of their public API, delete this function.
fn mount_option_from_str(s: &str) -> MountOption {
    match s {
        "auto_unmount" => MountOption::AutoUnmount,
        "allow_other" => MountOption::AllowOther,
        "allow_root" => MountOption::AllowRoot,
        "default_permissions" => MountOption::DefaultPermissions,
        "dev" => MountOption::Dev,
        "nodev" => MountOption::NoDev,
        "suid" => MountOption::Suid,
        "nosuid" => MountOption::NoSuid,
        "ro" => MountOption::RO,
        "rw" => MountOption::RW,
        "exec" => MountOption::Exec,
        "noexec" => MountOption::NoExec,
        "atime" => MountOption::Atime,
        "noatime" => MountOption::NoAtime,
        "dirsync" => MountOption::DirSync,
        "sync" => MountOption::Sync,
        "async" => MountOption::Async,
        x if x.starts_with("fsname=") => MountOption::FSName(x[7..].into()),
        x if x.starts_with("subtype=") => MountOption::Subtype(x[8..].into()),
        x => MountOption::CUSTOM(x.into()),
    }
}

/// Mounts `filesystem` at `mountpoint` in a background session.
///
/// The mount is always read-only and named `scfs`; `fuse_options` are appended
/// to these defaults.
///
/// # Panics
///
/// Panics if an option is not valid UTF-8 or if mounting fails.
fn mount<'a, 'b, FS, P, I>(filesystem: FS, mountpoint: &P, fuse_options: I) -> BackgroundSession
where
    FS: Filesystem + Send + 'static + 'a,
    P: AsRef<Path>,
    I: IntoIterator<Item = &'b OsStr>,
{
    let mut options = vec![MountOption::RO, MountOption::FSName(String::from("scfs"))];
    options.extend(fuse_options.into_iter().map(|x| {
        mount_option_from_str(x.to_str().expect("FUSE option is not valid UTF-8"))
    }));

    fuser::spawn_mount2(filesystem, mountpoint, &options).expect("Failed to mount filesystem")
}

struct FileHandle {
    file: OsString,
    start: u64,
    end: u64,
}

#[derive(Clone, Debug, Default, Eq, PartialEq)]
struct FileInfo {
    ino: u64,
    parent_ino: u64,
    path: OsString,
    file_name: OsString,
    part: u64,
    vdir: bool,
    symlink: bool,
}

impl FileInfo {
    fn with_ino(ino: u64) -> Self {
        FileInfo {
            ino,
            ..Default::default()
        }
    }

    fn with_parent_ino(parent_ino: u64) -> Self {
        FileInfo {
            parent_ino,
            ..Default::default()
        }
    }

    fn file_name<S: Into<OsString>>(mut self, file_name: S) -> Self {
        self.file_name = file_name.into();
        self
    }

    fn into_file_info_row(self) -> FileInfoRow {
        FileInfoRow::from(self)
    }
}

impl From<&Row<'_>> for FileInfo {
    fn from(row: &Row) -> Self {
        FileInfoRow::from(row).into()
    }
}

#[derive(Debug)]
struct FileInfoRow {
    ino: i64,
    parent_ino: i64,
    path: Vec<u8>,
    file_name: Vec<u8>,
    part: i64,
    vdir: bool,
    symlink: bool,
}

impl From<&Row<'_>> for FileInfoRow {
    fn from(row: &Row) -> Self {
        FileInfoRow {
            ino: row.get_unwrap(0),
            parent_ino: row.get_unwrap(1),
            path: row.get_unwrap(2),
            file_name: row.get_unwrap(3),
            part: row.get_unwrap(4),
            vdir: row.get_unwrap(5),
            symlink: row.get_unwrap(6),
        }
    }
}

impl From<FileInfoRow> for FileInfo {
    fn from(f: FileInfoRow) -> Self {
        FileInfo {
            ino: f.ino as u64,
            parent_ino: f.parent_ino as u64,
            path: OsString::from_vec(f.path),
            file_name: OsString::from_vec(f.file_name),
            part: f.part as u64,
            vdir: f.vdir,
            symlink: f.symlink,
        }
    }
}

impl From<FileInfo> for FileInfoRow {
    fn from(f: FileInfo) -> Self {
        FileInfoRow {
            ino: f.ino as i64,
            parent_ino: f.parent_ino as i64,
            path: f.path.as_bytes().to_vec(),
            file_name: f.file_name.as_bytes().to_vec(),
            part: f.part as i64,
            vdir: f.vdir,
            symlink: f.symlink,
        }
    }
}

#[derive(Clone, Copy, Debug, Deserialize, Serialize)]
struct Config {
    blocksize: u64,
}

impl Config {
    fn blocksize(mut self, blocksize: u64) -> Self {
        self.blocksize = blocksize;
        self
    }
}

impl Default for Config {
    fn default() -> Self {
        Config {
            blocksize: CONFIG_DEFAULT_BLOCKSIZE,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn convert_fileinfo_to_fileinforow_and_back() {
        // `path` holds the full path, `file_name` only the last component.
        let file_info = FileInfo {
            ino: 7,
            parent_ino: 3,
            path: OsString::from("/path/to/files/File.Name"),
            file_name: OsString::from("File.Name"),
            part: 5,
            vdir: true,
            symlink: false,
        };

        let file_info_row = FileInfoRow::from(file_info.clone());

        assert_eq!(file_info, file_info_row.into());
    }
}
558}