scfs 0.10.6

A convenient splitting and concatenating filesystem.
//! # SCFS – SplitCatFS
//!
//! A convenient splitting and concatenating filesystem.
//!
//! ## Motivation
//!
//! ### History
//!
//! While setting up a cloud-based backup and archive solution, I encountered the
//! following phenomenon: Many small files would get uploaded quite fast and –
//! depending on the actual cloud storage provider – highly concurrently, while
//! big files tend to slow down the whole process. The explanation is simple: many
//! cloud storage providers do not support concurrent or chunked uploads of a
//! single file, and some do not even support resuming a partial upload. You have
//! to upload the file in one go, sequentially, one byte after another. It's all
//! or nothing.
//!
//! Now consider a scenario where you upload a huge file, like a mirror of your
//! Raspberry Pi's SD card with the system and configuration on it. I have such a
//! file; it is about 4 GB in size. While backing up my system, this was the last
//! file to be uploaded. According to ETA calculations, it would have taken
//! several hours, so I let it run overnight. The next morning I found out that
//! at around 95% of the upload, my internet connection had dropped for just a
//! few seconds, but long enough for the transfer tool to abort the upload. The
//! temporary file got deleted from the cloud storage, so I had to start from zero
//! again. Several hours of uploading wasted.
//!
//! I thought of a way to split big files so that I could upload them more
//! efficiently, but I came to the conclusion that manually splitting files,
//! uploading them, and deleting them locally afterwards is not a very scalable
//! solution.
//!
//! So I came up with the idea of a special filesystem. A filesystem that would
//! present big files as if they were many small chunks in separate files. In
//! reality, the chunks would all point to the same physical file, only with
//! different offsets. This way I could upload chunked files in parallel without
//! losing too much progress, even if the upload gets aborted midway.
//!
//! *SplitFS* was born.
//!
//! If I download such chunked file parts, I would need to call `cat * >file`
//! afterwards to re-create the actual file. This is a similar hassle to manually
//! splitting files. That's why I also had *CatFS* in mind when developing SCFS.
//! CatFS will concatenate chunked files transparently and present them as
//! complete files again.
//!
//! ### Why Rust?
//!
//! I am relatively new to Rust, and I thought the best way to deepen my
//! understanding of Rust is to take on a project that requires dedication and a
//! certain knowledge of the language.
//!
//! ## Installation
//!
//! SCFS can be installed easily through Cargo via `crates.io`:
//!
//! ```shell script
//! cargo install scfs
//! ```
//!
//! ## Usage
//!
//! ```text
//! Usage: scfs <COMMAND>
//!
//! Commands:
//!   split  Create a splitting file system
//!   cat    Create a concatenating file system
//!   help   Print this message or the help of the given subcommand(s)
//!
//! Options:
//!   -h, --help     Print help
//!   -V, --version  Print version
//! ```
//!
//! ### SplitFS
//!
//! ```text
//! Usage: scfs split [OPTIONS] <MIRROR> <MOUNTPOINT> [-- <FUSE_OPTIONS_EXTRA>...]
//!
//! Arguments:
//!   <MIRROR>                 Defines the directory that will be mirrored
//!   <MOUNTPOINT>             Defines the mountpoint, where the mirror will be accessible
//!   [FUSE_OPTIONS_EXTRA]...  Additional options, which are passed down to FUSE
//!
//! Options:
//!   -b, --blocksize <BLOCKSIZE>        Sets the desired blocksize [default: 2097152]
//!   -o, --fuse-options <FUSE_OPTIONS>  Additional options, which are passed down to FUSE
//!   -d, --daemon                       Run program in background
//!       --mkdir                        Create mountpoint directory if it does not exist already
//!   -h, --help                         Print help
//!   -V, --version                      Print version
//! ```
//!
//! To mount a directory with SplitFS, use the following form:
//!
//! ```shell script
//! scfs split <base directory> <mount point>
//! ```
//!
//! This can be simplified by using the dedicated `splitfs` binary:
//!
//! ```shell script
//! splitfs <base directory> <mount point>
//! ```
//!
//! The directory specified as `mount point` will now reflect the content of `base
//! directory`, replacing each regular file with a directory that contains
//! enumerated chunks of that file as separate files.
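//!
//! To illustrate how a file maps to its chunks, here is a minimal sketch that
//! computes the byte range each chunk would cover for a given block size. The
//! function `chunk_ranges` is hypothetical and not part of SCFS, and treating an
//! empty file as having no chunks is an assumption of this sketch.
//!
//! ```rust
//! /// Returns the half-open byte ranges `[start, end)` covered by each chunk.
//! fn chunk_ranges(file_size: u64, blocksize: u64) -> Vec<(u64, u64)> {
//!     assert!(blocksize > 0, "blocksize must be positive");
//!     let mut ranges = Vec::new();
//!     let mut start = 0;
//!     while start < file_size {
//!         // The last chunk may be shorter than `blocksize`.
//!         let end = file_size.min(start.saturating_add(blocksize));
//!         ranges.push((start, end));
//!         start = end;
//!     }
//!     ranges
//! }
//! ```
//!
//! For a 5 byte file and a block size of 2, this yields the three ranges
//! `(0, 2)`, `(2, 4)`, and `(4, 5)`.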
//!
//! It is possible to use a custom block size for the file fragments. For example,
//! to use 1&nbsp;MB chunks instead of the default size of 2&nbsp;MB, you would go
//! with:
//!
//! ```shell script
//! splitfs --blocksize=1048576 <base directory> <mount point>
//! ```
//!
//! Where 1048576 is 1024 * 1024, so one megabyte in bytes.
//!
//! You can even let your shell do the arithmetic, for example in Bash:
//!
//! ```shell script
//! splitfs --blocksize=$((1024 * 1024)) <base directory> <mount point>
//! ```
//!
//! New since v0.9.0: The block size may now also be given with a symbolic
//! quantifier. Allowed quantifiers are "K", "M", "G", and "T", each one
//! multiplying the previous unit by 1024 ("K" is 1024 bytes, "M" is 1024 * 1024
//! bytes, and so on). So, to set the block size to 1&nbsp;MB like in the example
//! above, you can now use:
//!
//! ```shell script
//! splitfs --blocksize=1M <base directory> <mount point>
//! ```
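//!
//! As an illustration of the quantifier rule, here is a minimal sketch of how
//! such a value could be turned into a byte count. The function
//! `parse_blocksize` is hypothetical and not part of SCFS; the real parser may
//! accept or reject inputs differently.
//!
//! ```rust
//! /// Parses values like `2097152`, `512K`, or `1M` into a number of bytes.
//! /// Returns `None` for malformed input or on overflow.
//! fn parse_blocksize(input: &str) -> Option<u64> {
//!     let input = input.trim();
//!     // Each quantifier stands for one more power of 1024.
//!     let (digits, exponent) = match input.chars().last()? {
//!         'K' => (&input[..input.len() - 1], 1),
//!         'M' => (&input[..input.len() - 1], 2),
//!         'G' => (&input[..input.len() - 1], 3),
//!         'T' => (&input[..input.len() - 1], 4),
//!         _ => (input, 0),
//!     };
//!     let base: u64 = digits.parse().ok()?;
//!     base.checked_mul(1024u64.checked_pow(exponent)?)
//! }
//! ```
//!
//! With this sketch, `parse_blocksize("1M")` yields `Some(1048576)`, the same
//! value as in the numeric example above.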
//!
//! You can actually go as far as to set a block size of one byte, but be prepared
//! for a ridiculous amount of overhead or maybe even a system freeze because the
//! metadata table grows too large.
//!
//! ### CatFS
//!
//! ```text
//! Usage: scfs cat [OPTIONS] <MIRROR> <MOUNTPOINT> [-- <FUSE_OPTIONS_EXTRA>...]
//!
//! Arguments:
//!   <MIRROR>                 Defines the directory that will be mirrored
//!   <MOUNTPOINT>             Defines the mountpoint, where the mirror will be accessible
//!   [FUSE_OPTIONS_EXTRA]...  Additional options, which are passed down to FUSE
//!
//! Options:
//!   -o, --fuse-options <FUSE_OPTIONS>  Additional options, which are passed down to FUSE
//!   -d, --daemon                       Run program in background
//!       --mkdir                        Create mountpoint directory if it does not exist already
//!   -h, --help                         Print help
//!   -V, --version                      Print version
//! ```
//!
//! To mount a directory with CatFS, use the following form:
//!
//! ```shell script
//! scfs cat <base directory> <mount point>
//! ```
//!
//! This can be simplified by using the dedicated `catfs` binary:
//!
//! ```shell script
//! catfs <base directory> <mount point>
//! ```
//!
//! Please note that `base directory` needs to be a directory structure that has
//! been generated by SplitFS. CatFS will refuse mounting the directory otherwise.
//!
//! The directory specified as `mount point` will now reflect the content of `base
//! directory`, replacing each directory with chunked files in it as single files.
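//!
//! The observable effect is the same as reading from the result of `cat`. The
//! following minimal sketch shows that behaviour on in-memory chunks. The
//! function `read_concatenated` is hypothetical and only illustrates the
//! semantics; it is not how CatFS reads data internally.
//!
//! ```rust
//! /// Copies up to `len` bytes, starting at `offset`, out of the logical
//! /// concatenation of `chunks`, without building the whole file first.
//! fn read_concatenated(chunks: &[&[u8]], offset: usize, len: usize) -> Vec<u8> {
//!     chunks
//!         .iter()
//!         .flat_map(|chunk| chunk.iter().copied())
//!         .skip(offset)
//!         .take(len)
//!         .collect()
//! }
//! ```
//!
//! A read may span chunk boundaries: with the chunks `ab`, `cd`, and `e`,
//! reading 3 bytes from offset 1 returns `bcd`.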
//!
//! ### Additional FUSE mount options
//!
//! It is possible to pass additional mount options to the underlying FUSE
//! library.
//!
//! SCFS supports two ways of specifying options, either via the "-o" option, or
//! via additional arguments after a "--" separator. This is in accordance with
//! other FUSE based filesystems like EncFS.
//!
//! These two calls are equivalent:
//!
//! ```shell script
//! scfs split -o nonempty mirror mountpoint
//! scfs split mirror mountpoint -- nonempty
//! ```
//!
//! Of course, these methods also work in the `splitfs` and `catfs` binaries.
//!
//! ### Daemon mode
//!
//! Originally, SCFS was meant to be run in the foreground. This proved to be
//! annoying if one wants to use the same terminal for further work. Granted, one
//! could always use features of the shell to send the process to the background,
//! but then the background process might accidentally be killed if the user
//! closes the terminal. Furthermore, SCFS originally did not terminate cleanly if
//! the user unmounted it by external means.
//!
//! Since v0.9.0, SCFS natively supports daemon mode, in that the program changes
//! its working directory to `"/"` and then forks itself into a true daemon
//! process, independent of the running terminal.
//!
//! ```shell script
//! splitfs --daemon mirror mountpoint
//! ```
//!
//! Note that `mirror` and `mountpoint` are resolved *before* changing the working
//! directory, so they can still be given relative to the current working
//! directory.
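//!
//! The order of operations matters here. A minimal sketch of the idea, assuming
//! a hypothetical helper `resolve_against` (not part of SCFS) that is called
//! with the working directory captured *before* daemonizing:
//!
//! ```rust
//! use std::path::{Path, PathBuf};
//!
//! /// Makes `path` absolute by joining it onto `cwd`, unless it already is
//! /// absolute.
//! fn resolve_against(cwd: &Path, path: &Path) -> PathBuf {
//!     if path.is_absolute() {
//!         path.to_path_buf()
//!     } else {
//!         cwd.join(path)
//!     }
//! }
//! ```
//!
//! Resolving `mirror` against `/home/user` gives `/home/user/mirror`, which
//! stays valid after the working directory has changed to `/`.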
//!
//! To unmount, `fusermount` can be used:
//!
//! ```shell script
//! fusermount -u mountpoint
//! ```
//!
//! ## Limitations
//!
//! I consider this project no longer a "raw prototype", and I am eating my own
//! dog food, meaning I use it in my own backup strategies and create features
//! based on my personal needs.
//!
//! However, this might not meet the needs of the typical user and without
//! feedback I might not even think of some scenarios to begin with.
//!
//! Specifically, these are the current limitations of SCFS:
//!
//! -   It should work on all UNIX based systems, like Linux and maybe some macOS
//!     versions, however without macOS specific file attributes. It definitely
//!     does not work on Windows, since this would need special handling of
//!     system calls, which I haven't had time to take care of yet.
//!
//! -   It can only work with directories, regular files, and symlinks. All
//!     other file types (device files, pipes, and so on) are silently ignored.
//!
//! -   The base directory will be mounted read-only in the new mount point, and
//!     SCFS expects that the base directory will not be altered while mounted.

use std::ffi::{OsStr, OsString};
use std::fs;
use std::fs::Metadata;
use std::os::unix::ffi::OsStrExt;
use std::os::unix::ffi::OsStringExt;
use std::os::unix::fs::MetadataExt;
use std::path::Path;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

use fuser::{BackgroundSession, FileAttr, FileType, Filesystem, MountOption};
use rusqlite::Row;
use serde::{Deserialize, Serialize};

pub use cli::Cli;

pub(crate) use catfs::CatFS;
pub(crate) use shared::Shared;
pub(crate) use splitfs::SplitFS;

mod catfs;
mod cli;
mod shared;
mod splitfs;

const TTL: Duration = Duration::from_secs(60 * 60 * 24);

const STMT_CREATE: &str = "
    CREATE TABLE Files (
        ino INTEGER PRIMARY KEY,
        parent_ino INTEGER,
        path TEXT UNIQUE,
        file_name TEXT,
        part INTEGER,
        vdir INTEGER,
        symlink INTEGER
    )
";
const STMT_CREATE_INDEX_PARENT_INO_FILE_NAME: &str = "
    CREATE INDEX idx_parent_ino_file_name
    ON Files (parent_ino, file_name)
";
const STMT_INSERT: &str = "
    INSERT INTO Files (ino, parent_ino, path, file_name, part, vdir, symlink)
    VALUES (?, ?, ?, ?, ?, ?, ?)
";
const STMT_QUERY_BY_INO: &str = "
    SELECT *
    FROM Files
    WHERE ino = ?
";
const STMT_QUERY_BY_PARENT_INO: &str = "
    SELECT *
    FROM Files
    WHERE parent_ino = ?
    LIMIT -1 OFFSET ?
";
const STMT_QUERY_BY_PARENT_INO_AND_FILENAME: &str = "
    SELECT *
    FROM Files
    WHERE parent_ino = ?
    AND file_name = ?
";

const CONFIG_FILE_NAME: &str = ".scfs_config";

const CONFIG_DEFAULT_BLOCKSIZE: u64 = 2 * 1024 * 1024;

const INO_OUTSIDE: u64 = 0;
const INO_ROOT: u64 = 1;
const INO_CONFIG: u64 = 2;

const INO_FIRST_FREE: u64 = 10;

type DropHookFn = Box<dyn Fn() + Send + 'static>;

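fn system_time_from_time(secs: i64, nsecs: i64) -> SystemTime {
    // `nsecs` is a non-negative offset that is added to `secs`, also for
    // timestamps before the epoch (negative `secs`).
    let nanos = Duration::new(0, nsecs as u32);
    if secs >= 0 {
        UNIX_EPOCH + Duration::from_secs(secs as u64) + nanos
    } else {
        UNIX_EPOCH - Duration::from_secs(secs.unsigned_abs()) + nanos
    }
}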

fn convert_filetype(ft: fs::FileType) -> Option<FileType> {
    if ft.is_dir() {
        Some(FileType::Directory)
    } else if ft.is_file() {
        Some(FileType::RegularFile)
    } else if ft.is_symlink() {
        Some(FileType::Symlink)
    } else {
        None
    }
}

fn convert_metadata_to_attr(meta: Metadata, ino: Option<u64>) -> FileAttr {
    FileAttr {
        ino: ino.unwrap_or_else(|| meta.ino()),
        size: meta.size(),
        blocks: meta.blocks(),
        atime: system_time_from_time(meta.atime(), meta.atime_nsec()),
        mtime: system_time_from_time(meta.mtime(), meta.mtime_nsec()),
        ctime: system_time_from_time(meta.ctime(), meta.ctime_nsec()),
        crtime: meta.created().unwrap_or(UNIX_EPOCH),
        kind: convert_filetype(meta.file_type()).expect("Filetype not supported"),
        perm: meta.mode() as u16,
        nlink: meta.nlink() as u32,
        uid: meta.uid(),
        gid: meta.gid(),
        rdev: meta.rdev() as u32,
        blksize: meta.blksize() as u32,
        flags: 0,
    }
}

// Copied from fuser, mount_options.rs. When this becomes part of their public API, delete this function.
fn mount_option_from_str(s: &str) -> MountOption {
    match s {
        "auto_unmount" => MountOption::AutoUnmount,
        "allow_other" => MountOption::AllowOther,
        "allow_root" => MountOption::AllowRoot,
        "default_permissions" => MountOption::DefaultPermissions,
        "dev" => MountOption::Dev,
        "nodev" => MountOption::NoDev,
        "suid" => MountOption::Suid,
        "nosuid" => MountOption::NoSuid,
        "ro" => MountOption::RO,
        "rw" => MountOption::RW,
        "exec" => MountOption::Exec,
        "noexec" => MountOption::NoExec,
        "atime" => MountOption::Atime,
        "noatime" => MountOption::NoAtime,
        "dirsync" => MountOption::DirSync,
        "sync" => MountOption::Sync,
        "async" => MountOption::Async,
        x if x.starts_with("fsname=") => MountOption::FSName(x[7..].into()),
        x if x.starts_with("subtype=") => MountOption::Subtype(x[8..].into()),
        x => MountOption::CUSTOM(x.into()),
    }
}

fn mount<'a, 'b, FS, P, I>(filesystem: FS, mountpoint: &P, fuse_options: I) -> BackgroundSession
where
    FS: Filesystem + Send + 'static + 'a,
    P: AsRef<Path>,
    I: IntoIterator<Item = &'b OsStr>,
{
    let fuse_options = fuse_options
        .into_iter()
        .map(|x| mount_option_from_str(x.to_str().unwrap()));

    let options = {
        let mut options = vec![MountOption::RO, MountOption::FSName(String::from("scfs"))];
        for opt in fuse_options {
            options.push(opt);
        }
        options
    };

    fuser::spawn_mount2(filesystem, &mountpoint, options.as_ref()).unwrap()
}

struct FileHandle {
    file: OsString,
    start: u64,
    end: u64,
}

#[derive(Clone, Debug, Default, Eq, PartialEq)]
struct FileInfo {
    ino: u64,
    parent_ino: u64,
    path: OsString,
    file_name: OsString,
    part: u64,
    vdir: bool,
    symlink: bool,
}

impl FileInfo {
    fn with_ino(ino: u64) -> Self {
        FileInfo {
            ino,
            parent_ino: Default::default(),
            path: Default::default(),
            file_name: Default::default(),
            part: 0,
            vdir: false,
            symlink: false,
        }
    }

    fn with_parent_ino(parent_ino: u64) -> Self {
        FileInfo {
            ino: Default::default(),
            parent_ino,
            path: Default::default(),
            file_name: Default::default(),
            part: 0,
            vdir: false,
            symlink: false,
        }
    }

    fn file_name<S: Into<OsString>>(mut self, file_name: S) -> Self {
        self.file_name = file_name.into();
        self
    }

    fn into_file_info_row(self) -> FileInfoRow {
        FileInfoRow::from(self)
    }
}

impl From<&Row<'_>> for FileInfo {
    fn from(row: &Row) -> Self {
        FileInfoRow::from(row).into()
    }
}

#[derive(Debug)]
struct FileInfoRow {
    ino: i64,
    parent_ino: i64,
    path: Vec<u8>,
    file_name: Vec<u8>,
    part: i64,
    vdir: bool,
    symlink: bool,
}

impl From<&Row<'_>> for FileInfoRow {
    fn from(row: &Row) -> Self {
        FileInfoRow {
            ino: row.get_unwrap(0),
            parent_ino: row.get_unwrap(1),
            path: row.get_unwrap(2),
            file_name: row.get_unwrap(3),
            part: row.get_unwrap(4),
            vdir: row.get_unwrap(5),
            symlink: row.get_unwrap(6),
        }
    }
}

impl From<FileInfoRow> for FileInfo {
    fn from(f: FileInfoRow) -> Self {
        FileInfo {
            ino: f.ino as u64,
            parent_ino: f.parent_ino as u64,
            path: OsString::from_vec(f.path),
            file_name: OsString::from_vec(f.file_name),
            part: f.part as u64,
            vdir: f.vdir,
            symlink: f.symlink,
        }
    }
}

impl From<FileInfo> for FileInfoRow {
    fn from(f: FileInfo) -> Self {
        FileInfoRow {
            ino: f.ino as i64,
            parent_ino: f.parent_ino as i64,
            path: f.path.as_bytes().to_vec(),
            file_name: f.file_name.as_bytes().to_vec(),
            part: f.part as i64,
            vdir: f.vdir,
            symlink: f.symlink,
        }
    }
}

#[derive(Clone, Copy, Debug, Deserialize, Serialize)]
struct Config {
    blocksize: u64,
}

impl Config {
    fn blocksize(mut self, blocksize: u64) -> Self {
        self.blocksize = blocksize;
        self
    }
}

impl Default for Config {
    fn default() -> Self {
        Config {
            blocksize: CONFIG_DEFAULT_BLOCKSIZE,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn convert_fileinfo_to_fileinforow_and_back() {
        let file_info = FileInfo {
            ino: 7,
            parent_ino: 3,
            path: OsString::from("/path/to/files/File.Name"),
            file_name: OsString::from("File.Name"),
            part: 5,
            vdir: true,
            symlink: false,
        };

        let file_info_row = FileInfoRow::from(file_info.clone());

        assert_eq!(file_info, file_info_row.into());
    }
}