rocfl 1.7.0

A CLI for OCFL repositories
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
# rocfl

![build](https://github.com/pwinckles/rocfl/workflows/build/badge.svg)

`rocfl` is a command line utility for interacting with
[OCFL](https://ocfl.io/) repositories on the local filesystem or in
S3. Its goal is to provide a logical view of OCFL objects and make
them easy to interact with in a unix-like way.

`rocfl` supports OCFL spec versions 1.0 and 1.1.

![gif](./media/demo.gif)

## Extension Support

`rocfl` supports the following extensions:

- [0001-digest-algorithms]https://ocfl.github.io/extensions/0001-digest-algorithms.html
- [0002-flat-direct-storage-layout]https://ocfl.github.io/extensions/0002-flat-direct-storage-layout.html
- [0003-hash-and-id-n-tuple-storage-layout]https://ocfl.github.io/extensions/0003-hash-and-id-n-tuple-storage-layout.html
- [0004-hashed-n-tuple-storage-layout]https://ocfl.github.io/extensions/0004-hashed-n-tuple-storage-layout.html
- [0005-mutable-head]https://ocfl.github.io/extensions/0005-mutable-head.html:
  Only read is supported; not write.
- [0006-flat-omit-prefix-storage-layout]https://ocfl.github.io/extensions/0006-flat-omit-prefix-storage-layout.html
- [0007-n-tuple-omit-prefix-storage-layout]https://ocfl.github.io/extensions/0007-n-tuple-omit-prefix-storage-layout.html

Additionally, it uses the following extensions for write support that
have not been specified:

- `rocfl-staging`: By default, new object versions are staged in this
  extension's directory the contents of which are OCFL repository like.
- `rocfl-locks`: This extension contains object file locks that provide
  limited concurrent modification protection.

## Install

The [releases page](https://github.com/pwinckles/rocfl/releases) has
pre-built binaries that should work on most common OSes and
architectures. You do _not_ need to install Rust to use them.

1. Download and unzip the appropriate binary
2. Execute `./rocfl help` to verify it works

### Local Build

Alternatively, you can build `rocfl` from source as follows:

1. Install [Rust]https://www.rust-lang.org/tools/install, and make
   sure `cargo` is on your `PATH`
2. Execute: `cargo install rocfl`
3. Verify the install: `rocfl help`

If you want to build a binary that does not include the S3
integration, which adds a large number of dependencies, then you can
do so by running: `cargo install rocfl --no-default-features`.

## Configuration

`rocfl` supports optional configuration that makes it less verbose to
use. `rocfl` expects to find its configuration file in the following,
OS dependent location:

- **Linux**: `$HOME/.config/rocfl/config.toml`
- **Mac**: `$HOME/Library/Application
  Support/org.rocfl.rocfl/config.toml`
- **Windows**:
  `{FOLDERID_RoamingAppData}/rocfl/rocfl/config/config.toml`
  
You can easily edit the configuration by executing `rocfl config`,
which will create/open the config file for editing.

The config file may contain any number of sections structured as
follows:

``` toml
[repo-name]
# The name to attribute new OCFL versions to
author_name = "My Name"
# The URI address to associate with the above name
author_address = "mailto:name@example.com"
# The absolute path to the OCFL storage root
root = "/path/to/storage/root"
# The absolute path to the OCFL staging root
staging_root = "/path/to/staging/root"

# The next properties only apply when using S3
# The AWS region your bucket is in
region = "aws-region"
# The AWS credentials profile to load credentials from. If not specified, "default" is used.
profile = "aws-credentials-profile"
# The URL to the S3 endpoint. This is only needed if you are using a non-standard region
endpoint = "https://s3-endpoint"
# The S3 bucket the OCFL repository is in
bucket = "s3-bucket"
```

`repo-name` is the arbitrary name assigned to the configuration. This
is the value that you pass `rocfl` when invoking the `--name` option.

A special `[global]` section may be used to provide default values
across all configurations. This is primarily useful for defining
`author_name` and `author_address`.

All of these properties correspond to values that can be specified in
arguments to `rocfl`. `rocfl` resolves the configuration by first
loading the `global` config, then overlays the repository specific
config, and finally applies any values specified directly as command
line arguments.

Refer to the command line argument documentation for more information
on the configuration properties.

## Usage

The following is an overview of the features that `rocfl` supports.
For a detailed description of all of the options available, consult
the builtin help by executing `rocfl help` or `rocfl help <COMMAND>`.

### Read Commands

#### Validate

The `validate` command validates either an entire OCFL repository or
specific objects within the repository. Any issues that are identified
are reported with their corresponding [validation
code](https://ocfl.io/1.0/spec/validation-codes.html).

##### Examples

Validate an entire repository:

``` console
rocfl validate
```

Validate an entire repository without performing content fixity checks
and only reporting errors:

``` console
rocfl validate -n -l Error
```

Validate an entire repository and suppress warning `W004`:

``` console
rocfl validate -w W004
```

Validate a specific object:

``` console
rocfl validate urn:example:rocfl:object-1
```

Validate multiple objects at specific paths relative the repository
root:

``` console
rocfl validate -p object-1/ object-2/
```

#### List

The `ls` command either lists all of the objects in a repository or
list the files in an OCFL object.

`ls` will list objects as soon as they're found so long as it does not
need to sort the objects or display them in a formatted table. Object
lists are not sorted by default, and the formatted table can be
disabled with `-t`.

`rocfl` must scan the repository to locate objects. This can be slow
when operating on large repositories. The scan can be avoided when
listing an object's contents if the repository uses a supported
storage layout extension that is defined in the repository's
`ocfl_layout.json`

When listing files, only files in the most recent version are
returned. Previous versions can be queried with the `-v` option.

##### Examples

###### Listing Objects

The following command lists all of the object IDs in a repository
that's rooted in the current working directory:

```console
rocfl ls
```

This lists the same objects but with additional details, current
version and updated date:

```console
rocfl ls -l
```

Adding the `-p` flag additionally provides the path from the storage
root to the object:

```console
rocfl ls -lp
```

A subset of objects can be listed by providing a glob pattern to match
on:

```console
rocfl ls -lo '*object*'
```

###### Listing Object Contents

The contents of an object's current state are displayed by invoking
`ls` on a specific object ID:

```console
rocfl ls urn:example:rocfl:object-1
```

With the `-l` flag, additional details are displayed. In this case,
the version and date indicate when the individual file was last
updated:

```console
rocfl ls -l urn:example:rocfl:object-1
```

The `-p` flag can also be used here to display the paths to the
physical files on disk relative the storage root:

```console
rocfl ls -p urn:example:rocfl:object-1
```

The contents of previous versions are displayed by using the `-v`
option. The following command displays the files that were in the
first version of the object:

```console
rocfl ls -v1 urn:example:rocfl:object-1
```

An object's contents can be filtered by specifying a glob pattern to
match on:

```console
rocfl ls urn:example:rocfl:object-1 '*.txt'
```

The output is sorted by name by default, but can also be sorted
version or updated date:

```console
rocfl ls -l -s version urn:example:rocfl:object-1
```

Paths within in an object can be interpreted as containing logical
directories by using the `-D` flag. For example, the following will
list the logical files and logical directories that are direct
children of the logical directory `sub/dir`:

``` console
rocfl ls -D urn:example:rocfl:object-1 sub/dir
```

#### Log

The `log` command displays the version metadata for all versions of an
object. It can also be executed on a file within an object, in which
case only versions that affected the specified file are displayed.

##### Examples

Show all of the versions of an object in ascending order:

```console
rocfl log urn:example:rocfl:object-1
```

Only display the five most recent versions:

```console
rocfl log -rn5 urn:example:rocfl:object-1
```

Show all of the versions, but formatted so each version is on a single
line:

```console
rocfl log -c urn:example:rocfl:object-1
```

Show all of the versions that affected a specific file:

```console
rocfl log urn:example:rocfl:object-1 file1.txt
```

#### Show

The `show` command displays everything that changed in an object
within a specific version. If no version is specified, the most recent
changes are shown.

##### Examples

Show the changes in the most recent version:

```console
rocfl show urn:example:rocfl:object-1
```

Show the changes in the first version:

```console
rocfl show urn:example:rocfl:object-1 v1
```

Don't show the version metadata; only show the files that changed:

```console
rocfl show -m urn:example:rocfl:object-1
```

#### Diff

The `diff` command displays the files that changed between two
specific versions.

##### Example

Show the changes between the second and fourth versions:

```console
rocfl diff v2 v4
```

#### Cat

The `cat` command writes the contents of a file to `stdout`.

##### Examples

Display the contents of the head version of a file:

```console
rocfl cat urn:example:rocfl:object-1 file1.txt
```

Display the contents of a file from a specific version of the object:

```console
rocfl cat -v1 urn:example:rocfl:object-1 file1.txt
```

#### Status

The `status` command shows objects that have staged changes pending
commit, as well as what an object's pending changes are.

##### Examples

List all of the objects with staged changes:

``` console
rocfl status
```

List all of the file level changes to an object:

``` console
rocfl status urn:example:rocfl:object-1
```

Staged changes can also be examined using the more featureful `ls`,
`show`, and `cat` commands by adding the `-S` flag.

#### Info

The `info` command displays information, such as spec version and
configured extensions, for repositories and objects.

##### Examples

Show info for a repository:

``` console
rocfl info
```

Show info for an object:

``` console
rocfl info urn:example:rocfl:object-1
```

### Write Commands

`rocfl` supports updating OCFL objects by staging changes to objects
in a local staging repository. For filesystem based repositories, the
staging repository is located within the storage root's extensions
directory, and for S3 it's in the user's home application data.

The intended workflow is to accumulate a collection of updates to an
object, and then commit all of the changes to the object a single new
OCFL version.

#### Init

The `init` command creates new OCFL repositories. By default, the
latest OCFL spec version is used along with the
[0004-hashed-n-tuple-storage-layout](https://ocfl.github.io/extensions/0004-hashed-n-tuple-storage-layout.html)
storage layout.

##### Examples

Create a new repository using the default layout,
[0004-hashed-n-tuple-storage-layout](https://ocfl.github.io/extensions/0004-hashed-n-tuple-storage-layout.html):

``` console
rocfl -r /var/tmp/ocfl-repo-1 init
```

The default layout configuration can be changed by passing a config
file that contains the desired configuration:

``` console
rocfl -r /var/tmp/ocfl-repo-2 init -l 0003-hash-and-id-n-tuple-storage-layout -c
my-config.json
```

#### New

The `new` command stages new OCFL objects. New objects will not exist
in the main repository until they have been committed. When invoked
with no options, the object is created using all of the OCFL spec
recommend values.

##### Examples

Create a new object with non-standard settings:

``` console
rocfl new urn:example:rocfl:object-1 -d sha256 -c data -z 6
```

#### Copy

The `cp` command copies files from the local filesystem into a
staged object, or copies logical paths within an object to a new
location within the same object. This command attempts to mimic the
behavior of GNU `cp` as closely as possible.

##### Examples

Copy a directory into the object's root:

``` console
rocfl cp -r urn:example:rocfl:object-1 /path/to/src -- /
```

Copy several files into a logical directory within the object:

``` console
rocfl cp urn:example:rocfl:object-1 /path/to/files/* -- sub/dir
```

Copy several existing files internally to a new location:

``` console
rocfl cp -i urn:example:rocfl:object-1 'internal/*.txt' -- new-location
```

Copy an entire logical directory from an old version to a new location
in the staged version:

``` console
rocfl cp -ir -v2 urn:example:rocfl:object-1 src/dir -- dst/dir
```

#### Move

The `mv` command moves files from the local filesystem into a staged
object, or moves logical paths within an object to a new location
within the same object. This command attempts to mimic the behavior of
GNU `mv` as closely as possible.

##### Examples

Move a directory into the object's root:

``` console
rocfl mv urn:example:rocfl:object-1 /path/to/src -- /
```

Move several files into a logical directory within the object:

``` console
rocfl mv urn:example:rocfl:object-1 /path/to/files/* -- sub/dir
```

Move an existing file internally to a new location:

``` console
rocfl mv -i urn:example:rocfl:object-1 internal/file.txt -- new/location.txt
```

#### Remove

The `rm` command removes files from an object. If the removed files
were new to the staged version, then they are permanently removed and
will not appear anywhere in the object. Otherwise, references to the
files are removed from the staged version, but the files still exist
in prior versions.

##### Examples

Recursively remove a logical directory:

``` console
rocfl rm -r urn:example:rocfl:object-1 path/to/dir
```

Remove several individual files:

``` console
rocfl rm urn:example:rocfl:object-1 path/to/file1.txt path/to/file2.txt
```

Or with a glob:

``` console
rocfl rm urn:example:rocfl:object-1 'path/to/*.txt'
```

#### Reset

The `reset` command unstages changes made to an object. Additions are
removed and files that were deleted or modified are reverted to their
previous state.

##### Examples

Reset a file to its previous state:

``` console
rocfl reset urn:example:rocfl:object-1 file.txt
```

Reset an entire object to its previous state, removing all staged
changes:

``` console
rocfl reset urn:example:rocfl:object-1
```

#### Commit

The `commit` command moves an object's staged changes into the OCFL
object as a new version.

##### Examples

Commit changes to an object:

``` console
rocfl commit urn:example:rocfl:object-1 -n "My Name" -a "mailto:me@example.com" -m "commit
message"
```

This can be simplified if you define your name and address in [rocfl
configuration](#configuration). In which case, you can simply execute:

``` console
rocfl commit urn:example:rocfl:object-1 -m "commit message"
```

And your name and address will be automatically added to the version
metadata.

In order to commit an object to a repository without a defined storage
layout, the location to store the object with the repository must be
manually specified as follows:

``` console
rocfl commit urn:example:rocfl:object-1 -m "commit message" -r relative/path/to/object/root
```

#### Purge

The `purge` command permanently removes an object from the main OCFL
repository. This is **not** an operation that stages changes. However,
it will ask for confirmation before deleting an object.

#### Upgrade

The `upgrade` command upgrades an object or repository to a later OCFL
spec version. OCFL objects are not automatically upgraded when a
repository is upgraded, and must be upgraded individually. Upgrading
an OCFL object requires creating a new object version, and any changes
that are staged for an object are included in the upgrade version.

##### Examples

Upgrade a 1.0 repository to spec version 1.1:

``` console
rocfl upgrade -v 1.1
```

Upgrade an object to spec version 1.1:

``` console
rocfl upgrade urn:example:rocfl:object-1 -n "My Name" -a "mailto:me@example.com" -m "upgrade to 1.1"
```

## S3

### S3 Configuration

To connect to an OCFL repository in S3, you first need to create an
IAM user with access to the S3 bucket, and then setup a local
`~/.aws/credentials` file or environment variables as [described
here](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
Credential profiles can be specified using the `--profile` option.
Then, when you invoke `rocfl` you must specify the bucket the
repository is in as well as the bucket region. For example:

```console
rocfl -R us-east-2 -b example-ocfl-repo ls
```

You can specify a sub directory, or prefix, that the repository is
rooted in within the bucket like this:

```console
rocfl -R us-east-2 -b example-ocfl-repo -r ocfl-root ls
```

Changes to objects are staged locally and are only pushed to S3 when
the staged version is committed. By default, changes are staged in the
following location:

- **Linux**: `$HOME/.local/share/rocfl/s3/staging/<sha256:BUCKET/ROOT>`
- **Mac**: `{FOLDERID_RoamingAppData}/org.rocfl.rocfl/data/s3/staging/<sha256:BUCKET/ROOT>`
- **Windows**: `$HOME/Library/Application Support/rocfl/rocfl/s3/staging/<sha256:BUCKET/ROOT>`

This location can be changed by setting the `--staging-root` option.

All of these properties can define defined in `rocfl`'s [config
file](#configuration), and activated by invoking `rocfl` using the
configuration `NAME` as follows:

``` console
rocfl -n NAME ls
```

### Important S3 Considerations

While `rocfl` supports all of the same operations on S3 as it does the
local filesystem, this does come with a couple of caveats:

1. Scanning for objects in S3 is very slow, and using a defined storage
   layout extension is key to improving performance.
2. `rocfl` does not provide any strong concurrency guarantees when
   modifying objects in S3. A file lock is used to guard changes in
   the staging repository, but there is an unchecked race condition if
   multiple processes attempt to commit changes to the same object
   from different staging locations.

## Roadmap

The following features are planned:

1. Export objects to a location outside the repository
2. Index objects when the storage layout is unknown
3. Unsafe mutating operations such as `squash`, `revert`, and
   `rewrite` that can be used to change an object's history