Crate usv

§Unicode Separated Values (USV) ™

Unicode Separated Values (USV) ™ is a data format that uses Unicode characters for markup.

This USV crate implements the USV specification: https://github.com/sixarm/usv.

This USV crate aims to help developers build new USV applications, tools, and workflows.

§USV characters

Separators:

  • File Separator (FS) is U+001C or U+241C ␜

  • Group Separator (GS) is U+001D or U+241D ␝

  • Record Separator (RS) is U+001E or U+241E ␞

  • Unit Separator (US) is U+001F or U+241F ␟

Modifiers:

  • Escape (ESC) is U+001B or U+241B ␛

  • End of Transmission (EOT) is U+0004 or U+2404 ␄
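
Because each mark has two accepted code points (a control character and a visible symbol), code that reads USV has to treat both forms as the same mark. The following is a minimal sketch of that idea using only the standard library. The `Mark` enum and the `classify` function are hypothetical names chosen for illustration; they are not part of this crate's API.

```rust
/// Hypothetical enum for illustration only (not a type from the usv crate):
/// the six USV marks listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mark {
    File,
    Group,
    Record,
    Unit,
    Escape,
    EndOfTransmission,
}

/// Hypothetical helper: map a character to its USV mark, accepting either
/// the control form (e.g. U+001F) or the symbol form (e.g. U+241F).
/// Returns `None` for ordinary content characters.
pub fn classify(c: char) -> Option<Mark> {
    match c {
        '\u{001C}' | '\u{241C}' => Some(Mark::File),
        '\u{001D}' | '\u{241D}' => Some(Mark::Group),
        '\u{001E}' | '\u{241E}' => Some(Mark::Record),
        '\u{001F}' | '\u{241F}' => Some(Mark::Unit),
        '\u{001B}' | '\u{241B}' => Some(Mark::Escape),
        '\u{0004}' | '\u{2404}' => Some(Mark::EndOfTransmission),
        _ => None,
    }
}
```

With this sketch, `classify('␟')` and `classify('\u{001F}')` both yield `Some(Mark::Unit)`, while `classify('a')` yields `None`.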

§Units

use usv::*;
let str = "a␟b␟";
let units: Units = str.units().collect();
assert_eq!(units, ["a", "b"]);
assert_eq!(units.into_usv_string(), str);

§Records

use usv::*;
let str = "a␟b␟␞c␟d␟␞";
let records: Records = str.records().collect();
assert_eq!(records, [["a", "b"],["c", "d"]]);
assert_eq!(records.into_usv_string(), str);

§Groups

use usv::*;
let str = "a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝";
let groups: Groups = str.groups().collect();
assert_eq!(groups, [[["a", "b"],["c", "d"]],[["e", "f"],["g", "h"]]]);
assert_eq!(groups.into_usv_string(), str);

§Files

use usv::*;
let str = "a␟b␟␞c␟d␟␞␝e␟f␟␞g␟h␟␞␝␜i␟j␟␞k␟l␟␞␝m␟n␟␞o␟p␟␞␝␜";
let files: Files = str.files().collect();
assert_eq!(files, [[[["a", "b"],["c", "d"]],[["e", "f"],["g", "h"]]],[[["i", "j"],["k", "l"]],[["m", "n"],["o", "p"]]]]);
assert_eq!(files.into_usv_string(), str);

§Architecture

The architecture of this crate looks like this, in order of importance:

  • lib.rs: the library entry point.

  • constants.rs: constants for USV characters.

  • token.rs: the USV Token enumeration for returning parser results.

  • iter/: iterators for units, records, groups, files, tokens.

  • style/: style sets of characters for symbols, controls, braces.

  • layout/: layout formats for lines, visual displays, and editors.

  • from/: convert from one thing into another thing.

  • into_usv_string: trait and impl to convert from data into a usv string.

  • examples.rs: data strings suitable for demos and tests.

  • str_ext.rs: string extension traits for parsing USV.

  • svec.rs: a simple macro for creating string vectors.

  • bench/: benchmark tests; this is work in progress.

  • tests/: integration tests placeholder; not needed yet.

§Token

A token is the underlying USV enumeration that the parser returns as it reads a string:

pub enum Token {
    Unit(String),
    UnitSeparator,
    RecordSeparator,
    GroupSeparator,
    FileSeparator,
    EndOfTransmission,
}
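
To make the role of the token stream concrete, here is a rough sketch of how a string could be turned into such tokens using only the standard library. It is not this crate's parser. The `tokenize` function is a hypothetical name, the enum is a local copy of the one above with derives added for comparison, and the sketch makes several simplifying assumptions: escapes are not handled, a unit separator always closes a unit (even an empty one), and scanning stops at End of Transmission.

```rust
/// Local stand-in mirroring the enum shown above, with derives added so
/// that tokens can be compared and printed in this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    Unit(String),
    UnitSeparator,
    RecordSeparator,
    GroupSeparator,
    FileSeparator,
    EndOfTransmission,
}

/// Hypothetical tokenizer for illustration; NOT the crate's implementation.
/// Assumption: escape handling is omitted entirely.
pub fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut content = String::new();
    for c in input.chars() {
        // Accept both the control form and the symbol form of each mark.
        let separator = match c {
            '\u{001F}' | '\u{241F}' => Token::UnitSeparator,
            '\u{001E}' | '\u{241E}' => Token::RecordSeparator,
            '\u{001D}' | '\u{241D}' => Token::GroupSeparator,
            '\u{001C}' | '\u{241C}' => Token::FileSeparator,
            '\u{0004}' | '\u{2404}' => Token::EndOfTransmission,
            _ => {
                // Ordinary character: accumulate it as unit content.
                content.push(c);
                continue;
            }
        };
        // A unit separator always closes a unit, even an empty one;
        // any other mark only flushes content that is still pending.
        if separator == Token::UnitSeparator || !content.is_empty() {
            tokens.push(Token::Unit(std::mem::take(&mut content)));
        }
        let done = separator == Token::EndOfTransmission;
        tokens.push(separator);
        if done {
            // Assumption: everything after End of Transmission is ignored.
            return tokens;
        }
    }
    // Flush trailing content that was not closed by a unit separator.
    if !content.is_empty() {
        tokens.push(Token::Unit(content));
    }
    tokens
}
```

Under these assumptions, the input `"a␟b␟␞"` produces a unit `a`, a unit separator, a unit `b`, a unit separator, and a record separator, in that order.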

§Type aliases

  • Token = described above

  • Tokens = Vec<Token>

  • Unit = String

  • Units = Vec<Unit>

  • Record = Units

  • Records = Vec<Record>

  • Group = Records

  • Groups = Vec<Group>

  • File = Groups

  • Files = Vec<File>
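
The aliases nest, so a record is a vector of units and a set of records is a vector of those vectors. The sketch below redeclares the first few aliases locally, assuming they expand to nested `Vec`s of `String`, and writes records back out in the layout used by the Records example above: every unit is followed by ␟ and every record by ␞. The `records_to_usv` function is a hypothetical stand-in for illustration; the crate's own conversion is `into_usv_string`.

```rust
// Local redeclarations for this sketch. Assumption: the crate's aliases
// expand to nested vectors of strings in the same way.
pub type Unit = String;
pub type Units = Vec<Unit>;
pub type Record = Units;
pub type Records = Vec<Record>;

/// Hypothetical helper (not the crate's `into_usv_string`): serialize
/// records using the symbol-form separators, closing each unit with ␟
/// and each record with ␞.
pub fn records_to_usv(records: &[Record]) -> String {
    let mut out = String::new();
    for record in records {
        for unit in record {
            out.push_str(unit);
            out.push('␟');
        }
        out.push('␞');
    }
    out
}
```

For the records `[["a", "b"], ["c", "d"]]`, this sketch produces `a␟b␟␞c␟d␟␞`, the same string that the Records example round-trips.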

The USV project aims to become a free open source IETF standard and IANA standard, much like the standards for CSV and TDF.

Until the standardization happens, the terms “Unicode Separated Values” and “USV” are both trademarks of this project. This repository is copyright 2022-2024. The trademarks and copyrights are held by Joel Parker Henderson (me), an individual, not a company.

When IETF and IANA approve the submissions as a standard, then the trademarks and copyright will go to a free libre open source software advocacy foundation. We welcome advice about how to do this well.

§Conclusion

USV is helping us with data projects. We hope USV may help you too.

We welcome constructive feedback about USV, as well as git issues, pull requests, and standardization help.

Re-exports§

pub use token::Token;
pub use str_ext::StrExt;
pub use style::Style;
pub use constants::*;
pub use examples::*;
pub use from::*;
pub use into_usv_string::*;
pub use layout::*;
pub use svec::*;

Modules§

constants
examples
Examples of USV strings with styles and units. These can be useful for demos and tests.
from
Convert USV data from various representations into other various representations.
into_usv_string
Convert USV data from various representations into other various representations.
iter
layout
USV layout is the terminology for how items are displayed.
str_ext
style
USV style is the terminology for how marks are displayed.
svec
token
Token is an enumeration of the USV data structures.

Macros§

svec
svec! makes a string vector from an array of &str.

Type Aliases§

File
Files
Group
Groups
Record
Records
Tokens
Unit
Units