Module no_proto::schema

Schemas are JSON used to declare the shape of buffer objects

NoProto schemas are JSON objects that describe how the data in a buffer is stored and what types of data are stored. Schemas are required to create buffers, and each buffer is a descendant of the schema that created it.

Buffers are forever tied to the schema that created them: a buffer created from a given schema can later be decoded, edited, or compacted only by that same schema.

Schemas are validated and sanity checked upon creation. You cannot pass an invalid schema into a factory constructor and build/parse buffers with it.

Properties that are not part of the schema are ignored.

If you're familiar with TypeScript, schemas can be described by this recursive interface:

interface NP_Schema {
    // table, string, bytes, etc
    type: string;

    // used by string & bytes types
    size?: number;

    // used by decimal type, the number of decimal places every value has
    exp?: number;

    // used by tuple to indicate bytewise sorting of children
    sorted?: boolean;

    // used by list types
    of?: NP_Schema;

    // used by map types
    value?: NP_Schema;

    // used by tuple types
    values?: NP_Schema[];

    // used by table types
    columns?: [string, NP_Schema][];

    // used by option/enum types
    choices?: string[];

    // default value for this item
    default?: any;
}

Schemas can be as simple as a single scalar type. For example, this is a perfectly valid schema for a buffer that contains only a string:

{
    "type": "string"
}

You will likely want to store more complex objects, and that's just as easy:

{
    "type": "table",
    "columns": [
        ["userID",   {"type": "string"}], // userID column contains a string
        ["password", {"type": "string"}], // password column contains a string
        ["email",    {"type": "string"}], // email column contains a string
        ["age",      {"type": "u8"}]     // age column contains a Uint8 number (0 - 255)
    ]
}

There are multiple collection types, and they can be nested.

For example, this is a list of tables. Every item in the list is a table with two columns: id and title. Both columns are a string type.

{
    "type": "list",
    "of": {
        "type": "table",
        "columns": [
            ["id",    {"type": "string"}],
            ["title", {"type": "string"}]
        ]
    }
}

You can nest collections as much and however you'd like. Nesting is only limited by the address space of the buffer, so go crazy.

A list of strings is just as easy...

{
    "type": "list",
    "of": {"type": "string"}
}

Each type has trade-offs associated with it. The table and documentation below go into further detail.

Supported Data Types

Type	Rust Type / Struct	Bytewise Sorting	Bytes (Size)	Limits / Notes
`table`	`NP_Table`	𐄂	2 bytes - ~4GB	Linked list with indexed keys that map against up to 255 named columns.
`list`	`NP_List`	𐄂	4 bytes - ~4GB	Linked list with integer indexed values and up to 65,535 items.
`map`	`NP_Map`	𐄂	2 bytes - ~4GB	Linked list with `Vec<u8>` keys.
`tuple`	`NP_Tuple`	✓ *	2 bytes - ~4GB	Static sized collection of specific values.
`any`	`NP_Any`	𐄂	2 bytes - ~4GB	Generic type.
`string`	`String`	✓ **	2 bytes - ~4GB	Utf-8 formatted string.
`bytes`	`NP_Bytes`	✓ **	2 bytes - ~4GB	Arbitrary bytes.
`int8`	`i8`	✓	1 byte	-128 to 127
`int16`	`i16`	✓	2 bytes	-32,768 to 32,767
`int32`	`i32`	✓	4 bytes	-2,147,483,648 to 2,147,483,647
`int64`	`i64`	✓	8 bytes	-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
`uint8`	`u8`	✓	1 byte	0 - 255
`uint16`	`u16`	✓	2 bytes	0 - 65,535
`uint32`	`u32`	✓	4 bytes	0 - 4,294,967,295
`uint64`	`u64`	✓	8 bytes	0 - 18,446,744,073,709,551,615
`float`	`f32`	𐄂	4 bytes	-3.4e38 to 3.4e38
`double`	`f64`	𐄂	8 bytes	-1.7e308 to 1.7e308
`option`	`NP_Enum`	✓	1 byte	Up to 255 string based options in schema.
`bool`	`bool`	✓	1 byte
`decimal`	`NP_Dec`	✓	8 bytes	Fixed point decimal number based on i64.
`geo4`	`NP_Geo`	✓	4 bytes	1.1km resolution (city) geographic coordinate
`geo8`	`NP_Geo`	✓	8 bytes	11mm resolution (marble) geographic coordinate
`geo16`	`NP_Geo`	✓	16 bytes	110 microns resolution (grain of sand) geographic coordinate
`ulid`	`NP_ULID`	✓	16 bytes	6 bytes for the timestamp, 10 bytes of randomness.
`uuid`	`NP_UUID`	✓	16 bytes	v4 UUID, 2^122 (about 5.3e36) possible UUIDs
`date`	`NP_Date`	✓	8 bytes	Good to store unix epoch (in milliseconds) until the year 584,866,263

* `sorted` must be set to true in the schema for this type to enable sorting.
** String & Bytes can be bytewise sorted only if they have a size property in the schema.

Legend

Bytewise Sorting
Bytewise sorting means that two buffers can be compared at the byte level, without deserializing them, and the result matches the ordering of the values they contain. This is extremely useful for storing ordered keys in databases.
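The idea can be shown with plain integers, independent of NoProto's own API. This minimal sketch uses only the Rust standard library; `be_key` and `le_key` are hypothetical helper names. It encodes the same `u32` values in big-endian and little-endian order and compares the raw bytes:

```rust
/// Big-endian puts the most significant byte first, so comparing the
/// byte arrays gives the same answer as comparing the numbers.
fn be_key(v: u32) -> [u8; 4] {
    v.to_be_bytes()
}

/// Little-endian puts the least significant byte first, so byte-level
/// comparison no longer matches numeric order.
fn le_key(v: u32) -> [u8; 4] {
    v.to_le_bytes()
}
```

With `be_key`, 1 encodes as `00 00 00 01` and 256 as `00 00 01 00`, so the byte comparison agrees with the numbers. With `le_key`, 1 becomes `01 00 00 00` and compares greater than 256 (`00 01 00 00`), which is why big endian is the layout that makes bytewise sorting possible.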

Each type has specific notes on whether it supports bytewise sorting and what to consider when using it for that purpose.

You can sort by multiple types/values if a tuple is used. The ordering of values in the tuple will determine the sort order. For example if you have a tuple with types (A, B) the ordering will first sort by A, then B where A is identical. This is true for any number of items, for example a tuple with types (A,B,C,D) will sort by D when A, B & C are identical.
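Compound ordering follows from writing each fixed-size value back to back. The sketch below models a sorted tuple of (u16, u32) as a single byte key; the function name `tuple_key` and the concrete types are hypothetical, and this is an illustration of the technique rather than NoProto's exact tuple layout:

```rust
/// Hypothetical compound key for a sorted tuple of (u16, u32).
/// Each value is written as fixed-size big-endian bytes, back to back,
/// so a byte comparison orders by the first value and only looks at the
/// second value when the first ones are equal.
fn tuple_key(a: u16, b: u32) -> Vec<u8> {
    let mut key = Vec::with_capacity(6);
    key.extend_from_slice(&a.to_be_bytes());
    key.extend_from_slice(&b.to_be_bytes());
    key
}
```

Fixed width is what keeps this working: if the first field could vary in length, its bytes would spill into the position of the second field and the comparison would mix the two values.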

Compaction
Compaction is an optional operation you can perform on a buffer at any time, typically to recover free space. NoProto buffers are contiguous, growing arrays of bytes. When you add or update a value, additional memory is sometimes used and the old value is dereferenced, so the buffer occupies more space than it needs to. This space can be recovered with compaction. Compaction involves a recursive, full copy of all referenced and valid values in the buffer, so it is an expensive operation and should be avoided when possible.

Sometimes the space you can recover with compaction is minimal, or you can design your schema and updates so that compaction is never needed. In these cases compaction can be skipped with little to no consequence.

Deleting a value will almost always mean space can be recovered with compaction, but updating values can have different outcomes to the space used depending on the type and options.
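To make the space trade-off concrete, here is a deliberately simplified model of an append-only buffer. It is not NoProto's actual layout or API; `Buffer`, `set`, `clear` and `compact` are hypothetical names. Updates append new bytes and leave the old ones unreferenced, and compaction copies only the live values into a fresh buffer:

```rust
/// A deliberately simplified, hypothetical append-only buffer.
/// Each slot points at the bytes of its current value, if any.
struct Buffer {
    bytes: Vec<u8>,
    slots: Vec<Option<(usize, usize)>>, // (offset, length) of the live value
}

impl Buffer {
    fn new(slot_count: usize) -> Self {
        Buffer { bytes: Vec::new(), slots: vec![None; slot_count] }
    }

    /// Updates append the new value; the old bytes stay behind, unreferenced.
    fn set(&mut self, slot: usize, value: &[u8]) {
        let offset = self.bytes.len();
        self.bytes.extend_from_slice(value);
        self.slots[slot] = Some((offset, value.len()));
    }

    /// Clearing only drops the reference; the bytes are still in the buffer.
    fn clear(&mut self, slot: usize) {
        self.slots[slot] = None;
    }

    fn get(&self, slot: usize) -> Option<&[u8]> {
        self.slots[slot].map(|(offset, len)| &self.bytes[offset..offset + len])
    }

    /// Compaction copies only live values into a fresh buffer.
    fn compact(&self) -> Buffer {
        let mut out = Buffer::new(self.slots.len());
        for slot in 0..self.slots.len() {
            if let Some(value) = self.get(slot) {
                out.set(slot, value);
            }
        }
        out
    }
}
```

If slot 0 is set to "hello" and then to "hello world", and slot 1 is set to "temp" and then cleared, this model holds 20 bytes before compaction and 11 after. Types that update in place never create dead bytes, so compaction gains nothing for them.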

Each type will have notes on how updates can lead to wasted bytes and require compaction to recover the wasted space.

How do you run compaction on a buffer?

Schema Mutations
Once a schema is created all the buffers it creates depend on that schema for reliable de/serialization, data access, and compaction.

There are safe ways you can mutate a schema after it's been created without breaking old buffers, however those updates are limited. The safe mutations will be mentioned for each type, consider any other schema mutations unsafe.

Changing the type property of any value in the schema is unsafe. It's only sometimes safe to modify properties other than type.

Schema Types

Every schema type maps exactly to a native data type in your code.

table

Tables represent a fixed number of named columns, with each column having its own data type.

Bytewise Sorting: Unsupported
Compaction: Columns without values are removed from the buffer during compaction. A column that never had a value set uses zero space in the buffer.
Schema Mutations: The ordering of items in the columns property must always remain the same. It's safe to add new columns to the bottom of the column list or to rename columns, but never to remove columns. Column types cannot be changed safely. If you need to deprecate a column, set its name to an empty string.

Table schemas have a single required property called columns. The columns property is an array of arrays that represent all possible columns in the table and their data types. Any type can be used in columns, including other tables.

Tables do not store the column names in the buffer, only the column index, so this is a very efficient way to store associated data.

If you need flexible column names use a map type instead.

{
    "type": "table",
    "columns": [ // can have between 1 and 255 columns
        ["column name",  {"type": "data type for this column"}],
        ["name",         {"type": "string"}],
        ["tags",         {"type": "list", "of": { // nested list of strings
            "type": "string"
        }}],
        ["age",          {"type": "u8"}], // Uint8 number
        ["meta",         {"type": "table", "columns": [ // nested table
            ["favorite_color",  {"type": "string"}],
            ["favorite_sport",  {"type": "string"}]
        ]}]
    ]
}

More Details:

Using NP_Table data type

list

Lists represent a dynamically sized list of items. The type for every item in the list is identical, and the order of entries is maintained in the buffer. Lists do not have to contain contiguous entries; gaps can be stored safely and efficiently.

Bytewise Sorting: Unsupported
Compaction: Indexes that have had their value cleared will be removed from the buffer. If a specific index never had a value, it occupies zero space.
Schema Mutations: None

Lists have a single required property in the schema, "of". The "of" property contains another schema for the type of data contained in the list. Any type is supported, including another list. If the list contains tables, note that tables cannot have more than 255 columns, and column names cannot be longer than 255 UTF8 bytes.

The more items you have in a list, the slower it will be to seek to values towards the end of the list or loop through the list.

// a list of list of strings
{
    "type": "list",
    "of": {
        "type": "list",
        "of": {"type": "string"}
    }
}
 
// list of numbers
{
    "type": "list",
    "of": {"type": "int32"}
}

More Details:

Using NP_List data type

map

A map is a dynamically sized list of items where each key is a Vec<u8>. Every value of a map has the same type.

Bytewise Sorting: Unsupported
Compaction: Keys without values are removed from the buffer
Schema Mutations: None

Maps have a single required property in the schema, value. The property is used to describe the schema of the values for the map. Keys are always String. Values can be any schema type, including another map.

If you expect to have fixed, predictable keys then use a table type instead. Maps are less efficient than tables because keys are stored in the buffer.

The more items you have in a map, the slower it will be to seek to values or loop through the map.

// a map where every value is a string
{
    "type": "map",
    "value": {
        "type": "string"
    }
}

More Details:

Using NP_Map data type

tuple

A tuple is a fixed size list of items. Each item has its own type and index. Tuples support up to 255 items.

Bytewise Sorting: Supported if all children are scalars that support bytewise sorting and schema sorted is set to true.
Compaction: If sorted is true, compaction will not save space. Otherwise, tuples only shrink if children are deleted or children with a dynamic size are updated.
Schema Mutations: If sorted is true, none. Otherwise adding new values to the end of the values schema property is safe.

Tuples have a single required property in the schema called values. It's an array of schemas that represent the tuple values. Any schema is allowed, including other tuples.

Sorting
You can use tuples to support bytewise sorting across a list of items. Setting the sorted property to true enables a strict mode for the tuple that supports bytewise sorting. When sorted is enabled, only scalar values that support sorting are allowed in the schema. For example, string and bytes types must have a fixed size.

When sorted is true, the order of values is guaranteed to be constant across buffers, allowing compound bytewise sorting.

{
    "type": "tuple",
    "values": [
        {"type": "string"},
        {"type": "list", "of": {"type": "string"}},
        {"type": "uint64"}
    ]
}
 
// tuple for bytewise sorting
{
    "type": "tuple",
    "sorted": true,
    "values": [
        {"type": "string", "size": 25},
        {"type": "uint8"},
        {"type": "int64"}
    ]
}

More Details:

Using NP_Tuple data type

string

A string is a fixed or dynamically sized collection of utf-8 encoded bytes.

Bytewise Sorting: Supported only if size property is set in schema.
Compaction: If size property is set, compaction cannot reclaim space. Otherwise it will reclaim space unless all updates have been identical in length.
Schema Mutations: If the size property is set it's safe to make it smaller, but not larger (this may cause existing string values to truncate, though). If the field is being used for bytewise sorting, no mutation is safe.

{
    "type": "string"
}
// fixed size
{
    "type": "string",
    "size": 20
}
// with default value
{
    "type": "string",
    "default": "Default string value"
}
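A fixed size is what makes a string sortable: every value occupies the same number of bytes, so a byte-by-byte comparison lines up. The sketch below assumes values are truncated to `size` bytes and right-padded with zero bytes; the padding byte and the helper name `fixed_size_string` are assumptions for illustration, not NoProto's documented encoding.

```rust
/// Hypothetical fixed-size string encoding: truncate to `size` bytes,
/// then right-pad with zero bytes so every value has the same length.
/// Note: truncating at a byte boundary can split a multi-byte UTF-8 character.
fn fixed_size_string(s: &str, size: usize) -> Vec<u8> {
    let mut out: Vec<u8> = s.bytes().take(size).collect();
    out.resize(size, 0);
    out
}
```

Encoding "abcdef" with a size of 4 keeps only "abcd", which mirrors the mutation note above: with a smaller size, anything past the new length is cut off.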

More Details:

Using String data type

bytes

Bytes are fixed or dynamically sized Vec<u8> collections.

Bytewise Sorting: Supported only if size property is set in schema.
Compaction: If size property is set, compaction cannot reclaim space. Otherwise it will reclaim space unless all updates have been identical in length.
Schema Mutations: If the size property is set it's safe to make it smaller, but not larger (this may cause existing bytes values to truncate, though). If the field is being used for bytewise sorting, no mutation is safe.

{
    "type": "bytes"
}
// fixed size
{
    "type": "bytes",
    "size": 20
}
// with default value
{
    "type": "bytes",
    "default": [1, 2, 3, 4]
}

More Details:

Using NP_Bytes data type

int8, int16, int32, int64

Signed integers allow positive or negative whole numbers to be stored. The bytes are stored in big endian format and converted to unsigned types to allow bytewise sorting.
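Plain two's complement bytes do not sort correctly, because negative numbers have their high bit set (for example, `-1i32` is `FF FF FF FF`, which compares greater than `0`). A common way to convert a signed integer into an order-preserving unsigned form is to flip the sign bit before writing it big-endian. Whether NoProto uses exactly this conversion is an assumption; the sketch only illustrates the technique, using hypothetical helper names:

```rust
/// Flip the sign bit so i32::MIN maps to 0 and i32::MAX maps to u32::MAX,
/// then write big-endian so byte order matches numeric order.
fn i32_sort_key(v: i32) -> [u8; 4] {
    ((v as u32) ^ 0x8000_0000).to_be_bytes()
}

/// Reverse the transformation to recover the original value.
fn i32_from_sort_key(key: [u8; 4]) -> i32 {
    (u32::from_be_bytes(key) ^ 0x8000_0000) as i32
}
```

After the flip, 0 encodes as `80 00 00 00`, every negative value encodes below it, and every positive value encodes above it. The same approach extends to i8, i16 and i64 with the matching sign-bit mask.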

{
    "type": "int8"
}
// with default value
{
    "type": "int8",
    "default": 20
}

Bytewise Sorting: Supported
Compaction: Updates are done in place, never use additional space.
Schema Mutations: None

More Details:

Using number data types

uint8, uint16, uint32, uint64

Unsigned integers allow only non-negative whole numbers to be stored. The bytes are stored in big endian format to allow bytewise sorting.

Bytewise Sorting: Supported
Compaction: Updates are done in place, never use additional space.
Schema Mutations: None

{
    "type": "uint8"
}
// with default value
{
    "type": "uint8",
    "default": 20
}

More Details:

Using number data types

float, double

Allows the storage of floating point numbers of various sizes. Bytes are stored in big endian format.

Bytewise Sorting: Unsupported, use decimal type.
Compaction: Updates are done in place, never use additional space.
Schema Mutations: None

{
    "type": "float"
}
// with default value
{
    "type": "float",
    "default": 20.283
}

More Details:

Using number data types

option

Allows efficient storage of a selection from a known collection of ordered strings. The selection is stored as a single u8 byte, limiting the maximum number of choices to 255. The choices themselves cannot be longer than 255 UTF8 bytes each.

Bytewise Sorting: Supported
Compaction: Updates are done in place, never use additional space.
Schema Mutations: You can safely add new choices to the end of the list or update the existing choices in place. If you need to delete a choice, just make it an empty string. Changing the order of the choices is destructive as this type only stores the index of the choice it's set to.

There is one required property of this schema called choices. The property should contain an array of strings that represent all possible choices of the option.

{
    "type": "option",
    "choices": ["choice 1", "choice 2", "etc"]
}
// with default value
{
    "type": "option",
    "choices": ["choice 1", "choice 2", "etc"],
    "default": "etc"
}
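Because only the index is stored, the meaning of a stored value depends entirely on the order of choices. The following sketch, with hypothetical helpers `encode_choice` and `decode_choice`, models that index lookup to show why appending choices is safe while reordering them is not:

```rust
/// Hypothetical: store an option as the position of the chosen string.
/// The schema limits choices to 255, so the index fits in a u8.
fn encode_choice(choices: &[&str], choice: &str) -> Option<u8> {
    choices.iter().position(|c| *c == choice).map(|i| i as u8)
}

/// Look the stored index up in whatever choices list the schema has now.
fn decode_choice<'a>(choices: &[&'a str], index: u8) -> Option<&'a str> {
    choices.get(index as usize).copied()
}
```

A value saved as "large" under `["small", "medium", "large"]` is stored as index 2. Appending "xl" leaves index 2 pointing at "large", but reversing the list makes the same stored byte decode as "small".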

More Details:

Using NP_Enum data type

bool

Allows efficient storage of a true or false value. The value is stored as a single byte set to either 1 or 0.

Bytewise Sorting: Supported
Compaction: Updates are done in place, never use additional space.
Schema Mutations: None

{
    "type": "bool"
}
// with default value
{
    "type": "bool",
    "default": false
}

More Details:

decimal

Allows you to store fixed point decimal numbers. The number of decimal places must be declared in the schema as exp property and will be used for every value.

Bytewise Sorting: Supported
Compaction: Updates are done in place, never use additional space.
Schema Mutations: None

There is a single required property called exp that represents the number of decimal places every value will have.

{
    "type": "decimal",
    "exp": 3
}
// with default value
{
    "type": "decimal",
    "exp": 3,
    "default": 20.293
}
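Fixed point means every value is an integer scaled by 10 to the power of exp. Assuming the i64 from the type table holds that scaled value, the default of 20.293 with "exp": 3 would be kept as 20293. The helper names below are hypothetical and the rounding rule is an assumption:

```rust
/// Scale a float by 10^exp and round to the nearest integer.
/// Digits beyond `exp` decimal places are lost to rounding.
fn to_fixed(value: f64, exp: u32) -> i64 {
    (value * 10f64.powi(exp as i32)).round() as i64
}

/// Convert a scaled integer back to a float for display.
fn from_fixed(raw: i64, exp: u32) -> f64 {
    raw as f64 / 10f64.powi(exp as i32)
}
```

Since the scaled value is an ordinary integer, comparing two decimals that share the same exp reduces to comparing integers.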

More Details:

Using NP_Dec data type

geo4, geo8, geo16

Allows you to store geographic coordinates with varying levels of accuracy and space usage.

Bytewise Sorting: Not supported
Compaction: Updates are done in place, never use additional space.
Schema Mutations: None

Larger geo values take up more space, but allow greater resolution.

Type	Bytes	Earth Resolution	Decimal Places
geo4	4	1.1km resolution (city)	2
geo8	8	11mm resolution (marble)	7
geo16	16	110 microns resolution (grain of sand)	9

{
    "type": "geo4"
}
// with default
{
    "type": "geo4",
    "default": {"lat": -20.283, "lng": 19.929}
}
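The byte sizes in the table line up with storing latitude and longitude as two scaled integers: two i16 values for geo4 (2 decimal places), two i32 for geo8 (7 places) and two i64 for geo16 (9 places). That layout is an assumption made for illustration, not something this page states. Under that assumption, a geo4 round trip of the default above could look like this (`geo4_encode` and `geo4_decode` are hypothetical):

```rust
/// Hypothetical geo4 encoding: latitude and longitude are each scaled
/// by 10^2 and stored as a big-endian i16 (2 bytes each, 4 bytes total).
fn geo4_encode(lat: f64, lng: f64) -> [u8; 4] {
    let lat = (lat * 100.0).round() as i16;
    let lng = (lng * 100.0).round() as i16;
    let mut out = [0u8; 4];
    out[..2].copy_from_slice(&lat.to_be_bytes());
    out[2..].copy_from_slice(&lng.to_be_bytes());
    out
}

/// Decode the two scaled i16 values back into degrees.
fn geo4_decode(bytes: [u8; 4]) -> (f64, f64) {
    let lat = i16::from_be_bytes([bytes[0], bytes[1]]);
    let lng = i16::from_be_bytes([bytes[2], bytes[3]]);
    (lat as f64 / 100.0, lng as f64 / 100.0)
}
```

The round trip keeps only two decimal places (-20.283 becomes -20.28), which corresponds to the roughly 1.1km resolution listed for geo4.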

More Details:

Using NP_Geo data type

ulid

Allows you to store a unique ID with a timestamp. The timestamp is stored in milliseconds since the unix epoch.

Bytewise Sorting: Supported, orders by timestamp. Order is random if timestamp is identical between two values.
Compaction: Updates are done in place, never use additional space.
Schema Mutations: None

{
    "type": "ulid"
}
// no default supported
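The type table describes a ULID as 6 bytes of timestamp followed by 10 bytes of randomness. Writing the timestamp big-endian at the front is what makes the values sort by time. Here is a minimal sketch of that layout; the random bytes are passed in so it needs no external crates, and `ulid_bytes` is a hypothetical name:

```rust
/// Hypothetical ULID layout: a 48-bit millisecond timestamp written
/// big-endian into the first 6 bytes, followed by 10 random bytes.
fn ulid_bytes(timestamp_ms: u64, random: [u8; 10]) -> [u8; 16] {
    let mut out = [0u8; 16];
    // Keep the low 6 bytes of the big-endian u64 (the top 2 bytes are
    // zero for any timestamp that fits in 48 bits).
    out[..6].copy_from_slice(&timestamp_ms.to_be_bytes()[2..]);
    out[6..].copy_from_slice(&random);
    out
}
```

Because the timestamp occupies the leading bytes, an earlier ULID sorts first even if its random part is all `0xFF`; the random bytes only decide the order when two timestamps are identical.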

More Details:

Using NP_ULID data type

uuid

Allows you to store a universally unique ID.

Bytewise Sorting: Supported, but values are random
Compaction: Updates are done in place, never use additional space.
Schema Mutations: None

{
    "type": "uuid"
}
// no default supported

More Details:

Using NP_UUID data type

date

Allows you to store a timestamp as a u64 value. This is just a thin wrapper around the u64 type.

Bytewise Sorting: Supported
Compaction: Updates are done in place, never use additional space.
Schema Mutations: None

{
    "type": "date"
}
// with default value (default should be in ms)
{
    "type": "date",
    "default": 1605909163951
}
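Values and defaults are expected in milliseconds since the unix epoch. With only the standard library, the current time in that unit can be obtained like this (`now_ms` is a hypothetical helper):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Current time as milliseconds since the unix epoch.
fn now_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before 1970")
        .as_millis() as u64
}
```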

More Details:

Using NP_Date data type

Next Step

Read about how to initialize a schema into a NoProto Factory.

Go to NP_Factory docs

Enums

NP_Parsed_Schema	When a schema is parsed from JSON or Bytes, it is stored in this recursive type
NP_TypeKeys	Simple enum to store the schema types

Type Definitions

NP_Schema_Addr

Schema Address (usize alias)