Module no_proto::schema
Schemas are JSON used to declare & store the shape of buffer objects.
No Proto Schemas are JSON objects that describe how the data in an NP_Buffer is stored.
Every schema object has at least a "type" property that provides the kind of value stored at that part of the schema. Additional keys are dependent on the type of schema.
Schemas are validated and sanity checked by the NP_Factory struct upon creation. You cannot pass an invalid schema into a factory constructor and build/parse buffers with it.
If you're familiar with typescript, schemas can be described by this recursive interface:
interface NP_Schema {
// table, string, bytes, etc
type: string;
// used by string & bytes types
size?: number;
// used by Dec32 and Dec64 types, the number of decimal places each value has
precision?: number;
// used by table, list & tuple types to indicate bytewise sorting
sorted?: boolean;
// used by list types
of?: NP_Schema
// used by map types
value?: NP_Schema
// used by tuple types
values?: NP_Schema[]
// used by table types
columns?: [string, NP_Schema][]
}
Schemas can be as simple as a single scalar type, for example a perfectly valid schema for a buffer that contains only a string:
{
"type": "string"
}
However, you will likely want to store collections of items, so that's easy to do as well.
{
"type": "table",
"columns": [
["userID", {"type": "string"}],
["password", {"type": "string"}],
["email", {"type": "string"}]
]
}
There are multiple collection types, and they can be nested.
For example, this is a list of tables. Each table has two columns: id and title. Both columns are a string type.
{
"type": "list",
"of": {
"type": "table",
"columns": [
["id", {"type": "string"}],
["title", {"type": "string"}]
]
}
}
A list of strings is just as easy...
{
"type": "list",
"of": {"type": "string"}
}
Each type has trade offs associated with it. The table and documentation below go into further detail.
Here is a table of supported types.
Type | Rust Type / Struct | Bytewise Sorting | Bytes (Size) | Limits / Notes |
---|---|---|---|---|
table | NP_Table | ✓ * | 4 bytes - ~4GB | Linked list with indexed keys that map against up to 255 named columns. |
list | NP_List | ✓ * | 8 bytes - ~4GB | Linked list with integer indexed values and up to 65,535 items. |
map | NP_Map | 𐄂 | 4 bytes - ~4GB | Linked list with Vec<u8> keys. |
tuple | NP_Tuple | ✓ * | 4 bytes - ~4GB | Static sized collection of specific values. |
any | NP_Any | 𐄂 | 4 bytes - ~4GB | Generic type. |
string | String | ✓ ** | 4 bytes - ~4GB | Utf-8 formatted string. |
bytes | NP_Bytes | ✓ ** | 4 bytes - ~4GB | Arbitrary bytes. |
int8 | i8 | ✓ | 1 byte | -128 to 127 |
int16 | i16 | ✓ | 2 bytes | -32,768 to 32,767 |
int32 | i32 | ✓ | 4 bytes | -2,147,483,648 to 2,147,483,647 |
int64 | i64 | ✓ | 8 bytes | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
uint8 | u8 | ✓ | 1 byte | 0 - 255 |
uint16 | u16 | ✓ | 2 bytes | 0 - 65,535 |
uint32 | u32 | ✓ | 4 bytes | 0 - 4,294,967,295 |
uint64 | u64 | ✓ | 8 bytes | 0 - 18,446,744,073,709,551,615 |
float | f32 | 𐄂 | 4 bytes | -3.4e38 to 3.4e38 |
double | f64 | 𐄂 | 8 bytes | -1.7e308 to 1.7e308 |
option | NP_Option | ✓ | 1 byte | Up to 255 string based options in schema. |
bool | bool | ✓ | 1 byte | |
dec64 | NP_Dec | ✓ | 8 bytes | Fixed point decimal number based on i64. |
geo4 | NP_Geo | ✓ *** | 4 bytes | 1.1km resolution (city) geographic coordinate |
geo8 | NP_Geo | ✓ *** | 8 bytes | 11mm resolution (marble) geographic coordinate |
geo16 | NP_Geo | ✓ *** | 16 bytes | 110 microns resolution (grain of sand) geographic coordinate |
tid | NP_TimeID | ✓ | 16 bytes | u64 for time with 8 random bytes. |
uuid | NP_UUID | ✓ | 16 bytes | v4 UUID, 2e37 possible UUID v4s |
date | NP_Date | ✓ | 8 bytes | Good to store unix epoch (in seconds) until the year 584,942,417,355 |
- * For some collections to work with bytewise sorting, "sorted" must be set to true in the collection schema and other constraints must be met.
- ** String & Bytes can be bytewise sorted only if they have a fixed length in the schema.
- *** Geo types cannot be collectively sorted since they contain two values, but the individual lat/lon values can be bytewise sorted.
Legend
Bytewise Sorting
Bytewise sorting means that two buffers can be compared at the byte level without deserializing and a correct ordering between the buffer's internal values will be found. This is extremely useful for storing ordered keys in databases.
Each type has specific notes on whether it supports bytewise sorting and what to consider if using it for that purpose.
You can sort by multiple types/values if a tuple is used. The ordering of values in the tuple will determine the sort order. For example if you have a tuple with types (A, B) the ordering will first sort by A, then B where A is identical. This is true for any number of items, for example a tuple with types (A,B,C,D) will sort by D when A, B & C are identical.
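The tuple ordering described above can be illustrated without the library. This is a minimal sketch, assuming fixed-width big-endian encodings; `encode_pair` and `sort_by_bytes` are hypothetical helpers, not part of the no_proto API. Comparing the encoded bytes gives the same order as comparing the (A, B) pairs themselves.

```rust
/// Hypothetical helper (not part of no_proto): encodes a (u32, u16) pair as
/// fixed-width big-endian bytes, A first and then B.
fn encode_pair(a: u32, b: u16) -> [u8; 6] {
    let mut out = [0u8; 6];
    out[..4].copy_from_slice(&a.to_be_bytes());
    out[4..].copy_from_slice(&b.to_be_bytes());
    out
}

/// Sorts pairs by comparing only their encoded bytes, never the decoded values.
fn sort_by_bytes(mut pairs: Vec<(u32, u16)>) -> Vec<(u32, u16)> {
    pairs.sort_by(|x, y| encode_pair(x.0, x.1).cmp(&encode_pair(y.0, y.1)));
    pairs
}
```

Because A occupies the leading bytes, B only affects the comparison when the A bytes are identical.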
Compaction
NoProto Buffers are contiguous, growing arrays of bytes. When you add or update a value, sometimes additional memory is used and the old value is dereferenced, meaning the buffer is now occupying more space than it needs to. This space can be recovered with compaction. Compaction involves a recursive, full copy of all referenced & valid values of the buffer; it's an expensive operation that should be avoided.
Sometimes the space you can recover with compaction is minimal, or you can craft your schema and updates in such a way that compactions are never needed. In these cases compaction can be avoided with little to no consequence.
Deleting a value will always mean space can be recovered with compaction, but updating values can have different outcomes to the space used depending on the type and options.
Each type will have notes on how updates can lead to wasted bytes and require compaction to recover the wasted space.
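To make the idea of wasted bytes concrete, here is a toy model of an append-only buffer holding a single value. It is a sketch only: `ToyBuffer` and its methods are hypothetical and do not reflect the actual NP_Buffer layout or API. A same-size update is written in place, a different-size update is appended and orphans the old bytes, and compaction copies only the live value into a fresh buffer.

```rust
/// Toy model (not the no_proto layout): an append-only byte buffer with one
/// tracked value, recorded as (offset, length).
struct ToyBuffer {
    bytes: Vec<u8>,
    live: Option<(usize, usize)>,
}

impl ToyBuffer {
    fn new() -> Self {
        ToyBuffer { bytes: Vec::new(), live: None }
    }

    /// Same-size values are updated in place; any other size is appended,
    /// leaving the old bytes unreachable.
    fn set(&mut self, value: &[u8]) {
        let live = self.live;
        match live {
            Some((offset, len)) if len == value.len() => {
                self.bytes[offset..offset + len].copy_from_slice(value);
            }
            _ => {
                let offset = self.bytes.len();
                self.bytes.extend_from_slice(value);
                self.live = Some((offset, value.len()));
            }
        }
    }

    /// Bytes in the buffer that no longer belong to the live value.
    fn wasted(&self) -> usize {
        self.bytes.len() - self.live.map_or(0, |(_, len)| len)
    }

    /// Copies only the live value into a new buffer.
    fn compact(&self) -> ToyBuffer {
        let mut out = ToyBuffer::new();
        if let Some((offset, len)) = self.live {
            out.set(&self.bytes[offset..offset + len]);
        }
        out
    }
}
```

In this model, fixed-size values never accumulate waste, which mirrors the "updates are done in place" notes on the scalar types.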
Schema Mutations
Once a schema is created all the buffers it creates depend on that schema for reliable de/serialization, data access, and compaction.
There are safe ways you can mutate a schema after it's been created without breaking old buffers, but those updates are limited. The safe mutations will be mentioned for each type; consider any other schema mutations unsafe.
Changing the "type" property of any value in the schema is unsafe. It's only sometimes safe to modify properties besides "type".
table
Tables represent a fixed number of named columns, with each column having its own data type.
- Bytewise Sorting: Supported if all column types support bytewise sorting and if the same columns are used in every buffer and are set in the same order.
- Compaction: columns without values will be removed from the buffer
- Mutations: The ordering of items in the "columns" property must always remain the same. It's safe to add new columns to the bottom of the column list or rename columns, but never to remove columns. Column types cannot be changed safely. If you need to deprecate a column, set its name to an empty string.
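The table mutation rules above can be expressed as a small check. This is a sketch under simplifying assumptions: columns are modeled as (name, type name) pairs, nested schemas are compared by type name only, and `is_safe_table_mutation` is a hypothetical helper rather than something the library provides.

```rust
/// Hypothetical column description: (name, type name). Not a no_proto type.
type Column<'a> = (&'a str, &'a str);

/// Returns true if `new` only renames existing columns and/or appends
/// columns at the end, keeping every existing position's type unchanged.
fn is_safe_table_mutation(old: &[Column], new: &[Column]) -> bool {
    // Columns may be appended but never removed.
    if new.len() < old.len() {
        return false;
    }
    // Existing positions must keep their type; names are free to change.
    old.iter().zip(new.iter()).all(|(o, n)| o.1 == n.1)
}
```

Renaming a column to an empty string passes this check, which matches the suggested way to retire a column.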
list
Lists represent a dynamically growing or shrinking list of items. The type for every item in the list is identical and the order of entries is maintained in the buffer. Lists do not have to contain contiguous entries; gaps can be stored safely and efficiently.
- Bytewise Sorting: Supported if the "of" type supports bytewise sorting and if the same indexes are used in every buffer and set in the same order.
- Compaction: Indexes without values are removed from the buffer.
- Mutations: None
map
A map is a dynamically growing or shrinking list of items where each key is a Vec<u8>.
- Bytewise Sorting: Unsupported
- Compaction: keys without values are removed from the buffer
- Mutations: None
tuple
A tuple is a fixed size list of items. Each item has its own type and index. Tuples support up to 255 items.
- Bytewise Sorting: Supported if all children support bytewise sorting and the schema "sorted" property is set to true. Unlike lists and tables, the ordering of values will be enforced by the tuple based on its "values" property.
- Compaction: Tuples only reduce in size if children are deleted or children with a dynamic size are updated.
- Mutations: It's safe to remove values from the "values" property of a tuple schema, but never to add new values or update value types. No mutations are safe if "sorted" is true.
any
Any types are used to declare that a specific type has no fixed schema but is dynamic. It's generally not a good idea to use Any types.
- Bytewise Sorting: Unsupported
- Compaction: Any types are always compacted out of the buffer; data stored behind an "any" schema will be lost after compaction.
- Mutations: None
string
A string is a fixed or dynamically sized collection of utf-8 encoded bytes.
- Bytewise Sorting: Supported only if the "size" property is set in the schema.
- Compaction: If the size is dynamic or changes between updates, compaction can save space. If the size is fixed, compaction will not reclaim space.
- Mutations: If the "size" property is set it's safe to make it smaller, but not larger. If the field is being used for bytewise sorting, no mutation is safe.
bytes
Bytes are fixed or dynamically sized Vec<u8> collections.
- Bytewise Sorting: Supported only if the "size" property is set in the schema.
- Compaction: If the size is dynamic or changes between updates, compaction can save space. If the size is fixed, compaction will not reclaim space.
- Mutations: If the "size" property is set it's safe to make it smaller, but not larger. If the field is being used for bytewise sorting, no mutation is safe.
int8, int16, int32, int64
Signed integers allow positive or negative numbers to be stored. The bytes are stored in big endian format and converted to unsigned types to allow bytewise sorting.
- Bytewise Sorting: Supported
- Compaction: Updates are done in place, never use additional space.
- Mutations: None
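The conversion to unsigned types mentioned above is needed because a plain two's complement encoding would sort negative numbers after positive ones. One common way to do it, assumed here for illustration and not taken from the no_proto source, is to flip the sign bit before writing big-endian bytes. `encode_i32` and `decode_i32` are hypothetical names.

```rust
/// Flips the sign bit and stores big-endian bytes so that byte order matches
/// numeric order. An assumed technique, not necessarily what no_proto does.
fn encode_i32(value: i32) -> [u8; 4] {
    ((value as u32) ^ 0x8000_0000).to_be_bytes()
}

/// Reverses `encode_i32`.
fn decode_i32(bytes: [u8; 4]) -> i32 {
    (u32::from_be_bytes(bytes) ^ 0x8000_0000) as i32
}
```

With this mapping the smallest i32 encodes to all zero bytes and zero encodes to `[0x80, 0, 0, 0]`, so negative values compare lower than positive ones at the byte level.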
uint8, uint16, uint32, uint64
Unsigned integers allow only non-negative numbers to be stored. The bytes are stored in big endian format to allow bytewise sorting.
- Bytewise Sorting: Supported
- Compaction: Updates are done in place, never use additional space.
- Mutations: None
float, double
Allows the storage of floating point numbers of various sizes. Bytes are stored in big endian format.
- Bytewise Sorting: Unsupported, use Dec32 or Dec64 types.
- Compaction: Updates are done in place, never use additional space.
- Mutations: None
option
Allows efficient storage of a selection between a known collection of ordered strings. The selection is stored as a single u8 byte, limiting the max choices to 255.
- Bytewise Sorting: Supported
- Compaction: Updates are done in place, never use additional space.
- Mutations: You can safely add new choices to the end of the list or update the existing choices in place. If you need to delete a choice, make it an empty string. Changing the order of the choices is destructive as this type only stores the index of the choice it's set to.
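The mutation rules for options follow from storing only the index of the selected choice. The sketch below models that with two hypothetical helpers, `encode_choice` and `decode_choice`; neither is part of the no_proto API, and the choice list stands in for the schema.

```rust
/// Returns the index of `selected` within the ordered choice list, as the
/// single byte that would be stored. Hypothetical helper for illustration.
fn encode_choice(choices: &[&str], selected: &str) -> Option<u8> {
    if choices.len() > 255 {
        return None;
    }
    choices.iter().position(|c| *c == selected).map(|i| i as u8)
}

/// Looks the stored index up in whatever choice list the schema has now.
fn decode_choice<'a>(choices: &[&'a str], index: u8) -> Option<&'a str> {
    choices.get(index as usize).copied()
}
```

A value stored under `["red", "green"]` still decodes correctly after `"blue"` is appended, but decodes to a different string if the list is reordered.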
bool
Allows efficient storage of a true or false value. The value is stored as a single byte that is set to either 1 or 0.
- Bytewise Sorting: Supported
- Compaction: Updates are done in place, never use additional space.
- Mutations: None
dec64
Allows you to store fixed point decimal numbers. The number of decimal places must be declared in the schema as the "precision" property and will be used for every value.
- Bytewise Sorting: Supported
- Compaction: Updates are done in place, never use additional space.
- Mutations: None
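As an illustration of how a fixed point value relates to its precision, the sketch below treats the stored number as an i64 scaled by 10 to the power of the precision. `format_fixed` is a hypothetical helper, not the NP_Dec API, and it assumes a precision small enough that the scale fits in a u64.

```rust
/// Renders a scaled integer as a decimal string, e.g. raw 1999 with
/// precision 2 represents 19.99. Hypothetical helper for illustration.
fn format_fixed(raw: i64, precision: u32) -> String {
    let magnitude = raw.unsigned_abs();
    let sign = if raw < 0 { "-" } else { "" };
    if precision == 0 {
        return format!("{}{}", sign, magnitude);
    }
    // Assumes precision <= 19 so that the scale does not overflow u64.
    let scale = 10u64.pow(precision);
    format!(
        "{}{}.{:0width$}",
        sign,
        magnitude / scale,
        magnitude % scale,
        width = precision as usize
    )
}
```

Because every value in the column shares one precision, two stored integers can be compared directly without looking at the decimal point.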
geo4, geo8, geo16
Allows you to store geographic coordinates with varying levels of accuracy and space usage.
- Bytewise Sorting: Not supported, but the individual lat/lon values can be sorted.
- Compaction: Updates are done in place, never use additional space.
- Mutations: None
Larger geo values take up more space, but allow greater resolution.
Type | Bytes | Earth Resolution | Decimal Places |
---|---|---|---|
geo4 | 4 | 1.1km resolution (city) | 2 |
geo8 | 8 | 11mm resolution (marble) | 7 |
geo16 | 16 | 110 microns resolution (grain of sand) | 9 |
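The resolutions in this table follow from the number of decimal places, assuming one degree of latitude spans roughly 111 km. The function below is only that arithmetic; `resolution_meters` is a hypothetical name and the 111 km figure is an approximation.

```rust
/// Approximate ground distance, in meters, covered by one step of the last
/// stored decimal place, assuming about 111 km per degree of latitude.
fn resolution_meters(decimal_places: u32) -> f64 {
    111_000.0 / 10f64.powi(decimal_places as i32)
}
```

Two decimal places give about 1.1 km, seven give about 11 mm, and nine give about 110 microns.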
tid
Allows you to store a unique ID with a timestamp.
- Bytewise Sorting: Supported, orders by timestamp. Order is random if timestamp is identical between two values.
- Compaction: Updates are done in place, never use additional space.
- Mutations: None
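The sorting behavior above falls out naturally if the timestamp occupies the leading bytes. The sketch below assumes the u64 time is written first in big-endian order, followed by the 8 random bytes; that layout is an assumption for illustration, and `encode_tid` is a hypothetical helper rather than the NP_TimeID API.

```rust
/// Builds a 16 byte ID from a timestamp and 8 random bytes, time first.
/// The byte layout is assumed, not taken from the no_proto source.
fn encode_tid(time: u64, random: [u8; 8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&time.to_be_bytes());
    out[8..].copy_from_slice(&random);
    out
}
```

Under this layout an earlier timestamp always compares lower, and the random bytes only decide the order when two timestamps are identical.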
uuid
Allows you to store a universally unique ID.
- Bytewise Sorting: Supported, but values are always random
- Compaction: Updates are done in place, never use additional space.
- Mutations: None
date
Allows you to store a timestamp as a u64 value. This is just a thin wrapper around the u64 type.
- Bytewise Sorting: Supported
- Compaction: Updates are done in place, never use additional space.
- Mutations: None