Module fastnbt::de

This module contains a serde deserializer. It can do most of the things you would expect of a typical serde deserializer, such as deserializing into:

  • Rust structs.
  • containers like HashMap and Vec.
  • an arbitrary Value.
  • enums. For NBT you typically want either internally tagged or untagged enums.

This deserializer provides from_bytes, which supports zero-copy deserialization into types like &[u8] and borrow::LongArray. There is also from_reader for deserializing from types implementing Read.

§Avoiding allocations

When using from_bytes, we can avoid allocations for things like strings and vectors, instead deserializing into a reference to the input data.

The following table summarises what types you likely want to store NBT data in for owned or borrowed types:

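| NBT type   | Owned type | Borrowed type                         |
|------------|------------|---------------------------------------|
| Byte       | u8 or i8   | use owned                             |
| Short      | u16 or i16 | use owned                             |
| Int        | i32 or u32 | use owned                             |
| Long       | i64 or u64 | use owned                             |
| Float      | f32        | use owned                             |
| Double     | f64        | use owned                             |
| String     | String     | Cow<'a, str> or &[u8] (see below)     |
| List       | Vec<T>     | use owned                             |
| Byte Array | ByteArray  | borrow::ByteArray                     |
| Int Array  | IntArray   | borrow::IntArray                      |
| Long Array | LongArray  | borrow::LongArray                     |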

§Primitives

Borrowing for primitive types like the integers and floats is generally not possible due to alignment requirements of those types. It likely wouldn’t be faster/smaller anyway.

§Strings

For strings, we cannot know ahead of time whether the data can be borrowed as &str. This is because Minecraft uses Java’s encoding of Unicode, not UTF-8. If the string contains Unicode characters outside of the Basic Multilingual Plane then we need to convert it to UTF-8, requiring us to own the string data.

Using Cow<'a, str> lets us borrow when possible, but produce an owned value when the representation is different.

Strings can also be deserialized to &[u8] which will always succeed. These bytes will be Java’s CESU-8 format. You can use cesu8::from_java_cesu8 to decode this.
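
The borrow-or-own decision described above can be sketched with the standard library alone. This is an illustration of the idea, not fastnbt's actual implementation: the function name decode_java_string is hypothetical, and real code would more likely use cesu8::from_java_cesu8. The sketch borrows when the bytes are already valid UTF-8 and otherwise decodes Java's encoding into UTF-16 code units before building an owned String.

```rust
use std::borrow::Cow;

/// Hypothetical helper: decode Java-encoded string bytes, borrowing when
/// the input is already valid UTF-8. Returns None for malformed input.
fn decode_java_string(bytes: &[u8]) -> Option<Cow<'_, str>> {
    // Fast path: the representation is identical, so borrow from the input.
    if let Ok(s) = std::str::from_utf8(bytes) {
        return Some(Cow::Borrowed(s));
    }

    // Slow path: each 1-3 byte sequence encodes one UTF-16 code unit.
    let mut units: Vec<u16> = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let b0 = bytes[i];
        let len = if b0 < 0x80 {
            1
        } else if (b0 & 0xE0) == 0xC0 {
            2
        } else if (b0 & 0xF0) == 0xE0 {
            3
        } else {
            return None; // Java's encoding never emits 4-byte sequences.
        };
        let seq = bytes.get(i..i + len)?;
        // Every byte after the first must be a continuation byte (10xxxxxx).
        if seq[1..].iter().any(|&b| (b & 0xC0) != 0x80) {
            return None;
        }
        let unit = match len {
            1 => u16::from(b0),
            2 => (u16::from(b0 & 0x1F) << 6) | u16::from(seq[1] & 0x3F),
            _ => {
                (u16::from(b0 & 0x0F) << 12)
                    | (u16::from(seq[1] & 0x3F) << 6)
                    | u16::from(seq[2] & 0x3F)
            }
        };
        units.push(unit);
        i += len;
    }
    // from_utf16 joins surrogate pairs into characters outside the BMP.
    String::from_utf16(&units).ok().map(Cow::Owned)
}
```

Plain ASCII input such as b"hello" takes the borrowed path, while a character outside the Basic Multilingual Plane, which Java stores as two three-byte surrogates, forces an allocation.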

§Representation of NBT arrays

In order for Value to preserve all NBT information, the deserializer “maps into serde’s data model”. As a consequence, NBT array types must be (de)serialized using the types provided in this crate, e.g. LongArray. Sequence containers like Vec will (de)serialize to NBT Lists, and will fail if an NBT array is expected instead.

§128 bit integers and UUIDs

UUIDs tend to be stored in NBT as IntArrays of length 4. When deserializing i128 or u128, an IntArray of length 4 is accepted. It is parsed as big endian, i.e. the most significant bit (and int) comes first.
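
As a minimal sketch of that big-endian layout, the following combines four ints into a single 128-bit value using only the standard library. The function name ints_to_u128 is hypothetical and this is not fastnbt's own code; it only illustrates that the first int supplies the most significant 32 bits.

```rust
/// Hypothetical helper: combine four NBT ints into a u128, big endian.
/// The first int ends up in the most significant 32 bits.
fn ints_to_u128(ints: [i32; 4]) -> u128 {
    ints.iter().fold(0u128, |acc, &int| {
        // Reinterpret the int's bits as unsigned so that negative values
        // do not sign-extend into the higher bits of the result.
        (acc << 32) | u128::from(int as u32)
    })
}
```

For example, [0, 0, 0, 1] yields 1, and [1, 0, 0, 0] yields 1 << 96.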

§Other quirks

Some other quirks which may not be obvious:

  • When deserializing to unsigned types such as u32, a negative value is an error, to avoid unexpected behaviour from wrap-around. This does not apply when deserializing lists of integral values to u8 slices or vectors.
  • Any integral value from NBT can be deserialized to bool. Any non-zero value becomes true. Bear in mind that serializing the same type will change the NBT structure, which is likely unintended.
  • You can deserialize a field to the unit type () or a unit struct. This ignores the value but ensures that it existed.
  • You cannot deserialize into anything other than a struct or similar container, e.g. HashMap. This is due to a misalignment between the NBT format and Rust’s types. Attempting to do so will give an error about no root compound. This means you can never do let s: String = from_bytes(...). Serialization of a struct assumes an empty-named compound.
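
The first two quirks can be pictured with plain standard-library conversions. This is only a sketch of the described behaviour, with hypothetical function names, and not the deserializer's own code: a checked conversion rejects negative values instead of wrapping, and any non-zero integral maps to true.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: checked conversion that rejects negative values.
/// A plain `value as u32` cast would wrap -1 around to u32::MAX instead.
fn nbt_int_to_u32(value: i32) -> Option<u32> {
    u32::try_from(value).ok()
}

/// Hypothetical helper: any non-zero integral value counts as true.
fn nbt_integral_to_bool(value: i64) -> bool {
    value != 0
}
```

Here nbt_int_to_u32(-1) is None where a cast would have produced 4294967295, and nbt_integral_to_bool(-3) is true.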

§Example Minecraft types

This section demonstrates writing types for a few real Minecraft structures.

§Extracting entities as an enum

This demonstrates the type you would need to write in order to extract some subset of entities. It uses an internally tagged enum in serde, meaning that it looks for a certain field in the structure (here id) to tell it which enum variant to deserialize into. We use serde’s other attribute so that an unknown entity type does not cause an error.

use serde::Deserialize;

#[derive(Deserialize, Debug)]
#[serde(tag = "id")]
enum Entity {
   #[serde(rename = "minecraft:bat")]
   Bat {
       #[serde(rename = "BatFlags")]
       bat_flags: i8,
   },

   #[serde(rename = "minecraft:creeper")]
   Creeper { ignited: i8 },

   // Entities we haven't coded end up as just 'unknown'.
   #[serde(other)]
   Unknown,
}

§Capture unknown entities

If you need to capture all entity types, but do not wish to manually type all of them, you can wrap the above entity type in an untagged enum.

use serde::Deserialize;
use fastnbt::Value;

#[derive(Deserialize, Debug)]
#[serde(untagged)]
enum Entity {
    Known(KnownEntity),
    Unknown(Value),
}
#[derive(Deserialize, Debug)]
#[serde(tag = "id")]
enum KnownEntity {
    #[serde(rename = "minecraft:bat")]
    Bat {
        #[serde(rename = "BatFlags")]
        bat_flags: i8,
    },
    #[serde(rename = "minecraft:creeper")]
    Creeper { ignited: i8 },
}

§Avoiding allocations in a Chunk

This example shows how to avoid some allocations. The Section type below contains the block states, which store the state of part of the Minecraft world. In NBT this is bit-packed data stored as an array of longs (i64). We avoid allocating a vector for this by storing it as a borrow::LongArray instead, which stores it as &[u8] under the hood. We can’t safely store it as &[i64] due to memory alignment constraints. The fastanvil crate has a PackedBits type that can handle the unpacking of these block states.

use fastnbt::borrow::LongArray;
use serde::Deserialize;

#[derive(Deserialize)]
struct Chunk<'a> {
    #[serde(rename = "Level")]
    #[serde(borrow)]
    level: Level<'a>,
}

#[derive(Deserialize)]
struct Level<'a> {
    #[serde(rename = "Sections")]
    #[serde(borrow)]
    pub sections: Option<Vec<Section<'a>>>,
}

#[derive(Deserialize, Debug)]
#[serde(rename_all = "PascalCase")]
pub struct Section<'a> {
    #[serde(borrow)]
    pub block_states: Option<LongArray<'a>>,
}
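
To see why holding the longs as &[u8] costs little, here is a standard-library sketch of reading i64 values out of a borrowed byte slice without requiring any alignment. It assumes the bytes are the raw big-endian NBT payload; the function name longs is hypothetical and is not part of fastnbt's API.

```rust
/// Hypothetical helper: lazily read big-endian i64 values from raw bytes.
/// Each value is copied out of the slice, so the slice itself can start
/// at any address. Trailing bytes that do not fill a long are ignored.
fn longs(bytes: &[u8]) -> impl Iterator<Item = i64> + '_ {
    bytes.chunks_exact(8).map(|chunk| {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        i64::from_be_bytes(buf)
    })
}
```

Only eight bytes are copied at a time, so no vector is allocated unless the caller chooses to collect the values.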

§Unit variant enum from status of chunk

use serde::Deserialize;

#[derive(Deserialize)]
struct Chunk {
    #[serde(rename = "Level")]
    level: Level,
}

#[derive(Deserialize)]
struct Level {
    #[serde(rename = "Status")]
    status: Status,
}

#[derive(Deserialize, PartialEq, Debug)]
#[serde(rename_all = "snake_case")]
enum Status {
    Empty,
    StructureStarts,
    StructureReferences,
    Biomes,
    Noise,
    Surface,
    Carvers,
    LiquidCarvers,
    Features,
    Light,
    Spawn,
    Heightmaps,
    Full,
}

§Structs

  • Deserializer: Deserializer for NBT data. See the de module for more information.