§corncobs: Corny COBS encoding/decoding in Rust
This crate provides Consistent Overhead Byte Stuffing (COBS) support for Rust programs, with a particular focus on resource-limited embedded no_std targets:
- Provides both fast (buffer-to-buffer) and small (in-place or iterator-based) versions of both the encode and decode routines.
- Provides a const fn for computing the maximum encoded size for a given input size, so you can define fixed-size buffers precisely without magic numbers.
- Has pretty good test coverage, Criterion benchmarks, and a honggfuzz fuzz-testing suite to try to ensure code quality.
§When to use this crate
COBS lets us take an arbitrary blob of bytes and turn it into a slightly longer blob that doesn’t contain a certain byte, except as a terminator at the very end. corncobs implements the version of this where the byte is zero. That is, corncobs can take a sequence of arbitrary bytes and turn it into a slightly longer sequence that doesn’t contain zero except at the end.
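To make the transformation concrete, here is a minimal buffer-to-buffer COBS encoder written from the general algorithm. It is a sketch, not corncobs’ implementation: the name cobs_encode_sketch is hypothetical, it panics if output is too short, and it assumes the variant that starts a new block (emitting an extra 0x01 code byte) after every full run of 254 non-zero bytes. Not every COBS implementation makes that choice.

```rust
/// Minimal COBS encoder sketch (hypothetical name, not corncobs' API).
/// Writes the encoded form of `input` plus a 0x00 terminator into `output`
/// and returns the number of bytes written. Panics if `output` is too short.
pub fn cobs_encode_sketch(input: &[u8], output: &mut [u8]) -> usize {
    let mut code_idx = 0; // where the current block's code byte will go
    let mut out = 1; // next position for a data byte
    let mut code: u8 = 1; // 1 + number of data bytes in the current block

    for &byte in input {
        if byte == 0 {
            // A zero ends the current block: write its code byte and
            // reserve the zero's slot for the next block's code byte.
            output[code_idx] = code;
            code_idx = out;
            out += 1;
            code = 1;
        } else {
            output[out] = byte;
            out += 1;
            code += 1;
            if code == 0xFF {
                // 254 data bytes: close this block without an implied zero.
                output[code_idx] = code;
                code_idx = out;
                out += 1;
                code = 1;
            }
        }
    }
    output[code_idx] = code; // close the final block
    output[out] = 0; // frame terminator: the only zero in the output
    out + 1
}
```

Each code byte records how many bytes follow before the next (removed) zero, which is what lets a decoder put the zeroes back.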
The main reason you’d want to do this is framing. If you’re transmitting a series of messages over a stream, you need some way to tell where the messages begin and end. There are many ways to do this – such as by transmitting a length before every message – but most of them don’t support sync recovery. Sync recovery lets a receiver tune in anywhere in a stream and figure out (correctly) where the next message boundary is. The easiest way to provide sync recovery is to use a marker at the beginning/end of each message that you can reliably tell apart from the data in the messages. To find message boundaries in an arbitrary data stream, you then only need to hunt for the end of the current message and start parsing from there. COBS provides this by ensuring that the message terminator byte (0) appears only between messages.
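The sync-recovery argument can be sketched as a receiver that collects bytes until it sees a zero. FrameReceiver and its methods are hypothetical names, not part of corncobs. The sketch assumes a fixed capacity N, discards everything before the first zero it sees (the tail of a frame it joined part-way through), and drops any frame that overflows its buffer, resynchronizing on the next zero. The frames it returns are still COBS-encoded and need decoding.

```rust
/// Hypothetical receiver sketch (not part of corncobs): collects incoming
/// bytes into a fixed buffer of N bytes and hands back each complete frame.
pub struct FrameReceiver<const N: usize> {
    buf: [u8; N],
    len: usize,
    // false until the first zero is seen, and again after an overflow
    synced: bool,
}

impl<const N: usize> FrameReceiver<N> {
    pub fn new() -> Self {
        Self { buf: [0; N], len: 0, synced: false }
    }

    /// Feeds one received byte. Returns the encoded frame (terminator
    /// stripped) when `byte` is the zero that ends it.
    pub fn push(&mut self, byte: u8) -> Option<&[u8]> {
        if byte == 0 {
            // A zero is always a frame boundary, so it is always safe to sync here.
            let complete = self.synced && self.len > 0;
            let len = self.len;
            self.synced = true;
            self.len = 0;
            return if complete { Some(&self.buf[..len]) } else { None };
        }
        if self.synced {
            if self.len < N {
                self.buf[self.len] = byte;
                self.len += 1;
            } else {
                // Frame too long for the buffer: drop it and wait for the next zero.
                self.synced = false;
                self.len = 0;
            }
        }
        None
    }
}
```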
Unlike a lot of framing methods (particularly SLIP, whose escaping can roughly double a message in the worst case), COBS guarantees a small upper bound on the size of the encoded output: the original length, plus two bytes, plus one byte per 254 input bytes. corncobs provides the max_encoded_len function for sizing buffers at compile time to allow for worst-case encoding overhead.
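A sketch of computing that bound in a const fn and using it to size a buffer at compile time. max_encoded_len_sketch, MAX_MESSAGE_LEN, TX_BUF_LEN, and new_tx_buffer are hypothetical names; the formula transcribes the bound stated above and may not match the crate’s own max_encoded_len exactly.

```rust
/// Hypothetical stand-in for a max-encoded-length helper: the raw length,
/// plus two bytes, plus one byte per 254 input bytes.
pub const fn max_encoded_len_sketch(raw_len: usize) -> usize {
    raw_len + 2 + raw_len / 254
}

/// Hypothetical application limit on the size of one raw message.
pub const MAX_MESSAGE_LEN: usize = 300;

/// Worst-case frame size, evaluated at compile time.
pub const TX_BUF_LEN: usize = max_encoded_len_sketch(MAX_MESSAGE_LEN);

/// A fixed-size transmit buffer sized without magic numbers.
pub fn new_tx_buffer() -> [u8; TX_BUF_LEN] {
    [0u8; TX_BUF_LEN]
}
```

For an empty message the bound is 2 bytes (one code byte and the terminator); for a 254-byte message it is 257.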
corncobs can be used in several different ways, each with different costs and benefits.
- Encoding
  - encode_buf: from one slice to another; efficient, but needs RAM for both the input and the output buffer.
  - encode_iter: incremental, using an iterator; somewhat slower, but requires no additional memory. (This can be useful in a serial interrupt handler.)
- Decoding
  - decode_buf: from one slice to another; efficient, but needs RAM for both the input and the output buffer.
  - decode_in_place: in-place in a slice; nearly as efficient, but overwrites incoming data.
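In-place decoding works because each block shrinks when decoded: its code byte is either dropped or turned into the implied zero, so the write position never passes the read position. A minimal sketch under that reasoning; decode_in_place_sketch is a hypothetical name, it returns None rather than a crate error type, and it expects the frame’s terminating zero to be included in buf.

```rust
/// Hypothetical in-place COBS decoder sketch. `buf` holds one encoded frame
/// including its terminating zero; on success the decoded message occupies
/// `buf[..n]` and `Some(n)` is returned. Returns None if the frame is truncated.
pub fn decode_in_place_sketch(buf: &mut [u8]) -> Option<usize> {
    let mut read = 0; // position of the next code byte
    let mut write = 0; // end of the decoded data so far
    loop {
        let code = *buf.get(read)?;
        if code == 0 {
            return Some(write); // reached the terminator
        }
        read += 1;
        let run = usize::from(code) - 1;
        if read + run > buf.len() {
            return None; // run extends past the end of the buffer
        }
        // The destination starts before the source, so data only moves
        // backwards; copy_within handles the overlap.
        buf.copy_within(read..read + run, write);
        read += run;
        write += run;
        // A code below 0xFF implies a zero after its run, unless the frame ends here.
        let next = *buf.get(read)?;
        if code != 0xFF && next != 0 {
            buf[write] = 0; // write < read, so this slot was already consumed
            write += 1;
        }
    }
}
```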
§Design decisions / tradeoffs
corncobs is optimized for a fast and simple implementation. To get the best performance on normal data, it leaves something out: validation. Specifically, corncobs will decode invalid COBS data that contains zeroes in unexpected places mid-message. It could reject such data by scanning for zeroes. We chose not to do this for performance reasons, and justify it with the following points.
First: we don’t have to do this to maintain memory safety. Several C implementations of COBS do data validation in an attempt to avoid buffer overruns or out-of-bounds accesses. We’re not writing in C and don’t have this problem to worry about.
Second: it really does improve performance, by about 5x in benchmarks. This is because, by lifting the requirement to inspect every byte hunting for zeroes, we can use copy_from_slice to move data around, which calls optimized memory-move routines for the target architecture that are basically always much faster than moving bytes one at a time.
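The tradeoff can be seen in a small buffer-to-buffer decoder that moves each run with a single copy_from_slice and never looks inside it. This is a sketch under assumptions, not corncobs’ decoder: decode_buf_sketch and DecodeError are hypothetical, and the frame must include its terminating zero. A stray zero inside a run is copied through; a zero where a code byte is expected is taken as the end of the frame, which yields a short result.

```rust
/// Hypothetical error type for this sketch (not the crate's CobsError).
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    /// Input ended before a terminating zero was found.
    Truncated,
    /// Decoded data would not fit in the output buffer.
    OutputTooSmall,
}

/// Decodes one COBS frame from `input` (terminating zero included) into
/// `output`, returning the number of decoded bytes. Runs are not checked
/// for stray zeroes; each one is moved with one copy_from_slice.
pub fn decode_buf_sketch(input: &[u8], output: &mut [u8]) -> Result<usize, DecodeError> {
    let mut i = 0; // read position in input
    let mut out = 0; // write position in output
    loop {
        let code = *input.get(i).ok_or(DecodeError::Truncated)?;
        if code == 0 {
            return Ok(out); // reached the frame terminator
        }
        i += 1;
        let run = usize::from(code) - 1;
        // get/get_mut keep every access in bounds without inspecting the bytes.
        let src = input.get(i..i + run).ok_or(DecodeError::Truncated)?;
        let dst = output
            .get_mut(out..out + run)
            .ok_or(DecodeError::OutputTooSmall)?;
        dst.copy_from_slice(src); // bulk copy, no per-byte zero check
        i += run;
        out += run;
        // A code below 0xFF implies a zero after its run, unless the frame ends here.
        let next = *input.get(i).ok_or(DecodeError::Truncated)?;
        if code != 0xFF && next != 0 {
            *output.get_mut(out).ok_or(DecodeError::OutputTooSmall)? = 0;
            out += 1;
        }
    }
}
```

Malformed input can produce a short result or an Err, but never an out-of-bounds access.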
Third: COBS does not guarantee integrity. Spurious zeroes in the middle of a message are only one way your input data could be corrupted. Your application needs to handle all possible corruption, which means having an integrity check on the COBS-decoded data, such as a CRC.
If you feed corncobs random invalid data, it will either return unexpectedly short decoded results (which will fail your next-level integrity check), or it will return an Err. It will not crash, corrupt memory, or panic, and we have tests to demonstrate this.
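One way to add that integrity check is to append a CRC to each message before COBS encoding and verify it after decoding. The sketch picks CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF, no reflection, no final XOR) purely as an example; the text above does not prescribe a particular CRC, and append_crc and check_crc are hypothetical helpers.

```rust
/// CRC-16/CCITT-FALSE, computed bit by bit (example choice of CRC).
pub fn crc16_ccitt_false(data: &[u8]) -> u16 {
    let mut crc: u16 = 0xFFFF;
    for &byte in data {
        crc ^= u16::from(byte) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 { (crc << 1) ^ 0x1021 } else { crc << 1 };
        }
    }
    crc
}

/// Hypothetical helper: writes `payload` followed by its big-endian CRC into
/// `out` (ready for COBS encoding). Returns the length, or None if `out` is too small.
pub fn append_crc(payload: &[u8], out: &mut [u8]) -> Option<usize> {
    let total = payload.len() + 2;
    let dst = out.get_mut(..total)?;
    dst[..payload.len()].copy_from_slice(payload);
    dst[payload.len()..].copy_from_slice(&crc16_ccitt_false(payload).to_be_bytes());
    Some(total)
}

/// Hypothetical helper: checks a COBS-decoded frame whose last two bytes are
/// the CRC of the rest. Returns the payload if the CRC matches.
pub fn check_crc(frame: &[u8]) -> Option<&[u8]> {
    if frame.len() < 2 {
        return None;
    }
    let (payload, tail) = frame.split_at(frame.len() - 2);
    let expected = u16::from_be_bytes([tail[0], tail[1]]);
    (crc16_ccitt_false(payload) == expected).then_some(payload)
}
```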
§Cargo features
No features are enabled by default. Embedded programmers do not need to specify default-features = false when using corncobs, because who said std should be the default anyhow? People with lots of RAM, that’s who.
Features:
- std: if you’re on one of them “big computers” with “infinite memory” and can afford the inherent nondeterminism of dynamic memory allocation, this feature enables routines for encoding to/from Vec, and an Error impl for CobsError.
§Tips for using COBS
If you’re designing a protocol or message format and considering using COBS, you have some options.
Optimizing for size: COBS encoding has the least overhead when the data being encoded contains 0x00 bytes, at least one for every 254 bytes sent. In practice, most data formats achieve this. However…
Optimizing for speed: COBS encode/decode, and particularly the corncobs implementation, goes fastest when data contains as few 0x00 bytes as possible – ideally none. Each zero ends a run of data, so fewer zeroes mean fewer, longer copies. If you can adjust the data you’re encoding to avoid zero, you can achieve higher encode/decode rates. For instance, in one of my projects that sends RGB video data, I just declared that red/green/blue value 1 is the same as 0, and made all the 0s into 1s, for a large performance improvement.
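The two tips pull in opposite directions, which a small calculation shows. cobs_stats is a hypothetical helper that, for the block-splitting variant assumed in the other sketches here (a new block after every full 254-byte run), returns the encoded length and the number of blocks; each block is one separate copy for a decoder that moves a run at a time. remap_zeros, also hypothetical, applies the 0-becomes-1 trick from the RGB example.

```rust
/// Hypothetical helper: returns (encoded_len, blocks) for COBS-encoding `data`,
/// terminator included, assuming a new block starts after every full
/// 254-byte run. `blocks` is how many separate runs a decoder must copy.
pub fn cobs_stats(data: &[u8]) -> (usize, usize) {
    let mut blocks = 1;
    let mut zeros = 0;
    let mut run = 0;
    for &b in data {
        if b == 0 {
            zeros += 1;
            blocks += 1; // every zero ends a block
            run = 0;
        } else {
            run += 1;
            if run == 254 {
                blocks += 1; // a full block forces an extra code byte
                run = 0;
            }
        }
    }
    // Zeros are dropped from the data, each block adds one code byte,
    // and the frame ends with one terminator byte.
    let encoded_len = data.len() - zeros + blocks + 1;
    (encoded_len, blocks)
}

/// Hypothetical helper for the RGB trick: treat value 0 as 1 so no zero bytes remain.
pub fn remap_zeros(data: &mut [u8]) {
    for b in data.iter_mut() {
        if *b == 0 {
            *b = 1;
        }
    }
}
```

Under these assumptions, 600 bytes in which every third byte is zero give 201 blocks and 602 encoded bytes; after remap_zeros they give 3 blocks and 604 bytes, so a couple of extra bytes buy far fewer, longer copies.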
Structs§
Enums§
- CobsError - Errors that can occur while decoding.
- DecodeStatus
Constants§
- ZERO - The termination byte used by corncobs. Yes, it’s a bit silly to have this as a constant – but the implementation is careful to use this named constant whenever it is talking about the termination byte, for clarity.
Functions§
- decode_buf - Decodes input from bytes into output starting at index 0. Returns the number of bytes used in output.
- decode_in_place - Decodes an encoded message, in-place. This is useful when you’re short on memory. Since the decoded form of a COBS frame is always shorter than the encoded form, bytes is guaranteed to be long enough.
- encode_buf - Encodes the message bytes into the buffer output. Returns the number of bytes used in output, which also happens to be the index of the first zero byte.
- encode_iter - Encodes bytes into COBS form, yielding individual encoded bytes through an iterator.
- max_encoded_len - Returns the largest possible encoded size for an input message of raw_len bytes, considering overhead.
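An iterator-based encoder like the one encode_iter describes needs a little lookahead: it must know a block’s length before it can yield the code byte that starts it. A sketch of that idea, assuming the same block-splitting variant as the buffer encoder sketch earlier; EncodeIterSketch and encode_iter_sketch are hypothetical names, not the crate’s types. It keeps only a slice position and a small state, so it needs no output buffer.

```rust
/// Internal state of the hypothetical iterator sketch.
#[derive(Clone, Copy)]
enum State {
    /// About to yield the code byte for a new block.
    Code,
    /// Yielding data bytes; `full` marks a 254-byte block (no implied zero).
    Data { remaining: usize, full: bool },
    /// Terminator already yielded.
    Done,
}

/// Hypothetical iterator-based COBS encoder (not corncobs' own iterator type).
pub struct EncodeIterSketch<'a> {
    rest: &'a [u8],
    state: State,
}

pub fn encode_iter_sketch(bytes: &[u8]) -> EncodeIterSketch<'_> {
    EncodeIterSketch { rest: bytes, state: State::Code }
}

impl Iterator for EncodeIterSketch<'_> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        loop {
            match self.state {
                State::Code => {
                    // Look ahead: count non-zero bytes before the next zero, up to 254.
                    let run = self.rest.iter().take(254).take_while(|&&b| b != 0).count();
                    self.state = State::Data { remaining: run, full: run == 254 };
                    return Some(run as u8 + 1);
                }
                State::Data { remaining: 0, full } => {
                    if full {
                        self.state = State::Code; // full block: no zero to skip
                    } else if let Some((_, tail)) = self.rest.split_first() {
                        self.rest = tail; // skip the zero that ended this block
                        self.state = State::Code;
                    } else {
                        self.state = State::Done;
                        return Some(0); // end of input: yield the terminator
                    }
                }
                State::Data { remaining, full } => {
                    let (&b, tail) = self.rest.split_first()?;
                    self.rest = tail;
                    self.state = State::Data { remaining: remaining - 1, full };
                    return Some(b);
                }
                State::Done => return None,
            }
        }
    }
}
```

Because each call yields a single byte, a transmitter could pull one byte per "TX ready" interrupt instead of encoding the whole frame up front.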