//! # `corncobs`: Corny COBS encoding/decoding in Rust
//!
//! This crate provides [Consistent Overhead Byte Stuffing][cobs] (COBS) support
//! for Rust programs, with a particular focus on resource-limited embedded
//! `no_std` targets:
//!
//! - Provides both fast (buffer-to-buffer) and small (in-place or
//! iterator-based) versions of both encode and decode routines.
//!
//! - Provides a `const fn` for computing the maximum encoded size for a given
//! input size, so you can define fixed-size buffers precisely without magic
//! numbers.
//!
//! - Has pretty good test coverage, [Criterion] benchmarks, and a [honggfuzz]
//! fuzz testing suite to try to ensure code quality.
//!
//! ## When to use this crate
//!
//! COBS lets us take an arbitrary blob of bytes and turn it into a slightly
//! longer blob that doesn't contain a certain byte, except as a terminator at
//! the very end. `corncobs` implements the version of this where the byte is
//! zero. That is, `corncobs` can take a sequence of arbitrary bytes, and turn
//! it into a slightly longer sequence that doesn't contain zero except at the
//! end.
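To make the transformation concrete, here is a minimal sketch of the classic COBS encoding loop as the algorithm is usually described. It is not `corncobs`'s own code, and `cobs_encode_simple` is a hypothetical name. Each block starts with a code byte equal to one plus the number of non-zero bytes that follow it, and the zero that ends the block is implied rather than written. Implementations differ in small details, such as what happens after a full 254-byte block at the very end of the input.

```rust
/// Hypothetical helper (not the `corncobs` API): encodes `input` with the
/// classic byte-at-a-time COBS loop and appends the `0x00` frame terminator.
pub fn cobs_encode_simple(input: &[u8]) -> Vec<u8> {
    // Reserve a slot for the first block's code byte; it is patched later.
    let mut out = vec![0u8];
    let mut code_idx = 0; // position of the current block's code byte
    let mut code: u8 = 1; // 1 + number of data bytes in the current block

    for &b in input {
        if b == 0 {
            // A zero ends the current block; the zero itself is not copied.
            out[code_idx] = code;
            code_idx = out.len();
            out.push(0);
            code = 1;
        } else {
            out.push(b);
            code += 1;
            if code == 0xFF {
                // 254 non-zero bytes: the block is full, so start a new one.
                out[code_idx] = code;
                code_idx = out.len();
                out.push(0);
                code = 1;
            }
        }
    }
    // Close the final block, then append the terminator.
    out[code_idx] = code;
    out.push(0);
    out
}
```

For example, `[0x11, 0x00, 0x22]` becomes `[0x02, 0x11, 0x02, 0x22, 0x00]`: the only zero left is the terminator.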
//!
//! The main reason you'd want to do this is _framing._ If you're transmitting a
//! series of messages over a stream, you need some way to tell where the
//! messages begin and end. There are many ways to do this -- such as by
//! transmitting a length before every message -- but most of them don't support
//! _sync recovery._ Sync recovery lets a receiver tune in anywhere in a stream
//! and figure out (correctly) where the next message boundary is. The easiest
//! way to provide sync recovery is to use a marker at the beginning/end of each
//! message that you can reliably tell apart from the data in the messages. To
//! find message boundaries in an arbitrary data stream, you only need to hunt
//! for the end of the current message and start parsing from there. COBS can do
//! this by ensuring that the message terminator character (0) only appears
//! between messages.
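A receiver that tunes in mid-stream can recover sync by throwing away everything up to the first zero it sees. A minimal sketch, assuming a captured stream already sits in a byte slice; `frames_after_resync` is a hypothetical helper, and the frames it returns are still COBS-encoded, with their terminators stripped.

```rust
/// Hypothetical helper: splits a captured byte stream into complete,
/// still-encoded COBS frames (terminators removed). Because we may have
/// tuned in mid-message, everything before the first `0x00` is discarded,
/// as is any trailing partial frame that has no terminator yet.
pub fn frames_after_resync(stream: &[u8]) -> Vec<&[u8]> {
    // Until we see a terminator, nothing we have received can be trusted.
    let start = match stream.iter().position(|&b| b == 0) {
        Some(i) => i + 1,
        None => return Vec::new(),
    };
    let synced = &stream[start..];
    // Every complete frame ends at a zero. `split` also yields the piece
    // after the last zero, which is the frame still arriving, so drop it.
    let mut frames: Vec<&[u8]> = synced.split(|&b| b == 0).collect();
    frames.pop();
    frames
}
```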
//!
//! Unlike a lot of framing methods (particularly [SLIP]), COBS guarantees an
//! upper bound on the size of the encoded output: the original length, plus two
//! bytes, plus one byte per 254 input bytes. `corncobs` provides the
//! [`max_encoded_len`] function so you can size buffers for worst-case encoding
//! overhead at compile time.
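Written as code, the bound above is a one-line `const fn`. This is a sketch of the stated arithmetic under the hypothetical name `worst_case_len`, not necessarily how `max_encoded_len` itself is written.

```rust
/// Hypothetical sketch of the worst-case COBS size: the original length,
/// plus two bytes (the first code byte and the terminator), plus one extra
/// code byte per 254 input bytes.
pub const fn worst_case_len(raw_len: usize) -> usize {
    raw_len + 2 + raw_len / 254
}

// Because it is a `const fn`, it can size a fixed array type.
const MSG_SIZE: usize = 300;
pub type EncodeBuf = [u8; worst_case_len(MSG_SIZE)];
```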
//!
//! `corncobs` can be used in several different ways, each with different costs
//! and benefits.
//!
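//! - Encoding
//!   - [`encode_buf`]: from one slice to another; efficient, but needs roughly
//!     twice the memory of the message, since the input and the encoded output
//!     must both exist at once.
//!   - [`encode_iter`]: incremental, using an iterator; much slower (about
//!     35-40x in benchmarks), but requires no additional memory. (This can be
//!     useful in a serial interrupt handler.)
//! - Decoding
//!   - [`decode_buf`]: from one slice to another; efficient, but needs roughly
//!     twice the memory of the message, since the encoded input and the decoded
//!     output must both exist at once.
//!   - [`decode_in_place`]: in-place in a slice; often nearly as efficient (1x
//!     to 3x the time of `decode_buf` in benchmarks), but overwrites the
//!     incoming data.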
//!
//! ## Design decisions / tradeoffs
//!
//! `corncobs` is optimized for a fast and simple implementation. To get the best
//! performance on normal data, it leaves something out: **validation**.
//!
//! Specifically: `corncobs` will decode invalid COBS data that contains zeroes
//! in unexpected places mid-message. It could reject such data by scanning for
//! zeroes. We chose not to do this for performance reasons, and justify it with
//! the following points.
//!
//! First: we don't have to do this to maintain memory safety. Several C
//! implementations of COBS do data validation in an attempt to avoid buffer
//! overruns or out-of-bounds accesses. We're not writing in C and don't have
//! this problem to worry about.
//!
//! Second: it really does improve performance, by about 5x in benchmarks. This
//! is because, by lifting the requirement to inspect every byte hunting for
//! zeroes, we can use `copy_from_slice` to move data around, which calls
//! optimized memory-move routines for the target architecture that are
//! _basically always_ much faster than moving bytes one at a time.
//!
//! Third: COBS does not guarantee integrity. Spurious zeroes in the middle of a
//! message are only one way your input data could be corrupted. Your application
//! needs to handle _all_ possible corruption, which means having an integrity
//! check on the COBS-decoded data, such as a CRC.
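One common way to add that check is to append a CRC to each message before encoding and verify it after decoding. A minimal sketch using the widely used CRC-32 (reflected polynomial `0xEDB88320`, the variant used by zlib and Ethernet), computed bit by bit for brevity. `crc32`, `with_crc`, and `check_crc` are hypothetical helpers, and the little-endian trailer layout is an assumption; a real project might prefer a table-driven or hardware CRC.

```rust
/// Bitwise CRC-32 (reflected, init and final XOR `0xFFFF_FFFF`).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // Shift right; if the bit shifted out was 1, fold in the polynomial.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical: payload followed by its CRC (little-endian). This combined
/// buffer is what you would then COBS-encode.
pub fn with_crc(payload: &[u8]) -> Vec<u8> {
    let mut msg = payload.to_vec();
    msg.extend_from_slice(&crc32(payload).to_le_bytes());
    msg
}

/// Hypothetical: checks a COBS-decoded message and returns its payload, or
/// `None` if it is too short to hold a CRC or the CRC does not match.
pub fn check_crc(decoded: &[u8]) -> Option<&[u8]> {
    if decoded.len() < 4 {
        return None;
    }
    let (payload, tail) = decoded.split_at(decoded.len() - 4);
    let expected = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    (crc32(payload) == expected).then_some(payload)
}
```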
//!
//! If you feed `corncobs` random invalid data, it will either return
//! unexpectedly short decoded results (which will fail your next-level
//! integrity check), or it will return an `Err`. It will not crash, corrupt
//! memory, or `panic!`, and we have tests to demonstrate this.
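For reference, the zero scan that `corncobs` leaves out looks something like this sketch (`has_no_stray_zeros` is a hypothetical name). It has to inspect every byte of the frame, which is exactly the per-byte work the second point above avoids. A stricter validator would also check that no code byte points past the end of the frame.

```rust
/// Hypothetical validation pass: returns `true` if `frame` (one encoded COBS
/// frame, including its final `0x00`) has no zero anywhere except the end.
pub fn has_no_stray_zeros(frame: &[u8]) -> bool {
    match frame.split_last() {
        // The last byte must be the terminator, and nothing before it may be zero.
        Some((&0, body)) => !body.contains(&0),
        // Missing terminator, or an empty frame.
        _ => false,
    }
}
```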
//!
//! ## Cargo `features`
//!
//! No features are enabled by default. Embedded programmers do not need to
//! specify `default-features = false` when using `corncobs` because who said
//! `std` should be the default anyhow? People with lots of RAM, that's who.
//!
//! Features:
//!
//! - `std`: if you're on one of them "big computers" with "infinite memory" and
//! can afford the inherent nondeterminism of dynamic memory allocation, this
//! feature enables routines that encode and decode into a `Vec`, and an `Error`
//! impl for `CobsError`.
//!
//! ## Tips for using COBS
//!
//! If you're designing a protocol or message format and considering using COBS,
//! you have some options.
//!
//! **Optimizing for size:** COBS encoding has the least overhead when the data
//! being encoded contains `0x00` bytes, at least one for every 254 bytes sent.
//! In practice, most data formats achieve this. However...
//!
//! **Optimizing for speed:** COBS encode/decode, and particularly the
//! `corncobs` implementation, goes fastest when data contains as _few_ `0x00`
//! bytes as possible -- ideally none. If you can adjust the data you're
//! encoding to avoid zero, you can achieve higher encode/decode rates. For
//! instance, in one of my projects that sends RGB video data, I just declared
//! that red/green/blue value 1 is the same as 0, and made all the 0s into 1s,
//! for a large performance improvement.
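The trick in the last paragraph amounts to a single pass over the data before encoding. A sketch, assuming 8-bit color channels where losing the distinction between 0 and 1 is acceptable (`avoid_zero` is a hypothetical name):

```rust
/// Hypothetical pre-pass: replaces every `0x00` channel value with `0x01`, so
/// the buffer contains no zeros and COBS can move it in long blocks.
/// This is lossy: 0 and 1 are indistinguishable after decoding.
pub fn avoid_zero(channels: &mut [u8]) {
    for c in channels.iter_mut() {
        *c = (*c).max(1);
    }
}
```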
//!
//! [cobs]: https://en.wikipedia.org/wiki/Consistent_Overhead_Byte_Stuffing
//! [Criterion]: https://docs.rs/criterion/latest/criterion/
//! [honggfuzz]: https://docs.rs/honggfuzz/latest/honggfuzz/
//! [SLIP]: https://en.wikipedia.org/wiki/Serial_Line_Internet_Protocol
// So far, the implementation is performant without the use of `unsafe`. To
// ensure that I think before breaking this property down the road, I'm
// currently configuring the compiler to reject `unsafe`. This is not a promise
// or a religious decision and might get changed in the future; merely scanning
// for the presence of `unsafe` is neither necessary nor sufficient for auditing
// crates you depend on, including this one.
/// The termination byte used by `corncobs`. Yes, it's a bit silly to have this
/// as a constant -- but the implementation is careful to use this named
/// constant whenever it is talking about the termination byte, for clarity.
///
/// The value of this (`0`) is assumed by the implementation and can't easily be
/// changed.
pub const ZERO: u8 = 0;
/// Longest run of unchanged bytes that can be encoded using COBS.
///
/// Changing this will decrease encoding efficiency and break compatibility with
/// other COBS implementations, so, don't do that.
const MAX_RUN: usize = 254;
/// Returns the largest possible encoded size for an input message of `raw_len`
/// bytes, considering overhead.
///
/// This is a `const fn` so that you can use it to size arrays:
///
/// ```
/// const MSG_SIZE: usize = 254;
/// // Worst-case input message: no zeroes to exploit.
/// let msg = [0xFF; MSG_SIZE];
/// // This will still be enough space!
/// let mut encoded = [0; corncobs::max_encoded_len(MSG_SIZE)];
///
/// let len = corncobs::encode_buf(&msg, &mut encoded);
/// assert_eq!(len, encoded.len());
/// ```
/// Encodes the message `bytes` into the buffer `output`. Returns the number of
/// bytes used in `output`, including the terminating zero, which is the first
/// (and only) zero byte in the used part.
///
/// Bytes in `output` after the part that gets used are left unchanged.
///
/// `output` must be large enough to receive the encoded form, which is
/// `max_encoded_len(bytes.len())` worst-case.
///
/// # Panics
///
/// If `output` is too small to contain the encoded form of `bytes`.
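A minimal sketch of the buffer-to-buffer approach, assuming the run-splitting rule documented later in this file (a run ends at a zero, at 254 bytes, or at the end of the input). `encode_into` is a hypothetical name, not the crate's implementation. It moves each run with `copy_from_slice` and relies on slice indexing to panic if `output` is too small.

```rust
const MAX_RUN: usize = 254;

/// Hypothetical sketch of buffer-to-buffer COBS encoding. Returns the number
/// of bytes written to `output`, including the trailing `0x00`. Bytes after
/// that are left untouched. Panics (via slice indexing) if `output` is too small.
pub fn encode_into(mut bytes: &[u8], output: &mut [u8]) -> usize {
    let mut n = 0;
    loop {
        // Find the next run: up to MAX_RUN bytes, stopping at the first zero.
        let scan = &bytes[..bytes.len().min(MAX_RUN)];
        let (run_len, hit_zero) = match scan.iter().position(|&b| b == 0) {
            Some(i) => (i, true),
            None => (scan.len(), false),
        };
        // Code byte: run length + 1, which is never zero.
        output[n] = (run_len + 1) as u8;
        // Move the whole run at once instead of byte by byte.
        output[n + 1..n + 1 + run_len].copy_from_slice(&bytes[..run_len]);
        n += 1 + run_len;

        if hit_zero {
            // Skip the zero; even an empty remainder needs one more block.
            bytes = &bytes[run_len + 1..];
        } else if bytes.len() > MAX_RUN {
            // A full-length run with more data after it.
            bytes = &bytes[MAX_RUN..];
        } else {
            // This run reached the end of the input.
            break;
        }
    }
    output[n] = 0; // frame terminator
    n + 1
}
```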
/// Encodes `bytes` into the vector `output`. This is a convenience for cases
/// where you have `std` available.
/// Encodes a run length (between `0` and `MAX_RUN` inclusive) into a code byte
/// that is never `ZERO`.
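Because a run holds between 0 and 254 bytes, adding one maps it onto 1 through 255, none of which is zero. A sketch of that mapping, assuming the usual COBS convention (`len_to_code` is a hypothetical name):

```rust
/// Hypothetical sketch: maps a run length in `0..=254` to a code byte in
/// `1..=255`, so the code byte can never be mistaken for the terminator.
pub fn len_to_code(len: usize) -> u8 {
    debug_assert!(len <= 254, "run too long for one COBS block");
    (len + 1) as u8
}
```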
/// Encodes `bytes` into COBS form, yielding individual encoded bytes through an
/// iterator.
///
/// This is quite a bit slower than memory-to-memory encoding (e.g.
/// `encode_buf`) because it can't move whole blocks of non-zero bytes at a
/// time -- about 35-40x slower in benchmarks. However, if your throughput is
/// restricted by the speed of a link that gets fed one byte at a time, such as a
/// serial peripheral, this can encode messages with no additional memory.
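A minimal sketch of how byte-at-a-time encoding can work without a scratch buffer: the iterator looks ahead to find the end of each run, yields its code byte, then yields the run's bytes one by one, and finally the terminator. It follows the run-splitting rule documented later in this file; `EncodeIter`, `encode_bytes`, and `next_run` are hypothetical names, not this crate's types.

```rust
const MAX_RUN: usize = 254;

/// Hypothetical incremental encoder over a borrowed message.
pub struct EncodeIter<'a> {
    /// Input not yet split into runs; `None` once every run has been taken.
    rest: Option<&'a [u8]>,
    /// Data bytes of the current run that have not been yielded yet.
    run: &'a [u8],
    /// Set after the terminating zero has been yielded.
    done: bool,
}

/// Hypothetical constructor for the sketch above.
pub fn encode_bytes(bytes: &[u8]) -> EncodeIter<'_> {
    EncodeIter { rest: Some(bytes), run: &[], done: false }
}

/// Splits one run (at most `MAX_RUN` non-zero bytes) off the front of `bytes`.
fn next_run(bytes: &[u8]) -> (&[u8], Option<&[u8]>) {
    let scan = &bytes[..bytes.len().min(MAX_RUN)];
    match scan.iter().position(|&b| b == 0) {
        Some(i) => (&bytes[..i], Some(&bytes[i + 1..])),
        None if bytes.len() > MAX_RUN => (scan, Some(&bytes[MAX_RUN..])),
        None => (bytes, None),
    }
}

impl<'a> Iterator for EncodeIter<'a> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        // 1. Drain the data bytes of the current run.
        if let Some((&b, tail)) = self.run.split_first() {
            self.run = tail;
            return Some(b);
        }
        // 2. Start the next run by yielding its code byte (length + 1).
        if let Some(bytes) = self.rest {
            let (run, rest) = next_run(bytes);
            self.run = run;
            self.rest = rest;
            return Some((run.len() + 1) as u8);
        }
        // 3. Yield the terminator exactly once, then stop.
        if self.done {
            None
        } else {
            self.done = true;
            Some(0)
        }
    }
}
```

Only three small fields of state are kept, which is what makes this shape attractive inside an interrupt handler, at the cost of handling every byte individually.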
/// State for incremental encoding.
/// Takes a run off the front of `bytes`. The run will be between 0 and
/// `MAX_RUN` bytes, inclusive, and will not include any `ZERO` bytes.
///
/// If the run is empty, it means the next byte in `bytes` was `ZERO`.
///
/// Returns `(run, rest)`, where `rest` is...
///
/// - `None`, if this run consumed the entire slice.
/// - `Some(stuff)`, if after this run there is still data to process.
///
/// Note that `stuff` may be empty, if `bytes` ends in a `ZERO`. It is still
/// important to process `stuff` in that case.
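The contract above can be sketched as follows (`split_run` is a hypothetical name, and this is one way to satisfy the description rather than the crate's code). The tests exercise each documented case, including the empty remainder left when the input ends in a zero.

```rust
const MAX_RUN: usize = 254;

/// Hypothetical sketch: takes a run of 0..=MAX_RUN non-zero bytes off the
/// front of `bytes`, returning `(run, rest)` as described above.
pub fn split_run(bytes: &[u8]) -> (&[u8], Option<&[u8]>) {
    // Only look as far as a single run can reach.
    let scan = &bytes[..bytes.len().min(MAX_RUN)];
    match scan.iter().position(|&b| b == 0) {
        // The run ends at a zero: drop the zero, keep what follows (maybe empty).
        Some(i) => (&bytes[..i], Some(&bytes[i + 1..])),
        // A full-length run with more input behind it.
        None if bytes.len() > MAX_RUN => (scan, Some(&bytes[MAX_RUN..])),
        // The run reaches the end of the input.
        None => (bytes, None),
    }
}
```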
/// Decodes `bytes` into a vector.
///
/// This is a convenience for cases where you have `std` available. Its behavior
/// is otherwise identical to `decode_buf`.
/// Decodes input from `bytes` into `output` starting at index 0. Returns the
/// number of bytes used in `output`.
///
/// # Panics
///
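/// If `output` is not long enough to receive the decoded output. Because a
/// decoded frame is never longer than its encoded form, an `output` at least
/// `bytes.len()` long is always enough; `max_encoded_len(bytes.len())` also
/// works but is larger than necessary.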
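A minimal sketch of buffer-to-buffer decoding in the spirit of the design notes at the top of this file: each run is moved with `copy_from_slice` and is not scanned for stray zeros. `decode_into` and `DecodeError` are hypothetical stand-ins, not this crate's API.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    /// The input ended before a `0x00` terminator was found.
    Truncated,
}

/// Hypothetical sketch: decodes one COBS frame from `bytes` (terminator
/// included) into the front of `output`, returning the decoded length.
pub fn decode_into(bytes: &[u8], output: &mut [u8]) -> Result<usize, DecodeError> {
    let mut i = 0; // read position in `bytes`
    let mut n = 0; // write position in `output`
    loop {
        let code = *bytes.get(i).ok_or(DecodeError::Truncated)?;
        if code == 0 {
            return Ok(n); // terminator: frame complete
        }
        let run_len = usize::from(code) - 1;
        let run = bytes.get(i + 1..i + 1 + run_len).ok_or(DecodeError::Truncated)?;
        // Copy the run in one go; its contents are not checked for zeros.
        output[n..n + run_len].copy_from_slice(run);
        n += run_len;
        i += 1 + run_len;
        // A short block implies a zero, unless the frame ends right here.
        if code != 0xFF && bytes.get(i) != Some(&0) {
            output[n] = 0;
            n += 1;
        }
    }
}
```

In this sketch each block writes at most as many bytes as it reads, so an `output` as long as `bytes` is always enough.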
/// Errors that can occur while decoding.
/// Decodes a length-or-terminator byte. If the byte is `ZERO`, returns `None`.
/// Otherwise returns the length of the run encoded by the byte.
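The inverse mapping is small enough to express with `checked_sub`. A sketch (hypothetical name `code_to_len`):

```rust
/// Hypothetical sketch: returns `None` for the `0x00` terminator, otherwise
/// the number of literal bytes (`code - 1`) in the run the byte introduces.
pub fn code_to_len(code: u8) -> Option<usize> {
    // `checked_sub` yields `None` exactly when `code` is zero.
    code.checked_sub(1).map(usize::from)
}
```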
/// Decodes an encoded message, in-place. This is useful when you're short on
/// memory. Since the decoded form of a COBS frame is always shorter than the
/// encoded form, `bytes` is guaranteed to be long enough.
///
/// The decoded message is deposited into `bytes` starting at index 0, and
/// `decode_in_place` returns the number of decoded bytes.
///
/// If you've got memory to spare, `decode_buf` is often somewhat faster --
/// `decode_in_place` takes between 1x and 3x the time in benchmarks. You may
/// also prefer to use `decode_buf` if you can't overwrite the incoming data,
/// for whatever reason.
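A minimal sketch of in-place decoding, assuming the frame (terminator included) sits in a mutable slice. It works because the write position never overtakes the read position: every block writes at most as many bytes as it consumes, so `copy_within` only ever slides data toward the front. `decode_in_place_sketch` is a hypothetical name, and it reports truncation as `None` rather than using the crate's `CobsError`.

```rust
/// Hypothetical sketch: decodes the COBS frame in `frame` into the front of
/// the same slice and returns the decoded length, or `None` if truncated.
pub fn decode_in_place_sketch(frame: &mut [u8]) -> Option<usize> {
    let mut i = 0; // read position
    let mut n = 0; // write position; never gets ahead of `i`
    loop {
        let code = *frame.get(i)?;
        if code == 0 {
            return Some(n); // terminator reached
        }
        let run_len = usize::from(code) - 1;
        let end = i + 1 + run_len;
        if end > frame.len() {
            return None; // the run claims more bytes than we have
        }
        // Slide the run down; source and destination may overlap.
        frame.copy_within(i + 1..end, n);
        n += run_len;
        i = end;
        // A short block implies a zero unless the frame ends right here.
        // `n < i` here, so this never clobbers unread input.
        if code != 0xFF && frame.get(i) != Some(&0) {
            frame[n] = 0;
            n += 1;
        }
    }
}
```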
// Tests for private bits; test fixtures require std, unfortunately, so you have
// to run these explicitly with `cargo test --features std`. Most of the API
// tests are broken out into an integration test.