bilrost 0.1014.2

A compact protobuf-like serializer and deserializer for the Rust Language.
[![continuous integration](https://github.com/mumbleskates/bilrost/actions/workflows/ci.yml/badge.svg?branch=bilrost)](https://github.com/mumbleskates/bilrost/actions/workflows/ci.yml)
[![Documentation](https://docs.rs/bilrost/badge.svg)](https://docs.rs/bilrost/)
[![Crate](https://img.shields.io/crates/v/bilrost.svg)](https://crates.io/crates/bilrost)
[![Dependency Status](https://deps.rs/repo/github/mumbleskates/bilrost/status.svg)](https://deps.rs/repo/github/mumbleskates/bilrost)

# *BILROST!*

Bilrost is an encoding format designed for storing and transmitting structured
data, such as in file formats or network protocols. The encoding is binary, and
unsuitable for reading directly by humans; however, it does have other
useful properties and advantages. This crate, `bilrost`, is its first
implementation and its first instantiation.

Bilrost is designed with the following goals in mind:

* A stable encoding format, simple to specify and relatively easy to implement
  even in other languages
* Durable encoded data, suitable to retain across many versions of the
  application that generated it or to transmit between applications that have
  very different versions[^extensions]
* Good performance, [comparable](#comparisons-to-other-encodings) to what is
  achievable in encodings with similar design
* Canonical encoding and [distinguished decoding](#distinguished-decoding)
* Unintrusive: implementations should be able to efficiently implement encoding
  and decoding on structs that are already in use in the program. At worst they
  should be extremely similar to the structs in use and easy to populate & move
  around the program, rather than forcing users to use structs code-generated by
  a tool that come with cumbersome access & modification APIs[^derive-codegen]

[^derive-codegen]: The `bilrost` Rust library implements encoding and decoding
for message types via `#[derive]` macros. Technically this is generated code,
but it makes use of the existing compiler infrastructure rather than tooling,
the resulting code never needs to be added to version control, and the
definition of the type itself is always unaffected.

[^extensions]: Bilrost's design, like [protobuf][pb]'s, is oriented towards
versioning by introducing new fields to the encoding (and possibly deprecating
old ones) in a way that can still be mutually intelligible by both the old and
new versions of the application.

Non-goals include[^lol]:

* A [self-describing format](#schema-ful-encoding)
* The *most* compact or compressible format[^octet-aligned]
* The *fastest* format[^memcpy-fast]

[^lol]: Also a non-goal, not listed here: a small readme :)

[^octet-aligned]: Bilrost is octet-aligned and does not try to save bytes by
stuffing or commingling data between field keys and their values, a practice
which *can* save space but increases complexity and makes distinguished
decoding harder and more prone to mistakes in implementation.

[^memcpy-fast]: Many of the decisions made in Bilrost in order to achieve stable
representation across versions of an evolving schema, extensibility, and general
simplicity sacrifice opportunities for extreme performance. These are deliberate
tradeoffs that often preclude the ability to perform fast & branchless encoding
similar to what is seen in some other encodings, which are often more similar to
directly copying the memory of a struct than to distinctly encoding the value of
each field. In exchange schemas are simpler to describe and more portable and
the encoded data is more durable.

Bilrost at the encoding level is based upon [Protocol Buffers][pb] (protobuf)
and shares many of its traits, but is incompatible. It is in some ways simpler
and less rigid in its specification, and is designed to improve on some of
protobuf's deficiencies. In doing so it breaks wire-compatibility with protobuf.

Bilrost (as a specification) strives to provide a superset of the capabilities
of protocol buffers while reducing some of the surface area for mistakes and
surprises; `bilrost` (the implementing library) strives to provide access to
all of those capabilities with maximum convenience.

`bilrost` is implemented for the [Rust Language][rs]. It is a direct fork of
[`prost`][p], and shares many of its performance characteristics. (It is not the
fastest possible encoding library, but it is still pretty fast and comes with
unique advantages.) Like `prost`, `bilrost` can enable writing simple, idiomatic
Rust code with `derive` macros that serialize and deserialize structs from
binary data. Unlike `prost`, `bilrost` is free from most of the constraints of
the protobuf ecosystem and required semantics of protobuf message types.
Bilrost (the specification) and this library allow much wider compatibility with
existing struct types and their normal semantics. Rather than relying on
producing generated code from a protobuf `.proto` schema definition, `bilrost`
is designed to be easily used "by hand," as a pure enhancement to types the user
would already have written rather than as a system that railroads the user into
using opinionated and specialized struct types designed only for encoding and
decoding.

🌈

[pb]: https://developers.google.com/protocol-buffers/

[rs]: https://www.rust-lang.org/

[p]: https://github.com/tokio-rs/prost

## Contents

- [Quick start](#getting-started)
    - [Using the derive macros](#deriving-message)
      - [Special attributes](#other-attributes)
    - [Encoding and decoding](#encoding-and-decoding-messages)
      - [Decoding distinguished canonical data](#decoding-in-distinguished-mode)
      - [Borrowed decoding](#borrowed-messages)
      - [Using via trait-objects](#using-dyn-with-message-traits)
      - [Self-referential borrowing with `yoke` for enormous speed + portable
        structs](#recipe-for-making-borrowed-messages-portable)
    - [`no_std` support](#no_std-support)
    - [Changelog](./CHANGELOG.md) ([on github][ghchangelog])
- [Differences from `prost`](#bilrost-vs-prost)
- [Differences from Protobuf](#differences-from-protobuf)
    - [Distinguished representation of data](#distinguished-decoding) and [how
      this is achieved](#distinguished-representation-on-the-wire-in-bilrost)
- [Compared to other encodings, distinguished and not](
  #comparisons-to-other-encodings)
- [Why use Bilrost?](#strengths-aims-and-advantages)
- [Why *not* use Bilrost?](#what-bilrost-and-the-library-wont-do)
- [How does it work?](#conceptual-overview)
    - [How *exactly* does it work?](#encoding-specification)
- [License & copyright](#license)

[ghchangelog]: https://github.com/mumbleskates/bilrost/blob/bilrost/CHANGELOG.md

*This readme is the result of a lot of work, and we want it to be good! If
anything is unclear or could be improved, please feel free to submit issues or
pull requests!*

## Conceptual overview

Bilrost is an encoding scheme for converting in-memory data structs into plain
byte strings and vice versa. It's generally suitable for both network transport
and data retained over the long-term. Its encoded data is not human-readable,
but it is encoded quite simply. It supports integral and floating point numbers,
strings and byte strings, nested messages, and [recursively](
#writing-recursive-messages) nested messages. All of the above are supported as
optional values, repeated values, sets of unique values, and key/value mappings
where sensible. With appropriate choices of encodings (which determine the
representation), most of these constructs can be nested almost arbitrarily.

Encoded Bilrost data does not include the names of its fields; they are instead
assigned numbers agreed upon in advance by the message schema that specifies it.
This can make the data much more compact than "schemaless" encodings like JSON,
CBOR, etc., without sacrificing its extensibility: new fields can be added, and
old fields removed, without necessarily breaking backwards compatibility with
older versions of the encoding program. In the typical "relaxed" decoding mode,
any field not in the message schema is ignored when decoding, so if fields are
added or removed over time the fields that remain in common will still be
mutually intelligible between the two versions of the schema. In this way,
Bilrost is very similar to [protobuf][pb]. See also: [Design philosophy](
#design-philosophy), [Comparisons to other encodings](
#comparisons-to-other-encodings), and the [Encoding specification](
#encoding-specification).
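
As a rough illustration of why numbered fields keep old and new schemas mutually
intelligible, here is a toy model that uses only the standard library. The
`ToyEncoded` map, the `FileV1`/`FileV2` structs, and the tag numbers are all
hypothetical and stand in for the real byte encoding, which is described in the
encoding specification; nothing here is the `bilrost` API.

```rust
use std::collections::BTreeMap;

// Toy stand-in for encoded data: numeric field tag -> value.
// This is NOT Bilrost's byte format; it only models "tags instead of names".
type ToyEncoded = BTreeMap<u32, String>;

// Hypothetical older schema: knows only tag 1.
#[derive(Debug, PartialEq)]
struct FileV1 {
    name: String,
}

// Hypothetical newer schema: adds an optional field at tag 5.
#[derive(Debug, PartialEq)]
struct FileV2 {
    name: String,
    mime_type: Option<String>,
}

fn encode_v1(file: &FileV1) -> ToyEncoded {
    BTreeMap::from([(1, file.name.clone())])
}

fn encode_v2(file: &FileV2) -> ToyEncoded {
    let mut fields = BTreeMap::from([(1, file.name.clone())]);
    if let Some(mime) = &file.mime_type {
        fields.insert(5, mime.clone());
    }
    fields
}

// Relaxed decoding: tags the schema does not know are simply skipped.
fn decode_v1(fields: &ToyEncoded) -> FileV1 {
    FileV1 {
        name: fields.get(&1).cloned().unwrap_or_default(),
    }
}

// A field that is absent from the data decodes as its empty value.
fn decode_v2(fields: &ToyEncoded) -> FileV2 {
    FileV2 {
        name: fields.get(&1).cloned().unwrap_or_default(),
        mime_type: fields.get(&5).cloned(),
    }
}
```

Data written by the newer schema still decodes with `decode_v1` (tag 5 is
ignored), and data written by the older schema decodes with `decode_v2` with
`mime_type` left as `None`.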

Bilrost also has the ability to encode and decode data that is guaranteed to be
canonically represented: see the section on [distinguished decoding](
#distinguished-decoding).

### Design philosophy

Bilrost is designed to be an encoding format that is simple to specify, simple
to implement, simple to port across languages and machines, and easy to use
correctly.

#### Schema-ful encoding

It is designed as a data model that has a schema, though it can of course also
be used to encode representations of "schemaless" data. There are advantages and
disadvantages to this form. The encoded data is significantly smaller, since
repetitive names of fields are replaced with surrogate numbers. At the same
time, it may be less clear what the data means because the inherent
documentation of the fields' names is missing. Schemaless encodings like JSON
can be decoded and accessed dynamically as pure data with far simpler, unified
decoder implementations, whereas encodings like Bilrost and protobuf require a
schema to even be sure of the values.

One argument is that even if fields' names are all specified in the encoding,
they are merely low-information documentation that aids *guessing* or
reverse-engineering. They can help diagnose where *lost* data belongs, or what
*mystery* data means by lightly self-documenting, but the *meaning* of the data
is still determined by the code that emitted it. Data has meaning based on where
it is found, and the documentation of that meaning cannot be fully replaced by
simply including the names of all the fields in the data.

Once that argument is conceded and a project is committed to maintaining schemas
for its encoded data, there are no further distinct disadvantages. Numeric field
tags should not be reused after they are deprecated, but neither should field
names in a schemaless encoding.

Perhaps the biggest caveat is the simultaneous invention problem. If multiple
parties were to implement extensions without communicating with each other they
may choose the same tags, which would cause conflicts in the meaning of those
fields. Sequential numeric tags are *more likely* to be chosen in conflict by
both parties than names would be. The best way to resolve this is to plan ahead
for extensions and encourage potential collaborators to synchronize and choose
allocated tags from some range reserved for extensions, or provide space for
extensions within the schema that have names or UUIDs.

#### Non-coercion of data

Bilrost aims to ensure that when a message is decoded without error, all the
recognized values in its schema will have the exact value they were encoded
with. This means that:

* For boolean fields, 0 represents `false` and 1 represents `true`; if the value
  2 is encountered, this is always an error.
* For numeric fields, out-of-range values are never truncated to fit in a
  smaller numeric type.
* In `bilrost` (this Rust library), floating point values always round trip with
  the *precise* bits of their representation. NaN bits and -0.0 are always
  preserved.
* If a key appears in a mapping multiple times, the whole message is considered
  invalid; likewise for values in sets. There should be no room for alternate
  interpretations of data that keep only the first or last such entry, or that
  discard information about a set with repeated elements.

Bilrost does not enforce these same constraints for unknown field data; if
fields with tags not present in the schema are found in data, it will not be
considered canonical but decoding may succeed. Because those fields are
discarded, they are also not being coerced into different values so the promise
holds.
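
The rules above can be sketched with plain standard-library code. This is only
an illustration of the "reject instead of coerce" policy; `StrictError`,
`strict_bool`, `strict_u8`, and `strict_map` are hypothetical names and are not
part of `bilrost`.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum StrictError {
    InvalidBool(u64),
    OutOfRange(u64),
    DuplicateKey(u32),
}

// Only 0 and 1 are booleans; any other value is an error, never "truthy".
fn strict_bool(raw: u64) -> Result<bool, StrictError> {
    match raw {
        0 => Ok(false),
        1 => Ok(true),
        other => Err(StrictError::InvalidBool(other)),
    }
}

// Narrowing is checked: a value that does not fit is rejected, not truncated.
fn strict_u8(raw: u64) -> Result<u8, StrictError> {
    u8::try_from(raw).map_err(|_| StrictError::OutOfRange(raw))
}

// A repeated key invalidates the whole map rather than keeping the first or
// last entry.
fn strict_map(entries: &[(u32, &str)]) -> Result<BTreeMap<u32, String>, StrictError> {
    let mut map = BTreeMap::new();
    for &(key, value) in entries {
        if map.insert(key, value.to_string()).is_some() {
            return Err(StrictError::DuplicateKey(key));
        }
    }
    Ok(map)
}
```

Compare this with an `as` cast, where `256_u64 as u8` silently becomes `0`:
that kind of coercion is what the format is designed to rule out.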

#### Designed for canonicity

Bilrost is designed to make several classes of non-canonical states
unrepresentable, making detection of non-canonical data far less complex.

The biggest change is that message fields encoded out of order are
unrepresentable; in protobuf this has long been an observed behavior for most
message types, but has never been *promised* for a few reasons that are less
relevant here (and are [discussed below](#differences-from-protobuf)). This
increases the complexity of *encoding* the data *only* when a "oneof" (set of
mutually exclusive fields) has tag numbers that may appear in different places
in the ordering of a message's fields; in practice this is quite rare.

The smaller change is that the [varint representation](
#varints-leb128-bijective-encoding) that makes up the core of the encoding is
designed to guarantee that there can only be a single representation for any
given number. This may be marginally more expensive than traditional
[LEB128][leb128] varints, but not by as much as one might think; rapid decoding
of LEB128 varints is [quite complex][vectorleb128], and the biggest optimization
for most varints is to take a shortcut when the value is small enough to fit in
one byte, the range in which Bilrost's varints encode identically.
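
To make the "single representation" idea concrete, here is a minimal sketch of
a bijective base-128 varint, limited to `u32` for simplicity. The function
names are hypothetical, and this is not the library's implementation; the exact
rules for Bilrost varints (including how the largest 64-bit values are handled)
are given in the encoding specification. The only difference from LEB128 in
this sketch is the `- 1` when carrying to the next byte, which removes
encodings with redundant trailing zero bytes.

```rust
/// Encode `value` so that every number has exactly one byte representation.
fn encode_bijective(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    while value >= 0x80 {
        // Low 7 bits plus the continuation bit.
        out.push((value & 0x7f) as u8 | 0x80);
        // Subtracting one here is what makes the scheme bijective: a
        // following byte of 0x00 now carries meaning instead of being padding.
        value = (value >> 7) - 1;
    }
    out.push(value as u8);
    out
}

/// Decode one varint from the front of `bytes`, returning the value and the
/// number of bytes consumed, or `None` if the input is truncated or too large
/// for a `u32`.
fn decode_bijective(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        // Each whole byte, continuation bit included, is added at its place.
        value += (byte as u64) << (7 * i);
        if byte < 0x80 {
            return u32::try_from(value).ok().map(|v| (v, i + 1));
        }
    }
    None
}
```

In this sketch values below 128 encode as the same single byte they would in
LEB128, while 128 encodes as `[0x80, 0x00]`, a byte sequence that LEB128 would
read as a second, non-canonical spelling of zero.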

[leb128]: https://en.wikipedia.org/wiki/LEB128

[vectorleb128]: https://arxiv.org/pdf/1503.07387.pdf

### Distinguished decoding

In some applications, it's desirable to be able to encode a message in a
guaranteed-canonical form, and to be able to decode that message type while
*distinguishing* between canonical and non-canonical encodings. Bilrost can
provide this, and does so with less complexity and overhead than [many other
encodings](#comparisons-to-other-encodings).

It is possible in `bilrost` to derive extended decoding traits which provide
distinguished and canonical decoding. Decoding in distinguished mode
comes with an additional canonicity check: the decoding result makes it possible
to know whether the decoded message data was canonical. Any message type that
*can* implement distinguished decoding *will* always encode in its fully
canonical form; there is not an alternate encoding mode that is "more
canonical".

Formally, when a message type implements distinguished decoding, values of the
message type are *bijective* to a subset of all byte strings, each of which is
considered to be a canonical encoding for that message value. Each different
possible byte string canonically decodes to a message value that is distinct
from the message values decoded from every other such byte string, or will
produce an error or non-canonical result when decoded in this mode. If a message
is successfully and canonically decoded from a byte string with a distinguished
message decoding trait, is not modified, and is then re-encoded, it will emit
the exact same byte string.

The best proxy of this expectation of an [equivalence relation][equiv] in Rust
is the [`Eq`][eq] trait, which denotes that there is an equivalence relation
between all values of any type that implements it. Therefore, this trait is
required of all field and message types in order to implement distinguished
decoding in `bilrost`.

For this reason, `bilrost` will refuse to derive distinguished decoding if there
are any ignored fields, as they may also participate in the type's equality.

`bilrost` distinguishes between canonical values of the type in a way that
matches the automatically derived implementation of `Eq` (that is, it matches
based on the `Eq` trait of each constituent field). It is ***strongly
recommended,*** but not required, that the equality traits be derived
automatically. `bilrost` does not directly rely on the implementation of the
type's equality at all; rather, it acts as a contractual guardrail, setting a
minimum expectation.

[equiv]: https://en.wikipedia.org/wiki/Equivalence_relation

[eq]: https://doc.rust-lang.org/std/cmp/trait.Eq.html

Normal ("relaxed") decoding may accept other byte strings as valid encodings of
a given value, such as encodings that contain unknown fields or non-canonically
encoded values[^noncanon]. Most of the time, this is what is desired.

[^noncanon]: "Non-canonical" value encodings in Bilrost principally include
fields that are represented in the encoding even though their value is
considered empty. For message types, such as nested messages, it also includes
the message representation containing fields with unknown tags.

To support this "exactly 1:1" expectation for distinguished messages, certain
types are forbidden and not implemented in distinguished mode, even though they
theoretically could be. This primarily includes floating point numbers, which
have incompatible equality semantics. In the Bilrost encoding, floating point
numbers are represented in the [IEEE 754][ieee754] binary format that is
standard on most computers today. This comes with particular rules for equality
semantics that are generally uniform across all languages, and which don't form
an equivalence relation: "NaN" values are never equal to each other or to
themselves.

[ieee754]: https://en.wikipedia.org/wiki/IEEE_754

#### Canonical order and distinguished representation

Bilrost specifies most of what is required to make these message schemas
portable not just across architectures and programs, but to other programming
languages as well. There is currently one minor caveat: The *sort order* of
values in Bilrost may matter.

In distinguished decoding mode, canonical data must always be represented with
*sets* and *maps* having their items in sorted order. When the item type of a
set (or the key type of a map) is not a simple type with an already-standardized
sorting order (such as an integer or string), the canonical order of the items
depends on that type's implementation, and care must be taken to standardize
that order in addition to the schema of the message's fields when defining
distinguished types.
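
A sketch of what "sorted order" means in practice, under the assumption that
the item type's `Ord` implementation defines the canonical order. The helper
`is_canonical_order` and the `Version` type are hypothetical and only
illustrate the check; they are not part of `bilrost`.

```rust
// Strictly ascending order implies both "sorted" and "no duplicates".
fn is_canonical_order<T: Ord>(items: &[T]) -> bool {
    items.windows(2).all(|pair| pair[0] < pair[1])
}

// For a user-defined item type, the derived `Ord` compares fields in
// declaration order. Reordering these fields would change which encodings
// count as canonical, so that order is effectively part of the schema.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u32,
    minor: u32,
}
```

With this definition, `[Version { major: 1, minor: 9 }, Version { major: 2,
minor: 0 }]` is in canonical order; an implementation in another language would
need to compare `major` before `minor` to agree.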

#### Floating point values and distinguished decoding

Equivalence relations are not quite sufficient to describe the desired
properties of a distinguished type in Bilrost, either; not only must the values
*themselves* be considered equivalent, they must also *encode* to the same
bytes. When encoding and decoding floating point values, `bilrost` takes care to
preserve even the distinction between +0.0 and -0.0, which are considered to be
equal to each other in IEEE 754; this [has been a problem][protonegzero] for
other encodings in the past. Even if it is not always necessary, when a value is
encoded in `bilrost`, decoding that value again is guaranteed to produce the
same value with the exact same bits.

[protonegzero]: https://github.com/protocolbuffers/protobuf/issues/7062

For this reason it is not yet considered a good idea to implement distinguished
decoding for third-party wrappers for Rust's floating point types that implement
[`Eq`][eq] and [`Ord`][ord] (such as [`ordered_float`][ordered_float] and
[`decorum`][decorum]) because they still consider some sets of values that have
*different bits* to be equal. Any future implementation of such a type would
have to take special care to unify the encoded representation of any equivalence
classes in these types *and standardize this in a portable way*, which also
de facto induces some data loss when round tripping. It is not guaranteed this
will ever be considered worthwhile or implemented.

[ord]: https://doc.rust-lang.org/std/cmp/trait.Ord.html

[ordered_float]: https://docs.rs/ordered-float/latest/ordered_float/

[decorum]: https://docs.rs/decorum/latest/decorum/

**If it is desirable to have a distinguished encoding for the bit-wise
representations of a floating point value**, it should first be cast to its bits
as an unsigned integer and encoded that way. This reduces the surface area for
mistakes, and makes it clearer that floating point numbers need special handling
in code that cares very much about distinguished representations.
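
A minimal sketch of that cast, using only the standard library's `f64::to_bits` and `f64::from_bits`. The helper names `to_wire` and `from_wire` are hypothetical; the resulting `u64` would be stored in an ordinary integer field of the message.

```rust
/// Hypothetical helper: the integer that would be stored in a `u64` field.
fn to_wire(value: f64) -> u64 {
    value.to_bits()
}

/// Hypothetical helper: recovers the exact float from the stored integer.
fn from_wire(bits: u64) -> f64 {
    f64::from_bits(bits)
}

fn main() {
    // IEEE 754 comparison treats the two zeros as equal...
    assert!(0.0_f64 == -0.0_f64);
    // ...but their bit patterns differ, so as integers they stay distinct.
    assert_ne!(to_wire(0.0), to_wire(-0.0));

    // The cast is lossless: the sign of negative zero survives a round trip.
    let restored = from_wire(to_wire(-0.0));
    assert!(restored.is_sign_negative());
}
```

Because the field is then an integer as far as the encoding is concerned, equality of the stored values and equality of their encoded bytes coincide, which is the property distinguished decoding needs.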

## Using the library

### Getting started

To use `bilrost`, we first add it as a dependency in `Cargo.toml`, either with
`cargo add bilrost` or manually:

```toml
bilrost = "0.1014"
```

Then, we derive `bilrost::Message` for our struct type:

```rust,
use bilrost::{Message, OwnedMessage};

#[derive(Debug, PartialEq, Message)]
struct BucketFile {
    name: String,
    shared: bool,
    storage_key: String,
}

let foo_file = BucketFile {
    name: "foo.txt".to_string(),
    shared: true,
    storage_key: "public/foo.txt".to_string(),
};

// Encoding data is simple.
let encoded = foo_file.encode_to_vec();
// The encoded data is compact, but not very human-readable.
assert_eq!(encoded, b"\x05\x07foo.txt\x04\x01\x05\x0epublic/foo.txt");

// Decoding data is likewise simple!
let decoded = BucketFile::decode(encoded.as_slice()).unwrap();
assert_eq!(foo_file, decoded);
```

Later, more fields can be added to that same struct and it will still decode the
same data.

```rust,
# use bilrost::{Message, OwnedMessage};
#[derive(Debug, Default, PartialEq, Message)]
struct BucketFile {
    #[bilrost(1)]
    name: String,
    #[bilrost(5)]
    mime_type: Option<String>,
    #[bilrost(6)]
    size: Option<u64>,
    #[bilrost(2)]
    shared: bool,
    #[bilrost(3)]
    storage_key: String,
    #[bilrost(4)]
    bucket_name: String,
}

let new_file = BucketFile::decode(
    b"\x05\x07foo.txt\x04\x01\x05\x0epublic/foo.txt".as_slice(),
)
.unwrap();
assert_eq!(
    new_file,
    BucketFile {
        name: "foo.txt".to_string(),
        shared: true,
        storage_key: "public/foo.txt".to_string(),
        ..Default::default()
    }
);
```

#### Crate features

The `bilrost` crate has several optional features:

* "std" (default): provides support for [`HashMap`][hashmap],
  [`HashSet`][hashset], and [`SystemTime`][stdsystemtime].
* "derive" (default): includes the `bilrost-derive` crate and re-exports its
  derive macros. It's unlikely this should ever be disabled if `bilrost` is used
  normally.
* "detailed-errors" (default): the decode error type returned by messages will
  have more information on the path to the exact field in the decoded data that
  encountered an error. With this disabled, errors are more opaque, but may be
  smaller and faster.
* "auto-optimize" (default): makes some automatic choices about some
  performance-related implementation details. The related features can be useful
  controls for profiling and experimentation, and are documented in
  `Cargo.toml`. Most use cases should leave this feature enabled.
* "no-recursion-limit": removes the recursion limit designed to keep data from
  nesting too deeply.
* "extended-diagnostics": with a small added dependency, attempts to provide
  better compile-time diagnostics when derives and derived implementations don't
  work. Somewhat experimental.
* "arrayvec": provides first-party support for [`arrayvec::ArrayVec`][arrayvec]
* "bstr": provides first-party support for [`bstr::BString`][bstr]
* "bytestring": provides first-party support for
  [`bytestring::ByteString`][bytestring]
* "chrono": provides first-party support for the following `chrono` types:
    * [`NaiveDate`][chrononaivedate]
    * [`NaiveTime`][chrononaivetime]
    * [`NaiveDateTime`][chrononaivedatetime]
    * [`Utc`][chronoutc], [`FixedOffset`][chronofixedoffset], and
      [`DateTime<Tz>`][chronodatetime] with either of those timezone
      types
    * [`TimeDelta`][chronotimedelta]
* "hashbrown": provides first-party support for `hashbrown` types
  [`HashMap`][hbmap] and [`HashSet`][hbset]
* "smallvec": provides first-party support for [`smallvec::SmallVec`][smallvec]
* "thin-vec": provides first-party support for [`thin_vec::ThinVec`][thinvec]
* "time": provides first-party support for the following `time` types:
    * [`Date`][timedate]
    * [`Time`][timetime]
    * [`PrimitiveDateTime`][timeprimitivedatetime]
    * [`UtcOffset`][timeutcoffset]
    * [`OffsetDateTime`][timeoffsetdatetime]
    * [`Duration`][timeduration]
* "tinyvec": provides first-party support for `tinyvec` types
  [`ArrayVec`][tinyarrayvec] and [`TinyVec`][tinyvec]

#### `no_std` support

With the "std" feature disabled, `bilrost` has full `no_std` support.
`no_std`-compatible hash-maps are still available if desired by enabling the
"hashbrown" feature.

To enable `no_std` support, disable the `std` features in `bilrost` (and
`bilrost-types`, if it is used):

```toml
[dependencies]
bilrost = { version = "0.1014", default-features = false, features = ["derive"] }
```
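
If hash maps are also wanted in a `no_std` build, the feature list might be extended as follows. This is a sketch that simply combines the "hashbrown" feature named above with the same settings:

```toml
[dependencies]
bilrost = { version = "0.1014", default-features = false, features = ["derive", "hashbrown"] }
```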

### Derive macros

We can now import and use its traits and derive macros. The main three are:

* [`Message`](#deriving-message): This is the basic working unit. Derive this
  for structs to enable encoding and decoding them to and from binary data.
* [`Enumeration`](#enumerations): This is a derive only, not a trait, which
  implements support for encoding an enum type with `bilrost`. The enum must
  have no fields, and each of its variants will correspond to a different `u32`
  value that will represent it in the encoding.
* [`Oneof`](#oneof-fields): This is a trait and derive macro for enumerations
  representing mutually exclusive fields within a message struct. Each variant
  will be represented by one field, and each variant must have a unique field
  tag assigned to it, *both* within the oneof and within the message of which it
  is a part. By default oneof variants may only have exactly one field, which
  will be encoded to represent the oneof when it is present. Variants that have
  zero fields or more than one field can also be represented by encoding the
  variant as a sub-message (see the
  [example section](#variants-with-multiple-fields) and the
  [documentation](#embedding-messages-in-oneofs) about the "message" attribute).

  Types with `Oneof` derived do not have `bilrost` APIs useful to library users
  except when they are included in a `Message` struct (or [have `Message`
  derived themselves](#deriving-message-for-enums)).

And then there are the five traits for the different [message encoding and
decoding](#encoding-and-decoding-messages) capabilities:
* `Message`
* `OwnedMessage`
* `BorrowedMessage`
* `DistinguishedOwnedMessage`
* `DistinguishedBorrowedMessage`

#### Deriving `Message`

The `Message` trait can be derived to allow encoding just about any struct as a
Bilrost message, as long as its fields' types are supported.

If not otherwise specified, fields are tagged sequentially in the order they
are declared in the struct. By default, structs with named fields have their
fields tagged starting with `1`, and tuple structs with anonymous fields have
their fields numbered starting with `0` (matching their Rust index-names).

Tags can also be explicitly specified. If a field's tag is the only attribute
provided, the number of the tag can be provided with no ceremony as the only
content of the "bilrost" attribute, like `#[bilrost(1)]`. If other attributes
are included, the "tag" attribute must be specified by name; for example, like
`#[bilrost(tag(1), encoding(fixed))]`. The "tag" attribute can also be spelled
`tag = 1` or `tag = "1"`.

Tags that have been reserved, or gaps between sequentially occurring tag values,
can be skipped by specifying the tag number to skip to with the `tag` attribute
on the first field after the gap. The fields that follow will be tagged
sequentially starting from the next number.

When defining message types for interoperation -- or when fields are likely to
be added, removed, or shuffled -- it may be good practice to explicitly specify
the tags of all fields in a struct instead, but this is not mandatory.
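
The numbering rules above can be modelled in a few lines. This is only a sketch of the rule as described in this section, not the derive macro's actual code; `assign_tags` is a hypothetical helper that takes each field's explicit tag (if any) and the first implicit tag (`1` for named fields, `0` for tuple structs).

```rust
/// Hypothetical model of the tagging rule: a field with an explicit tag uses
/// it, a field without one takes the tag after the previous field's tag.
fn assign_tags(explicit: &[Option<u32>], first_tag: u32) -> Vec<u32> {
    let mut next = first_tag;
    explicit
        .iter()
        .map(|tag| {
            let assigned = tag.unwrap_or(next);
            next = assigned + 1;
            assigned
        })
        .collect()
}

fn main() {
    // Named fields with no attributes at all: 1, 2, 3.
    assert_eq!(assign_tags(&[None, None, None], 1), vec![1, 2, 3]);

    // Tuple struct with no attributes: 0, 1.
    assert_eq!(assign_tags(&[None, None], 0), vec![0, 1]);

    // The explicit tags used by the `Person` example in this section.
    let person = [
        Some(1),  // id
        Some(6),  // given_name
        None,     // family_name
        None,     // formatted_name
        Some(3),  // age
        None,     // height
        None,     // gender
        Some(16), // name_prefix
        None,     // name_suffix
        None,     // maiden_name
    ];
    assert_eq!(
        assign_tags(&person, 1),
        vec![1, 6, 7, 8, 3, 4, 5, 16, 17, 18]
    );
}
```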

<details><summary>Example of a struct with a derived `Message` impl</summary>

```rust,
use bilrost::{Enumeration, Message};

#[derive(Clone, PartialEq, Message)]
struct Person {
    #[bilrost(tag = 1)]
    pub id: String, //             has tag 1
    // NOTE: Old "name" field has been removed
    // pub name: String,
    #[bilrost(6)]
    pub given_name: String, //     has tag 6
    pub family_name: String,    // has tag 7
    pub formatted_name: String, // has tag 8
    #[bilrost(tag = "3")]
    pub age: u32, //               has tag 3
    pub height: u32,            // has tag 4
    #[bilrost(enumeration(Gender))]
    pub gender: u32, //            has tag 5
    // NOTE: Skip to less commonly occurring fields
    #[bilrost(tag(16))]
    pub name_prefix: String, //    has tag 16  (eg. mr/mrs/ms)
    pub name_suffix: String, //    has tag 17  (eg. jr/esq)
    pub maiden_name: String, //    has tag 18
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Enumeration)]
#[non_exhaustive]
pub enum Gender {
    Unknown = 0,
    Female = 1,
    Male = 2,
    Nonbinary = 3,
}
```

</details>

#### Oneof fields

Bilrost messages can have sets of mutually exclusive fields, only one of which
may be present at a time. These are represented by `enum` types where each
variant has one field and is assigned a field tag; the `Oneof` derive macro can
then be used to derive an implementation that allows the oneof to be included in
a message.

<details><summary>Example message with a oneof</summary>

```rust
use bilrost::{Message, Oneof};

#[derive(Oneof)]
enum NameOrUUID {
    #[bilrost(2)]
    Name(String),
    #[bilrost(tag(3), encoding(plainbytes))]
    UUID([u8; 16]),
}

#[derive(Message)]
struct Widget {
    #[bilrost(1)]
    id: u32,
    #[bilrost(oneof(2, 3))]
    label: Option<NameOrUUID>,
    #[bilrost(4)]
    description: String,
}
```

</details>

When the oneof is included in a message, it has to be declared with the "oneof"
attribute, providing a comma-separated list of all its field tags. (This
attribute can also be spelled like `oneof = "2, 3"`.)[^tagranges] It isn't
possible for the derive macro to know what those tag numbers are when it runs
because it can't have access to the definitions of the field's type, but the
list of tags declared in this attribute and the list of tags that the oneof
actually has are statically checked for equality at compile time.

<details><summary>Example of a oneof with non-matching tags</summary>

```rust,compile_fail
use bilrost::{Message, Oneof};

#[derive(Oneof)]
enum Abc {
    #[bilrost(1)]
    A(String),
    #[bilrost(2)]
    B(i64),
    #[bilrost(3)]
    C(bool),
}

#[derive(Default, Message)]
struct TagsDontMatch {
    #[bilrost(oneof(1, 2))] // These tags don't match the oneof!
    label: Option<Abc>,
}

// In older versions of rust, the build may not fail until the message trait is
// actually used somewhere.
let _ = TagsDontMatch::default().encoded_len();
```

</details>

[^tagranges]: The way the full list of tags is specified within the `oneof`
attribute and the `reserved_tags` attribute is the same: the whole list is comma
separated, and each item may be either a single tag number or an inclusive range
from minimum to maximum separated with a dash (like `1-5`). For both
`reserved_tags` and `oneof`, the following are all exactly equivalent:
`1, 2, 3, 4, 5`; `1-5`; `4, 5, 1-3`. It's also possible to specify open-ended
ranges, spelled like `10..` and `..=10`.

The field tags in the oneof must be unique, both within the oneof itself and
within any message containing it. On the wire, a oneof works as if there were an
`Option<T>` field for each of its variants, except at most one of them can be
`Some`.

In the example above, the `NameOrUUID` oneof must be nested in an `Option` to
enable it to represent the empty state where none of its fields are present. It
is also possible to include *up to one* unit variant in a oneof enum. Any such
variant will be used to represent its empty state.

<details><summary>Example of a oneof with an "empty" variant</summary>

```rust
use bilrost::{Message, Oneof};

#[derive(Oneof)]
enum NameOrUUID {
    #[bilrost(2)]
    Name(String),
    #[bilrost(tag(3), encoding(plainbytes))]
    UUID {
        octets: [u8; 16],
    },
    Neither,
}

#[derive(Message)]
struct Widget {
    #[bilrost(1)]
    id: u32,
    #[bilrost(oneof(2, 3))]
    label: NameOrUUID,
    #[bilrost(4)]
    description: String,
}
```

</details>

When a oneof enum type has the empty variant, it can only be included in a
message directly; when it has none, it can only be included when it's nested
within an `Option` so that `None` stands for the empty state.

#### Repeated values in Oneof fields

Oneof variants must contain values that encode as a single field on the wire.
This means that collection types like `Vec`, `HashSet`, arrays, etc. must always
be represented in a packed encoding, rather than as the same field repeated for
each value in the collection ("unpacked", which is the default representation in
messages).

This is the same requirement that is needed to make these types re-nest in any
other collection or `Option`; see the notes and table in the section on
[encodings for container types](#containers).

#### Boxing Oneof fields

It's possible to store oneof enums out-of-line from your struct by indirecting
them with `Box`, which is transparent to all oneof traits:

```rust
use bilrost::{Blob, Message, Oneof};

#[derive(Oneof)]
enum Big {
    #[bilrost(tag(1), encoding(packed))]
    Six([String; 6]),
    #[bilrost(tag(2), encoding(packed))]
    HalfADozen([Blob; 6]),
}

#[derive(Message)]
struct Tiny {
    #[bilrost(oneof(1, 2))]
    big_fields: Option<Box<Big>>,
    #[bilrost(3)]
    small_field: i16,
}
```

#### Variants with multiple fields

Using the "message" attribute, enum variants work as if they held a message
struct that looked like the variant.

<details><summary>Example of a oneof derive using embedded messages</summary>

```rust
use bilrost::{Enumeration, Message, Oneof};

#[derive(PartialEq, Eq, Enumeration)]
enum PhoneKind {
    Home = 1,
    Work = 2,
    Cell = 3,
}

#[derive(Oneof)]
enum RolodexInfo {
    #[bilrost(2)]
    Nickname(String),
    #[bilrost(tag(3), message)]
    Address {
        street: Option<String>,
        apt_etc: Option<u64>,
        city: Option<String>,
        state_province: Option<String>,
        postcode: Option<u32>,
    },
    #[bilrost(tag(4), message)]
    Phone(u64, Option<PhoneKind>),
    #[bilrost(tag(5), message)]
    Favorite,
    #[bilrost(empty)]
    Empty,
}

#[derive(Message)]
struct RolodexEntry {
    #[bilrost(1)]
    name: String,
    #[bilrost(oneof(2-5))]
    info: RolodexInfo,
}
```
</details>

Fundamentally, this encodes and decodes data exactly the same as the following
example except with fewer types, allowing the data to be represented
directly in the oneof `enum` if that's desirable.

<details><summary>Example showing an equivalent oneof to the above example, this
time without any embedded messages</summary>

```rust
use bilrost::{Enumeration, Message, Oneof};

#[derive(PartialEq, Eq, Enumeration)]
enum PhoneKind {
    Home = 1,
    Work = 2,
    Cell = 3,
}

#[derive(Message)]
struct Address {
    street: Option<String>,
    apt_etc: Option<u64>,
    city: Option<String>,
    state_province: Option<String>,
    postcode: Option<u32>,
}

// This message encodes the same as the regular tuple `(u64, Option<PhoneKind>)`
// but we have the opportunity to annotate the fields with more attributes,
// changing their tags and encoding etc.
#[derive(Message)]
struct Phone(u64, Option<PhoneKind>);

// This message encodes the same as the empty message type `()`.
#[derive(Message)]
struct Favorite;

#[derive(Oneof)]
enum RolodexInfo {
    #[bilrost(2)]
    Nickname(String),
    #[bilrost(3)]
    Address(Address),
    #[bilrost(4)]
    Phone(Phone),
    #[bilrost(5)]
    Favorite(Favorite),
    #[bilrost(empty)]
    Empty,
}

#[derive(Message)]
struct RolodexEntry {
    #[bilrost(1)]
    name: String,
    #[bilrost(oneof(2-5))]
    info: RolodexInfo,
}
```
</details>

#### Deriving `Message` for enums

`Message` can also be derived for enums that have a corresponding oneof
implementation derived. They encode and decode as messages that only have up to
one field, as if the type was a message that only contains the enum with an
appropriate `#[bilrost(oneof(..))]` attribute.

<details><summary>Example of `Message` derived for a `Oneof` enum</summary>

```rust
use bilrost::{Message, Oneof};

#[derive(Oneof, Message)]
enum Maybe {
    Nope,
    #[bilrost(1)]
    Yes(String),
    #[bilrost(2)]
    Very(String),
}

/// This struct encodes exactly the same as Maybe does with its own `Message`
/// impl; deriving `Message` on the enum just saves some work.
#[derive(Message)]
struct WrappedMaybe {
    #[bilrost(oneof(1, 2))]
    maybe: Maybe,
}
```

</details>

`Message` can only be implemented for oneof types that have "empty" variants.

<details><summary>Examples for using non-empty oneof enums as messages</summary>

```rust,compile_fail
use bilrost::{Message, Oneof};

#[derive(Oneof, Message)]
//              ^^^^^^^ Error: Message can only be derived for Oneof enums
//                             that have an empty variant.
enum AB {
    #[bilrost(1)]
    A(bool),
    #[bilrost(2)]
    B(bool),
}
```

It is still possible to use such an enum as a message type by wrapping it.

```rust
use bilrost::{Message, Oneof};

#[derive(Oneof)]
enum AB {
    #[bilrost(1)]
    A(bool),
    #[bilrost(2)]
    B(bool),
}

#[derive(Message)]
struct WrappedAB(#[bilrost(oneof(1, 2))] Option<AB>);
```

</details>

Note: Do exercise caution with this! While this is very convenient for encoding
types that are fully represented as an enum with one field per variant this way,
deriving both `Oneof` and `Message` makes it easy to accidentally include the
oneof as a sub-message field rather than as an "embedded" oneof that represents
a set of fields in the message that shouldn't coexist.

#### Encodings

`bilrost` message fields and oneof variants can be annotated with an "encoding"
attribute that specifies which encoding type is used when encoding and decoding
that field's value. `bilrost` provides several standard encodings which can be
used and composed to choose how the field is represented.

```rust,
# use bilrost::Message;
#[derive(Message)]
struct Foo {
    #[bilrost(encoding(general))]
    name: String,
}
```

Encoding attributes can be specified two ways, either in the form shown above or
as a string, like `#[bilrost(encoding = "general")]`. The value of this
attribute specifies a type name, using normal Rust type syntax. The standard
encodings are also available and can be addressed explicitly; there is no
practical reason to do this, but as a demonstration:

```rust,
# use bilrost::Message;
#[derive(Message)]
struct Bar(
    // This is the same type as "general"
    #[bilrost(encoding = "::bilrost::encoding::General")] String,
);

assert_eq!(
    Bar("bar".to_string()).encode_to_vec(),
    b"\x01\x03bar".as_slice()
);
```

Where these encodings' type names are evaluated the standard encodings are made
available as aliases, all-lower-cased to ensure that these aliases are unlikely
to collide with other type names that are in scope. These standard aliases are:

* `general`: the default encoding in messages, suitable for most field types.
  Delegates encoding of common collection types (vecs and sets) to `unpacked`
  and common mapping types to `map`.
* `general_packed`: the default encoding for `oneof` variant values and in the
  nested values of fields that are already repeated collections. Identical to
  `general`, except that the common collection types use the `packed` encoding
  instead of `unpacked`.
* `varint`: primitive numeric types and bool, encodes as varint.
* `fixed`: fixed-width four- and eight-byte values for integers, floats, and
  byte arrays.
* `plainbytes`: encodes byte arrays, `Vec<u8>`, and `Cow<[u8]>` as
  length-delimited values. Delegates encoding of `Vec<Vec<u8>>`
  and `Vec<Cow<[u8]>>` to `unpacked<plainbytes>`.
* `unpacked` (`unpacked<E = general_packed>`): encodes collections with their
  values unpacked as zero or more normally encoded fields, one per value. The
  fields are encoded with the parametrized encoding `E`, which defaults to
  `general_packed`. Unpacked representations may encode less efficiently when
  there are more than one or two values, but the representation is also directly
  compatible with that of `Option` values and non-repeated value fields.
* `packed` (`packed<E = general_packed>`): encodes collections with their values
  packed into a single length-delimited value. The values are encoded with the
  parametrized encoding `E`, which defaults to `general_packed`. Packed field
  values usually encode more efficiently when there are more than one or two
  values, and they can represent the difference between an *empty* collection
  and a collection that isn't present at all. However, there are few to no
  options for compatibility between a packed repeated field and any other
  representation: the schema of the field needs to be fully understood for it to
  be read correctly.
* `map<KE = general_packed, VE = general_packed>`: encodes mappings with their
  keys (encoded with parametrized encoding `KE`) and values (encoded with `VE`)
  packed alternating into a single length-delimited value.

It's possible that more standard encodings may be added in the future, but they
will be similarly lower-cased.

#### Other attributes

There are a few other attributes available inside the "bilrost" attribute:

##### Distinguished mode

* **"distinguished"**: When placed on a message or oneof, this [enables
  distinguished decoding](#deriving-distinguished-decoding).

##### Reserving tags

* **"reserved_tags"**: When placed on the message itself, this declares that the
  given tags and tag ranges[^tagranges] are not used by any field. This has no
  effect other than as a compile-time guard; if a field uses a tag that was
  declared to be reserved, compilation will fail.

```rust,compile_fail
# use bilrost::Message;
#[derive(Message)]
#[bilrost(reserved_tags(2, 6-10, 25))]
struct Foo {
    #[bilrost(tag(5), encoding(general))]
    name: String,
    age: i64, // Oops! Error: "message Foo field age has reserved tag 6"
}
```
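
The tag list in the example above mixes single tags with a range. To make it concrete which tags such a list covers, here is a standard-library sketch; `expand_tags` is a hypothetical helper, not part of `bilrost`, and it handles only single tags and closed `min-max` ranges (open-ended ranges are deliberately left out).

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: expands a tag list such as "4, 5, 1-3" into the set
/// of tags it names. Returns `None` for anything other than single tags and
/// closed `min-max` ranges.
fn expand_tags(list: &str) -> Option<BTreeSet<u32>> {
    let mut tags = BTreeSet::new();
    for item in list.split(',') {
        let item = item.trim();
        match item.split_once('-') {
            Some((min, max)) => {
                let min: u32 = min.trim().parse().ok()?;
                let max: u32 = max.trim().parse().ok()?;
                tags.extend(min..=max);
            }
            None => {
                tags.insert(item.parse().ok()?);
            }
        }
    }
    Some(tags)
}

fn main() {
    // The three spellings from the footnote name the same set of tags.
    assert_eq!(expand_tags("1, 2, 3, 4, 5"), expand_tags("1-5"));
    assert_eq!(expand_tags("1-5"), expand_tags("4, 5, 1-3"));

    // In the `Foo` example, `age` follows tag 5 and so lands on tag 6,
    // which falls inside the reserved range `6-10`.
    let reserved = expand_tags("2, 6-10, 25").unwrap();
    assert!(reserved.contains(&6));
    assert!(!reserved.contains(&5));
}
```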

##### Ignoring fields

* **"ignore"**: Must be alone, with no tag or other attribute. This causes the
  field to be ignored by the generated message implementation. If any fields in
  a message are ignored, it must implement `Default` to implement `Message` so
  there will be a value for those fields to take on when the message is created
  from scratch.

  Ignored fields are not currently considered compatible with distinguished
  decoding.

* **"default_per_field"**: If a message has any ignored fields, adding this
  attribute to the message itself removes the requirement that the *whole
  message* needs to implement `Default`; instead, only the types of each ignored
  field need to do so.

##### Marking a oneof variant as explicitly the empty variant

* **"empty"**: While a oneof unit variant will become the empty-state variant of
  the oneof by default, it can also be explicitly marked. This attribute cannot
  be mixed with any other attributes.

##### Embedding messages in oneofs

* **"message"**: When used only with a `tag`, any kind of `enum` variant can be
  represented. Rather than being encoded as if it were a single field bearing
  the inner value, the variant value will be encoded and decoded exactly as if
  it were a message with its own fields.

  Most attributes that apply to fields in a `Message` derive will work on fields
  in a `message` variant. Helper methods (described below) are not available as
  enum variants cannot have their own methods, and ignored fields are always
  initialized per-field. A unit variant (one with no braces or parentheses, and
  thus no fields) can also be encoded as a message, and will always encode and
  decode the same as the `()` empty message type. Such variants have no fields
  and can always be widened by adding some in the future.

##### Helper methods

* **"enumeration"**: If a field is of type `u32` or `Option<u32>`, this causes
  the message type to have helper methods named after the type that get and set
  its value as the enumeration type specified by this attribute.

##### Writing recursive messages

* **"recurses"**: It is possible to nest messages recursively in `bilrost`. When
  a message type contains itself, however, none of the `Message` traits can be
  implemented by default, because there is an unresolvable circular dependency
  of the message type on its own traits:

```rust,compile_fail
# use bilrost::Message;
#[derive(Message)]
//       ^^^^^^^ overflow evaluating the requirement `Tree: ValueEncoder<General>`
struct Tree {
    name: String,
    children: Vec<Tree>,
}
```

Somewhere along the line, we have to break this circular chain of dependencies.
To do that, annotate one of the fields in the chain with the "recurses"
attribute and its type will no longer participate in the `where` clause of the
message implementations, the cycle will be broken, and the message can be used:

```rust,
# use bilrost::Message;
#[derive(Message)]
struct Tree {
    name: String,
    #[bilrost(recurses)]
    children: Vec<Tree>,
}
```

##### Borrowed-only decoding

* **"borrowed_only"**: [disables](#disabling-owned-decoding-traits) derivation
  of owned decoding implementations.

### Deriving distinguished decoding

Deriving distinguished decoding traits for messages and oneofs in addition to
the relaxed decoding traits is very simple: just add a
`#[bilrost(distinguished)]` attribute to the type.

This functionality is not provided by default, as certain common types (like
floating point numbers and hash maps) are not supported in distinguished
decoding.

For canonical encoding guarantees, `bilrost` requires that `Eq` be implemented
for each field, oneof, and message type; the trait is not used directly, but is
trivial to derive for any compatible type.

```rust,
use bilrost::{Message, Oneof};
use std::borrow::Cow;

#[derive(Debug, PartialEq, Eq, Message)]
#[bilrost(distinguished)] // <---- Add this attribute to the type!
struct DistinguishedFoo<'a> {
    #[bilrost(1)]
    bar: i64,
    #[bilrost(2)]
    baz: Cow<'a, str>,
    #[bilrost(oneof(3, 4))]
    designation: Designation<'a>,
}

#[derive(Debug, PartialEq, Eq, Oneof)]
#[bilrost(distinguished)] // <---- Add this attribute to the type!
enum Designation<'a> {
    None,
    #[bilrost(3)]
    Name(Cow<'a, str>),
    #[bilrost(4)]
    Id(u64),
}

let original = DistinguishedFoo {
    bar: 100020003,
    baz: "bear".into(),
    designation: Designation::Id(555),
};

let buf = original.encode_to_vec();
let encoded = buf.as_slice();

// All four decoding mode traits:
use bilrost::{
    BorrowedMessage, DistinguishedBorrowedMessage,
    DistinguishedOwnedMessage, OwnedMessage,
};

// This type now supports every kind of decoding:
assert_eq!(DistinguishedFoo::decode(encoded).as_ref(), Ok(&original));
assert_eq!(
    DistinguishedFoo::decode_canonical(encoded).as_ref(),
    Ok(&original),
);
assert_eq!(
    DistinguishedFoo::decode_borrowed(encoded).as_ref(),
    Ok(&original),
);
assert_eq!(
    DistinguishedFoo::decode_canonical_borrowed(encoded).as_ref(),
    Ok(&original),
);
```

Distinguished decoding traits can be added to any type that *does* or *might*
be supported, and they will be available when possible; for example, when a type
has generic fields. If a struct has fields that cannot ever be distinguished,
trying to implement the traits is currently an error (as implementing a trait
with bounds that are never satisfiable is not allowed today in Rust).

### Borrowed messages

It may be desirable to have messages that don't copy all the data they decode.
For data that takes the form `str`, `[u8]`, and `[u8; N]`, `bilrost` can skip
that part. These values can be represented in a message struct as references
that will refer to the original, uncopied data in the slice that was decoded.

When a message or oneof has one of these reference fields, it can no longer
decode owned data from any buffer and won't implement the "owned" message
decoding traits.

```rust,
# use bilrost::{Message, OwnedMessage};
#[derive(Message)]
struct Borrowed<'a> {
    val: &'a str,
    uuid: &'a [u8; 16],
}

static_assertions::assert_not_impl_any!(Borrowed: OwnedMessage);
```

#### `Cow<T>` and messages that can optionally borrow or own

It's also possible to have fields that *optionally* borrow zero-copied data when
decoding, by using [`Cow`][cow]. Borrowed decoding will (promises to) always
produce `Cow::Borrowed` values, and "regular" decoding will always (can only!)
produce `Cow::Owned`:

```rust,
use bilrost::{BorrowedMessage, Message, OwnedMessage};
use std::borrow::Cow;

#[derive(Debug, PartialEq, Message)]
struct Dm<'a> {
    message: Cow<'a, str>,
}

let original = Dm {
    message: "almost done with my chicken".into(),
};
let buf = original.encode_to_vec();
let encoded = buf.as_slice();

let owned = Dm::decode(encoded).unwrap();
assert_eq!(owned, original);
assert!(matches!(owned.message, Cow::Owned(..)));

let borrowed = Dm::decode_borrowed(encoded).unwrap();
assert_eq!(borrowed, original);
assert!(matches!(borrowed.message, Cow::Borrowed(..)));
```

#### Recipe for making borrowed messages portable

Decoding message data into a struct that borrows from its input can be extremely
fast, especially when the input would have otherwise been copied to lots of
allocations -- often more than half of the cost of decoding is allocating
strings. Unfortunately it also means that getting the borrow checker to let you
keep the struct alive can be a struggle.

For many use cases, the [`yoke`][yoke] crate can be a huge help here. It allows
you to pair the borrowed struct with anything that keeps the data it borrows
alive, whether that's a `Vec<u8>` or (to enable cloning the resulting `Yoke`) an
`Rc<[u8]>`, `Arc`, `Arc<Vec<u8>>`, or similar.

[yoke]: https://docs.rs/yoke/latest/yoke/

Here's a basic example:

```rust,
use bilrost::{BorrowedMessage, Message};
use yoke::{Yoke, Yokeable};

#[derive(Debug, PartialEq, Message, Yokeable)]
struct OxenFree<'a> {
    n: i32,
    s: &'a str,
}

let buf = b"\x04\xf6\x00\x05\x10Hello from yoke!".to_vec();
let yoke_result = Yoke::<OxenFree, _>::try_attach_to_cart(buf, |b| {
    OxenFree::decode_borrowed(b)
})
.unwrap();

assert_eq!(
    yoke_result.get(),
    &OxenFree {
        n: 123,
        s: "Hello from yoke!",
    }
);

// `yoke_result` is now a value that is not bound by a lifetime!
```

It's not possible to do *everything* with a yoked value that you could do with a
regular struct value (destructuring it doesn't work, since you can typically
only get the struct by reference), but this still solves many, many problems.

#### Disabling owned decoding traits

Normally, deriving all the message traits works even when the owned traits can
never be available, thanks to a trick of the light (the generic lifetime on the
type). However, if the message has no generic lifetimes, deriving owned message
decoding can be an error!

```rust,compile_fail
# use bilrost::{BorrowedMessage, Message};
# use std::collections::BTreeMap;
const STATIC_LUTS: &[u8] = &[/* pretend this is include_bytes!'d */];

#[derive(Message)]
//       ^^^^^^^ error: the trait `ValueDecoder<General>` is not implemented
//                      for `BTreeMap<&'static str, &'static str>`
struct LookupTables {
    alpha2: BTreeMap<&'static str, &'static str>,
    alpha3: BTreeMap<&'static str, &'static str>,
}

let luts = LookupTables::decode_borrowed(STATIC_LUTS).unwrap();
```

If this is a problem, deriving owned decoders can be disabled in the derive
macro via the `#[bilrost(borrowed_only)]` attribute:

```rust,
# use bilrost::{Message};
# use std::collections::BTreeMap;
#[derive(Message)]
#[bilrost(borrowed_only)]
struct LookupTables {
    alpha2: BTreeMap<&'static str, &'static str>,
    alpha3: BTreeMap<&'static str, &'static str>,
}
```

### Encoding and decoding messages

There are a variety of methods and associated functions available for encoding
and decoding data in `Message` implementations.

The most straightforward ways to encode and decode a message are
`Message::encode_fast`, `Message::encode_to_vec` and `OwnedMessage::decode`.
Methods are available for encoding and decoding messages to and from several
types and traits, both with and without prefixed length delimiters. (Length
delimiters for encoded messages always take the form of a normal Bilrost varint
which prefixes the message's data.)

Trait `Message`: encoding (implemented by every message)
* `encode_fast`, `encode_length_delimited_fast`: encodes the message into a
  `ReverseBuffer` and returns it. See the section on [that type](#reversebuffer)
  for more information. The `..length_delimited..` variant likewise encodes the
  message then also prefixes the encoded data with its length, such that it's
  appropriate to be decoded with the corresponding "length_delimited" decoding
  function.
* `encode_to_vec`, `encode_to_bytes`, and `..length_delimited..` variants:
  encodes the message into a new vec or bytes and returns that container. This
  is not always as efficient as `encode_fast`, but always produces an encoding
  that is contiguous in memory.
* `encode_contiguous` and `encode_length_delimited_contiguous` work exactly the
  same as `encode_fast`, but pre-measure first and reserve the exact size needed
  to store the finished encoding. This guarantees that the resulting buffer will
  be contiguous even if its size is not known ahead of time, and allows direct
  conversion from the resulting `ReverseBuffer` into a `Vec` (see
  `ReverseBuffer::into_vec`).
* `encode`, `encode_length_delimited`: encodes the message into a
  `&mut bytes::BufMut`, appending it after any data that is already there.
* `prepend`: encodes the message into a `&mut bilrost::buf::ReverseBuf`,
  *before* any data that is already there.

Trait `OwnedMessage`: decoding a fully owned message value from any
[`bytes::Buf`][buf]
* `decode`, `decode_length_delimited`: decodes the message type from a
  `bytes::Buf`. The length-delimited version of the call will consume only as
  many bytes as the length delimiter (read from the front of the `Buf`)
  indicates, while the plain version of the method will attempt to decode the
  entire contents.
* `replace_from`, `replace_from_length_delimited`: like `decode`, but rather
  than returning a `Result` with a new instance of the message, these are
  mutating methods that replace the value in an existing instance. If decoding
  fails, the message will be left with its fields [empty](#empty-values).
* There are also `encode_dyn`, `replace_from_slice`, and `replace_from_dyn`
  methods for encoding and decoding that do not provide anything the above
  methods do not, but are callable from a trait object.

Trait `BorrowedMessage<'a>`: decoding by borrowing data from a `&'a [u8]` slice
* `decode_borrowed`, `decode_borrowed_length_delimited`: decodes the message
  type from a byte slice. The length-delimited version of the call accepts a
  `&mut &'a [u8]` and after returning will have consumed the bytes that
  encoded the message from the front of the slice, leaving only the left-over
  data (if any); the versions that are not length-delimited consume the entire
  slice by value.
* `replace_borrowed_from`, `replace_borrowed_from_length_delimited`: exactly
  what you would expect based on `replace_from` and `decode_borrowed` -- this
  replaces the value in-place as a mutating method, and is dyn-compatible.

#### Decoding in distinguished mode

The `DistinguishedOwnedMessage` and `DistinguishedBorrowedMessage` traits have
corresponding methods for decoding and replacing in three related modes:

* "distinguished" mode: Decoding succeeds whenever the encoding is valid even if
  it is not canonical, and extra information is returned indicating whether any
  known fields had non-canonical representations or any unknown fields were
  present.
* "canonical" mode: Decoding fails and returns an error immediately whenever the
  encoding is not completely canonical. Returned values are guaranteed to be
  fully canonical on success.
* "restricted" mode: given encoded data and a minimum canonicity to restrict
  decoding to, decoding will stop and return an error immediately if something
  in the encoding is less canonical than the specified restriction. On success,
  extra information is returned indicating the canonicity just as in
  "distinguished" mode. The possible restriction levels are the variants of
  `Canonicity`:
  * `NotCanonical`: Exactly the same as "distinguished" mode
  * `HasExtensions`: Only fails if known fields are found to be encoded
    non-canonically
  * `Canonical`: Fails if any unknown fields are present or any fields are
    encoded non-canonically; the returned canonicity data will always be
    `Canonical`
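
One way to picture how these modes relate is as a single check against a
minimum level. The sketch below is a stand-alone model rather than `bilrost`'s
implementation: `Level`, `restrict`, `distinguished`, and `canonical` are
hypothetical names that mirror the `Canonicity` variants listed above, and the
check is applied after the fact instead of stopping decoding early.

```rust
/// Hypothetical stand-in for `Canonicity`, ordered from least to most canonical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    NotCanonical,
    HasExtensions,
    Canonical,
}

/// Models "restricted" mode: fail if what was found in the encoding is less
/// canonical than `minimum`, otherwise report what was found.
fn restrict(found: Level, minimum: Level) -> Result<Level, &'static str> {
    if found < minimum {
        Err("encoding is less canonical than the requested restriction")
    } else {
        Ok(found)
    }
}

/// Models "distinguished" mode: never rejects on canonicity grounds.
fn distinguished(found: Level) -> Result<Level, &'static str> {
    restrict(found, Level::NotCanonical)
}

/// Models "canonical" mode: accepts only fully canonical encodings.
fn canonical(found: Level) -> Result<(), &'static str> {
    restrict(found, Level::Canonical).map(|_| ())
}
```

In this model, `restrict(found, Level::HasExtensions)` accepts unknown fields
but rejects non-canonically encoded known fields, matching the description of
the `HasExtensions` restriction.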

#### Canonicity information

In "distinguished" and "restricted" modes, instead of returning
`Result<(), DecodeError>` or `Result<Foo, DecodeError>`, decoding methods return
types like `Result<Canonicity, DecodeError>` or
`Result<(Foo, Canonicity), DecodeError>`. `Canonicity` is a simple enum that
indicates whether the decoded data was `Canonical`, `HasExtensions`, or
`NotCanonical`:

* `Canonical` means the data that was decoded is the only encoding that
  canonically decodes to this value: if the value, or another value that equals
  it, is encoded, that encoding will have exactly matching bytes.
* `HasExtensions` means the fields that are known to the decoding process were
  encoded correctly, but other fields existed that have no corresponding struct
  destination and were discarded during decoding. These might be from later (or
  perhaps earlier) versions of the same program.
* `NotCanonical` means there were fields encoded non-canonically: when
  they are re-encoded they would be encoded differently. There are many
  different byte strings that can produce the same decoded value, but only one
  of them can be canonical.

The `bilrost::WithCanonicity` trait is made available to unwrap values and
results that have canonicity information:

* `.canonical()`: Converts to an error if not fully canonical, otherwise unwraps
* `.canonical_with_extensions()`: Converts to an error if any *known* fields
  were not canonical, otherwise unwraps
* `.value()`: Always unwraps, discarding the canonicity information.

This trait is implemented for `Canonicity` itself, `(T, Canonicity)`, `Result`
types where the value implements `WithCanonicity` and the error is convertible
to `DecodeErrorKind`, and corresponding references/[`.as_ref()`][resref] types.
The error in the returned result types is `DecodeErrorKind`, which discards any
"detailed-errors" information that would have indicated which field a decode
error occurred in; if that information is needed, check the decoding error
before the canonicity error.

[resref]: https://doc.rust-lang.org/std/result/enum.Result.html#method.as_ref

#### Using `dyn` with message traits

The `Message` trait and the four decoding traits are [dyn-compatible][dyncompat]
(a term formerly phrased ["object-safe"][objsafe]) and can be used via
[trait objects][traitobj]. All of their functionality (except the `decode`
methods for creating a message value from data *ex nihilo*) is available via
dyn-compatible alternatives. Messages can be cleared (reset to empty values);
measured for their encoded byte length; encoded to
[`ReverseBuffer`](#reversebuffer), [`Vec<u8>`][vec], [`Bytes`][bytes], or into a
[`&mut dyn BufMut`][bufmut], or decoded (replacing the value). Owned messages
can be replaced from [`&[u8]` slice][slice] or a [`&mut dyn Buf`][buf], while
borrowed messages can only be replaced from a slice.

[buf]: https://docs.rs/bytes/latest/bytes/buf/trait.Buf.html

[bufmut]: https://docs.rs/bytes/latest/bytes/buf/trait.BufMut.html

[slice]: https://doc.rust-lang.org/std/primitive.slice.html

[dyncompat]: https://doc.rust-lang.org/reference/items/traits.html#dyn-compatibility

[objsafe]: https://doc.rust-lang.org/reference/items/traits.html#object-safety

[traitobj]: https://doc.rust-lang.org/reference/types/trait-object.html

Methods that encode to or decode from trait object buffers are likely to be less
efficient than their generic, non-dyn-compatible counterparts; it is preferable
to use `encode(..)` rather than `encode_dyn(..)`, and likewise for any other
"`_dyn`" method. Likewise, `replace_from_slice(..)` is equivalent to
`replace_from(..)`, just compatible with `dyn`; the same goes for other
"`_slice`" methods.

### Supporting types and traits

Because nested values in Bilrost must have a known encoded length before they
are written (just like protobuf), if a message has many levels of nesting, the
size of the innermost message must be known in order to encode each and every
message that contains it. If the encoded data is being written from beginning to
end, this means one of the following:

1. Checking the encoded length of each message struct before it is encoded
    * This is very simple and quite fast in the usual case where there is no
      nesting.
    * If a message with 100 levels of nesting is encoded, this means about 5,000
      extra length measurements in total, because each nested message is
      measured again on behalf of every message that contains it.
    * This is the choice made by `prost`, the original upstream of this library.
2. Caching the length of each message permanently within its struct and taking
   care to invalidate that cache every time it is updated
    * Most protobuf libraries choose this option, but it involves adding extra
      fields to each message struct and forces extra logic whenever the struct's
      fields are modified. This becomes very intrusive and is one of the major
      reasons that protobuf structs often fit in so poorly with the rest of the
      program.
3. Caching the length of each part of the message in a single pass before any
   writing begins
    * [At one point][protobuf-rs-comparison] `rust-protobuf` did this. It avoids
      both the quadratic cost of option 1 and the intrusive nature of option 2,
      at the cost of some speed.

[protobuf-rs-comparison]: https://github.com/stepancheg/rust-protobuf/tree/16c9dc509267a6673f29563f9a01cc3026cc2144/protobuf-examples/vs-prost
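
To make the quadratic cost of option 1 concrete, here is a toy model that counts
how often a length is measured while forward-encoding a chain of nested
messages. It is not `prost`'s or `bilrost`'s actual code: `Msg`, `chain`, and
`count_measurements` are hypothetical names, and length prefixes are simplified
to a single byte.

```rust
/// Toy message: one payload byte plus an optional nested message.
struct Msg {
    payload: u8,
    child: Option<Box<Msg>>,
}

impl Msg {
    /// Measures the encoded length, counting every call in `calls`.
    fn encoded_len(&self, calls: &mut usize) -> usize {
        *calls += 1;
        // 1 payload byte, plus a 1-byte length prefix and the body of the child.
        1 + self.child.as_ref().map_or(0, |c| 1 + c.encoded_len(calls))
    }

    /// Forward encoding: the child's length is measured before it is written.
    fn encode_forward(&self, out: &mut Vec<u8>, calls: &mut usize) {
        out.push(self.payload);
        if let Some(child) = &self.child {
            let len = child.encoded_len(calls);
            assert!(len < 256, "toy prefix only holds lengths below 256");
            out.push(len as u8);
            child.encode_forward(out, calls);
        }
    }
}

/// Builds `depth` messages nested one inside the other.
fn chain(depth: usize) -> Msg {
    let mut msg = Msg { payload: 0xAA, child: None };
    for _ in 1..depth {
        msg = Msg { payload: 0xAA, child: Some(Box::new(msg)) };
    }
    msg
}

/// Forward-encodes a chain and reports how many length measurements it took.
fn count_measurements(depth: usize) -> usize {
    let mut out = Vec::new();
    let mut calls = 0;
    chain(depth).encode_forward(&mut out, &mut calls);
    calls
}
```

With these definitions, `count_measurements(100)` evaluates to 4,950: the
outermost message triggers 99 measurements, the next one 98, and so on.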

`bilrost` goes for a fourth option: Rather than encoding in the forwards
direction and doing tricks to determine the length of values that will be
written in the future, the encoding can be constructed backwards. Any nested
data that needs to be prefixed with its length will already be encoded by the
time its length needs to be known, and the whole nested message can be encoded
in a single pass.
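
The idea can be sketched with a toy format in which each message is a payload
followed by an optional length-prefixed child. This is only an illustration
under simplifying assumptions, not `bilrost`'s encoder: `Msg`, `prepend`, and
`encode_backwards` are hypothetical, prefixes are a single byte, and a plain
`Vec<u8>` holding the output in reverse order stands in for a prepend-capable
buffer.

```rust
/// Toy message: some payload bytes plus an optional nested message.
struct Msg {
    payload: Vec<u8>,
    child: Option<Box<Msg>>,
}

/// Prepends `msg` to `rev`, which holds the output in reverse byte order.
fn prepend(msg: &Msg, rev: &mut Vec<u8>) {
    if let Some(child) = &msg.child {
        let before = rev.len();
        // The nested body is written first...
        prepend(child, rev);
        // ...so its length is already known when the prefix is written.
        let len = rev.len() - before;
        assert!(len < 256, "toy prefix only holds lengths below 256");
        rev.push(len as u8);
    }
    // The payload comes first in the final output, so it is written last.
    rev.extend(msg.payload.iter().rev());
}

/// Encodes `msg` back to front in one pass, then restores forward byte order.
fn encode_backwards(msg: &Msg) -> Vec<u8> {
    let mut rev = Vec::new();
    prepend(msg, &mut rev);
    rev.reverse();
    rev
}
```

Each message is visited exactly once and no length is ever measured separately,
regardless of nesting depth.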

Performance varies between forwards encoding (`encode`) and backwards encoding
(`prepend`), depending on the nature of the messages being encoded. In some
cases backwards encoding will be slightly slower, and in some cases it will be
dramatically faster; both options are made available.

#### `ReverseBuf`

`bilrost::buf::ReverseBuf` is a trait corresponding to `bytes::BufMut` which
works in almost all the same ways, except chunks of bytes that are written to it
are added *before* the data already in the buffer, rather than after it. This
can make length-delimited encodings such as Bilrost significantly more efficient
to write, especially as messages contain more fields and nest more deeply.

`ReverseBuf` declares `bytes::Buf` as a supertrait, so any value of this type
can be consumed as a buffer.

#### `ReverseBuffer`

`bilrost::buf::ReverseBuffer` is the main provided implementation of the
`ReverseBuf` trait. It has amenities for reserving capacity and for fetching the
whole buffer as a slice when it is contiguous in memory. Its `buf_reader()`
method returns a read-only view of the buffer that also implements `bytes::Buf`
but does not consume the buffer when it is read through that trait.

`ReverseBuffer` allocates lazily, grows exponentially, and stores its data in
multiple allocations of increasing size. It is often the most efficient type
to encode a `bilrost` message into, and it can be efficiently read and copied
out as a `bytes::Buf` the same as the other options (`Vec` and `Bytes`).

`ReverseBuffer` can be converted directly into a `Vec<u8>` with the `into_vec`
method; this method will copy the content if necessary, although if possible (if
the buffer is one fully-initialized slice) the buffer will be directly converted
without copying the data.

Both `ReverseBuffer` and `ReverseBufReader` also provide a `slices` method which
allows iterating over the slices in the buffer for vectored writing.
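
As a rough mental model of the storage strategy described above, the following
sketch shows a prepend-only buffer made of chunks that double in size. It is a
hypothetical illustration, not the real `ReverseBuffer`: the `ChunkedReverse`
type, its starting chunk size of 8 bytes, and its method signatures are all
assumptions made for the example.

```rust
/// Hypothetical model of a prepend-only buffer built from growing chunks.
struct ChunkedReverse {
    /// Oldest chunk first; the newest (last) chunk holds the front of the data.
    chunks: Vec<Vec<u8>>,
    /// Index in the newest chunk where its used portion begins.
    start: usize,
}

impl ChunkedReverse {
    fn new() -> Self {
        // Nothing is allocated until the first write.
        ChunkedReverse { chunks: Vec::new(), start: 0 }
    }

    /// Writes `data` in front of everything already in the buffer.
    fn prepend(&mut self, mut data: &[u8]) {
        while !data.is_empty() {
            if self.start == 0 {
                // Out of room: add a chunk twice the size of the previous one.
                let size = self.chunks.last().map_or(8, |c| c.len() * 2);
                self.chunks.push(vec![0; size]);
                self.start = size;
            }
            // Copy as much of the tail of `data` as fits before `start`.
            let n = self.start.min(data.len());
            let (rest, tail) = data.split_at(data.len() - n);
            let chunk = self.chunks.last_mut().expect("a chunk exists here");
            chunk[self.start - n..self.start].copy_from_slice(tail);
            self.start -= n;
            data = rest;
        }
    }

    /// Returns the stored bytes front to back as contiguous slices.
    fn slices(&self) -> Vec<&[u8]> {
        let mut out = Vec::new();
        for (i, chunk) in self.chunks.iter().rev().enumerate() {
            // Only the newest chunk can be partially filled.
            out.push(if i == 0 { &chunk[self.start..] } else { &chunk[..] });
        }
        out
    }
}
```

Existing bytes are never moved when the buffer grows, and the list of slices is
the kind of shape a vectored write would consume.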

### Encoding and decoding example

```rust,
use bilrost::{DistinguishedOwnedMessage, Message, Oneof};
use bytes::Bytes;
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq, Oneof)]
#[bilrost(distinguished)]
enum PubKeyMaterial {
    Empty,
    #[bilrost(1)]
    Rsa(Bytes),
    #[bilrost(2)]
    ED25519(Bytes),
}

use PubKeyMaterial::*;

#[derive(Debug, PartialEq, Eq, Message)]
#[bilrost(distinguished)]
struct PubKey {
    #[bilrost(oneof(1, 2))]
    key: PubKeyMaterial,
    #[bilrost(3)]
    expiry: i64, // See also: `bilrost_types::Timestamp`
}

#[derive(Debug, Default, PartialEq, Eq, Message)]
#[bilrost(distinguished)]
struct PubKeyRegistry {
    keys_by_owner: BTreeMap<String, PubKey>,
}

let mut registry = PubKeyRegistry::default();
registry.keys_by_owner.insert(
    "Alice".to_string(),
    PubKey {
        key: ED25519(Bytes::from_static(b"not a secret")),
        expiry: 1600999999,
    },
);
registry.keys_by_owner.insert(
    "Bob".to_string(),
    PubKey {
        key: Rsa(Bytes::from_static(b"pkey")),
        expiry: 1500000001,
    },
);
let encoded = registry.encode_to_vec();

// The binary of this encoded message breaks down as follows:
//
// (The first and only field, containing a map from String to PubKey)
// 05 - field key: tag 0+1 = 1, wire type 1 = length-delimited
//   2c - length: 44 bytes
//     (The key of the first map item, a String value)
//     05 - length: 5 bytes
//       "Alice"
//     (The value of the first map item, a PubKey message)
//     14 - length: 20 bytes
//       (The "ED25519" variant of the PubKeyMaterial oneof)
//       09 - field key: tag 0+2 = 2, wire type 1 = length-delimited
//         (A Bytes value)
//         0c - length: 12 bytes
//           "not a secret"
//       (The "expiry" field of the PubKey message, an i64)
//       04 - field key: tag 2+1 = 3, wire type 0 = varint
//         fec7e9f50a - varint 3201999998, which is +1600999999 in zig-zag
//     (The key of the second map item, a string value)
//     03 - length: 3 bytes
//       "Bob"
//     (The value of the second map item, another PubKey message)
//     0c - length: 12 bytes
//       (The "Rsa" variant of the PubKeyMaterial oneof)
//       05 - field key: tag 0+1 = 1, wire type 1 = length-delimited
//         (A Bytes value)
//         04 - length: 4 bytes
//           "pkey"
//       (The "expiry" field of the PubKey message, an i64)
//       08 - field key: tag 1+2 = 3, wire type 0 = varint
//         82bbc0950a - varint 3000000002, which is +1500000001 in zig-zag

assert_eq!(
    encoded,
    b"\x05\x2c\
      \x05Alice\x14\x09\x0cnot a secret\x04\xfe\xc7\xe9\xf5\x0a\
      \x03Bob\x0c\x05\x04pkey\x08\x82\xbb\xc0\x95\x0a"
        .as_slice()
);

let decoded = PubKeyRegistry::decode_canonical(encoded.as_slice());
assert_eq!(decoded, Ok(registry));
```
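
The byte-level annotations in the example can be checked by hand with a few
small helpers. These are a sketch inferred from the annotated bytes above, not
code taken from `bilrost`: `split_key`, `decode_varint`, and `zigzag_decode` are
hypothetical names, `split_key` assumes a one-byte field key, and
`decode_varint` deliberately gives up on varints longer than eight bytes rather
than guessing at the maximum-length case.

```rust
/// Splits a one-byte field key into (tag delta, wire type): the low two bits
/// are the wire type and the remaining bits are the delta from the prior tag.
fn split_key(key: u8) -> (u8, u8) {
    (key >> 2, key & 0b11)
}

/// Decodes a short varint: every byte contributes its full value (high bit
/// included) scaled by 128^position, and a byte below 0x80 ends the varint.
/// Returns the value and the number of bytes read.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(8) {
        value = value.checked_add(u64::from(byte) << (7 * i))?;
        if byte < 0x80 {
            return Some((value, i + 1));
        }
    }
    // No terminating byte within the range this sketch supports.
    None
}

/// Maps a zig-zag encoded unsigned value back to a signed integer.
fn zigzag_decode(n: u64) -> i64 {
    ((n >> 1) as i64) ^ -((n & 1) as i64)
}
```

For Alice's `expiry` field, `split_key(0x04)` gives a tag delta of 1 with wire
type 0, `decode_varint(&[0xfe, 0xc7, 0xe9, 0xf5, 0x0a])` yields 3201999998, and
`zigzag_decode` turns that into 1600999999.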

### Supported message field types

`bilrost` structs can encode fields with a wide variety of types ("general
encodings" refers to `general` & `general_packed`):

| Encoding                       | Value type                                             | Encoded representation | Distinguished      |
|--------------------------------|--------------------------------------------------------|------------------------|--------------------|
| general encodings & `fixed`    | [`f32`][prim]                                          | fixed-size 32 bits     | no                 |
| `fixed`                        | [`u32`][prim], [`i32`][prim]                           | fixed-size 32 bits     | yes                |
| `fixed`                        | [`NonZeroU32`][nonzero], [`NonZeroI32`][nonzero]       | fixed-size 32 bits     | yes                |
| general encodings & `fixed`    | [`f64`][prim]                                          | fixed-size 64 bits     | no                 |
| `fixed`                        | [`u64`][prim], [`i64`][prim]                           | fixed-size 64 bits     | yes                |
| `fixed`                        | [`NonZeroU64`][nonzero], [`NonZeroI64`][nonzero]       | fixed-size 64 bits     | yes                |
| general encodings & `varint`   | [`u64`][prim], [`u32`][prim], [`u16`][prim]            | varint                 | yes                |
| general encodings & `varint`   | [`i64`][prim], [`i32`][prim], [`i16`][prim]            | varint                 | yes                |
| general encodings & `varint`   | [`usize`][prim], [`isize`][prim]                       | varint                 | yes                |
| general encodings & `varint`   | [`bool`][prim]                                         | varint                 | yes                |
| general encodings & `varint`   | all [`NonZero`][nonzero] numeric types                 | varint                 | yes                |
| general encodings              | derived [`Enumeration`](#enumerations)[^enum]          | varint                 | yes                |
| general encodings              | [`String`][str]*                                       | length-delimited       | yes                |
| general encodings              | impl [`Message`](#derive-macros)[^boxmsg]              | length-delimited       | maybe              |
| `varint`                       | [`u8`][prim], [`i8`][prim]                             | varint                 | yes                |
| `plainbytes`                   | [`Vec<u8>`][vec]*                                      | length-delimited       | yes                |
| [`(E1, E2, ... EN)`](#tuples)  | [`(T1, T2, ... TN)`][tuple]                            | length-delimited       | when each field is |
| general encodings & `(E1, E2)` | [`Range<T>`][range], [`RangeInclusive<T>`][range_incl] | length-delimited       | when `T` is        |

*Alternative types are available! See below.

[^enum]: `Enumeration` types can be directly included if they have a value that
has a Bilrost representation of zero (represented as exactly the expression `0`
either via a `#[bilrost(0)]` attribute or, absent an attribute, via a normal
discriminant value). Otherwise, enumeration types must always be nested.

[^boxmsg]: `Message` types inside [`Box`][box] still impl `Message`, with a
covering impl; message types [can nest recursively](#writing-recursive-messages)
this way.

With the relevant crate features enabled there is built-in support for certain
additional types as well, each supported by the general encodings:

| Value type                                         | Empty value                            | Distinguished | Required feature |
|----------------------------------------------------|----------------------------------------|---------------|------------------|
| [`core::time::Duration`][coreduration]             | zero duration                          | yes           | (none)           |
| [`std::time::SystemTime`][stdsystemtime]           | `UNIX_EPOCH` (1970-01-01 00:00:00 UTC) | no            | "std"            |
| [`chrono::NaiveDate`][chrononaivedate]             | 0000-01-01                             | yes           | "chrono"         |
| [`chrono::NaiveTime`][chrononaivetime]             | 00:00:00                               | yes           | "chrono"         |
| [`chrono::NaiveDateTime`][chrononaivedatetime]     | 0000-01-01 00:00:00                    | yes           | "chrono"         |
| [`chrono::Utc`][chronoutc]                         | Utc                                    | yes           | "chrono"         |
| [`chrono::FixedOffset`][chronofixedoffset]         | UTC+00:00                              | yes           | "chrono"         |
| [`chrono::DateTime<Tz>`][chronodatetime]*          | 0000-01-01 00:00:00 +00:00             | yes           | "chrono"         |
| [`chrono::TimeDelta`][chronotimedelta]             | zero duration                          | yes           | "chrono"         |
| [`time::Date`][timedate]                           | 0000-01-01                             | yes           | "time"           |
| [`time::Time`][timetime]                           | 00:00:00                               | yes           | "time"           |
| [`time::PrimitiveDateTime`][timeprimitivedatetime] | 0000-01-01 00:00:00                    | yes           | "time"           |
| [`time::UtcOffset`][timeutcoffset]                 | UTC+00:00                              | yes           | "time"           |
| [`time::OffsetDateTime`][timeoffsetdatetime]       | 0000-01-01 00:00:00 +00:00             | yes           | "time"           |
| [`time::Duration`][timeduration]                   | zero duration                          | yes           | "time"           |

*`chrono::DateTime<Tz>` is supported whenever `Tz::Offset` is supported by the
encodings. Currently this means `Utc` and `FixedOffset`.

[coreduration]: https://doc.rust-lang.org/core/time/struct.Duration.html

[stdsystemtime]: https://doc.rust-lang.org/std/time/struct.SystemTime.html

[chrononaivedate]: https://docs.rs/chrono/latest/chrono/struct.NaiveDate.html

[chrononaivetime]: https://docs.rs/chrono/latest/chrono/struct.NaiveTime.html

[chrononaivedatetime]: https://docs.rs/chrono/latest/chrono/struct.NaiveDateTime.html

[chronoutc]: https://docs.rs/chrono/latest/chrono/struct.Utc.html

[chronofixedoffset]: https://docs.rs/chrono/latest/chrono/struct.FixedOffset.html

[chronodatetime]: https://docs.rs/chrono/latest/chrono/struct.DateTime.html

[chronotimedelta]: https://docs.rs/chrono/latest/chrono/struct.TimeDelta.html

[nonzero]: https://doc.rust-lang.org/std/num/index.html#types

[range]: https://doc.rust-lang.org/std/ops/struct.Range.html

[range_incl]: https://doc.rust-lang.org/std/ops/struct.RangeInclusive.html

[timedate]: https://docs.rs/time/latest/time/struct.Date.html

[timetime]: https://docs.rs/time/latest/time/struct.Time.html

[timeprimitivedatetime]: https://docs.rs/time/latest/time/struct.PrimitiveDateTime.html

[timeutcoffset]: https://docs.rs/time/latest/time/struct.UtcOffset.html

[timeoffsetdatetime]: https://docs.rs/time/latest/time/struct.OffsetDateTime.html

[timeduration]: https://docs.rs/time/latest/time/struct.Duration.html

Any of these types may be included directly in a `bilrost` message struct. If
that field's value is [empty](#empty-values), no bytes will be emitted when it
is encoded.

#### Containers

In addition to including them directly, the types listed above can also be
nested within several different containers, including the types listed here and
the variants of a `Oneof`. These types may also be re-nested in one of these
container types again if the type and encoding support it, typically as many
times as needed.

Note that `Option` cannot be nested again. Semantically, `Option` gives the
ability to detect the difference between a zeroed-out "empty" value and a
missing field that was not included.
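
The distinction can be illustrated with a toy single-field codec. This is not
`bilrost`'s wire format (field keys are omitted and values are a single byte),
and all four function names are hypothetical; it only models the rule that an
empty bare value emits nothing while `Some` always emits something.

```rust
/// Toy encoder for a bare `u8` field: an empty (zero) value emits no bytes.
fn encode_plain(value: u8) -> Vec<u8> {
    if value == 0 { Vec::new() } else { vec![value] }
}

/// Toy encoder for an `Option<u8>` field: `Some` always emits bytes, even for
/// zero, and `None` emits nothing.
fn encode_optional(value: Option<u8>) -> Vec<u8> {
    match value {
        Some(v) => vec![v],
        None => Vec::new(),
    }
}

/// Decoding a bare field cannot tell "absent" apart from "zero"...
fn decode_plain(bytes: &[u8]) -> u8 {
    bytes.first().copied().unwrap_or(0)
}

/// ...while decoding an optional field can.
fn decode_optional(bytes: &[u8]) -> Option<u8> {
    bytes.first().copied()
}
```

Here `Some(0)` survives a round trip as `Some(0)`, whereas a bare `0` and an
absent field both decode to `0`.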

| Encoding                        | Value type                                               | Encoded representation                                                                     | Re-nestable | Distinguished      |
|---------------------------------|----------------------------------------------------------|--------------------------------------------------------------------------------------------|-------------|--------------------|
| any encoding                    | [`Option<T>`][opt]                                       | identical; at least some bytes are always encoded if `Some`, nothing if `None`             | no          | when `T` is        |
| `unpacked<E>`                   | [`Vec<T>`][vec], [`BTreeSet<T>`][btset]                  | the same as encoding `E`, one field per value                                              | no          | when `T` is        |
| `unpacked<E>`                   | [`[T; N]`][array][^arrays]                               | the same as encoding `E`, one field per value                                              | no          | when `T` is        |
| `unpacked`                      | *                                                        | (this means `unpacked<general_packed>`)                                                    | no          | *                  |
| `packed<E>`                     | [`Vec<T>`][vec], [`BTreeSet<T>`][btset]                  | always length-delimited, successively encoded with `E`                                     | yes         | when `T` is        |
| `packed<E>`                     | [`[T; N]`][array][^arrays]                               | always length-delimited, successively encoded with `E`                                     | yes         | when `T` is        |
| `packed`                        | *                                                        | (this means `packed<general_packed>`)                                                      | yes         | *                  |
| `map<KE, VE>`                   | [`BTreeMap<K, V>`][btmap]                                | always length-delimited, alternately encoded with keys by encoding `KE` and values by `VE` | yes         | when `K` & `V` are |
| `map`                           | *                                                        | (this means `map<general_packed, general_packed>`)                                         | yes         | *                  |
| `general`                       | [`Vec<T>`][vec], [`BTreeSet<T>`][btset]                  | (the same as `unpacked`)                                                                   | no          | *                  |
| `general_packed`                | `Vec<T>`, `BTreeSet<T>`                                  | (the same as `packed`)                                                                     | yes         | *                  |
| general encodings               | [`BTreeMap`][btmap]                                      | (the same as `map`)                                                                        | yes         | *                  |
| general encodings or `(E1, E2)` | [`Range<T>`][range] or [`RangeInclusive<T>`][range_incl] | the same as `(start, end)` with the same encoding                                          | yes         | when `T` is        |

[^arrays]: Fixed-size array types (`[T; N]`) act similarly to collections that
additionally require an exact number of items. Where other kinds of collections
are considered [empty](#empty-values) when they have no items, arrays are
considered empty when each of their values is empty.

Many alternative types are also available for both scalar values and containers!

| Value type          | Alternative                                     | Supporting encoding | Distinguished | Feature to enable |
|---------------------|-------------------------------------------------|---------------------|---------------|-------------------|
| `u32`, `u64`        | [`[u8; 4]`][prim], [`[u8; 8]`][prim]            | `fixed`             | yes           | (none)            |
| `Vec<u8>`           | `Blob`[^blob]                                   | general encodings   | yes           | (none)            |
| `Vec<u8>`           | [`Cow<[u8]>`][cow]                              | `plainbytes`        | yes           | (none)            |
| `Vec<u8>`           | [`bytes::Bytes`][bytes][^bzcopy]                | general encodings   | yes           | (none)            |
| `Vec<u8>`           | [`[u8; N]`][prim][^plainbytearr]                | `plainbytes`        | yes           | (none)            |
| `String`/`Vec<u8>`* | [`bstr::BString`][bstr][^bstrnote]              | general encodings   | yes           | "bstr"            |
| `String`            | [`Cow<str>`][cow]                               | general encodings   | yes           | (none)            |
| `String`            | [`bytestring::ByteString`][bytestring][^bzcopy] | general encodings   | yes           | "bytestring"      |

[^bstrnote]: [`bstr::BString`][bstr] is like `String` in that it has many useful
features for working with text, yet it is also like `Vec<u8>` in that it can
hold any unvalidated bytes content (it can work with UTF-8 text, but it doesn't
*necessarily* contain valid UTF-8 text). This can be useful for both speed and
for semi-valid data that is mostly textual, and its third-party support is
included here for those use cases. If it's not immediately convenient as a value
type, the crate also provides [`bstr::BStr`][bstrref] as a reference type
(analogous to `str`) which can be used with any `&[u8]`.

[^bzcopy]: When decoding from a `bytes::Bytes` object, both `bytes::Bytes` and
`bytestring::ByteString` have a zero-copy optimization and will reference the decoded
buffer rather than copying. (This could also work for any other input type that
has a zero-copy `bytes::Buf::copy_to_bytes()` optimization.)

[^plainbytearr]: Plain byte arrays, as we might expect, only accept one exact
length of data; other lengths are considered invalid values.

[^blob]: `bilrost::Blob` is a transparent wrapper for `Vec<u8>` that is a
drop-in replacement in most situations and is supported by the default `general`
encoding for maximum ease of use. If nothing but `Vec<u8>` will do,
the `plainbytes` encoding will still encode a plain `Vec<u8>` as its bytes
value.

| Container type | Alternative                                           | Distinguished | Feature to enable |
|----------------|-------------------------------------------------------|---------------|-------------------|
| `Vec<T>`       | [`Cow<[T]>`][cow]                                     | when `T` is   | (none)            |
| `Vec<T>`       | [`arrayvec::ArrayVec<[T; N]>`][arrayvec][^bounded]    | when `T` is   | "arrayvec"        |
| `Vec<T>`       | [`smallvec::SmallVec<[T]>`][smallvec]                 | when `T` is   | "smallvec"        |
| `Vec<T>`       | [`thin_vec::ThinVec<[T]>`][thinvec]                   | when `T` is   | "thin-vec"        |
| `Vec<T>`       | [`tinyvec::ArrayVec<[T; N]>`][tinyarrayvec][^bounded] | when `T` is   | "tinyvec"         |
| `Vec<T>`       | [`tinyvec::TinyVec<[T]>`][tinyvec]                    | when `T` is   | "tinyvec"         |
| `BTreeMap<T>`  | [`HashMap<T>`][hashmap][^hashnoncanon]                | no            | "std" (default)   |
| `BTreeSet<T>`  | [`HashSet<T>`][hashset][^hashnoncanon]                | no            | "std" (default)   |
| `BTreeMap<T>`  | [`hashbrown::HashMap<T>`][hbmap][^hashnoncanon]       | no            | "hashbrown"       |
| `BTreeSet<T>`  | [`hashbrown::HashSet<T>`][hbset][^hashnoncanon]       | no            | "hashbrown"       |

[array]: https://doc.rust-lang.org/std/primitive.array.html

[arrayvec]: https://docs.rs/arrayvec/latest/arrayvec/struct.ArrayVec.html

[box]: https://doc.rust-lang.org/std/boxed/struct.Box.html

[bstr]: https://docs.rs/bstr/latest/bstr/struct.BString.html

[bstrref]: https://docs.rs/bstr/latest/bstr/struct.BStr.html

[bytestring]: https://docs.rs/bytestring/latest/bytestring/struct.ByteString.html

[btmap]: https://doc.rust-lang.org/std/collections/btree_map/struct.BTreeMap.html

[btset]: https://doc.rust-lang.org/std/collections/struct.BTreeSet.html

[bytes]: https://docs.rs/bytes/latest/bytes/struct.Bytes.html

[cow]: https://doc.rust-lang.org/std/borrow/enum.Cow.html

[hashmap]: https://doc.rust-lang.org/std/collections/struct.HashMap.html

[hashset]: https://doc.rust-lang.org/std/collections/struct.HashSet.html

[hbmap]: https://docs.rs/hashbrown/latest/hashbrown/struct.HashMap.html

[hbset]: https://docs.rs/hashbrown/latest/hashbrown/struct.HashSet.html

[opt]: https://doc.rust-lang.org/std/option/enum.Option.html

[prim]: https://doc.rust-lang.org/std/index.html#primitives

[smallvec]: https://docs.rs/smallvec/latest/smallvec/struct.SmallVec.html

[str]: https://doc.rust-lang.org/std/string/struct.String.html

[thinvec]: https://docs.rs/thin-vec/latest/thin_vec/struct.ThinVec.html

[tinyvec]: https://docs.rs/tinyvec/latest/tinyvec/enum.TinyVec.html

[tinyarrayvec]: https://docs.rs/tinyvec/latest/tinyvec/struct.ArrayVec.html

[tuple]: https://doc.rust-lang.org/std/primitive.tuple.html

[vec]: https://doc.rust-lang.org/std/vec/struct.Vec.html

[^bounded]: Some containers, notably `ArrayVec` flavors, have a built-in maximum
capacity. When more bytes or items than will fit in these containers are
encountered while decoding, decoding will fail with an "invalid value" error.

[^hashnoncanon]: Hash-table-based maps and sets are implemented, but are not
compatible with distinguished encoding or decoding. If distinguished decoding is
required, a container which stores its values in sorted order must be used.

While it's possible to nest and recursively nest `Message` types with `Box`,
`Vec`, etc., `bilrost` does not do any kind of runtime check to avoid infinite
recursion in the event of a cycle. The chosen supported types and containers
should not be able to become *infinite* as implemented, but if the situation
were induced to happen anyway it would not end well. (Note that creative usage
of `Cow<[T]>` can create messages that encode absurdly large, but the borrow
checker keeps them from becoming infinite mathematically if not practically.)

#### Tuples

Tuple types can be included in messages, but there are some notable features
that merit additional explanation.

Tuples can have each of their members' encodings specified by using
an encoding that is shaped just like the value. For example, `(i8, String, u32)`
can use the encoding `(varint, general, fixed)`! This method of specifying the
encoding can be nested as well.

Tuples encode and decode exactly as if they were nested messages with the same
field types and encodings, and the tags assigned to those fields are the same as
the index of the member of the tuple. So, the assigned tags start at zero; this
is in contrast to derived message implementations which *by default* will assign
tags starting at 1.

The `general` encoding is also directly applicable to tuple types as long as
each of the tuple's fields is compatible with the `general` encoding itself, and
all the fields will use that encoding.

Like most of the Rust standard library, `bilrost` implements encoding for tuples
up to arity 12.

#### Enumerations

`bilrost` can derive the required implementations for a numeric enumeration type
from an `enum` with no fields in its variants, where each variant has either

1. an explicit discriminant that is a valid `u32` value, or
2. a `#[bilrost = 123]` or `#[bilrost(123)]` attribute that specifies a valid
   `u32` const expression and match pattern (here with the example value `123`).

```rust
#[derive(Clone, PartialEq, Eq, bilrost::Enumeration)]
enum SimpleEnum {
    Unknown = 0,
    A = 1,
    B = 2,
    C = 3,
}

const FOUR: u32 = 4;

#[derive(Clone, PartialEq, Eq, bilrost::Enumeration)]
#[repr(u8)] // The type needn't have a u32 repr
enum ComplexEnum {
    One = 1,
    #[bilrost = 2]
    Two,
    #[bilrost(3)]
    Three,
    #[bilrost(FOUR)]
    Four,
    // When both discriminant and attribute exist, bilrost uses the attribute.
    #[bilrost(5)]
    Five = 8,
}

// Enumerations can also have non-unit variants as long as they have no fields.
#[derive(Clone, PartialEq, Eq, bilrost::Enumeration)]
enum EnumWithNonUnitVariants {
    #[bilrost(1)]
    Unit,
    #[bilrost(2)]
    Tuple(),
    #[bilrost(3)]
    Struct { },
}
```

All enumeration types are encoded and decoded by conversion to and from the Rust
`u32` type, using `Into<u32>` and `TryFrom<u32, Error = bilrost::DecodeError>`.
In addition to deriving trait impls with `Enumeration`, the following additional
traits are also mandatory: `Clone` and `Eq` (and thus `PartialEq` as well).
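
To make the conversion contract concrete, here is a hand-written sketch of the
kind of `u32` conversions described above, for the `SimpleEnum` example. It is
not the output of the derive macro: `UnknownVariant` is a hypothetical stand-in
for `bilrost::DecodeError`, and the actual generated code may differ.

```rust
use std::convert::TryFrom;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SimpleEnum {
    Unknown = 0,
    A = 1,
    B = 2,
    C = 3,
}

/// Hypothetical error type, standing in for `bilrost::DecodeError`.
#[derive(Debug, PartialEq, Eq)]
struct UnknownVariant(u32);

impl From<SimpleEnum> for u32 {
    fn from(value: SimpleEnum) -> u32 {
        // Uses the explicit discriminant of a fieldless enum.
        value as u32
    }
}

impl TryFrom<u32> for SimpleEnum {
    type Error = UnknownVariant;

    fn try_from(n: u32) -> Result<Self, Self::Error> {
        // Each variant's number is used as a match pattern; numbers with no
        // corresponding variant are rejected rather than silently mapped.
        match n {
            0 => Ok(SimpleEnum::Unknown),
            1 => Ok(SimpleEnum::A),
            2 => Ok(SimpleEnum::B),
            3 => Ok(SimpleEnum::C),
            other => Err(UnknownVariant(other)),
        }
    }
}
```

With conversions shaped like this, `u32::from(SimpleEnum::B)` yields `2`, and
`SimpleEnum::try_from(7)` is an error because no variant has that number.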

If the discriminants of an enumeration conflict at all, compilation will fail;
the discriminants must be unique within any given enumeration.

```rust,compile_fail
# use bilrost::Enumeration;
#[derive(Clone, PartialEq, Eq, Enumeration)]
enum Foo {
    A = 1,
    #[bilrost(1)] // error: unreachable pattern
    B = 2,
}
```

For an enumeration type to qualify for direct inclusion as a message field
rather than only as a nested value (within `Option`, `Vec`, etc.), one of the
discriminants must be spelled exactly "0".

#### Compatible Widening

While many types have different representations and interpretations in the
encoding, there are several classes of types which have the same encoding *and*
the same interpretation as long as the values are in range for both types. For
example, it's possible to take an `i16` field and change its type to `i32`,
and any number that can be represented in `i16` will have the same encoded
representation for both types.

Widening fields along these routes is always supported in the following way:
Old message data will always decode to an equivalent/corresponding value, and
those corresponding values will re-encode from the new widened struct into the
same representation.

| Change                                                                                 | Corresponding values                                                                        | Backwards compatibility breaks when...                                                                                                |
|----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| `bool` --> `u8` --> `u16` --> `u32` --> `u64`, all with `general` or `varint` encoding | `true`/`false` becomes 1/0                                                                  | value is out of range of the narrower type                                                                                            |
| `bool` --> `i8` --> `i16` --> `i32` --> `i64`, all with `general` or `varint` encoding | `true`/`false` becomes -1/0                                                                 | value is out of range of the narrower type                                                                                            |
| any `NonZero` number type --> the plain number type                                    | the unchanged numeric value                                                                 | numeric value is zero                                                                                                                 |
| `String` --> `Vec<u8>`                                                                 | string becomes its UTF-8 data                                                               | value contains invalid UTF-8                                                                                                          |
| `T` --> `Option<T>`                                                                    | default value of `T` becomes `None`                                                         | `Some(empty)` is encoded; it will be considered non-canonical                                                                         |
| `Option<T>` --> `Vec<T>` (with `unpacked` encoding)                                    | maybe-contained value is identical                                                          | multiple values are in the `Vec`                                                                                                      |
| `[T; N]` --> `Vec<T>`                                                                  | when each array value is empty, the `Vec` will be empty instead of filled with empty values | data is a nonzero length different than that of the array                                                                             |
| `Option<[T; N]>` --> `Vec<T>`                                                          | no change                                                                                   | data is a length different than that of the array                                                                                     |
| `Range<T>` or `RangeInclusive<T>` <--> `(start, end)` tuple (with the same encoding)   | no change                                                                                   | never                                                                                                                                 |
| `Message` types --> with new fields added                                              | no change, new fields are empty                                                             | new fields are not empty; it will be considered non-canonical                                                                         |
| `Enumeration` types --> with new variants added                                        | no change                                                                                   | value is a new variant                                                                                                                |
| `chrono::NaiveDate` --> `chrono::NaiveDateTime`                                        | midnight on the corresponding date                                                          | value has a non-midnight time component                                                                                               |
| `time::Date` --> `time::PrimitiveDateTime`                                             | midnight on the corresponding date                                                          | value has a non-midnight time component                                                                                               |
| `chrono::Utc` --> `chrono::FixedOffset` (and `chrono::DateTime` using those)           | timezone is always UTC                                                                      | value has a non-UTC offset                                                                                                            |
| `chrono::NaiveDate` <--> `time::Date`                                                  | no change                                                                                   | whenever one library is out of its supported range                                                                                    |
| `chrono::NaiveTime` <--> `time::Time`                                                  | no change                                                                                   | whenever one library is out of its supported range (including leap seconds)                                                           |
| `chrono::NaiveDateTime` <--> `time::PrimitiveDateTime`                                 | no change                                                                                   | whenever one library is out of its supported range                                                                                    |
| `chrono::FixedOffset` <--> `time::UtcOffset`                                           | no change                                                                                   | whenever one library is out of its supported range                                                                                    |
| `chrono::DateTime<Tz>` <--> `time::OffsetDateTime`                                     | no change                                                                                   | whenever one library is out of its supported range                                                                                    |
| `chrono::TimeDelta` <--> `time::Duration` <--> `bilrost_types::Duration`               | no change                                                                                   | whenever one library is out of its supported range. `time` and `chrono` impls are strict about seconds and nanos having matching sign |

`Vec<T>` and other list- and set-like collections that contain repeated values
can also be changed between `unpacked` and `packed` encoding, as long as the
inner value type `T` does not have a length-delimited representation. This will
break compatibility with distinguished decoding in both directions whenever the
field is present and not [empty](#empty-values) because it will also change the
encoded representation, but relaxed decoding will still work.

## Strengths, Aims, and Advantages

Strengths of Bilrost's encoding include those of protocol buffers:

* Encoded messages are very durable, with greatly extensible forward
  compatibility
* Encoded messages are relatively very compact, and their representation "on the
  wire" is very simple
* The encoding is minimally[^floatbits] platform-dependent; each byte is
  specified, and there are no endianness incompatibility issues
* When decoding, text-string and byte-string data is represented verbatim and
  can be referenced without copying
* Skipping irrelevant, undesired, or unknown-extension data is inexpensive as
  most nested and repeated fields are stored with a length prefix

...as well as more:

* In Bilrost, decoded data means what it says. If a value is decoded, it
  contains all the information that was present in the encoding (no silent
  integer truncation!)
* Bilrost supports distinguished decoding for types where it makes sense, and is
  designed from a protocol level to make invalid values unrepresentable where
  possible
* Bilrost is more compact than protobuf without incurring significant overhead.
  Any nuanced representations that are possible in protobuf that Bilrost cannot
  represent or has no analog for are either permanently deprecated, or all
  conforming protobuf decoders are required to discard the difference anyway.
* `bilrost` aims to be as ergonomic as is practical in plain Rust, with basic
  annotations and derive macros. It's possible for such a library to be quite
  nice to use!

[^floatbits]: The main area of potential incompatibility is with the
representation of signaling vs. quiet NaN floating point values; see
[`f64::from_bits()`][floatbits].

[floatbits]: https://doc.rust-lang.org/std/primitive.f64.html#method.from_bits

## What Bilrost and the library won't do

Bilrost does *not* have a robust reflection ecosystem. It does not (yet) have an
intermediate schema language like protobuf does, nor implementations for very
many languages, nor RPC framework support, nor an independent validation
framework. These things are possible, they just don't exist yet.

This library also does not have support for encoding/decoding its message types
to and from JSON or other readable text formats. However, because it supports
deriving Bilrost encoding implementations from existing structs, it is possible
(and recommended) to use other, preexisting tools to do this. `Debug` can also
be derived for a `bilrost` message type, as can other encodings that similarly
support deriving implementations from preexisting types.

## Encoding specification

Philosophically, there are two "sides" to the encoding scheme: the opaque data
that comprises it, and conventions for how that data is interpreted.

### Opaque format

Bilrost data is encoded as zero or more key-value pairs, referred to as
"fields". Keys are numeric and bear information about both the tag of the field
and the opaque type of its value.

Values in bilrost are encoded opaquely as strings of bytes or as non-negative
integers not greater than the maximum value representable in an unsigned 64 bit
integer (2^64-1). The only four scalar types supported by the encoding format
itself are these integers, byte strings of any (64-bit representable) length,
and byte strings with lengths of exactly 4 or exactly 8.

This opaque format should remain entirely stable, and is (for what it is worth)
self-describing. The *meaning* of the tags and their values is likely to vary
widely depending on the schema in use (which is *not* self-describing), but
outside of the opaque data's interpretation the format will not vary.

#### Messages

The basic functional unit of encoded Bilrost data is a message. An encoded
message is some string of zero or more bytes with a specific length.

#### Fields

Encoded messages are comprised of zero or more encoded fields. Each field has a
numeric "tag", a number in the range representable by an unsigned 32 bit
integer, and some type of value.

Each field is encoded as two parts: first its key, and then its value. The
field's key is always encoded as a varint. The interpretation of the encoded
value of that varint is in two parts: the value divided by 4 is the *tag-delta*,
and the remainder of that division determines the value's *wire-type*. The
tag-delta encodes the non-negative difference between the tag of the
previously-encoded field (or zero, if it is the first field) and the tag of the
field the key is part of. Wire-types map to the remainder, and determine the
form and representation of the field value as follows:

**0: varint** - the value is an opaque number, encoded as a single varint.

**1: length-delimited** - the value is a string of bytes; its length in bytes is
encoded first as a single varint, then immediately followed by exactly that many
bytes comprising the value itself.

**2: fixed-length 32 bits** - the value is a string of exactly 4 bytes, encoded
with no additional prelude.

**3: fixed-length 64 bits** - the value is a string of exactly 8 bytes, encoded
with no additional prelude.

Note that because field keys encode only the *delta* from the previous tag, it
is not possible to encode fields in anything but sorted order according to their
tags. Unsorted fields are *unrepresentable*.

If a field key's tag-delta indicates a tag that is greater than would fit in an
unsigned 32 bit integer (2^32-1), the encoded message is not valid and must be
rejected.
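
The key arithmetic described above can be sketched in a few lines. This is an
illustration of the rules in this section only, not `bilrost`'s internal API;
the names `WireType`, `field_key`, and `split_key` are invented for the example.
The returned key value would then be written to the wire as a varint.

```rust
use std::convert::TryFrom;

/// Wire-types, numbered as in the list above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum WireType {
    Varint = 0,
    LengthDelimited = 1,
    Fixed32 = 2,
    Fixed64 = 3,
}

/// Computes the numeric value of a field key. Returns `None` when `tag` is
/// less than `prev_tag`, since a negative tag-delta is unrepresentable.
fn field_key(prev_tag: u32, tag: u32, wire_type: WireType) -> Option<u64> {
    let delta = tag.checked_sub(prev_tag)?;
    Some(u64::from(delta) * 4 + wire_type as u64)
}

/// Splits a decoded key back into the field's tag and wire-type. Returns
/// `None` when the resulting tag would not fit in an unsigned 32 bit integer.
fn split_key(prev_tag: u32, key: u64) -> Option<(u32, WireType)> {
    let wire_type = match key % 4 {
        0 => WireType::Varint,
        1 => WireType::LengthDelimited,
        2 => WireType::Fixed32,
        _ => WireType::Fixed64,
    };
    let tag = u64::from(prev_tag).checked_add(key / 4)?;
    let tag = u32::try_from(tag).ok()?;
    Some((tag, wire_type))
}
```

For example, the first field of a message with tag 1 and a fixed-length 32 bit
value has the key `1 * 4 + 2 = 6`. A following field with tag 5 and a varint
value has the key `(5 - 1) * 4 + 0 = 16`.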

#### Varints (LEB128-bijective encoding)

Varints are a variable-length encoding of an unsigned 64 bit integer value.
Encoded varints are between one and nine bytes, with lesser numeric values
having shorter representations in the encoding. At the same time, each number in
this range has exactly one possible encoded representation.

1. The final byte of a varint is the first byte that does not have its most
   significant bit set, or the ninth byte, whichever comes first.
2. The value of the encoded varint is the sum of each byte's unsigned integer
   value, multiplied by 128 (shifted left/up by 7 bits) for each byte that
   preceded it.
3. Varints representing values greater than 2^64-1 are invalid.

Several outstanding examples of very similar varint encodings exist:

| Implementation             | Format                         | Limits length?                           | Endianness | Bijective |
|----------------------------|--------------------------------|------------------------------------------|------------|-----------|
| [sqlite][sqlitevarint]     | base 128 with continuation bit | yes (9 bytes)                            | big        | no        |
| [protobuf][protobufvarint] | base 128 with continuation bit | no (10th byte uses only 1 bit)           | little     | no        |
| [git][gitvarint]           | base 128 with continuation bit | no (large values generally not relevant) | big        | yes       |
| bilrost                    | base 128 with continuation bit | yes (9 bytes)                            | little     | yes       |

[gitvarint]: https://git.kernel.org/pub/scm/git/git.git/tree/varint.c?h=v2.43.2

[protobufvarint]: https://protobuf.dev/programming-guides/encoding/#varints

[sqlitevarint]: https://www.sqlite.org/fileformat2.html#varint

##### Mathematics

Bilrost's varint representation is a base 128 [bijective numeration][bn] scheme
with a continuation bit. In such a numbering scheme, each possible value with a
given number of digits is greater than each possible value with fewer digits.
(Many people are already unknowingly familiar with bijective numeration via the
column names in spreadsheet software: A, B, ... Y, Z, AA, AB, ...)

[bn]: https://en.wikipedia.org/wiki/Bijective_numeration

Classical bijective numerations have no zero digit, but represent zero with the
empty string. This doesn't work for us because we must always encode at least
one byte to avoid ambiguity. Consider instead:

* A base 128 bijective numeration,
* which represents the digits valued 1 through 128 with the byte values 0
  through 127,
* is encoded least significant digit first with a continuation bit in the most
  significant bit of each byte,
* and encodes the represented value plus one...

...this is *almost exactly* the Bilrost varint encoding. The sole exception is
that, for values from 9295997013522923648 (hexadecimal
0x8102_0408_1020_4080, encoded as
`[128, 128, 128, 128, 128, 128, 128, 128, 128, 0]`) through the maximum
18446744073709551615 (hexadecimal 0xffff_ffff_ffff_ffff, encoded as
`[255, 254, 254, 254, 254, 254, 254, 254, 254, 0]`), the scheme described above
always produces a tenth byte, and that byte is always zero.

For practical applications it's not necessary to be able to encode byte lengths
outside the 64 bit range, it is rare to need to encode values outside the range,
and if it were desirable to encode integer-like values larger than this (for
example, 128-bit UUIDs) it is more efficient to represent them in
length-delimited values, which take 1 extra byte to represent their size. For
these reasons, in the Bilrost varint encoding we do not encode this trailing
zero byte.

##### Example varint values and algorithms

<details><summary>Some examples of encoded varints</summary>

| Value                   | Bytes (decimal)                                 |
|-------------------------|-------------------------------------------------|
| 0                       | `[0]`                                           |
| 1                       | `[1]`                                           |
| 101                     | `[101]`                                         |
| 127                     | `[127]`                                         |
| 128                     | `[128, 0]`                                      |
| 255                     | `[255, 0]`                                      |
| 256                     | `[128, 1]`                                      |
| 1001                    | `[233, 6]`                                      |
| 16511                   | `[255, 127]`                                    |
| 16512                   | `[128, 128, 0]`                                 |
| 32895                   | `[255, 255, 0]`                                 |
| 32896                   | `[128, 128, 1]`                                 |
| 1000001                 | `[193, 131, 60]`                                |
| 1234567890              | `[150, 180, 252, 207, 3]`                       |
| 987654321123456789      | `[149, 237, 196, 218, 243, 202, 181, 217, 12]`  |
| 12345678900987654321    | `[177, 224, 156, 226, 204, 176, 169, 169, 170]` |
| (maximum `u64`: 2^64-1) | `[255, 254, 254, 254, 254, 254, 254, 254, 254]` |

</details>

<details><summary>Varint algorithm</summary>

The following is Python example code, written for clarity rather than
performance:

```python
from typing import Iterable


def encode_varint(n: int) -> bytes:
    assert 0 <= n < 2**64
    bytes_to_encode = []
    # Encode up to 8 preceding bytes
    while n >= 128 and len(bytes_to_encode) < 8:
        bytes_to_encode.append(128 + (n % 128))
        n = (n // 128) - 1
    # Always encode at least one byte
    bytes_to_encode.append(n)
    return bytes(bytes_to_encode)


def decode_varint_from_byte_iterator(it: Iterable[int]) -> int:
    n = 0
    for byte_index, byte_value in enumerate(it):
        assert 0 <= byte_value < 256
        n += byte_value * (128**byte_index)
        if byte_value < 128 or byte_index == 8:
            # Varints encoding values greater than 64 bits MUST be rejected
            if n >= 2**64:
                raise ValueError("invalid varint")
            return n
    # Reached end of data before the end of the varint
    raise ValueError("varint truncated")
```

</details>
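
For readers working in Rust, the same algorithm can be sketched as follows.
This is a port of the Python example written for clarity, not `bilrost`'s own
implementation; the names `encode_varint`, `decode_varint`, and `VarintError`
are chosen for illustration.

```rust
/// Reasons a varint may fail to decode (illustrative, not bilrost's type).
#[derive(Debug, PartialEq, Eq)]
enum VarintError {
    /// The data ended before the final byte of the varint.
    Truncated,
    /// The encoded value is greater than 2^64-1.
    Overflow,
}

/// Encodes `n` as one to nine bytes.
fn encode_varint(mut n: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(9);
    // Encode up to 8 preceding bytes, each with its continuation bit set.
    while n >= 128 && out.len() < 8 {
        out.push(0x80 | (n % 128) as u8);
        n = n / 128 - 1;
    }
    // Always encode at least one byte. After eight preceding bytes the
    // remainder is at most 254, so the ninth byte may use all 8 bits.
    out.push(n as u8);
    out
}

/// Decodes one varint from the front of `bytes`, returning the value and
/// the number of bytes consumed.
fn decode_varint(bytes: &[u8]) -> Result<(u64, usize), VarintError> {
    let mut n: u64 = 0;
    for (i, &b) in bytes.iter().enumerate() {
        // The loop returns at index 8 at the latest, so the shift is at most
        // 56 bits and the shifted byte itself always fits in a u64.
        let term = u64::from(b) << (7 * i);
        n = n.checked_add(term).ok_or(VarintError::Overflow)?;
        if b < 128 || i == 8 {
            return Ok((n, i + 1));
        }
    }
    Err(VarintError::Truncated)
}
```

As in the Python version, the decoder sums each byte's full unsigned value
(including the continuation bit) at its position, which is what makes every
value have exactly one encoding.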

### Standard interpretation

To make the encoding useful, these opaque values have standard interpretations
for many common data types.

*The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this section are to be
interpreted as described in [RFC 2119][rfc2119].*

[rfc2119]: https://www.ietf.org/rfc/rfc2119.txt

In general, whenever a decoded value represents a value that is outside the
domain of the type of the field it is being decoded into (for instance, when the
field type is `u16` but the value is a million, or when the field type is an
enumeration and there is no corresponding variant of the enumeration) the
decoding must be rejected with an error in any decoding mode.

Unsigned integers represented as varints are interpreted exactly. The varint
encoding of the number 10 has the same meaning in `u8`, `u16`, `u32`, and `u64`
field types.

Signed integers represented as varints are always [zig-zag encoded][zigzag],
with the sign of the number denoted in the least significant bit. Thus,
non-negative integers are translated to unsigned for encoding by doubling them,
and negative integers are translated by negating, then doubling, then
subtracting one.

[zigzag]: https://en.wikipedia.org/wiki/Variable-length_quantity#Zigzag_encoding
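
A minimal sketch of this translation for 64 bit values is shown below. The
function names are illustrative rather than part of `bilrost`'s API, and the
bit-twiddling form is used because it handles `i64::MIN` without overflow; it
computes the same result as the doubling rule described above.

```rust
/// Maps a signed integer to the unsigned value that is encoded as a varint:
/// non-negative `n` becomes `2 * n`, negative `n` becomes `2 * (-n) - 1`.
fn zigzag_encode(n: i64) -> u64 {
    // `n >> 63` is an arithmetic shift: all ones for negative `n`, else zero.
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag_encode`: the least significant bit carries the sign.
fn zigzag_decode(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}
```

The mapping does not depend on the width of the integer type, so an in-range
value has the same representation whether the field is an `i16` or an `i64`.
It also shows why a `bool` widened to a signed integer reads `true` (the varint
value 1) as -1.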

Booleans use the varint value 0 for `false`, and 1 for `true`.

Unsigned integers encoded in fixed-width must be encoded in little-endian
byte order; signed integers must likewise be encoded in little-endian byte
order, and must have a [two's complement][twos] representation.

[twos]: https://en.wikipedia.org/wiki/Two%27s_complement

Floating point numbers must be encoded in little-endian byte order, and must
have [IEEE 754 binary32/binary64][ieee754] standard representation. Floating
point numbers are encoded as four- and eight-byte fixed-width values.

Arrays, plain byte strings, and collections must be encoded in order, with their
lowest-indexed (first) bytes or items encoded first. For example, the
fixed-width encodings of the `u8` array `[1, 2, 3, 4]` and the 32 bit unsigned
integer `0x04030201` (67305985) are identical.

<details><summary>Demonstration of the above</summary>

```rust
use bilrost::Message;

#[derive(Message)]
struct Foo<T>(#[bilrost(encoding(fixed))] T);

// Both of these messages encode as the bytes `b"\x06\x01\x02\x03\x04"`
assert_eq!(
    Foo(0x04030201u32).encode_to_vec(),
    Foo([1u8, 2, 3, 4]).encode_to_vec(),
);
```

</details>

String values must always be valid UTF-8 text, containing the canonical encoding
for some sequence of Unicode codepoints. Codepoints with over-long encodings and
surrogate codepoints should be rejected with an error in any decoding mode, and
must be considered non-canonical. Bilrost does not impose any restrictions on
the ordering or presence of valid non-surrogate codepoints; it may be desirable
in an application to constrain text to a canonicalized form (such as
[NFC][uninormal]), but that should be considered outside the scope of Bilrost's
responsibilities of *encoding and decoding* and instead part of *validation,*
which is the responsibility of the application.

[uninormal]: https://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms
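
As an illustration of where this line falls, the sketch below uses the Rust
standard library's `std::str::from_utf8`, which rejects over-long encodings and
surrogate codepoints but performs no normalization. The helper name
`decode_text` is hypothetical, and this is not a claim about how `bilrost`
itself performs validation.

```rust
/// Interprets a byte-string value as text, failing on anything that is not
/// well-formed UTF-8.
fn decode_text(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Two spellings of "é": precomposed (NFC) and decomposed (NFD). Both are
/// valid UTF-8, and they are different byte strings.
fn e_acute_spellings() -> (&'static str, &'static str) {
    ("\u{e9}", "e\u{301}")
}
```

Here `decode_text(b"\xC0\x80")` (an over-long encoding of U+0000) and
`decode_text(b"\xED\xA0\x80")` (the surrogate U+D800) both fail, while both
spellings of "é" decode successfully and remain distinct values. Deciding
whether to accept only one of them is the application's validation step.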

Nested messages should be represented as a length-delimited value containing
the bytes of that message's encoding. There cannot be any extra bytes following
that value, and nested messages' validity must include the results of decoding
every byte of the value.

Collections of items (such as `Vec<String>`) encoded in the unpacked
representation consist of one field for each item. Collections encoded in the
packed representation consist of a single length-delimited value, containing
each item's value encoded one after the other. In relaxed decoding mode,
decoding should succeed when expecting a packed representation but detecting an
unpacked representation, or vice versa (though the encoding must be considered
non-canonical). Detecting this situation is only possible when the values
themselves never have a length-delimited representation, in which case the
wire-type of the field can be used to distinguish the two cases.
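
The difference between the two representations can be sketched by hand for a
collection of unsigned integers with varint encoding. The code below follows
the rules stated in this specification (keys, tag-deltas, and varints) rather
than calling into `bilrost`; all function names are illustrative, and both
functions assume `tag >= prev_tag`.

```rust
/// Appends the Bilrost varint encoding of `n` to `out`.
fn put_varint(mut n: u64, out: &mut Vec<u8>) {
    let start = out.len();
    while n >= 128 && out.len() - start < 8 {
        out.push(0x80 | (n % 128) as u8);
        n = n / 128 - 1;
    }
    out.push(n as u8);
}

/// Unpacked: one field per item, each with wire-type 0 (varint). Only the
/// first key carries a nonzero tag-delta; the repeats have a delta of zero.
fn encode_unpacked(prev_tag: u32, tag: u32, items: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut last = prev_tag;
    for &item in items {
        put_varint(u64::from(tag - last) * 4, &mut out);
        put_varint(item, &mut out);
        last = tag;
    }
    out
}

/// Packed: a single field with wire-type 1 (length-delimited) whose value is
/// the items' encodings one after the other. An empty collection is omitted.
fn encode_packed(prev_tag: u32, tag: u32, items: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    if items.is_empty() {
        return out;
    }
    let mut payload = Vec::new();
    for &item in items {
        put_varint(item, &mut payload);
    }
    put_varint(u64::from(tag - prev_tag) * 4 + 1, &mut out);
    put_varint(payload.len() as u64, &mut out);
    out.extend_from_slice(&payload);
    out
}
```

For the items `[1, 2, 300]` in the first field of a message with tag 1, the
unpacked form is `[4, 1, 0, 2, 0, 172, 1]` and the packed form is
`[5, 4, 1, 2, 172, 1]`. The first key byte differs only in its wire-type, which
is what lets a relaxed decoder tell the two apart.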

Sets (collections of unique values) are encoded and decoded in exactly the same
form as non-unique collections. If a value in a set appears more than once when
decoding, the message must be rejected with an error in any decoding mode. The
items must be in [canonical order](#canonical-ordering) for the encoding to be
considered canonical.
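
The two rules for sets (duplicates are always an error, ordering only affects
canonicity) can be sketched as a check over the decoded items. The names below
are invented for illustration, and Rust's `Ord` is used as a stand-in for the
canonical ordering, which is an assumption that holds for integers but should
be checked against the ordering table for other types.

```rust
use std::collections::BTreeSet;

/// Outcome of checking the items decoded for a set field (illustrative).
#[derive(Debug, PartialEq, Eq)]
enum SetCheck {
    /// Items are unique and in ascending order.
    Canonical,
    /// Items are unique but out of order: acceptable in relaxed decoding.
    NonCanonical,
    /// An item appeared more than once: an error in any decoding mode.
    Invalid,
}

fn check_decoded_set<T: Ord>(items: &[T]) -> SetCheck {
    let mut seen = BTreeSet::new();
    let mut sorted = true;
    let mut prev: Option<&T> = None;
    for item in items {
        // `insert` returns false if the item was already present.
        if !seen.insert(item) {
            return SetCheck::Invalid;
        }
        if let Some(p) = prev {
            if p > item {
                sorted = false;
            }
        }
        prev = Some(item);
    }
    if sorted {
        SetCheck::Canonical
    } else {
        SetCheck::NonCanonical
    }
}
```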

Mappings are represented as a length-delimited value, containing alternately
encoded keys and values for each entry in the mapping. Keys must be distinct,
and if a map is found to have two equivalent keys the message must be rejected
with an error in any decoding mode. In distinguished decoding mode, the entries
in the mapping must be encoded in [canonical order](#canonical-ordering) for the
encoding to be considered canonical.

Any field whose value is [empty](#empty-values) should always be omitted from
the encoding. The presence of any field represented in the encoding with an
empty value must cause the encoding to be considered non-canonical.

Fields whose types do not encode into multiple fields must not occur more than
once. If they do, the message must be rejected with an error in any decoding
mode. This currently includes every type of field not encoded with an unpacked
representation.

Oneofs, sets of mutually exclusive fields, must not have conflicting values
present in the encoding. If they do, the message must be rejected with an error
in any decoding mode.

If a field whose tag is not known/specified in the message's schema is
encountered in relaxed decoding mode, it should be ignored for purposes of
decoding.

#### Distinguished constraints

In distinguished decoding mode, in addition to the above constraints on value
ordering in sets and mappings, all values must be represented in exactly the way
they would encode. If an [empty](#empty-values) value is found to be represented
in the encoding, the message is not canonical. (In the case of an optional
field, `Some(0)` is not considered empty, and is distinct from the always-empty
value `None`; this is the purpose of optional fields.)

Also in distinguished mode, if fields whose tags are not in the message's schema
are encountered the encoding can no longer be considered canonical.

#### Empty values

The type of each field of a Bilrost message has an "empty" value, which is never
represented as encoded data on the wire.

| Type                                                  | Empty value                        |
|-------------------------------------------------------|------------------------------------|
| boolean                                               | false                              |
| any integer                                           | 0                                  |
| any floating point number                             | exactly +0.0                       |
| fixed-size byte array                                 | all zeros                          |
| text string, byte string, collection, mapping, or set | containing no bytes or items       |
| tuples `(A, B, C, ...)`                               | each item is empty                 |
| arrays `[T; N]`                                       | each item is empty                 |
| `Enumeration` type                                    | the variant represented by 0       |
| `Message`                                             | each field of the message is empty |
| `Oneof`                                               | `None` or the empty variant        |
| any optional value (`Option<T>`)                      | `None`                             |

The empty byte string is always a valid and canonical encoding of any Bilrost
message type, and represents the value of the message in which every field has
its empty value.
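
A few rows of the table can be modeled with a small trait. This is only a sketch: `IsEmpty` and `is_empty_value` are hypothetical names chosen for illustration and are not taken from the crate's API.

```rust
/// Hypothetical trait modeling the "empty value" rule for a few types.
pub trait IsEmpty {
    fn is_empty_value(&self) -> bool;
}

impl IsEmpty for bool {
    fn is_empty_value(&self) -> bool { !*self }
}
impl IsEmpty for u8 {
    fn is_empty_value(&self) -> bool { *self == 0 }
}
impl IsEmpty for u32 {
    fn is_empty_value(&self) -> bool { *self == 0 }
}
impl IsEmpty for f64 {
    // Only exactly +0.0 is empty: compare the bit pattern so that -0.0
    // (which equals 0.0 under `==`) is not treated as empty.
    fn is_empty_value(&self) -> bool { self.to_bits() == 0 }
}
impl IsEmpty for String {
    fn is_empty_value(&self) -> bool { self.is_empty() }
}
impl<T> IsEmpty for Vec<T> {
    fn is_empty_value(&self) -> bool { self.is_empty() }
}
impl<T> IsEmpty for Option<T> {
    // `Some(0)` is not empty; only `None` is.
    fn is_empty_value(&self) -> bool { self.is_none() }
}
impl<A: IsEmpty, B: IsEmpty> IsEmpty for (A, B) {
    fn is_empty_value(&self) -> bool {
        self.0.is_empty_value() && self.1.is_empty_value()
    }
}
impl<T: IsEmpty, const N: usize> IsEmpty for [T; N] {
    fn is_empty_value(&self) -> bool {
        self.iter().all(|item| item.is_empty_value())
    }
}
```

Under this model, `(-0.0f64).is_empty_value()` and `Some(0u32).is_empty_value()` are both `false`, matching the "exactly +0.0" and `Option<T>` rows.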

#### Canonical ordering

For supported non-message types, the following orderings are standardized:

| Type                                 | Standard ordering                                                                     |
|--------------------------------------|---------------------------------------------------------------------------------------|
| boolean                              | false, then true                                                                      |
| integer                              | ascending numeric value                                                               |
| text string, byte string, byte array | [lexicographically][lex] ascending, by bytes or UTF-8 bytes[^u8bytes]                 |
| tuple                                | lexicographically ascending, by nested values                                         |
| array                                | lexicographically ascending, by nested values                                         |
| collection (vec)                     | lexicographically ascending, by nested values                                         |
| unordered collection (set)           | lexicographically ascending, by ascending nested values                               |
| mapping                              | lexicographically ascending, by ascending keys alternating key-then-value             |
| floating point number                | [(not specified, nor recommended)](#floating-point-values-and-distinguished-decoding) |
| `Enumeration` types                  | [(not specified)](#canonical-order-and-distinguished-representation)                  |
| `Message` types                      | [(not specified)](#canonical-order-and-distinguished-representation)                  |
| `Option<T>`                          | (not applicable, cannot repeat)                                                       |
| `Oneof` types                        | (not applicable, not a single value, cannot repeat)                                   |

[lex]: https://en.wikipedia.org/wiki/Lexicographic_order

[^u8bytes]: Bytes are considered to be unsigned. The least-valued byte is the
nul byte `0x00`, and the greatest is `0xff`.

This standardization corresponds to the existing definitions of [`Ord`][ord] in
the Rust language for booleans, integers, strings, arrays/slices, ordered sets,
and ordered maps.
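
Because the standardized orderings match Rust's own `Ord`, they can be checked with the standard library alone. The helper names below (`canonical_sort`, `ordering_examples`) are hypothetical and exist only for this sketch.

```rust
use std::collections::BTreeSet;

/// Sort values into the standardized order using the type's own `Ord`.
pub fn canonical_sort<T: Ord>(mut items: Vec<T>) -> Vec<T> {
    items.sort();
    items
}

/// A few comparisons corresponding to rows of the table above.
pub fn ordering_examples() {
    // boolean: false, then true
    assert!(false < true);
    // text strings compare by UTF-8 bytes, so uppercase ASCII sorts first
    assert!("Zebra" < "apple");
    // a strict prefix sorts before any longer string
    assert!("a" < "ab");
    // multi-byte UTF-8 sequences sort after all ASCII
    assert!("z" < "é");
    // bytes are unsigned: 0x00 is least and 0xff is greatest
    assert!([0x00u8, 0xff] < [0x01u8, 0x00]);
    // collections and tuples compare lexicographically by nested values
    assert!(vec![1, 2] < vec![1, 2, 0]);
    assert!((1, "b") < (2, "a"));
    // sets compare by their items in ascending order: {1, 2} < {1, 3}
    assert!(BTreeSet::from([2, 1]) < BTreeSet::from([3, 1]));
}
```

For example, `canonical_sort(vec!["b", "a", "B"])` returns `["B", "a", "b"]`.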

## `bilrost` vs. `prost`

`bilrost` is a direct fork of the `prost` crate, though it has been mostly
rewritten since then. Both libraries are designed for largely the same purpose,
but have different capabilities and have strengths in different situations.

[`prost`][p] is an implementation of [Protobuf][pb], and as a consequence it
brings many concerns and heavy tooling of that ecosystem with it, for better and
for worse. Protobuf messages are specified by a dedicated schema file, and the
code that implements those types is then usually automatically generated.
`prost` has tooling to do this via the "protoc" compiler; other implementations
variously do the same thing or reimplement complete parsers for that [DSL][dsl].

[dsl]: https://en.wikipedia.org/wiki/Domain-specific_language

`bilrost` by comparison is an implementation of a new encoding that isn't
compatible with Protobuf. If Protobuf isn't specifically required, consider the
[tradeoffs](#what-bilrost-and-the-library-wont-do) and [comparison](
#differences-from-protobuf) to the Protobuf encoding.

The code generated by `prost-build` is relatively messy and explicit, at least
when compared to handwritten code. This generated code in turn uses derive
macros to generate the more complex parts of the implementation, so the
generated code can in theory be committed and modified too, but it's not
significantly more flexible used this way.

`bilrost` refactors the encoding implementations to use trait-based dispatch
instead of explicit implementations that have to be selected for each field
type. This allows `bilrost` to have very broad type support without requiring
explicit annotations on most fields, and makes it very comfortable and easy to
use without any generated code other than the derive macros. (This same
trait-based dispatch could be back-ported to `prost` to make it easier to use,
but it might be a significant API break.)

`bilrost` has also implemented a few requested features not yet available in
`prost`:

* message fields can be [ignored via attribute](#ignoring-fields)
* implementations are available for `no_std`-compatible hash maps, vecs that
  inline short values, `ByteString`, etc.
* message traits are dyn-compatible and provide [full functionality as trait
  objects](#using-dyn-with-message-traits). At time of writing, `prost 0.13.4`
  has very little functionality exposed in a dyn-compatible way; the only
  methods usable via a `&dyn Trait` object compute the encoded length of the
  message and clear its fields.

## Differences from Protobuf

The Bilrost encoding is heavily based upon that of Protobuf, with a small number
of key changes.

* Bilrost supports more types
* Bilrost is slightly more compact
* Bilrost has first-class support for distinguished canonical encoding
* Bilrost removes some mistake-prone choices
* Bilrost does not have a giant ecosystem

<details><summary>In greater detail</summary>

* The varint encoding is different: Bilrost varints are bijective (having only
  one possible representation per value) and have a shorter maximum length, as
  it doesn't make sense to extend the encoding beyond 64 bit integers.

  Despite Protobuf varints being nominally simpler (since they directly
  transpose the bits of the encoding into the final value), it is difficult to
  impossible to realize this simplicity as improved performance in reality.
  Almost all of the cost on modern computing hardware is consumed by the
  fact that the values are a variable number of bytes in size.

  Protobuf varints are also subject to zero-extension, because they are not
  bijective. This is a recurring problem whenever attempts are made to guarantee
  canonical representation in Protobuf data, and requires extra care.
* Messages are only representable with their fields in ascending tag order,
  something Protobuf has declined to enforce or guarantee for decades and
  probably won't begin any time soon.

  Compliant Protobuf implementations allow several interesting operations by not
  guaranteeing or enforcing field order:
    * Unknown fields can be preserved as entirely opaque runs of bytes and
      concatenated to a message
    * Concatenating fields to a message has a *merge* semantic: singular fields'
      values are replaced (or merged, if they are messages), and repeated fields
      are appended to. This means that sometimes messages can be blindly
      concatenated with patches that override some of their fields.

  By guaranteeing field order in Bilrost, these (vanishingly rarely used or
  wanted) abilities are lost, but several powerful advantages are gained:
    * It is always trivially obvious when a field occurs more than once in a
      message when it shouldn't. No decisions need to be made or special checks
      performed to handle this case.
    * If desired (it probably isn't), it is even possible to enforce the
      required presence of particular fields in the encoding at run-time without
      maintaining presence data for those fields when decoding.

  Another hidden benefit of the obligate field ordering is that, because field
  tags are encoded as deltas, messages with very large numbers of fields are
  significantly smaller to encode. Protobuf field keys with tags above 15 always
  take multiple bytes to encode; in Bilrost, the only time a field key takes
  more than a single byte is when more than 31 tags have been skipped in a row.
* Fields' tags are less constrained. In Protobuf field tags are restricted to
  the range [1, 2^29-1]; in Bilrost we have made the decision to allow any
  unsigned 32 bit integer as a tag number.
* Protobuf uses three bits in field keys for the wire type, and has six of these
  wire types allocated; two are used as data-less delimiting markers for
  "groups", which are a legacy and long-deprecated method of nesting data within
  messages.

  In nearly twenty years, the Protobuf authors have never found cause to
  populate the final two unallocated wire types, which gives us at least some
  measure of confidence that the four that Bilrost has borrowed are sufficient
  for practical use.
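
To make the zero-extension and key-size points concrete, here is a minimal sketch of the Protobuf side only. It implements the standard LEB128-style varint that Protobuf uses; the function names are hypothetical, and nothing here is Bilrost's own varint or field key format.

```rust
/// Decode a Protobuf-style (LEB128) varint.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends early or runs past ten bytes.
pub fn decode_leb128(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in bytes.iter().enumerate().take(10) {
        // The low seven bits of each byte are placed directly into the value.
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Length in bytes of the shortest LEB128 encoding of `value`.
pub fn leb128_len(mut value: u64) -> usize {
    let mut len = 1;
    while value >= 0x80 {
        value >>= 7;
        len += 1;
    }
    len
}

/// Encoded size of a Protobuf field key, which is `(tag << 3) | wire_type`.
pub fn protobuf_key_len(tag: u32, wire_type: u8) -> usize {
    leb128_len((u64::from(tag) << 3) | u64::from(wire_type & 0x7))
}
```

Here `decode_leb128(&[0x01])` and `decode_leb128(&[0x81, 0x00])` both decode to the value 1, which is the zero-extension problem described above, and `protobuf_key_len(16, 0)` is 2 while `protobuf_key_len(15, 0)` is 1.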

There are also several key changes to how values are interpreted in Bilrost,
informed by experience with Protobuf:

* Bilrost representations of signed integers are always zig-zag encoded. In
  Protobuf there are two different modes for signed integers: "int32" is always
  encoded like two's complement, and "sint32" is zig-zag encoded. In practice
  the plain two's complement encoding is a tremendous footgun, because any
  negative integer always becomes *ten bytes* on the wire. Yes, even the 32 bit
  ones, because they are sign-extended all the way to 64 bits in case the field
  is to be widened in the future.
* Learning again from the footguns and mistakes of Protobuf (and C/C++ in
  general), Bilrost also enforces errors when values are out of range. Protobuf
  values will silently coerce to smaller types by truncation during decoding,
  and any nonzero varint will silently convert to the boolean value `true`. This
  is often surprising, bug-prone, and undesirable.
* `bilrost` makes special effort to preserve every bit of floating-point numbers
  when they are encoded and decoded. Whenever possible this should be matched by
  Bilrost libraries for other languages.
* Bilrost is much more permissive of nested values. Length-delimited values are
  permitted to be encoded in a "packed" representation, with warnings to the
  user; this allows nesting vecs within vecs, maps within maps, and more without
  creating explicit sub-message schemas for every single level of nesting.
* Bilrost has first-class mappings. Maps in Protobuf are a construct of unpacked
  repeated values that are nested sub-messages with keys and values in fields
  tagged 1 and 2, a situation whose official field types and APIs came long
  after it was already in production. Protobuf also to this day forbids byte
  strings as map keys, for unclear reasons possibly relating to the usage of
  nul-terminated C-strings as the representation of map keys in some
  implementations.

  Because Bilrost maps are packed into a single length-delimited value, they can
  freely have optional presence or be repeated or nested at will.
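
The first two points can be illustrated with the standard library alone. This is a sketch, not the crate's code: the function names are hypothetical, the byte lengths are computed for Protobuf-style varints (Bilrost varint lengths may differ), and `strict_bool` assumes that only 0 and 1 count as in-range boolean values.

```rust
use std::convert::TryFrom;

/// Zig-zag encoding: small magnitudes of either sign map to small values.
pub fn zigzag_encode(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
pub fn zigzag_decode(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}

/// Length in bytes of a Protobuf-style (LEB128) varint.
pub fn leb128_len(mut value: u64) -> usize {
    let mut len = 1;
    while value >= 0x80 {
        value >>= 7;
        len += 1;
    }
    len
}

/// Protobuf "int32" style: sign-extend to 64 bits, then encode as a varint.
pub fn twos_complement_len(n: i32) -> usize {
    leb128_len(n as i64 as u64)
}

/// Protobuf "sint32" style: zig-zag first, then encode as a varint.
pub fn zigzag_len(n: i32) -> usize {
    leb128_len(zigzag_encode(i64::from(n)))
}

/// Checked narrowing: an out-of-range value is an error, not a truncation.
pub fn narrow_to_u8(value: u64) -> Result<u8, std::num::TryFromIntError> {
    u8::try_from(value)
}

/// Strict boolean: anything other than 0 or 1 is rejected.
pub fn strict_bool(value: u64) -> Option<bool> {
    match value {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}
```

With these definitions `twos_complement_len(-1)` is 10 while `zigzag_len(-1)` is 1, and `narrow_to_u8(300)` is an error where the truncating cast `300u64 as u8` would silently produce 44.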

</details>

### Distinguished representation on the wire in `bilrost`

Leveraging the changes to varint representation and field order, Bilrost
standardizes easily-distinguishable canonical encodings for many message types.
Zero-extension of varints and unordered fields are the two main things that can
lead Protobuf encodings to vary for the same meaning, and most of what remains
involves enforcing that empty values are never encoded, packed/unpacked
collections have a matching representation, map keys are in sorted order, and
keeping track of whether any unknown fields exist in the encoding.

## Comparisons to other encodings

A very [incomplete][formats] comparison of various alternative encodings we
might consider.

[formats]: https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats

In addition to this general summary, benchmarks are now also available in
[`rust_serialization_benchmark`][bench].

[bench]: https://github.com/djkoloski/rust_serialization_benchmark

| Encoding             | Encoding complexity | Schemaless?           | Backwards/forwards compatible? | Human readable? | Canonical encodings?                          | Better than Bilrost                                                                   | Worse than Bilrost                                      |
|----------------------|---------------------|-----------------------|--------------------------------|-----------------|-----------------------------------------------|---------------------------------------------------------------------------------------|---------------------------------------------------------|
| Bilrost              | very low            | schemaful             | yes                            | no              | [yes](#distinguished-decoding)!               | 🌈                                                                                    | 🌈                                                      |
| [Protobuf][pb]       | almost as simple    | schemaful             | yes                            | no              | no                                            | big ecosystem, has a schema DSL                                                       | slightly less compact, more footguns, less type support |
| [ASN.1 DER][asn1]    | quite high          | schemaful             | yes                            | no              | [yes][asn1]                                   | highly standardized & validated canonicity                                            | painful to use & implement                              |
| [Cap'n Proto][capnp] | medium              | schemaful             | yes                            | no              | no                                            | very fast, supports zero-copy style decoding, schema DSL, lots of languages supported | less compact, heavily relies on generated types         |
| [Flatbuffers][flatb] | medium              | schemaful             | yes                            | no              | no                                            | very fast, supports zero-copy style decoding, schema DSL, lots of languages supported | less compact, heavily relies on generated types         |
| [rkyv][rkyv]         | ?                   | fixed to struct       | no                             | no              | ?                                             | extremely fast zero-copy archival encoding                                            | built for a very different purpose                      |
| [bincode][bincode]   | low                 | fixed to struct       | no                             | no              | ?                                             | faster, more compact                                                                  | not compatible when new fields are added                |
| [JSON][json]         | medium-low          | schemaless            | yes                            | yes             | [standardized][jsoncanon], might be supported | near-universal support, readability                                                   | less compact, more lossy, poor fit for many value types |
| [BSON][bson]         | medium              | schemaless            | yes                            | no              | no                                            | it's JSON but compact                                                                 | less compact, not canonical                             |
| [msgpack][msgpack]   | medium              | schemaless            | yes                            | no              | no                                            | it's JSON but compact                                                                 | less compact, not canonical                             |
| [CBOR][cbor]         | medium              | schemaless            | yes                            | no              | [yes][cborcanon]                              | standardized, it's JSON but compact                                                   | less compact                                            |
| [XML][xml]           | high                | philosophers disagree | yes                            | yes             | [apparently yes][xmlcanon]                    | you've heard of it, you know it, it's everywhere                                      | far less compact, an inelegant weapon from a bygone era |

[asn1]: https://www.itu.int/rec/T-REC-X.690/

[bincode]: https://docs.rs/bincode/latest/bincode/

[bson]: https://bsonspec.org/

[capnp]: https://capnproto.org/

[cbor]: https://cbor.io/

[cborcanon]: https://datatracker.ietf.org/doc/html/rfc8949#det-enc

[flatb]: https://flatbuffers.dev/

[json]: https://www.json.org/json-en.html

[jsoncanon]: https://datatracker.ietf.org/doc/html/rfc8785

[msgpack]: https://msgpack.org/index.html

[rkyv]: https://rkyv.org/

[xml]: https://www.w3.org/TR/xml/

[xmlcanon]: https://www.w3.org/TR/xml-c14n11/#XMLCanonicalization

## FAQ

1. **Why another one?**

Because I can make one that does what I want.

Protobuf, for all its power and grace, is burdened with decades of legacy in
both stored data and usage in practice that [prevent it from changing][hy].
Bizarre corner case behaviors in practice that were originally implemented out
of expediency have deeply ramified themselves into the official specification of
the encoding (such as how repeated presence of nested messages in a non-repeated
field merges them together, etc.).

[hy]: https://www.hyrumslaw.com/

With a careful approach to a newer standard, we can solve many of these problems
and make a very similar encoding that is far more robust against shenanigans and
edge cases with little overhead (if fields are unordered, detecting that they
have repeated requires overhead, but if they *must* be ordered it is trivial).
Along with this, with only a little more work, we also achieve inherent
canonicalization for our distinguished message types. Accomplishing the same
thing in Protobuf is an onerous task, and one I have almost never seen correctly
described in the wild. Quite a few people have, as the saying goes, tried and
died.

tl;dr: I had the conceit that I could make the Protobuf encoding better. For my
personal purposes, this is true. Perhaps the same will be true for you as well.

2. **Could the Bilrost encoding be implemented as a serializer for
   [Serde][se]?**

Probably not, though `serde` experts are free to weigh in. There are multiple
complications with trying to serialize Bilrost messages with Serde:

- Bilrost fields bear a numbered tag, and currently there appears to be no
  mechanism suitable for this in `serde`.
- Bilrost fields are also associated with a specific encoding, such as `general`
  or `fixed`, which may alter their representation. Purely trait-based dispatch
  will work poorly for this, especially when the values become nested within
  other data structures like maps and `Vec` and encodings may begin to look
  like `map<plainbytes, packed<fixed>>`.
- Bilrost messages must encode their fields in tag order, which may (in the case
  of `oneof` fields) vary depending on their value, and it's not clear how or if
  this could be solved in `serde`.
- Bilrost has both relaxed and distinguished decoding modes, and promises that
  encoding a message that implements distinguished decoding always produces
  canonical output. This may be beyond what is practical to implement.

Despite all this, it is possible to place `serde` derive tags onto the generated
types, so the same structure can support both `bilrost` and `Serde`.

[se]: https://serde.rs/

## Why "Bilrost?"

Protocol Buffers, originating at Google, took on the portmanteau "protobuf". In
turn, Protobuf for Rust became "prost".

To fork that library, one might call it... "Frost"? But that name is taken.
"Bifrost" is a nice name, and a sort of pun on "frost, 2"; but that is also
taken. "Bilrost" is another name for the original Norse "Bifrost", and it is
quite nice, so here we are.

## License

`bilrost` is distributed under the terms of the Apache License (Version 2.0).

See [LICENSE](./LICENSE) & [NOTICE](./NOTICE) in the source for details, or the
[license][ghlicense] and [notice][ghnotice] on GitHub.

[ghlicense]: https://github.com/mumbleskates/bilrost/blob/bilrost/LICENSE

[ghnotice]: https://github.com/mumbleskates/bilrost/blob/bilrost/NOTICE

Copyright 2023-2025 Kent Ross  
Copyright 2022 Dan Burkert & Tokio Contributors