# Development Notes

Random notes/thoughts while developing.

### byte order modeling

At first I had the ParselyRead trait look like this:

```rust
pub trait ParselyRead<Ctx>: Sized {
    fn read<T: ByteOrder, B: BitRead>(buf: &mut B, ctx: Ctx) -> ParselyResult<Self>;
}
```

but then I realized that when reading a struct, I had to specify the ByteOrder,
even though I planned to annotate that on the struct itself.  I guess it could
be argued that doing it on the struct _technically_ doesn't make sense, since
you could read a given struct from multiple places?  So maybe it _should_ be on
the read call, and then all other values inherit it?

If we go this direction, then I don't think having a byteorder attribute at the
parsely level makes sense?  And the trait definition can stay as it is.

--> Decided to go with this: it makes the most sense for the byteorder to be
determined at call time, not at definition time.  The byteorder will have to be
passed to the initial read call and then will propagate to all nested calls.
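
A minimal, runnable sketch of that call-time propagation (this uses the
`byteorder` crate directly; the `Inner`/`Outer` types and signatures are
stand-ins for illustration, not parsely's actual generated code):

```rust
use byteorder::{BigEndian, ByteOrder, LittleEndian};

struct Inner { value: u16 }
struct Outer { inner: Inner }

impl Inner {
    fn read<T: ByteOrder>(buf: &[u8]) -> Inner {
        Inner { value: T::read_u16(buf) }
    }
}

impl Outer {
    fn read<T: ByteOrder>(buf: &[u8]) -> Outer {
        // No per-field byte-order decision here: the caller's `T` flows
        // through to every nested read.
        Outer { inner: Inner::read::<T>(buf) }
    }
}

fn main() {
    assert_eq!(Outer::read::<BigEndian>(&[0x01, 0x02]).inner.value, 0x0102);
    assert_eq!(Outer::read::<LittleEndian>(&[0x01, 0x02]).inner.value, 0x0201);
}
```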

### 'assertion' attribute

I originally had a 'fixed' attribute that required a field be set to a specific
value, but I saw that I'd also had an 'assertion' attribute which seems to
cover the 'fixed' use case plus more (supporting not only a specific value, but
other types of assertions).  My original idea for 'fixed' might have been that
it would automatically use that fixed value for writing, but the field would
still need to be set in the struct, so I'm not sure it adds much.  We can still
enforce the assertion on write as well, so will try just 'assertion' for now.
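
For example (hypothetical usage; the closure syntax matches the assertion
examples later in these notes), the old `fixed = 2` becomes an equality
assertion, which generalizes to any other predicate on the read value:

```rust
#[derive(ParselyRead)]
struct Packet {
    // Previously: fixed = 2.  Expressed as an assertion instead.
    #[parsely_read(assertion = "|v: &u8| *v == 2")]
    version: u8,
}
```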

### passing through context values

I usually format packet "layers" like this:

```
struct Header { ... }

struct SomePacket { 
  header: Header,
  ...other fields
}
```

This makes it easy to progressively parse them: parse the header to figure out
how to parse the rest of the payload, etc.  But it means that part of a struct
has already been parsed/comes from somewhere else.  So, in addition to using
fields of that previously-parsed piece as context, the struct's read impl also
needs to know not to try to read that field from the buffer.

Should this try to leverage the more generic 'custom_reader' type attribute
(to come later)?  Maybe we can avoid that altogether and just defer to newtypes
and custom impls of ParselyRead for them?  Either way, I'm not sure we'd want
to re-use that for this use case, since that will return a result and this is
intended to be more of a direct "pass-through" assignment.  Could this even
require an Ident, not an Expr?  If it's built around this idea of
'from_context', then that would be fine.

Note on the current implementation of this: for consistency it wraps the
assignment in an 'Ok' and then the same 'with_context' that would be added to
any field read is added as well, even though it's totally redundant in this
case (because it's just a simple assignment).  They _could_ be given special
treatment and not have that added, but that'd require a unique code path and,
for now at least, I think it'd be better to keep things aligned.
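
A hypothetical sketch of what this could look like (the `required_context`
and `assign` attribute names are assumptions based on the ideas above, not
settled API):

```rust
#[derive(ParselyRead)]
#[parsely_read(required_context("header: Header"))]
struct SomePacket {
    // Already parsed elsewhere: passed through from context as a direct
    // assignment rather than being read from the buffer.
    #[parsely_read(assign = "header")]
    header: Header,
    // ...other fields are read from the buffer as usual
}
```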

### 'mapping' a field

My initial use case for a way to map a field (i.e. read a different type than
what's defined in the struct and then apply some mapping to it to arrive at the
final type) was for `String` fields in structs.  I took some inspiration from
Deku's implementation but still found it didn't work very well for reading a
`String` as a `Vec<u8>` and then mapping it because reading `Vec<u8>` itself
takes a fair bit of special handling (including additional attributes like
'count').  Deku appears to punt on this and recommends using custom readers for
these cases, which seems like it may make sense.  Will probably play with some
implementations of that and see how they feel.

An interesting scenario came up with regards to the 'write' map path vs the 'read':

Both sides want to take advantage of type inference to avoid having to make the
caller explicitly state types.  Assuming a struct like this:

```rust
#[derive(ParselyRead, ParselyWrite)]
struct Foo {
    #[parsely_read(map = "|v: u8| { v.to_string() }")]
    #[parsely_write(map = "|v: &str| { v.parse::<u8>() }")]
    value: String,
}
```

the generated read code can look like:

```rust
  let original_value = ::parsely_rs::ParselyRead::read::<T>(buf, ()).with_context(...)?;
  let value = (|v: u8| { v.to_string() })(original_value)
      .with_context(...)?;
```

In this case the `read` map function isn't ambiguous about its type, but even
if it were, it would be inferred fine because we eventually assign the result
to the field, which _does_ have an explicit type, so it can be inferred from
there.  The write side is trickier, mainly because I didn't want to force the
caller to explicitly return a result: I wanted to be able to handle a function
that returned either a raw value or a result that could be converted into a
ParselyResult.  The write code looks like this:

```rust
  let mapped_value = (|v: &str| { v.parse() })(&self.value)
      .with_context(...)?;
  ::parsely_rs::ParselyWrite::write::<T>(&mapped_value, buf, ())
      .with_context(...)?;
```

We need to somehow wrap the result of the function such that we end up with a
`ParselyResult<T>` regardless of whether it returned some raw type `T` or some
kind of `Result<T, E>`.

The first solution that comes to mind to try and do that is this:

```rust
pub trait IntoParselyResult<T> {
    fn into_parsely_result(self) -> ParselyResult<T>;
}

impl<T, E> IntoParselyResult<T> for Result<T, E>
where
    E: Into<anyhow::Error>,
{
    fn into_parsely_result(self) -> ParselyResult<T> {
        self.map_err(Into::into)
    }
}

impl<T> IntoParselyResult<T> for T {
    fn into_parsely_result(self) -> ParselyResult<T> {
        Ok(self)
    }
}
```

The problem is that for the case where the expression returns `Result<T, E>`
there's ambiguity: the compiler can't tell whether we want `ParselyResult<T>`
or `ParselyResult<Result<T, E>>`, so it complains that both implementations
apply.  I tried a bunch of different wrappers and macros to work around this
but couldn't come up with anything.  Finally my only idea was to tweak the
implementation for `T` slightly:

```rust
impl<T> IntoParselyResult<T> for T where T: ParselyWrite {
    fn into_parsely_result(self) -> ParselyResult<T> {
        Ok(self)
    }
}
```

Since we don't impl `ParselyWrite` for any `Result` type (and I don't think
we'd have a need to), this disambiguates the two cases enough that there's no
issue inferring which one to use.

In order to do this, though, we'll need to change the definition of
`ParselyWrite` to use associated types instead of generics, but I think that
may actually make more sense anyway.
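
A self-contained demo of the disambiguation trick (using a stand-in `Writable`
marker trait in place of `ParselyWrite` and `anyhow` for the error type; the
names other than `IntoParselyResult` are assumptions):

```rust
type ParselyResult<T> = anyhow::Result<T>;

pub trait IntoParselyResult<T> {
    fn into_parsely_result(self) -> ParselyResult<T>;
}

impl<T, E> IntoParselyResult<T> for Result<T, E>
where
    E: Into<anyhow::Error>,
{
    fn into_parsely_result(self) -> ParselyResult<T> {
        self.map_err(Into::into)
    }
}

// Stand-in for ParselyWrite: bounding the blanket impl by a trait that no
// Result type implements removes the method-resolution ambiguity.
pub trait Writable {}
impl Writable for u8 {}

impl<T: Writable> IntoParselyResult<T> for T {
    fn into_parsely_result(self) -> ParselyResult<T> {
        Ok(self)
    }
}

fn main() -> anyhow::Result<()> {
    // Raw value: resolved via the Writable impl.
    let a: ParselyResult<u8> = 1u8.into_parsely_result();
    // Result value: resolved via the Result impl, with no ambiguity.
    let b: ParselyResult<u8> = "2".parse::<u8>().into_parsely_result();
    assert_eq!(a? + b?, 3);
    Ok(())
}
```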

Is there an inherent "ordering" to all the attributes that we could always
apply in a consistent way?

* Context reads (assigning the context tuple variables into named variables
  defined by the 'required_context' attribute): I think these can always be
  first.
* Determining the type we'll actually read from the buffer: this will be the
  'nested' type if it's an Option or a Vec, for example.
* Context values that need to be passed to the read method of the type: this
  just needs to happen before the read call.
* Generating the actual read call: this needs to know the read type, the
  context, and whether it's a collection.
* Fixed values: if the fixed attribute is present, this needs to be appended
  to the read call in an and_then block.
* Error context: this needs to be added at the very end of the read call.
* Optional field: if the field is optional, then the entire read call above
  gets wrapped in a conditional block that checks whether the 'when' clause
  passes.

```
Reader {
  plain {
    type
  },
  collection {
    inner_type,
    count
  },
  assign {
    value
  },
  reader_func {
    ident
  },
  map {
    mapper
  }
}

wrapper {
  option {
    when,
    Reader,
  },
  assertion {
    assertion_expr
  }
}
```

```
plain read:
  <type>::read(...)

collection read:
  (0..COUNT_EXPR).map(|idx| {
    PLAIN_READ.with_context(|| idx...)
  })

option read:
  if WHEN_EXPR {
    Some(PLAIN_READ.with_context(...))
  } else {
    None
  }

assign from:
  Ok(value)

reader:
  READER::read(...)

// mapping is particularly special because we never explicitly identify the
// original type or the mapped type: we let the mapper function work to infer
// those for us so we can avoid that.  (technically knowing the mapped type
// isn't hard, as it's the type of the field...)
map:
  let original = ParselyRead::read(...).with_context(...)
  let mapped = (MAPPER)(original).with_context(...)
  Ok(mapped)
```

Assertion: currently the assertion is done as an and_then appended onto
whatever the output is (but _before_ the 'option' wrapper is added).
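
A runnable sketch of how the 'option' wrapper composes around a collection
read (an iterator stands in for the bit buffer and `anyhow` supplies the
errors; everything here is illustrative, not actual generated output):

```rust
use anyhow::Context;

type ParselyResult<T> = anyhow::Result<T>;

fn main() -> anyhow::Result<()> {
    let when_expr = true; // the 'when' clause
    let count_expr = 3u32; // the count expression
    let mut buf = vec![1u8, 2, 3].into_iter(); // stand-in for the buffer

    // option wrapper -> collection read -> plain read, inside out:
    let items: Option<Vec<u8>> = if when_expr {
        Some(
            (0..count_expr)
                .map(|idx| {
                    buf.next()
                        .ok_or_else(|| anyhow::anyhow!("buffer exhausted"))
                        .with_context(|| format!("reading element {idx}"))
                })
                .collect::<ParselyResult<Vec<u8>>>()?,
        )
    } else {
        None
    };
    assert_eq!(items, Some(vec![1, 2, 3]));
    Ok(())
}
```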

### 'Linked' fields

Oftentimes two fields are related, e.g. a length field and its corresponding
vector.  Sometimes these are two "peer" fields in the same type, and sometimes
they're spread across types (for example: the length field in an RTCP header
is a function of the size of the RTCP payload).  I believe this link should
impact both reading and writing:

When reading, it might influence what should be considered the "end" of a
buffer when trying to read the rest of a payload (even though the actual
buffer may go beyond it, e.g. in the case of compound RTCP or a subsequent
field).

When writing, it needs to go the other direction: the size of some payload
needs to be reflected "upward" into a length field.

For example, an RTCP BYE packet:

When reading:

First we read the header.  The length value from the header should be used to
create a bounded subcursor to be used when reading the payload, this way the
payload reading doesn't risk reading beyond where it should.  In this case,
this would take place "outside" of Parsely, since those two operations are
independent.  For 'peer' fields this is also already covered by something like
the 'count' attribute.  Will see if other cases emerge that require different
handling.

When writing:
Before writing the header, the length field should be sync'd to reflect the
proper length of the packet.  This makes me think we should prevent/discourage
this field from being written to manually.  Should parsely define methods to
access fields as well?

We _could_ do this using context for the write path:

```rust
#[parsely_write(required_context("payload_length_bytes: u32"))]
struct RtcpHeader { 
  ...
  // Ignore the current value of length_field (_v) completely and don't even
  // touch the field: just use the map function as a hook to calculate the
  // value that should be written
  #[parsely_write(map = "|_v: &u16| payload_length_bytes / 4 + 1")]
  length_field: u16
}

struct RtcpByePacket {
  #[parsely_write(context("self.length_bytes()"))]
  header: RtcpHeader,
  ...
}
```

This feels a little weird, though, since there's already an existing field that
models this length.  And I'm not sure how we'd take that context into account
when writing `length_field`; using `map`, maybe?

```rust
#[parsely_write(sync_args("payload_length_bytes: u32"))]
struct RtcpHeader { ... }

impl RtcpHeader {
  fn sync(&mut self, (payload_length_bytes,): (u32,)) {
    self.length_field = (payload_length_bytes / 4 + 1) as u16;
  } 
}

struct RtcpByePacket {
  #[parsely_write(sync("self.length_bytes()"))]
  header: RtcpHeader,
  ...
}
```

This is another option, where we denote that a field needs updating before
writing, so the `ParselyWrite` impl would first call `sync` on that field with
the given arguments before calling write on it.

This feels better, I think.  It does mean manual implementations have to
remember to call sync, though, whereas doing it via the write context would
guarantee the value gets passed.

--> My initial thought is that, although a separate method call means there's
something to possibly miss, the 'sync' style feels better, so going to give
that a shot.

So I took a look at this and it ends up still being pretty sticky.  One thing I
ran into is that `ParselyWrite::write` currently takes `&self`, but if `write` is
going to automatically call `sync` then it'd need `&mut self` and I'm not sure
I want to do that.  So maybe the call to `sync` will have to be manual and it
will just be generated to do the right things?  Going with that for now.

TODO: can we get rid of the ambiguity of a plain "Ok" in sync_func?  Could we
make it so that a plain value (no Ok needed) would also work?

Follow-up on this: I think that, when writing, the generated code should call
'sync' on every field, that way if the user forgets to set the 'sync_with'
attribute it'll cause a compile error (since the correct types won't be passed)
which will help eliminate some bugs.  For this to work, though, _every_ type
needs a sync method (including the built-ins), so we need a trait for this
that can be implemented for them.  Should this function be part of the
ParselyWrite trait?  Or something separate?  It'd be required for every
ParselyWrite type, so maybe just doing it in ParselyWrite is better?  But then
the sync_with args would need to be part of the ParselyWrite generics, which
isn't great...

...Do we ever use Ctx when writing?  Could the sync type be the equivalent of
the context for writing?  I don't think context is needed for writing (I've
seen no use cases of it so far, and it seems like any write context could be
covered by the sync args), but logically it seems like a use-case could exist
for it...

Follow-up:
I've separated out the sync code into a trait (`StateSync`).  I've also moved
the arguments to an associated type, which means we can make `StateSync` a
supertrait of `ParselyWrite`.  The idea is that, when generating the write,
`sync` will be called on every field so there's no chance of it being missed.

One issue with this is built-in types: they need a default impl so that the
compiler doesn't complain about calling `sync` on them.  But sometimes we need
to sync fields that are built-in types.  For example, the length field of the
RtcpHeader needs to be sync'd using the sync args.
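
A minimal sketch of how that could fit together (the trait and method names
follow these notes, but the associated-type shape, the `ParselyResult` alias
via `anyhow`, and the no-op built-in impl are assumptions):

```rust
type ParselyResult<T> = anyhow::Result<T>;

pub trait StateSync {
    type SyncCtx;
    fn sync(&mut self, ctx: Self::SyncCtx) -> ParselyResult<()>;
}

// Default no-op impl for a built-in, so generated write code can call sync
// on every field unconditionally.
impl StateSync for u16 {
    type SyncCtx = ();
    fn sync(&mut self, _ctx: ()) -> ParselyResult<()> {
        Ok(())
    }
}

struct RtcpHeader {
    length_field: u16,
}

// Roughly what #[parsely_write(sync_args("payload_length_bytes: u32"))]
// might expand to for a type with a linked field.
impl StateSync for RtcpHeader {
    type SyncCtx = (u32,);
    fn sync(&mut self, (payload_length_bytes,): (u32,)) -> ParselyResult<()> {
        self.length_field = (payload_length_bytes / 4 + 1) as u16;
        Ok(())
    }
}
```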

There are three aspects to synchronizing fields:

* `sync_args`: This is attached to a type definition and it describes the names
and types of the arguments that will be expected in that type's `StateSync`
impl.
* `sync_with`: This is attached to a field, and describes the values that will be
used when calling that field's `sync` method.
* `sync_func`: This is attached to a field and describes how _that field_
should be updated in the `sync` method of its surrounding type, using the args
passed in via `sync_args`.

The last one is confusing.  Maybe it's better named 'sync_expr'?

### The buffer type

Currently, ParselyRead takes any buffer that is BitRead and ParselyWrite takes
any buffer that is BitWrite.  The issue here is that BitRead and BitWrite are
very limited: they can't seek, they can't create sub-buffers, and they can't
tell how many bytes are remaining.  It would be nice to give crates a way to
add additional trait bounds on the buffer type so that they can use their own
types that implement BitRead/BitWrite but also provide other functionality.

This might look like:

```rust
#[derive(ParselyRead)]
#[parsely(buffer_type = "BitRead + Seek + Sliceable + Sized")]
struct MyStruct { ... }
```

where those values would be added as trait bounds in the ParselyRead impl, e.g.:

```rust
impl ParselyRead<()> for MyStruct {
  fn read<T: ByteOrder, B: BitRead + Seek + Sliceable + Sized>(buf: &mut B, ctx: ()) -> ParselyResult<Self> { ... }
}
```

Or, obviously, you could create a trait to encompass those:

```rust
trait PacketBuffer: BitRead + Seek + Sliceable + Sized { ... }
...

#[derive(ParselyRead)]
#[parsely(buffer_type = "PacketBuffer")]
struct MyStruct { ... }
```

Some thoughts:

* I think a crate would have to be very consistent with its use of this: mixing and matching buffer types could lead to problems
* Is it possible to write an 'alias' for a macro attribute?

### Post hooks

Now that custom buffers can be set, we're in better shape to handle things like
consuming/adding padding.  I was trying to think about how we might do that
from a parsely perspective:

At first I was thinking of a `padded` attribute which would denote that padding
should be consumed after reading a field and added after writing a field, but
the core parsely code only knows about BitRead/BitWrite, so it can't do padding
on its own.

So that means we need something more generic.  My next thought was a "post
hook" that would allow the caller to specify some operation that should happen
after the read/write.  For example:

```rust
#[derive(ParselyRead, ParselyWrite)]
#[parsely_read(buffer_type = "PacketBuffer")]
#[parsely_write(buffer_type = "PacketBufferMut")]
struct MyStruct {
  length: u8,
  #[parsely_read(count = "length", post = "buf.consume_padding()")]
  #[parsely_write(post = "buf.add_padding()")]
  data: Vec<u8>,
}
```

Where "consume_padding" and "add_padding" would be features of some custom
buffer type (PacketBuffer/PacketBufferMut here).

Maybe it should be "after"?

--> After the transition to Buf/BufMut, I think we can pull off handling
padding at the parsely/attribute level.  When reading, we can take the number
of remaining bytes in the buffer at the start of the read and again after it's
done, then compute how much padding we need to consume to get aligned.  This
assumes the buffer _starts_ aligned, but I think that's reasonable.

It's similar on the write side: take the remaining amount at the start and at
the end and add any padding necessary.
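
A runnable sketch of that bookkeeping on the read side (using the `bytes`
crate's `Buf` as a stand-in; the 4-byte alignment and the helper name are
assumptions):

```rust
use bytes::Buf;

// Compare remaining bytes before and after a field read to find how many
// were consumed, then skip forward to the next alignment boundary.
fn read_with_padding<B: Buf>(buf: &mut B, align: usize, field_len: usize) {
    let before = buf.remaining();
    buf.advance(field_len); // stand-in for the actual field read
    let consumed = before - buf.remaining();
    let pad = (align - (consumed % align)) % align;
    buf.advance(pad);
}

fn main() {
    let mut buf: &[u8] = &[1, 2, 3, 0, 5, 6, 7, 8];
    read_with_padding(&mut buf, 4, 3); // 3 data bytes + 1 pad byte consumed
    assert_eq!(buf.remaining(), 4);
}
```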

### Custom reader/writer functions

I had these originally, but I don't think they're necessary compared to just
writing a custom ParselyRead/ParselyWrite impl.  That approach will require a
newtype in some cases where you maybe wouldn't otherwise have one, but at this
point I don't think custom reader/writer functions feel worth it.

### Separate 'read/write' and 'read/write_with_context'

I notice I'm getting pretty tired of having to pass an empty tuple in places
where no context is needed.  Could we do two methods, where if no context is
needed it could be called without a context arg?  I wouldn't want a type that
_does_ need context to be able to call the one without it, though.  Maybe a
special extension trait that's implemented only for ParselyRead<()>?  But I
have the buffer type in the trait now too, and specifying that might not be
doable?  The extension trait impl would have to be dynamically generated per
type, maybe?
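
A sketch of the extension-trait idea (the trait names, the generic shape, and
the `anyhow` error type are all assumptions; notably, a blanket impl over the
buffer type parameter might avoid per-type generation):

```rust
pub trait ParselyRead<B, Ctx>: Sized {
    fn read_with_context(buf: &mut B, ctx: Ctx) -> anyhow::Result<Self>;
}

// The context-free shorthand only exists when Ctx = ().
pub trait ParselyReadExt<B>: ParselyRead<B, ()> {
    fn read(buf: &mut B) -> anyhow::Result<Self> {
        Self::read_with_context(buf, ())
    }
}

// Blanket impl: every context-free type gets `read` for free, while types
// that require context never do.
impl<B, T: ParselyRead<B, ()>> ParselyReadExt<B> for T {}
```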

### 'while' attribute for collections

I needed this in rtp to be able to keep reading until a buffer is empty, but
after writing the code I realized I had no good way to test it, because the
built-in buf doesn't provide anything that could be used with it, which makes
me think putting it in parsely might not make sense.  This use case _could_
have used the custom reader attribute, maybe?

### Using ParselyRead/ParselyWrite with types other than BitBuf/BitBufMut

For things like RTP parsing, we don't want to use the BitBuf/BitBufMut
abstractions because they limit the efficiency we can get compared to using
Bits/BitsMut directly.  It would be nice to be able to leverage
ParselyRead/ParselyWrite for other types as well, i.e. manually implementing
them for the RTP packet types.  The problem is that the trait bound
(BitBuf/BitBufMut) is built into the trait, so we can't really do that.  I
looked at changing the "buffer" type to be an associated type, but that has a
couple of problems:

* We can't do `type Buf = impl BitBuf` or `type Buf = dyn BitBuf`, so we lose
the ability to say "some BitBuf type".  We _could_ add another layer of
indirection here, but that's a bit of a bummer.
* Supporting multiple buffer types also makes me think that it'd be nice to be
able to provide different/optimized read/write impls for different types (e.g.
one for BitBuf and another when we know it's a Bits specifically or something)

--> look into a refactoring of the read/write traits to accomplish this

--> Once I cleaned up the design of BitBuf/BitBufMut/Bits/BitsMut, I found I
didn't really end up needing this.  So although it was kind of a neat trick,
it also added a lot of complexity (and was really verbose when you wanted to
use a different buffer type, which was annoying), so I ended up ripping it
out.

### Order of operations for different attributes

As of this writing, for a ParselyRead impl the following attributes are supported:

* Assertion (assert on a read value)
* Map (map a read value)
* Alignment (consuming data after a read to reach the given alignment)
* Context (pass values when calling a field's read method)
* Limit (denote how many values a collection field contains)
* Assignment (assign a field to a value instead of reading it)
* When (denote when an optional field should be read)

Some of these attributes are pretty orthogonal to the others (alignment,
context) but others could potentially "interact" in different ways depending on
some desired order.  For example: if 'map' and 'assertion' are present, should
the assertion be performed on the read value before or after the map function
was applied?  There is no right answer to that question, of course: a caller
could want it either way.  So the question is how to implement support for
that.  Probably the order of the attributes should be used to determine the
order things are applied:

This example should:

1. Read the value
1. Run the assertion on the 'raw' value
1. Run the mapping function

```rust
#[derive(ParselyRead)]
struct Foo {
  #[parsely_read(assertion = "|v: &u8| v % 2 == 0", map = "|v: u8| v * 2")]
  field_one: u8
}
```

This example should:

1. Read the value
1. Run the mapping function
1. Run the assertion on the mapped value

```rust
#[derive(ParselyRead)]
struct Foo {
  #[parsely_read(map = "|v: u8| v * 2", assertion = "|v: &u8| v % 2 == 0")]
  field_one: u8
}
```

Order-sensitive interactions of attributes:

* Assertion/Map: is the assertion run on the raw value or the mapped value?
* Map/Limit: is the map function run on individual items or the entire
  collection?  Not sure both are really needed.

Given the limited number of interactions, I think for now I'll do an implicit,
hard-coded order-of-operations:

1. When - In a way this has the highest precedence because it determines if
   _any_ of the read code for this field is run at all
1. Assignment
1. Limit - For collection fields this determines how many times the 'core' read
   is run.
1. (Context - This is really orthogonal so don't think it has much effect
   either way)
1. Assertion - this is the first thing done after a value has been read from
   the buffer.  It's always done on the 'raw' (un-mapped) value.
1. Map
1. (Alignment - this is 'outside' the scope of a field's specific read, so is
   also pretty orthogonal)
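
A runnable sketch of what that hard-coded order produces for a field with
'when', 'assertion', and 'map' (an iterator stands in for the bit buffer and
`anyhow` supplies the errors; all names are illustrative):

```rust
fn main() -> anyhow::Result<()> {
    let when_expr = true;
    let mut buf = [4u8].iter().copied(); // stand-in for the bit buffer

    // when -> read -> assertion (always on the raw value) -> map
    let field_one: Option<String> = if when_expr {
        let raw = buf.next().ok_or_else(|| anyhow::anyhow!("buffer exhausted"))?;
        anyhow::ensure!(raw % 2 == 0, "assertion failed on {raw}");
        Some((|v: u8| (v * 2).to_string())(raw))
    } else {
        None
    };
    assert_eq!(field_one.as_deref(), Some("8"));
    Ok(())
}
```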