- Maybe store separately:
  - headers
  - cookies
  - StatusLine
- Block size from 104 to 48
- Store Static and Vec in indexed structs
- Store size from 24 to 12
- Block size from 104 to 52 (or 48 to 24)
- Handle expect 100
- Handle CONNECT/Upgrade
- Handle cookies (split/join)
- 68% parsing
- 13% storage
- 8% process
- 11% ???
Unoptimized:
std::collections::VecDeque => 2400 MB/s
crate::storage::VecDeque => 2600 MB/s
crate::storage::VecDeque(NoClear) => 2850 MB/s
Optimized:
std::collections::VecDeque => 2700 MB/s
crate::storage::VecDeque => 3100 MB/s
crate::storage::VecDeque(NoClear) => 3300 MB/s
std::collections::VecDeque(NoCookie) => 3500 MB/s
crate::storage::VecDeque(NoCookie) => 4200 MB/s
crate::storage::VecDeque(NoClear/NoCookie) => 4400 MB/s
10000000001000000000100000000010000000001000000000100000000010000000001000000000100000000010000000002222
1000000000100000000010000000001000000000100000000022
100000000010000000002222
AAAAAAAAAAAA
// ASCII case-insensitive compare: OR-ing 0b0010_0000 (0x20) folds 'A'-'Z'
// onto 'a'-'z'; fine for header names, but it also conflates some symbol
// pairs (e.g. '[' and '{')
left.iter()
    .zip(right)
    .all(|(a, b)| *a | 0b00_10_00_00 == *b | 0b00_10_00_00)
[L1-L3] hardware RX queue DMA copy to kernel RX ring (indexed by skb)
[L4] skb rearranged in read kernel queue (no copy)
[L5] packet copied to userland, skb released (kernel RX ring freed)
[L6] packet decrypted into plaintext queue
[L7] packet copied to Kawa buffers
[L6] packet copied from Kawa to plaintext queue, encrypted into crypted queue
[L5] packet copied to kernel TX ring, skb pushed into write kernel queue
[L4-L2] skb scheduled by qdisc in qdisc queue
[L1] packet DMA copy to hardware TX queue, skb released (kernel TX ring freed)
rustls:
- read_tls:
- prepare_read: potentially grow buf Vec
- read: read max bytes in buf
- process_new_packets: push Vec<u8> to received_plaintext (Vec<Vec<u8>>)
overhead:
- context switch kernel/userspace
- packet copy kernel/userspace
- alloc skb
- per-packet interrupt
DPDK/mTCP:
- RSS: splits L2 streams across NIC queues, each forwarded to a bound CPU
- L2 frames stored in huge preallocated rings
- mTCP rearranges them across threads