- Maybe store separately:
  - headers
  - cookies
  - StatusLine
- Block size from 104 to 48
- Store Static and Vec in indexed structs
- Store size from 24 to 12
- Block size from 104 to 52 (or 48 to 24)
- Handle expect 100
- Handle CONNECT/Upgrade
- Handle cookies (split/join)
- 68% parsing
- 13% storage
- 8% process
- 11% ???
Unoptimized:
std::collections::VecDeque => 2400 MB/s
crate::storage::VecDeque => 2600 MB/s
crate::storage::VecDeque(NoClear) => 2850 MB/s
Optimized:
std::collections::VecDeque => 2700 MB/s
crate::storage::VecDeque => 3100 MB/s
crate::storage::VecDeque(NoClear) => 3300 MB/s
std::collections::VecDeque(NoCookie) => 3500 MB/s
crate::storage::VecDeque(NoCookie) => 4200 MB/s
crate::storage::VecDeque(NoClear/NoCookie) => 4400 MB/s
10000000001000000000100000000010000000001000000000100000000010000000001000000000100000000010000000002222
1000000000100000000010000000001000000000100000000022
100000000010000000002222
AAAAAAAAAAAA
// ASCII case-insensitive compare: OR-ing 0b0010_0000 (0x20) folds 'A'-'Z'
// onto 'a'-'z'; fine for header names, but it also conflates some symbol
// pairs (e.g. '[' and '{')
left.iter()
    .zip(right)
    .all(|(a, b)| *a | 0b00_10_00_00 == *b | 0b00_10_00_00)
[L1-L3] hardware RX queue DMA copy to kernel RX ring (indexed by skb)
[L4] skb rearranged in read kernel queue (no copy)
[L5] packet copied to userland, skb released (kernel RX ring freed)
[L6] packet decrypted into plaintext queue
[L7] packet copied to Kawa buffers
[L6] packet copied from Kawa to plaintext queue, encrypted into crypted queue
[L5] packet copied to kernel TX ring, skb pushed into write kernel queue
[L4-L2] skb scheduled by qdisc in qdisc queue
[L1] packet DMA copy to hardware TX queue, skb released (kernel TX ring freed)
rustls:
- read_tls:
- prepare_read: potentially grow buf Vec
- read: read max bytes in buf
- process_new_packets: push Vec<u8> to received_plaintext (Vec<Vec<u8>>)
overhead:
- context switch kernel/userspace
- packet copy kernel/userspace
- alloc skb
- per-packet interrupt
DPDK/mTCP:
- RSS: splits L2 streams across NIC queues, each forwarded to a bound CPU
- L2 frames stored in huge preallocated rings
- mTCP rearranges them across threads