difa 0.1.1

Parsing of Amateur Data Interchange Format (ADIF) files
# Heap Allocation Audit Report for ADIF Crate

## Executive Summary

Total Issues Identified: 32
- High Impact: 6 issues
- Medium Impact: 13 issues
- Low Impact: 13 issues

Hot Path Allocations per Record (estimated):
- Parsing: 3-5 allocations per field
- Writing: 3 allocations per field

Quick Win Opportunities: 8 issues fixable with minimal code changes

================================================================================
HIGH IMPACT ISSUES
================================================================================

1. FIELD NAME CASE CONVERSION IN PARSER (HOT PATH) ✅ DONE
Location: src/parse/mod.rs:97
Current:
  let name = String::from_utf8_lossy(name).to_lowercase();
Why it allocates:
  - from_utf8_lossy() returns Cow that may allocate
  - to_lowercase() always allocates new String
  - Called for EVERY field parsed
Optimization:
  let name = str::from_utf8(name)
      .map_err(|_| Self::invalid_tag(tag))?
      .to_ascii_lowercase();
Impact: HIGH - Called once per field in every record
Status: FIXED - Now uses str::from_utf8 and to_ascii_lowercase()
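
The fix above can be sketched as a standalone function. `normalize_field_name` is a hypothetical name used only for illustration, and the error type is simplified to `Utf8Error` rather than the crate's own error. Note that `to_ascii_lowercase()` still allocates one `String`; the saving comes from dropping the lossy-decoding copy and the Unicode-aware case mapping.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical helper: normalize a raw ADIF field name to lowercase.
fn normalize_field_name(raw: &[u8]) -> Result<String, Utf8Error> {
    // Validation only borrows the bytes; invalid UTF-8 becomes an error
    // instead of being replaced (and copied) as from_utf8_lossy would do.
    let name = str::from_utf8(raw)?;
    // One allocation for the owned result; only ASCII letters are mapped.
    Ok(name.to_ascii_lowercase())
}
```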

---

2. TYPE SPECIFIER CASE CONVERSION IN PARSER ✅ DONE
Location: src/parse/mod.rs:102
Current:
  let typ = typ.map(|t| String::from_utf8_lossy(t).to_lowercase());
Why it allocates:
  - Same pattern as field names
  - Type specifiers are typically single chars ('b', 'n', 'd', 't', 's')
Optimization:
  let typ = typ.map(|t| str::from_utf8(t)).transpose().map_err(|_| Self::invalid_tag(tag))?;
Impact: HIGH - Called once per typed field
Status: FIXED - Now uses str::from_utf8 (type lowercasing handled elsewhere)
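
As a rough illustration of the `map` + `transpose` pattern above, the following sketch decodes an optional type specifier without allocating. `decode_type` is a hypothetical name, and the error is left as `Utf8Error` instead of the crate's `invalid_tag` error.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical helper: decode an optional type specifier as borrowed UTF-8.
fn decode_type(typ: Option<&[u8]>) -> Result<Option<&str>, Utf8Error> {
    // Option<Result<&str, _>> becomes Result<Option<&str>, _>, so a single `?`
    // at the call site handles the error case. Nothing is copied.
    typ.map(str::from_utf8).transpose()
}
```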

---

3. STRING VALUE ALLOCATION IN PARSER ✅ DONE
Location: src/parse/mod.rs:81
Current:
  _ => Ok(Datum::String(s.to_string())),
Context: s is already Cow<str> from line 110
Why it allocates:
  - When s is already owned, we allocate again with to_string()
  - Should use into_owned() to avoid double allocation
Optimization:
  _ => Ok(Datum::String(s.into_owned())),
Impact: HIGH - Called for every untyped string field (most common)
Status: FIXED - Changed caller to use str::from_utf8 instead of String::from_utf8_lossy,
        eliminating the Cow allocation. Now single allocation from &str → String.
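
The difference between `to_string()` and `into_owned()` on a `Cow<str>` can be shown in isolation. This is a minimal sketch with a hypothetical helper name; it only demonstrates that an already-owned `Cow` hands over its existing buffer.

```rust
use std::borrow::Cow;

/// Hypothetical helper: turn a Cow<str> into a String with at most one allocation.
fn cow_to_string(s: Cow<'_, str>) -> String {
    // Cow::Owned moves its String out unchanged; Cow::Borrowed copies once.
    // By contrast, s.to_string() would always copy into a fresh buffer.
    s.into_owned()
}
```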

---

4. BOOLEAN STRING CONVERSION DURING PARSING ✅ DONE
Location: src/parse/mod.rs:64
Current:
  let b = match s.to_uppercase().as_str() {
      "Y" => true,
      "N" => false,
      _ => return Err(Self::invalid_tag(tag)),
  };
Why it allocates:
  - to_uppercase() allocates for single character comparison
  - Boolean values are always 'Y' or 'N' (single ASCII char)
Optimization:
  let b = match s {
      "Y" | "y" => true,
      "N" | "n" => false,
      _ => return Err(Self::invalid_tag(tag)),
  };
Impact: HIGH - Called for every boolean field
Status: FIXED - Now uses pattern matching instead of to_uppercase()

BONUS: Also refactored .map_err(|_| Self::invalid_tag(tag))? duplication
        into a helper closure: let err = || Self::invalid_tag(tag);
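
A self-contained version of the allocation-free boolean match might look like the sketch below. `parse_bool` is a hypothetical name, and it returns `Option<bool>` rather than the crate's error type.

```rust
/// Hypothetical helper: parse an ADIF boolean value without allocating.
fn parse_bool(s: &str) -> Option<bool> {
    // Matching on the borrowed &str compares bytes in place; both cases are
    // listed explicitly instead of uppercasing into a temporary String.
    match s {
        "Y" | "y" => Some(true),
        "N" | "n" => Some(false),
        _ => None,
    }
}
```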

---

5. RECORD FIELD CLONING IN WRITE PATH ✅ DONE
Location: src/write/mod.rs:221
Current:
  for (name, value) in item.fields() {
      let field = Field::new(name.clone(), value.clone());
      Pin::new(&mut self.inner).start_send(Tag::Field(field))?;
  }
Why it allocates:
  - Clones both name (String) and value (Datum) for every field
  - Field is immediately consumed, so cloning unnecessary
Optimization:
  Change Field to support borrowing or encode directly without intermediate Field
Impact: HIGH - Called for every field during writing
Status: FIXED - Added Record::into_fields() consuming iterator that returns owned values,
        eliminating both clone() calls in src/write/mod.rs:221
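
The idea behind a consuming iterator can be sketched with a simplified record type. `SketchRecord` and its `Vec`-of-pairs storage are assumptions for illustration only; the real `Record` may store fields differently and holds `Datum` values rather than strings.

```rust
/// Hypothetical stand-in for Record, storing fields as (name, value) pairs.
struct SketchRecord {
    fields: Vec<(String, String)>,
}

impl SketchRecord {
    /// Borrowing iterator: a caller that needs owned values must clone them.
    fn fields(&self) -> impl Iterator<Item = (&str, &str)> + '_ {
        self.fields.iter().map(|(n, v)| (n.as_str(), v.as_str()))
    }

    /// Consuming iterator: names and values are moved out, not copied.
    fn into_fields(self) -> impl Iterator<Item = (String, String)> {
        self.fields.into_iter()
    }
}
```

The trade-off is that the record is gone after `into_fields()`, which fits a sink that takes records by value.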

---

6. VALUE LENGTH CONVERSION TO STRING IN WRITER ✅ DONE
Location: src/write/mod.rs:133
Current:
  dst.put_slice(value.len().to_string().as_bytes());
Why it allocates:
  - to_string() allocates for simple integer
  - Frequent allocation in write path
Optimization:
  Use itoa crate or manual buffer:
  let mut buf = itoa::Buffer::new();
  dst.put_slice(buf.format(value.len()).as_bytes());
Impact: MEDIUM-HIGH - Called once per field during writing
Status: FIXED - Added itoa dependency and replaced to_string() with stack-allocated
        itoa::Buffer in src/write/mod.rs:133-134
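
For readers who prefer to avoid the extra dependency, the "manual buffer" alternative mentioned above could look roughly like this. `format_usize` is a hypothetical helper, not something the crate provides, and it is a sketch of the technique rather than a replacement for `itoa`.

```rust
/// Hypothetical helper: write the decimal digits of `n` into a stack buffer
/// and return them as a &str. 20 bytes fit the largest 64-bit value.
fn format_usize(mut n: usize, buf: &mut [u8; 20]) -> &str {
    let mut i = buf.len();
    // Fill the buffer from the right, least significant digit first.
    loop {
        i -= 1;
        buf[i] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    std::str::from_utf8(&buf[i..]).expect("ASCII digits are valid UTF-8")
}
```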

================================================================================
MEDIUM IMPACT ISSUES
================================================================================

7. DATUM::AS_STR() FORMAT ALLOCATIONS
Location: src/lib.rs:146-150
Current:
  Self::Number(n) => Some(Cow::Owned(n.to_string())),
  Self::Date(d) => Some(Cow::Owned(d.format("%Y%m%d").to_string())),
  Self::Time(t) => Some(Cow::Owned(t.format("%H%M%S").to_string())),
  Self::DateTime(dt) => Some(Cow::Owned(dt.format("%Y%m%d %H%M%S").to_string())),
Why it allocates:
  - Every format call allocates
  - Called during write operations and conversions
Optimization:
  Pre-allocate buffer or use thread-local buffer for formatting
Impact: MEDIUM - Called during write operations
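
One way to read "pre-allocate buffer" is to format into a caller-owned `String` that is cleared and reused between fields. The sketch below assumes only that the value implements `Display` (as the values returned by chrono's `format()` do); `format_into` is a hypothetical name.

```rust
use std::fmt::{self, Display, Write};

/// Hypothetical helper: render any Display value into a reusable buffer.
fn format_into<'a>(buf: &'a mut String, value: &dyn Display) -> Result<&'a str, fmt::Error> {
    // clear() keeps the capacity from earlier calls, so repeated formatting
    // of similarly sized values stops allocating after the first few calls.
    buf.clear();
    write!(buf, "{value}")?;
    Ok(buf.as_str())
}
```

This changes the calling convention, since the caller has to own the buffer, so it would not fit an `as_str()` that returns `Cow<str>` without an API change.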

---

8. DATUM::AS_BOOL() STRING ALLOCATION ✅ DONE
Location: src/lib.rs:83
Current:
  Self::String(s) => match s.to_uppercase().as_str() {
      "Y" => Some(true),
      "N" => Some(false),
      _ => None,
  },
Why it allocates:
  - to_uppercase() allocates unnecessarily
Optimization:
  Self::String(s) => match s.as_str() {
      "Y" | "y" => Some(true),
      "N" | "n" => Some(false),
      _ => None,
  },
Impact: MEDIUM - Called during type coercion
Status: FIXED - Uses pattern matching instead of to_uppercase()

---

9. FILTER: BAND NORMALIZATION ✅ DONE
Location: src/filter/mod.rs:224
Current:
  let _ = record
      .insert(":band".to_string(), Datum::String(band.to_uppercase()));
Why it allocates:
  - to_uppercase() always allocates
  - ":band".to_string() allocates literal string
  - Case conversion is wasted work if band is already uppercase
    (a String is still needed for Datum::String either way)
Optimization:
  const BAND_FIELD: &str = ":band";
  let band = if band.chars().all(|c| c.is_uppercase() || !c.is_alphabetic()) {
      band.to_string()
  } else {
      band.to_uppercase()
  };
  let _ = record.insert(BAND_FIELD, Datum::String(band));
Impact: MEDIUM - Called per record when band normalization used
Status: FIXED - Uses const for field name, only uppercases if needed
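
The "only uppercase if needed" check can also be expressed with `Cow`, so that callers who only need to read the normalized value avoid the copy entirely. This is a sketch under the assumption that band names are ASCII (for example "20m"); `uppercase_if_needed` is a hypothetical name.

```rust
use std::borrow::Cow;

/// Hypothetical helper: uppercase an ASCII band name only when it contains
/// lowercase letters; otherwise hand back the input unchanged.
fn uppercase_if_needed(band: &str) -> Cow<'_, str> {
    if band.bytes().any(|b| b.is_ascii_lowercase()) {
        Cow::Owned(band.to_ascii_uppercase())
    } else {
        Cow::Borrowed(band)
    }
}
```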

---

10. FILTER: MODE NORMALIZATION ✅ DONE
Location: src/filter/mod.rs:193
Current:
  let _ = record.insert(":mode".to_string(), Datum::String(mode.to_string()));
Why it allocates:
  - ":mode".to_string() allocates literal
  - mode is a Cow<str> (the fix below calls into_owned()); to_string() copies it
    even when it is already owned
Optimization:
  const MODE_FIELD: &str = ":mode";
  let _ = record.insert(MODE_FIELD, Datum::String(mode.into_owned()));
Impact: MEDIUM - Called per record when mode normalization used
Status: FIXED - Uses const for field name and .into_owned() instead of .to_string()

---

11. FILTER: TIME FIELD NAME ALLOCATIONS ✅ DONE
Location: src/filter/mod.rs:136, 151
Current:
  let _ = record.insert(":time_on".to_string(), Datum::DateTime(dt));
  let _ = record.insert(":time_off".to_string(), Datum::DateTime(dt));
Why it allocates:
  - Literal strings allocated every time
Optimization:
  const TIME_ON_FIELD: &str = ":time_on";
  const TIME_OFF_FIELD: &str = ":time_off";
  let _ = record.insert(TIME_ON_FIELD, Datum::DateTime(dt));
  let _ = record.insert(TIME_OFF_FIELD, Datum::DateTime(dt));
Impact: MEDIUM - Called per record when time normalization used
Status: FIXED - Uses const for both field names

---

12. FILTER: EXCLUDE CALLSIGNS HASHSET
Location: src/filter/mod.rs:237-238
Current:
  let exclude: HashSet<String> =
      callsigns.iter().map(|c| c.to_uppercase()).collect();
Why it allocates:
  - Creates HashSet with uppercase copies of all callsigns
  - Could use case-insensitive comparison instead
Optimization:
  Use case-insensitive wrapper or just iterate if list is small:
  stream.filter(move |record| {
      let Some(call) = record.get("call").and_then(|c| c.as_str()) else {
          return true;
      };
      !callsigns.iter().any(|e| e.eq_ignore_ascii_case(&call))
  })
Impact: MEDIUM - One-time allocation, proportional to excluded callsign count
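
Stripped of the stream and record types, the case-insensitive check reduces to the following sketch. `is_excluded` is a hypothetical helper; it assumes callsigns are ASCII, which is what makes `eq_ignore_ascii_case` an adequate substitute for uppercasing.

```rust
/// Hypothetical helper: true if `call` matches any excluded callsign,
/// ignoring ASCII case and without building uppercase copies.
fn is_excluded(call: &str, excluded: &[String]) -> bool {
    excluded.iter().any(|e| e.eq_ignore_ascii_case(call))
}
```

The scan is linear in the number of excluded callsigns for every record, so the HashSet remains the better choice for long exclusion lists.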

---

13. ERROR MESSAGE FORMATTING
Location: src/lib.rs:382
Current:
  Err(Error::InvalidFormat(format!("duplicate key: {}", e.key())))
Why it allocates:
  - format!() in error path
Optimization:
  let mut msg = String::with_capacity(20 + e.key().len());
  msg.push_str("duplicate key: ");
  msg.push_str(e.key());
  Err(Error::InvalidFormat(msg))
Impact: LOW-MEDIUM - Error paths only

---

14. PARSE ERROR: INVALID TAG
Location: src/parse/mod.rs:51
Current:
  Error::InvalidFormat(String::from_utf8_lossy(tag).to_string())
Why it allocates:
  - from_utf8_lossy allocates only when the tag is invalid UTF-8; to_string() then
    always copies, so an already-owned Cow is allocated twice
Optimization:
  Error::InvalidFormat(String::from_utf8_lossy(tag).into_owned())
Impact: LOW - Error path only

---

15. PARSE ERROR: PARTIAL DATA
Location: src/parse/mod.rs:171
Current:
  Err(Error::InvalidFormat("partial data at end of stream".to_string()))
Why it allocates:
  - Allocating literal string constant
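Optimization:
  const PARTIAL_DATA_MSG: &str = "partial data at end of stream";
  Err(Error::InvalidFormat(PARTIAL_DATA_MSG.to_owned()))
  Note: to_owned() still allocates; the const only names the message. Removing the
  allocation requires Error::InvalidFormat to hold &'static str or Cow<'static, str>.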
Impact: LOW - Error path only

---

16. WRITE ERROR: DATETIME MESSAGE
Location: src/write/mod.rs:118-121
Current:
  return Err(Error::InvalidFormat(
      "DateTime cannot be output directly; split into date and time fields"
          .to_string(),
  ));
Optimization:
  const DATETIME_ERROR_MSG: &str =
      "DateTime cannot be output directly; split into date and time fields";
  return Err(Error::InvalidFormat(DATETIME_ERROR_MSG.to_owned()));
  Note: to_owned() still allocates; only a &'static str or Cow<'static, str> payload
  would avoid it.
Impact: LOW - Error path only

---

17. WRITE ERROR: STRING CONVERSION
Location: src/write/mod.rs:126
Current:
  let e = "Cannot convert value to string".to_string();
  Error::InvalidFormat(e)
Optimization:
  const CONVERT_ERROR_MSG: &str = "Cannot convert value to string";
  Error::InvalidFormat(CONVERT_ERROR_MSG.to_owned())
  Note: to_owned() still allocates; only a &'static str or Cow<'static, str> payload
  would avoid it.
Impact: LOW - Error path only
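
For the static error messages in #15 to #17, one possible way to actually remove the allocation is to let the error variant hold a `Cow<'static, str>`. The sketch below uses a hypothetical `SketchError` type; changing the real `Error::InvalidFormat` this way would be a breaking API change and is shown only as an assumption.

```rust
use std::borrow::Cow;

/// Hypothetical error type whose message can be static or owned.
#[derive(Debug)]
enum SketchError {
    InvalidFormat(Cow<'static, str>),
}

/// Static message: borrowed from the binary, no heap allocation.
fn partial_data() -> SketchError {
    SketchError::InvalidFormat(Cow::Borrowed("partial data at end of stream"))
}

/// Dynamic message: still allocates, but only when runtime data is included.
fn duplicate_key(key: &str) -> SketchError {
    SketchError::InvalidFormat(Cow::Owned(format!("duplicate key: {key}")))
}
```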

---

18. PARSER: TAG SPLIT INTO VEC ✅ DONE
Location: src/parse/mod.rs:88
Current:
  let parts: Vec<&[u8]> = tag.split(|&b| b == b':').collect();
Why it allocates:
  - Collects into Vec for simple 2-3 part split
  - Could use iterator directly
Optimization:
  let mut parts = tag.split(|&b| b == b':');
  let (name, len, typ) = match (parts.next(), parts.next(), parts.next(), parts.next()) {
      (Some(name), Some(len), None, None) => (name, len, None),
      (Some(name), Some(len), Some(typ), None) => (name, len, Some(typ)),
      _ => return Err(Self::invalid_tag(tag)),
  };
Impact: MEDIUM - Called once per field during parsing
Status: FIXED - Replaced Vec collection with direct iterator pattern matching in src/parse/mod.rs:90-95
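
The iterator-based split can be exercised on its own. In this sketch `split_tag` and `TagParts` are hypothetical names, and the function returns `None` where the parser would return its invalid-tag error.

```rust
/// (name, length, optional type specifier), all borrowed from the tag.
type TagParts<'a> = (&'a [u8], &'a [u8], Option<&'a [u8]>);

/// Hypothetical helper: split `name:len` or `name:len:type` without a Vec.
fn split_tag(tag: &[u8]) -> Option<TagParts<'_>> {
    let mut parts = tag.split(|&b| b == b':');
    // Pulling four items detects both too few and too many components.
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        (Some(name), Some(len), None, None) => Some((name, len, None)),
        (Some(name), Some(len), Some(typ), None) => Some((name, len, Some(typ))),
        _ => None,
    }
}
```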

---

19. FIELD::NAME STORED AS STRING
Location: src/lib.rs:201
Current:
  pub struct Field {
      name: String,
      value: Datum,
  }
Why it allocates:
  - Field names always allocated
  - Could use Box<str> for immutable strings (saves capacity overhead)
Optimization:
  pub struct Field {
      name: Box<str>,  // or Arc<str> for sharing
      value: Datum,
  }
Impact: LOW-MEDIUM - Saves 8 bytes per field (capacity) on 64-bit targets, adds complexity
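
The size difference can be checked directly. The layout of `String` is not formally guaranteed, so the sketch below reflects current standard library behavior (three machine words for `String`, two for `Box<str>` and `Arc<str>`) rather than a language guarantee; `name_handle_sizes` is a hypothetical helper.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical helper: sizes in bytes of the three candidate name types.
fn name_handle_sizes() -> (usize, usize, usize) {
    (
        size_of::<String>(),   // pointer + capacity + length
        size_of::<Box<str>>(), // pointer + length
        size_of::<Arc<str>>(), // pointer + length (counts live on the heap)
    )
}
```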

================================================================================
LOW IMPACT ISSUES (20-32)
================================================================================

Issues 20-32: Various allocations in test code and error paths
- Test code allocations don't affect production performance
- Error message literals in test assertions
- format!() calls in test code

Impact: LOW to NONE - Test code and rarely taken error paths only

================================================================================
RECOMMENDATIONS BY PRIORITY
================================================================================

PRIORITY 1: IMMEDIATE WINS (High Impact, Low Risk)
--------------------------------------------------
1. Fix parser case conversions (#1, #2, #4)
   - Use ASCII operations instead of Unicode-aware to_uppercase()/to_lowercase()

2. Fix Datum::String double allocation (#3)
   - Use into_owned() instead of to_string()

3. Fix filter field name literals (#9, #10, #11)
   - Use const &str instead of .to_string()

4. Fix as_bool allocation (#8)
   - Pattern match instead of .to_uppercase()

Estimated Impact: removes most of the 3-5 allocations per field estimated for parsing

PRIORITY 2: MEDIUM EFFORT, HIGH REWARD
---------------------------------------
1. Avoid cloning in RecordSink (#5)
   - Requires API redesign but eliminates 2 allocations per field during write

2. Optimize tag split (#18)
   - Use iterator instead of collecting Vec

3. Optimize length formatting (#6)
   - Use itoa for fast integer formatting without a heap allocation

Estimated Impact: up to 3 allocations eliminated per field during writing (#5, #6),
plus 1 per field during parsing (#18)

PRIORITY 3: ADVANCED OPTIMIZATIONS
-----------------------------------
1. Datum formatting optimizations (#7)
   - Use buffers or thread-locals

2. Case-insensitive comparisons (#12)
   - Avoid HashSet with uppercase strings

3. Field name interning (#19)
   - Use Arc<str> or Box<str> for field names

Estimated Impact: Reduces memory footprint in specialized use cases

================================================================================
TESTING RECOMMENDATIONS
================================================================================

After implementing optimizations:

1. Run cargo test to ensure correctness
2. Run cargo bench if benchmarks exist
3. Profile with real ADIF files using:
   - valgrind --tool=massif for heap profiling
   - cargo flamegraph for CPU flamegraphs (allocation-heavy code shows up as time
     spent in the allocator)
   - dhat for detailed allocation tracking

4. Measure before/after:
   - Total allocations per record
   - Peak memory usage
   - Parsing/writing throughput
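
To measure "total allocations per record" without external tools, a counting global allocator is one option. The following is a minimal sketch, assuming it is placed in a test or benchmark binary; `CountingAlloc` and `count_allocations` are hypothetical names, and the counter is process-wide, so allocations from other threads are included.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts every allocation request.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns its result with the number of allocations observed.
fn count_allocations<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let out = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (out, after - before)
}
```

Wrapping the parse of a single record in `count_allocations` before and after a change gives a direct comparison, provided the measurement runs single-threaded.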