difa 0.1.1 - Docs.rs

# Heap Allocation Audit Report for ADIF Crate

## Executive Summary

Total Issues Identified: 32
- High Impact: 6 issues
- Medium Impact: 13 issues
- Low Impact: 13 issues

Hot Path Allocations per Record (estimated):
- Parsing: 3-5 allocations per field
- Writing: 3 allocations per field

Quick Win Opportunities: 8 issues fixable with minimal code changes

================================================================================
HIGH IMPACT ISSUES
================================================================================

1. FIELD NAME CASE CONVERSION IN PARSER (HOT PATH) ✅ DONE
Location: src/parse/mod.rs:97
Current:
  let name = String::from_utf8_lossy(name).to_lowercase();
Why it allocates:
  - from_utf8_lossy() returns Cow that may allocate
  - to_lowercase() always allocates new String
  - Called for EVERY field parsed
Optimization:
  let name = str::from_utf8(name)
      .map_err(|_| Self::invalid_tag(tag))?
      .to_ascii_lowercase();
Impact: HIGH - Called once per field in every record
Status: FIXED - Now uses str::from_utf8 and to_ascii_lowercase()
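
A minimal sketch of the fixed pattern, taken one step further than the crate does: it returns a Cow so that names that are already lowercase borrow from the input. The helper name lower_name is hypothetical, and std's Utf8Error stands in for the crate's Self::invalid_tag(tag).

```rust
use std::borrow::Cow;
use std::str::{self, Utf8Error};

/// Hypothetical helper (not part of the crate): validate a field name as
/// UTF-8 and lowercase it, allocating only when an uppercase ASCII byte
/// is actually present.
fn lower_name(raw: &[u8]) -> Result<Cow<'_, str>, Utf8Error> {
    let name = str::from_utf8(raw)?; // borrows the input, never allocates
    if name.bytes().any(|b| b.is_ascii_uppercase()) {
        Ok(Cow::Owned(name.to_ascii_lowercase())) // one allocation
    } else {
        Ok(Cow::Borrowed(name)) // zero allocations
    }
}
```

Whether the borrowed case helps in practice depends on whether the record ultimately needs an owned String key; if it does, the allocation only moves later.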

---

2. TYPE SPECIFIER CASE CONVERSION IN PARSER ✅ DONE
Location: src/parse/mod.rs:102
Current:
  let typ = typ.map(|t| String::from_utf8_lossy(t).to_lowercase());
Why it allocates:
  - Same pattern as field names
  - Type specifiers are typically single chars ('b', 'n', 'd', 't', 's')
Optimization:
  let typ = typ.map(|t| str::from_utf8(t)).transpose().map_err(|_| Self::invalid_tag(tag))?;
Impact: HIGH - Called once per typed field
Status: FIXED - Now uses str::from_utf8 (type lowercasing handled elsewhere)

---

3. STRING VALUE ALLOCATION IN PARSER ✅ DONE
Location: src/parse/mod.rs:81
Current:
  _ => Ok(Datum::String(s.to_string())),
Context: s is already Cow<str> from line 110
Why it allocates:
  - When s is already owned, to_string() copies it into a second allocation
  - into_owned() reuses the owned buffer and copies only when s is borrowed
Optimization:
  _ => Ok(Datum::String(s.into_owned())),
Impact: HIGH - Called for every untyped string field (most common)
Status: FIXED - Changed caller to use str::from_utf8 instead of String::from_utf8_lossy,
        eliminating the Cow allocation. Now single allocation from &str → String.
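
The difference between to_string() and into_owned() on a Cow<str> can be sketched as follows. The helper name into_string is hypothetical; it only illustrates std's Cow behaviour and is not the crate's code.

```rust
use std::borrow::Cow;

/// Hypothetical helper: turn a possibly-owned string into a `String`.
/// `into_owned()` moves the buffer out of `Cow::Owned` and copies only
/// for `Cow::Borrowed`; `to_string()` would copy in both cases.
fn into_string(s: Cow<'_, str>) -> String {
    s.into_owned()
}
```

For an owned input the returned String points at the same heap buffer that went in, which is what makes this the cheaper call.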

---

4. BOOLEAN STRING CONVERSION DURING PARSING ✅ DONE
Location: src/parse/mod.rs:64
Current:
  let b = match s.to_uppercase().as_str() {
      "Y" => true,
      "N" => false,
      _ => return Err(Self::invalid_tag(tag)),
  };
Why it allocates:
  - to_uppercase() allocates for single character comparison
  - Boolean values are always 'Y' or 'N' (single ASCII char)
Optimization:
  let b = match s {
      "Y" | "y" => true,
      "N" | "n" => false,
      _ => return Err(Self::invalid_tag(tag)),
  };
Impact: HIGH - Called for every boolean field
Status: FIXED - Now uses pattern matching instead of to_uppercase()

BONUS: Also refactored .map_err(|_| Self::invalid_tag(tag))? duplication
        into a helper closure: let err = || Self::invalid_tag(tag);

---

5. RECORD FIELD CLONING IN WRITE PATH ✅ DONE
Location: src/write/mod.rs:221
Current:
  for (name, value) in item.fields() {
      let field = Field::new(name.clone(), value.clone());
      Pin::new(&mut self.inner).start_send(Tag::Field(field))?;
  }
Why it allocates:
  - Clones both name (String) and value (Datum) for every field
  - Field is immediately consumed, so cloning unnecessary
Optimization:
  Change Field to support borrowing or encode directly without intermediate Field
Impact: HIGH - Called for every field during writing
Status: FIXED - Added Record::into_fields() consuming iterator that returns owned values,
        eliminating both clone() calls in src/write/mod.rs:221
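
A rough sketch of what a consuming iterator like Record::into_fields() buys. The Record and Datum types here are simplified stand-ins, not the crate's definitions; the real container and variants may differ.

```rust
/// Simplified stand-ins for the crate's types; the real `Datum` has more
/// variants and the real `Record` may use a different container.
#[derive(Debug, PartialEq)]
enum Datum {
    String(String),
}

struct Record {
    fields: Vec<(String, Datum)>,
}

impl Record {
    /// Borrowing iterator: callers that need owned data must clone.
    fn fields(&self) -> impl Iterator<Item = (&String, &Datum)> {
        self.fields.iter().map(|(name, value)| (name, value))
    }

    /// Consuming iterator: hands out the existing `String`/`Datum`
    /// buffers, so no per-field clone is needed.
    fn into_fields(self) -> impl Iterator<Item = (String, Datum)> {
        self.fields.into_iter()
    }
}
```

The trade-off is that the record is gone after the call, which suits a sink that takes ownership of each item anyway.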

---

6. VALUE LENGTH CONVERSION TO STRING IN WRITER ✅ DONE
Location: src/write/mod.rs:133
Current:
  dst.put_slice(value.len().to_string().as_bytes());
Why it allocates:
  - to_string() allocates for simple integer
  - Frequent allocation in write path
Optimization:
  Use itoa crate or manual buffer:
  let mut buf = itoa::Buffer::new();
  dst.put_slice(buf.format(value.len()).as_bytes());
Impact: MEDIUM-HIGH - Called once per field during writing
Status: FIXED - Added itoa dependency and replaced to_string() with stack-allocated
        itoa::Buffer in src/write/mod.rs:133-134
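
The "manual buffer" alternative mentioned above can be sketched with std only. This is not what the crate uses (it uses itoa::Buffer); format_usize is a hypothetical helper shown to illustrate the same stack-buffer idea.

```rust
/// Format `n` as decimal ASCII into a caller-provided stack buffer and
/// return the used tail. 20 bytes fits the largest 64-bit `usize`.
fn format_usize(mut n: usize, buf: &mut [u8; 20]) -> &[u8] {
    let mut pos = buf.len();
    // Fill digits from the right; the loop runs at least once so 0 -> "0".
    loop {
        pos -= 1;
        buf[pos] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    &buf[pos..]
}
```

The returned slice can be passed straight to a call like dst.put_slice(...), with no heap allocation for the length.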

================================================================================
MEDIUM IMPACT ISSUES
================================================================================

7. DATUM::AS_STR() FORMAT ALLOCATIONS
Location: src/lib.rs:146-150
Current:
  Self::Number(n) => Some(Cow::Owned(n.to_string())),
  Self::Date(d) => Some(Cow::Owned(d.format("%Y%m%d").to_string())),
  Self::Time(t) => Some(Cow::Owned(t.format("%H%M%S").to_string())),
  Self::DateTime(dt) => Some(Cow::Owned(dt.format("%Y%m%d %H%M%S").to_string())),
Why it allocates:
  - Every format call allocates
  - Called during write operations and conversions
Optimization:
  Pre-allocate buffer or use thread-local buffer for formatting
Impact: MEDIUM - Called during write operations
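
One possible shape for the "pre-allocate buffer" idea, using only std. SimpleDate and format_date_into are hypothetical; the crate formats its own date type with format("%Y%m%d"), which this sketch only imitates.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for a calendar date; the crate itself formats
/// date/time values with `format("%Y%m%d")`-style calls.
struct SimpleDate {
    year: u16,
    month: u8,
    day: u8,
}

/// Write the `YYYYMMDD` form into a caller-owned buffer. The buffer is
/// cleared, not reallocated, so its capacity is reused across calls.
fn format_date_into(buf: &mut String, d: &SimpleDate) {
    buf.clear();
    // Writing to a `String` cannot fail, so the result is safe to unwrap.
    write!(buf, "{:04}{:02}{:02}", d.year, d.month, d.day).unwrap();
}
```

A writer that keeps one such buffer per encoder would pay for the allocation once rather than once per field, at the cost of an API that no longer returns a fresh Cow.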

---

8. DATUM::AS_BOOL() STRING ALLOCATION ✅ DONE
Location: src/lib.rs:83
Current:
  Self::String(s) => match s.to_uppercase().as_str() {
      "Y" => Some(true),
      "N" => Some(false),
      _ => None,
  },
Why it allocates:
  - to_uppercase() allocates unnecessarily
Optimization:
  Self::String(s) => match s.as_str() {
      "Y" | "y" => Some(true),
      "N" | "n" => Some(false),
      _ => None,
  },
Impact: MEDIUM - Called during type coercion
Status: FIXED - Uses pattern matching instead of to_uppercase()

---

9. FILTER: BAND NORMALIZATION ✅ DONE
Location: src/filter/mod.rs:224
Current:
  let _ = record
      .insert(":band".to_string(), Datum::String(band.to_uppercase()));
Why it allocates:
  - to_uppercase() always allocates
  - ":band".to_string() allocates literal string
  - Unicode case mapping is wasted work if band is already uppercase
    (one String is still required, because Datum::String owns its data)
Optimization:
  const BAND_FIELD: &str = ":band";
  let band = if band.chars().all(|c| c.is_uppercase() || !c.is_alphabetic()) {
      band.to_string()
  } else {
      band.to_uppercase()
  };
  let _ = record.insert(BAND_FIELD, Datum::String(band));
Impact: MEDIUM - Called per record when band normalization used
Status: FIXED - Uses const for field name, only uppercases if needed
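
If the band value is available as an owned String, the case conversion can reuse its buffer. This is a sketch under two assumptions that the source does not confirm: that the caller can hand over ownership, and that band names are ASCII (for example "20m"). normalize_band is a hypothetical name.

```rust
/// Hypothetical helper: uppercase a band value in place. Taking the
/// `String` by value lets the existing buffer be reused, so no new
/// allocation is needed. Assumes band names are ASCII (e.g. "20m").
fn normalize_band(mut band: String) -> String {
    band.make_ascii_uppercase();
    band
}
```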

---

10. FILTER: MODE NORMALIZATION ✅ DONE
Location: src/filter/mod.rs:193
Current:
  let _ = record.insert(":mode".to_string(), Datum::String(mode.to_string()));
Why it allocates:
  - ":mode".to_string() allocates literal
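  - mode.to_string() always copies into a new String, even when mode
    (a Cow<str>, as the into_owned() fix below implies) already owns its data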
Optimization:
  const MODE_FIELD: &str = ":mode";
  let _ = record.insert(MODE_FIELD, Datum::String(mode.into_owned()));
Impact: MEDIUM - Called per record when mode normalization used
Status: FIXED - Uses const for field name and .into_owned() instead of .to_string()

---

11. FILTER: TIME FIELD NAME ALLOCATIONS ✅ DONE
Location: src/filter/mod.rs:136, 151
Current:
  let _ = record.insert(":time_on".to_string(), Datum::DateTime(dt));
  let _ = record.insert(":time_off".to_string(), Datum::DateTime(dt));
Why it allocates:
  - Literal strings allocated every time
Optimization:
  const TIME_ON_FIELD: &str = ":time_on";
  const TIME_OFF_FIELD: &str = ":time_off";
  let _ = record.insert(TIME_ON_FIELD, Datum::DateTime(dt));
  let _ = record.insert(TIME_OFF_FIELD, Datum::DateTime(dt));
Impact: MEDIUM - Called per record when time normalization used
Status: FIXED - Uses const for both field names

---

12. FILTER: EXCLUDE CALLSIGNS HASHSET
Location: src/filter/mod.rs:237-238
Current:
  let exclude: HashSet<String> =
      callsigns.iter().map(|c| c.to_uppercase()).collect();
Why it allocates:
  - Creates HashSet with uppercase copies of all callsigns
  - Could use case-insensitive comparison instead
Optimization:
  Use case-insensitive wrapper or just iterate if list is small:
  stream.filter(move |record| {
      let Some(call) = record.get("call").and_then(|c| c.as_str()) else {
          return true;
      };
      !callsigns.iter().any(|e| e.eq_ignore_ascii_case(&call))
  })
Impact: MEDIUM - One-time allocation, proportional to excluded callsign count
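
The case-insensitive scan can be isolated into a small predicate. is_excluded is a hypothetical helper, and the callsigns in the usage are made-up sample values.

```rust
/// Hypothetical helper: true if `call` matches any excluded callsign,
/// ignoring ASCII case. A linear scan; reasonable for short lists.
fn is_excluded(call: &str, excluded: &[String]) -> bool {
    excluded.iter().any(|e| e.eq_ignore_ascii_case(call))
}
```

This trades the one-time HashSet allocation for an O(n) comparison per record, so for long exclusion lists the original HashSet may still be the better choice.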

---

13. ERROR MESSAGE FORMATTING
Location: src/lib.rs:382
Current:
  Err(Error::InvalidFormat(format!("duplicate key: {}", e.key())))
Why it allocates:
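  - format!() builds a new String in the error path
Optimization (still one allocation, but sized exactly and without the format machinery):
  let mut msg = String::with_capacity("duplicate key: ".len() + e.key().len());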
  msg.push_str("duplicate key: ");
  msg.push_str(e.key());
  Err(Error::InvalidFormat(msg))
Impact: LOW-MEDIUM - Error paths only

---

14. PARSE ERROR: INVALID TAG
Location: src/parse/mod.rs:51
Current:
  Error::InvalidFormat(String::from_utf8_lossy(tag).to_string())
Why it allocates:
  - to_string() always copies; when from_utf8_lossy() already had to allocate
    (invalid UTF-8 input), that copy is a second allocation
Optimization:
  Error::InvalidFormat(String::from_utf8_lossy(tag).into_owned())
Impact: LOW - Error path only

---

15. PARSE ERROR: PARTIAL DATA
Location: src/parse/mod.rs:171
Current:
  Err(Error::InvalidFormat("partial data at end of stream".to_string()))
Why it allocates:
  - Allocating literal string constant
Optimization (names the literal; .to_owned() still allocates once per error):
  const PARTIAL_DATA_MSG: &str = "partial data at end of stream";
  Err(Error::InvalidFormat(PARTIAL_DATA_MSG.to_owned()))
Impact: LOW - Error path only

---

16. WRITE ERROR: DATETIME MESSAGE
Location: src/write/mod.rs:118-121
Current:
  return Err(Error::InvalidFormat(
      "DateTime cannot be output directly; split into date and time fields"
          .to_string(),
  ));
Optimization (names the literal; .to_owned() still allocates once per error):
  const DATETIME_ERROR_MSG: &str =
      "DateTime cannot be output directly; split into date and time fields";
  return Err(Error::InvalidFormat(DATETIME_ERROR_MSG.to_owned()));
Impact: LOW - Error path only

---

17. WRITE ERROR: STRING CONVERSION
Location: src/write/mod.rs:126
Current:
  let e = "Cannot convert value to string".to_string();
  Error::InvalidFormat(e)
Optimization (names the literal; .to_owned() still allocates once per error):
  const CONVERT_ERROR_MSG: &str = "Cannot convert value to string";
  Error::InvalidFormat(CONVERT_ERROR_MSG.to_owned())
Impact: LOW - Error path only
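
For issues 15-17, moving the literal into a const and calling .to_owned() leaves the allocation in place. Removing it would need the error variant to hold something like Cow<'static, str>. The sketch below assumes such a change to the error type, which is a public API change the source does not propose; the Error enum and helper functions here are hypothetical.

```rust
use std::borrow::Cow;

/// Hypothetical variant of the crate's error type: the message is a
/// `Cow<'static, str>`, so static messages need no heap allocation while
/// dynamic ones can still carry an owned `String`.
#[derive(Debug)]
enum Error {
    InvalidFormat(Cow<'static, str>),
}

const PARTIAL_DATA_MSG: &str = "partial data at end of stream";

/// Static message: borrows the constant, no allocation.
fn partial_data() -> Error {
    Error::InvalidFormat(Cow::Borrowed(PARTIAL_DATA_MSG))
}

/// Dynamic message: still allocates, because the key is only known at runtime.
fn duplicate_key(key: &str) -> Error {
    Error::InvalidFormat(Cow::Owned(format!("duplicate key: {key}")))
}
```

Since these are error paths, the gain is small; the design mainly matters if errors are expected in bulk, for example when scanning malformed files.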

---

18. PARSER: TAG SPLIT INTO VEC ✅ DONE
Location: src/parse/mod.rs:88
Current:
  let parts: Vec<&[u8]> = tag.split(|&b| b == b':').collect();
Why it allocates:
  - Collects into Vec for simple 2-3 part split
  - Could use iterator directly
Optimization:
  let mut parts = tag.split(|&b| b == b':');
  let (name, len, typ) = match (parts.next(), parts.next(), parts.next(), parts.next()) {
      (Some(name), Some(len), None, None) => (name, len, None),
      (Some(name), Some(len), Some(typ), None) => (name, len, Some(typ)),
      _ => return Err(Self::invalid_tag(tag)),
  };
Impact: MEDIUM - Called once per field during parsing
Status: FIXED - Replaced Vec collection with direct iterator pattern matching in src/parse/mod.rs:90-95
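
The iterator-based split can be exercised in isolation. split_tag is a hypothetical free function that returns Option instead of the crate's error type; the tag contents in the usage are illustrative.

```rust
/// Hypothetical stand-alone version of the tag split: `name:len` or
/// `name:len:type`; anything else is rejected. No `Vec` is built.
fn split_tag(tag: &[u8]) -> Option<(&[u8], &[u8], Option<&[u8]>)> {
    let mut parts = tag.split(|&b| b == b':');
    // Pull at most four parts; a fourth `Some` means too many separators.
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        (Some(name), Some(len), None, None) => Some((name, len, None)),
        (Some(name), Some(len), Some(typ), None) => Some((name, len, Some(typ))),
        _ => None,
    }
}
```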

---

19. FIELD::NAME STORED AS STRING
Location: src/lib.rs:201
Current:
  pub struct Field {
      name: String,
      value: Datum,
  }
Why it allocates:
  - Field names always allocated
  - Could use Box<str> for immutable strings (saves capacity overhead)
Optimization:
  pub struct Field {
      name: Box<str>,  // or Arc<str> for sharing
      value: Datum,
  }
Impact: LOW-MEDIUM - Saves 8 bytes per field on 64-bit targets (no capacity word), adds complexity
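
The size difference can be checked directly with std::mem::size_of. The helper names below are hypothetical. String being larger than Box<str> reflects the current standard library layout, not a documented guarantee.

```rust
use std::mem::size_of;

/// Returns (size of `String`, size of `Box<str>`) in bytes for the
/// current target. `Box<str>` is a pointer plus a length; `String`
/// additionally tracks capacity.
fn name_repr_sizes() -> (usize, usize) {
    (size_of::<String>(), size_of::<Box<str>>())
}

/// Convert an owned name to the compact form. This may reallocate once
/// to drop spare capacity.
fn compact_name(name: String) -> Box<str> {
    name.into_boxed_str()
}
```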

================================================================================
LOW IMPACT ISSUES (20-32)
================================================================================

Issues 20-32: Various allocations in test code and error paths
- Test code allocations don't affect production performance
- Error message literals in test assertions
- format!() calls in test code

Impact: NONE - Test code only

================================================================================
RECOMMENDATIONS BY PRIORITY
================================================================================

PRIORITY 1: IMMEDIATE WINS (High Impact, Low Risk)
--------------------------------------------------
1. Fix parser case conversions (#1, #2, #4)
   - Use ASCII operations instead of Unicode-aware to_uppercase()/to_lowercase()

2. Fix Datum::String double allocation (#3)
   - Use into_owned() instead of to_string()

3. Fix filter field name literals (#9, #10, #11)
   - Use const &str instead of .to_string()

4. Fix as_bool allocation (#8)
   - Pattern match instead of .to_uppercase()

Estimated Impact: most of the 3-5 allocations per parsed field (see Executive Summary) eliminated;
                  one String for the lowercased name and one for the value remain

PRIORITY 2: MEDIUM EFFORT, HIGH REWARD
---------------------------------------
1. Avoid cloning in RecordSink (#5)
   - Requires API redesign but eliminates 2 allocations per field during write

2. Optimize tag split (#18)
   - Use iterator instead of collecting Vec

3. Optimize integer formatting (#6)
   - Use itoa::Buffer instead of to_string()

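Estimated Impact: 3 allocations eliminated per field during writing (two clones from #5,
                  one to_string() from #6); #18 removes one Vec allocation per field during parsing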

PRIORITY 3: ADVANCED OPTIMIZATIONS
-----------------------------------
1. Datum formatting optimizations (#7)
   - Use buffers or thread-locals

2. Case-insensitive comparisons (#12)
   - Avoid HashSet with uppercase strings

3. Field name interning (#19)
   - Use Arc<str> or Box<str> for field names

Estimated Impact: Reduces memory footprint in specialized use cases

================================================================================
TESTING RECOMMENDATIONS
================================================================================

After implementing optimizations:

1. Run cargo test to ensure correctness
2. Run cargo bench if benchmarks exist
3. Profile with real ADIF files using:
   - valgrind --tool=massif for heap profiling
   - cargo flamegraph for allocation flamegraphs
   - dhat for detailed allocation tracking

4. Measure before/after:
   - Total allocations per record
   - Peak memory usage
   - Parsing/writing throughput
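
For the "total allocations" measurement, one std-only option besides the tools listed above is a counting global allocator. This is a rough sketch: CountingAlloc, ALLOCATIONS and count_allocations are hypothetical names, and the counter is process-wide, so allocations made by other threads during the measurement are included.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts every allocation request.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Run `f` and return its result together with the number of heap
/// allocations observed while it ran.
fn count_allocations<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let out = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (out, after - before)
}
```

Wrapping a parse or write call for a single record in count_allocations, before and after a change, gives a direct per-record figure to compare against the estimates in this report.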