mrrc 0.7.6

A Rust library for reading, writing, and manipulating MARC bibliographic records in ISO 2709 binary format
Documentation
# Pure Rust Single-Threaded Performance Profile Summary

**Objective:** Within-mode bottleneck analysis for pure Rust sequential implementation  
**Date:** 2026-01-08  
**Status:** Complete  

---

## Quick Links

| Document | Purpose |
|----------|---------|
| [PROFILING_PLAN.md]./PROFILING_PLAN.md | Methodology and tools |
| [RUST_SINGLE_THREADED_PROFILING_RESULTS.md]./RUST_SINGLE_THREADED_PROFILING_RESULTS.md | Detailed baseline results |
| [PHASE_2_DETAILED_ANALYSIS.md]./PHASE_2_DETAILED_ANALYSIS.md | CPU intensity and memory analysis |

---

## Executive Summary

Pure Rust (mrrc) single-threaded implementation shows excellent performance with well-understood bottlenecks.

### Performance Baseline: Excellent ✓
- **Throughput:** 1.06M rec/s (0.94 µs/record latency)
- **Field access overhead:** 2-5% minimal
- **Consistency:** Linear scaling across 1k-10k record files
- **Classification:** CPU-bound, not I/O or memory-bound

### No Algorithmic Bottlenecks Found
- Parsing logic is efficient
- nom parser performs well for MARC format
- Field lookups are fast
- Record structure is sound

### Memory Efficiency: Improvable ⚠️
- **Heap usage:** 39.4 MB for 10k records
- **Allocation overhead:** 73% of heap is metadata
- **Primary issue:** String headers (24B each) for fixed-size data
- **Opportunity:** Compact encoding for tags and indicators

---

## Performance Analysis

### Phase 1: Baseline Measurements

**Benchmark Results**

| Operation | Throughput | Latency | Overhead |
|-----------|-----------|---------|----------|
| Read only (1k) | 1.06M rec/s | 0.94 µs | baseline |
| Read only (10k) | 1.06M rec/s | 0.94 µs | +0.2% |
| Parse + field access (1k) | 1.03M rec/s | 0.96 µs | +4.6% |
| Parse + field access (10k) | 1.04M rec/s | 0.96 µs | +2.6% |
| Parse + JSON serialize | 299k rec/s | 3.3 ms | +254% (expected) |
| Parse + XML serialize | 249k rec/s | 4.0 ms | +326% (expected) |

**Interpretation:**
- Read operation alone: 1.06M rec/s (excellent)
- Field access adds minimal overhead (2-5%)
- Serialization is expensive, but expected (not a parsing issue)
- Linear scaling indicates predictable performance

---

### Phase 2: CPU and Memory Analysis

#### CPU Intensity
- **Estimated cycles per record:** 3,340 (at 3 GHz)
- **Classification:** Compute-bound
- **Implication:** Should parallelize well to multiple cores

#### Memory Allocation Hotspots (per 10k records)

| Allocation Type | Count | Total | Avg | % of Heap |
|-----------------|-------|-------|-----|-----------|
| Subfield data Strings | 500k | 25 MB | 50 B | 63% |
| Field Vec | 10k | 6.4 MB | 640 B | 16% |
| Tag Strings | 200k | 600 KB | 3 B | 1.5% |
| Indicator Strings | 200k | 400 KB | 2 B | 1% |
| Other overhead || 7 MB || 18% |
| **TOTAL** | **910k** | **39.4 MB** || **100%** |

#### Memory Inefficiency Breakdown

**Per-Record Memory Usage**
```
Actual data content:      ~2.6 KB
String headers (24B×70):  ~1.7 KB
Vec overhead:             ~0.3 KB
Other pointers/metadata:  ~0.7 KB
─────────────────────────────────
Total per record:         ~5.3 KB (ideal)
                          ~10.2 KB (actual, with overhead)
```

**Key Issue: String Header Overhead**

Every string uses 24-byte header, including fixed-size data:
```
Current approach:
  Tag "245"        → String { ptr, len, cap } + "245" = 27 bytes
  Indicator "10"   → String { ptr, len, cap } + "10"  = 26 bytes
  
Optimal approach:
  Tag "245"        → u16 (tag number) = 2 bytes
  Indicator "10"   → [u8; 2] = 2 bytes
```

---

## Identified Bottlenecks

### Bottleneck #1: Memory Allocation Metadata (High Impact)
- **Cost:** 73% of heap is metadata, only 27% is actual data
- **Root cause:** String headers for small, fixed-size data (tags, indicators)
- **Type:** Memory efficiency, not performance-critical
- **Opportunity:** Encode tags as u16, indicators as [u8; 2]

### Bottleneck #2: Vec Allocations for Fields (Medium Impact)
- **Cost:** 16% of heap for field Vecs
- **Root cause:** Every record allocates Vec, even for typical 20-field records
- **Opportunity:** Use SmallVec<[Field; 20]> to avoid heap for common case

### Bottleneck #3: Serialization Cost (Expected, Low Priority)
- **Cost:** +254-326% overhead for JSON/XML
- **Root cause:** serde_json and quick-xml are general-purpose serializers
- **Note:** This is expected, not a parsing bottleneck
- **Resolution:** Users should avoid serializing every record if possible

---

## What Limits Performance in This Mode?

In pure Rust single-threaded:
1. **CPU throughput** - The algorithm uses 3,340 cycles per record, which is reasonable for MARC parsing
2. **Memory bandwidth** - Reading 264-byte records from file and heap
3. **Serialization** - If outputting to JSON/XML (avoidable)

What does NOT limit performance:
- I/O (buffering is adequate)
- Parsing logic (nom is efficient)
- Field access (minimal overhead)
- Allocation frequency (expected patterns)

---

## Notes on File Sizes

Results are consistent across:
- **1k records** (257 KB)
- **10k records** (2.5 MB)

Scaling is linear, suggesting:
- No pathological behavior at larger scales
- Buffer cache hits for smaller files
- Consistent algorithm performance

---

## Recommendations for Optimization

For optimization proposals based on these profiling results, see **docs/design/OPTIMIZATION_PROPOSAL.md**.

Key findings that inform optimization decisions:
- Memory overhead is fixable (73% metadata)
- CPU efficiency is excellent (no algorithmic bottlenecks)
- Concurrency gap in Rust likely comes from work distribution, not parsing
- SmallVec and compact tag encoding have low risk and high ROI

---

## Confidence Levels

| Finding | Confidence | Basis |
|---------|-----------|-------|
| Throughput is 1.06M rec/s | Very High | Criterion.rs consistent measurements |
| No algorithm bottleneck | Very High | Detailed analysis + CPU profiling |
| Memory overhead is 73% metadata | High | Allocation analysis |
| Field access overhead is 2-5% | High | Consistent measurements |

---

## References

- Profiling methodology: `docs/design/profiling/PROFILING_PLAN.md`
- Detailed results: `docs/design/profiling/RUST_SINGLE_THREADED_PROFILING_RESULTS.md`
- CPU/memory details: `docs/design/profiling/PHASE_2_DETAILED_ANALYSIS.md`
- Optimization proposals: `docs/design/OPTIMIZATION_PROPOSAL.md`
- Related issues: mrrc-u33.2, mrrc-u33.1