ld-so-cache 0.1.0

A parser for glibc ld.so.cache files
# ld.so.cache File Format Specification

## Overview

The `ld.so.cache` file is a binary cache format used by the GNU C Library's dynamic linker (`ld.so`) to accelerate library loading. This cache maps library names to their filesystem paths, avoiding expensive directory searches during program startup.

The cache supports multiple format versions for backward compatibility and includes hardware capability information for optimized library selection.

## File Structure

```
┌─────────────────────────┐
│     File Header         │
├─────────────────────────┤
│   Library Entries       │
│   (Array of entries)    │
├─────────────────────────┤
│    String Table         │
│ (Null-terminated strs)  │
├─────────────────────────┤
│  Extension Directory    │
│     (Optional)          │
└─────────────────────────┘
```

## Format Versions

### Legacy Format (Old Cache)

**Magic Number**: `"ld.so-1.7.0"` (11 bytes)

**File Header**:
```c
struct cache_file {
    char magic[11];           // "ld.so-1.7.0"
    uint32_t nlibs;          // Number of library entries
    // Followed by: struct file_entry libs[nlibs]
};
```

**Entry Format**:
```c
struct file_entry {
    int32_t flags;           // Library type flags (1 = ELF)
    uint32_t key;            // Offset to library name in string table
    uint32_t value;          // Offset to library path in string table
};
```
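
Because `magic` is 11 bytes long, a compiler will normally insert one padding byte so that `nlibs` is 4-byte aligned. The sketch below assumes a typical ABI where `uint32_t` has 4-byte alignment; `old_entries_end` is a hypothetical helper name used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct cache_file {
    char magic[11];          /* "ld.so-1.7.0", no terminating NUL */
    uint32_t nlibs;          /* number of entries that follow */
};

struct file_entry {
    int32_t flags;
    uint32_t key;
    uint32_t value;
};

/* Hypothetical helper: file offset of the first byte after the
   legacy entry array, assuming the header sits at offset 0. */
static size_t old_entries_end(uint32_t nlibs)
{
    return sizeof(struct cache_file)
         + (size_t)nlibs * sizeof(struct file_entry);
}
```

Under that assumption `nlibs` sits at offset 12, the header occupies 16 bytes, and each entry occupies 12 bytes.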

### New Format (Current)

**Magic Number**: `"glibc-ld.so.cache1.1"` (20 bytes total, no terminating NUL)
- Base magic: `"glibc-ld.so.cache"` (17 bytes)
- Version: `"1.1"` (3 bytes)

**File Header** (48 bytes):
```c
struct cache_file_new {
    char magic[17];          // "glibc-ld.so.cache"
    char version[3];         // "1.1"
    uint32_t nlibs;          // Number of library entries
    uint32_t len_strings;    // Size of string table in bytes
    uint8_t flags;           // Endianness and format flags
    uint8_t padding_unused[3]; // Reserved, must be zero
    uint32_t extension_offset; // Offset to extension directory (0 if none)
    uint32_t unused[3];      // Reserved for future use, must be zero
    // Followed by: struct file_entry_new libs[nlibs]
};
```

**Entry Format**:
```c
struct file_entry_new {
    int32_t flags;           // Library type flags
    uint32_t key;            // Offset to library name in string table
    uint32_t value;          // Offset to library path in string table
    uint32_t osversion_unused; // Unused, must be zero
    uint64_t hwcap;          // Hardware capabilities mask
};
```
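
A parser that maps these structs directly onto file data may want to check the layout at compile time. The following is a minimal sketch, assuming a C11 compiler; `new_entries_end` is a hypothetical helper, and it assumes the new-format header sits at file offset 0.

```c
#include <stddef.h>
#include <stdint.h>

struct cache_file_new {
    char magic[17];
    char version[3];
    uint32_t nlibs;
    uint32_t len_strings;
    uint8_t flags;
    uint8_t padding_unused[3];
    uint32_t extension_offset;
    uint32_t unused[3];
};

struct file_entry_new {
    int32_t flags;
    uint32_t key;
    uint32_t value;
    uint32_t osversion_unused;
    uint64_t hwcap;
};

/* Fail the build if the compiler lays the structs out differently. */
_Static_assert(sizeof(struct cache_file_new) == 48, "header must be 48 bytes");
_Static_assert(sizeof(struct file_entry_new) == 24, "entry must be 24 bytes");
_Static_assert(offsetof(struct file_entry_new, hwcap) == 16, "hwcap at offset 16");

/* Hypothetical helper: offset of the first byte after the entry array. */
static uint64_t new_entries_end(uint32_t nlibs)
{
    return sizeof(struct cache_file_new)
         + (uint64_t)nlibs * sizeof(struct file_entry_new);
}
```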

## Endianness and Flags

The low two bits of the `flags` field in the new format header indicate the file's endianness:

| Value | Meaning |
|-------|---------|
| 0 | Endianness unset (legacy) |
| 1 | Invalid cache |
| 2 | Little endian |
| 3 | Big endian |
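
The table can be turned into a small check, sketched below. Two points are assumptions rather than statements from this specification: only the low two bits of `flags` are interpreted, and an unset value is treated as acceptable. All function and enum names are hypothetical.

```c
#include <stdint.h>

enum cache_endian {
    CACHE_ENDIAN_UNSET = 0,
    CACHE_ENDIAN_INVALID = 1,
    CACHE_ENDIAN_LITTLE = 2,
    CACHE_ENDIAN_BIG = 3
};

/* Assumption: only the low two bits carry the endianness value. */
static enum cache_endian cache_endian_of(uint8_t flags)
{
    return (enum cache_endian)(flags & 3u);
}

/* Detect host byte order by inspecting the first byte of a 16-bit 1. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Returns 1 if a cache with these flags can be read natively. */
static int cache_endian_usable(uint8_t flags)
{
    switch (cache_endian_of(flags)) {
    case CACHE_ENDIAN_UNSET:  return 1;   /* assumption: accept legacy files */
    case CACHE_ENDIAN_LITTLE: return host_is_little_endian();
    case CACHE_ENDIAN_BIG:    return !host_is_little_endian();
    default:                  return 0;   /* explicitly invalid */
    }
}
```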

## String Table

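- **Location**: Immediately follows the library entries array
- **Format**: Null-terminated strings concatenated without padding
- **Indexing**: Entry offsets (`key`, `value`) are byte offsets. In the legacy format they are relative to the start of the string table (the first byte after the entry array). In the new format, glibc measures them from the start of the new-format header, so they already include the size of the header and entry array
- **Size**: Total size specified in `len_strings` field (new format only)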

### String Table Layout
```
offset 0: "libc.so.6\0libm.so.6\0libdl.so.2\0"
          ^         ^         ^
          key[0]    key[1]    key[2]
```
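
Offsets read from a file should not be trusted. The sketch below resolves an offset against a base region and rejects offsets that fall outside it or that point at a string with no terminator inside the region. `resolve_string` is a hypothetical name; `base` stands for whatever region the offsets are measured from.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the string at `offset`, or NULL if the offset is out of
   bounds or no NUL terminator is found before the region ends. */
static const char *resolve_string(const char *base, size_t len, uint32_t offset)
{
    if (offset >= len)
        return NULL;
    if (memchr(base + offset, '\0', len - offset) == NULL)
        return NULL;
    return base + offset;
}
```

With the layout shown above, offsets 0, 10, and 20 resolve to `libc.so.6`, `libm.so.6`, and `libdl.so.2` respectively.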

## Hardware Capabilities (hwcap)

The `hwcap` field in new entries encodes hardware capability requirements:

### Bit Layout (64-bit field)
```
Bits 63-32: ISA level and extension flags
Bits 31-0:  Platform-specific hwcap mask
```

### Special Bits
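- **Bit 62**: `DL_CACHE_HWCAP_EXTENSION` - Indicates a glibc-hwcaps subdirectory entry; the low 32 bits then hold an index into the `cache_extension_tag_glibc_hwcaps` section
- **Bits 41-32**: ISA level (the low `DL_CACHE_HWCAP_ISA_LEVEL_COUNT` = 10 bits of the upper half, as extracted by glibc with `(hwcap >> 32) & DL_CACHE_HWCAP_ISA_LEVEL_MASK`)
- **Remaining upper bits**: Reserved
- **Bits 31-0**: Traditional hwcap mask or extension index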
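
As a minimal sketch of how the extension bit and the low half might be read, the helpers below (hypothetical names) test bit 62 and extract bits 31-0. This is a simplified check; a real loader may additionally require that other upper bits are clear before treating an entry as an extension entry.

```c
#include <stdint.h>

#define DL_CACHE_HWCAP_EXTENSION (1ULL << 62)

/* Nonzero if the entry is flagged as a glibc-hwcaps extension entry. */
static int hwcap_is_extension(uint64_t hwcap)
{
    return (hwcap & DL_CACHE_HWCAP_EXTENSION) != 0;
}

/* Low 32 bits: traditional hwcap mask, or the extension index. */
static uint32_t hwcap_low32(uint64_t hwcap)
{
    return (uint32_t)(hwcap & 0xFFFFFFFFu);
}
```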

## Extension System

The new format supports extensions via an extension directory, located at the header's `extension_offset` when that field is nonzero.

### Extension Directory Header
```c
struct cache_extension {
    uint32_t magic;          // Always 0xEAA42174 ((uint32_t) -358342284)
    uint32_t count;          // Number of extension sections
    // Followed by: struct cache_extension_section sections[count]
};
```
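
The directory header can be validated before any section is touched. The sketch below uses the decimal form of the magic given in the struct comment and reads fields with `memcpy` so that unaligned buffers are safe. `read_extension_header` is a hypothetical name, and the 16-byte descriptor size assumes the four `uint32_t` fields of the section struct described in this document.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_EXTENSION_MAGIC ((uint32_t)(-358342284))

/* Reads the directory header at `dir`, where `avail` bytes remain in
   the file. Returns 0 and stores the section count on success, or -1
   if the data is truncated or the magic does not match. */
static int read_extension_header(const unsigned char *dir, size_t avail,
                                 uint32_t *count)
{
    uint32_t magic;

    if (avail < 8)
        return -1;
    memcpy(&magic, dir, 4);
    if (magic != CACHE_EXTENSION_MAGIC)
        return -1;
    memcpy(count, dir + 4, 4);
    /* Every 16-byte section descriptor must fit in the remaining bytes. */
    if (*count > (avail - 8) / 16)
        return -1;
    return 0;
}
```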

### Extension Section
```c
struct cache_extension_section {
    uint32_t tag;            // Section type identifier
    uint32_t flags;          // Section-specific flags
    uint32_t offset;         // Offset to section data
    uint32_t size;           // Size of section data
};
```

### Extension Section Types

| Tag | Name | Purpose |
|-----|------|---------|
| 0 | `cache_extension_tag_generator` | Generator version info |
| 1 | `cache_extension_tag_glibc_hwcaps` | Hardware capability subdirectories |
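
A reader can look up the section it understands and ignore the rest. The following sketch assumes the descriptors have already been copied into an array and that section offsets are measured from the start of the file; `find_section` is a hypothetical name, and the tag to search for is passed in by the caller.

```c
#include <stddef.h>
#include <stdint.h>

struct cache_extension_section {
    uint32_t tag;
    uint32_t flags;
    uint32_t offset;
    uint32_t size;
};

/* Returns the first section with `wanted_tag` whose data lies fully
   inside the file, or NULL if there is none. Other tags are skipped. */
static const struct cache_extension_section *
find_section(const struct cache_extension_section *sections, uint32_t count,
             uint32_t wanted_tag, size_t file_size)
{
    for (uint32_t i = 0; i < count; i++) {
        const struct cache_extension_section *s = &sections[i];
        if (s->tag != wanted_tag)
            continue;                 /* unknown or unwanted: skip */
        if (s->offset > file_size || s->size > file_size - s->offset)
            return NULL;              /* data would run past end of file */
        return s;
    }
    return NULL;
}
```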

## Compatible Format Layout

For backward compatibility, a single file may contain both formats:

```
┌─────────────────────────┐
│   Old Format Header     │
├─────────────────────────┤
│   Old Format Entries    │
├─────────────────────────┤
│      Padding            │ ← Alignment for new format
├─────────────────────────┤
│   New Format Header     │
├─────────────────────────┤
│   New Format Entries    │
├─────────────────────────┤
│    String Table         │ ← Shared between formats
├─────────────────────────┤
│  Extension Directory    │
└─────────────────────────┘
```

## File Processing Algorithm

1. **Format Detection**:
   - Read the first 20 bytes
   - Check for new format magic `"glibc-ld.so.cache1.1"` (17-byte base magic plus 3-byte version)
   - If not found, check for old format magic `"ld.so-1.7.0"`

2. **Old Format Processing**:
   - Parse old format header and entries
   - Look for an embedded new format header at the next aligned offset after the old entries
   - The string table follows the entries of the last format present in the file

3. **New Format Processing**:
   - Validate endianness flags
   - Parse entries with hardware capability information
   - Process extension directory if present

4. **String Resolution**:
   - Calculate string table base address
   - Resolve entry keys/values using offsets
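
Step 1 can be sketched as a pair of `memcmp` calls. The enum and function names are hypothetical; the magic lengths are derived from the string literals rather than hard-coded.

```c
#include <stddef.h>
#include <string.h>

enum cache_format {
    CACHE_FORMAT_UNKNOWN,
    CACHE_FORMAT_OLD,
    CACHE_FORMAT_NEW
};

/* Classifies a buffer by the magic at its start. The length checks
   guard against reading past the end of a truncated file. */
static enum cache_format detect_format(const void *data, size_t len)
{
    static const char new_magic[] = "glibc-ld.so.cache1.1";
    static const char old_magic[] = "ld.so-1.7.0";

    if (len >= sizeof new_magic - 1 &&
        memcmp(data, new_magic, sizeof new_magic - 1) == 0)
        return CACHE_FORMAT_NEW;
    if (len >= sizeof old_magic - 1 &&
        memcmp(data, old_magic, sizeof old_magic - 1) == 0)
        return CACHE_FORMAT_OLD;
    return CACHE_FORMAT_UNKNOWN;
}
```

A result of `CACHE_FORMAT_OLD` does not rule out an embedded new-format header; step 2 still applies.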

## Alignment Requirements

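- **File Header**: In a combined file, the new format header starts at the next offset aligned to `alignof(struct file_entry_new)` (glibc's `ALIGN_CACHE` macro), which is 8 bytes on most 64-bit targets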
- **Extension Offset**: Must be multiple of 4 bytes
- **Strings**: No special alignment required (byte-aligned)
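
Rounding an offset up to an alignment boundary is a one-line bit operation when the alignment is a power of two. The helper below is a generic sketch with a hypothetical name; the alignment itself is passed in by the caller.

```c
#include <stddef.h>

/* Rounds `value` up to the next multiple of `alignment`.
   `alignment` must be a nonzero power of two. */
static size_t align_up(size_t value, size_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}
```

For example, if a legacy header of 16 bytes is followed by three 12-byte entries, the entries end at offset 52, and with 8-byte alignment an embedded new format header would start at offset 56. The 16 and 12 byte sizes assume a typical ABI.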

## Auxiliary Cache

A separate auxiliary cache may exist with magic `"glibc-ld.so.auxcache-2.0"` containing additional metadata.

## Constants

| Name | Value | Description |
|------|-------|-------------|
| `_DL_CACHE_DEFAULT_ID` | 3 | Default entry `flags` value that the loader accepts |
| `DL_CACHE_HWCAP_EXTENSION` | `1ULL << 62` | Marks a glibc-hwcaps extension entry |
| `DL_CACHE_HWCAP_ISA_LEVEL_COUNT` | 10 | Number of ISA-level bits in the upper half of `hwcap` |

## Implementation Notes

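- All multi-byte integers are stored in the native endianness of the machine that generated the cache; the new format header's `flags` field records which
- String offsets must be validated against the bounds of the region they index before use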
- Hardware capability matching uses bitwise operations
- Extension processing is optional and should be skipped if not understood
- Cache files should be atomically updated to prevent corruption

## Error Handling

Implementations should handle:
- Invalid magic numbers
- Truncated files
- Out-of-bounds string offsets
- Endianness mismatches
- Unknown extension types (skip gracefully)
- Malformed extension directories
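
Truncation can be caught early by comparing the sizes declared in the header with the actual file size. The sketch below assumes a file that contains only the new format, with the 48-byte header at offset 0, 24-byte entries, and the string table directly after the entries. `new_cache_fits` is a hypothetical name. The arithmetic is done in 64 bits so that a hostile `nlibs` cannot overflow.

```c
#include <stdint.h>

/* Returns 1 if header, entries, and string table all fit within
   `file_size` bytes, 0 otherwise. */
static int new_cache_fits(uint64_t file_size, uint32_t nlibs,
                          uint32_t len_strings)
{
    uint64_t needed = 48u + (uint64_t)nlibs * 24u + len_strings;
    return needed <= file_size;
}
```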

This specification provides sufficient detail for implementing complete ld.so.cache parsers and generators compatible with the GNU C Library.