Struct Strings
pub struct Strings<'a> { /* private fields */ }Expand description
ECMA-335 compliant #Strings heap providing UTF-8 identifier string access.
The Strings struct represents the strings heap from .NET metadata, which stores
UTF-8 encoded identifier strings referenced by metadata tables. According to ECMA-335
Section II.24.2.3, the strings heap contains null-terminated UTF-8 strings used for
type names, member names, namespaces, and other symbolic identifiers throughout the assembly.
§Heap Structure and Format
The strings heap follows a specific binary layout mandated by ECMA-335:
Offset | Content | Description
-------|----------------------|------------------------------------------
0 | 0x00 | Mandatory null byte (empty string at index 0)
1 | String₁ + 0x00 | First null-terminated UTF-8 string
N | String₂ + 0x00 | Second null-terminated UTF-8 string
... | StringN + 0x00 | Additional strings in sequential order§String Categories
The heap contains various types of identifier strings:
- Namespace identifiers:
System,Microsoft.Extensions,MyApp.Services - Type names:
Console,String,List<T>,Dictionary<TKey,TValue> - Member names:
WriteLine,ToString,Add,Remove,PropertyChanged - Parameter names:
value,index,count,predicate - Assembly/module names:
mscorlib,System.Core,<Module>
§Memory and Performance
- Zero-copy design: String access returns borrowed slices without allocation
- UTF-8 validation: Each string access validates UTF-8 encoding for safety
- Sequential storage: Strings stored contiguously for efficient iteration
- Lifetime safety: Rust borrow checker prevents dangling string references
§Examples
§Basic String Access
use dotscope::metadata::streams::Strings;
// Construct a sample strings heap
#[rustfmt::skip]
let heap_data = [
0x00, // Index 0: required empty string
b'S', b'y', b's', b't', b'e', b'm', 0x00, // Index 1: "System"
b'C', b'o', b'n', b's', b'o', b'l', b'e', 0x00, // Index 8: "Console"
];
let strings = Strings::from(&heap_data)?;
// Access strings by their heap offsets
assert_eq!(strings.get(1)?, "System");
assert_eq!(strings.get(8)?, "Console");
// Index 0 always contains empty string per ECMA-335
assert_eq!(strings.get(0)?, "");§Iterating Over All Strings
use dotscope::metadata::streams::Strings;
let heap_data = [
0x00, // Empty string at index 0
b'H', b'e', b'l', b'l', b'o', 0x00, // "Hello" at index 1
b'W', b'o', b'r', b'l', b'd', 0x00, // "World" at index 7
];
let strings = Strings::from(&heap_data)?;
// Iterate with offset information
for (offset, string) in strings.iter() {
println!("String at offset {}: '{}'", offset, string);
}
// Collect all strings for batch processing
let strings_list: Vec<_> = strings.iter().collect();
assert_eq!(strings_list.len(), 2); // "Hello" + "World" (empty string at index 0 is skipped)
assert_eq!(strings_list[0], (1, "Hello"));
assert_eq!(strings_list[1], (7, "World"));§Error Handling
use dotscope::metadata::streams::Strings;
// Invalid heap: missing mandatory null byte at index 0
let invalid_heap = [b'I', b'n', b'v', b'a', b'l', b'i', b'd'];
assert!(Strings::from(&invalid_heap).is_err());
// Valid heap for testing access errors
let valid_heap = [0x00, b'T', b'e', b's', b't', 0x00];
let strings = Strings::from(&valid_heap).unwrap();
// Error cases
assert!(strings.get(100).is_err()); // Out of bounds
assert!(strings.get(valid_heap.len()).is_err()); // Exact boundary
// Valid access
assert_eq!(strings.get(1).unwrap(), "Test");§Real-World Metadata Processing
use dotscope::metadata::streams::Strings;
// Function to safely resolve string references from metadata tables
fn resolve_string_ref(strings: &Strings, offset: usize) -> String {
strings.get(offset)
.map(|s| s.to_string())
.unwrap_or_else(|_| format!("<invalid_ref@{}>", offset))
}
// Simulate processing TypeDef table entries
let namespace_offset = 1; // Points to "System"
let type_name_offset = 8; // Points to "Object"
let method_name_offset = 15; // Points to "ToString"
let full_type_name = format!("{}.{}",
resolve_string_ref(&strings, namespace_offset),
resolve_string_ref(&strings, type_name_offset)
);
println!("Type: {}", full_type_name); // "System.Object"
println!("Method: {}", resolve_string_ref(&strings, method_name_offset)); // "ToString"§ECMA-335 Compliance
This implementation strictly follows ECMA-335 Partition II, Section 24.2.3:
- Enforces mandatory null byte at heap index 0
- Validates UTF-8 encoding for all string content
- Preserves null-termination requirements for each string
- Maintains sequential storage without gaps or padding
§Security Considerations
- Input validation: UTF-8 encoding verified for all string access
- Bounds checking: Index validation prevents buffer overrun attacks
- Memory safety: Rust lifetime system prevents use-after-free vulnerabilities
- Format compliance: ECMA-335 format requirements enforced during construction
§See Also
crate::metadata::streams::UserStrings: UTF-16 user string literals (#USheap)crate::metadata::tables: Metadata tables containing string heap referencescrate::metadata::streams::strings::StringsIterator: Iterator for sequential string heap traversal- ECMA-335 II.24.2.3: Strings heap specification
Implementations§
§impl<'a> Strings<'a>
impl<'a> Strings<'a>
pub fn from(data: &[u8]) -> Result<Strings<'_>>
pub fn from(data: &[u8]) -> Result<Strings<'_>>
Create a crate::metadata::streams::strings::Strings heap accessor from binary data with ECMA-335 validation.
Constructs a strings heap accessor from the raw binary data of a #Strings stream.
This method validates the heap format according to ECMA-335 Section II.24.2.3,
ensuring the mandatory null byte at index 0 is present and the data is well-formed.
§ECMA-335 Format Requirements
The strings heap must conform to the following structure:
- Index 0: Must contain a null byte (0x00) representing the empty string
- Sequential layout: Strings stored contiguously without gaps
- Null termination: Each string must end with a 0x00 byte
- UTF-8 encoding: All string content must be valid UTF-8 sequences
§Validation Performed
This method enforces ECMA-335 compliance by checking:
- Non-empty data: Heap must contain at least one byte
- Mandatory null byte: First byte must be 0x00 (empty string at index 0)
- Format integrity: Basic structural validation of heap data
Note: Individual string UTF-8 validation is performed during access via Strings::get
to optimize construction performance for large heaps.
§Arguments
data- Raw binary data slice containing the complete#Stringsheap
§Returns
Ok(Strings)- Successfully validated and constructed strings heap accessorErr(Error)- Validation failed due to ECMA-335 format violations
§Errors
Returns crate::Error in the following cases:
- Empty heap: Data slice is empty (violates ECMA-335 requirement)
- Missing null byte: First byte is not 0x00 (violates mandatory empty string at index 0)
§Examples
§Valid Strings Heap Construction
use dotscope::metadata::streams::Strings;
// Properly formatted strings heap with mandatory null byte
#[rustfmt::skip]
let valid_heap = [
0x00, // Index 0: mandatory empty string
b'S', b'y', b's', b't', b'e', b'm', 0x00, // Index 1: "System"
b'C', b'o', b'n', b's', b'o', b'l', b'e', 0x00, // Index 8: "Console"
b'W', b'r', b'i', b't', b'e', b'L', b'i', b'n', b'e', 0x00, // Index 16: "WriteLine"
];
let strings = Strings::from(&valid_heap)?;
// Verify successful construction and access
assert_eq!(strings.get(0)?, ""); // Empty string at index 0
assert_eq!(strings.get(1)?, "System"); // First identifier
assert_eq!(strings.get(8)?, "Console"); // Second identifier§Minimal Valid Heap
use dotscope::metadata::streams::Strings;
// Minimal heap with only the mandatory empty string
let minimal_heap = [0x00];
let strings = Strings::from(&minimal_heap)?;
// Can access the empty string at index 0
assert_eq!(strings.get(0)?, "");
// Iterator returns no strings (empty string is implicit)
let string_count = strings.iter().count();
assert_eq!(string_count, 0);§Error Cases
use dotscope::metadata::streams::Strings;
// Error: Empty data (no mandatory null byte)
let empty_data = [];
assert!(Strings::from(&empty_data).is_err());
// Error: Missing mandatory null byte at index 0
let no_null_byte = [b'I', b'n', b'v', b'a', b'l', b'i', b'd'];
assert!(Strings::from(&no_null_byte).is_err());
// Error: Non-zero first byte
let wrong_start = [0x01, b'T', b'e', b's', b't', 0x00];
assert!(Strings::from(&wrong_start).is_err());§Real-World Assembly Processing
use dotscope::metadata::streams::Strings;
// Realistic strings heap from a .NET assembly
#[rustfmt::skip]
let dotnet_heap = [
0x00, // Index 0: empty string
b'<', b'M', b'o', b'd', b'u', b'l', b'e', b'>', 0x00, // Index 1: "<Module>"
b'S', b'y', b's', b't', b'e', b'm', 0x00, // Index 10: "System"
b'O', b'b', b'j', b'e', b'c', b't', 0x00, // Index 17: "Object"
b'T', b'o', b'S', b't', b'r', b'i', b'n', b'g', 0x00, // Index 24: "ToString"
b'E', b'q', b'u', b'a', b'l', b's', 0x00, // Index 33: "Equals"
b'G', b'e', b't', b'H', b'a', b's', b'h', b'C', b'o', b'd', b'e', 0x00, // Index 40: "GetHashCode"
];
let strings = Strings::from(&dotnet_heap)?;
// Verify we can access all the expected strings
assert_eq!(strings.get(1)?, "<Module>");
assert_eq!(strings.get(10)?, "System");
assert_eq!(strings.get(17)?, "Object");
assert_eq!(strings.get(24)?, "ToString");
assert_eq!(strings.get(33)?, "Equals");
assert_eq!(strings.get(40)?, "GetHashCode");
// Count total strings (excluding empty string at index 0)
let string_count = strings.iter().count();
assert_eq!(string_count, 6);§Performance Considerations
- Construction: O(1) validation with minimal overhead
- Memory: Zero-copy design with borrowed data references
- Deferred validation: UTF-8 validation performed during string access
- Thread safety: Safe for concurrent construction from different data sources
§ECMA-335 Compliance
This method implements ECMA-335 Partition II, Section 24.2.3 requirements:
- Enforces mandatory null byte at heap index 0
- Validates basic heap structure and format
- Prepares for proper UTF-8 string access and iteration
§See Also
Strings::get: Access individual strings with UTF-8 validationStrings::iter: Sequential iteration over all heap strings- ECMA-335 II.24.2.3: Strings heap format specification
pub fn get(&self, index: usize) -> Result<&'a str>
pub fn get(&self, index: usize) -> Result<&'a str>
Access a UTF-8 string at the specified heap offset with comprehensive validation.
Retrieves a null-terminated UTF-8 string from the strings heap at the given byte offset. This method performs complete ECMA-335 compliance validation including bounds checking, null-termination verification, and UTF-8 encoding validation to ensure safe string access.
§String Resolution Process
- Bounds Validation: Verify the index is within the heap data boundaries
- Null Termination: Locate the null terminator (0x00) byte for the string
- UTF-8 Validation: Ensure all bytes form a valid UTF-8 sequence
- String Creation: Return a borrowed string slice without allocation
§Offset Interpretation
Offsets are byte positions within the strings heap:
- Index 0: Always points to the mandatory empty string (single null byte)
- Index 1+: Points to the start of actual identifier strings
- Mid-string offsets: Invalid - will cause UTF-8 validation errors
- Table references: Metadata tables store these offsets to reference strings
§Arguments
index- Byte offset within the strings heap where the string begins
§Returns
Ok(&str)- Valid UTF-8 string slice with lifetime tied to heap dataErr(Error)- Access failed due to bounds or validation errors
§Errors
Returns crate::Error in the following cases:
crate::Error::OutOfBounds: Index exceeds heap data length- Malformed string: No null terminator found within remaining heap data
- Invalid UTF-8: String bytes do not form a valid UTF-8 sequence
- Encoding errors: Non-ASCII bytes that violate UTF-8 encoding rules
§Examples
§Basic String Access
use dotscope::metadata::streams::Strings;
#[rustfmt::skip]
let heap_data = [
0x00, // Index 0: empty string
b'S', b'y', b's', b't', b'e', b'm', 0x00, // Index 1: "System"
b'C', b'o', b'n', b's', b'o', b'l', b'e', 0x00, // Index 8: "Console"
];
let strings = Strings::from(&heap_data)?;
// Access strings by their heap offsets
assert_eq!(strings.get(0)?, ""); // Empty string always at index 0
assert_eq!(strings.get(1)?, "System"); // First identifier string
assert_eq!(strings.get(8)?, "Console"); // Second identifier string
// These offsets would typically come from metadata table entries
let namespace_ref = 1; // From TypeDef.TypeNamespace field
let type_name_ref = 8; // From TypeDef.TypeName field
println!("Type: {}.{}", strings.get(namespace_ref)?, strings.get(type_name_ref)?);§UTF-8 and Unicode Support
use dotscope::metadata::streams::Strings;
// Strings heap with Unicode content (UTF-8 encoded)
#[rustfmt::skip]
let unicode_heap = [
0x00, // Index 0: empty string
// "Café" in UTF-8: C3 A9 is 'é'
b'C', b'a', b'f', 0xC3, 0xA9, 0x00, // Index 1: "Café"
// "测试" (test in Chinese) in UTF-8
0xE6, 0xB5, 0x8B, 0xE8, 0xAF, 0x95, 0x00, // Index 7: "测试"
];
let strings = Strings::from(&unicode_heap)?;
assert_eq!(strings.get(1)?, "Café");
assert_eq!(strings.get(7)?, "测试");
// Verify proper UTF-8 character handling
let cafe = strings.get(1)?;
assert_eq!(cafe.chars().count(), 4); // 4 Unicode characters
assert_eq!(cafe.len(), 5); // 5 UTF-8 bytes§Error Handling and Edge Cases
use dotscope::metadata::streams::Strings;
let heap_data = [
0x00, // Index 0: empty string
b'T', b'e', b's', b't', 0x00, // Index 1: "Test"
];
let strings = Strings::from(&heap_data)?;
// Valid accesses
assert_eq!(strings.get(0)?, ""); // Empty string
assert_eq!(strings.get(1)?, "Test"); // Valid string
// Error cases
assert!(strings.get(100).is_err()); // Way out of bounds
assert!(strings.get(heap_data.len()).is_err()); // Exact boundary
assert!(strings.get(heap_data.len() + 1).is_err()); // Beyond boundary
// Mid-string access would cause UTF-8 validation error
// (Though bounds checking might catch it first)
assert!(strings.get(3).is_err()); // Points to 's' in "Test"§Metadata Table Integration
use dotscope::metadata::streams::Strings;
// Simulated metadata processing scenario
// Function to safely resolve string references from metadata tables
fn get_string_or_invalid(strings: &Strings, offset: usize) -> String {
strings.get(offset)
.map(|s| s.to_string())
.unwrap_or_else(|_| format!("<invalid@{}>", offset))
}
// Simulate TypeDef table processing
struct TypeDefRow {
type_name: usize, // String heap offset
type_namespace: usize, // String heap offset
}
let type_def = TypeDefRow {
type_namespace: 1, // "System"
type_name: 8, // "Object"
};
let namespace = get_string_or_invalid(&strings, type_def.type_namespace);
let type_name = get_string_or_invalid(&strings, type_def.type_name);
let full_name = format!("{}.{}", namespace, type_name);
assert_eq!(full_name, "System.Object");§Performance-Sensitive Usage
use dotscope::metadata::streams::Strings;
use dotscope::Error;
// For repeated access to the same strings, consider caching
use std::collections::HashMap;
let mut string_cache: HashMap<usize, String> = HashMap::new();
fn cached_get(strings: &Strings, cache: &mut HashMap<usize, String>, offset: usize) -> Result<String, Error> {
if let Some(cached) = cache.get(&offset) {
Ok(cached.clone())
} else {
let string = strings.get(offset)?.to_string();
cache.insert(offset, string.clone());
Ok(string)
}
}
// First access validates and caches
let result1 = cached_get(&strings, &mut string_cache, 1)?;
// Second access uses cache
let result2 = cached_get(&strings, &mut string_cache, 1)?;
assert_eq!(result1, result2);§Security Considerations
- Bounds checking: Prevents buffer overrun vulnerabilities
- UTF-8 validation: Ensures string content is well-formed
- Memory safety: Rust lifetime system prevents use-after-free
- No allocation: Eliminates memory exhaustion attacks
§ECMA-335 Compliance
This method fully implements ECMA-335 Partition II, Section 24.2.3:
- Proper null-terminated string parsing
- UTF-8 encoding validation as required by the specification
- Correct handling of the empty string at index 0
§See Also
crate::metadata::streams::strings::Strings::iter: Sequential iteration over all strings in the heapcrate::metadata::streams::strings::StringsIterator: Iterator implementation for strings heap traversalcrate::metadata::tables: Metadata tables containing string references- ECMA-335 II.24.2.3: Strings heap specification
pub fn iter(&self) -> StringsIterator<'_> ⓘ
pub fn iter(&self) -> StringsIterator<'_> ⓘ
Create an iterator for sequential traversal of all strings in the heap.
Returns a crate::metadata::streams::strings::StringsIterator that provides zero-copy access to all null-terminated
UTF-8 strings stored in the heap. The iterator yields tuples of (offset, string)
where offset is the byte position within the heap and string is the UTF-8 content.
§Iteration Behavior
- Sequential access: Strings are visited in storage order within the heap
- Zero-copy design: String references borrow from original heap data
- UTF-8 validation: Each string is validated during iteration
- Error handling: Iterator stops on invalid strings instead of panicking
- Empty string skipped: The mandatory empty string at index 0 is not yielded
§Error Handling
The iterator gracefully handles malformed heap data:
- Invalid UTF-8 sequences cause iterator termination
- Missing null terminators cause iterator termination
- Corrupted heap structure detected during iteration
§Returns
crate::metadata::streams::strings::StringsIterator that yields (usize, &str) for each string
§Examples
§Basic Iteration
use dotscope::metadata::streams::Strings;
#[rustfmt::skip]
let heap_data = [
0x00, // Index 0: empty string (not yielded)
b'H', b'e', b'l', b'l', b'o', 0x00, // Index 1: "Hello"
b'W', b'o', b'r', b'l', b'd', 0x00, // Index 7: "World"
b'T', b'e', b's', b't', 0x00, // Index 13: "Test"
];
let strings = Strings::from(&heap_data)?;
// Iterate over all strings with their offsets
for (offset, string) in strings.iter() {
println!("String at offset {}: '{}'", offset, string);
}
// Expected output:
// String at offset 1: 'Hello'
// String at offset 7: 'World'
// String at offset 13: 'Test'§Collecting All Valid Strings
use dotscope::metadata::streams::Strings;
let heap_data = [
0x00, // Empty string (skipped)
b'S', b'y', b's', b't', b'e', b'm', 0x00, // "System"
b'C', b'o', b'n', b's', b'o', b'l', b'e', 0x00, // "Console"
b'O', b'b', b'j', b'e', b'c', b't', 0x00, // "Object"
];
let strings = Strings::from(&heap_data)?;
// Collect all strings, handling errors
let all_strings: Vec<_> = strings.iter().collect();
assert_eq!(all_strings.len(), 3);
assert_eq!(all_strings[0], (1, "System"));
assert_eq!(all_strings[1], (8, "Console"));
assert_eq!(all_strings[2], (16, "Object"));
for (offset, string) in all_strings {
println!("Found identifier: '{}' at offset {}", string, offset);
}§Filtering and Processing
use dotscope::metadata::streams::Strings;
// Find all method names (strings containing common method patterns)
let method_names: Vec<_> = strings.iter()
.filter(|(_, string)| {
string.chars().next().map_or(false, |c| c.is_uppercase()) &&
(string.contains("Get") || string.contains("Set") ||
string.contains("To") || string.contains("Equals"))
})
.collect();
// Find all namespace-like strings (containing dots)
let namespaces: Vec<_> = strings.iter()
.filter(|(_, string)| string.contains('.'))
.map(|(offset, string)| (offset, string.to_string()))
.collect();
println!("Found {} method names", method_names.len());
println!("Found {} namespace strings", namespaces.len());§Memory-Efficient Processing
use dotscope::metadata::streams::Strings;
// Process large string heaps without collecting all strings in memory
let mut total_length = 0;
let mut max_length = 0;
let mut string_count = 0;
for (_, string) in strings.iter() {
total_length += string.len();
max_length = max_length.max(string.len());
string_count += 1;
// Process string immediately without storing
if string.len() > 50 {
println!("Long identifier found: '{}'", string);
}
}
let average_length = if string_count > 0 { total_length / string_count } else { 0 };
println!("Statistics: {} strings, avg length: {}, max length: {}",
string_count, average_length, max_length);§Integration with IntoIterator
use dotscope::metadata::streams::Strings;
// Can use with for loops directly via IntoIterator implementation
for (offset, string) in &strings {
println!("{}: {}", offset, string);
}
// Or with iterator methods
let string_lengths: Vec<_> = (&strings).into_iter()
.map(|(_, string)| string.len())
.collect();§Performance Notes
- Streaming processing: Iterator allows processing large heaps without memory allocation
- Early termination: Can stop iteration on first error or when specific strings found
- Cache-friendly: Sequential memory access pattern optimizes CPU cache usage
- UTF-8 overhead: Validation cost scales with total string content size
§ECMA-335 Compliance
The iterator correctly implements ECMA-335 Section II.24.2.3 requirements:
- Skips the mandatory empty string at index 0
- Properly handles null-terminated string format
- Validates UTF-8 encoding as required by specification
- Maintains sequential storage order for deterministic iteration
§See Also
crate::metadata::streams::strings::StringsIterator: The iterator implementation with detailed behavior documentationcrate::metadata::streams::strings::Strings::get: Direct string access by offset for random access patternscrate::metadata::streams::UserStrings: UTF-16 user strings iteration- ECMA-335 II.24.2.3: Strings heap specification
pub fn data(&self) -> &[u8] ⓘ
pub fn data(&self) -> &[u8] ⓘ
Returns the raw underlying data of the strings heap.
This provides access to the complete heap data including the null byte at offset 0 and all string entries in their original binary format. This method is useful for heap size calculation, bounds checking, and low-level metadata analysis.
§Returns
A byte slice containing the complete strings heap data.
§Examples
use dotscope::metadata::streams::Strings;
let heap_data = [0x00, b'T', b'e', b's', b't', 0x00];
let strings = Strings::from(&heap_data)?;
assert_eq!(strings.data().len(), 6);
assert_eq!(strings.data()[0], 0x00); // Mandatory null byteTrait Implementations§
Auto Trait Implementations§
impl<'a> Freeze for Strings<'a>
impl<'a> RefUnwindSafe for Strings<'a>
impl<'a> Send for Strings<'a>
impl<'a> Sync for Strings<'a>
impl<'a> Unpin for Strings<'a>
impl<'a> UnwindSafe for Strings<'a>
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read more