# strs_tools Architecture and Implementation Specification
This document contains detailed technical information about the strs_tools crate implementation, architecture decisions, and compliance with design standards.
## Architecture Overview
### Module Structure
strs_tools follows a layered architecture using the `mod_interface!` pattern:
```
src/
├── lib.rs # Main crate entry point
├── simd.rs # SIMD optimization features
└── string/
├── mod.rs # String module interface
├── indentation.rs # Text indentation tools
├── isolate.rs # String isolation functionality
├── number.rs # Number parsing utilities
├── parse_request.rs # Command parsing tools
├── split.rs # Advanced string splitting
└── split/
├── simd.rs # SIMD-accelerated splitting
└── split_behavior.rs # Split configuration
```
### Design Rulebook Compliance
This crate follows strict Design and Codestyle Rulebook compliance:
#### Core Principles
- **Explicit Lifetimes**: All function signatures with references use explicit lifetime parameters
- **mod_interface Pattern**: Uses `mod_interface!` macro instead of manual namespace definitions
- **Workspace Dependencies**: All external deps inherit from workspace for version consistency
- **Testing Architecture**: All tests in `tests/` directory, never in `src/`
- **Error Handling**: Uses `error_tools` exclusively, no `anyhow` or `thiserror`
#### Code Style
- **Universal Formatting**: Consistent 2-space indentation and proper attribute spacing
- **Documentation Strategy**: Entry files use `include_str!` to avoid documentation duplication
- **Explicit Exposure**: All `mod_interface!` exports are explicitly listed, never using wildcards
- **Feature Gating**: Every workspace crate has `enabled` and `full` features
## Feature Architecture
### Feature Dependencies
The crate uses a hierarchical feature system:
```toml
default = ["enabled", "string_indentation", "string_isolate", "string_parse_request", "string_parse_number", "string_split", "simd"]
full = ["enabled", "string_indentation", "string_isolate", "string_parse_request", "string_parse_number", "string_split", "simd"]
# Performance optimization
simd = ["memchr", "aho-corasick", "bytecount", "lazy_static"]
# Core functionality
enabled = []
string_split = ["split"]
string_indentation = ["indentation"]
# ... other features
```
### SIMD Optimization
Optional SIMD dependencies provide significant performance improvements:
- **memchr**: Hardware-accelerated byte searching
- **aho-corasick**: Multi-pattern string searching
- **bytecount**: Fast byte counting operations
- **lazy_static**: Cached pattern compilation
Performance benefits:
- 2-10x faster string searching on large datasets
- Parallel pattern matching capabilities
- Reduced CPU cycles for bulk operations
## API Design Principles
### Memory Efficiency
- **Zero-Copy Operations**: String slices returned where possible using `Cow<str>`
- **Lazy Evaluation**: Iterator-based processing avoids unnecessary allocations
- **Reference Preservation**: Original string references maintained when splitting
### Error Handling Strategy
All error handling follows the centralized `error_tools` pattern:
```rust
use error_tools::{ err, Result };
fn parse_operation() -> Result<ParsedData>
{
// Structured error handling
match validation_step()
{
Ok( data ) => Ok( data ),
Err( _ ) => Err( err!( ParseError::InvalidFormat ) ),
}
}
```
### Async-Ready Design
While the current implementation is synchronous, the API is designed to support async operations:
- Iterator-based processing enables easy async adaptation
- No blocking I/O in core operations
- State machines can be made async-aware
## Performance Characteristics
### Benchmarking Results
Performance benchmarks are maintained in the `benchmarks/` directory:
- **Baseline Results**: Standard library comparisons
- **SIMD Benefits**: Hardware acceleration measurements
- **Memory Usage**: Allocation and reference analysis
- **Scalability**: Large dataset processing metrics
See `benchmarks/readme.md` for current performance data.
### Optimization Strategies
1. **SIMD Utilization**: Vectorized operations for pattern matching
2. **Cache Efficiency**: Minimize memory allocations and copies
3. **Lazy Processing**: Iterator chains avoid intermediate collections
4. **String Interning**: Reuse common patterns and delimiters
## Testing Strategy
### Test Organization
Following the Design Rulebook, all tests are in `tests/`:
```
tests/
├── smoke_test.rs # Basic functionality
├── strs_tools_tests.rs # Main test entry
└── inc/ # Detailed test modules
├── indentation_test.rs
├── isolate_test.rs
├── number_test.rs
├── parse_test.rs
└── split_test/ # Comprehensive splitting tests
├── basic_split_tests.rs
├── quoting_options_tests.rs
└── ... (other test categories)
```
### Test Matrix Approach
Each test module includes a Test Matrix documenting:
- **Test Factors**: Input variations, configuration options
- **Test Combinations**: Systematic coverage of scenarios
- **Expected Outcomes**: Clearly defined success criteria
- **Edge Cases**: Boundary conditions and error scenarios
### Integration Test Features
Integration tests are feature-gated for flexible CI:
```rust
#![cfg(feature = "integration")]
#[test]
fn test_large_dataset_processing()
{
// Performance and stress tests
}
```
## Security Considerations
### Input Validation
- **Bounds Checking**: All string operations validate input boundaries
- **Escape Handling**: Raw string slices returned to prevent injection attacks
- **Error Boundaries**: Parsing failures are contained and reported safely
### Memory Safety
- **No Unsafe Code**: All operations use safe Rust constructs
- **Reference Lifetimes**: Explicit lifetime management prevents use-after-free
- **Allocation Control**: Predictable memory usage patterns
## Compatibility and Portability
### Platform Support
- **no_std Compatibility**: Core functionality available in embedded environments
- **SIMD Fallbacks**: Graceful degradation when hardware acceleration unavailable
- **Endianness Agnostic**: Correct operation on all target architectures
### Version Compatibility
- **Semantic Versioning**: API stability guarantees through SemVer
- **Feature Evolution**: Additive changes maintain backward compatibility
- **Migration Support**: Clear upgrade paths between major versions
## Development Workflow
### Code Generation
Some functionality uses procedural macros following the established workflow:
1. **Manual Implementation**: Hand-written reference implementation
2. **Test Development**: Comprehensive test coverage
3. **Macro Creation**: Procedural macro generating equivalent code
4. **Validation**: Comparison testing between manual and generated versions
### Contribution Guidelines
- **Rulebook Compliance**: All code must follow Design and Codestyle rules
- **Test Requirements**: New features require comprehensive test coverage
- **Performance Testing**: Benchmark validation for performance-sensitive changes
- **Documentation**: Rich examples and API documentation required
## Migration from Standard Library
### Common Patterns
| `str.split()` | `string::split().src().delimeter().perform()` | Quote awareness, delimiter preservation |
| Manual parsing | `string::parse_request::parse()` | Structured command parsing |
| `str.trim()` + parsing | `string::number::parse()` | Robust number format support |
### Performance Benefits
- **Large Data**: 2-10x improvement with SIMD features
- **Memory Usage**: 50-90% reduction with zero-copy operations
- **Complex Parsing**: 5-20x faster than manual implementations
### API Advantages
- **Type Safety**: Compile-time validation of operations
- **Error Handling**: Comprehensive error types and recovery
- **Extensibility**: Plugin architecture for custom operations
- **Testing**: Built-in test utilities and helpers