# Contributing to omicsx

**Maintained by**: Raghav Maheshwari (@techusic)  
**Email**: raghavmkota@gmail.com  
**Repository**: https://github.com/techusic/omicsx

Thank you for your interest in contributing to omicsx! This document provides guidelines and procedures for contributing.

## Code of Conduct

Be respectful, inclusive, and professional in all interactions.

## Getting Started

1. Fork the repository: https://github.com/techusic/omicsx/fork
2. Clone your fork: `git clone https://github.com/YOUR_USERNAME/omicsx.git`
3. Create a feature branch: `git checkout -b feature/your-feature-name`
4. Set up your environment: `cargo build --release`

## Development Guidelines

### Rust Standards

- **Edition**: Rust 2021
- **MSRV**: 1.70+
- **Format**: Run `cargo fmt` before committing
- **Lint**: Pass `cargo clippy --release`
- **Tests**: All tests must pass: `cargo test --lib`

### Code Quality

- All public APIs must have documentation with examples
- Unit tests required for new functionality
- Integration tests for end-to-end workflows
- No `unsafe` code outside the SIMD kernel modules; every `unsafe` block there needs a clear comment explaining why it is sound
- No panics in library code (panicking is acceptable only in tests and examples)
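
As a rough illustration of these points, the sketch below pairs a documented public function with a `Result` return (rather than a panic) and unit tests. The names `InvalidResidue` and `count_hydrophobic` are hypothetical and are not part of the omicsx API.

```rust
use std::fmt;

/// Error for a character that is not an ASCII letter.
/// Hypothetical type for illustration; not omicsx's real error type.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidResidue(pub char);

impl fmt::Display for InvalidResidue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid residue code: {:?}", self.0)
    }
}

impl std::error::Error for InvalidResidue {}

/// Counts the hydrophobic residues (A, V, I, L, M, F, W) in `seq`.
///
/// Matching is case-insensitive.
///
/// # Errors
///
/// Returns [`InvalidResidue`] for the first character that is not an
/// ASCII letter.
///
/// # Examples
///
/// `count_hydrophobic("MAVK")` returns `Ok(3)`.
pub fn count_hydrophobic(seq: &str) -> Result<usize, InvalidResidue> {
    let mut count = 0;
    for c in seq.chars() {
        // Report bad input to the caller instead of panicking.
        if !c.is_ascii_alphabetic() {
            return Err(InvalidResidue(c));
        }
        if matches!(c.to_ascii_uppercase(), 'A' | 'V' | 'I' | 'L' | 'M' | 'F' | 'W') {
            count += 1;
        }
    }
    Ok(count)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn counts_hydrophobic_residues() {
        assert_eq!(count_hydrophobic("MAVK"), Ok(3));
    }

    #[test]
    fn rejects_non_letters() {
        assert_eq!(count_hydrophobic("MA-K"), Err(InvalidResidue('-')));
    }
}
```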

### Commit Messages

Use the Conventional Commits format:
```
<type>(<scope>): <subject>

<body>

<footer>
```

Types: `feat`, `fix`, `docs`, `style`, `refactor`, `perf`, `test`, `chore`

Example:
```
feat(alignment): add banded DP optimization

Implements an O(k·n) algorithm for similar sequences
using a diagonal band restriction.

Closes #42
```

## Making Changes

### Phase 1: Protein Primitives
- File: `src/protein/mod.rs`
- Add tests to the `tests` module in `mod.rs`
- Update README.md if adding public APIs

### Phase 2: Scoring Infrastructure
- File: `src/scoring/mod.rs`
- Update matrix data in dedicated functions
- Test with known reference values
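
One way to test against known reference values is to assert a few published entries plus the matrix's symmetry. The sketch below uses a 4x4 excerpt of BLOSUM62 (residues A, R, N, D); the `Residue` enum and `score` function are hypothetical stand-ins, not the crate's actual scoring API.

```rust
/// Residues covered by this illustrative excerpt (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Residue {
    Ala = 0,
    Arg = 1,
    Asn = 2,
    Asp = 3,
}

/// All residues in the excerpt, in matrix order.
pub const ALL: [Residue; 4] = [Residue::Ala, Residue::Arg, Residue::Asn, Residue::Asp];

/// BLOSUM62 scores for A, R, N, D (rows and columns in that order).
const BLOSUM62_EXCERPT: [[i8; 4]; 4] = [
    [4, -1, -2, -2],
    [-1, 5, 0, -2],
    [-2, 0, 6, 1],
    [-2, -2, 1, 6],
];

/// Looks up the substitution score for a residue pair.
pub fn score(a: Residue, b: Residue) -> i8 {
    BLOSUM62_EXCERPT[a as usize][b as usize]
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn matches_reference_values() {
        assert_eq!(score(Residue::Ala, Residue::Ala), 4);
        assert_eq!(score(Residue::Ala, Residue::Arg), -1);
        assert_eq!(score(Residue::Asn, Residue::Asp), 1);
    }

    #[test]
    fn matrix_is_symmetric() {
        for a in ALL {
            for b in ALL {
                assert_eq!(score(a, b), score(b, a));
            }
        }
    }
}
```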

### Phase 3: SIMD Kernels
- Files: `src/alignment/kernel/*.rs`
- Implement scalar version first
- Profile before optimizing with SIMD
- Benchmark with Criterion.rs
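
A scalar reference can stay quite small. The sketch below computes a Smith-Waterman local-alignment score with a linear gap penalty and is meant only as a correctness baseline; the function name, signature, and scoring parameters are assumptions rather than the crate's kernel interface. A SIMD kernel would then be checked against it on the same inputs before any benchmarking.

```rust
/// Scalar Smith-Waterman local-alignment score with a linear gap penalty.
///
/// `mismatch` and `gap` are expected to be negative (an assumed convention
/// for this sketch). Only two rows of the DP matrix are kept in memory.
pub fn sw_score_scalar(a: &[u8], b: &[u8], match_score: i32, mismatch: i32, gap: i32) -> i32 {
    let mut prev = vec![0i32; b.len() + 1];
    let mut curr = vec![0i32; b.len() + 1];
    let mut best = 0;

    for &x in a {
        for (j, &y) in b.iter().enumerate() {
            let substitution = if x == y { match_score } else { mismatch };
            let diag = prev[j] + substitution; // align x with y
            let up = prev[j + 1] + gap; // gap in `b`
            let left = curr[j] + gap; // gap in `a`
            // Local alignment: a cell never drops below zero.
            let cell = diag.max(up).max(left).max(0);
            curr[j + 1] = cell;
            best = best.max(cell);
        }
        // The finished row becomes the previous row for the next residue.
        std::mem::swap(&mut prev, &mut curr);
    }
    best
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn identical_sequences_score_full_length() {
        assert_eq!(sw_score_scalar(b"ACGT", b"ACGT", 2, -1, -2), 8);
    }

    #[test]
    fn unrelated_sequences_score_zero() {
        assert_eq!(sw_score_scalar(b"AAAA", b"TTTT", 2, -1, -2), 0);
    }
}
```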

### Future Enhancements
- Files: `src/futures/*.rs`
- Each module is self-contained
- Replace `todo!()` macros with implementations
- Remove `#[ignore]` from the corresponding tests once the implementation lands
- Maintain the existing error type hierarchy
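
To make the error-hierarchy point concrete, here is a minimal sketch in which a module-level error is wrapped by a crate-level error through `From`, so callers can use `?` across module boundaries. `MsaError`, `CrateError`, `alignment_width`, and `validate` are hypothetical names; the crate's real error types may be organized differently.

```rust
use std::fmt;

/// Module-level error (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum MsaError {
    EmptyInput,
    LengthMismatch { expected: usize, found: usize },
}

/// Crate-level error that wraps module errors (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum CrateError {
    Msa(MsaError),
}

impl fmt::Display for MsaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MsaError::EmptyInput => write!(f, "no sequences supplied"),
            MsaError::LengthMismatch { expected, found } => {
                write!(f, "row length {found} does not match expected {expected}")
            }
        }
    }
}

impl fmt::Display for CrateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CrateError::Msa(e) => write!(f, "MSA error: {e}"),
        }
    }
}

impl std::error::Error for MsaError {}

impl std::error::Error for CrateError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            CrateError::Msa(e) => Some(e),
        }
    }
}

impl From<MsaError> for CrateError {
    fn from(e: MsaError) -> Self {
        CrateError::Msa(e)
    }
}

/// Returns the shared column count of pre-aligned rows.
pub fn alignment_width(rows: &[&str]) -> Result<usize, MsaError> {
    let first = rows.first().ok_or(MsaError::EmptyInput)?;
    let expected = first.len();
    for row in rows {
        if row.len() != expected {
            return Err(MsaError::LengthMismatch { expected, found: row.len() });
        }
    }
    Ok(expected)
}

/// Crate-level entry point: `?` converts `MsaError` via `From`.
pub fn validate(rows: &[&str]) -> Result<usize, CrateError> {
    Ok(alignment_width(rows)?)
}
```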

## Testing

```bash
# Run all library unit tests (plain `cargo test` also runs integration and doc tests)
cargo test --lib

# Run specific test
cargo test --lib path::to::test

# Run with output
cargo test --lib -- --nocapture

# Benchmark
cargo bench --bench alignment_benchmarks -- --verbose

# Check with Clippy
cargo clippy --release

# Format check
cargo fmt --check
```

## Performance Considerations

1. Profile before optimizing
2. Add SIMD optimizations only after the scalar version is correct
3. Benchmark against baseline implementations
4. Document performance characteristics
5. Test on multiple architectures (x86-64, ARM64)
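
For the multi-architecture point, a common pattern is to select a kernel at runtime and always keep the scalar path as a fallback. The sketch below uses the standard library's feature-detection macros; `KernelKind` and `select_kernel` are hypothetical names, not the crate's dispatch API.

```rust
/// Kernel families this sketch can choose between (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KernelKind {
    Scalar,
    Avx2,
    Neon,
}

/// Picks the widest kernel the current CPU supports, falling back to scalar.
pub fn select_kernel() -> KernelKind {
    // Compiled only on x86-64; the check itself happens at runtime.
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return KernelKind::Avx2;
        }
    }

    // Compiled only on ARM64.
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return KernelKind::Neon;
        }
    }

    // Every target has the scalar implementation available.
    KernelKind::Scalar
}
```

Because the result depends on the host CPU, tests for the dispatcher itself can only assert that some valid kernel is returned; the per-kernel correctness tests should compare each SIMD kernel against the scalar output.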

## Documentation

- Add doc comments to all public items
- Include examples in public API docs
- Update README.md for major features
- Add inline comments for complex algorithms
- Use `///` for documentation, `//` for implementation notes

## Pull Request Process

1. Update tests and documentation
2. Ensure all tests pass: `cargo test --lib`
3. Ensure Clippy reports no warnings: `cargo clippy --release`
4. Ensure the code is formatted: `cargo fmt`
5. Squash related commits
6. Write a clear PR description

### File Management Before Submission

When implementing major enhancements to existing modules:

- **Archive originals**: Keep backup copies of modified files with an `_original.rs` suffix (local only, not committed)
- **Test against baselines**: Compare the enhanced version against the original to catch regressions
- **Update .gitignore**: Add patterns for backup files (`src/module/*_original.rs`, etc.)
- **Document changes**: Reference the `DEVELOPMENT.md` section on backup files in the commit message

Example workflow:
```bash
# Before modifying phylogeny_likelihood.rs
cp src/futures/phylogeny_likelihood.rs src/futures/phylogeny_likelihood_original.rs

# ... implement enhancements ...

# Verify against original (test both)
cargo test --lib phylogeny_likelihood

# Ensure backup won't be committed (check .gitignore)
git check-ignore src/futures/phylogeny_likelihood_original.rs

# Commit only the enhanced version
git add src/futures/phylogeny_likelihood.rs
git commit -m "feat(phylogeny): add NNI/SPR topology optimization (backup: phylogeny_likelihood_original.rs)"
```

**Benefits**:
- ✅ Easy regression testing and comparison
- ✅ Keep git history clean without bloating commits
- ✅ Support quick rollback if issues arise
- ✅ Preserve architectural decisions for documentation

## Architecture

```
src/
├── protein/          # Phase 1: Types and validation
├── scoring/          # Phase 2: Matrices and penalties
├── alignment/        # Phase 3: SIMD kernels
│   ├── kernel/      # Scalar, AVX2, NEON implementations
│   ├── batch.rs     # Parallel processing
│   ├── bam.rs       # Binary format
│   └── mod.rs       # Integration
└── futures/         # Future enhancements
    ├── matrices.rs  # Additional scoring matrices
    ├── formats.rs   # BLAST/GFF3 export
    ├── gpu.rs       # GPU acceleration
    ├── msa.rs       # Multiple sequence alignment
    ├── hmm.rs       # Profile HMM
    └── phylogeny.rs # Phylogenetic trees
```

## Architecture Decisions

- **Type Safety**: Use enums instead of flags/raw values
- **Error Handling**: `Result<T>` for all fallible operations
- **Memory Safety**: Leverage Rust's ownership model
- **Performance**: SIMD where measurable benefit exists
- **Portability**: Support both x86-64 and ARM64
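
The first two decisions can be sketched together: an enum replaces a boolean flag, and a module-local `Result<T>` alias is returned by a validating constructor. Everything in the `config` module below (`AlignmentMode`, `ConfigError`, `GapPenalty`, `floor_score`) is hypothetical, and the convention that gap penalties are non-positive is an assumption made for this example.

```rust
pub mod config {
    use std::fmt;

    /// An enum instead of a `local: bool` flag (hypothetical type).
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum AlignmentMode {
        Global,
        Local,
    }

    /// Errors produced while building a configuration (hypothetical type).
    #[derive(Debug, PartialEq, Eq)]
    pub enum ConfigError {
        PositiveGapPenalty(i32),
    }

    impl fmt::Display for ConfigError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                ConfigError::PositiveGapPenalty(v) => {
                    write!(f, "gap penalty must not be positive, got {v}")
                }
            }
        }
    }

    impl std::error::Error for ConfigError {}

    /// Module-local alias so signatures read as `Result<T>`.
    pub type Result<T> = std::result::Result<T, ConfigError>;

    /// Gap penalties, assumed here to be zero or negative.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct GapPenalty {
        open: i32,
        extend: i32,
    }

    impl GapPenalty {
        /// Validates the inputs and returns an error instead of panicking.
        pub fn new(open: i32, extend: i32) -> Result<Self> {
            for value in [open, extend] {
                if value > 0 {
                    return Err(ConfigError::PositiveGapPenalty(value));
                }
            }
            Ok(Self { open, extend })
        }

        pub fn open(&self) -> i32 {
            self.open
        }

        pub fn extend(&self) -> i32 {
            self.extend
        }
    }

    /// Local alignment clamps cell scores at zero; global alignment does not.
    pub fn floor_score(mode: AlignmentMode) -> Option<i32> {
        match mode {
            AlignmentMode::Local => Some(0),
            AlignmentMode::Global => None,
        }
    }
}
```

With the enum, adding a third mode later forces every `match` to handle it at compile time, which a boolean flag cannot do.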

## Issues and Features

- Bug reports: Use GitHub Issues with a reproducible example
- Feature requests: Discuss design before implementation
- Documentation: PRs for typos and clarifications welcome

## Licensing

By contributing to omicsx, you agree that your contributions will be made available under:
- **MIT License** for non-commercial use
- **Dual commercial license** as part of the project's commercial licensing model

Your contributions may be used in both open-source and commercial contexts.

## Questions?

- Read `README.md` for usage examples
- Check existing tests for patterns
- Review `copilot-instructions.md` for architecture notes
- Open a discussion issue for architectural questions

Thank you for contributing! 🧬