# Contributing to MIDAS Converter

Thank you for your interest in contributing to MIDAS Converter! This document provides guidelines and information for contributors.

## Code of Conduct

This project adheres to a code of conduct that we expect all contributors to follow. Please be respectful and constructive in all interactions.

## How to Contribute

### Reporting Issues

Before creating an issue, please:
1. **Search existing issues** to avoid duplicates
2. **Use the issue templates** when available
3. **Provide clear reproduction steps** for bugs
4. **Include system information** (OS, Rust version, dataset details)

### Suggesting Features

We welcome feature suggestions! Please:
1. **Check if similar features exist** or are planned
2. **Describe the use case** and why it would be valuable
3. **Consider the impact** on performance and complexity
4. **Propose an implementation approach** if possible

### Development Workflow

1. **Fork the repository** and create a feature branch
2. **Make your changes** following our coding standards
3. **Add tests** for new functionality
4. **Update documentation** as needed
5. **Submit a pull request** with a clear description

## Development Setup

### Prerequisites

- **Rust 1.85+** (the first stable release that supports the Rust 2024 edition)
- **Git** for version control
- **Test datasets** (use midas-fetcher to download samples)

### Local Development

```bash
# Clone your fork
git clone https://github.com/your-username/midas-converter
cd midas-converter

# Build and test
cargo build
cargo test
cargo clippy
cargo fmt --check

# List the available command-line options
cargo run -- --help
```

### Testing

We use several types of tests:

**Unit Tests**
```bash
# Runs unit, integration, and doc tests; add --lib to run only the library's unit tests
cargo test
```

**Integration Tests**
```bash
cargo test --test integration
```

**Benchmarks**
```bash
cargo bench
```

**Manual Testing**
```bash
# Test with real datasets
cargo run -- /path/to/test/dataset --verbose
```

## Coding Standards

### Code Style

- **Use `cargo fmt`** for consistent formatting
- **Run `cargo clippy`** and fix all warnings
- **Follow Rust naming conventions** (`snake_case` for functions and variables, `UpperCamelCase` for types, `SCREAMING_SNAKE_CASE` for constants)
- **Write clear, self-documenting code** with meaningful variable names

### Documentation

- **Add rustdoc comments** to all public APIs
- **Include examples** in documentation where helpful
- **Update README.md** for user-facing changes
- **Add inline comments** for complex logic

### Error Handling

- **Use `Result<T, E>`** for fallible operations
- **Provide context** with `.context()` for error chains
- **Use specific error types** rather than generic errors
- **Handle errors gracefully** with helpful user messages
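
As a rough illustration of these points, the sketch below defines a specific error type and propagates it with `?`. It uses only the standard library, so context is attached by hand with `map_err` rather than `.context()`. The names `ConvertError`, `parse_cell`, and `sum_row` are hypothetical and are not taken from the project's `error.rs`.

```rust
use std::fmt;
use std::num::ParseFloatError;

/// Hypothetical error type for illustration only.
#[derive(Debug)]
pub enum ConvertError {
    /// A required column was absent from the header.
    MissingColumn(String),
    /// A cell could not be parsed as a number.
    InvalidValue { line: usize, source: ParseFloatError },
}

impl fmt::Display for ConvertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConvertError::MissingColumn(name) => {
                write!(f, "missing required column `{name}`")
            }
            ConvertError::InvalidValue { line, .. } => {
                write!(f, "invalid numeric value on line {line}")
            }
        }
    }
}

impl std::error::Error for ConvertError {
    // Expose the underlying parse error so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            ConvertError::InvalidValue { source, .. } => Some(source),
            ConvertError::MissingColumn(_) => None,
        }
    }
}

/// Parses one cell, attaching the line number as context on failure.
pub fn parse_cell(raw: &str, line: usize) -> Result<f64, ConvertError> {
    raw.trim()
        .parse::<f64>()
        .map_err(|source| ConvertError::InvalidValue { line, source })
}

/// Sums a row, returning the first parse error via `?`.
pub fn sum_row(cells: &[&str], line: usize) -> Result<f64, ConvertError> {
    let mut total = 0.0;
    for cell in cells {
        total += parse_cell(cell, line)?;
    }
    Ok(total)
}
```

Keeping the line number in the error variant lets the user-facing message point at the offending input, while `source()` preserves the low-level cause for debugging.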

### Performance

- **Benchmark performance-critical changes**
- **Use profiling tools** to identify bottlenecks
- **Consider memory usage** for large datasets
- **Test with realistic data sizes**

## Architecture Guidelines

### Code Organization

```
src/
├── main.rs          # CLI entry point
├── lib.rs           # Public library interface
├── cli.rs           # Command-line interface
├── config.rs        # Configuration management
├── processor.rs     # Main processing engine
├── schema.rs        # Schema detection and management
├── header.rs        # BADC-CSV header parsing
├── models.rs        # Data structures
└── error.rs         # Error types
```

### Design Principles

1. **Separation of concerns** - Each module has a clear responsibility
2. **Dependency injection** - Components accept dependencies for testability
3. **Error propagation** - Use `?` operator and error chains
4. **Async by default** - Use async/await for I/O operations
5. **Memory efficiency** - Stream data when possible
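
A minimal sketch of how principles 2, 3, and 5 can combine: the function takes any `BufRead` as an injected dependency, so tests can pass an in-memory buffer instead of a file, and it reads one line at a time rather than loading the whole input. The function name and the `#` comment convention are assumptions made for the example, and the sketch is synchronous to stay within the standard library.

```rust
use std::io::{self, BufRead};

/// Counts data rows in a stream, skipping blank lines and `#` comment lines.
/// (The `#` comment convention is assumed here purely for illustration.)
pub fn count_data_rows<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut rows = 0;
    for line in reader.lines() {
        // Propagate I/O errors to the caller instead of panicking.
        let line = line?;
        let trimmed = line.trim();
        if !trimmed.is_empty() && !trimmed.starts_with('#') {
            rows += 1;
        }
    }
    Ok(rows)
}
```

In production the caller might pass a `BufReader<File>`; in a test, `"…".as_bytes()` satisfies the same bound without touching the filesystem.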

### Adding New Features

When adding features, consider:

1. **Backward compatibility** - Don't break existing workflows
2. **Configuration** - Make features configurable when appropriate
3. **Testing** - Add comprehensive tests including edge cases
4. **Documentation** - Update user and developer documentation
5. **Performance** - Measure impact on processing speed and memory
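
One way to keep a new option backward compatible is to give it a default that reproduces the old behavior. The sketch below assumes a hypothetical `ConverterOptions` struct; the field names and default values are illustrative and do not describe the project's actual `config.rs`.

```rust
/// Hypothetical options struct for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct ConverterOptions {
    pub batch_size: usize,
    pub verbose: bool,
    /// Newly added option; its default keeps the previous behavior.
    pub skip_empty_files: bool,
}

impl Default for ConverterOptions {
    fn default() -> Self {
        Self {
            batch_size: 1024,
            verbose: false,
            skip_empty_files: false,
        }
    }
}

/// A call site written before `skip_empty_files` existed. Because it uses
/// struct update syntax, it still compiles and picks up the default.
pub fn legacy_options() -> ConverterOptions {
    ConverterOptions {
        verbose: true,
        ..Default::default()
    }
}
```

Callers that construct the struct with `..Default::default()` are unaffected when a field is added, which is one reason to prefer that pattern in examples and documentation.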

## Testing Guidelines

### Test Structure

```rust
#[cfg(test)]
mod tests {
    use super::*;
    
    #[test]
    fn test_feature_works() {
        // Arrange
        let input = create_test_input();
        
        // Act
        let result = function_under_test(input);
        
        // Assert
        assert!(result.is_ok());
        assert_eq!(result.unwrap(), expected_value);
    }
}
```

### Test Data

- **Use minimal test datasets** in `tests/data/`
- **Create synthetic data** when real data isn't available
- **Clean up temporary files** in test teardown
- **Use consistent test fixtures** across tests
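
Rust's test harness has no teardown hook, so cleanup is often tied to `Drop`. The following is a minimal sketch of a self-cleaning fixture using only the standard library; `TempFixture` is a hypothetical helper, and a dedicated crate such as `tempfile` may be a better fit if the project already depends on one.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical test fixture: writes a file and removes it when dropped.
pub struct TempFixture {
    path: PathBuf,
}

impl TempFixture {
    /// Creates `name` in the system temp directory with the given contents.
    pub fn new(name: &str, contents: &str) -> io::Result<Self> {
        // Prefix with the process id to reduce collisions between test runs.
        let file_name = format!("{}-{}", std::process::id(), name);
        let path = std::env::temp_dir().join(file_name);
        fs::write(&path, contents)?;
        Ok(Self { path })
    }

    /// Path of the fixture file, valid until the fixture is dropped.
    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempFixture {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored so teardown never panics.
        let _ = fs::remove_file(&self.path);
    }
}
```

Because removal happens in `Drop`, the file is cleaned up even when an assertion in the test panics.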

### Async Testing

```rust
#[tokio::test]
async fn test_async_function() {
    let result = async_function().await;
    assert!(result.is_ok());
}
```

## Performance Optimization

### Profiling

Use these tools to identify performance bottlenecks:

```bash
# CPU profiling
cargo install flamegraph
sudo cargo flamegraph --bin midas -- test-dataset

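# Memory profiling (heaptrack is a system tool, not a cargo package;
# install it with your package manager, e.g. `sudo apt install heaptrack`)
cargo build --release
heaptrack ./target/release/midas test-dataset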

# Benchmarking
cargo bench
```

### Common Optimizations

1. **Reduce allocations** - Use references and iterators
2. **Batch operations** - Process multiple items together
3. **Parallel processing** - Use rayon for CPU-intensive tasks
4. **Streaming** - Process data without loading everything into memory
5. **Caching** - Cache expensive computations
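
To make items 1 and 4 concrete, here is a small sketch that computes a column mean in a single pass over borrowed string slices, without collecting rows into an intermediate `Vec` or allocating a `String` per cell. The function `column_mean` and its simple comma-splitting are assumptions for the example, not the project's parser.

```rust
/// Returns the mean of the parsable numeric values in one comma-separated
/// column, or `None` if no value in that column parses as a number.
pub fn column_mean(csv: &str, column: usize) -> Option<f64> {
    let (sum, count) = csv
        .lines()
        // Borrow the requested cell from each line; rows that are too short are skipped.
        .filter_map(|line| line.split(',').nth(column))
        // Skip cells that are not numeric (for example, the header row).
        .filter_map(|cell| cell.trim().parse::<f64>().ok())
        // Accumulate the running sum and count without storing the values.
        .fold((0.0_f64, 0_usize), |(sum, count), value| {
            (sum + value, count + 1)
        });

    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}
```

The iterator chain is lazy, so each line is examined once and then discarded. Whether this beats a simpler version in practice should still be confirmed with `cargo bench`.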

## Documentation

### Code Documentation

```rust
/// Processes a MIDAS dataset and converts it to Parquet format.
///
/// # Arguments
/// * `dataset_path` - Path to the MIDAS dataset directory
/// * `config` - Processing configuration options
///
/// # Returns
/// * `Ok(stats)` - Processing statistics on success
/// * `Err(error)` - Error details on failure
///
/// # Example
/// ```ignore
/// let stats = process_dataset(&path, &config).await?;
/// println!("Processed {} files", stats.files_processed);
/// ```
pub async fn process_dataset(
    dataset_path: &Path,
    config: &Config,
) -> Result<ProcessingStats> {
    todo!("implementation goes here")
}
```

### User Documentation

Update these files for user-facing changes:
- `README.md` - Main documentation
- `CHANGELOG.md` - Version history
- `docs/` - Detailed guides (if applicable)

## Pull Request Guidelines

### Before Submitting

- [ ] **All tests pass** (`cargo test`)
- [ ] **No clippy warnings** (`cargo clippy`)
- [ ] **Code is formatted** (`cargo fmt`)
- [ ] **Documentation is updated**
- [ ] **CHANGELOG.md is updated** (for significant changes)

### PR Description Template

```markdown
## Summary
Brief description of changes

## Changes
- List of specific changes made
- Include any breaking changes

## Testing
- How the changes were tested
- Any manual testing performed

## Related Issues
- Fixes #123
- Related to #456
```

### Review Process

1. **Automated checks** must pass (CI/CD)
2. **Code review** by maintainers
3. **Testing** with real datasets (if applicable)
4. **Documentation review** for clarity
5. **Merge** after approval

## Release Process

### Version Numbering

We follow [Semantic Versioning](https://semver.org/):
- **MAJOR**: Breaking changes
- **MINOR**: New features (backward compatible)
- **PATCH**: Bug fixes (backward compatible)

### Release Checklist

1. **Update version** in `Cargo.toml`
2. **Update CHANGELOG.md** with release notes
3. **Tag release** in git
4. **Build and test** release artifacts
5. **Publish to crates.io** (maintainers only)

## Community

### Getting Help

- **GitHub Discussions** - Ask questions and share ideas
- **GitHub Issues** - Report bugs and request features
- **Documentation** - Check existing docs first

### Staying Updated

- **Watch the repository** for notifications
- **Follow the changelog** for new features
- **Join discussions** to influence direction

## Recognition

We recognize contributors in several ways:
- **Contributor list** in README.md
- **Release notes** mentioning significant contributions
- **GitHub contributor graph** showing activity

Thank you for contributing to MIDAS Converter!