semtools 1.2.2

Semantic search and document parsing tools for the command line
Documentation
# Contributing to SemTools

We welcome contributions to SemTools! This document provides guidelines for contributing to the project.

## Getting Started

### Prerequisites

- Rust 1.70 or later
- Git
- For the parse tool: LlamaIndex Cloud API key (for testing)

### Development Setup

1. **Clone the repository**
   ```bash
   git clone https://github.com/run-llama/semtools
   cd semtools
   ```

2. **Build the project**
   ```bash
   cargo build
   ```

3. **Run tests**
   ```bash
   cargo test
   ```

4. **Install for development**
   ```bash
   cargo install --path .
   ```

### Project Structure

```
semtools/
├── crates/
│   ├── parse/              # Document parsing tool
│   │   ├── src/
│   │   │   ├── main.rs     # CLI interface
│   │   │   └── llama_parse_backend.rs  # LlamaIndex API integration
│   │   └── Cargo.toml
│   ├── search/             # Semantic search tool  
│   │   ├── src/
│   │   │   └── main.rs     # CLI interface and search logic
│   │   └── Cargo.toml
│   └── common/             # Shared utilities (future)
├── docs/                   # Documentation
├── tests/                  # Integration tests
└── Cargo.toml             # Workspace configuration
```

## How to Contribute

### Reporting Issues

When reporting issues, please include:

- **Clear title and description**
- **Steps to reproduce** the issue
- **Expected vs actual behavior**
- **Environment details** (OS, Rust version, etc.)
- **Sample files** if relevant (for parsing issues)

### Suggesting Features

For feature requests:

1. **Check existing issues** to avoid duplicates
2. **Describe the use case** and problem being solved
3. **Provide examples** of how the feature would be used
4. **Consider alternatives** and why this approach is best

### Pull Requests

1. **Fork the repository** and create a feature branch
   ```bash
   git checkout -b feature/your-feature-name
   ```

2. **Make your changes** following our coding standards
3. **Add tests** for new functionality
4. **Update documentation** if needed
5. **Ensure tests pass**
   ```bash
   cargo test
   cargo clippy
   cargo fmt
   ```

6. **Submit a pull request** with:
   - Clear title and description
   - Reference to related issues
   - Summary of changes made

## Coding Standards

### Rust Guidelines

- **Follow Rust conventions** (use `cargo fmt` and `cargo clippy`)
- **Write clear, self-documenting code**
- **Use meaningful variable and function names**
- **Add doc comments** for public APIs
- **Handle errors appropriately** (use `anyhow::Result`)

### Code Style

```rust
// Good: Clear function with documentation
/// Searches for semantically similar text in the given documents
pub fn search_documents(
    query: &str,
    documents: &[Document],
    threshold: f64,
) -> Result<Vec<SearchResult>> {
    // Implementation
}

// Good: Error handling
let config = LlamaParseConfig::from_config_file(&config_path)
    .context("Failed to load configuration")?;

// Good: Clear variable names
let similarity_threshold = args.threshold.unwrap_or(0.3);
let context_lines = args.context;
```

### CLI Design Principles

- **Follow Unix philosophy**: Do one thing well
- **Support pipelines**: Read from stdin, write to stdout (`println!` vs. `eprintln!` !)
- **Provide helpful error messages**
- **Use consistent argument naming**
- **Include examples in help text**

### Testing

#### Unit Tests
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_similarity_calculation() {
        // Test implementation
    }
}
```

#### Integration Tests
```bash
# Add integration tests in tests/ directory
tests/
├── parse_integration.rs
└── search_integration.rs
```

### Documentation

- **Update README files** for user-facing changes
- **Add inline comments** for complex logic
- **Include usage examples** in documentation
- **Update CLI help text** when adding options

## Development Workflow

### Adding a New Feature

1. **Create an issue** to discuss the feature
2. **Design the API** and get feedback
3. **Implement the feature** with tests
4. **Update documentation**
5. **Submit a pull request**

### Bug Fixes

1. **Reproduce the bug** and add a test case
2. **Fix the issue** with minimal changes
3. **Verify the fix** doesn't break existing functionality
4. **Update tests** if needed

### Performance Improvements

1. **Benchmark current performance** 
2. **Profile to identify bottlenecks**
3. **Implement improvements** with measurements
4. **Ensure no regressions** in functionality

## Specific Areas for Contribution

### Parse Tool

- **Add new backends** (local parsing, other APIs)
- **Improve error handling** and retry logic
- **Add more configuration options**
- **Optimize caching strategy**

### Search Tool

- **Support different embedding models**
- **Add more similarity metrics**
- **Improve result ranking**
- **Add search result highlighting**

### General Improvements

- **Better error messages** and help text
- **Performance optimizations**
- **Additional output formats** (JSON, CSV)
- **Integration with more tools**

## Code Review Guidelines

### For Contributors

- **Keep PRs focused** on a single feature/fix
- **Write clear commit messages**
- **Respond to feedback** constructively
- **Update based on review comments**

### For Reviewers

- **Be constructive** and helpful
- **Focus on code quality** and correctness
- **Consider maintainability** and performance
- **Suggest improvements** rather than just pointing out issues

## Release Process

1. **Update version numbers** in Cargo.toml files
2. **Create a release tag** - GitHub will automatically generate release notes from merged PRs
3. **Build and test** release binaries (automated)
4. **Publish to crates.io** (automated for maintainers)

## Getting Help

- **Open an issue** for bugs or questions
- **Check existing documentation** and issues first
- **Provide context** and examples when asking for help

## License

By contributing to SemTools, you agree that your contributions will be licensed under the MIT License.

## Recognition

Contributors will be acknowledged in:
- **GitHub release notes** for their merged PRs
- **README.md** contributors section
- **Release notes** for major features

Thank you for contributing to SemTools! 🎉