hat-splitter
This is the home of the HAT splitting rule. We expose it as a Rust crate with Python bindings so that the same splitting rule can be used in both languages.
This project is WIP. More information and documentation to follow.
The plan
We've found that HAT models are very sensitive to their splitting rule. As a result, the splitting rule implemented here must exactly match the behaviour of the splitter we're currently using.
- Create a simple placeholder text splitting implementation (e.g., just split on whitespace).
- Set up Python bindings with PyO3.
- Add Scaling as a Python dev dependency and test the Python bindings against the existing splitting rule. These tests are expected to fail at this stage, since only the placeholder splitter exists.
- Implement the HAT splitting rule in Rust and make tests green.
Once these basics are in place, we can start thinking about packaging and publishing.
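As a rough illustration of the first step in the plan, a placeholder splitter could look something like the sketch below. The function name split_placeholder is hypothetical and not part of the crate's actual API. It only splits on Unicode whitespace, so it does not implement the HAT splitting rule and is expected to disagree with the reference splitter.

```rust
/// Hypothetical placeholder splitter (assumed name, not the crate's real API).
/// Splits the input on runs of Unicode whitespace and drops the whitespace
/// itself. This is NOT the HAT splitting rule; it is only a stand-in that
/// lets the bindings and test harness be wired up first.
pub fn split_placeholder(input: &str) -> Vec<String> {
    input
        .split_whitespace() // yields non-empty, whitespace-free substrings
        .map(str::to_owned) // own each piece so the result outlives `input`
        .collect()
}
```

Calling split_placeholder("hello  world") would return the two pieces "hello" and "world". Returning owned strings keeps the signature simple to expose through language bindings, at the cost of one allocation per piece.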
Development
Release process
- Update the version in Cargo.toml. Commit and push to main.
- Tag the commit with the new version, e.g., git tag v0.1.0.
- Push the tag to the remote. CI will take care of the rest.