hat-splitter-0.1.1 has been yanked.

hat-splitter

This is the home of the HAT splitting rule. We expose it as a Rust crate with Python bindings so that the same splitting rule can be used in both languages.

This project is WIP. More information and documentation to follow.

The plan

We've found that HAT models are very sensitive to their splitting rule. As a result, the splitting rule implemented here must exactly match the behaviour of the splitter we're currently using.

1. Create a simple placeholder text splitting implementation (e.g., just split on whitespace).
2. Set up Python bindings with PyO3.
3. Add Scaling as a Python dev dependency and test the Python bindings against the existing splitting rule. These tests are expected to fail at this stage.
4. Implement the HAT splitting rule in Rust and make the tests green.
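
As a rough illustration of the first step, a placeholder splitter could look like the sketch below. It is not the HAT splitting rule: it simply splits on Unicode whitespace and discards the whitespace itself. The function name `split_on_whitespace` is hypothetical and is not part of this crate's API.

```rust
/// Hypothetical placeholder splitter (an assumption for illustration only,
/// not the HAT splitting rule): returns the whitespace-separated words of
/// `text`, dropping the whitespace between them.
pub fn split_on_whitespace(text: &str) -> Vec<&str> {
    // `split_whitespace` skips leading, trailing, and repeated whitespace,
    // so no empty strings end up in the result.
    text.split_whitespace().collect()
}
```

A placeholder like this gives the Python bindings something to call, so the comparison tests against the existing splitter can be wired up before the real rule is implemented.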

Once these basics are in place, we can start thinking about packaging and publishing.

Development

Release process

1. Update the version in Cargo.toml. Commit and push to main.
2. Tag the commit with the new version, e.g., git tag v0.1.0.
3. Push the tag to the remote. CI will take care of the rest.
