# sanitise
A library for headache-free data clean-up and validation.
[](https://crates.io/crates/sanitise)
[](https://github.com/Spartan2909/rulox)
[](https://docs.rs/sanitise/latest)
`sanitise` is a CSV processing and validation library that generates code at compile time based on a YAML configuration file. The generated code is robust and will not panic.
`no_std` environments are supported, but the `alloc` crate is required.
## Quick Start
Add `sanitise` to your dependencies in your `Cargo.toml`:
```toml
[dependencies]
sanitise = "0.1"
```
Import the macro:
```rust
use sanitise::sanitise_string;
```
And call:
```rust,ignore
// main.rs
use std::{fs, iter::zip};
use sanitise::sanitise_string;
fn main() {
let csv = fs::read_to_string("data.csv").unwrap();
let ((time_millis, pulse, movement), (time_secs,)) = sanitise_string!(include_str!("sanitise_config.yaml"), &csv).unwrap();
println!("time_millis,time_secs,pulse,movement");
for (((time_millis, pulse), movement), time_secs) in zip(zip(zip(time_millis, pulse), movement), time_secs) {
println!("{time_millis},{time_secs},{pulse},{movement}")
}
}
```
```yaml
# sanitise_config.yaml
processes:
- name: validate
columns:
- title: time
type: integer
- title: pulse
type: integer
max: 100
min: 40
on-invalid: average
valid-streak: 3
- title: movement
type: integer
valid-values: [0, 1]
output-type: boolean
output: "value == 1"
- name: process
columns:
- title: time
type: integer
output: "value / 1000"
- title: pulse
type: integer
ignore: true
- title: movement
type: integer
ignore: true
```
```csv
# data.csv
time,pulse,movement
0,67,0
15,45,1
126,132,1
```
The first argument to `sanitise_string!` must be either a string literal or a macro call that expands to a string literal. The second argument must be an expression that resolves to a `&str` in CSV format. In the above example, `sanitise_config.yaml` must be next to `main.rs`, and `data.csv` must be in the working directory at runtime.
The other macro, `sanitise!`, is used when your data has already been parsed into the correct shape. See the [documentation](https://docs.rs/sanitise/latest/sanitise/macro.sanitise.html) for more details.
## Configuration
For details on the configuration file, see the [specification](https://github.com/Spartan2909/sanitise/blob/main/configuration.md).
## Optional features
- `benchmark`: Print the time taken to complete various stages of the process. Disables `no_std` support. You probably don't want this.
## Efficiency
The macro creates linear finite automata to process each column. If `on-invalid` is set to `average` for a given column, that column's automaton will use a state machine to keep track of valid and invalid values. If a column is ignored, no automaton will be generated for it. All data is stored in native Rust types.
## License
Licensed under either of
* Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
* MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
at your option.
### Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any
additional terms or conditions.