NSV Rust
Rust implementation of the NSV (Newline-Separated Values) format.
Parallel chunked parsing via rayon + memchr, byte-level API with no encoding assumptions, column-selective (projected) decode.
Installation
Usage
Basic encoding/decoding
use nsv::{decode, encode};
let data = decode("a\nb\nc\n\nd\ne\nf\n\n");
// [["a", "b", "c"], ["d", "e", "f"]]
let encoded = encode(&data);
// "a\nb\nc\n\nd\ne\nf\n\n"
Cell-level escaping
use nsv::{escape, unescape};
escape("hello\nworld"); // "hello\\nworld"
escape(""); // "\\"
unescape("hello\\nworld"); // "hello\nworld"
unescape("\\"); // ""
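The escaping rules implied by the examples above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: the names `escape_cell` and `unescape_cell` are hypothetical, the `\\` escape for a literal backslash is assumed, and passing unknown escapes and a trailing backslash through unchanged is also an assumption.

```rust
/// Escape one cell: an empty cell becomes a lone backslash, and
/// backslash / line feed become two-character escape sequences.
fn escape_cell(cell: &str) -> String {
    if cell.is_empty() {
        return "\\".to_string();
    }
    let mut out = String::with_capacity(cell.len());
    for ch in cell.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}

/// Inverse of `escape_cell`. Unknown escapes and a trailing backslash
/// are copied through unchanged (an assumption made for this sketch).
fn unescape_cell(cell: &str) -> String {
    if cell == "\\" {
        return String::new(); // lone backslash encodes the empty cell
    }
    let mut out = String::with_capacity(cell.len());
    let mut chars = cell.chars();
    while let Some(ch) = chars.next() {
        if ch != '\\' {
            out.push(ch);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('\\') => out.push('\\'),
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'),
        }
    }
    out
}
```

Because an escaped cell never contains a literal line feed and is never empty, every line feed and every empty line in the encoded stream is left free to act as structure.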
Byte-level API
All core operations have _bytes variants for working with arbitrary ASCII-compatible encodings (Latin-1, Shift-JIS, raw binary, etc.). No UTF-8 assumption.
use nsv::{decode_bytes, encode_bytes};
let data = decode_bytes(b"a\nb\nc\n\nd\ne\nf\n\n");
let encoded = encode_bytes(&data);
Projected decode
Column-selective parsing. Single-pass scan that tracks the column index, skips non-projected columns entirely (no allocation, no unescape), and produces the result directly.
use nsv::decode_bytes_projected;
let input = b"name\nage\nsalary\n\nAlice\n30\n50000\n\nBob\n25\n75000\n\n";
// Extract only columns 0 and 2 (name, salary)
let projected = decode_bytes_projected(input, &[0, 2]);
// [[b"name", b"salary"], [b"Alice", b"50000"], [b"Bob", b"75000"]]
// Reorder: columns appear in the order specified
let reordered = decode_bytes_projected(input, &[2, 0]);
// [[b"salary", b"name"], [b"50000", b"Alice"], [b"75000", b"Bob"]]
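To make the single-pass idea concrete, here is a rough standard-library sketch of a projected decode. It is not the crate's code: `decode_projected_sketch` and `unescape_bytes_sketch` are hypothetical names, the `\\` escape is assumed, short rows are assumed to yield empty cells for missing columns, and an unterminated final row is simply dropped.

```rust
/// Unescape one cell (lone `\` = empty cell, `\n` = LF, `\\` = backslash).
fn unescape_bytes_sketch(cell: &[u8]) -> Vec<u8> {
    if cell == b"\\" {
        return Vec::new();
    }
    let mut out = Vec::with_capacity(cell.len());
    let mut i = 0;
    while i < cell.len() {
        if cell[i] == b'\\' && i + 1 < cell.len() {
            match cell[i + 1] {
                b'n' => out.push(b'\n'),
                b'\\' => out.push(b'\\'),
                other => {
                    out.push(b'\\');
                    out.push(other);
                }
            }
            i += 2;
        } else {
            out.push(cell[i]);
            i += 1;
        }
    }
    out
}

/// Single pass over `input`, keeping only the cells whose column index
/// appears in `columns`, in the order `columns` lists them.
fn decode_projected_sketch(input: &[u8], columns: &[usize]) -> Vec<Vec<Vec<u8>>> {
    let mut rows = Vec::new();
    let mut row: Vec<Vec<u8>> = vec![Vec::new(); columns.len()];
    let mut col = 0; // column index of the cell being scanned
    let mut start = 0; // byte offset where the current line starts
    for (i, &b) in input.iter().enumerate() {
        if b != b'\n' {
            continue;
        }
        if i == start {
            // Empty line: the row is complete.
            rows.push(std::mem::replace(&mut row, vec![Vec::new(); columns.len()]));
            col = 0;
        } else {
            // End of a cell: copy and unescape it only if it is projected.
            for (slot, &wanted) in columns.iter().enumerate() {
                if wanted == col {
                    row[slot] = unescape_bytes_sketch(&input[start..i]);
                }
            }
            col += 1;
        }
        start = i + 1;
    }
    rows
}
```

Cells in non-projected columns are only ever seen as a byte range between two line feeds, so they cost a scan but no allocation or unescape.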
Validation
use nsv::check;
let warnings = check(input); // input: &[u8]
for w in &warnings { /* inspect or log each warning */ }
Warning kinds: UnknownEscape(u8), DanglingBackslash, NoTerminalLf.
check is opt-in diagnostics — it doesn't alter parsing behavior.
Structural operations (spill/unspill)
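use nsv::util::{spill, unspill};
// Argument order and reference forms are assumed here: (sequence, terminator)
let flat = spill(&[vec!["a", "b"], vec!["c"]], "");
// ["a", "b", "", "c", ""]
let structured = unspill(&flat, &"");
// [["a", "b"], ["c"]]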
Generic over T: Clone + PartialEq — works with strings, bytes, integers, anything.
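As an illustration of what the two operations do, the following is a small generic sketch written against the standard library only. The function names and parameter shapes are hypothetical and need not match `nsv::util`; keeping trailing unterminated items as a final row is an assumption of this sketch.

```rust
/// Flatten a sequence of rows, appending `terminator` after each row.
fn spill_sketch<T: Clone>(rows: &[Vec<T>], terminator: T) -> Vec<T> {
    let mut flat = Vec::new();
    for row in rows {
        flat.extend(row.iter().cloned());
        flat.push(terminator.clone());
    }
    flat
}

/// Recover rows from a flat sequence by cutting at each `terminator`.
fn unspill_sketch<T: Clone + PartialEq>(flat: &[T], terminator: &T) -> Vec<Vec<T>> {
    let mut rows = Vec::new();
    let mut current = Vec::new();
    for item in flat {
        if item == terminator {
            rows.push(std::mem::take(&mut current));
        } else {
            current.push(item.clone());
        }
    }
    // Assumption: items after the last terminator form a final row.
    if !current.is_empty() {
        rows.push(current);
    }
    rows
}
```

The same pair works at both levels of the format: with `""` as the terminator it separates rows in a list of cells, and with `'\n'` it separates cells in a stream of characters.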
Streaming
Resumable Reader and Writer for incremental I/O — tailing files, sockets, pipes, etc.
For finite/in-memory data, use decode/encode instead.
use nsv::{Reader, Writer};
// Reading: yields one row at a time, returns Ok(None) when no complete row is available
let mut r = Reader::new(source);
while let Some(row) = r.next_row()? { /* process row */ }
// Peeking at buffered state (useful when the source may have more data later)
let _partial = r.partial_row(); // completed cells so far
let _cell = r.partial_cell(); // bytes of the cell being read (not yet unescaped)
// Writing: accepts &str, String, &[u8], Vec<u8>
let mut w = Writer::new(sink);
w.write_row(&["a", "b"])?;
// Recovering wrapped I/O
let inner = w.into_inner();
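The resumable behaviour can be pictured with a small push-style state machine. This sketch is not how `Reader` is implemented: `RowBuffer` and `feed` are hypothetical, the real type pulls from a wrapped reader rather than being fed slices, and unescaping is left out to keep the example short.

```rust
/// Hypothetical incremental row assembler. State survives between calls,
/// so input may arrive in arbitrary fragments.
#[derive(Default)]
struct RowBuffer {
    partial_row: Vec<Vec<u8>>, // completed cells of the row in progress
    partial_cell: Vec<u8>,     // raw bytes of the cell in progress
}

impl RowBuffer {
    /// Consume `bytes` and return every row completed by them.
    fn feed(&mut self, bytes: &[u8]) -> Vec<Vec<Vec<u8>>> {
        let mut done = Vec::new();
        for &b in bytes {
            if b != b'\n' {
                self.partial_cell.push(b);
            } else if self.partial_cell.is_empty() {
                // Empty line: the row is complete.
                done.push(std::mem::take(&mut self.partial_row));
            } else {
                // Line feed after content: the cell is complete.
                self.partial_row.push(std::mem::take(&mut self.partial_cell));
            }
        }
        done
    }
}
```

Feeding `b"a\nb"` completes no row but leaves one finished cell and one cell in progress; a later `b"\n\n"` completes the row. That is the situation the partial-state accessors above are meant to expose.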
Composition
nsv::util also exposes the algebraic decomposition of encode/decode:
encode = spill('\n') ∘ spill("") ∘ escape_seqseq
decode = unescape_seqseq ∘ unspill("") ∘ unspill('\n')
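use nsv::util::{escape_seqseq, spill};
// Argument shapes below are assumed; the composition is what matters.
let data = vec![vec!["a".to_string(), "b".to_string()], vec!["c".to_string()]];
let escaped = escape_seqseq(&data);
let flat = spill(&escaped, String::new());
let lines: Vec<Vec<char>> = flat.iter().map(|s| s.chars().collect()).collect();
let chars: Vec<char> = spill(&lines, '\n');
let encoded: String = chars.into_iter().collect();
assert_eq!(encoded, nsv::encode(&data));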
API
Core
| Function | Signature |
|---|---|
| `decode` | `(&str) -> Vec<Vec<String>>` |
| `encode` | `(&[Vec<String>]) -> String` |
| `decode_bytes` | `(&[u8]) -> Vec<Vec<Vec<u8>>>` |
| `encode_bytes` | `(&[Vec<Vec<u8>>]) -> Vec<u8>` |
| `decode_bytes_projected` | `(&[u8], &[usize]) -> Vec<Vec<Vec<u8>>>` |
Cell escaping
| Function | Signature |
|---|---|
| `escape` / `unescape` | `(&str) -> String` |
| `escape_bytes` / `unescape_bytes` | `(&[u8]) -> Vec<u8>` |
Validation
| Function | Signature |
|---|---|
| `check` | `(&[u8]) -> Vec<Warning>` |
Streaming
| Type | Method | Signature |
|---|---|---|
| `Reader<R>` | `next_row` | `(&mut self) -> io::Result<Option<Vec<Vec<u8>>>>` |
| | `partial_row` | `(&self) -> &[Vec<u8>]` |
| | `partial_cell` | `(&self) -> &[u8]` |
| | `into_inner` | `(self) -> BufReader<R>` |
| `Writer<W>` | `write_row` | `(&mut self, &[C: AsRef<[u8]>]) -> io::Result<()>` |
| | `into_inner` | `(self) -> W` |
Util (nsv::util)
| Function | Description |
|---|---|
| `spill` / `unspill` | Flatten/recover seqseq dimension with terminators |
| `escape_seqseq` / `unescape_seqseq` | `map(map(escape))` / `map(map(unescape))` over a seqseq |
Parallel parsing
For inputs above 64KB, decode_bytes (and decode, and decode_bytes_projected) switch from sequential to chunked parallel parsing:
- Pick N evenly-spaced byte positions (one per CPU core)
- Scan forward from each to the nearest \n\n row boundary, O(avg_row_len)
- Each worker independently parses its chunk (boundary scan + cell split + unescape)
This works because literal 0x0A in NSV is always structural (never escaped), so row-boundary recovery from any byte position is a trivial forward scan. The sequential phase is O(N), not O(input_len) — all real work is parallel.
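The boundary-recovery step can be sketched as follows. This is an illustration under stated assumptions rather than the crate's code: it uses `std::thread::scope` in place of rayon and a plain `windows(2)` search in place of memchr, the function names are hypothetical, and unescaping is omitted so the chunking logic stays visible.

```rust
/// Chunk start offsets: 0 plus, for each evenly spaced guess, the byte
/// just past the first "\n\n" found at or after that guess.
fn chunk_starts(input: &[u8], n: usize) -> Vec<usize> {
    let mut starts = vec![0];
    for k in 1..n {
        let guess = input.len() * k / n;
        let from = guess.max(*starts.last().unwrap());
        if let Some(off) = input[from..].windows(2).position(|w| w == b"\n\n") {
            let start = from + off + 2;
            if start < input.len() && start > *starts.last().unwrap() {
                starts.push(start);
            }
        }
    }
    starts
}

/// Sequential parse of one chunk (cells are copied raw, not unescaped).
fn parse_rows(chunk: &[u8]) -> Vec<Vec<Vec<u8>>> {
    let (mut rows, mut row, mut start) = (Vec::new(), Vec::new(), 0);
    for (i, &b) in chunk.iter().enumerate() {
        if b == b'\n' {
            if i == start {
                rows.push(std::mem::take(&mut row)); // empty line ends the row
            } else {
                row.push(chunk[start..i].to_vec());
            }
            start = i + 1;
        }
    }
    rows
}

/// Parse each chunk on its own thread and concatenate results in order.
fn parse_parallel(input: &[u8], n: usize) -> Vec<Vec<Vec<u8>>> {
    let mut bounds = chunk_starts(input, n);
    bounds.push(input.len());
    std::thread::scope(|s| {
        let handles: Vec<_> = bounds
            .windows(2)
            .map(|w| {
                let chunk = &input[w[0]..w[1]];
                s.spawn(move || parse_rows(chunk))
            })
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```

The byte after any two consecutive line feeds is always the start of a row, because the second line feed terminates an empty line. That property is what lets each guess be corrected without looking backwards.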