nsv 0.0.12

NSV (Newline-Separated Values) format parser and encoder

NSV Rust

Rust implementation of the NSV (Newline-Separated Values) format.

Parallel chunked parsing via rayon + memchr, byte-level API with no encoding assumptions, column-selective (projected) decode.

Installation

cargo add nsv

Usage

Basic encoding/decoding

use nsv::{decode, encode};

let data = decode("a\nb\nc\n\nd\ne\nf\n\n");
// [["a", "b", "c"], ["d", "e", "f"]]

let encoded = encode(&data);
// "a\nb\nc\n\nd\ne\nf\n\n"

Cell-level escaping

use nsv::{escape, unescape};

escape("hello\nworld");  // "hello\\nworld"
escape("");              // "\\"

unescape("hello\\nworld");  // "hello\nworld"
unescape("\\");              // ""
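The escaping rules implied by the examples above can be sketched with the standard library alone. This is an illustrative reimplementation, not the crate's code: `escape_cell` and `unescape_cell` are hypothetical names, backslash doubling is assumed so that the mapping is reversible, and keeping unknown or dangling escapes verbatim is also an assumption.

```rust
/// Hypothetical sketch of NSV-style cell escaping (not the crate's code).
fn escape_cell(cell: &str) -> String {
    if cell.is_empty() {
        // An empty cell becomes a lone backslash so it cannot be confused
        // with the empty line that terminates a row.
        return "\\".to_string();
    }
    let mut out = String::with_capacity(cell.len());
    for ch in cell.chars() {
        match ch {
            '\\' => out.push_str("\\\\"), // assumed: backslash is doubled
            '\n' => out.push_str("\\n"),  // literal LF becomes backslash + 'n'
            other => out.push(other),
        }
    }
    out
}

/// Inverse of `escape_cell`.
fn unescape_cell(raw: &str) -> String {
    if raw == "\\" {
        // Lone backslash is the empty-cell marker.
        return String::new();
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(ch) = chars.next() {
        if ch != '\\' {
            out.push(ch);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('\\') => out.push('\\'),
            // Assumption: an unknown escape is kept verbatim.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            // Assumption: a dangling backslash is kept verbatim.
            None => out.push('\\'),
        }
    }
    out
}
```

Under these assumptions, `unescape_cell(&escape_cell(s))` returns `s` for any string, including the empty string and strings containing backslashes.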

Byte-level API

All core operations have _bytes variants for working with arbitrary ASCII-compatible encodings (Latin-1, Shift-JIS, raw binary, etc.). They make no UTF-8 assumption.

use nsv::{decode_bytes, encode_bytes, escape_bytes, unescape_bytes};

let data = decode_bytes(b"a\nb\n\nc\nd\n\n");
let encoded = encode_bytes(&data);

Projected decode

Column-selective parsing. Single-pass scan that tracks the column index, skips non-projected columns entirely (no allocation, no unescape), and produces the result directly.

use nsv::decode_bytes_projected;

let input = b"name\nage\nsalary\n\nAlice\n30\n50000\n\nBob\n25\n75000\n\n";

// Extract only columns 0 and 2 (name, salary)
let projected = decode_bytes_projected(input, &[0, 2]);
// [[b"name", b"salary"], [b"Alice", b"50000"], [b"Bob", b"75000"]]

// Reorder: columns appear in the order specified
let reordered = decode_bytes_projected(input, &[2, 0]);
// [[b"salary", b"name"], [b"50000", b"Alice"], [b"75000", b"Bob"]]
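To make the single-pass idea concrete, here is a minimal std-only sketch of a projected scan. It is not the crate's implementation: `project_sketch` is a hypothetical name, unescaping is left out for brevity, a projected column missing from a row is assumed to yield an empty cell, and trailing bytes without a final newline are assumed to be ignored.

```rust
/// Hypothetical sketch of a single-pass projected decode.
/// Cells are copied as raw bytes; unescaping is omitted.
fn project_sketch(input: &[u8], columns: &[usize]) -> Vec<Vec<Vec<u8>>> {
    let mut rows = Vec::new();
    // One output slot per requested column, in the requested order.
    let mut current: Vec<Vec<u8>> = vec![Vec::new(); columns.len()];
    let mut col = 0usize; // index of the cell being scanned in this row
    let mut start = 0usize; // byte offset where the current line begins

    while let Some(off) = input[start..].iter().position(|&b| b == b'\n') {
        let line = &input[start..start + off];
        start += off + 1;
        if line.is_empty() {
            // An empty line terminates the row.
            let fresh = vec![Vec::new(); columns.len()];
            rows.push(std::mem::replace(&mut current, fresh));
            col = 0;
        } else {
            // Copy the cell only into slots that asked for this column;
            // non-projected cells are skipped without allocating.
            for (slot, &wanted) in columns.iter().enumerate() {
                if wanted == col {
                    current[slot] = line.to_vec();
                }
            }
            col += 1;
        }
    }
    rows
}
```

Because each cell is written straight into its output slot, reordering (`&[2, 0]`) needs no second pass over the data.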

Validation

use nsv::check;

let warnings = check(b"hello\\x\nworld\n");
for w in &warnings {
    println!("{}:{} {:?}", w.line, w.col, w.kind);
}

Warning kinds: UnknownEscape(u8), DanglingBackslash, NoTerminalLf.

check is opt-in diagnostics — it doesn't alter parsing behavior.
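As an illustration of what such a diagnostic pass might look like, the sketch below scans the input once and reports the three warning kinds. Everything here is hypothetical: `Diag`, `Kind`, and `check_sketch` are illustrative names, and the 1-based, byte-counted line and column positions (pointing at the backslash) are assumptions rather than the crate's documented behavior.

```rust
#[derive(Debug, PartialEq)]
enum Kind {
    UnknownEscape(u8),
    DanglingBackslash,
    NoTerminalLf,
}

#[derive(Debug, PartialEq)]
struct Diag {
    line: usize, // assumed 1-based
    col: usize,  // assumed 1-based, counted in bytes
    kind: Kind,
}

/// Hypothetical diagnostic scan; it only reports and never alters the data.
fn check_sketch(input: &[u8]) -> Vec<Diag> {
    let mut out = Vec::new();
    let (mut line, mut col) = (1usize, 1usize);
    let mut i = 0;
    while i < input.len() {
        match input[i] {
            b'\n' => {
                line += 1;
                col = 1;
                i += 1;
            }
            b'\\' => match input.get(i + 1).copied() {
                // Known escapes: backslash + 'n' and doubled backslash.
                Some(b'n') | Some(b'\\') => {
                    i += 2;
                    col += 2;
                }
                // Backslash at end of line or end of input.
                Some(b'\n') | None => {
                    // A backslash that is the whole cell is the empty-cell
                    // marker; anything else is a dangling backslash.
                    if col != 1 {
                        out.push(Diag { line, col, kind: Kind::DanglingBackslash });
                    }
                    i += 1;
                    col += 1;
                }
                Some(other) => {
                    out.push(Diag { line, col, kind: Kind::UnknownEscape(other) });
                    i += 2;
                    col += 2;
                }
            },
            _ => {
                i += 1;
                col += 1;
            }
        }
    }
    // Non-empty input should end with a line feed.
    if input.last().map_or(false, |&b| b != b'\n') {
        out.push(Diag { line, col, kind: Kind::NoTerminalLf });
    }
    out
}
```

With these conventions, the input from the example above (`b"hello\\x\nworld\n"`) yields a single `UnknownEscape(b'x')` at line 1, column 6.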

Structural operations (spill/unspill)

use nsv::util::{spill, unspill};

let flat = spill(&vec![vec!["a", "b"], vec!["c"]], "");
// ["a", "b", "", "c", ""]

let structured = unspill(&flat, &"");
// [["a", "b"], ["c"]]

Generic over T: Clone + PartialEq — works with strings, bytes, integers, anything.
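A rough std-only sketch of how a generic spill/unspill pair could work is shown below. The names `spill_sketch` and `unspill_sketch` and their signatures are illustrative and may differ from `nsv::util`; emitting trailing items that lack a terminator as a final sequence is an assumption.

```rust
/// Flatten a sequence of sequences, appending `terminator` after each inner one.
fn spill_sketch<T: Clone + PartialEq>(seqseq: &[Vec<T>], terminator: T) -> Vec<T> {
    let mut flat = Vec::new();
    for seq in seqseq {
        flat.extend(seq.iter().cloned());
        flat.push(terminator.clone());
    }
    flat
}

/// Recover the nested structure by cutting at every `terminator`.
fn unspill_sketch<T: Clone + PartialEq>(flat: &[T], terminator: &T) -> Vec<Vec<T>> {
    let mut out = Vec::new();
    let mut current = Vec::new();
    for item in flat {
        if item == terminator {
            // Terminator closes the current inner sequence (possibly empty).
            out.push(std::mem::take(&mut current));
        } else {
            current.push(item.clone());
        }
    }
    // Assumption: unterminated trailing items form a final sequence.
    if !current.is_empty() {
        out.push(current);
    }
    out
}
```

The round trip only holds when no inner element equals the terminator, which is why NSV escapes empty cells and literal newlines before spilling.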

Streaming

Resumable Reader and Writer for incremental I/O — tailing files, sockets, pipes, etc. For finite/in-memory data, use decode/encode instead.

use nsv::{Reader, Writer};

// Reading — yields one row at a time, returns Ok(None) when no complete row is available
let mut r = Reader::new(some_stream);
while let Some(row) = r.next_row()? {
    // row: Vec<Vec<u8>>
}

// Peeking at buffered state (useful when the source may have more data later)
let _partial = r.partial_row();   // completed cells so far
let _cell    = r.partial_cell();  // bytes of the cell being read (not yet unescaped)

// Writing — accepts &str, String, &[u8], Vec<u8>
let mut w = Writer::new(some_sink);
w.write_row(&["hello", "world"])?;

// Recovering wrapped I/O
let inner = w.into_inner();

Composition

nsv::util also exposes the algebraic decomposition of encode/decode:

encode = spill('\n') ∘ spill("") ∘ escape_seqseq
decode = unescape_seqseq ∘ unspill("") ∘ unspill('\n')

use nsv::util::{escape_seqseq, unescape_seqseq, spill, unspill};

let escaped = escape_seqseq(&data);
let flat = spill(&escaped, String::new());
let chars: Vec<char> = spill(
    &flat.iter().map(|s| s.chars().collect()).collect::<Vec<Vec<char>>>(),
    '\n',
);
let encoded: String = chars.into_iter().collect();
assert_eq!(nsv::encode(&data), encoded);

API

Core

Function                 Signature
decode                   (&str) -> Vec<Vec<String>>
encode                   (&[Vec<String>]) -> String
decode_bytes             (&[u8]) -> Vec<Vec<Vec<u8>>>
encode_bytes             (&[Vec<Vec<u8>>]) -> Vec<u8>
decode_bytes_projected   (&[u8], &[usize]) -> Vec<Vec<Vec<u8>>>

Cell escaping

Function                          Signature
escape / unescape                 (&str) -> String
escape_bytes / unescape_bytes     (&[u8]) -> Vec<u8>

Validation

Function   Signature
check      (&[u8]) -> Vec<Warning>

Streaming

Type        Method         Signature
Reader<R>   next_row       (&mut self) -> io::Result<Option<Vec<Vec<u8>>>>
Reader<R>   partial_row    (&self) -> &[Vec<u8>]
Reader<R>   partial_cell   (&self) -> &[u8]
Reader<R>   into_inner     (self) -> BufReader<R>
Writer<W>   write_row      (&mut self, &[C: AsRef<[u8]>]) -> io::Result<()>
Writer<W>   into_inner     (self) -> W

Util (nsv::util)

Function                          Description
spill / unspill                   Flatten/recover seqseq dimension with terminators
escape_seqseq / unescape_seqseq   map(map(escape)) / map(map(unescape)) over a seqseq

Parallel parsing

For inputs above 64KB, decode_bytes (and decode, and decode_bytes_projected) switch from sequential to chunked parallel parsing:

  1. Pick N evenly-spaced byte positions (one per CPU core)
  2. Scan forward from each to the nearest \n\n row boundary — O(avg_row_len)
  3. Each worker independently parses its chunk (boundary scan + cell split + unescape)

This works because literal 0x0A in NSV is always structural (never escaped), so row-boundary recovery from any byte position is a trivial forward scan. The sequential phase costs O(N · avg_row_len), where N is the number of chunks, rather than O(input_len), so nearly all of the work runs in parallel.
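The three steps above can be sketched with the standard library only. This is a simplified illustration under stated assumptions, not the crate's code: it uses `std::thread::scope` in place of rayon and a plain window scan in place of memchr, it leaves cells escaped, it takes the chunk count as a parameter instead of querying the CPU count, and `parse_rows`, `chunk_starts`, and `parse_parallel` are hypothetical names.

```rust
use std::thread;

/// Sequential row parser run on each chunk (cells are left escaped here).
fn parse_rows(chunk: &[u8]) -> Vec<Vec<Vec<u8>>> {
    let mut rows = Vec::new();
    let mut row = Vec::new();
    let mut start = 0;
    while let Some(off) = chunk[start..].iter().position(|&b| b == b'\n') {
        let line = &chunk[start..start + off];
        start += off + 1;
        if line.is_empty() {
            rows.push(std::mem::take(&mut row)); // empty line ends the row
        } else {
            row.push(line.to_vec());
        }
    }
    rows
}

/// Steps 1 and 2: probe `n` evenly spaced positions and move each forward
/// to just past the next `\n\n`, which is always the start of a row.
fn chunk_starts(input: &[u8], n: usize) -> Vec<usize> {
    let mut starts = vec![0];
    for k in 1..n {
        let probe = input.len() * k / n;
        let found = input[probe..]
            .windows(2)
            .position(|w| w == b"\n\n")
            .map(|off| probe + off + 2);
        if let Some(s) = found {
            // Skip boundaries at end of input or duplicates of the last one.
            if s < input.len() && s > *starts.last().unwrap() {
                starts.push(s);
            }
        }
    }
    starts
}

/// Step 3: parse every chunk on its own thread, then concatenate in order.
fn parse_parallel(input: &[u8], n: usize) -> Vec<Vec<Vec<u8>>> {
    let starts = chunk_starts(input, n.max(1));
    let mut ends: Vec<usize> = starts[1..].to_vec();
    ends.push(input.len());
    thread::scope(|scope| {
        let handles: Vec<_> = starts
            .iter()
            .zip(&ends)
            .map(|(&s, &e)| scope.spawn(move || parse_rows(&input[s..e])))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    })
}
```

Since every chunk begins at a row start, concatenating the per-chunk results in order gives the same rows as a single sequential pass, including empty rows that happen to sit next to a boundary.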