pub struct RandomAccessSimple<R> { /* private fields */ }
A simple index for random access to CSV records.
This index permits seeking to the start of any CSV record with a constant number of operations.
The format of the index is simplistic and amenable to serializing to disk.
It consists of exactly N+1 64-bit big-endian integers, where N is the number of records in the CSV data that is indexed. The ith integer corresponds to the approximate byte offset where the ith record in the CSV data begins. One additional integer is written to the end of the index which indicates the total number of records in the CSV data.
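As a rough illustration of why a lookup needs only a constant number of operations, the following sketch reads the layout described above using only the standard library. It is not the crate's implementation: the helper names `index_len`, `index_offset`, `read_u64_be` and `sample_index` are hypothetical, the sample index is written by hand, and no bounds checking is done on `i`.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Read the record count stored in the final 8 bytes of the index.
fn index_len<R: Read + Seek>(idx: &mut R) -> io::Result<u64> {
    idx.seek(SeekFrom::End(-8))?;
    read_u64_be(idx)
}

/// Read the byte offset stored for record `i` (zero-based).
/// Assumption: `i` is less than the record count; this is not checked here.
fn index_offset<R: Read + Seek>(idx: &mut R, i: u64) -> io::Result<u64> {
    idx.seek(SeekFrom::Start(i * 8))?;
    read_u64_be(idx)
}

/// Decode one 64-bit big-endian integer at the current position.
fn read_u64_be<R: Read>(rdr: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    rdr.read_exact(&mut buf)?;
    Ok(u64::from_be_bytes(buf))
}

/// Hand-built index for three records at offsets 0, 17 and 46,
/// followed by the record count (3).
fn sample_index() -> io::Cursor<Vec<u8>> {
    let mut bytes = Vec::new();
    for n in &[0u64, 17, 46, 3] {
        bytes.extend_from_slice(&n.to_be_bytes());
    }
    io::Cursor::new(bytes)
}
```

Each lookup is one seek plus one 8-byte read, regardless of how many records the index covers.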
This indexing format does not store the line numbers of CSV records, so using the positions returned by this index to seek a CSV reader will likely cause any future line numbers reported by that reader to be incorrect.
This format will never change.
N.B. The format of this indexing scheme matches the format of the old csv::Indexed type in pre-1.0 versions of the csv crate.
Implementations§
impl<W: Write> RandomAccessSimple<W>
pub fn create<R: Read>(rdr: &mut Reader<R>, wtr: W) -> Result<()>
Write a simple index to the given writer for the given CSV reader.
If there was a problem reading CSV records or writing to the given writer, then an error is returned.
Note that the given CSV reader is read as given until EOF. The index produced includes all records, including the first record even if the CSV reader is configured to interpret the first record as a header record.
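To make the shape of the output concrete, here is a deliberately simplified sketch of writing such an index with the standard library alone. It is not how `create` is implemented: the function `write_line_index` is hypothetical, and it assumes every record ends at a newline byte, which does not hold for CSV data with quoted fields containing newlines. Like the real index, it records the first line even though it is a header.

```rust
use std::io::{self, Write};

/// Write one big-endian u64 offset per line of `data`, then the line count.
/// Simplification: a record is assumed to end at every b'\n'.
fn write_line_index<W: Write>(data: &[u8], mut wtr: W) -> io::Result<()> {
    let mut count: u64 = 0;
    let mut start: usize = 0;
    while start < data.len() {
        // Record where this line begins.
        wtr.write_all(&(start as u64).to_be_bytes())?;
        count += 1;
        // Advance to the byte after the next newline, or to the end of the data.
        start = match data[start..].iter().position(|&b| b == b'\n') {
            Some(i) => start + i + 1,
            None => data.len(),
        };
    }
    // The final integer is the total number of records.
    wtr.write_all(&count.to_be_bytes())
}
```

For the three-line sample data used in the examples on this page, this sketch writes four integers: the offsets 0, 17 and 46, followed by the count 3.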
§Example: in memory index
This example shows how to create a simple random access index, open it and query the number of records in the index.
use std::io;
use csv_index::RandomAccessSimple;

fn example() -> csv::Result<()> {
    let data = "\
city,country,pop
Boston,United States,4628910
Concord,United States,42695
";
    let mut rdr = csv::Reader::from_reader(data.as_bytes());
    let mut wtr = io::Cursor::new(vec![]);
    RandomAccessSimple::create(&mut rdr, &mut wtr)?;

    let idx = RandomAccessSimple::open(wtr)?;
    assert_eq!(idx.len(), 3);
    Ok(())
}
§Example: file backed index
This is like the previous example, but instead of creating the index in memory with std::io::Cursor, we write the index to a file.
use std::fs::File;
use std::io;
use csv_index::RandomAccessSimple;

fn example() -> csv::Result<()> {
    let data = "\
city,country,pop
Boston,United States,4628910
Concord,United States,42695
";
    let mut rdr = csv::Reader::from_reader(data.as_bytes());
    let mut wtr = File::create("data.csv.idx")?;
    RandomAccessSimple::create(&mut rdr, &mut wtr)?;

    let fileidx = File::open("data.csv.idx")?;
    let idx = RandomAccessSimple::open(fileidx)?;
    assert_eq!(idx.len(), 3);
    Ok(())
}
impl<R: Read + Seek> RandomAccessSimple<R>
pub fn open(rdr: R) -> Result<RandomAccessSimple<R>>
Open an existing simple CSV index.
The reader given must be seekable and should contain an index written by RandomAccessSimple::create.
§Example
This example shows how to create a simple random access index, open it and query the number of records in the index.
use std::io;
use csv_index::RandomAccessSimple;

fn example() -> csv::Result<()> {
    let data = "\
city,country,pop
Boston,United States,4628910
Concord,United States,42695
";
    let mut rdr = csv::Reader::from_reader(data.as_bytes());
    let mut wtr = io::Cursor::new(vec![]);
    RandomAccessSimple::create(&mut rdr, &mut wtr)?;

    let idx = RandomAccessSimple::open(wtr)?;
    assert_eq!(idx.len(), 3);
    Ok(())
}
pub fn get(&mut self, i: u64) -> Result<Position>
Get the position of the record at index i.
The first record has index 0.
If the position returned is used to seek the CSV reader that was used to create this index, then the next record read by the CSV reader will be the ith record.
Note that since this index does not store the line number of each record, the position returned will always have a line number equivalent to 1. This in turn will cause the CSV reader to report all subsequent line numbers incorrectly.
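To show what seeking to a stored byte offset amounts to, the sketch below uses a plain standard library reader instead of a CSV reader. The helper `line_at` is hypothetical, and it treats a record as a single line, so it is only meaningful for data without quoted newlines. Like the index, it knows nothing about line numbers: it jumps straight to the offset without counting the lines that precede it.

```rust
use std::io::{self, BufRead, Seek, SeekFrom};

/// Seek `rdr` to `offset` and return the line that starts there,
/// without its trailing line terminator.
fn line_at<R: BufRead + Seek>(rdr: &mut R, offset: u64) -> io::Result<String> {
    rdr.seek(SeekFrom::Start(offset))?;
    let mut line = String::new();
    rdr.read_line(&mut line)?;
    Ok(line
        .trim_end_matches(|c: char| c == '\n' || c == '\r')
        .to_string())
}
```

With the sample data used on this page wrapped in an io::Cursor, calling `line_at` with offset 46 yields the Concord record, since the two preceding lines occupy 17 and 29 bytes.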
§Example
This example shows how to create a simple random access index, open it and use it to seek a CSV reader to read an arbitrary record.
use std::error::Error;
use std::io;
use csv_index::RandomAccessSimple;

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,pop
Boston,United States,4628910
Concord,United States,42695
";
    // Note that we wrap our CSV data in an io::Cursor, which makes it
    // seekable. If you're opening CSV data from a file, then this is
    // not needed since a `File` is already seekable.
    let mut rdr = csv::Reader::from_reader(io::Cursor::new(data));
    let mut wtr = io::Cursor::new(vec![]);
    RandomAccessSimple::create(&mut rdr, &mut wtr)?;

    // Open the index we just created, get the position of the last
    // record and seek the CSV reader.
    let mut idx = RandomAccessSimple::open(wtr)?;
    let pos = idx.get(2)?;
    rdr.seek(pos)?;

    // Read the next record.
    if let Some(result) = rdr.records().next() {
        let record = result?;
        assert_eq!(record, vec!["Concord", "United States", "42695"]);
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}