Struct csv::Reader

pub struct Reader<R> { /* fields omitted */ }

An already configured CSV reader.

A CSV reader takes CSV data as input and transforms it into standard Rust values. The most flexible way to read CSV data is as a sequence of records, where a record is a sequence of fields and each field is a string. However, a reader can also use Serde to automatically deserialize CSV data into Rust types such as i64, (String, f64, f64, f64), or even a custom struct.

Configuration

A CSV reader has a couple of convenient constructor methods, such as from_path and from_reader. However, if you want to configure the CSV reader to use a different delimiter or quote character (among many other things), then you should use a ReaderBuilder to construct a Reader. For example, to change the field delimiter:

extern crate csv;

use std::error::Error;
use csv::ReaderBuilder;

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city;country;pop
Boston;United States;4628910
";
    let mut rdr = ReaderBuilder::new()
        .delimiter(b';')
        .from_reader(data.as_bytes());

    if let Some(result) = rdr.records().next() {
        let record = result?;
        assert_eq!(record, vec!["Boston", "United States", "4628910"]);
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}

Error handling

In general, CSV parsing itself never returns an error. That is, there is no such thing as malformed CSV data. Instead, this reader prioritizes finding a parse over rejecting CSV data that it does not understand. This choice was inspired by other popular CSV parsers, and it is also pragmatic: CSV data varies wildly, so even if the CSV data is malformed, it might still be possible to work with it. In the land of CSV, there is no "right" or "wrong," only "right" and "less right."

With that said, a number of errors can occur while reading CSV data:

  • By default, all records in CSV data must have the same number of fields. If a record is found with a different number of fields than a prior record, then an error is returned. This behavior can be disabled by enabling flexible parsing via the flexible method on ReaderBuilder.
  • When reading CSV data from a resource (like a file), it is possible for reading from the underlying resource to fail. This will return an error.
  • When reading CSV data into String or &str fields (e.g., via a StringRecord), UTF-8 is strictly enforced. If CSV data is invalid UTF-8, then an error is returned. If you want to read invalid UTF-8, then you should use the byte-oriented APIs such as ByteRecord. If you need explicit support for another encoding entirely, then you'll need to use another crate to transcode your CSV data to UTF-8 before parsing it.
  • When using Serde to deserialize CSV data into Rust types, it is possible for a number of additional errors to occur. For example, deserializing a field xyz into an i32 field will result in an error.

For more details on the precise semantics of errors, see the Error type.

Methods

impl Reader<Reader<File>>

Create a new CSV parser with a default configuration for the given file path.

To customize CSV parsing, use a ReaderBuilder.

Example

extern crate csv;

use std::error::Error;
use csv::Reader;

fn example() -> Result<(), Box<dyn Error>> {
    let mut rdr = Reader::from_path("foo.csv")?;
    for result in rdr.records() {
        let record = result?;
        println!("{:?}", record);
    }
    Ok(())
}

impl<R: Read> Reader<R>

Create a new CSV parser with a default configuration for the given reader.

To customize CSV parsing, use a ReaderBuilder.

Example

extern crate csv;

use std::error::Error;
use csv::Reader;

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,pop
Boston,United States,4628910
Concord,United States,42695
";
    let mut rdr = Reader::from_reader(data.as_bytes());
    for result in rdr.records() {
        let record = result?;
        println!("{:?}", record);
    }
    Ok(())
}

Important traits for DeserializeRecordsIter<'r, R, D>

Returns a borrowed iterator over deserialized records.

Each item yielded by this iterator is a Result<D, Error>. Therefore, in order to access the record, callers must handle the possibility of error (typically with try! or ?).

If has_headers was enabled via a ReaderBuilder (which is the default), then this does not include the first record. Additionally, if has_headers is enabled, then deserializing into a struct will automatically align the values in each row to the fields of a struct based on the header row.

Example

This shows how to deserialize CSV data into normal Rust structs. The fields of the header row are used to match up the values in each row to the fields of the struct.

extern crate csv;
#[macro_use]
extern crate serde_derive;

use std::error::Error;
use csv::Reader;

#[derive(Debug, Deserialize, Eq, PartialEq)]
struct Row {
    city: String,
    country: String,
    #[serde(rename = "popcount")]
    population: u64,
}

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,popcount
Boston,United States,4628910
";
    let mut rdr = Reader::from_reader(data.as_bytes());
    let mut iter = rdr.deserialize();

    if let Some(result) = iter.next() {
        let record: Row = result?;
        assert_eq!(record, Row {
            city: "Boston".to_string(),
            country: "United States".to_string(),
            population: 4628910,
        });
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}

Rules

For the most part, any Rust type that maps straightforwardly to a CSV record is supported. This includes maps, structs, tuples and tuple structs. Other Rust types, such as Vecs, arrays, and enums, have a more complicated story. In general, when working with CSV data, one should avoid nested sequences as much as possible.

Maps, structs, tuples and tuple structs map to CSV records in a simple way. Tuples and tuple structs decode their fields in the order that they are defined. Structs do the same only if has_headers has been disabled using ReaderBuilder; otherwise, structs and maps are deserialized based on the fields defined in the header row. (If there is no header row, then deserializing into a map will result in an error.)

Nested sequences are supported in a limited capacity. Namely, they are flattened. As a result, it's often useful to use a Vec to capture a "tail" of fields in a record:

extern crate csv;
#[macro_use]
extern crate serde_derive;

use std::error::Error;
use csv::ReaderBuilder;

#[derive(Debug, Deserialize, Eq, PartialEq)]
struct Row {
    label: String,
    values: Vec<i32>,
}

fn example() -> Result<(), Box<dyn Error>> {
    let data = "foo,1,2,3";
    let mut rdr = ReaderBuilder::new()
        .has_headers(false)
        .from_reader(data.as_bytes());
    let mut iter = rdr.deserialize();

    if let Some(result) = iter.next() {
        let record: Row = result?;
        assert_eq!(record, Row {
            label: "foo".to_string(),
            values: vec![1, 2, 3],
        });
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}

In the above example, adding another field to the Row struct after the values field will result in a deserialization error. This is because the deserializer doesn't know when to stop reading fields into the values vector, so it consumes the rest of the fields in the record and leaves none for the additional field.

Finally, simple enums in Rust can be deserialized as well. Namely, each variant must take either no arguments or a single argument. Variants with no arguments are deserialized based on which variant name the field matches. Variants with a single argument are deserialized based on which variant can store the data; this form is only supported when using "untagged" enum deserialization. The following example shows both forms in action:

extern crate csv;
#[macro_use]
extern crate serde_derive;

use std::error::Error;
use csv::Reader;

#[derive(Debug, Deserialize, PartialEq)]
struct Row {
    label: Label,
    value: Number,
}

#[derive(Debug, Deserialize, PartialEq)]
#[serde(rename_all = "lowercase")]
enum Label {
    Celsius,
    Fahrenheit,
}

#[derive(Debug, Deserialize, PartialEq)]
#[serde(untagged)]
enum Number {
    Integer(i64),
    Float(f64),
}

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
label,value
celsius,22.2222
fahrenheit,72
";
    let mut rdr = Reader::from_reader(data.as_bytes());
    let mut iter = rdr.deserialize();

    // Read the first record.
    if let Some(result) = iter.next() {
        let record: Row = result?;
        assert_eq!(record, Row {
            label: Label::Celsius,
            value: Number::Float(22.2222),
        });
    } else {
        return Err(From::from(
            "expected at least two records but got none"));
    }

    // Read the second record.
    if let Some(result) = iter.next() {
        let record: Row = result?;
        assert_eq!(record, Row {
            label: Label::Fahrenheit,
            value: Number::Integer(72),
        });
        Ok(())
    } else {
        Err(From::from(
            "expected at least two records but got only one"))
    }
}

Important traits for DeserializeRecordsIntoIter<R, D>

Returns an owned iterator over deserialized records.

Each item yielded by this iterator is a Result<D, Error>. Therefore, in order to access the record, callers must handle the possibility of error (typically with try! or ?).

This is mostly useful when you want to return a CSV iterator or store it somewhere.

If has_headers was enabled via a ReaderBuilder (which is the default), then this does not include the first record. Additionally, if has_headers is enabled, then deserializing into a struct will automatically align the values in each row to the fields of a struct based on the header row.

For more detailed deserialization rules, see the documentation on the deserialize method.

Example

extern crate csv;
#[macro_use]
extern crate serde_derive;

use std::error::Error;
use csv::Reader;

#[derive(Debug, Deserialize, Eq, PartialEq)]
struct Row {
    city: String,
    country: String,
    #[serde(rename = "popcount")]
    population: u64,
}

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,popcount
Boston,United States,4628910
";
    let rdr = Reader::from_reader(data.as_bytes());
    let mut iter = rdr.into_deserialize();

    if let Some(result) = iter.next() {
        let record: Row = result?;
        assert_eq!(record, Row {
            city: "Boston".to_string(),
            country: "United States".to_string(),
            population: 4628910,
        });
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}

Important traits for StringRecordsIter<'r, R>

Returns a borrowed iterator over all records as strings.

Each item yielded by this iterator is a Result<StringRecord, Error>. Therefore, in order to access the record, callers must handle the possibility of error (typically with try! or ?).

If has_headers was enabled via a ReaderBuilder (which is the default), then this does not include the first record.

Example

extern crate csv;

use std::error::Error;
use csv::Reader;

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,pop
Boston,United States,4628910
";
    let mut rdr = Reader::from_reader(data.as_bytes());
    let mut iter = rdr.records();

    if let Some(result) = iter.next() {
        let record = result?;
        assert_eq!(record, vec!["Boston", "United States", "4628910"]);
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}

Important traits for StringRecordsIntoIter<R>

Returns an owned iterator over all records as strings.

Each item yielded by this iterator is a Result<StringRecord, Error>. Therefore, in order to access the record, callers must handle the possibility of error (typically with try! or ?).

This is mostly useful when you want to return a CSV iterator or store it somewhere.

If has_headers was enabled via a ReaderBuilder (which is the default), then this does not include the first record.

Example

extern crate csv;

use std::error::Error;
use csv::Reader;

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,pop
Boston,United States,4628910
";
    let rdr = Reader::from_reader(data.as_bytes());
    let mut iter = rdr.into_records();

    if let Some(result) = iter.next() {
        let record = result?;
        assert_eq!(record, vec!["Boston", "United States", "4628910"]);
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}

Important traits for ByteRecordsIter<'r, R>

Returns a borrowed iterator over all records as raw bytes.

Each item yielded by this iterator is a Result<ByteRecord, Error>. Therefore, in order to access the record, callers must handle the possibility of error (typically with try! or ?).

If has_headers was enabled via a ReaderBuilder (which is the default), then this does not include the first record.

Example

extern crate csv;

use std::error::Error;
use csv::Reader;

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,pop
Boston,United States,4628910
";
    let mut rdr = Reader::from_reader(data.as_bytes());
    let mut iter = rdr.byte_records();

    if let Some(result) = iter.next() {
        let record = result?;
        assert_eq!(record, vec!["Boston", "United States", "4628910"]);
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}

Important traits for ByteRecordsIntoIter<R>

Returns an owned iterator over all records as raw bytes.

Each item yielded by this iterator is a Result<ByteRecord, Error>. Therefore, in order to access the record, callers must handle the possibility of error (typically with try! or ?).

This is mostly useful when you want to return a CSV iterator or store it somewhere.

If has_headers was enabled via a ReaderBuilder (which is the default), then this does not include the first record.

Example

extern crate csv;

use std::error::Error;
use csv::Reader;

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,pop
Boston,United States,4628910
";
    let rdr = Reader::from_reader(data.as_bytes());
    let mut iter = rdr.into_byte_records();

    if let Some(result) = iter.next() {
        let record = result?;
        assert_eq!(record, vec!["Boston", "United States", "4628910"]);
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}

Returns a reference to the first row read by this parser.

If no row has been read yet, then this will force parsing of the first row.

If there was a problem parsing the row or if it wasn't valid UTF-8, then this returns an error.

If the underlying reader emits EOF before any data, then this returns an empty record.

Note that this method may be used regardless of whether has_headers was enabled (but it is enabled by default).

Example

This example shows how to get the header row of CSV data. Notice that the header row does not appear as a record in the iterator!

extern crate csv;

use std::error::Error;
use csv::Reader;

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,pop
Boston,United States,4628910
";
    let mut rdr = Reader::from_reader(data.as_bytes());

    // We can read the headers before iterating.
    {
        // `headers` borrows from the reader, so we put this in its
        // own scope. That way, the borrow ends before we try iterating
        // below. Alternatively, we could clone the headers.
        let headers = rdr.headers()?;
        assert_eq!(headers, vec!["city", "country", "pop"]);
    }

    if let Some(result) = rdr.records().next() {
        let record = result?;
        assert_eq!(record, vec!["Boston", "United States", "4628910"]);
    } else {
        return Err(From::from(
            "expected at least one record but got none"))
    }

    // We can also read the headers after iterating.
    let headers = rdr.headers()?;
    assert_eq!(headers, vec!["city", "country", "pop"]);
    Ok(())
}

Returns a reference to the first row read by this parser as raw bytes.

If no row has been read yet, then this will force parsing of the first row.

If there was a problem parsing the row then this returns an error.

If the underlying reader emits EOF before any data, then this returns an empty record.

Note that this method may be used regardless of whether has_headers was enabled (but it is enabled by default).

Example

This example shows how to get the header row of CSV data. Notice that the header row does not appear as a record in the iterator!

extern crate csv;

use std::error::Error;
use csv::Reader;

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,pop
Boston,United States,4628910
";
    let mut rdr = Reader::from_reader(data.as_bytes());

    // We can read the headers before iterating.
    {
        // `headers` borrows from the reader, so we put this in its
        // own scope. That way, the borrow ends before we try iterating
        // below. Alternatively, we could clone the headers.
        let headers = rdr.byte_headers()?;
        assert_eq!(headers, vec!["city", "country", "pop"]);
    }

    if let Some(result) = rdr.byte_records().next() {
        let record = result?;
        assert_eq!(record, vec!["Boston", "United States", "4628910"]);
    } else {
        return Err(From::from(
            "expected at least one record but got none"))
    }

    // We can also read the headers after iterating.
    let headers = rdr.byte_headers()?;
    assert_eq!(headers, vec!["city", "country", "pop"]);
    Ok(())
}

Set the headers of this CSV parser manually.

This overrides any other setting (including set_byte_headers). Any automatic detection of headers is disabled. This may be called at any time.

Example

extern crate csv;

use std::error::Error;
use csv::{Reader, StringRecord};

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,pop
Boston,United States,4628910
";
    let mut rdr = Reader::from_reader(data.as_bytes());

    assert_eq!(rdr.headers()?, vec!["city", "country", "pop"]);
    rdr.set_headers(StringRecord::from(vec!["a", "b", "c"]));
    assert_eq!(rdr.headers()?, vec!["a", "b", "c"]);

    Ok(())
}

Set the headers of this CSV parser manually as raw bytes.

This overrides any other setting (including set_headers). Any automatic detection of headers is disabled. This may be called at any time.

Example

extern crate csv;

use std::error::Error;
use csv::{Reader, ByteRecord};

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,pop
Boston,United States,4628910
";
    let mut rdr = Reader::from_reader(data.as_bytes());

    assert_eq!(rdr.byte_headers()?, vec!["city", "country", "pop"]);
    rdr.set_byte_headers(ByteRecord::from(vec!["a", "b", "c"]));
    assert_eq!(rdr.byte_headers()?, vec!["a", "b", "c"]);

    Ok(())
}

Read a single row into the given record. Returns false when no more records could be read.

If has_headers was enabled via a ReaderBuilder (which is the default), then this will never read the first record.

This method is useful when you want to read records as fast as possible. It's less ergonomic than an iterator, but it permits the caller to reuse the StringRecord allocation, which usually results in higher throughput.

Records read via this method are guaranteed to have a position set on them, even if the reader is at EOF or if an error is returned.

Example

extern crate csv;

use std::error::Error;
use csv::{Reader, StringRecord};

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,pop
Boston,United States,4628910
";
    let mut rdr = Reader::from_reader(data.as_bytes());
    let mut record = StringRecord::new();

    if rdr.read_record(&mut record)? {
        assert_eq!(record, vec!["Boston", "United States", "4628910"]);
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}

Read a single row into the given byte record. Returns false when no more records could be read.

If has_headers was enabled via a ReaderBuilder (which is the default), then this will never read the first record.

This method is useful when you want to read records as fast as possible. It's less ergonomic than an iterator, but it permits the caller to reuse the ByteRecord allocation, which usually results in higher throughput.

Records read via this method are guaranteed to have a position set on them, even if the reader is at EOF or if an error is returned.

Example

extern crate csv;

use std::error::Error;
use csv::{ByteRecord, Reader};

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,pop
Boston,United States,4628910
";
    let mut rdr = Reader::from_reader(data.as_bytes());
    let mut record = ByteRecord::new();

    if rdr.read_byte_record(&mut record)? {
        assert_eq!(record, vec!["Boston", "United States", "4628910"]);
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}

Return the current position of this CSV reader.

The byte offset in the position returned can be used to seek this reader. In particular, seeking to a position returned here on the same data will result in parsing the same subsequent record.

Example: reading the position

extern crate csv;

use std::error::Error;
use std::io;
use csv::{Reader, Position};

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,popcount
Boston,United States,4628910
Concord,United States,42695
";
    let rdr = Reader::from_reader(io::Cursor::new(data));
    let mut iter = rdr.into_records();
    let mut pos = Position::new();
    loop {
        // Read the position immediately before each record.
        let next_pos = iter.reader().position().clone();
        if iter.next().is_none() {
            break;
        }
        pos = next_pos;
    }

    // `pos` should now be the position immediately before the last
    // record.
    assert_eq!(pos.byte(), 51);
    assert_eq!(pos.line(), 3);
    assert_eq!(pos.record(), 2);
    Ok(())
}

Returns true if and only if this reader has been exhausted.

When this returns true, no more records can be read from this reader (unless it has been seeked to another position).

Example

extern crate csv;

use std::error::Error;
use std::io;
use csv::Reader;

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,popcount
Boston,United States,4628910
Concord,United States,42695
";
    let mut rdr = Reader::from_reader(io::Cursor::new(data));
    assert!(!rdr.is_done());
    for result in rdr.records() {
        let _ = result?;
    }
    assert!(rdr.is_done());
    Ok(())
}

Returns true if and only if this reader has been configured to interpret the first record as a header record.

Returns a reference to the underlying reader.

Returns a mutable reference to the underlying reader.

Unwraps this CSV reader, returning the underlying reader.

Note that any leftover data inside this reader's internal buffer is lost.

impl<R: Read + Seek> Reader<R>

Seeks the underlying reader to the position given.

This comes with a few caveats:

  • Any internal buffer associated with this reader is cleared.
  • If the given position does not correspond to a position immediately before the start of a record, then the behavior of this reader is unspecified.
  • Any special logic that skips the first record in the CSV reader when reading or iterating over records is disabled.

If the given position has a byte offset equivalent to the current position, then no seeking is performed.

If the header row has not already been read, then this will attempt to read the header row before seeking. Therefore, it is possible that this returns an error associated with reading CSV data.

Note that seeking uses only the byte offset in the given position. The record and line numbers in the position are not validated; if they are incorrect, any future position generated by this CSV reader will be similarly incorrect.

Example: seek to parse a record twice

extern crate csv;

use std::error::Error;
use std::io;
use csv::{Reader, Position};

fn example() -> Result<(), Box<dyn Error>> {
    let data = "\
city,country,popcount
Boston,United States,4628910
Concord,United States,42695
";
    let rdr = Reader::from_reader(io::Cursor::new(data));
    let mut iter = rdr.into_records();
    let mut pos = Position::new();
    loop {
        // Read the position immediately before each record.
        let next_pos = iter.reader().position().clone();
        if iter.next().is_none() {
            break;
        }
        pos = next_pos;
    }

    // Now seek the reader back to `pos`. This will let us read the
    // last record again.
    iter.reader_mut().seek(pos)?;
    let mut iter = iter.into_reader().into_records();
    if let Some(result) = iter.next() {
        let record = result?;
        assert_eq!(record, vec!["Concord", "United States", "42695"]);
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}

This is like seek, but provides direct control over how the seeking operation is performed via io::SeekFrom.

The pos position given should correspond to the position indicated by seek_from, but this is not enforced. If pos is incorrect, then the position information returned by this reader will be similarly incorrect.

If the header row has not already been read, then this will attempt to read the header row before seeking. Therefore, it is possible that this returns an error associated with reading CSV data.

Unlike seek, this will always cause an actual seek to be performed.

Trait Implementations

impl<R: Debug> Debug for Reader<R>

Formats the value using the given formatter.

Auto Trait Implementations

impl<R> Send for Reader<R> where
    R: Send

impl<R> Sync for Reader<R> where
    R: Sync