Struct rustf8::Utf8Iterator

pub struct Utf8Iterator<R> where
    R: Iterator
{ /* fields omitted */ }

A Utf8Iterator wraps a UTF-8 decoder around an iterator for Read.

Essentially, a Utf8Iterator converts a u8 iterator into a char iterator. The underlying iterator can be the byte iterator of a BufRead or a Cursor, for example. It is meant to iterate over an I/O source, so it expects the inner iterator to be of type Iterator<Item = Result<u8, std::io::Error>>.
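As a minimal sketch of what qualifies as an inner iterator, the following uses only the standard library to show that Read::bytes() on a Cursor or a BufReader yields items of the expected type. The helper count_ok_bytes and the two wrapper functions are hypothetical names used for illustration; they are not part of rustf8.

```rust
use std::io::{self, BufReader, Cursor, Read};

// Hypothetical helper: accepts any iterator with the item type that
// Utf8Iterator expects from its inner iterator.
fn count_ok_bytes<I>(bytes: I) -> usize
where
    I: Iterator<Item = io::Result<u8>>,
{
    bytes.filter(|b| b.is_ok()).count()
}

// Read::bytes() on a Cursor yields io::Result<u8> items.
fn from_cursor() -> usize {
    let data: Vec<u8> = vec![0xce, 0xba];
    count_ok_bytes(Cursor::new(data).bytes())
}

// The same holds for a BufReader wrapping any reader (here, a byte slice).
fn from_buf_reader() -> usize {
    let data: &[u8] = &[0xce, 0xba, 0xcf, 0x83];
    count_ok_bytes(BufReader::new(data).bytes())
}
```

Either of these byte iterators could be handed to Utf8Iterator::new() in place of count_ok_bytes.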

The next() method returns an Option, where None indicates the end of the sequence. A Some value holds a Result containing either a char or an error, which describes a UTF-8 decoding error or an I/O error from the underlying iterator. Decoding errors contain the malformed sequence.
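One way to consume an iterator with this item shape is sketched below. The function drain_chars is a hypothetical helper, generic over any iterator of Result<char, E>, so it does not depend on rustf8 itself; a Utf8Iterator should fit the bound with E = Utf8IteratorError.

```rust
// Hypothetical helper: drains any iterator whose items are Result<char, E>,
// collecting the decoded characters and the errors separately.
fn drain_chars<I, E>(chars: I) -> (String, Vec<E>)
where
    I: Iterator<Item = Result<char, E>>,
{
    let mut text = String::new();
    let mut errors = Vec::new();
    // The for loop ends when next() returns None (end of the sequence).
    for item in chars {
        match item {
            Ok(ch) => text.push(ch),
            Err(err) => errors.push(err),
        }
    }
    (text, errors)
}
```

Collecting errors instead of stopping at the first one is a design choice for this sketch; a caller that wants to abort early can return from the Err arm instead.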

Examples

   use rustf8::*;
   use std::io::prelude::*;
   use std::io::Cursor;
   fn some_correct_utf_8_text() {
       let input: Vec<u8> = vec![
           0xce, 0xba, 0xe1, 0xbd, 0xb9, 0xcf, 0x83, 0xce, 0xbc, 0xce, 0xb5,
       ];
       let stream = Cursor::new(input);
       let iter = stream.bytes();
       let mut chiter = Utf8Iterator::new(iter);
       assert_eq!('κ', chiter.next().unwrap().unwrap());
       assert_eq!('ό', chiter.next().unwrap().unwrap());
       assert_eq!('σ', chiter.next().unwrap().unwrap());
       assert_eq!('μ', chiter.next().unwrap().unwrap());
       assert_eq!('ε', chiter.next().unwrap().unwrap());
       assert!(chiter.next().is_none());
   }

Errors

The Utf8Iterator reports UTF-8 decoding errors by returning the enum Utf8IteratorError. The error also contains a Box<u8> with the malformed sequence. Subsequent calls to next() are allowed and will decode valid characters starting just past the malformed sequence.
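The report-then-resume behavior can be illustrated with the standard library alone. The sketch below swaps in std::str::from_utf8 and Utf8Error rather than rustf8, and decode_with_recovery is a hypothetical name; how rustf8 groups the bytes of a malformed sequence may differ from what std reports.

```rust
use std::str;

// Hypothetical helper built on std::str::from_utf8 (not on rustf8).
// Returns the valid text plus each malformed byte run that was skipped.
fn decode_with_recovery(mut bytes: &[u8]) -> (String, Vec<Vec<u8>>) {
    let mut text = String::new();
    let mut malformed = Vec::new();
    while !bytes.is_empty() {
        match str::from_utf8(bytes) {
            Ok(valid) => {
                text.push_str(valid);
                break;
            }
            Err(err) => {
                // The prefix up to valid_up_to() is guaranteed valid UTF-8.
                let (good, rest) = bytes.split_at(err.valid_up_to());
                text.push_str(str::from_utf8(good).expect("validated prefix"));
                // error_len() is None when the input ends mid-sequence.
                let bad_len = err.error_len().unwrap_or(rest.len());
                malformed.push(rest[..bad_len].to_vec());
                // Resume decoding just past the malformed run.
                bytes = &rest[bad_len..];
            }
        }
    }
    (text, malformed)
}
```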

The I/O error std::io::ErrorKind::Interrupted coming from the underlying iterator is transparently consumed by the next() method, so callers never need to handle it.
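A retry loop of roughly the following shape is one way such transparent handling could work. This is a sketch under that assumption, not the crate's actual code, and next_byte is a hypothetical helper name.

```rust
use std::io::{self, ErrorKind};

// Hypothetical helper: pulls the next byte from the inner iterator,
// silently retrying whenever the error kind is Interrupted.
fn next_byte<I>(inner: &mut I) -> Option<io::Result<u8>>
where
    I: Iterator<Item = io::Result<u8>>,
{
    loop {
        match inner.next() {
            // Interrupted is treated as "try again", never surfaced.
            Some(Err(ref e)) if e.kind() == ErrorKind::Interrupted => continue,
            // Bytes, other errors, and end of input pass through unchanged.
            other => return other,
        }
    }
}
```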

Panics

Panics if trying to use unget() twice before calling next().

Safety

This crate does not use unsafe {}.

Once decoded, the values are converted using char::from_u32(), which returns None for anything that is not a valid Unicode scalar value, so invalid characters should be prevented anyway.
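To illustrate why char::from_u32() acts as a safety net, the sketch below assembles the payload bits of a three-byte sequence and then converts the result. Both function names are hypothetical, and the assembler deliberately skips the marker-bit checks a real decoder would perform.

```rust
// Hypothetical helper: assembles the payload bits of a three-byte UTF-8
// sequence into a code point. It does not validate the lead or
// continuation marker bits.
fn assemble_three_bytes(b0: u8, b1: u8, b2: u8) -> u32 {
    ((b0 as u32 & 0x0f) << 12) | ((b1 as u32 & 0x3f) << 6) | (b2 as u32 & 0x3f)
}

// char::from_u32 returns None for surrogates (0xD800..=0xDFFF) and for
// values above 0x10FFFF, so no invalid char can be produced here.
fn to_char(code_point: u32) -> Option<char> {
    char::from_u32(code_point)
}
```

The bytes 0xe1, 0xbd, 0xb9 from the example above assemble to U+1F79, which converts successfully, while 0xed, 0xa0, 0x80 assemble to the surrogate 0xD800, which is rejected.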

Implementations

impl<R> Utf8Iterator<R> where
    R: Iterator<Item = Result<u8, Error>>, 

pub fn new(inner: R) -> Self

Builds a new UTF-8 iterator from the provided byte iterator of a Read. The iterator does not reinitialize once it reaches the end of the sequence. Decoding starts at the current position of the underlying iterator.

pub fn unget(&mut self, ch: char)

Returns a character to the iterator. That will be the item returned by the subsequent call to next(). Calling unget() twice before calling next() will panic.

Example

use rustf8::*;
use std::io::prelude::*;
use std::io::Cursor;
fn unget_test() {
    let input: Vec<u8> = vec![
        0xce, 0xba, 0xe1, 0xbd, 0xb9, 0xcf, 0x83, 0xce, 0xbc, 0xce, 0xb5,
    ];
    let stream = Cursor::new(input);
    let iter = stream.bytes();
    let mut chiter = Utf8Iterator::new(iter);
    assert_eq!('κ', chiter.next().unwrap().unwrap());
    chiter.unget('ε');
    assert_eq!('ε', chiter.next().unwrap().unwrap());
    assert_eq!('ό', chiter.next().unwrap().unwrap());
    assert_eq!('σ', chiter.next().unwrap().unwrap());
    assert_eq!('μ', chiter.next().unwrap().unwrap());
    assert_eq!('ε', chiter.next().unwrap().unwrap());
    chiter.unget('κ');
    assert_eq!('κ', chiter.next().unwrap().unwrap());
    assert!(chiter.next().is_none());
}

Panics

Panics if trying to use unget() twice before calling next().
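The single-slot behavior described above can be sketched as a small wrapper over any char iterator. Pushback is a hypothetical type written for illustration and is not the crate's actual implementation.

```rust
// Hypothetical one-slot pushback wrapper; not rustf8's implementation.
struct Pushback<I: Iterator<Item = char>> {
    inner: I,
    slot: Option<char>,
}

impl<I: Iterator<Item = char>> Pushback<I> {
    fn new(inner: I) -> Self {
        Pushback { inner, slot: None }
    }

    // Stores one character; a second call before next() panics.
    fn unget(&mut self, ch: char) {
        assert!(self.slot.is_none(), "unget() called twice before next()");
        self.slot = Some(ch);
    }
}

impl<I: Iterator<Item = char>> Iterator for Pushback<I> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        // take() empties the slot, so unget() becomes legal again.
        self.slot.take().or_else(|| self.inner.next())
    }
}
```

Because the slot is checked before the inner iterator, a character can be returned even after the inner iterator is exhausted, which matches the final unget('κ') in the example above.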

Trait Implementations

impl<R> Iterator for Utf8Iterator<R> where
    R: Iterator<Item = Result<u8, Error>>, 

type Item = Result<char, Utf8IteratorError>

The type of the elements being iterated over.

fn next(&mut self) -> Option<Self::Item>

Decodes the next UTF-8 sequence and returns the corresponding character.

Auto Trait Implementations

impl<R> RefUnwindSafe for Utf8Iterator<R> where
    R: RefUnwindSafe

impl<R> Send for Utf8Iterator<R> where
    R: Send

impl<R> Sync for Utf8Iterator<R> where
    R: Sync

impl<R> Unpin for Utf8Iterator<R> where
    R: Unpin

impl<R> UnwindSafe for Utf8Iterator<R> where
    R: UnwindSafe

Blanket Implementations

impl<T> Any for T where
    T: 'static + ?Sized

impl<T> Borrow<T> for T where
    T: ?Sized

impl<T> BorrowMut<T> for T where
    T: ?Sized

impl<T> From<T> for T

impl<T, U> Into<U> for T where
    U: From<T>, 

impl<I> IntoIterator for I where
    I: Iterator

type Item = <I as Iterator>::Item

The type of the elements being iterated over.

type IntoIter = I

Which kind of iterator are we turning this into?

impl<T, U> TryFrom<U> for T where
    U: Into<T>, 

type Error = Infallible

The type returned in the event of a conversion error.

impl<T, U> TryInto<U> for T where
    U: TryFrom<T>, 

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.