Struct OdbcReaderBuilder

pub struct OdbcReaderBuilder { /* private fields */ }

Creates instances of OdbcReader based on odbc_api::Cursor.

Using a builder pattern, instead of passing structs with all required arguments to the constructors of OdbcReader, allows arrow_odbc to introduce new parameters for fine-tuning the creation and behavior of the readers without breaking the code of downstream applications.
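A minimal end-to-end sketch, assuming a recent odbc_api; the connection string, query, and exact odbc_api signatures below are placeholders and may need adapting to your driver and version:

```rust
use arrow_odbc::{
    odbc_api::{ConnectionOptions, Environment},
    OdbcReaderBuilder,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder connection string: adapt to your driver and data source.
    let connection_string = "Driver={Your Driver};Server=your_server;";
    let environment = Environment::new()?;
    let connection = environment
        .connect_with_connection_string(connection_string, ConnectionOptions::default())?;
    let cursor = connection
        .execute("SELECT * FROM my_table", (), None)?
        .expect("SELECT statement must produce a cursor");
    // Configure and build the reader; the defaults are fine for many cases.
    let reader = OdbcReaderBuilder::new()
        .with_max_num_rows_per_batch(10_000)
        .build(cursor)?;
    for batch in reader {
        // Each item is a Result<RecordBatch, ArrowError>.
        let batch = batch?;
        println!("fetched batch with {} rows", batch.num_rows());
    }
    Ok(())
}
```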

Implementations§

impl OdbcReaderBuilder

pub fn new() -> Self

pub fn with_max_num_rows_per_batch( &mut self, max_num_rows_per_batch: usize, ) -> &mut Self

Limits the maximum number of rows fetched in a single roundtrip to the data source. Higher numbers lower the IO overhead and may speed up your runtime, but also require larger preallocated buffers and use more memory. This value defaults to 65535, which is the maximum value of u16. Some ODBC drivers use a 16-bit integer to count rows, so this default avoids overflows. The savings in IO overhead from going above that number are estimated to be small. Your mileage may vary, of course.
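For example (4096 is an arbitrary value chosen for illustration):

```rust
use arrow_odbc::OdbcReaderBuilder;

let mut builder = OdbcReaderBuilder::new();
// Fetch at most 4096 rows per roundtrip, trading IO overhead for memory.
builder.with_max_num_rows_per_batch(4096);
```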

pub fn with_max_bytes_per_batch( &mut self, max_bytes_per_batch: usize, ) -> &mut Self

In addition to the row limit you may specify an upper bound in bytes for allocating the transit buffer. This is useful if you do not know the database schema, or your code has to work with many different ones, but you do know the amount of memory in your machine. This limit is applied in addition to OdbcReaderBuilder::with_max_num_rows_per_batch; whichever of the two leads to a smaller buffer is used. This defaults to 512 MiB.
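A sketch combining both limits; the concrete values are illustrative:

```rust
use arrow_odbc::OdbcReaderBuilder;

let mut builder = OdbcReaderBuilder::new();
builder
    // At most 65535 rows per batch ...
    .with_max_num_rows_per_batch(65_535)
    // ... but never more than 256 MiB of transit buffer, whichever is smaller.
    .with_max_bytes_per_batch(256 * 1024 * 1024);
```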

pub fn with_schema(&mut self, schema: SchemaRef) -> &mut Self

Describes the types of the Arrow arrays in the record batches. It is also used to determine the C data type requested from the data source. If this is not set explicitly, the type is inferred from the schema information provided by the ODBC driver. A reason for setting it explicitly could be that you have superior knowledge about your data compared to the ODBC driver. E.g. a type for an unsigned byte (u8) is not part of the ODBC standard; the driver might therefore at best be able to tell you that the column is an i8. If you still want u8s in the resulting array, you need to specify the schema manually. Also, many drivers struggle with reporting nullability correctly and just report every column as nullable. Explicitly specifying a schema can also compensate for such shortcomings if it turns out to be relevant.
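A sketch for the u8 case above, assuming a hypothetical result set with a single `id` column, and using the arrow re-export provided by arrow_odbc for the schema types:

```rust
use std::sync::Arc;

use arrow_odbc::{
    arrow::datatypes::{DataType, Field, Schema},
    OdbcReaderBuilder,
};

// Hypothetical schema: a single non-nullable `id` column read as UInt8, even
// though the ODBC driver could at best report it as a signed 8-bit integer.
let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::UInt8, false)]));
let mut builder = OdbcReaderBuilder::new();
builder.with_schema(schema);
```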

pub fn with_max_text_size(&mut self, max_text_size: usize) -> &mut Self

In order for fast bulk fetching to work, arrow-odbc needs to know the size of the largest possible field in each column. It determines this automatically from the schema information. However, trouble arises if the schema contains unbounded variadic fields like VARCHAR(MAX), which can hold very large values and report a huge upper element size, if any at all. In order to work with such schemas we need an upper bound for the actual values in the column, as opposed to the largest value the column could theoretically store. There is no need for this bound to be precise: just knowing that a value will never exceed 4 KiB rather than 2 GiB is enough to allow for tremendous efficiency gains. The size of the text is specified in UTF-8 encoded bytes if using a narrow encoding (typically all non-Windows systems) and in UTF-16 encoded pairs of bytes on systems using a wide encoding (typically Windows). This means it is roughly the size in letters, yet if your data contains a lot of emojis or other multi-byte characters this number may need to be larger.
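For instance, if you know that no text value in your data exceeds 4 KiB (an assumed bound):

```rust
use arrow_odbc::OdbcReaderBuilder;

let mut builder = OdbcReaderBuilder::new();
// Bound transit buffers for VARCHAR(MAX)-like columns to 4 KiB per element.
builder.with_max_text_size(4096);
```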

pub fn with_max_binary_size(&mut self, max_binary_size: usize) -> &mut Self

An upper limit for the size of buffers bound to variadic binary columns of the data source. This limit does not (directly) apply to the size of the created Arrow buffers, but rather to the buffers used for the data in transit. Use this option if you have e.g. VARBINARY(MAX) fields in your database schema. In such a case, without an upper limit, the ODBC driver of your data source is asked for the maximum size of an element and is likely to answer with either 0 or a value far larger than any actual entry in the column. If you cannot adapt your database schema, this limit might be what you are looking for. It is the maximum size in bytes of the binary column. If this method is not called, no upper limit is set and the maximum element size reported by ODBC is used to determine buffer sizes.
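For example, assuming no VARBINARY(MAX) element in your data is larger than 1 MiB:

```rust
use arrow_odbc::OdbcReaderBuilder;

let mut builder = OdbcReaderBuilder::new();
// Assume no element of a variadic binary column exceeds 1 MiB.
builder.with_max_binary_size(1024 * 1024);
```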

pub fn with_fallibale_allocations( &mut self, fallibale_allocations: bool, ) -> &mut Self

Set to true in order to trigger a crate::ColumnFailure::TooLarge error instead of a panic in case the buffers cannot be allocated due to their size. This may have a performance cost for constructing the reader. false by default.
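For example:

```rust
use arrow_odbc::OdbcReaderBuilder;

let mut builder = OdbcReaderBuilder::new();
// Prefer an error over a panic if the transit buffers cannot be allocated.
builder.with_fallibale_allocations(true);
```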

pub fn value_errors_as_null( &mut self, map_value_errors_to_null: bool, ) -> &mut Self

Set to true in order to map a value in the database which cannot be converted into its target type to NULL, rather than emitting an Arrow error. Currently, for example, such mapping errors can happen if a datetime value is outside the range representable by Arrow. Default is false.
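For example:

```rust
use arrow_odbc::OdbcReaderBuilder;

let mut builder = OdbcReaderBuilder::new();
// Values which cannot be converted (e.g. out-of-range datetimes) become NULL.
builder.value_errors_as_null(true);
```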

pub fn trim_fixed_sized_characters( &mut self, fixed_sized_character_strings_are_trimmed: bool, ) -> &mut Self

If set to true, text in fixed sized character columns such as CHAR is trimmed of whitespace before being converted into Arrow UTF-8 arrays. Default is false.
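For example:

```rust
use arrow_odbc::OdbcReaderBuilder;

let mut builder = OdbcReaderBuilder::new();
// Strip the padding whitespace of CHAR-like columns in the resulting arrays.
builder.trim_fixed_sized_characters(true);
```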

pub fn with_payload_text_encoding( &mut self, text_encoding: TextEncoding, ) -> &mut Self

Controls the encoding used for transferring text data from the ODBC data source to the application. The resulting Arrow arrays will still be UTF-8 encoded. If you get garbage characters or invalid UTF-8 errors on non-Windows systems, you may want to set the encoding to TextEncoding::Utf16. On Windows systems you may want to set this to TextEncoding::Utf8 to gain performance benefits, after you have verified that your system locale is set to UTF-8. The default is TextEncoding::Auto.
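A sketch of the non-Windows workaround described above (assuming TextEncoding is importable from the crate root):

```rust
use arrow_odbc::{OdbcReaderBuilder, TextEncoding};

let mut builder = OdbcReaderBuilder::new();
// E.g. on a non-Windows system with a problematic locale, fetch text as UTF-16.
builder.with_payload_text_encoding(TextEncoding::Utf16);
```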

pub fn build<C>(&self, cursor: C) -> Result<OdbcReader<C>, Error>
where C: Cursor,

Constructs an OdbcReader which consumes the given cursor. The cursor will also be used to infer the Arrow schema if it has not been supplied explicitly.

§Parameters
  • cursor: ODBC cursor used to fetch batches from the data source. The constructor will bind buffers to this cursor in order to perform bulk fetches from the source. This is usually faster than fetching results row by row, as it saves roundtrips to the database. The type of these buffers is inferred from the Arrow schema. Not every Arrow type is supported, though.
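Since build takes &self, a configured builder can be reused for several cursors. A sketch (reader_for is a hypothetical helper):

```rust
use arrow_odbc::{odbc_api::Cursor, Error, OdbcReader, OdbcReaderBuilder};

// Hypothetical helper: `build` borrows the builder, so one configured builder
// can produce readers over any number of cursors.
fn reader_for<C: Cursor>(
    builder: &OdbcReaderBuilder,
    cursor: C,
) -> Result<OdbcReader<C>, Error> {
    builder.build(cursor)
}
```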

Trait Implementations§

impl Clone for OdbcReaderBuilder

fn clone(&self) -> OdbcReaderBuilder

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Default for OdbcReaderBuilder

fn default() -> OdbcReaderBuilder

Returns the “default value” for a type.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,