Struct arrow_array::array::DictionaryArray
pub struct DictionaryArray<K: ArrowPrimitiveType> { /* private fields */ }
A dictionary array where each element is a single value indexed by an integer key. This is mostly used to represent strings or a limited set of primitive types as integers, for example when doing NLP analysis or representing chromosomes by name.
A DictionaryArray is represented using a keys array and a values array, which may have different lengths. The keys array stores indexes into the values array, which holds the corresponding logical values, as shown here:
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
  ┌─────────────────┐  ┌─────────┐ │     ┌─────────────────┐
│ │        A        │  │    0    │       │        A        │     values[keys[0]]
  ├─────────────────┤  ├─────────┤ │     ├─────────────────┤
│ │        D        │  │    2    │       │        B        │     values[keys[1]]
  ├─────────────────┤  ├─────────┤ │     ├─────────────────┤
│ │        B        │  │    2    │       │        B        │     values[keys[2]]
  └─────────────────┘  ├─────────┤ │     ├─────────────────┤
│                      │    1    │       │        D        │     values[keys[3]]
                       ├─────────┤ │     ├─────────────────┤
│                      │    1    │       │        D        │     values[keys[4]]
                       ├─────────┤ │     ├─────────────────┤
│                      │    0    │       │        A        │     values[keys[5]]
                       └─────────┘ │     └─────────────────┘
│       values            keys
 ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
                                            Logical array
                                               Contents
          DictionaryArray
             length = 6
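The lookup in the diagram, values[keys[i]], can be modelled with plain slices. This is a minimal std-only sketch rather than the crate's implementation; `decode` is a hypothetical helper, and a `None` key stands in for a null slot.

```rust
/// Resolve each key against the dictionary, i.e. compute `values[keys[i]]`.
/// A `None` key models a null slot and stays `None` in the output.
/// Panics if a key is out of bounds for `values` (plain slice indexing).
fn decode<'a>(keys: &[Option<usize>], values: &[&'a str]) -> Vec<Option<&'a str>> {
    keys.iter()
        .map(|key| key.map(|k| values[k]))
        .collect()
}
```

Calling `decode` with the keys 0, 2, 2, 1, 1, 0 and the values A, D, B from the diagram yields the logical contents A, B, B, D, D, A.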
Example with nullable data:
use arrow_array::{DictionaryArray, Int8Array, types::Int8Type};
let test = vec!["a", "a", "b", "c"];
let array: DictionaryArray<Int8Type> = test
    .iter()
    .map(|&x| if x == "b" { None } else { Some(x) })
    .collect();
assert_eq!(array.keys(), &Int8Array::from(vec![Some(0), Some(0), None, Some(1)]));
Example without nullable data:
use arrow_array::{DictionaryArray, Int8Array, types::Int8Type};
let test = vec!["a", "a", "b", "c"];
let array: DictionaryArray<Int8Type> = test.into_iter().collect();
assert_eq!(array.keys(), &Int8Array::from(vec![0, 0, 1, 2]));
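The key assignments in the two examples above (distinct strings numbered in first-seen order, nulls taking no dictionary slot) can be reproduced with a small std-only model. This is a sketch under assumptions, not the crate's builder: `encode` is a hypothetical helper, and it assumes fewer than 128 distinct values so that every key fits in an i8.

```rust
use std::collections::HashMap;

/// Dictionary-encode an iterator of optional strings.
/// Returns `(keys, values)`: each distinct string gets the next free key in
/// first-seen order, and a `None` input becomes a null key without adding
/// anything to the dictionary.
fn encode<'a, I>(items: I) -> (Vec<Option<i8>>, Vec<&'a str>)
where
    I: IntoIterator<Item = Option<&'a str>>,
{
    let mut seen: HashMap<&'a str, i8> = HashMap::new();
    let mut values: Vec<&'a str> = Vec::new();
    let mut keys = Vec::new();
    for item in items {
        let key = item.map(|s| {
            *seen.entry(s).or_insert_with(|| {
                values.push(s);
                // Assumption: fewer than 128 distinct values, so the cast is lossless.
                (values.len() - 1) as i8
            })
        });
        keys.push(key);
    }
    (keys, values)
}
```

Feeding it "a", "a", null, "c" gives keys 0, 0, null, 1 with values "a", "c", matching the nullable example; feeding it "a", "a", "b", "c" gives keys 0, 0, 1, 2.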
Example from existing arrays:
use arrow_array::{DictionaryArray, Int8Array, StringArray, types::Int8Type};
// You can form your own DictionaryArray by providing the
// values (dictionary) and keys (indexes into the dictionary):
let values = StringArray::from_iter_values(["a", "b", "c"]);
let keys = Int8Array::from_iter_values([0, 0, 1, 2]);
let array = DictionaryArray::<Int8Type>::try_new(&keys, &values).unwrap();
let expected: DictionaryArray<Int8Type> = vec!["a", "a", "b", "c"]
    .into_iter()
    .collect();
assert_eq!(&array, &expected);
Implementations
impl<K: ArrowPrimitiveType> DictionaryArray<K>
pub fn try_new(
keys: &PrimitiveArray<K>,
values: &dyn Array
) -> Result<Self, ArrowError>
Attempts to create a new DictionaryArray from the specified keys (indexes into the dictionary) and values (dictionary) arrays. Returns an error if any key falls outside the dictionary array.
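The bounds rule described for try_new can be illustrated with a std-only check over plain slices. This is a sketch of the idea only: `check_keys` is a hypothetical helper, it returns a String rather than an ArrowError, and it assumes null keys are always acceptable.

```rust
/// Check that every non-null key indexes into a dictionary of `dict_len` entries.
/// Returns an error message naming the first offending key otherwise.
fn check_keys(keys: &[Option<i8>], dict_len: usize) -> Result<(), String> {
    for (i, key) in keys.iter().enumerate() {
        if let Some(k) = *key {
            // Negative keys can never be valid indexes.
            let in_bounds = k >= 0 && (k as usize) < dict_len;
            if !in_bounds {
                return Err(format!(
                    "key {} at position {} is outside a dictionary of length {}",
                    k, i, dict_len
                ));
            }
        }
    }
    Ok(())
}
```

With a three-entry dictionary, the keys 0, 0, 1, 2 pass, while a key of 3 or -1 produces an error.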
pub fn keys(&self) -> &PrimitiveArray<K>
Return an array view of the keys of this dictionary as a PrimitiveArray.
pub fn lookup_key(&self, value: &str) -> Option<K::Native>
If value is present in values (aka the dictionary), returns the corresponding key (index into the values array). Otherwise returns None.
Panics if values is not a StringArray.
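The contract of lookup_key can be modelled over a plain slice of strings. The linear scan below is an assumption made for illustration (the page does not say how the crate searches), and the free function `lookup_key` with an i8 key type is a hypothetical stand-in for the method.

```rust
/// Find `value` in the dictionary and return its index as an `i8` key.
/// Returns `None` if the value is absent or its index does not fit in `i8`.
fn lookup_key(values: &[&str], value: &str) -> Option<i8> {
    values
        .iter()
        .position(|v| *v == value)
        // Only indexes representable in the key type are valid keys.
        .filter(|&idx| idx <= i8::MAX as usize)
        .map(|idx| idx as i8)
}
```

For the dictionary "a", "b", "c", looking up "b" returns Some(1) and looking up "z" returns None.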
pub fn value_type(&self) -> DataType
Returns a clone of the value type of this dictionary.
pub fn is_ordered(&self) -> bool
Currently exists for compatibility purposes with Arrow IPC.
pub fn keys_iter(&self) -> impl Iterator<Item = Option<usize>> + '_
Returns an iterator over the keys (indexes into the dictionary).
pub fn key(&self, i: usize) -> Option<usize>
Returns the value of keys (the dictionary key) at index i, cast to usize, or None if the value at i is NULL.
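Both key accessors widen the stored key to usize and surface nulls as None. A std-only model, assuming i8 keys held as Option<i8> and assuming all keys are non-negative, might look like this; the free functions `keys_iter` and `key` are hypothetical stand-ins for the methods.

```rust
/// Model of `keys_iter`: yield each key widened to `usize`, `None` for null slots.
/// Assumes keys are non-negative, as they are in a valid dictionary.
fn keys_iter<'a>(keys: &'a [Option<i8>]) -> impl Iterator<Item = Option<usize>> + 'a {
    keys.iter().copied().map(|k| k.map(|k| k as usize))
}

/// Model of `key`: the key at position `i` as `usize`, or `None` if that slot is null.
/// This sketch panics when `i` is out of bounds because it uses slice indexing.
fn key(keys: &[Option<i8>], i: usize) -> Option<usize> {
    keys[i].map(|k| k as usize)
}
```

For the keys 0, null, 2, the iterator yields Some(0), None, Some(2), and `key(&keys, 1)` is None.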
pub fn downcast_dict<V: 'static>(
&self
) -> Option<TypedDictionaryArray<'_, K, V>>
Downcasts this dictionary to a TypedDictionaryArray, returning None if the values are not of type V.
use arrow_array::{Array, ArrayAccessor, DictionaryArray, StringArray, types::Int32Type};
let orig = [Some("a"), Some("b"), None];
let dictionary = DictionaryArray::<Int32Type>::from_iter(orig);
let typed = dictionary.downcast_dict::<StringArray>().unwrap();
assert_eq!(typed.value(0), "a");
assert_eq!(typed.value(1), "b");
assert!(typed.is_null(2));
pub fn with_values(&self, values: &dyn Array) -> Self
Returns a new dictionary with the same keys as the current instance but with a different set of dictionary values.
This can be used to perform an operation on the values of a dictionary.
Panics
Panics if values has a length less than the current values.
use arrow_array::builder::PrimitiveDictionaryBuilder;
use arrow_array::{Int8Array, Int64Array, ArrayAccessor};
use arrow_array::types::{Int32Type, Int8Type};
// Construct a Dict(Int32, Int8)
let mut builder = PrimitiveDictionaryBuilder::<Int32Type, Int8Type>::with_capacity(2, 200);
for i in 0..100 {
    builder.append(i % 2).unwrap();
}
let dictionary = builder.finish();
// Perform a widening cast of dictionary values
let typed_dictionary = dictionary.downcast_dict::<Int8Array>().unwrap();
let values: Int64Array = typed_dictionary.values().unary(|x| x as i64);
// Create a Dict(Int32, Int64)
let new = dictionary.with_values(&values);
// Verify values are as expected
let new_typed = new.downcast_dict::<Int64Array>().unwrap();
for i in 0..100 {
    assert_eq!(new_typed.value(i), (i % 2) as i64);
}
Trait Implementations
impl<T: ArrowPrimitiveType> Array for DictionaryArray<T>
fn data_ref(&self) -> &ArrayData
fn slice(&self, offset: usize, length: usize) -> ArrayRef
fn offset(&self) -> usize
Returns the offset into the underlying data used by this array (or slice). Defaults to 0.
fn is_null(&self, index: usize) -> bool
Returns whether the element at index is null. When using this function on a slice, the index is relative to the slice.
fn is_valid(&self, index: usize) -> bool
Returns whether the element at index is not null. When using this function on a slice, the index is relative to the slice.
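The note that the index is relative to the slice can be made concrete with a toy view over shared validity flags. This is a std-only sketch, not how the crate stores nulls; the `View` type and its fields are hypothetical.

```rust
/// A view over shared validity flags: `true` means the slot holds a value.
struct View<'a> {
    validity: &'a [bool],
    offset: usize,
    len: usize,
}

impl<'a> View<'a> {
    /// Narrow the view; `offset` is relative to the current view.
    fn slice(&self, offset: usize, length: usize) -> View<'a> {
        assert!(offset + length <= self.len, "slice out of bounds");
        View {
            validity: self.validity,
            offset: self.offset + offset,
            len: length,
        }
    }

    /// `index` is relative to this view, not to the underlying buffer.
    fn is_null(&self, index: usize) -> bool {
        assert!(index < self.len, "index out of bounds");
        !self.validity[self.offset + index]
    }

    /// The complement of `is_null`, with the same view-relative indexing.
    fn is_valid(&self, index: usize) -> bool {
        !self.is_null(index)
    }
}
```

With flags valid, null, valid, valid, the null sits at index 1 of the full view but at index 0 of `full.slice(1, 3)`.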
fn null_count(&self) -> usize
fn get_buffer_memory_size(&self) -> usize
fn get_array_memory_size(&self) -> usize
Returns the total number of bytes of memory occupied physically by this array. This value is always greater than the one returned by get_buffer_memory_size() and includes the overhead of the data structures that contain the pointers to the various buffers.
impl<K: ArrowPrimitiveType> Clone for DictionaryArray<K>
impl<T: ArrowPrimitiveType> Debug for DictionaryArray<T>
impl<T: ArrowPrimitiveType> From<ArrayData> for DictionaryArray<T>
Constructs a DictionaryArray from an array data reference.
impl<T: ArrowPrimitiveType> From<DictionaryArray<T>> for ArrayData
fn from(array: DictionaryArray<T>) -> Self
impl<'a, T: ArrowDictionaryKeyType> FromIterator<&'a str> for DictionaryArray<T>
Constructs a DictionaryArray from an iterator of strings.
Example:
use arrow_array::{DictionaryArray, PrimitiveArray, StringArray, types::Int8Type};
let test = vec!["a", "a", "b", "c"];
let array: DictionaryArray<Int8Type> = test.into_iter().collect();
assert_eq!(
    "DictionaryArray {keys: PrimitiveArray<Int8>\n[\n 0,\n 0,\n 1,\n 2,\n] values: StringArray\n[\n \"a\",\n \"b\",\n \"c\",\n]}\n",
    format!("{:?}", array)
);
impl<'a, T: ArrowDictionaryKeyType> FromIterator<Option<&'a str>> for DictionaryArray<T>
Constructs a DictionaryArray from an iterator of optional strings.
Example:
use arrow_array::{DictionaryArray, PrimitiveArray, StringArray, types::Int8Type};
let test = vec!["a", "a", "b", "c"];
let array: DictionaryArray<Int8Type> = test
    .iter()
    .map(|&x| if x == "b" { None } else { Some(x) })
    .collect();
assert_eq!(
    "DictionaryArray {keys: PrimitiveArray<Int8>\n[\n 0,\n 0,\n null,\n 1,\n] values: StringArray\n[\n \"a\",\n \"c\",\n]}\n",
    format!("{:?}", array)
);