Function parse_xml 

pub fn parse_xml(
    reader: impl BufRead,
    config: &Config,
) -> Result<IndexMap<String, RecordBatch>>

Parses XML data from a reader into Arrow record batches based on a provided configuration.

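This function takes a reader implementing the BufRead trait (e.g., a BufReader<File> or a &[u8]; the contents of a String can be passed via as_bytes()) and a Config struct that defines the structure of the XML data and how it should be mapped to Arrow tables. Note that File and String do not implement BufRead themselves, so a File must be wrapped in a BufReader first.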
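As a rough illustration of which reader types satisfy an `impl BufRead` bound, here is a minimal sketch using only the standard library. The function `count_open_angles` and the helper `demo` are hypothetical stand-ins invented for this example (they are not part of xml2arrow); they only count `<` bytes, so the snippet needs no external crates.

```rust
use std::io::{BufRead, BufReader, Cursor};

// Hypothetical stand-in for a function that, like parse_xml, accepts `impl BufRead`.
// It counts '<' bytes by walking the reader's internal buffer.
fn count_open_angles(mut reader: impl BufRead) -> std::io::Result<usize> {
    let mut count = 0;
    loop {
        let consumed = {
            let buf = reader.fill_buf()?;
            if buf.is_empty() {
                break; // end of input
            }
            count += buf.iter().filter(|&&b| b == b'<').count();
            buf.len()
        };
        // Tell the reader how many bytes were processed.
        reader.consume(consumed);
    }
    Ok(count)
}

// Hypothetical helper: feeds the same XML through three different reader types.
fn demo() -> std::io::Result<[usize; 3]> {
    let xml = "<data><item><value>123</value></item></data>";

    // A byte slice (&[u8]) implements BufRead directly.
    let from_slice = count_open_angles(xml.as_bytes())?;

    // An owned String is not a reader; its bytes can be wrapped in a Cursor.
    let owned: String = xml.to_string();
    let from_cursor = count_open_angles(Cursor::new(owned.into_bytes()))?;

    // Any `Read` value (std::fs::File, for instance) gains buffering via BufReader.
    // A byte slice is used as the inner reader here to avoid touching the filesystem.
    let from_buffered = count_open_angles(BufReader::new(xml.as_bytes()))?;

    Ok([from_slice, from_cursor, from_buffered])
}
```

The same three wrapping patterns should apply when calling parse_xml: pass `xml.as_bytes()` for in-memory data, or `BufReader::new(file)` for a file opened with `std::fs::File::open`.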

§Arguments

  • reader: A reader object that provides access to the XML data.
  • config: A Config struct that specifies the tables, fields, and data types to extract from the XML.

§Returns

A Result containing:

  • Ok(IndexMap<String, RecordBatch>): An IndexMap where keys are the XML names of the tables (as defined in the config) and values are the corresponding Arrow RecordBatch objects.
  • Err(Error): An Error value if any error occurs during parsing, configuration, or Arrow table creation.

§Example

use xml2arrow::{parse_xml, config::{Config, TableConfig, FieldConfigBuilder, DType}};

// Inline XML input. A byte slice (&[u8]) implements BufRead, so no file is needed here.
let xml_content = r#"<data><item><value>123</value></item></data>"#;
// Map the content at /data/item/value to an Int32 field named "value".
let fields = vec![FieldConfigBuilder::new("value", "/data/item/value", DType::Int32).build().unwrap()];
// Define one table named "items" with the XML path /data.
let tables = vec![TableConfig::new("items", "/data", vec![], fields)];
let config = Config { tables, parser_options: Default::default() };
// unwrap() panics on failure; real code would normally handle the Err case.
let record_batches = parse_xml(xml_content.as_bytes(), &config).unwrap();
// ... use record_batches