pub struct SparkSession { /* private fields */ }
Main entry point for creating DataFrames and executing queries. Similar to PySpark's SparkSession, but using Polars as the backend.
Implementations§
impl SparkSession
pub fn new( app_name: Option<String>, master: Option<String>, config: HashMap<String, String>, ) -> Self
Create a new SparkSession with an optional application name, an optional master string, and a map of configuration settings.
pub fn create_or_replace_temp_view(&self, name: &str, df: DataFrame)
Register a DataFrame as a temporary view (PySpark: createOrReplaceTempView). The view is session-scoped and is dropped when the session is dropped.
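A minimal sketch of the temp-view round trip, assuming the builder and create_dataframe usage shown in the examples further down this page:
use robin_sparkless::session::SparkSession;
let spark = SparkSession::builder().app_name("views").get_or_create();
let df = spark.create_dataframe(
    vec![(1, 25, "Alice".to_string()), (2, 30, "Bob".to_string())],
    vec!["id", "age", "name"],
)?;
// Register under a session-scoped name, then look it up again by name.
spark.create_or_replace_temp_view("people", df);
let people = spark.table("people")?;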
pub fn create_global_temp_view(&self, name: &str, df: DataFrame)
Global temp view (PySpark: createGlobalTempView). Persists across sessions within the same process.
pub fn create_or_replace_global_temp_view(&self, name: &str, df: DataFrame)
Global temp view (PySpark: createOrReplaceGlobalTempView). Persists across sessions within the same process.
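A sketch of a global temp view shared across sessions in the same process; names and data here are illustrative:
use robin_sparkless::session::SparkSession;
let spark = SparkSession::builder().app_name("global-views").get_or_create();
let events = spark.create_dataframe(
    vec![(1, 100, "click".to_string())],
    vec!["id", "value", "kind"],
)?;
// The view lives in the process-wide catalog, not just this session.
spark.create_or_replace_global_temp_view("events", events);
assert!(spark.list_global_temp_view_names().contains(&"events".to_string()));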
pub fn drop_temp_view(&self, name: &str)
Drop a temporary view by name (PySpark: catalog.dropTempView). No error if the view does not exist.
pub fn drop_global_temp_view(&self, name: &str) -> bool
Drop a global temporary view (PySpark: catalog.dropGlobalTempView). Removes from process-wide catalog.
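A short sketch of dropping views; note that drop_temp_view returns nothing while drop_global_temp_view reports whether a view was removed:
use robin_sparkless::session::SparkSession;
let spark = SparkSession::builder().app_name("drop-views").get_or_create();
// Dropping a temp view that does not exist is not an error.
spark.drop_temp_view("people");
// Returns true only if a global temp view with this name existed.
let removed: bool = spark.drop_global_temp_view("events");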
pub fn register_table(&self, name: &str, df: DataFrame)
Register a DataFrame as a saved table (PySpark: saveAsTable). Inserts into the tables catalog only.
pub fn register_database(&self, name: &str)
Register a database/schema name (from CREATE DATABASE / CREATE SCHEMA). Persisted in session for listDatabases/databaseExists.
pub fn list_database_names(&self) -> Vec<String>
List database names: built-in “default”, “global_temp”, plus any created via CREATE DATABASE / CREATE SCHEMA.
pub fn database_exists(&self, name: &str) -> bool
True if the database name exists (default, global_temp, or created via CREATE DATABASE / CREATE SCHEMA).
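A sketch of the database catalog helpers; the "sales" name is illustrative:
use robin_sparkless::session::SparkSession;
let spark = SparkSession::builder().app_name("databases").get_or_create();
// "default" and "global_temp" are always present.
assert!(spark.database_exists("default"));
// Registering a name makes it visible to listDatabases-style calls.
spark.register_database("sales");
assert!(spark.database_exists("sales"));
assert!(spark.list_database_names().contains(&"sales".to_string()));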
pub fn get_saved_table(&self, name: &str) -> Option<DataFrame>
Get a saved table by name (tables map only). Returns None if not in saved tables (temp views not checked).
pub fn saved_table_exists(&self, name: &str) -> bool
True if the name exists in the saved-tables map (not temp views).
pub fn table_exists(&self, name: &str) -> bool
Check if a table or temp view exists (PySpark: catalog.tableExists). True if name is in temp views, saved tables, global temp, or warehouse.
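A sketch contrasting the saved-tables map with the broader table_exists check, assuming create_dataframe as in the examples below:
use robin_sparkless::session::SparkSession;
let spark = SparkSession::builder().app_name("tables").get_or_create();
let df = spark.create_dataframe(
    vec![(1, 10, "a".to_string())],
    vec!["id", "n", "s"],
)?;
// Saved tables and temp views live in separate catalogs.
spark.register_table("facts", df);
assert!(spark.saved_table_exists("facts"));
assert!(spark.get_saved_table("facts").is_some());
// table_exists also checks temp views, global temp views, and the warehouse.
assert!(spark.table_exists("facts"));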
pub fn list_global_temp_view_names(&self) -> Vec<String>
Return global temp view names (process-scoped). PySpark: catalog.listTables(dbName=“global_temp”).
pub fn list_temp_view_names(&self) -> Vec<String>
Return temporary view names in this session.
pub fn list_table_names(&self) -> Vec<String>
Return saved table names in this session (saveAsTable / write_delta_table).
pub fn drop_table(&self, name: &str) -> bool
Drop a saved table by name (removes from tables catalog only). No-op if not present.
pub fn drop_database(&self, name: &str) -> bool
Drop a database/schema by name (from DROP SCHEMA / DROP DATABASE). Removes from registered databases only. Does not drop “default” or “global_temp”. No-op if not present (or if_exists). Returns true if removed.
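A sketch of tearing the catalog back down; both methods report whether anything was actually removed:
use robin_sparkless::session::SparkSession;
let spark = SparkSession::builder().app_name("cleanup").get_or_create();
spark.register_database("sales");
// No saved table named "facts" was registered here, so nothing is removed.
let table_removed = spark.drop_table("facts");
assert!(!table_removed);
// "sales" was registered above, so this returns true; "default" and
// "global_temp" are never dropped.
assert!(spark.drop_database("sales"));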
pub fn warehouse_dir(&self) -> Option<&str>
Return spark.sql.warehouse.dir from config if set. Enables disk-backed saveAsTable.
pub fn table(&self, name: &str) -> Result<DataFrame, PolarsError>
Look up a table or temp view by name (PySpark: table(name)). Resolution order: (1) global_temp.xyz from global catalog, (2) temp view, (3) saved table, (4) warehouse.
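A sketch of looking up a global temp view through the global_temp prefix, per the resolution order above:
use robin_sparkless::session::SparkSession;
let spark = SparkSession::builder().app_name("lookup").get_or_create();
let df = spark.create_dataframe(
    vec![(1, 1, "x".to_string())],
    vec!["id", "n", "s"],
)?;
spark.create_or_replace_global_temp_view("events", df);
// Global temp views are addressed with the "global_temp." prefix, which is
// checked before session temp views, saved tables, and the warehouse.
let events = spark.table("global_temp.events")?;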
pub fn builder() -> SparkSessionBuilder
Return a SparkSessionBuilder for configuring and creating a session (see the examples on this page).
pub fn from_config(config: &SparklessConfig) -> SparkSession
Create a session from a SparklessConfig.
Equivalent to SparkSession::builder().with_config(config).get_or_create().
pub fn get_config(&self) -> &HashMap<String, String>
Return a reference to the session config (for catalog/conf compatibility).
pub fn is_case_sensitive(&self) -> bool
Whether column names are case-sensitive (PySpark: spark.sql.caseSensitive). Default is false (case-insensitive matching).
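A sketch of enabling case-sensitive matching through the config map accepted by new; the key mirrors PySpark's spark.sql.caseSensitive, and passing it as the string "true" is an assumption about how the setting is spelled:
use std::collections::HashMap;
use robin_sparkless::session::SparkSession;
let mut config = HashMap::new();
config.insert("spark.sql.caseSensitive".to_string(), "true".to_string());
let spark = SparkSession::new(Some("strict".to_string()), None, config);
assert!(spark.is_case_sensitive());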
pub fn register_udf<F>(&self, name: &str, f: F) -> Result<(), PolarsError>
Register a Rust UDF. Session-scoped. Use with call_udf. PySpark: spark.udf.register (Python) or equivalent.
pub fn create_dataframe(
    &self,
    data: Vec<(i64, i64, String)>,
    column_names: Vec<&str>,
) -> Result<DataFrame, PolarsError>
Create a DataFrame from a vector of tuples (i64, i64, String)
§Example
use robin_sparkless::session::SparkSession;
let spark = SparkSession::builder().app_name("test").get_or_create();
let df = spark.create_dataframe(
vec![
(1, 25, "Alice".to_string()),
(2, 30, "Bob".to_string()),
],
vec!["id", "age", "name"],
)?;
pub fn create_dataframe_engine(
    &self,
    data: Vec<(i64, i64, String)>,
    column_names: Vec<&str>,
) -> Result<DataFrame, EngineError>
Same as create_dataframe but returns EngineError. Use in bindings to avoid Polars.
pub fn create_dataframe_from_polars(&self, df: PlDataFrame) -> DataFrame
Create a DataFrame from a Polars DataFrame
pub fn infer_schema_from_json_rows(
    rows: &[Vec<JsonValue>],
    names: &[String],
) -> Vec<(String, String)>
Infer schema (name, dtype_str) from JSON rows by scanning the first non-null value per column. Used by createDataFrame(data, schema=None) when schema is omitted or only column names given.
pub fn create_dataframe_from_rows(
    &self,
    rows: Vec<Vec<JsonValue>>,
    schema: Vec<(String, String)>,
) -> Result<DataFrame, PolarsError>
Create a DataFrame from rows and a schema (arbitrary column count and types).
rows: each inner vec is one row; length must match schema length. Values are JSON-like (i64, f64, string, bool, null, object, array).
schema: list of (column_name, dtype_string), e.g. [("id", "bigint"), ("name", "string")].
Supported dtype strings: bigint, int, long, double, float, string, str, varchar, boolean, bool, date, timestamp, datetime, list, array, array<element_type>, struct<field:type,...>.
When rows is empty and schema is non-empty, this returns an empty DataFrame with that schema (issue #519). This is useful for creating a table via write.format("parquet").saveAsTable(...) and appending later; PySpark itself would fail with "can not infer schema from empty dataset".
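A sketch of the empty-rows case described above, producing an empty DataFrame that carries the given schema; the JsonValue import path is an assumption, so adjust it to wherever the crate actually exposes that type:
use robin_sparkless::session::SparkSession;
// Assumption: adjust this import to the crate's actual JsonValue location.
use robin_sparkless::JsonValue;
let spark = SparkSession::builder().app_name("empty-table").get_or_create();
let schema = vec![
    ("id".to_string(), "bigint".to_string()),
    ("name".to_string(), "string".to_string()),
];
// No rows: the result is an empty DataFrame with the two columns above.
let rows: Vec<Vec<JsonValue>> = Vec::new();
let df = spark.create_dataframe_from_rows(rows, schema)?;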
pub fn create_dataframe_from_rows_engine(
    &self,
    rows: Vec<Vec<JsonValue>>,
    schema: Vec<(String, String)>,
) -> Result<DataFrame, EngineError>
Same as create_dataframe_from_rows but returns EngineError. Use in bindings to avoid Polars.
pub fn range(
    &self,
    start: i64,
    end: i64,
    step: i64,
) -> Result<DataFrame, PolarsError>
Create a DataFrame with a single column id (bigint) containing values from start to end (exclusive) with step.
PySpark: spark.range(end) or spark.range(start, end, step).
range(end) → 0 to end-1, step 1
range(start, end) → start to end-1, step 1
range(start, end, step) → start, start+step, … up to but not including end
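A minimal sketch; the Rust method always takes all three arguments, so the PySpark one- and two-argument forms map to explicit start and step values:
use robin_sparkless::session::SparkSession;
let spark = SparkSession::builder().app_name("range").get_or_create();
// Equivalent to PySpark spark.range(0, 10, 2): ids 0, 2, 4, 6, 8.
let ids = spark.range(0, 10, 2)?;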
pub fn read_csv(&self, path: impl AsRef<Path>) -> Result<DataFrame, PolarsError>
Read a CSV file.
Uses Polars’ CSV reader with default options:
- Header row is inferred (default: true)
- Schema is inferred from first 100 rows
§Example
use robin_sparkless::SparkSession;
let spark = SparkSession::builder().app_name("test").get_or_create();
let df_result = spark.read_csv("data.csv");
// Handle the Result as appropriate in your application
pub fn read_csv_engine(&self, path: impl AsRef<Path>) -> Result<DataFrame, EngineError>
Same as read_csv but returns EngineError. Use in bindings to avoid Polars.
pub fn read_parquet(&self, path: impl AsRef<Path>) -> Result<DataFrame, PolarsError>
Read a Parquet file.
Uses Polars’ Parquet reader. Parquet files have embedded schema, so schema inference is automatic.
§Example
use robin_sparkless::SparkSession;
let spark = SparkSession::builder().app_name("test").get_or_create();
let df_result = spark.read_parquet("data.parquet");
// Handle the Result as appropriate in your application
pub fn read_parquet_engine(&self, path: impl AsRef<Path>) -> Result<DataFrame, EngineError>
Same as read_parquet but returns EngineError. Use in bindings to avoid Polars.
pub fn read_json(&self, path: impl AsRef<Path>) -> Result<DataFrame, PolarsError>
Read a JSON file (JSONL format - one JSON object per line).
Uses Polars’ JSONL reader with default options:
- Schema is inferred from first 100 rows
§Example
use robin_sparkless::SparkSession;
let spark = SparkSession::builder().app_name("test").get_or_create();
let df_result = spark.read_json("data.json");
// Handle the Result as appropriate in your application
pub fn read_json_engine(&self, path: impl AsRef<Path>) -> Result<DataFrame, EngineError>
Same as read_json but returns EngineError. Use in bindings to avoid Polars.
pub fn sql(&self, query: &str) -> Result<DataFrame, PolarsError>
Execute a SQL query (SELECT only). Tables must be registered with create_or_replace_temp_view.
Requires the sql feature. Supports: SELECT (columns or *), FROM (single table or JOIN),
WHERE (basic predicates), GROUP BY + aggregates, ORDER BY, LIMIT.
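A sketch of a SELECT against a registered view, staying within the clause support listed above:
use robin_sparkless::session::SparkSession;
let spark = SparkSession::builder().app_name("sql").get_or_create();
let df = spark.create_dataframe(
    vec![(1, 25, "Alice".to_string()), (2, 30, "Bob".to_string())],
    vec!["id", "age", "name"],
)?;
spark.create_or_replace_temp_view("people", df);
// Requires the sql feature; only SELECT statements are supported.
let adults = spark.sql("SELECT name, age FROM people WHERE age >= 30 ORDER BY age LIMIT 10")?;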
pub fn table_engine(&self, name: &str) -> Result<DataFrame, EngineError>
Same as table but returns EngineError. Use in bindings to avoid Polars.
pub fn read_delta(&self, name_or_path: &str) -> Result<DataFrame, PolarsError>
Stub when the delta feature is disabled. Reading by registered table name is still supported.
pub fn read_delta_with_version( &self, name_or_path: &str, version: Option<i64>, ) -> Result<DataFrame, PolarsError>
pub fn read_delta_from_path( &self, _path: impl AsRef<Path>, ) -> Result<DataFrame, PolarsError>
impl SparkSession
pub fn read(&self) -> DataFrameReader
Get a DataFrameReader for reading files
Trait Implementations§
impl Clone for SparkSession
fn clone(&self) -> SparkSession
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for SparkSession
impl RefUnwindSafe for SparkSession
impl Send for SparkSession
impl Sync for SparkSession
impl Unpin for SparkSession
impl UnsafeUnpin for SparkSession
impl UnwindSafe for SparkSession
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.