
Crate robin_sparkless

Robin Sparkless - A Rust DataFrame library with PySpark-like API

This library provides a PySpark-compatible API. The root crate is engine-agnostic: it depends on robin-sparkless-core (types, expression IR, config) and one backend—currently robin-sparkless-polars, which uses Polars for execution. The public API exposes engine-agnostic expression types where possible.

§Expression APIs

§Getting started and embedding

For application code and embedding, use the prelude: use robin_sparkless::prelude::*. For a minimal FFI surface, use prelude::embed. For engine-agnostic expressions, use the root re-exports (col, lit_i64, gt, etc.) and the *_expr_ir / collect_rows methods.
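
A minimal sketch of that pattern follows. The names col, lit_i64, and gt are root re-exports listed on this page, but the exact signatures assumed here (Column in, Column out) may differ; consult the functions module for the precise API.

```rust
// Hedged sketch: `col`, `lit_i64`, and `gt` exist as root re-exports, but
// the signatures assumed here (Column in, Column out) are not verified.
use robin_sparkless::prelude::*;

fn build_predicate() {
    // Engine-agnostic expression, roughly PySpark's col("age") > lit(21).
    let age_gt_21 = gt(col("age"), lit_i64(21));
    // The resulting Column is then handed to DataFrame methods or lowered
    // through the *_expr_ir / collect_rows entry points mentioned above.
    let _ = age_gt_21;
}
```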

§Panics and errors

Some functions panic when used with invalid or empty inputs (e.g. calling when(cond).otherwise(val) without .then(), or passing no columns to format_string, elt, concat, coalesce, or named_struct in Rust). In Rust, create_map and array return Result for empty input instead of panicking. From Python, empty columns for coalesce, format_string, printf, and named_struct raise ValueError. See the documentation for each function for details.
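
For illustration, a hedged sketch of these contracts; the argument and return types assumed for when, then, otherwise, and array are taken from the entries listed below and may differ from the real signatures.

```rust
// Hedged sketch of the panic/Result contract described above.
use robin_sparkless::prelude::*;

fn example() {
    // A complete chain: `when(cond)` needs `.then(..)` before
    // `.otherwise(..)`, otherwise the builder panics.
    let label = when(gt(col("age"), lit_i64(21)))
        .then(lit_str("adult"))
        .otherwise(lit_str("minor"));

    // In Rust, `array` (like `create_map`) returns a Result instead of
    // panicking, so empty or invalid input can be handled explicitly.
    match array(&[]) {
        Ok(_arrays) => { /* per the `array` entry, empty input yields empty arrays per row */ }
        Err(_e) => { /* handle rejected input */ }
    }
    let _ = label;
}
```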

§API stability

While the crate is in the 0.x series, we follow semver but may introduce breaking changes in minor releases (e.g. 0.1 → 0.2) until 1.0. For behavioral caveats and intentional differences from PySpark, see the repository documentation.

Re-exports§

pub use dataframe::CubeRollupData;
pub use dataframe::DataFrame;
pub use dataframe::DataFrameNa;
pub use dataframe::DataFrameStat;
pub use dataframe::DataFrameWriter;
pub use dataframe::GroupedData;
pub use dataframe::PivotedGroupedData;
pub use session::DataFrameReader;
pub use session::SparkSession;
pub use session::SparkSessionBuilder;
pub use traits::FromRobinDf;
pub use traits::IntoRobinDf;

Modules§

column
config
Configuration for robin-sparkless sessions.
dataframe
Root-owned DataFrame API; delegates to robin-sparkless-polars for execution.
error
Engine error type for embedders (Polars conversion in this crate).
expression
functions
Expression builders: sort, types, column refs/literals, agg, when, string, datetime, struct/map/array, cast, hash, misc. Re-exports all public items so use crate::functions::* and pub use functions::* keep the same API.
plan
Plan execution; use execute_plan for root session/DataFrame. Re-exports plan error types.
prelude
One-stop prelude for application code and embedding.
schema
Schema types and Polars conversion. Types from core; conversion from robin-sparkless-polars.
session
Root-owned Session API; delegates to robin-sparkless-polars for execution.
sql
SQL parsing and execution; returns root-owned DataFrame.
traits
Traits for conversion to and from root-owned DataFrames.
type_coercion

Structs§

ChainedWhenBuilder
Builder for an additional when-then clause (returned by ThenBuilder::when).
Column
Represents a column in a DataFrame, used for building expressions; a thin wrapper around Polars Expr. May carry a DeferredRandom for rand/randn so with_column can produce one value per row, or a UdfCall for Python UDFs (eager execution at with_column).
PlanExprError
Error from parsing or interpreting a plan expression.
SortOrder
Sort order specification for use in orderBy/sort. Holds expr + direction + null placement.
SparklessConfig
Configuration for building a SparkSession.
StructField
StructType
ThenBuilder
Builder for chaining when-then clauses before finalizing with otherwise
UdfRegistry
Session-scoped UDF registry. Rust UDFs run lazily via Polars Expr::map.
WhenBuilder
When-then-otherwise builder.
WhenThenBuilder

Enums§

DataType
EngineError
Unified error type for robin-sparkless operations.
ExprIr
Expression IR: a single, serializable tree that backends convert to their native Expr.
JoinType
Re-export for API compatibility. Join type for DataFrame joins (PySpark-compatible)
LiteralValue
Literal value in an expression (engine-agnostic).
PlanError
Errors from plan execution.
PolarsError
SaveMode
Re-export for API compatibility. Save mode for saveAsTable (PySpark default is ErrorIfExists).
SelectItem
Re-export for API compatibility. Select item: either a column name (str) or an expression (PySpark parity: select(“a”, col(“b”).alias(“x”))). Fixes #645: select expects Column or str.
WriteFormat
Re-export for API compatibility. Output format for generic write (PySpark DataFrameWriter.format).
WriteMode
Re-export for API compatibility. Write mode: overwrite or append (PySpark DataFrameWriter.mode for path-based save).

Traits§

RustUdf
Rust UDF: takes columns as Series, returns one Series. Used via Expr::map / map_many.
StructTypePolarsExt
Extension trait for Polars schema conversion. Implemented for StructType from core. Bring this trait into scope to use StructType::from_polars_schema and to_polars_schema.
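
A rough illustration of that extension-trait pattern (the method name comes from the entry above; its exact signature and return type are assumptions):

```rust
// Hedged sketch: `to_polars_schema` is named in the entry above, but the
// signature and return type assumed here are not verified.
use robin_sparkless::{StructType, StructTypePolarsExt};

fn convert(schema: &StructType) {
    // Available only because StructTypePolarsExt is in scope.
    let _polars_schema = schema.to_polars_schema();
}
```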

Functions§

abs
Absolute value (PySpark abs)
acos
Arc cosine (PySpark acos)
acosh
Inverse hyperbolic cosine (PySpark acosh).
add_months
Add n months to date column (PySpark add_months).
aes_decrypt
AES decrypt (PySpark aes_decrypt). Input hex(nonce||ciphertext).
aes_encrypt
AES encrypt (PySpark aes_encrypt). Key as string; AES-128-GCM.
aggregate
Array fold/aggregate (PySpark aggregate). Simplified: zero + sum(list elements).
alias
Alias an expression with a new output name.
and_
any_value
Any value from the group (PySpark any_value). Use in groupBy.agg(). ignorenulls reserved for API compatibility.
approx_count_distinct
Approximate count distinct (PySpark approx_count_distinct). Use in groupBy.agg(). rsd reserved for API compatibility; Polars uses exact n_unique.
approx_percentile
Approximate percentile (PySpark approx_percentile). Maps to quantile; percentage in 0.0..=1.0. accuracy reserved for API compatibility.
array
Create an array column from multiple columns (PySpark array). With no arguments, returns a column of empty arrays (one per row); PySpark parity.
array_agg
Collect to array (PySpark array_agg).
array_append
Append element to end of list (PySpark array_append).
array_compact
Remove null elements from list (PySpark array_compact).
array_contains
Check if list contains value (PySpark array_contains).
array_distinct
Distinct elements in list (PySpark array_distinct).
array_except
Elements in first array not in second (PySpark array_except).
array_exists
True if any list element satisfies the predicate (PySpark exists).
array_filter
Filter list elements by predicate (PySpark filter).
array_flatten
Flatten list of lists to one list (PySpark flatten). Not implemented.
array_forall
True if all list elements satisfy the predicate (PySpark forall).
array_insert
Insert element at 1-based position (PySpark array_insert).
array_intersect
Elements in both arrays (PySpark array_intersect).
array_join
Join list of strings with separator (PySpark array_join).
array_max
Maximum element in list (PySpark array_max).
array_mean
Mean of list elements (PySpark aggregate avg).
array_min
Minimum element in list (PySpark array_min).
array_position
1-based index of first occurrence of value in list, or 0 if not found (PySpark array_position). Implemented via Polars list.eval with col(“”) as element.
array_prepend
Prepend element to start of list (PySpark array_prepend).
array_remove
New list with all elements equal to value removed (PySpark array_remove). Implemented via Polars list.eval + list.drop_nulls.
array_repeat
Repeat each element n times (PySpark array_repeat). Not implemented: would require list.eval with dynamic repeat.
array_size
Number of elements in list (PySpark size / array_size). Returns Int32.
array_slice
Slice list from 1-based start with optional length (PySpark slice).
array_sort
Sort list elements (PySpark array_sort).
array_sum
Sum of list elements (PySpark aggregate sum).
array_transform
Transform list elements by expression (PySpark transform).
array_union
Distinct elements from both arrays (PySpark array_union).
arrays_overlap
True if two arrays have any element in common (PySpark arrays_overlap).
arrays_zip
Zip arrays into array of structs (PySpark arrays_zip).
asc
Ascending sort, nulls first (Spark default for ASC).
asc_nulls_first
Ascending sort, nulls first.
asc_nulls_last
Ascending sort, nulls last.
ascii
ASCII value of first character (PySpark ascii). Returns Int32.
asin
Arc sine (PySpark asin)
asinh
Inverse hyperbolic sine (PySpark asinh).
assert_true
Assert that all boolean values are true; errors otherwise (PySpark assert_true). When err_msg is Some, it is used in the error message when assertion fails.
atan
Arc tangent (PySpark atan)
atan2
Two-argument arc tangent atan2(y, x) in radians (PySpark atan2)
atanh
Inverse hyperbolic tangent (PySpark atanh).
avg
Average aggregation
base64
Base64 encode string bytes (PySpark base64).
between
bin
Convert integer to binary string (PySpark bin).
bit_and
Bitwise AND of two integer/boolean columns (PySpark bit_and).
bit_count
Count of set bits in the integer representation (PySpark bit_count).
bit_get
Alias for getbit (PySpark bit_get).
bit_length
Bit length of string (bytes * 8) (PySpark bit_length).
bit_or
Bitwise OR of two integer/boolean columns (PySpark bit_or).
bit_xor
Bitwise XOR of two integer/boolean columns (PySpark bit_xor).
bitmap_bit_position
Map integral value (0–32767) to bit position for bitmap aggregates (PySpark bitmap_bit_position).
bitmap_bucket_number
Bucket number for distributed bitmap (PySpark bitmap_bucket_number). value / 32768.
bitmap_construct_agg
Aggregate: bitwise OR of bit positions into one bitmap binary (PySpark bitmap_construct_agg). Use in group_by(…).agg([bitmap_construct_agg(col)]).
bitmap_count
Count set bits in a bitmap binary column (PySpark bitmap_count).
bitmap_or_agg
Aggregate: bitwise OR of bitmap binary column (PySpark bitmap_or_agg).
bitwise_not
Bitwise NOT of an integer/boolean column (PySpark bitwise_not / bitwiseNOT).
bool_and
Boolean AND across group (PySpark bool_and). Use in groupBy.agg(); column should be boolean.
broadcast
Broadcast hint - no-op that returns the same DataFrame (PySpark broadcast).
bround
Banker’s rounding - round half to even (PySpark bround).
btrim
Trim leading and trailing chars (PySpark btrim). trim_str defaults to whitespace.
call
Generic function call (for the long tail of functions).
call_udf
Call a registered UDF by name. PySpark: F.call_udf(udfName, *cols). Requires a session (set by get_or_create). Raises if UDF not found.
cardinality
Cardinality: number of elements in array (PySpark cardinality). Alias for size/array_size.
cast
Cast column to the given type (PySpark cast). Fails on invalid conversion. String-to-boolean uses custom parsing (“true”/“false”/“1”/“0”) since Polars does not support Utf8->Boolean. String-to-date accepts date and datetime strings (e.g. “2025-01-01 10:30:00” truncates to date) for Spark parity.
cbrt
Cube root (PySpark cbrt).
ceil
Ceiling (PySpark ceil)
ceiling
Alias for ceil. PySpark ceiling.
char
Int to single-character string (PySpark char). Valid codepoint only.
char_length
Length of string in characters (PySpark char_length). Alias of length().
character_length
Length of string in characters (PySpark character_length). Alias of length().
chr
Alias for char (PySpark chr).
coalesce
Coalesce - returns the first non-null value from multiple columns.
col
Column reference.
collect_list
Collect column values into list per group (PySpark collect_list). Use in groupBy.agg().
collect_set
Collect distinct column values into list per group (PySpark collect_set). Use in groupBy.agg().
concat
Concatenate string columns without a separator (PySpark concat). Panics if columns is empty.
concat_ws
Concatenate string columns with a separator (PySpark concat_ws). Panics if columns is empty.
contains
True if string contains substring (literal). PySpark contains.
conv
Base conversion (PySpark conv). num from from_base to to_base.
convert_timezone
Convert timestamp between timezones (PySpark convert_timezone).
corr
Pearson correlation aggregation (PySpark corr). Module-level; use in groupBy.agg() with two columns.
corr_expr
Pearson correlation aggregation (PySpark corr). Returns Expr for use in groupBy.agg().
cos
Cosine in radians (PySpark cos)
cosh
Hyperbolic cosine (PySpark cosh).
cot
Cotangent: 1/tan (PySpark cot).
count
count_distinct
Count distinct aggregation (PySpark countDistinct)
count_if
Count rows where condition is true (PySpark count_if). Use in groupBy.agg(); column should be boolean (true=1, false=0).
covar_pop
Population covariance aggregation (PySpark covar_pop). Module-level; use in groupBy.agg() with two columns.
covar_pop_expr
Population covariance aggregation (PySpark covar_pop). Returns Expr for use in groupBy.agg().
covar_samp_expr
Sample covariance aggregation (PySpark covar_samp). Returns Expr for use in groupBy.agg().
crc32
CRC32 of string bytes (PySpark crc32). Not implemented: requires element-wise UDF.
create_map
Build a map column from alternating key/value expressions (PySpark create_map). Returns List(Struct{key, value}) using Polars as_struct and concat_list. With no args (or empty slice), returns a column of empty maps per row (PySpark parity #275).
csc
Cosecant: 1/sin (PySpark csc).
cume_dist
Cumulative distribution in partition: row_number / count. Window is applied.
curdate
Alias for current_date (PySpark curdate).
current_catalog
Current catalog name stub (PySpark current_catalog).
current_database
Current database/schema name stub (PySpark current_database).
current_date
Current date (evaluation time). PySpark current_date.
current_schema
Current schema name stub (PySpark current_schema).
current_timestamp
Current timestamp (evaluation time). PySpark current_timestamp.
current_timezone
Current session timezone (PySpark current_timezone). Default “UTC”. Returns literal column.
current_user
Current user stub (PySpark current_user).
date_add
Add n days to date column (PySpark date_add).
date_diff
Alias for datediff (PySpark date_diff). date_diff(end, start).
date_format
Format date/datetime as string (PySpark date_format). Accepts PySpark/Java SimpleDateFormat style (e.g. “yyyy-MM”) and converts to chrono strftime internally.
date_from_unix_date
Days since epoch to date (PySpark date_from_unix_date).
date_part
Alias for extract (PySpark date_part).
date_sub
Subtract n days from date column (PySpark date_sub).
date_trunc
Alias for trunc (PySpark date_trunc).
dateadd
Alias for date_add (PySpark dateadd).
datediff
Number of days between two date columns (PySpark datediff).
datepart
Alias for extract (PySpark datepart).
day
Extract day of month from datetime column (PySpark day)
dayname
Weekday name “Mon”,“Tue”,… (PySpark dayname).
dayofmonth
Alias for day. PySpark dayofmonth.
dayofweek
Extract day of week: 1=Sunday..7=Saturday (PySpark dayofweek).
dayofyear
Extract day of year (1-366) (PySpark dayofyear).
days
Interval of n days (PySpark days). For use in date_add, timestampadd, etc.
decode
Decode binary (hex string) to string (PySpark decode). Charset: UTF-8.
degrees
Convert radians to degrees (PySpark degrees)
dense_rank
Dense rank window function (no gaps). Use with .over(partition_by).
desc
Descending sort, nulls last (Spark default for DESC).
desc_nulls_first
Descending sort, nulls first.
desc_nulls_last
Descending sort, nulls last.
e
Constant e = 2.718… (PySpark e).
element_at
Get element at 1-based index (PySpark element_at).
elt
Return the column at a 1-based index from a list of columns (PySpark elt); elt(2, a, b, c) returns b. Panics if columns is empty.
encode
Encode string to binary (PySpark encode). Charset: UTF-8. Returns hex string.
endswith
True if string ends with suffix (PySpark endswith).
eq
equal_null
Null-safe equality: true if both null or both equal (PySpark equal_null). Alias for eq_null_safe.
every
Alias for bool_and (PySpark every). Use in groupBy.agg().
execute_plan
Execute a logical plan; returns root-owned DataFrame.
exp
Exponential (PySpark exp)
explode
Explode list into one row per element (PySpark explode).
explode_outer
Explode; null/empty yields one row with null (PySpark explode_outer).
expm1
exp(x) - 1 (PySpark expm1).
extract
Extract field from date/datetime (PySpark extract). field: year, month, day, hour, minute, second, quarter, week, dayofweek, dayofyear.
factorial
Factorial n! (PySpark factorial). n in 0..=20; null for negative or overflow.
find_in_set
1-based index of str in comma-delimited set (PySpark find_in_set). 0 if not found or str contains comma.
first
First value in group (PySpark first). Use in groupBy.agg(). ignorenulls: when true, first non-null; Polars 0.45 uses .first() only (ignorenulls reserved for API compatibility).
first_value
First value in partition (PySpark first_value). Use with .over(partition_by).
floor
Floor (PySpark floor)
format_number
Format numeric as string with fixed decimal places (PySpark format_number).
format_string
Printf-style format (PySpark format_string). Supports %s, %d, %i, %f, %g, %%. Panics if columns is empty.
from_csv
Parse CSV string to struct (PySpark from_csv). Minimal implementation.
from_json
Parse string column as JSON into struct (PySpark from_json).
from_unixtime
Convert seconds since epoch to formatted string (PySpark from_unixtime).
from_utc_timestamp
Interpret timestamp as UTC, convert to tz (PySpark from_utc_timestamp).
ge
get
Get value for key from map, or null (PySpark get).
get_json_object
Extract JSON path from string column (PySpark get_json_object).
getbit
Get bit at 0-based position (PySpark getbit).
greatest
Greatest of the given columns per row (PySpark greatest). Uses element-wise UDF.
grouping
Grouping set marker (PySpark grouping). Stub: returns 0 (no GROUPING SETS in robin-sparkless).
grouping_id
Grouping set id (PySpark grouping_id). Stub: returns 0.
gt
hash
Hash of column values (PySpark hash). Uses Murmur3 32-bit for parity with PySpark.
hex
Convert to hex string (PySpark hex).
hour
Extract hour from datetime column (PySpark hour).
hours
Interval of n hours (PySpark hours).
hypot
sqrt(x*x + y*y) (PySpark hypot).
ifnull
Alias for nvl. PySpark ifnull.
ilike
Case-insensitive LIKE. PySpark ilike. When escape_char is Some(esc), esc + char treats that char as literal.
initcap
Title case (PySpark initcap)
inline
Explode list of structs into rows; struct fields become columns after unnest (PySpark inline). Returns the exploded struct column; use unnest to expand struct fields to columns.
inline_outer
Like inline but null/empty yields one row of nulls (PySpark inline_outer).
input_file_name
Stub input file name - empty string (PySpark input_file_name).
instr
Find substring position 1-based; 0 if not found (PySpark instr).
is_in
is_null
isin
Check if column values are in the given list (PySpark isin). Uses Polars is_in.
isin_i64
Check if column values are in the given i64 slice (PySpark isin with literal list).
isin_str
Check if column values are in the given string slice (PySpark isin with literal list).
isnan
True where the float value is NaN (PySpark isnan).
isnotnull
True if column is not null. PySpark isnotnull.
isnull
True if column is null. PySpark isnull.
json_array_length
Length of JSON array at path (PySpark json_array_length).
json_object_keys
Keys of JSON object (PySpark json_object_keys). Returns list of strings.
json_tuple
Extract keys from JSON as struct (PySpark json_tuple). keys: e.g. [“a”, “b”].
kurtosis
Kurtosis aggregation (PySpark kurtosis). Fisher definition, bias=true. Use in groupBy.agg().
lag
Lag: value from n rows before in partition. Use with .over(partition_by).
last_day
Last day of month for date column (PySpark last_day).
last_value
Last value in partition (PySpark last_value). Use with .over(partition_by).
lcase
Alias for lower. PySpark lcase.
le
lead
Lead: value from n rows after in partition. Use with .over(partition_by).
least
Least of the given columns per row (PySpark least). Uses element-wise UDF.
left
Leftmost n characters (PySpark left).
length
String length in characters (PySpark length)
levenshtein
Levenshtein distance (PySpark levenshtein). Not implemented: requires element-wise UDF.
like
SQL LIKE pattern (% any, _ one char). PySpark like. When escape_char is Some(esc), esc + char treats that char as literal.
lit_bool
lit_f64
lit_i32
lit_i64
lit_null
lit_str
ln
Alias for log (natural log). PySpark ln.
localtimestamp
Alias for current_timestamp (PySpark localtimestamp).
locate
Find substring position 1-based, starting at pos (PySpark locate). 0 if not found.
log
Natural logarithm (PySpark log with one arg)
log2
Base-2 log (PySpark log2).
log1p
log(1 + x) (PySpark log1p).
log10
Base-10 log (PySpark log10).
log_with_base
Logarithm with given base (PySpark log(col, base)). base must be positive and not 1.
lower
Convert string column to lowercase (PySpark lower)
lpad
Left-pad string to length with pad char (PySpark lpad).
lt
ltrim
Trim leading whitespace (PySpark ltrim)
make_date
Build date from year, month, day columns (PySpark make_date).
make_dt_interval
Day-time interval: days, hours, minutes, seconds (PySpark make_dt_interval). All optional; 0 for omitted.
make_interval
Create interval duration (PySpark make_interval). Optional args; 0 for omitted.
make_timestamp
make_timestamp(year, month, day, hour, min, sec, timezone?) - six columns to timestamp (PySpark make_timestamp). When timezone is Some(tz), components are interpreted as local time in that zone, then converted to UTC.
make_timestamp_ntz
Alias for make_timestamp (PySpark make_timestamp_ntz - no timezone).
make_ym_interval
Year-month interval (PySpark make_ym_interval). Polars has no native YM type; return months as Int32 (years*12 + months).
map_concat
Merge two map columns (PySpark map_concat). Last value wins for duplicate keys.
map_contains_key
True if map contains key (PySpark map_contains_key).
map_entries
Return map as list of structs {key, value} (PySpark map_entries).
map_filter
Filter map entries by predicate (PySpark map_filter).
map_filter_value_gt
Convenience: map_filter with value > threshold predicate.
map_from_arrays
Build map from two array columns keys and values (PySpark map_from_arrays). Implemented via UDF.
map_from_entries
Array of structs {key, value} to map (PySpark map_from_entries).
map_keys
Extract keys from a map column (PySpark map_keys). Map is List(Struct{key, value}).
map_values
Extract values from a map column (PySpark map_values).
map_zip_with
Merge two maps by key with merge function (PySpark map_zip_with).
map_zip_with_coalesce
Convenience: map_zip_with with coalesce(value1, value2) merge.
mask
Mask string: replace upper/lower/digit/other with given chars (PySpark mask).
max
max_by
Value of value_col in the row where ord_col is maximum (PySpark max_by). Use in groupBy.agg().
md5
MD5 hash of string bytes, return hex string (PySpark md5).
mean
median
Median aggregation. PySpark median.
min
min_by
Value of value_col in the row where ord_col is minimum (PySpark min_by). Use in groupBy.agg().
minute
Extract minute from datetime column (PySpark minute).
minutes
Interval of n minutes (PySpark minutes).
mode
Mode aggregation - most frequent value. PySpark mode.
monotonically_increasing_id
Stub monotonically_increasing_id - constant 0 (PySpark monotonically_increasing_id). Note: differs from PySpark which is unique per-row; see PYSPARK_DIFFERENCES.md.
month
Extract month from datetime column (PySpark month)
months
Interval of n months (PySpark months). Approximated as 30*n days.
months_between
Months between end and start dates as fractional (PySpark months_between). When round_off is true, rounds to 8 decimal places (PySpark default).
named_struct
Create a struct with explicit field names from (name, column) pairs (PySpark named_struct). Panics if pairs is empty.
nanvl
Replace NaN with value. PySpark nanvl.
ne
negate
Unary minus / negate (PySpark negate, negative).
negative
Alias for negate. PySpark negative.
next_day
Next date that is the given weekday (e.g. “Mon”) (PySpark next_day).
not_
now
Alias for current_timestamp (PySpark now).
nth_value
Nth value in partition by order (1-based n). Window is applied; do not call .over() again.
ntile
Ntile: bucket 1..n by rank within partition. Window is applied.
nullif
Return null if column equals value, else column. PySpark nullif.
nvl
Alias for coalesce(col, value). PySpark nvl / ifnull.
nvl2
Three-arg null replacement: if col1 is not null then col2 else col3. PySpark nvl2.
octet_length
Length of string in bytes (PySpark octet_length).
or_
overlay
Replace substring at 1-based position (PySpark overlay). replace is literal.
parse_type_name
Parse PySpark-like type name to Polars DataType. Decimal(precision, scale) is mapped to Float64 for schema parity (Polars dtype-decimal not enabled).
parse_url
Parse URL and extract part: PROTOCOL, HOST, PATH, etc. (PySpark parse_url). When key is Some(k) and part is QUERY/QUERYSTRING, returns the value for that query parameter only.
percent_rank
Percent rank in partition: (rank - 1) / (count - 1). Window is applied.
percentile_approx
Approximate percentile (PySpark percentile_approx). Alias for approx_percentile.
pi
Constant pi = 3.14159… (PySpark pi).
pmod
Positive modulus (PySpark pmod).
posexplode
Explode list with position (PySpark posexplode). Returns (pos_column, value_column). pos is 1-based; implemented via list.eval(cum_count()).explode() and explode().
posexplode_outer
Posexplode with null preservation (PySpark posexplode_outer).
position
Position of substring in column (PySpark position). Same as instr; (substr, col) argument order.
positive
Unary plus - no-op, returns column as-is (PySpark positive).
pow
Power (PySpark pow)
power
Alias for pow. PySpark power.
printf
Alias for format_string (PySpark printf).
quarter
Extract quarter (1-4) from date/datetime (PySpark quarter).
radians
Convert degrees to radians (PySpark radians)
raise_error
Raise an error when evaluated (PySpark raise_error). Always fails with the given message.
rand
Random uniform [0, 1) per row, with optional seed (PySpark rand). When added via with_column, generates one distinct value per row (PySpark-like).
randn
Random standard normal per row, with optional seed (PySpark randn). When added via with_column, generates one distinct value per row (PySpark-like).
rank
Rank window function (ties same rank, gaps). Use with .over(partition_by).
regexp
Alias for rlike (PySpark regexp).
regexp_count
Count of non-overlapping regex matches (PySpark regexp_count).
regexp_extract
Extract first match of regex (PySpark regexp_extract). group_index 0 = full match.
regexp_extract_all
Extract all matches of regex (PySpark regexp_extract_all).
regexp_instr
1-based position of first regex match (PySpark regexp_instr).
regexp_like
Check if string matches regex (PySpark regexp_like / rlike).
regexp_replace
Replace first match of regex (PySpark regexp_replace)
regexp_substr
First substring matching regex (PySpark regexp_substr). Null if no match.
regr_avgx_expr
Regression: average of x (PySpark regr_avgx).
regr_avgy_expr
Regression: average of y (PySpark regr_avgy).
regr_count_expr
Regression: count of (y, x) pairs where both non-null (PySpark regr_count).
regr_intercept_expr
Regression intercept: avg_y - slope*avg_x (PySpark regr_intercept).
regr_r2_expr
Regression R-squared (PySpark regr_r2).
regr_slope_expr
Regression slope: cov_samp(y,x)/var_samp(x) (PySpark regr_slope).
regr_sxx_expr
Regression: sum((x - avg_x)^2) (PySpark regr_sxx).
regr_sxy_expr
Regression: sum((x - avg_x)(y - avg_y)) (PySpark regr_sxy).
regr_syy_expr
Regression: sum((y - avg_y)^2) (PySpark regr_syy).
repeat
Repeat string n times (PySpark repeat).
replace
Replace all occurrences of search with replacement (literal). PySpark replace.
reverse
Reverse string (PySpark reverse).
right
Rightmost n characters (PySpark right).
rint
Round to nearest integer (PySpark rint).
rlike
Alias for regexp_like. PySpark rlike / regexp.
round
Round (PySpark round)
row_number
Row number window function (1, 2, 3 by order within partition). Use with .over(partition_by) after ranking by an order column.
rpad
Right-pad string to length with pad char (PySpark rpad).
rtrim
Trim trailing whitespace (PySpark rtrim)
schema_from_json
Parse a schema from a JSON string (e.g. from a host binding).
schema_of_csv
Schema of CSV string (PySpark schema_of_csv). Returns literal schema string; minimal stub.
schema_of_json
Schema of JSON string (PySpark schema_of_json). Returns literal schema string; minimal stub.
sec
Secant: 1/cos (PySpark sec).
second
Extract second from datetime column (PySpark second).
sequence
Generate array of numbers from start to stop (inclusive) with optional step (PySpark sequence). step defaults to 1.
sha1
SHA1 hash of string bytes, return hex string (PySpark sha1).
sha2
SHA2 hash; bit_length 256, 384, or 512 (PySpark sha2).
shift_left
Bitwise left shift (PySpark shiftLeft). col << n.
shift_right
Bitwise signed right shift (PySpark shiftRight). col >> n.
shift_right_unsigned
Bitwise unsigned right shift (PySpark shiftRightUnsigned). Logical shift for Long.
shuffle
Random permutation of list elements (PySpark shuffle).
sign
Alias for signum (PySpark sign).
signum
Sign of the number: -1, 0, or 1 (PySpark signum)
sin
Sine in radians (PySpark sin)
sinh
Hyperbolic sine (PySpark sinh).
size
Alias for array_size (PySpark size).
skewness
Skewness aggregation (PySpark skewness). bias=true. Use in groupBy.agg().
soundex
Soundex code (PySpark soundex). Not implemented: requires element-wise UDF.
spark_partition_id
Stub partition id - always 0 (PySpark spark_partition_id).
split
Split string by delimiter (PySpark split). Optional limit: at most that many parts (remainder in last).
split_part
Split by delimiter and return 1-based part (PySpark split_part).
sqrt
Square root (PySpark sqrt)
stack
Stack columns into struct (PySpark stack). Alias for struct_.
startswith
True if string starts with prefix (PySpark startswith).
std
Alias for stddev (PySpark std).
stddev
Standard deviation (sample) aggregation (PySpark stddev / stddev_samp)
stddev_pop
Population standard deviation (ddof=0). PySpark stddev_pop.
stddev_samp
Sample standard deviation (ddof=1). Alias for stddev. PySpark stddev_samp.
str_to_map
Parse string to map (PySpark str_to_map). Default delims: “,” and “:”.
struct_
Create a struct from columns, using column names as field names (PySpark struct). Panics if columns is empty.
substr
Alias for substring. PySpark substr.
substring
Substring with 1-based start (PySpark substring semantics)
substring_index
Substring before/after nth delimiter (PySpark substring_index).
sum
tan
Tangent in radians (PySpark tan)
tanh
Hyperbolic tangent (PySpark tanh).
timestamp_micros
Convert microseconds since epoch to timestamp (PySpark timestamp_micros).
timestamp_millis
Convert milliseconds since epoch to timestamp (PySpark timestamp_millis).
timestamp_seconds
Convert seconds since epoch to timestamp (PySpark timestamp_seconds).
timestampadd
Add amount of unit to timestamp (PySpark timestampadd).
timestampdiff
Difference between timestamps in unit (PySpark timestampdiff).
to_binary
Convert to binary (PySpark to_binary). fmt: ‘utf-8’, ‘hex’.
to_char
Cast to string, optionally with format for datetime (PySpark to_char, to_varchar). When format is Some, uses date_format for datetime columns (PySpark format → chrono strftime); otherwise cast to string. Returns Err if the cast to string fails (invalid type name or unsupported column type).
to_csv
Format struct as CSV string (PySpark to_csv). Minimal implementation.
to_date
Cast or parse to date (PySpark to_date). When format is None: cast date/datetime to date, parse string with default formats. When format is Some: parse string with given format.
to_degrees
Alias for degrees. PySpark toDegrees.
to_engine_error
Convert PolarsError to EngineError (for APIs that still return PolarsError).
to_json
Serialize struct column to JSON string (PySpark to_json).
to_number
Cast to numeric (PySpark to_number). Uses Double. Format parameter reserved for future use. Returns Err if the cast to double fails (invalid type name or unsupported column type).
to_radians
Alias for radians. PySpark toRadians.
to_timestamp
Cast to timestamp, or parse with format when provided (PySpark to_timestamp). When format is None, parses string columns with default format “%Y-%m-%d %H:%M:%S” (PySpark parity #273).
to_timestamp_ltz
Parse as timestamp in local timezone, return UTC (PySpark to_timestamp_ltz).
to_timestamp_ntz
Parse as timestamp without timezone (PySpark to_timestamp_ntz). Returns Datetime(_, None).
to_unix_timestamp
Alias for unix_timestamp.
to_utc_timestamp
Interpret timestamp as in tz, convert to UTC (PySpark to_utc_timestamp).
to_varchar
Alias for to_char (PySpark to_varchar).
transform_keys
Transform map keys by expr (PySpark transform_keys).
transform_values
Transform map values by expr (PySpark transform_values).
translate
Character-by-character translation (PySpark translate).
trim
Trim leading and trailing whitespace (PySpark trim)
trunc
Truncate date/datetime to unit (PySpark trunc).
try_add
Add that returns null on overflow (PySpark try_add). Uses checked arithmetic.
try_aes_decrypt
Try AES decrypt (PySpark try_aes_decrypt). Returns null on failure.
try_avg
Average aggregation; null on invalid (PySpark try_avg). Use in groupBy.agg(). Maps to mean; reserved for API.
try_cast
Cast column to the given type, returning null on invalid conversion (PySpark try_cast). String-to-boolean uses custom parsing (“true”/“false”/“1”/“0”) since Polars does not support Utf8->Boolean. String-to-date accepts date and datetime strings; invalid strings become null.
try_divide
Division that returns null on divide-by-zero (PySpark try_divide).
try_element_at
Element at index, null if out of bounds (PySpark try_element_at). Same as element_at for lists.
try_multiply
Multiply that returns null on overflow (PySpark try_multiply).
try_subtract
Subtract that returns null on overflow (PySpark try_subtract).
try_sum
Sum aggregation; null on overflow (PySpark try_sum). Use in groupBy.agg(). Polars sum does not overflow; reserved for API.
try_to_binary
Try convert to binary; null on failure (PySpark try_to_binary).
try_to_number
Cast to numeric, null on invalid (PySpark try_to_number). Format parameter reserved for future use. Returns Err if the try_cast setup fails (invalid type name); column values that cannot be parsed become null.
try_to_timestamp
Cast to timestamp, null on invalid, or parse with format when provided (PySpark try_to_timestamp). When format is None, parses string columns with default format (null on invalid). #273
typeof_
Data type of column as string (PySpark typeof). Constant per column from schema.
ucase
Alias for upper. PySpark ucase.
unbase64
Base64 decode to string (PySpark unbase64). Invalid decode → null.
unhex
Convert hex string to binary/string (PySpark unhex).
unix_date
Date to days since 1970-01-01 (PySpark unix_date).
unix_micros
Timestamp to microseconds since epoch (PySpark unix_micros).
unix_millis
Timestamp to milliseconds since epoch (PySpark unix_millis).
unix_seconds
Timestamp to seconds since epoch (PySpark unix_seconds).
unix_timestamp
Parse string timestamp to seconds since epoch (PySpark unix_timestamp). format defaults to yyyy-MM-dd HH:mm:ss.
unix_timestamp_now
Current Unix timestamp in seconds (PySpark unix_timestamp with no args).
upper
Convert string column to uppercase (PySpark upper)
url_decode
Percent-decode URL-encoded string (PySpark url_decode).
url_encode
Percent-encode string for URL (PySpark url_encode).
user
User stub (PySpark user).
var_pop
Population variance (ddof=0). PySpark var_pop.
var_samp
Sample variance (ddof=1). Alias for variance. PySpark var_samp.
variance
Variance (sample) aggregation (PySpark variance / var_samp)
version
Session/library version string (PySpark version).
weekday
Weekday 0=Mon, 6=Sun (PySpark weekday).
weekofyear
Extract ISO week of year (1-53) (PySpark weekofyear).
when
Start a when(condition).then(…).otherwise(…) chain.
when_then_otherwise_null
Two-arg when(condition, value): returns value where condition is true, null otherwise (PySpark when(cond, val)).
width_bucket
Assign value to histogram bucket (PySpark width_bucket). Returns 0 if v < min_val, num_bucket+1 if v >= max_val.
xxhash64
XXH64 hash (PySpark xxhash64). Not implemented: requires element-wise UDF.
year
Extract year from datetime column (PySpark year)
years
Interval of n years (PySpark years). Approximated as 365*n days.
zip_with
Zip two arrays element-wise with merge function (PySpark zip_with).
zip_with_coalesce
Convenience: zip_with with coalesce(left, right) merge.

Type Aliases§

CollectedRows
Rows as list of maps: column name -> JSON value. Used by bindings and collect.
Expr