Module functions

Structs

ChainedWhenBuilder - Builder for an additional when-then clause (returned by ThenBuilder::when).
SortOrder - Sort order specification for use in orderBy/sort. Holds expr + direction + null placement.
ThenBuilder - Builder for chaining when-then clauses before finalizing with otherwise.
WhenBuilder - Builder for when-then-otherwise expressions (usage sketch at the end of this page, after the when entry).

Functions

abs - Absolute value (PySpark abs).
acos - Arc cosine (PySpark acos).
acosh - Inverse hyperbolic cosine (PySpark acosh).
add_months - Add n months to date column (PySpark add_months).
aes_decrypt - AES decrypt (PySpark aes_decrypt). Input hex(nonce||ciphertext).
aes_encrypt - AES encrypt (PySpark aes_encrypt). Key as string; AES-128-GCM.
aggregate - Array fold/aggregate (PySpark aggregate). Simplified: zero + sum(list elements).
any_value - Any value from the group (PySpark any_value). Use in groupBy.agg(). ignorenulls reserved for API compatibility.
approx_count_distinct - Approximate count distinct (PySpark approx_count_distinct). Use in groupBy.agg(). rsd reserved for API compatibility; Polars uses exact n_unique.
approx_percentile - Approximate percentile (PySpark approx_percentile). Maps to quantile; percentage in 0.0..=1.0. accuracy reserved for API compatibility.
array - Create an array column from multiple columns (PySpark array). With no arguments, returns a column of empty arrays (one per row); PySpark parity.
array_agg - Collect to array (PySpark array_agg).
array_append - Append element to end of list (PySpark array_append).
array_compact - Remove null elements from list (PySpark array_compact).
array_contains - Check if list contains value (PySpark array_contains).
array_distinct - Distinct elements in list (PySpark array_distinct).
array_except - Elements in first array not in second (PySpark array_except).
array_exists - True if any list element satisfies the predicate (PySpark exists).
array_filter - Filter list elements by predicate (PySpark filter).
array_flatten - Flatten list of lists to one list (PySpark flatten). Not implemented.
array_forall - True if all list elements satisfy the predicate (PySpark forall).
array_insert - Insert element at 1-based position (PySpark array_insert).
array_intersect - Elements in both arrays (PySpark array_intersect).
array_join - Join list of strings with separator (PySpark array_join).
array_max - Maximum element in list (PySpark array_max).
array_mean - Mean of list elements (PySpark aggregate avg).
array_min - Minimum element in list (PySpark array_min).
array_position - 1-based index of first occurrence of value in list, or 0 if not found (PySpark array_position). Implemented via Polars list.eval with col("") as element.
array_prepend - Prepend element to start of list (PySpark array_prepend).
array_remove - New list with all elements equal to value removed (PySpark array_remove). Implemented via Polars list.eval + list.drop_nulls.
array_repeat - Repeat each element n times (PySpark array_repeat). Not implemented: would require list.eval with dynamic repeat.
array_size - Number of elements in list (PySpark size / array_size). Returns Int32.
array_slice - Slice list from 1-based start with optional length (PySpark slice).
array_sort - Sort list elements (PySpark array_sort).
array_sum - Sum of list elements (PySpark aggregate sum).
array_transform - Transform list elements by expression (PySpark transform).
array_union - Distinct elements from both arrays (PySpark array_union).
arrays_overlap - True if two arrays have any element in common (PySpark arrays_overlap).
arrays_zip - Zip arrays into array of structs (PySpark arrays_zip).
asc - Ascending sort, nulls first (Spark default for ASC).
asc_nulls_first - Ascending sort, nulls first.
asc_nulls_last - Ascending sort, nulls last.
ascii - ASCII value of first character (PySpark ascii). Returns Int32.
asin - Arc sine (PySpark asin).
asinh - Inverse hyperbolic sine (PySpark asinh).
assert_true - Assert that all boolean values are true; errors otherwise (PySpark assert_true). When err_msg is Some, it is used in the error message when the assertion fails.
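A minimal sketch of composing the array helpers above. The import path, the array(&[...]) signature, and Expr being Clone (as in Polars) are assumptions, not confirmed by this page:

```rust
// Illustrative only: path and signatures are assumed; Expr assumed Clone.
use robin_sparkless::functions::{array, array_contains, array_position, col, lit_i64};

fn array_exprs() {
    // Pack two columns into one list column per row: [a, b].
    let pair = array(&[col("a"), col("b")]);
    // True where the list contains 42.
    let has_42 = array_contains(pair.clone(), lit_i64(42));
    // 1-based index of the first 42, or 0 when absent (PySpark semantics).
    let pos_42 = array_position(pair, lit_i64(42));
    let _ = (has_42, pos_42);
}
```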
atan - Arc tangent (PySpark atan).
atan2 - Two-argument arc tangent atan2(y, x) in radians (PySpark atan2).
atanh - Inverse hyperbolic tangent (PySpark atanh).
avg - Average aggregation.
base64 - Base64 encode string bytes (PySpark base64).
bin - Convert integer to binary string (PySpark bin).
bit_and - Bitwise AND of two integer/boolean columns (PySpark bit_and).
bit_count - Count of set bits in the integer representation (PySpark bit_count).
bit_get - Alias for getbit (PySpark bit_get).
bit_length - Bit length of string (bytes * 8) (PySpark bit_length).
bit_or - Bitwise OR of two integer/boolean columns (PySpark bit_or).
bit_xor - Bitwise XOR of two integer/boolean columns (PySpark bit_xor).
bitmap_bit_position - Map integral value (0–32767) to bit position for bitmap aggregates (PySpark bitmap_bit_position).
bitmap_bucket_number - Bucket number for distributed bitmap (PySpark bitmap_bucket_number). value / 32768.
bitmap_construct_agg - Aggregate: bitwise OR of bit positions into one bitmap binary (PySpark bitmap_construct_agg). Use in group_by(...).agg([bitmap_construct_agg(col)]).
bitmap_count - Count set bits in a bitmap binary column (PySpark bitmap_count).
bitmap_or_agg - Aggregate: bitwise OR of bitmap binary column (PySpark bitmap_or_agg).
bitwise_not - Bitwise NOT of an integer/boolean column (PySpark bitwise_not / bitwiseNOT).
bool_and - Boolean AND across group (PySpark bool_and). Use in groupBy.agg(); column should be boolean.
bround - Banker's rounding: round half to even (PySpark bround).
btrim - Trim leading and trailing chars (PySpark btrim). trim_str defaults to whitespace.
call_udf - Call a registered UDF by name. PySpark: F.call_udf(udfName, *cols). Requires a session (set by get_or_create). Raises if UDF not found.
cardinality - Number of elements in array (PySpark cardinality). Alias for size/array_size.
cast - Cast column to the given type (PySpark cast). Fails on invalid conversion. String-to-boolean uses custom parsing ("true"/"false"/"1"/"0") since Polars does not support Utf8->Boolean. String-to-date accepts date and datetime strings (e.g. "2025-01-01 10:30:00" truncates to date) for Spark parity.
cbrt - Cube root (PySpark cbrt).
ceil - Ceiling (PySpark ceil).
ceiling - Alias for ceil (PySpark ceiling).
char - Int to single-character string (PySpark char). Valid codepoint only.
char_length - Length of string in characters (PySpark char_length). Alias of length().
character_length - Length of string in characters (PySpark character_length). Alias of length().
chr - Alias for char (PySpark chr).
coalesce - Returns the first non-null value from multiple columns.
col - Get a column by name.
collect_list - Collect column values into list per group (PySpark collect_list). Use in groupBy.agg().
collect_set - Collect distinct column values into list per group (PySpark collect_set). Use in groupBy.agg().
concat - Concatenate string columns without separator (PySpark concat).
concat_ws - Concatenate string columns with separator (PySpark concat_ws).
contains - True if string contains substring (literal) (PySpark contains).
conv - Base conversion (PySpark conv). num from from_base to to_base.
convert_timezone - Convert timestamp between timezones (PySpark convert_timezone).
corr - Pearson correlation aggregation (PySpark corr). Module-level; use in groupBy.agg() with two columns.
corr_expr - Pearson correlation aggregation (PySpark corr). Returns Expr for use in groupBy.agg().
cos - Cosine in radians (PySpark cos).
cosh - Hyperbolic cosine (PySpark cosh).
cot - Cotangent: 1/tan (PySpark cot).
count - Count aggregation.
count_distinct - Count distinct aggregation (PySpark countDistinct).
count_if - Count rows where condition is true (PySpark count_if). Use in groupBy.agg(); column should be boolean (true=1, false=0).
covar_pop - Population covariance aggregation (PySpark covar_pop). Module-level; use in groupBy.agg() with two columns.
covar_pop_expr - Population covariance aggregation (PySpark covar_pop). Returns Expr for use in groupBy.agg().
covar_samp_expr - Sample covariance aggregation (PySpark covar_samp). Returns Expr for use in groupBy.agg().
crc32 - CRC32 of string bytes (PySpark crc32). Not implemented: requires element-wise UDF.
create_map - Build a map column from alternating key/value expressions (PySpark create_map). Returns List(Struct{key, value}) using Polars as_struct and concat_list. With no args (or empty slice), returns a column of empty maps per row (PySpark parity #275).
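A hedged sketch of the aggregation helpers above (avg, collect_list, count_distinct) inside groupBy.agg(). The DataFrame export and the group_by(...).agg([...]) shape are inferred from the usage notes, not confirmed signatures:

```rust
// Assumed: DataFrame at the crate root; group_by(...).agg([...]) per the notes above.
use robin_sparkless::DataFrame;
use robin_sparkless::functions::{avg, col, collect_list, count_distinct};

fn per_department(df: DataFrame) -> DataFrame {
    df.group_by(["dept"]).agg([
        avg(col("salary")),          // mean salary per group
        collect_list(col("name")),   // every name in the group, as a list column
        count_distinct(col("name")), // distinct-name count per group
    ])
}
```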
csc - Cosecant: 1/sin (PySpark csc).
cume_dist - Cumulative distribution in partition: row_number / count. Window is applied.
curdate - Alias for current_date (PySpark curdate).
current_catalog - Current catalog name stub (PySpark current_catalog).
current_database - Current database/schema name stub (PySpark current_database).
current_date - Current date (evaluation time) (PySpark current_date).
current_schema - Current schema name stub (PySpark current_schema).
current_timestamp - Current timestamp (evaluation time) (PySpark current_timestamp).
current_timezone - Current session timezone (PySpark current_timezone). Default "UTC". Returns literal column.
current_user - Current user stub (PySpark current_user).
date_add - Add n days to date column (PySpark date_add).
date_diff - Alias for datediff (PySpark date_diff). date_diff(end, start).
date_format - Format date/datetime as string (PySpark date_format). Accepts PySpark/Java SimpleDateFormat style (e.g. "yyyy-MM") and converts to chrono strftime internally.
date_from_unix_date - Days since epoch to date (PySpark date_from_unix_date).
date_part - Alias for extract (PySpark date_part).
date_sub - Subtract n days from date column (PySpark date_sub).
date_trunc - Alias for trunc (PySpark date_trunc).
dateadd - Alias for date_add (PySpark dateadd).
datediff - Number of days between two date columns (PySpark datediff).
datepart - Alias for extract (PySpark datepart).
day - Extract day of month from datetime column (PySpark day).
dayname - Weekday name "Mon", "Tue", ... (PySpark dayname).
dayofmonth - Alias for day (PySpark dayofmonth).
dayofweek - Extract day of week: 1=Sunday..7=Saturday (PySpark dayofweek).
dayofyear - Extract day of year (1-366) (PySpark dayofyear).
days - Interval of n days (PySpark days). For use in date_add, timestampadd, etc.
decode - Decode binary (hex string) to string (PySpark decode). Charset: UTF-8.
degrees - Convert radians to degrees (PySpark degrees).
dense_rank - Dense rank window function (no gaps). Use with .over(partition_by).
desc - Descending sort, nulls last (Spark default for DESC).
desc_nulls_first - Descending sort, nulls first.
desc_nulls_last - Descending sort, nulls last.
e - Constant e = 2.718... (PySpark e).
element_at - Get element at 1-based index (PySpark element_at).
elt - Return column at 1-based index (PySpark elt). elt(2, a, b, c) returns b.
encode - Encode string to binary (PySpark encode). Charset: UTF-8. Returns hex string.
endswith - True if string ends with suffix (PySpark endswith).
equal_null - Null-safe equality: true if both null or both equal (PySpark equal_null). Alias for eq_null_safe.
every - Alias for bool_and (PySpark every). Use in groupBy.agg().
exp - Exponential (PySpark exp).
explode - Explode list into one row per element (PySpark explode).
explode_outer - Explode; null/empty yields one row with null (PySpark explode_outer).
expm1 - exp(x) - 1 (PySpark expm1).
extract - Extract field from date/datetime (PySpark extract). field: year, month, day, hour, minute, second, quarter, week, dayofweek, dayofyear.
factorial - Factorial n! (PySpark factorial). n in 0..=20; null for negative or overflow.
find_in_set - 1-based index of str in comma-delimited set (PySpark find_in_set). 0 if not found or str contains comma.
first - First value in group (PySpark first). Use in groupBy.agg(). ignorenulls: when true, first non-null; Polars 0.45 uses .first() only (ignorenulls reserved for API compatibility).
first_value - First value in partition (PySpark first_value). Use with .over(partition_by).
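A sketch of the window helpers (dense_rank, first_value, lag above) with .over(partition_by). The zero-argument dense_rank(), the lag(expr, n) shape, and Expr::over taking column names are guesses from the usage notes:

```rust
// Signatures guessed from "Use with .over(partition_by)"; not confirmed.
use robin_sparkless::functions::{col, dense_rank, first_value, lag};

fn window_exprs() {
    // Dense rank within each department (ties share a rank, no gaps).
    let rank_in_dept = dense_rank().over(["dept"]);
    // First salary in the partition.
    let first_salary = first_value(col("salary")).over(["dept"]);
    // Salary from one row earlier in the partition.
    let prev_salary = lag(col("salary"), 1).over(["dept"]);
    let _ = (rank_in_dept, first_salary, prev_salary);
}
```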
floor - Floor (PySpark floor).
format_number - Format numeric as string with fixed decimal places (PySpark format_number).
format_string - Printf-style format (PySpark format_string). Supports %s, %d, %i, %f, %g, %%.
from_csv - Parse CSV string to struct (PySpark from_csv). Minimal implementation.
from_json - Parse string column as JSON into struct (PySpark from_json).
from_unixtime - Convert seconds since epoch to formatted string (PySpark from_unixtime).
from_utc_timestamp - Interpret timestamp as UTC, convert to tz (PySpark from_utc_timestamp).
get - Get value for key from map, or null (PySpark get).
get_json_object - Extract JSON path from string column (PySpark get_json_object).
getbit - Get bit at 0-based position (PySpark getbit).
greatest - Greatest of the given columns per row (PySpark greatest). Uses element-wise UDF.
grouping - Grouping set marker (PySpark grouping). Stub: returns 0 (no GROUPING SETS in robin-sparkless).
grouping_id - Grouping set id (PySpark grouping_id). Stub: returns 0.
hash - Hash of column values (PySpark hash). Uses Murmur3 32-bit for parity with PySpark.
hex - Convert to hex string (PySpark hex).
hour - Extract hour from datetime column (PySpark hour).
hours - Interval of n hours (PySpark hours).
hypot - sqrt(x*x + y*y) (PySpark hypot).
ifnull - Alias for nvl (PySpark ifnull).
ilike - Case-insensitive LIKE (PySpark ilike). When escape_char is Some(esc), esc + char treats that char as literal.
initcap - Title case (PySpark initcap).
inline - Explode list of structs into rows; struct fields become columns after unnest (PySpark inline). Returns the exploded struct column; use unnest to expand struct fields to columns.
inline_outer - Like inline but null/empty yields one row of nulls (PySpark inline_outer).
input_file_name - Stub input file name: empty string (PySpark input_file_name).
instr - Find substring position 1-based; 0 if not found (PySpark instr).
isin - Check if column values are in the given list (PySpark isin). Uses Polars is_in.
isin_i64 - Check if column values are in the given i64 slice (PySpark isin with literal list).
isin_str - Check if column values are in the given string slice (PySpark isin with literal list).
isnan - True where the float value is NaN (PySpark isnan).
isnotnull - True if column is not null (PySpark isnotnull).
isnull - True if column is null (PySpark isnull).
json_array_length - Length of JSON array at path (PySpark json_array_length).
json_object_keys - Keys of JSON object (PySpark json_object_keys). Returns list of strings.
json_tuple - Extract keys from JSON as struct (PySpark json_tuple). keys: e.g. ["a", "b"].
kurtosis - Kurtosis aggregation (PySpark kurtosis). Fisher definition, bias=true. Use in groupBy.agg().
lag - Value from n rows before in partition (PySpark lag). Use with .over(partition_by).
last_day - Last day of month for date column (PySpark last_day).
last_value - Last value in partition (PySpark last_value). Use with .over(partition_by).
lcase - Alias for lower (PySpark lcase).
lead - Value from n rows after in partition (PySpark lead). Use with .over(partition_by).
least - Least of the given columns per row (PySpark least). Uses element-wise UDF.
left - Leftmost n characters (PySpark left).
length - String length in characters (PySpark length).
levenshtein - Levenshtein distance (PySpark levenshtein). Not implemented: requires element-wise UDF.
like - SQL LIKE pattern (% any, _ one char) (PySpark like). When escape_char is Some(esc), esc + char treats that char as literal.
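A sketch of like/ilike with the optional escape character described above; the (expr, pattern, Option<char>) shape is an assumption:

```rust
// Assumed shape: like(expr, pattern, escape_char: Option<char>).
use robin_sparkless::functions::{col, ilike, like};

fn like_exprs() {
    // Match "100% ..." lines: `\%` is a literal percent, the trailing % a wildcard.
    let pct = like(col("note"), r"100\% %", Some('\\'));
    // Case-insensitive prefix match, no escape character.
    let err = ilike(col("note"), "error:%", None);
    let _ = (pct, err);
}
```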
lit_bool - Create a literal column from a value (bool).
lit_f64 - Create a literal column from a value (f64).
lit_i32 - Create a literal column from a value (i32).
lit_i64 - Create a literal column from a value (i64).
lit_null - Typed null literal column. Returns Err on unknown type name. See parse_type_name for supported type strings (e.g. "boolean", "string", "bigint").
lit_str - Create a literal column from a value (string).
ln - Alias for log (natural log) (PySpark ln).
localtimestamp - Alias for current_timestamp (PySpark localtimestamp).
locate - Find substring position 1-based, starting at pos (PySpark locate). 0 if not found.
log - Natural logarithm (PySpark log with one arg).
log10 - Base-10 log (PySpark log10).
log1p - log(1 + x) (PySpark log1p).
log2 - Base-2 log (PySpark log2).
log_with_base - Logarithm with given base (PySpark log(col, base)). base must be positive and not 1.
lower - Convert string column to lowercase (PySpark lower).
lpad - Left-pad string to length with pad char (PySpark lpad).
ltrim - Trim leading whitespace (PySpark ltrim).
make_date - Build date from year, month, day columns (PySpark make_date).
make_dt_interval - Day-time interval: days, hours, minutes, seconds (PySpark make_dt_interval). All optional; 0 for omitted.
make_interval - Create interval duration (PySpark make_interval). Optional args; 0 for omitted.
make_timestamp - make_timestamp(year, month, day, hour, min, sec, timezone?): six columns to timestamp (PySpark make_timestamp). When timezone is Some(tz), components are interpreted as local time in that zone, then converted to UTC.
make_timestamp_ntz - Alias for make_timestamp (PySpark make_timestamp_ntz: no timezone).
make_ym_interval - Year-month interval (PySpark make_ym_interval). Polars has no native YM type; returns months as Int32 (years*12 + months).
map_concat - Merge two map columns (PySpark map_concat). Last value wins for duplicate keys.
map_contains_key - True if map contains key (PySpark map_contains_key).
map_entries - Return map as list of structs {key, value} (PySpark map_entries).
map_filter - Filter map entries by predicate (PySpark map_filter).
map_filter_value_gt - Convenience: map_filter with value > threshold predicate.
map_from_arrays - Build map from two array columns keys and values (PySpark map_from_arrays). Implemented via UDF.
map_from_entries - Array of structs {key, value} to map (PySpark map_from_entries).
map_keys - Extract keys from a map column (PySpark map_keys). Map is List(Struct{key, value}).
map_values - Extract values from a map column (PySpark map_values).
map_zip_with - Merge two maps by key with merge function (PySpark map_zip_with).
map_zip_with_coalesce - Convenience: map_zip_with with coalesce(value1, value2) merge.
mask - Mask string: replace upper/lower/digit/other with given chars (PySpark mask).
max - Maximum aggregation.
max_by - Value of value_col in the row where ord_col is maximum (PySpark max_by). Use in groupBy.agg().
md5 - MD5 hash of string bytes, returns hex string (PySpark md5).
mean - Alias for avg (PySpark mean).
median - Median aggregation (PySpark median).
min - Minimum aggregation.
min_by - Value of value_col in the row where ord_col is minimum (PySpark min_by). Use in groupBy.agg().
minute - Extract minute from datetime column (PySpark minute).
minutes - Interval of n minutes (PySpark minutes).
mode - Mode aggregation: most frequent value (PySpark mode).
monotonically_increasing_id - Stub: constant 0 (PySpark monotonically_increasing_id). Note: differs from PySpark, which is unique per row; see PYSPARK_DIFFERENCES.md.
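A sketch of building and probing a map (stored as List(Struct{key, value}), per create_map above). The alternating key/value slice and Expr: Clone are assumptions:

```rust
// Assumed shapes: create_map over an alternating key/value slice; Expr assumed Clone.
use robin_sparkless::functions::{col, create_map, lit_str, map_contains_key, map_keys};

fn map_exprs() {
    // {"a": x, "b": y} per row, stored as List(Struct{key, value}).
    let m = create_map(&[lit_str("a"), col("x"), lit_str("b"), col("y")]);
    // ["a", "b"]
    let keys = map_keys(m.clone());
    // True: the map has key "a".
    let has_a = map_contains_key(m, lit_str("a"));
    let _ = (keys, has_a);
}
```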
month - Extract month from datetime column (PySpark month).
months - Interval of n months (PySpark months). Approximated as 30*n days.
months_between - Months between end and start dates as fractional (PySpark months_between). When round_off is true, rounds to 8 decimal places (PySpark default).
named_struct - Create struct with explicit field names (PySpark named_struct). Pairs of (name, column).
nanvl - Replace NaN with value (PySpark nanvl).
negate - Unary minus / negate (PySpark negate, negative).
negative - Alias for negate (PySpark negative).
next_day - Next date that is the given weekday (e.g. "Mon") (PySpark next_day).
now - Alias for current_timestamp (PySpark now).
nth_value - Nth value in partition by order (1-based n). Window is applied; do not call .over() again.
ntile - Bucket 1..n by rank within partition (PySpark ntile). Window is applied.
nullif - Return null if column equals value, else column (PySpark nullif).
nvl - Alias for coalesce(col, value) (PySpark nvl / ifnull).
nvl2 - Three-arg null replacement: if col1 is not null then col2 else col3 (PySpark nvl2).
octet_length - Length of string in bytes (PySpark octet_length).
overlay - Replace substring at 1-based position (PySpark overlay). replace is literal.
parse_type_name - Parse PySpark-like type name to Polars DataType. Decimal(precision, scale) is mapped to Float64 for schema parity (Polars dtype-decimal not enabled).
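A sketch of the SQL-style null handling above (nvl, nvl2, nullif); signatures assumed:

```rust
// Assumed shapes: nvl(expr, fallback), nvl2(test, if_set, if_null), nullif(expr, sentinel).
use robin_sparkless::functions::{col, lit_f64, lit_i64, lit_str, nullif, nvl, nvl2};

fn null_exprs() {
    // bonus, or 0.0 where bonus is null.
    let bonus = nvl(col("bonus"), lit_f64(0.0));
    // "has email" where email is non-null, else "no email".
    let flag = nvl2(col("email"), lit_str("has email"), lit_str("no email"));
    // Turn the sentinel -1 into a proper null.
    let code = nullif(col("code"), lit_i64(-1));
    let _ = (bonus, flag, code);
}
```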
parse_url - Parse URL and extract part: PROTOCOL, HOST, PATH, etc. (PySpark parse_url). When key is Some(k) and part is QUERY/QUERYSTRING, returns the value for that query parameter only.
percent_rank - Percent rank in partition: (rank - 1) / (count - 1). Window is applied.
percentile_approx - Approximate percentile (PySpark percentile_approx). Alias for approx_percentile.
pi - Constant pi = 3.14159... (PySpark pi).
pmod - Positive modulus (PySpark pmod).
posexplode - Explode list with position (PySpark posexplode). Returns (pos_column, value_column). pos is 1-based; implemented via list.eval(cum_count()).explode() and explode().
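A sketch of parse_url with and without the query key described above; the (expr, part, Option<&str>) shape is an assumption:

```rust
// Assumed shape: parse_url(expr, part, key: Option<&str>).
use robin_sparkless::functions::{col, parse_url};

fn url_exprs() {
    // "https://example.com/a?b=1" -> "example.com"
    let host = parse_url(col("url"), "HOST", None);
    // "https://example.com/a?b=1" with key "b" -> "1"
    let b = parse_url(col("url"), "QUERY", Some("b"));
    let _ = (host, b);
}
```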
posexplode_outer - Posexplode with null preservation (PySpark posexplode_outer).
position - Position of substring in column (PySpark position). Same as instr; (substr, col) argument order.
positive - Unary plus: no-op, returns column as-is (PySpark positive).
pow - Power (PySpark pow).
power - Alias for pow (PySpark power).
printf - Alias for format_string (PySpark printf).
quarter - Extract quarter (1-4) from date/datetime (PySpark quarter).
radians - Convert degrees to radians (PySpark radians).
raise_error - Raise an error when evaluated (PySpark raise_error). Always fails with the given message.
rand - Random uniform [0, 1) per row, with optional seed (PySpark rand). When added via with_column, generates one distinct value per row (PySpark-like).
randn - Random standard normal per row, with optional seed (PySpark randn). When added via with_column, generates one distinct value per row (PySpark-like).
rank - Rank window function (ties same rank, gaps). Use with .over(partition_by).
regexp - Alias for rlike (PySpark regexp).
regexp_count - Count of non-overlapping regex matches (PySpark regexp_count).
regexp_extract - Extract first match of regex (PySpark regexp_extract). group_index 0 = full match.
regexp_extract_all - Extract all matches of regex (PySpark regexp_extract_all).
regexp_instr - 1-based position of first regex match (PySpark regexp_instr).
regexp_like - Check if string matches regex (PySpark regexp_like / rlike).
regexp_replace - Replace first match of regex (PySpark regexp_replace).
regexp_substr - First substring matching regex (PySpark regexp_substr). Null if no match.
regr_avgx_expr - Regression: average of x (PySpark regr_avgx).
regr_avgy_expr - Regression: average of y (PySpark regr_avgy).
regr_count_expr - Regression: count of (y, x) pairs where both non-null (PySpark regr_count).
regr_intercept_expr - Regression intercept: avg_y - slope*avg_x (PySpark regr_intercept).
regr_r2_expr - Regression R-squared (PySpark regr_r2).
regr_slope_expr - Regression slope: covar_samp(y,x)/var_samp(x) (PySpark regr_slope).
regr_sxx_expr - Regression: sum((x - avg_x)^2) (PySpark regr_sxx).
regr_sxy_expr - Regression: sum((x - avg_x)(y - avg_y)) (PySpark regr_sxy).
regr_syy_expr - Regression: sum((y - avg_y)^2) (PySpark regr_syy).
repeat - Repeat string n times (PySpark repeat).
replace - Replace all occurrences of search with replacement (literal) (PySpark replace).
reverse - Reverse string (PySpark reverse).
right - Rightmost n characters (PySpark right).
rint - Round to nearest integer (PySpark rint).
rlike - Alias for regexp_like (PySpark rlike / regexp).
round - Round (PySpark round).
row_number - Row number window function (1, 2, 3 by order within partition). Use with .over(partition_by) after ranking by an order column.
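A sketch of the regexp_* family above; signatures assumed, and note that regexp_replace here replaces the first match per its entry:

```rust
// Assumed shapes: regexp_like(expr, pattern), regexp_extract(expr, pattern, group),
// regexp_replace(expr, pattern, replacement).
use robin_sparkless::functions::{col, regexp_extract, regexp_like, regexp_replace};

fn regex_exprs() {
    // True where the line looks like "ERROR <digits>".
    let is_error = regexp_like(col("line"), r"^ERROR \d+");
    // Capture group 1: the numeric code.
    let code = regexp_extract(col("line"), r"^ERROR (\d+)", 1);
    // Per the entry above, replaces the first match only.
    let masked = regexp_replace(col("line"), r"\d+", "#");
    let _ = (is_error, code, masked);
}
```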
rpad - Right-pad string to length with pad char (PySpark rpad).
rtrim - Trim trailing whitespace (PySpark rtrim).
schema_of_csv - Schema of CSV string (PySpark schema_of_csv). Returns literal schema string; minimal stub.
schema_of_json - Schema of JSON string (PySpark schema_of_json). Returns literal schema string; minimal stub.
sec - Secant: 1/cos (PySpark sec).
second - Extract second from datetime column (PySpark second).
sequence - Generate array of numbers from start to stop (inclusive) with optional step (PySpark sequence). step defaults to 1.
sha1 - SHA1 hash of string bytes, returns hex string (PySpark sha1).
sha2 - SHA2 hash; bit_length 256, 384, or 512 (PySpark sha2).
shift_left - Bitwise left shift (PySpark shiftLeft). col << n.
shift_right - Bitwise signed right shift (PySpark shiftRight). col >> n.
shift_right_unsigned - Bitwise unsigned right shift (PySpark shiftRightUnsigned). Logical shift for Long.
shuffle - Random permutation of list elements (PySpark shuffle).
sign - Alias for signum (PySpark sign).
signum - Sign of the number: -1, 0, or 1 (PySpark signum).
sin - Sine in radians (PySpark sin).
sinh - Hyperbolic sine (PySpark sinh).
size - Alias for array_size (PySpark size).
skewness - Skewness aggregation (PySpark skewness). bias=true. Use in groupBy.agg().
soundex - Soundex code (PySpark soundex). Not implemented: requires element-wise UDF.
spark_partition_id - Stub partition id: always 0 (PySpark spark_partition_id).
split - Split string by delimiter (PySpark split). Optional limit: at most that many parts (remainder in last).
split_part - Split by delimiter and return 1-based part (PySpark split_part).
sqrt - Square root (PySpark sqrt).
stack - Stack columns into struct (PySpark stack). Alias for struct_.
startswith - True if string starts with prefix (PySpark startswith).
std - Alias for stddev (PySpark std).
stddev - Standard deviation (sample) aggregation (PySpark stddev / stddev_samp).
stddev_pop - Population standard deviation (ddof=0) (PySpark stddev_pop).
stddev_samp - Sample standard deviation (ddof=1). Alias for stddev (PySpark stddev_samp).
str_to_map - Parse string to map (PySpark str_to_map). Default delims: "," and ":".
struct_ - Create struct from columns using column names as field names (PySpark struct).
substr - Alias for substring (PySpark substr).
substring - Substring with 1-based start (PySpark substring semantics).
substring_index - Substring before/after nth delimiter (PySpark substring_index).
sum - Sum aggregation.
tan - Tangent in radians (PySpark tan).
tanh - Hyperbolic tangent (PySpark tanh).
timestamp_micros - Convert microseconds since epoch to timestamp (PySpark timestamp_micros).
timestamp_millis - Convert milliseconds since epoch to timestamp (PySpark timestamp_millis).
timestamp_seconds - Convert seconds since epoch to timestamp (PySpark timestamp_seconds).
timestampadd - Add amount of unit to timestamp (PySpark timestampadd).
timestampdiff - Difference between timestamps in unit (PySpark timestampdiff).
to_binary - Convert to binary (PySpark to_binary). fmt: 'utf-8', 'hex'.
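A sketch of the 1-based, delimiter-oriented string splitters above (split, split_part, substring_index); shapes assumed:

```rust
// Assumed shapes: split(expr, delim, limit: Option<i64>), split_part(expr, delim, n),
// substring_index(expr, delim, count).
use robin_sparkless::functions::{col, split, split_part, substring_index};

fn split_exprs() {
    // "a,b,c" -> ["a", "b", "c"]; no limit.
    let parts = split(col("csv"), ",", None);
    // "a,b,c" -> "b" (1-based part index).
    let second = split_part(col("csv"), ",", 2);
    // "a,b,c" with count 2 -> "a,b" (everything before the 2nd delimiter).
    let prefix = substring_index(col("csv"), ",", 2);
    let _ = (parts, second, prefix);
}
```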
to_char - Cast to string, optionally with format for datetime (PySpark to_char, to_varchar). When format is Some, uses date_format for datetime columns (PySpark format → chrono strftime); otherwise casts to string. Returns Err if the cast to string fails (invalid type name or unsupported column type).
to_csv - Format struct as CSV string (PySpark to_csv). Minimal implementation.
to_date - Cast or parse to date (PySpark to_date). When format is None: cast date/datetime to date, parse string with default formats. When format is Some: parse string with given format.
to_degrees - Alias for degrees (PySpark toDegrees).
to_json - Serialize struct column to JSON string (PySpark to_json).
to_number - Cast to numeric (PySpark to_number). Uses Double. Format parameter reserved for future use. Returns Err if the cast to double fails (invalid type name or unsupported column type).
to_radians - Alias for radians (PySpark toRadians).
to_timestamp - Cast to timestamp, or parse with format when provided (PySpark to_timestamp). When format is None, parses string columns with default format "%Y-%m-%d %H:%M:%S" (PySpark parity #273).
to_timestamp_ltz - Parse as timestamp in local timezone, return UTC (PySpark to_timestamp_ltz).
to_timestamp_ntz - Parse as timestamp without timezone (PySpark to_timestamp_ntz). Returns Datetime(_, None).
to_unix_timestamp - Alias for unix_timestamp.
to_utc_timestamp - Interpret timestamp as in tz, convert to UTC (PySpark to_utc_timestamp).
to_varchar - Alias for to_char (PySpark to_varchar).
transform_keys - Transform map keys by expr (PySpark transform_keys).
transform_values - Transform map values by expr (PySpark transform_values).
translate - Character-by-character translation (PySpark translate).
trim - Trim leading and trailing whitespace (PySpark trim).
trunc - Truncate date/datetime to unit (PySpark trunc).
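A sketch of the default-versus-explicit format behavior of to_date and to_timestamp described above; the Option<&str> format parameter is an assumption:

```rust
// Assumed shapes: to_date(expr, format: Option<&str>), to_timestamp(expr, Option<&str>).
use robin_sparkless::functions::{col, to_date, to_timestamp};

fn parse_exprs() {
    // Default formats: "2025-01-01" or "2025-01-01 10:30:00" both become a date.
    let d = to_date(col("day"), None);
    // Explicit PySpark-style format for non-default layouts.
    let d_us = to_date(col("day_us"), Some("MM/dd/yyyy"));
    // Strings parse with the default "%Y-%m-%d %H:%M:%S" (see entry above).
    let ts = to_timestamp(col("stamp"), None);
    let _ = (d, d_us, ts);
}
```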
try_add - Add that returns null on overflow (PySpark try_add). Uses checked arithmetic.
try_aes_decrypt - Try AES decrypt (PySpark try_aes_decrypt). Returns null on failure.
try_avg - Average aggregation; null on invalid (PySpark try_avg). Use in groupBy.agg(). Maps to mean; reserved for API.
try_cast - Cast column to the given type, returning null on invalid conversion (PySpark try_cast). String-to-boolean uses custom parsing ("true"/"false"/"1"/"0") since Polars does not support Utf8->Boolean. String-to-date accepts date and datetime strings; invalid strings become null.
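A sketch contrasting the try_* variants above with their failing counterparts. That try_cast returns a Result is inferred from the "Returns Err" notes on nearby entries, so this is an assumption:

```rust
// Assumed: try_cast(expr, type_name) -> Result<Expr, _> with an Error-typed failure.
use robin_sparkless::functions::{col, try_cast, try_divide};

fn try_exprs() -> Result<(), Box<dyn std::error::Error>> {
    // "123" -> 123, "abc" -> null (plain cast() would fail instead).
    let n = try_cast(col("raw"), "bigint")?;
    // Null (not an error) where the denominator is zero.
    let ratio = try_divide(col("num"), col("den"));
    let _ = (n, ratio);
    Ok(())
}
```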
try_divide - Division that returns null on divide-by-zero (PySpark try_divide).
try_element_at - Element at index, null if out of bounds (PySpark try_element_at). Same as element_at for lists.
try_multiply - Multiply that returns null on overflow (PySpark try_multiply).
try_subtract - Subtract that returns null on overflow (PySpark try_subtract).
try_sum - Sum aggregation; null on overflow (PySpark try_sum). Use in groupBy.agg(). Polars sum does not overflow; reserved for API.
try_to_binary - Try convert to binary; null on failure (PySpark try_to_binary).
try_to_number - Cast to numeric, null on invalid (PySpark try_to_number). Format parameter reserved for future use. Returns Err if the try_cast setup fails (invalid type name); column values that cannot be parsed become null.
try_to_timestamp - Cast to timestamp, null on invalid, or parse with format when provided (PySpark try_to_timestamp). When format is None, parses string columns with default format (null on invalid) (#273).
typeof_ - Data type of column as string (PySpark typeof). Constant per column from schema.
ucase - Alias for upper (PySpark ucase).
unbase64 - Base64 decode to string (PySpark unbase64). Invalid decode yields null.
unhex - Convert hex string to binary/string (PySpark unhex).
unix_date - Date to days since 1970-01-01 (PySpark unix_date).
unix_micros - Timestamp to microseconds since epoch (PySpark unix_micros).
unix_millis - Timestamp to milliseconds since epoch (PySpark unix_millis).
unix_seconds - Timestamp to seconds since epoch (PySpark unix_seconds).
unix_timestamp - Parse string timestamp to seconds since epoch (PySpark unix_timestamp). format defaults to yyyy-MM-dd HH:mm:ss.
unix_timestamp_now - Current Unix timestamp in seconds (PySpark unix_timestamp with no args).
upper - Convert string column to uppercase (PySpark upper).
url_decode - Percent-decode URL-encoded string (PySpark url_decode).
url_encode - Percent-encode string for URL (PySpark url_encode).
user - User stub (PySpark user).
var_pop - Population variance (ddof=0) (PySpark var_pop).
var_samp - Sample variance (ddof=1). Alias for variance (PySpark var_samp).
variance - Variance (sample) aggregation (PySpark variance / var_samp).
version - Session/library version string (PySpark version).
weekday - Weekday 0=Mon, 6=Sun (PySpark weekday).
weekofyear - Extract ISO week of year (1-53) (PySpark weekofyear).
when - PySpark-style conditional expression builder (see the sketch below).
when_then_otherwise_null - Two-arg when(condition, value): returns value where condition is true, null otherwise (PySpark when(cond, val)).
width_bucket - Assign value to histogram bucket (PySpark width_bucket). Returns 0 if v < min_val, num_bucket+1 if v >= max_val.
xxhash64 - XXH64 hash (PySpark xxhash64). Not implemented: requires element-wise UDF.
year - Extract year from datetime column (PySpark year).
years - Interval of n years (PySpark years). Approximated as 365*n days.
zip_with - Zip two arrays element-wise with merge function (PySpark zip_with).
zip_with_coalesce - Convenience: zip_with with coalesce(left, right) merge.
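A sketch of the when/then/otherwise builder chain documented by the structs at the top of this page (WhenBuilder, ThenBuilder, ChainedWhenBuilder). The comparison method gt on Expr is assumed from the underlying Polars API:

```rust
// when(cond) -> WhenBuilder; .then(expr) -> ThenBuilder; a further .when(cond)
// -> ChainedWhenBuilder; .otherwise(expr) finalizes the expression.
// Expr::gt is an assumption borrowed from Polars.
use robin_sparkless::functions::{col, lit_i64, lit_str, when};

fn bucket_expr() {
    let bucket = when(col("score").gt(lit_i64(90)))
        .then(lit_str("high"))
        .when(col("score").gt(lit_i64(50)))
        .then(lit_str("mid"))
        .otherwise(lit_str("low"));
    let _ = bucket;
}
```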