Helpers for element-wise UDFs used by map() expressions (soundex, levenshtein, crc32, xxhash64, array_flatten, array_repeat, and many more). These run at plan execution time, when Polars invokes the closure.
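Every helper here has the same shape: a plain Rust function over column data, boxed into a Polars map()/map_many() expression so it runs when the plan executes. A minimal sketch of that wiring, assuming a Polars version whose Expr::map takes a Fn(Series) -> PolarsResult<Option<Series>> closure plus a GetOutput (newer releases pass a Column wrapper instead, so the exact signature varies by version); apply_double is a toy stand-in, not one of this module's functions:

```rust
use polars::prelude::*; // needs the `lazy` feature

// A toy element-wise helper in the style of this module: double each
// f64 value; nulls pass through untouched.
fn apply_double(s: Series) -> PolarsResult<Option<Series>> {
    let ca = s.f64()?;
    Ok(Some(ca.apply_values(|v| v * 2.0).into_series()))
}

fn main() -> PolarsResult<()> {
    let df = df!("x" => [1.0f64, 2.5, 4.0])?;
    let out = df
        .lazy()
        // The closure runs at plan execution time, once per chunk of data.
        .with_column(col("x").map(apply_double, GetOutput::same_type()).alias("x2"))
        .collect()?;
    println!("{out}");
    Ok(())
}
```

The functions listed below follow this pattern; sketches of the more involved per-function algorithms appear under Examples at the end.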
Functions
- apply_acos - Apply acos to a float column.
- apply_acosh - Apply acosh to a float column.
- apply_add_months - add_months(date_column, n) - add n months to each date.
- apply_aes_decrypt - AES decrypt (PySpark aes_decrypt). Input hex(nonce||ciphertext). Returns null on failure.
- apply_aes_encrypt - AES encrypt (PySpark aes_encrypt). Key as string; uses AES-128-GCM. Output hex.
- apply_array_append - Append element to end of each list (PySpark array_append).
- apply_array_distinct_first_order - Distinct elements in list preserving first-occurrence order (PySpark array_distinct parity).
- apply_array_except - Elements in first array not in second (PySpark array_except).
- apply_array_flatten - Flatten list-of-lists to a single list per row (PySpark flatten).
- apply_array_insert - Insert element at 1-based position (PySpark array_insert). Negative pos = from end.
- apply_array_intersect - Elements in both arrays (PySpark array_intersect). Distinct.
- apply_array_prepend - Prepend element to start of each list (PySpark array_prepend).
- apply_array_repeat - Repeat each element n times (PySpark array_repeat). Supports both: (1) scalar column (string, int, etc.) - create array of n copies; (2) List column - repeat each element within the list.
- apply_array_union - Distinct elements from both arrays (PySpark array_union).
- apply_arrays_overlap - True if two arrays have any element in common (PySpark arrays_overlap).
- apply_arrays_zip - Zip two arrays into array of structs (PySpark arrays_zip).
- apply_ascii - ASCII value of first character (PySpark ascii). Returns Int32.
- apply_asin - Apply asin to a float column.
- apply_asinh - Apply asinh to a float column.
- apply_assert_true - Assert that all boolean values are true (PySpark assert_true). PySpark: returns null when input is true; throws when input is false or null. When err_msg is Some, it is used in the error message when the assertion fails.
- apply_atan - Apply atan to a float column.
- apply_atan2 - Apply atan2(y, x) to two float columns.
- apply_atanh - Apply atanh to a float column.
- apply_base64 - Base64 encode string bytes (PySpark base64). Input string UTF-8, output base64 string.
- apply_bin - Apply bin: integer to binary string (PySpark bin).
- apply_bit_and - Apply bitwise AND for two integer columns (PySpark bit_and).
- apply_bit_count - Apply bit_count: count set bits in integer (PySpark bit_count).
- apply_bit_or - Apply bitwise OR for two integer columns (PySpark bit_or).
- apply_bit_xor - Apply bitwise XOR for two integer columns (PySpark bit_xor).
- apply_bitmap_construct_agg - Build one bitmap from a list of bit positions (0..32767). Used after implode for bitmap_construct_agg. See the bitmap sketch under Examples.
- apply_bitmap_count - Count set bits in a bitmap (binary column). PySpark bitmap_count.
- apply_bitmap_or_agg - Bitwise OR of a list of bitmaps (binary). Used after implode for bitmap_or_agg.
- apply_bround - Apply bround (banker’s rounding) to a float column.
- apply_cbrt - Apply cbrt to a float column.
- apply_char - Int column to single-character string (PySpark char / chr). Valid codepoint only.
- apply_conv - Apply conv (base conversion). String: parse from from_base, format in to_base. Int: format value in to_base. See the conv sketch under Examples.
- apply_convert_timezone - convert_timezone(source_tz, target_tz, ts_col) - convert between timezones. Same instant.
- apply_cos - Apply cos (radians) to a float column.
- apply_cosh - Hyperbolic and inverse hyperbolic / extra math.
- apply_cot - Apply cot (1/tan) to a float column.
- apply_crc32 - Apply CRC32 to string bytes (PySpark crc32).
- apply_csc - Apply csc (1/sin) to a float column.
- apply_date_from_unix_date - date_from_unix_date(column) - days since epoch to date.
- apply_dayname - dayname(date_col) - weekday name “Mon”, “Tue”, … (PySpark dayname).
- apply_decode - Decode binary (hex string) to string (PySpark decode). Charset: UTF-8.
- apply_degrees - Apply degrees (radians -> degrees) to a float column.
- apply_encode - Encode string to binary (PySpark encode). Charset: UTF-8, hex. Returns hex string representation of bytes.
- apply_expm1 - Apply expm1 to a float column.
- apply_factorial - factorial(column) - element-wise factorial.
- apply_find_in_set - Find 1-based index of str in comma-delimited set (PySpark find_in_set). Returns 0 if not found or if str contains a comma. map_many: columns[0]=str, columns[1]=set. See the find_in_set sketch under Examples.
- apply_format_number - Format numeric column as string with fixed decimal places (PySpark format_number).
- apply_format_string - Format columns with printf-style format string (PySpark format_string / printf). Supports %s, %d, %i, %f, %g, %%. Null in any column yields null result.
- apply_from_csv - from_csv(str_col, schema) - parse CSV string to struct (PySpark from_csv). Minimal: split by comma, up to 32 columns.
- apply_from_unixtime - from_unixtime(column, format?) - seconds since epoch to formatted string.
- apply_from_utc_timestamp - from_utc_timestamp(ts_col, tz) - interpret ts as UTC, convert to tz. Timestamps stored as UTC micros; instant unchanged.
- apply_get - Get value for key from map, or null (PySpark get).
- apply_getbit - Apply getbit: get bit at 0-based position (PySpark getbit).
- apply_greatest2 - Element-wise max of two columns (for greatest). Supports Float64, Int64, String.
- apply_hash_one - Hash one column (PySpark hash) - uses Murmur3 32-bit for parity with PySpark. See the Murmur3 sketch under Examples.
- apply_hash_struct - Hash a struct (multiple columns combined) - PySpark hash (Murmur3).
- apply_hex - Apply hex: integer or string to hex string (PySpark hex).
- apply_hour - hour(column) - extract hour (0-23). Accepts string timestamp column (#403).
- apply_json_array_length - json_array_length(json_str, path) - length of JSON array at path (PySpark json_array_length).
- apply_json_object_keys - json_object_keys(json_str) - return list of keys of JSON object (PySpark json_object_keys).
- apply_json_tuple - json_tuple(json_str, key1, key2, …) - extract keys from JSON; returns struct with one field per key (PySpark json_tuple).
- apply_least2 - Element-wise min of two columns (for least).
- apply_levenshtein - Levenshtein distance between two string columns (element-wise). See the Levenshtein sketch under Examples.
- apply_log2 - Apply log2 to a float column.
- apply_log1p - Apply log1p to a float column.
- apply_log10 - Apply log10 to a float column.
- apply_make_date - make_date(year, month, day) - three columns to date.
- apply_make_timestamp - make_timestamp(year, month, day, hour, min, sec, timezone?) - six columns to timestamp (micros). When timezone is Some(tz_str), components are interpreted as local time in that zone, then converted to UTC.
- apply_map_concat - Merge two map columns (PySpark map_concat). Last value wins for duplicate keys.
- apply_map_contains_key - True if map contains key (PySpark map_contains_key).
- apply_map_from_arrays - Build map (list of structs {key, value}) from two list columns. PySpark map_from_arrays.
- apply_map_zip_to_struct - Merge two maps into List(Struct{key, value1, value2}) for map_zip_with. Union of keys.
- apply_md5 - MD5 hash of string bytes, return hex string (PySpark md5).
- apply_minute - minute(column) - extract minute. Accepts string timestamp column (#403).
- apply_months_between - months_between(end, start, round_off) - returns fractional number of months. When round_off is true, rounds to 8 decimal places (PySpark default).
- apply_next_day
- apply_parse_url - parse_url(url_str, part, key) - extract URL component (PySpark parse_url). When part is QUERY/QUERYSTRING and key is Some(k), returns the value for that query parameter only.
- apply_pmod - pmod(dividend, divisor) - positive modulus. See the pmod sketch under Examples.
- apply_pyspark_add - PySpark-style addition with string/number coercion for Python Column operators.
- apply_pyspark_divide - PySpark-style true division with string/number coercion for Python Column operators. Division by zero yields null (Spark/PySpark parity; issue #218).
- apply_pyspark_mod - PySpark-style modulo with string/number coercion for Python Column operators.
- apply_pyspark_multiply - PySpark-style multiplication with string/number coercion for Python Column operators.
- apply_pyspark_subtract - PySpark-style subtraction with string/number coercion for Python Column operators.
- apply_radians - Apply radians (degrees -> radians) to a float column.
- apply_rand_with_seed - Apply rand: uniform [0, 1) per row, with optional seed (PySpark rand).
- apply_randn_with_seed - Apply randn: standard normal per row, with optional seed (PySpark randn).
- apply_regexp_extract_lookaround - regexp_extract using fancy-regex when the pattern has lookahead/lookbehind (PySpark parity). Polars str().extract() uses the regex crate, which does not support lookaround. See the look-around sketch under Examples.
- apply_regexp_instr - Regexp instr: 1-based position of first regex match (PySpark regexp_instr). group_idx: 0 = full match, 1+ = capture group. Returns null if no match.
- apply_rint - Apply rint to a float column.
- apply_round - Apply round to given decimal places. Supports numeric and string columns (PySpark parity: string columns containing numeric values are implicitly cast to double then rounded).
- apply_sec - Apply sec (1/cos) to a float column.
- apply_second - second(column) - extract second. Accepts string timestamp column (#403).
- apply_sequence - Build array [start, start+step, …] up to but not past stop (PySpark sequence). Input column is a struct with fields “0”=start, “1”=stop, “2”=step (step optional, default 1). See the sequence sketch under Examples.
- apply_sha1 - SHA1 hash of string bytes, return hex string (PySpark sha1).
- apply_sha2 - SHA2 hash of string bytes, return hex string (PySpark sha2). bit_length 256, 384, or 512.
- apply_shift_right_unsigned - shiftRightUnsigned - logical right shift for i64 (PySpark shiftRightUnsigned). See the integer-helpers sketch under Examples.
- apply_shuffle - Random permutation of list elements (PySpark shuffle). Uses rand::seq::SliceRandom.
- apply_signum - Apply signum (-1, 0, or 1) to a numeric column.
- apply_sin - Apply sin (radians) to a float column.
- apply_sinh - Apply sinh to a float column.
- apply_soundex - Apply soundex to a string column; returns a new Column (Series). See the Soundex sketch under Examples.
- apply_split_part_regex - Split string by regex and return 1-based part (for split_part with regex delimiter).
- apply_split_with_limit - Split string by delimiter into at most limit parts; remainder in last part (PySpark split with limit). Returns List(String). When limit <= 0, splits without limit. See the split sketch under Examples.
- apply_str_to_map - Parse string to map: “k1:v1,k2:v2” -> List(Struct{key, value}) (PySpark str_to_map). See the str_to_map sketch under Examples.
- apply_string_to_boolean - Apply string-to-boolean cast. Handles string columns; passes through boolean; numeric types (0/0.0 -> false, non-zero -> true for PySpark parity #399); null for others (try_cast) or error (cast).
- apply_string_to_date - Apply string-to-date cast. Handles string columns (accepts date and datetime strings, Spark parity); passes through date; casts datetime to date; others error (cast) or null (try_cast).
- apply_string_to_date_format - Apply string-to-date with optional format (PySpark to_date(col, format)). When format is None, uses default parsing; when Some, parses with the given format.
- apply_string_to_double - Apply string-to-double cast. Handles string columns: empty/invalid -> null (Spark parity); passes through numeric columns; others error (strict) or null.
- apply_string_to_int - Apply string-to-int cast. Handles string columns: empty/invalid -> null (Spark parity); passes through int columns; others error (strict) or null.
- apply_struct_with_field - Replace or add a struct field (PySpark withField). Used because Polars 0.53+ no longer accepts “*” in with_fields.
- apply_tan - Apply tan (radians) to a float column.
- apply_tanh - Apply tanh to a float column.
- apply_to_binary - to_binary(expr, fmt): PySpark to_binary. fmt ‘utf-8’ => hex(utf8 bytes), ‘hex’ => validate and return hex. Returns hex string.
- apply_to_csv - to_csv(struct_col) - format struct as CSV string (PySpark to_csv). Minimal: uses struct cast to string.
- apply_to_timestamp_format - to_timestamp(column, format?) / try_to_timestamp(column, format?) - string to timestamp. When format is Some, parses with that format (PySpark-style, mapped to chrono); when None, uses the default. Strips whitespace from string values before parsing (PySpark parity #273). strict: true for to_timestamp (error on invalid), false for try_to_timestamp (null on invalid).
- apply_to_timestamp_ltz_format - Parse string as timestamp in local timezone, return UTC micros (PySpark to_timestamp_ltz).
- apply_to_timestamp_ntz_format - Parse string as timestamp without timezone (PySpark to_timestamp_ntz). Returns Datetime(_, None).
- apply_to_utc_timestamp - to_utc_timestamp(ts_col, tz) - interpret ts as in tz, convert to UTC. For UTC-stored timestamps, instant unchanged.
- apply_try_add - try_add: returns null on overflow. See the integer-helpers sketch under Examples.
- apply_try_aes_decrypt - try_aes_decrypt: same as aes_decrypt, returns null on failure (PySpark try_aes_decrypt).
- apply_try_multiply - try_multiply: returns null on overflow.
- apply_try_subtract - try_subtract: returns null on overflow.
- apply_try_to_binary - try_to_binary: like to_binary but returns null on failure.
- apply_typeof - typeof: return dtype as string (PySpark typeof).
- apply_unbase64 - Base64 decode to string (PySpark unbase64). Output UTF-8 string; invalid decode → null.
- apply_unhex - Apply unhex: hex string to binary/string (PySpark unhex).
- apply_unix_date - unix_date(column) - date to days since 1970-01-01.
- apply_unix_timestamp - unix_timestamp(column, format?) - parse string to seconds since epoch.
- apply_url_decode - url_decode(column) - percent-decode URL-encoded string (PySpark url_decode).
- apply_url_encode - url_encode(column) - percent-encode string for URL (PySpark url_encode).
- apply_weekday - weekday(date_col) - 0=Mon, 6=Sun (PySpark weekday).
- apply_xxhash64 - Apply XXH64 hash (PySpark xxhash64).
- apply_zip_arrays_to_struct - Zip two array columns into List(Struct{left, right}) for zip_with. Shorter padded with null.
- series_rand_n - Build a Series of n uniform [0, 1) values with optional seed (for with_column PySpark-like rand).
- series_randn_n - Build a Series of n standard normal values with optional seed (for with_column PySpark-like randn).
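Examples

The sketches below reconstruct, in plain Rust, the semantics that several entries above document. They are illustrative only: the function names and signatures are invented for the examples and are not this crate's API.

The bitmap helpers (apply_bitmap_construct_agg, apply_bitmap_or_agg, apply_bitmap_count) amount to bit-setting, byte-wise OR, and popcount over a 4096-byte buffer. A minimal sketch, assuming least-significant-bit-first ordering within each byte (the real bit order is not documented here):

```rust
const BITMAP_BYTES: usize = 32768 / 8; // 4 KiB covers positions 0..32767

// bitmap_construct_agg-style: set one bit per position.
fn bitmap_construct(positions: &[u16]) -> Vec<u8> {
    let mut bitmap = vec![0u8; BITMAP_BYTES];
    for &pos in positions {
        debug_assert!(pos < 32768);
        bitmap[(pos / 8) as usize] |= 1 << (pos % 8);
    }
    bitmap
}

// bitmap_or_agg-style: byte-wise OR of two bitmaps.
fn bitmap_or(a: &[u8], b: &[u8]) -> Vec<u8> {
    a.iter().zip(b).map(|(x, y)| x | y).collect()
}

// bitmap_count-style: popcount over the whole buffer.
fn bitmap_count(bitmap: &[u8]) -> u32 {
    bitmap.iter().map(|b| b.count_ones()).sum()
}

fn main() {
    let bm = bitmap_or(&bitmap_construct(&[0, 1, 7]), &bitmap_construct(&[7, 100]));
    assert_eq!(bitmap_count(&bm), 4); // positions 0, 1, 7, 100
}
```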
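apply_conv can be sketched as from_str_radix plus a manual digit loop. Treating the parsed value as unsigned 64-bit and emitting uppercase digits mirrors Spark's conv, but both details are assumptions here; apply_bin is effectively the to_base = 2 case over an integer column.

```rust
const DIGITS: &[u8] = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Format `v` in `base` (2..=36) with uppercase digits.
fn to_base_string(mut v: u64, base: u32) -> String {
    if v == 0 {
        return "0".to_string();
    }
    let mut out = Vec::new();
    while v > 0 {
        out.push(DIGITS[(v % base as u64) as usize]);
        v /= base as u64;
    }
    out.reverse();
    String::from_utf8(out).unwrap()
}

// conv: parse `s` in from_base, reformat in to_base; None on parse failure.
fn conv(s: &str, from_base: u32, to_base: u32) -> Option<String> {
    let v = i64::from_str_radix(s.trim(), from_base).ok()?;
    Some(to_base_string(v as u64, to_base))
}

fn main() {
    assert_eq!(conv("100", 2, 10).as_deref(), Some("4"));
    assert_eq!(conv("ff", 16, 2).as_deref(), Some("11111111"));
}
```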
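apply_find_in_set is small enough to state exactly as documented: 1-based position within the comma-delimited set, 0 when absent or when the needle itself contains a comma.

```rust
fn find_in_set(s: &str, set: &str) -> i64 {
    if s.contains(',') {
        return 0; // a needle containing a comma can never match one element
    }
    set.split(',')
        .position(|item| item == s)
        .map_or(0, |i| i as i64 + 1) // 1-based; 0 = not found
}

fn main() {
    assert_eq!(find_in_set("b", "abc,b,ab,c,def"), 2);
    assert_eq!(find_in_set("d", "a,b,c"), 0);
    assert_eq!(find_in_set("a,b", "a,b,c"), 0);
}
```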
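apply_hash_one and apply_hash_struct document Murmur3 32-bit for PySpark parity. Below is the reference MurmurHash3 x86_32 over bytes. Note this alone does not reproduce Spark's hashes: Spark seeds with 42, and its variant mixes trailing bytes one at a time with sign extension, so exact parity needs those quirks on top of this sketch.

```rust
// Reference MurmurHash3 x86_32 (Austin Appleby's algorithm).
fn murmur3_32(data: &[u8], seed: u32) -> u32 {
    const C1: u32 = 0xcc9e_2d51;
    const C2: u32 = 0x1b87_3593;
    let mut h = seed;
    let mut chunks = data.chunks_exact(4);
    for chunk in &mut chunks {
        let mut k = u32::from_le_bytes(chunk.try_into().unwrap());
        k = k.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2);
        h = (h ^ k).rotate_left(13).wrapping_mul(5).wrapping_add(0xe654_6b64);
    }
    // Standard tail handling: fold the remaining bytes into one word.
    let rem = chunks.remainder();
    if !rem.is_empty() {
        let mut k = 0u32;
        for (i, &b) in rem.iter().enumerate() {
            k |= (b as u32) << (8 * i);
        }
        k = k.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2);
        h ^= k;
    }
    // Finalization mix.
    h ^= data.len() as u32;
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^ (h >> 16)
}

fn main() {
    // Known vector for the reference algorithm: empty input, seed 0.
    assert_eq!(murmur3_32(b"", 0), 0);
}
```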
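apply_levenshtein is the classic dynamic program; a two-row version over chars:

```rust
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // delete, insert, or substitute - take the cheapest.
            curr[j + 1] = (prev[j + 1] + 1).min(curr[j] + 1).min(prev[j] + cost);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

fn main() {
    assert_eq!(levenshtein("kitten", "sitting"), 3);
}
```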
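apply_pmod reduces to one expression, with a zero divisor mapping to a null cell in the spirit of the division-by-zero rule documented for apply_pyspark_divide:

```rust
// pmod: remainder adjusted to take the sign of the divisor.
// A zero divisor yields None, i.e. a null cell.
fn pmod(dividend: i64, divisor: i64) -> Option<i64> {
    if divisor == 0 {
        return None;
    }
    Some(((dividend % divisor) + divisor) % divisor)
}

fn main() {
    assert_eq!(pmod(-7, 3), Some(2)); // plain % would give -1
    assert_eq!(pmod(7, 3), Some(1));
    assert_eq!(pmod(7, 0), None);
}
```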
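apply_regexp_extract_lookaround exists because the regex crate rejects look-around assertions, while the fancy-regex crate accepts them. A sketch using fancy-regex's public API (the helper's real signature is not shown here):

```rust
use fancy_regex::Regex; // requires the fancy-regex crate

// Extract capture group `idx` of the first match, or None.
fn regexp_extract(text: &str, pattern: &str, idx: usize) -> Option<String> {
    let re = Regex::new(pattern).ok()?;
    let caps = re.captures(text).ok()??; // Result<Option<Captures>, Error>
    caps.get(idx).map(|m| m.as_str().to_string())
}

fn main() {
    // Look-behind: digits preceded by '$' - a pattern the regex crate rejects.
    assert_eq!(
        regexp_extract("price: $42", r"(?<=\$)(\d+)", 1),
        Some("42".to_string())
    );
}
```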
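apply_sequence's "up to but not past stop" rule, with the documented default step of 1:

```rust
// Build [start, start+step, ...] without passing `stop`.
// A zero step is rejected here; the real helper may handle it differently.
fn sequence(start: i64, stop: i64, step: Option<i64>) -> Option<Vec<i64>> {
    let step = step.unwrap_or(1); // documented default
    if step == 0 {
        return None;
    }
    let mut out = Vec::new();
    let mut v = start;
    while (step > 0 && v <= stop) || (step < 0 && v >= stop) {
        out.push(v);
        v += step;
    }
    Some(out)
}

fn main() {
    assert_eq!(sequence(1, 7, Some(2)), Some(vec![1, 3, 5, 7]));
    assert_eq!(sequence(5, 1, Some(-2)), Some(vec![5, 3, 1]));
}
```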
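The integer helpers flagged above (apply_shift_right_unsigned and the try_* arithmetic) come down to an unsigned reinterpretation and checked arithmetic. Masking the shift count to 0..63, as Java's >>> does for longs, is an assumption:

```rust
// Logical (zero-filling) right shift for i64: go through u64 so the
// sign bit is not replicated.
fn shift_right_unsigned(v: i64, n: u32) -> i64 {
    ((v as u64) >> (n & 63)) as i64
}

// try_add / try_multiply / try_subtract: checked ops, None (null) on overflow.
fn try_add(a: i64, b: i64) -> Option<i64> {
    a.checked_add(b)
}

fn main() {
    assert_eq!(shift_right_unsigned(-1, 60), 15); // arithmetic shift would give -1
    assert_eq!(try_add(i64::MAX, 1), None);
}
```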
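apply_soundex follows the classic American Soundex scheme: keep the first letter, map the rest to digit classes, drop repeated classes (H and W do not break a repeat, vowels do), and pad to four characters. A sketch of that algorithm; how the crate treats non-ASCII or empty input may differ:

```rust
// Digit classes for American Soundex; 0 means "no code" (vowels, H, W, Y).
fn code(c: char) -> u8 {
    match c.to_ascii_uppercase() {
        'B' | 'F' | 'P' | 'V' => 1,
        'C' | 'G' | 'J' | 'K' | 'Q' | 'S' | 'X' | 'Z' => 2,
        'D' | 'T' => 3,
        'L' => 4,
        'M' | 'N' => 5,
        'R' => 6,
        _ => 0,
    }
}

fn soundex(s: &str) -> Option<String> {
    let mut letters = s.chars().filter(|c| c.is_ascii_alphabetic());
    let first = letters.next()?; // no letters -> None (null cell)
    let mut out = String::new();
    out.push(first.to_ascii_uppercase());
    let mut last = code(first); // the first letter's class also dedups
    for c in letters {
        let u = c.to_ascii_uppercase();
        if u == 'H' || u == 'W' {
            continue; // H/W do not separate duplicate codes
        }
        let k = code(c);
        if k != 0 && k != last {
            out.push((b'0' + k) as char);
            if out.len() == 4 {
                break;
            }
        }
        last = k; // a vowel resets `last`, allowing a repeat
    }
    while out.len() < 4 {
        out.push('0');
    }
    Some(out)
}

fn main() {
    assert_eq!(soundex("Tymczak").as_deref(), Some("T522"));
    assert_eq!(soundex("Ashcraft").as_deref(), Some("A261"));
}
```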
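apply_split_with_limit matches std's splitn for a literal delimiter (PySpark's split treats the delimiter as a regex, which this sketch ignores):

```rust
// At most `limit` parts, remainder in the last; limit <= 0 means no limit.
fn split_with_limit(s: &str, delim: &str, limit: i64) -> Vec<String> {
    if limit <= 0 {
        s.split(delim).map(str::to_owned).collect()
    } else {
        s.splitn(limit as usize, delim).map(str::to_owned).collect()
    }
}

fn main() {
    assert_eq!(split_with_limit("a,b,c,d", ",", 2), ["a", "b,c,d"]);
    assert_eq!(split_with_limit("a,b,c,d", ",", -1), ["a", "b", "c", "d"]);
}
```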
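Finally, apply_str_to_map with its default delimiters (',' between pairs, ':' between key and value) is a pair of splits. Treating an entry without ':' as a null value is an assumption borrowed from PySpark's behavior:

```rust
// "k1:v1,k2:v2" -> [(key, Some(value)), ...]; a missing ':' yields a
// null (None) value for that key.
fn str_to_map(s: &str) -> Vec<(String, Option<String>)> {
    s.split(',')
        .filter(|p| !p.is_empty())
        .map(|pair| match pair.split_once(':') {
            Some((k, v)) => (k.to_string(), Some(v.to_string())),
            None => (pair.to_string(), None),
        })
        .collect()
}

fn main() {
    assert_eq!(
        str_to_map("a:1,b:2"),
        vec![
            ("a".to_string(), Some("1".to_string())),
            ("b".to_string(), Some("2".to_string())),
        ]
    );
}
```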