Expand description
A re-implementation of Spark functions
Functions§
- abs
- Computes the absolute value.
- acos
- Computes inverse cosine of the input column.
- acosh
- Computes inverse hyperbolic cosine of the input column.
- add_
months - Returns the date that is months months after start.
- aes_
decrypt - Returns a decrypted value of input using AES in mode with padding
- aes_
encrypt - Returns a encrypted value of input using AES in mode with the specified padding
- any_
value - Returns some value of col for a group of rows.
- approx_
count_ distinct - Returns a new Column for approximate distinct count of column col.
- array
- Creates a new array column.
- array_
agg - Returns a list of objects with duplicates.
- array_
append - Returns an array of the elements in col1 along with the added element in col2 at the last of the array.
- array_
compact - Removes null values from the array.
- array_
contains - Returns null if the array is null, true if the array contains the given value, and false otherwise.
- array_
distinct - Removes duplicate values from the array.
- array_
except - Returns an array of the elements in col1 but not in col2, without duplicates.
- array_
insert - adds an item into a given array at a specified array index.
- array_
intersect - Returns an array of the elements in the intersection of col1 and col2, without duplicates.
- array_
join - Concatenates the elements of column using the delimiter.
- array_
max - Returns the maximum value of the array.
- array_
min - Returns the minimum value of the array.
- array_
position - Locates the position of the first occurrence of the given value in the given array.
- array_
prepend - Returns an array containing element as well as all elements from array.
- array_
remove - Remove all elements that equal to element from the given array.
- array_
repeat - Creates an array containing a column repeated count times.
- array_
size - Returns the total number of elements in the array.
- array_
sort - array_
union - Returns an array of the elements in the union of col1 and col2, without duplicates.
- arrays_
overlap - Returns true if the arrays contain any common non-null element; if not, returns null if both the arrays are non-empty and any of them contains a null element; returns false otherwise.
- arrays_
zip - Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
- asc
- Returns a sort expression based on the ascending order of the given column name.
- asc_
nulls_ first - Returns a sort expression based on the ascending order of the given column name, and null values return before non-null values.
- asc_
nulls_ last - Returns a sort expression based on the ascending order of the given column name, and null values appear after non-null values.
- ascii
- Computes the numeric value of the first character of the string column.
- asin
- Computes inverse sine of the input column.
- asinh
- Computes inverse hyperbolic sine of the input column.
- assert_
true - Returns null if the input column is true; throws an exception with the provided error message otherwise
- atan
- Compute inverse tangent of the input column.
- atan2
- Computes inverse hyperbolic tangent of the input columns.
- atanh
- Computes inverse hyperbolic tangent of the input column.
- avg
- Returns the average of the values in a group.
- base64
- Computes the BASE64 encoding of a binary column and returns it as a string column.
- bin
- Returns the string representation of the binary value of the given column.
- bit_and
- Returns the bitwise AND of all non-null input values, or null if none.
- bit_
count - Returns the number of bits that are set in the argument expr as an unsigned 64-bit integer, or NULL if the argument is NULL.
- bit_get
- Returns the value of the bit (0 or 1) at the specified position.
- bit_
length - Calculates the bit length for the specified string column.
- bit_or
- Returns the bitwise OR of all non-null input values, or null if none.
- bit_xor
- Returns the bitwise XOR of all non-null input values, or null if none.
- bitmap_
bit_ position - Returns the bit position for the given input column.
- bitmap_
bucket_ number - Returns the bucket number for the given input column.
- bitmap_
construct_ agg - Returns a bitmap with the positions of the bits set from all the values from the input column.
- bitmap_
count - Returns the number of set bits in the input bitmap.
- bitmap_
or_ agg - Returns a bitmap that is the bitwise OR of all of the bitmaps from the input column.
- bitwise_
not - Computes bitwise not.
- bool_
and - Returns true if all values of col are true.
- bool_or
- Returns true if at least one value of col is true.
- broadcast
- Marks a DataFrame as small enough for use in broadcast joins.
- bround
- Round the given value to scale decimal places using HALF_EVEN rounding mode if scale >= 0 or at integral part when scale < 0.
- btrim
- bucket
- A transform for any type that partitions by a hash of the input column.
- cardinality
- Returns the length of the array or map stored in the column.
- cbrt
- Computes the cube-root of the given value.
- ceil
- Computes the ceiling of the given value.
- ceiling
- Computes the ceiling of the given value.
- char
- Returns the ASCII character having the binary equivalent to col.
- char_
length - Returns the character length of string data or number of bytes of binary data.
- character_
length - Returns the character length of string data or number of bytes of binary data.
- coalesce
- Returns the first column that is not null.
- col
- Returns a Column based on the given column name.
- collect_
list - Returns a list of objects with duplicates.
- collect_
set - Returns a set of objects with duplicate elements eliminated.
- column
- Returns a Column based on the given column name.
- concat
- Concatenates multiple input columns together into a single column.
- concat_
ws - contains
- Returns a boolean.
- conv
- Convert a number in a string column from one base to another.
- convert_
timezone - Converts the timestamp without time zone sourceTs from the sourceTz time zone to targetTz.
- corr
- Returns a new Column for the Pearson Correlation Coefficient for col1 and col2.
- cos
- Computes cosine of the input column.
- cosh
- Computes hyperbolic cosine of the input column.
- cot
- Computes cotangent of the input column.
- count
- Returns the number of items in a group.
- count_
distinct - count_
if - Returns the number of TRUE values for the col.
- count_
min_ sketch - Returns a count-min sketch of a column with the given esp, confidence and seed.
- covar_
pop - Returns a new Column for the population covariance of col1 and col2.
- covar_
samp - Returns a new Column for the sample covariance of col1 and col2.
- crc32
- Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.
- create_
map - Create a new map column.
- csc
- Computes cosecant of the input column.
- cume_
dist - Returns the cumulative distribution of values within a window partition, i.e.
- curdate
- Returns the current date at the start of query evaluation as a DateType column.
- current_
catalog - Returns the current catalog.
- current_
database - Returns the current database.
- current_
date - Returns the current date at the start of query evaluation as a DateType column.
- current_
schema - Returns the current database.
- current_
timestamp - Returns the current timestamp at the start of query evaluation as a TimestampType column.
- current_
timezone - Returns the current session local timezone.
- current_
user - Returns the current user.
- date_
add - Returns the date that is days days after start.
- date_
diff - Returns the number of days from start to end.
- date_
format - Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.
- date_
from_ unix_ date - Create date from the number of days since 1970-01-01.
- date_
part - Extracts a part of the date/timestamp or interval source.
- date_
sub - Returns the date that is days days before start.
- date_
trunc - Returns timestamp truncated to the unit specified by the format.
- dateadd
- Returns the date that is days days after start.
- datediff
- Returns the number of days from start to end.
- day
- Extract the day of the month of a given date/timestamp as integer.
- dayofmonth
- Extract the day of the month of a given date/timestamp as integer.
- dayofweek
- Extract the day of the week of a given date/timestamp as integer.
- dayofyear
- Extract the day of the year of a given date/timestamp as integer.
- days
- A transform for timestamps and dates to partition data into days.
- decode
- Computes the first argument into a string from a binary using the provided character set (one of ‘US-ASCII’, ‘ISO-8859-1’, ‘UTF-8’, ‘UTF-16BE’, ‘UTF-16LE’, ‘UTF-16’).
- degrees
- Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
- dense_
rank - Returns the rank of rows within a window partition, without any gaps
- desc
- Returns a sort expression based on the descending order of the given column name.
- desc_
nulls_ first - Returns a sort expression based on the descending order of the given column name, and null values appear before non-null values.
- desc_
nulls_ last - Returns a sort expression based on the descending order of the given column name, and null values appear after non-null values.
- e
- Returns Euler’s number.
- element_
at - Returns element of array at given index in extraction if col is array.
- elt
- Returns the n-th input, e.g., returns input2 when n is 2.
- encode
- Computes the first argument into a string from a binary using the provided character set (one of ‘US-ASCII’, ‘ISO-8859-1’, ‘UTF-8’, ‘UTF-16BE’, ‘UTF-16LE’, ‘UTF-16’).
- endswith
- Returns a boolean.
- equal_
null - Returns same result as the EQUAL(=) operator for non-null operands, but returns true if both are null, false if one of the them is null.
- every
- Returns true if all values of col are true.
- exp
- Computes the exponential of the given value.
- explode
- Returns a new row for each element in the given array or map.
- explode_
outer - Returns a new row for each element in the given array or map.
- expm1
- Computes the exponential of the given value minus one.
- expr
- Parses the expression string into the column that it represents
- extract
- Extracts a part of the date/timestamp or interval source.
- factorial
- Computes the factorial of the given value.
- find_
in_ set - Returns the index (1-based) of the given string (str) in the comma-delimited list (strArray).
- first
- Returns the first value in a group.
- first_
value - Returns the first value of col for a group of rows.
- flatten
- Creates a single array from an array of arrays.
- floor
- Computes the floor of the given value.
- format_
number - Formats the number X to a format like ‘#,–#,–#.–’, rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string.
- format_
string - from_
csv - Parses a column containing a CSV string to a row with the specific schema. Returns null in the case of an unparseable string
- from_
json - Parses a column containing a JSON string to a row with the specific schema. Returns null in the case of an unparseable string
- from_
unixtime - Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
- from_
utc_ timestamp - This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE.
- get
- Returns element of array at given (0-based) index.
- get_
json_ object - Extracts json object from a json string based on json path specified, and returns json string of the extracted json object.
- getbit
- Returns the value of the bit (0 or 1) at the specified position.
- greatest
- Returns the greatest value of the list of column names, skipping null values.
- grouping
- Indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.
- grouping_
id - Returns the level of grouping, equals to
- hash
- Calculates the hash code of given columns, and returns the result as an int column.
- hex
- Computes hex value of the given column
- histogram_
numeric - Computes a histogram on numeric ‘col’ using nb bins.
- hll_
sketch_ agg - Returns the updatable binary representation of the Datasketches HllSketch configured with lgConfigK arg.
- hll_
sketch_ estimate - Returns the estimated number of unique values given the binary representation of a Datasketches HllSketch.
- hll_
union - hll_
union_ agg - Returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance.
- hour
- Extract the hours of a given timestamp as integer.
- hours
- A transform for timestamps to partition data into hours.
- hypot
- Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
- ifnull
- Returns col2 if col1 is null, or col1 otherwise.
- ilike
- initcap
- Translate the first letter of each word to upper case in the sentence.
- inline
- Explodes an array of structs into a table.
- inline_
outer - Explodes an array of structs into a table.
- input_
file_ block_ length - Returns the length of the block being read, or -1 if not available.
- input_
file_ block_ start - Returns the start offset of the block being read, or -1 if not available.
- input_
file_ name - Creates a string column for the file name of the current Spark task.
- instr
- Locate the position of the first occurrence of substr column in the given string.
- isnan
- An expression that returns true if the column is NaN.
- isnotnull
- Returns true if col is not null, or false otherwise.
- isnull
- An expression that returns true if the column is null
- java_
method - Calls a method with reflection.
- json_
array_ length - Returns the number of elements in the outermost JSON array.
- json_
object_ keys - Returns all the keys of the outermost JSON object as an array.
- json_
tuple - Creates a new row for a json column according to the given field names.
- kurtosis
- Returns the kurtosis of the values in a group.
- lag
- Returns the value that os offset rows before the current row, and default is there is less than offset rows before the current row
- last
- Returns the last value in a group.
- last_
day - Returns the last day of the month which the given date belongs to.
- last_
value - Returns the last value of col for a group of rows.
- lcase
- Returns str with all characters changed to lowercase.
- lead
- Returns the value that os offset rows after the current row, and default is there is less than offset rows after the current row
- least
- Returns the least value of the list of column names, skipping null values.
- left
- Returns the leftmost len
(
len can be string type) characters from the string str, if len is less or equal than 0 the result is an empty string. - length
- Computes the character length of string data or number of bytes of binary data.
- levenshtein
- like
- lit
- Creates a Column of spark::expression::Literal value.
- ln
- Returns the natural logarithm of the argument.
- localtimestamp
- Returns the current timestamp without time zone at the start of query evaluation as a timestamp without time zone column.
- locate
- log
- Returns the first argument-based logarithm of the second argument.
- log2
- Returns the base-2 logarithm of the argument.
- log1p
- Computes the natural logarithm of the “given value plus one”.
- log10
- Computes the logarithm of the given value in Base 10.
- lower
- Converts a string expression to lower case.
- lpad
- Left-pad the string column to width len with pad.
- ltrim
- Trim the spaces from left end for the specified string value.
- make_
date - Returns a column with a date built from the year, month and day columns.
- make_
dt_ interval - Make DayTimeIntervalType duration from days, hours, mins and secs.
- make_
interval - Make interval from years, months, weeks, days, hours, mins and secs.
- make_
timestamp - Create timestamp from years, months, days, hours, mins, secs and timezone fields.
- make_
timestamp_ ltz - Create the current timestamp with local time zone from years, months, days, hours, mins, secs and timezone fields.
- make_
timestamp_ ntz - Create local date-time from years, months, days, hours, mins, secs fields.
- make_
ym_ interval - Make year-month interval from years, months.
- map_
concat - Returns the union of all the given maps.
- map_
contains_ key - Returns true if the map contains the key.
- map_
entries - Returns an unordered array of all entries in the given map.
- map_
from_ arrays - Creates a new map from two arrays.
- map_
from_ entries - Converts an array of entries (key value struct types) to a map of values.
- map_
keys - Returns an unordered array containing the keys of the map.
- map_
values - Returns an unordered array containing the values of the map.
- max
- Returns the maximum value of the expression in a group.
- max_by
- Returns the value associated with the maximum value of ord.
- md5
- Calculates the MD5 digest and returns the value as a 32 character hex string.
- mean
- returns the average of the values in a group.
- median
- Returns the median of the values in a group
- min
- Returns the minimum value of the expression in a group.
- min_by
- Returns the value associated with the minimum value of ord.
- minute
- Extract the minutes of a given timestamp as integer.
- mode
- Returns the most frequent value in a group.
- monotonically_
increasing_ id - A column that generates monotonically increasing 64-bit integers.
- month
- Extract the month of a given date/timestamp as integer.
- months
- A transform for timestamps and dates to partition data into months.
- months_
between - Returns number of months between dates date1 and date2.
- named_
struct - Creates a struct with the given field names and values.
- nanvl
- Returns col1 if it is not NaN, or col2 if col1 is NaN.
- negate
- Returns the negative value.
- negative
- Returns the negative value.
- next_
day - Returns the first date which is later than the value of the date column based on second week day argument.
- now
- Returns the current timestamp at the start of query evaluation.
- nth_
value - ntile
- Returns the ntile group id (from 1 to n inclusive) in an ordered window partition.
- nullif
- Returns null if col1 equals to col2, or col1 otherwise.
- nvl
- Returns col2 if col1 is null, or col1 otherwise.
- nvl2
- Returns col2 if col1 is not null, or col3 otherwise.
- octet_
length - Calculates the byte length for the specified string column.
- overlay
- parse_
url - percent_
rank - Returns the relative rank
- percentile
- Returns the exact percentile(s) of numeric column expr at the given percentage(s) with value range in [0.0, 1.0].
- percentile_
approx - Returns the approximate percentile of the numeric column col which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value.
- pi
- Returns Pi.
- pmod
- Returns the positive value of dividend mod divisor.
- posexplode
- Returns a new row for each element with position in the given array or map.
- posexplode_
outer - Returns a new row for each element with position in the given array or map.
- position
- positive
- Returns the value.
- pow
- Returns the value of the first argument raised to the power of the second argument.
- power
- Returns the value of the first argument raised to the power of the second argument.
- printf
- product
- Returns the product of the values in a group.
- quarter
- Extract the quarter of a given date/timestamp as integer.
- radians
- Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
- raise_
error - Throws an exception with the provided error message.
- rand
- Generates a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).
- randn
- Generates a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
- rank
- Returns the rank of rows within a window partition.
- reflect
- Calls a method with reflection.
- regexp
- Returns true if str matches the Java regex regexp, or false otherwise.
- regexp_
count - Returns a count of the number of times that the Java regex pattern regexp is matched in the string str.
- regexp_
extract - regexp_
extract_ all - regexp_
instr - regexp_
like - Returns true if str matches the Java regex regexp, or false otherwise.
- regexp_
replace - Replace all substrings of the specified string value that match regexp with replacement
- regexp_
substr - Returns the substring that matches the Java regex regexp within the string str.
- regr_
avgx - Returns the average of the independent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- regr_
avgy - Returns the average of the dependent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- regr_
count - Returns the number of non-null number pairs in a group, where y is the dependent variable and x is the independent variable.
- regr_
intercept - Returns the intercept of the univariate linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- regr_r2
- Returns the coefficient of determination for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- regr_
slope - Returns the slope of the linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- regr_
sxx - Returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- regr_
sxy - Returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- regr_
syy - Returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
- repeat
- Repeats a string column n times, and returns it as a new string column.
- replace
- reverse
- Returns a reversed string or an array with reverse order of elements.
- right
- Returns the rightmost len
(
len can be string type) characters from the string str, if len is less or equal than 0 the result is an empty string. - rint
- Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
- rlike
- Returns true if str matches the Java regex regexp, or false otherwise.
- round
- Round the given value to scale decimal places using HALF_UP rounding mode if scale >= 0 or at integral part when scale < 0.
- row_
number - Returns a sequential number starting at 1 within a window partition.
- rpad
- Right-pad the string column to width len with pad.
- rtrim
- Trim the spaces from right end for the specified string value.
- schema_
of_ csv - Parses a CSV string and infers its schema in DDL format
- schema_
of_ json - Parses a JSON string and infers its schema in DDL format
- sec
- Computes secant of the input column.
- second
- Extract the seconds of a given date as integer.
- sentences
- Splits a string into arrays of sentences, where each sentence is an array of words
- sequence
- Generate a sequence of integers from start to stop, incrementing by step.
- session_
window - Generates session window given a timestamp specifying column.
- sha
- Returns a sha1 hash value as a hex string of the col.
- sha1
- Returns the hex string result of SHA-1.
- sha2
- Returns the hex string result of SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512).
- shiftleft
- Shift the given value numBits left.
- shiftright
- (Signed) shift the given value numBits right.
- shiftrightunsigned
- (Signed) shift the given value numBits right.
- shuffle
- Generates a random permutation of the given array.
- sign
- Computes the signum of the given value.
- signum
- Computes the signum of the given value.
- sin
- Computes sine of the input column.
- sinh
- Computes hyperbolic sine of the input column.
- size
- Returns the length of the array or map stored in the column.
- skewness
- Returns the skewness of the values in a group.
- slice
- Returns an array containing all the elements in x from index start (array indices start at 1, or from the end if start is negative) with the specified length.
- some
- Returns true if at least one value of col is true.
- sort_
array - Sorts the input array in ascending or descending order according to the natural ordering of the array elements.
- soundex
- Returns the SoundEx encoding for a string
- spark_
partition_ id - A column for partition ID.
- split
- Splits str around matches of the given pattern.
- split_
part - Splits str by delimiter and return requested part of the split (1-based).
- sqrt
- Computes the square root of the specified float value.
- stack
- Separates col1, …, colk into n rows
- startswith
- Returns a boolean.
- std
- Alias for stddev_samp.
- stddev
- Alias for stddev_samp.
- stddev_
pop - Returns population standard deviation of the expression in a group.
- stddev_
samp - Returns the unbiased sample standard deviation of the expression in a group.
- str_
to_ map - Create a map after splitting the text into key/value pairs using delimiters
- struct_
col - Creates a new struct column.
- substr
- substring
- Substring starts at pos and is of length len when str is String type or returns the slice of byte array that starts at pos in byte and is of length len when str is Binary type.
- substring_
index - Returns the substring from string str before count occurrences of the delimiter delim.
- sum
- Returns the sum of all values in the expression.
- sum_
distinct - Returns the sum of distinct values in the expression.
- tan
- Computes tangent of the input column.
- tanh
- Computes hyperbolic tangent of the input column.
- timestamp_
micros - Creates timestamp from the number of microseconds since UTC epoch.
- timestamp_
millis - Creates timestamp from the number of milliseconds since UTC epoch.
- timestamp_
seconds - Converts the number of seconds from the Unix epoch (1970-01-01T00:00:00Z) to a timestamp.
- to_
binary - to_char
- Convert col to a string based on the format.
- to_csv
- Converts a column containing a StructType into a CSV string
- to_date
- Parses the timestamp with the format to a timestamp without time zone.
- to_json
- Converts a column containing a StructType into a JSON string
- to_
number - Convert string ‘col’ to a number based on the string format ‘format’
- to_
timestamp - Converts a Column into pyspark.sql.types.TimestampType using the optionally specified format.
- to_
timestamp_ ltz - Parses the timestamp with the format to a timestamp without time zone.
- to_
timestamp_ ntz - Parses the timestamp with the format to a timestamp without time zone.
- to_
unix_ timestamp - Returns the UNIX timestamp of the given time.
- to_
utc_ timestamp - This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE.
- to_
varchar - Convert col to a string based on the format.
- translate
- A function translate any character in the srcCol by a character in matching.
- trim
- Trim the spaces from both ends for the specified string column.
- trunc
- Returns date truncated to the unit specified by the format.
- try_add
- Returns the sum of left and right and the result is null on overflow.
- try_
aes_ decrypt - try_avg
- Returns the mean calculated from values of a group and the result is null on overflow.
- try_
divide - Returns dividend/divisor.
- try_
element_ at - Returns element of array at given (1-based) index.
- try_
multiply - Returns left*right and the result is null on overflow.
- try_
subtract - Returns left-right and the result is null on overflow.
- try_sum
- Returns sum calculated from values of a group and the result is null on overflow.
- try_
to_ binary - try_
to_ number - Convert string ‘col’ to a number based on the string format ‘format’
- try_
to_ timestamp - ucase
- Returns str with all characters changed to uppercase.
- unbase64
- Decodes a BASE64 encoded string column and returns it as a binary column.
- unhex
- Inverse of hex.
- unix_
date - Returns the number of days since 1970-01-01.
- unix_
micros - Returns the number of microseconds since 1970-01-01 00:00:00 UTC.
- unix_
millis - Returns the number of milliseconds since 1970-01-01 00:00:00 UTC.
- unix_
seconds - Returns the number of seconds since 1970-01-01 00:00:00 UTC.
- unix_
timestamp - Convert time string with given pattern (‘yyyy-MM-dd HH:mm:ss’, by default) to Unix time stamp (in seconds), using the default timezone and the default locale, returns null if failed.
- upper
- Converts a string expression to upper case.
- url_
decode - Decodes a str in ‘application/x-www-form-urlencoded’ format using a specific encoding scheme.
- url_
encode - Translates a string into ‘application/x-www-form-urlencoded’ format using a specific encoding scheme.
- user
- Returns the current database.
- var_pop
- Returns the population variance of the values in a group.
- var_
samp - Returns the unbiased sample variance of the values in a group.
- variance
- Alias for var_samp
- version
- Returns the Spark version.
- weekday
- Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, …, 6 = Sunday).
- weekofyear
- Extract the week number of a given date as integer.
- width_
bucket - Returns the bucket number into which the value of this expression would fall after being evaluated.
- window
- Bucketize rows into one or moe time windows given a timestamp specifying column.
- window_
time - Computes the event time from a window column.
- xpath
- Returns a string array of values within the nodes of xml that match the XPath expression.
- xpath_
boolean - Returns true if the XPath expression evaluates to true, or if a matching node is found.
- xpath_
double - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
- xpath_
float - Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
- xpath_
int - Returns an integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
- xpath_
long - Returns a long integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
- xpath_
number - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
- xpath_
short - Returns a short integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
- xpath_
string - Returns the text contents of the first xml node that matches the XPath expression.
- xxhash64
- Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.
- year
- Extract the year of a given date/timestamp as integer.
- years
- A transform for timestamps and dates to partition data into years.