actio_python_utils.spark_functions
Spark-related functionality.
Functions
- Return a PySpark dataframe with …
- Converts either a list of dicts (…
- Return a PySpark dataframe with the number of times a given string occurs in each string column of a dataframe.
- Return a new PySpark dataframe with the number of distinct values in each column.
- Return a PySpark dataframe with the number of null values in each column of a dataframe.
- Load and return the specified data source using PySpark.
- Return a PySpark dataframe from either a relation or a query.
- Load and return the specified Excel spreadsheet with PySpark.
- Load and return the specified XML file with PySpark.
- Serializes a …
- Serializes a …
- Operates on a PySpark dataframe, converting any field of atoms or structs, or any non-nested array of either, to a properly formatted string for the PostgreSQL TEXT loading format, assigning it the column name new_column.
- Serializes a …
- Serializes a …
- Configures and creates a PySpark session according to the supplied arguments.
- Split a dataframe with PySpark into a set of gzipped CSVs, e.g. if a dataframe has data:

      col1,col2,col3
      1,1,1
      1,2,3
      2,1,1