actio_python_utils.spark_functions.serialize_field

actio_python_utils.spark_functions.serialize_field(self, column, new_column=None, struct_columns_to_use=None)

Operates on a PySpark dataframe and converts a column of atoms or structs, or an array of either (but not nested arrays), to a properly formatted string for PostgreSQL's TEXT loading format, assigning the result the column name new_column. If new_column is not specified, the original column is overwritten. N.B. All string types must be pyspark.sql.types.StringType, not pyspark.sql.types.CharType or pyspark.sql.types.VarcharType.

Parameters:
  • self (DataFrame) – The dataframe to use

  • column (str) – The name of the column to serialize

  • new_column (Optional[str], default: None) – The name to give the new serialized column

  • struct_columns_to_use (Optional[Container], default: None) – The struct fields to include (assuming column is a struct)

Return type:

DataFrame

Returns:

A new dataframe in which the serialized column is stored as new_column, or replaces the original column if new_column is not given
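To illustrate the target output, the sketch below shows (in plain Python, without PySpark) how atoms, arrays, and structs are conventionally rendered in PostgreSQL's TEXT COPY format. This is a hypothetical illustration of the format serialize_field targets, not the library's actual implementation; the helper name pg_serialize is invented for this example.

```python
def pg_serialize(value):
    """Render an atom, a list (array), or a dict (struct) as a
    PostgreSQL TEXT-format string.  This mirrors the shapes
    serialize_field supports: atoms, structs, and flat arrays.
    NOTE: illustrative sketch only, not the library's code."""
    if value is None:
        # NULLs in TEXT COPY format are written as \N
        return r"\N"
    if isinstance(value, list):
        # Array literals use curly braces; NULL elements are spelled NULL
        return "{" + ",".join("NULL" if v is None else str(v) for v in value) + "}"
    if isinstance(value, dict):
        # Composite (struct) literals use parentheses; NULL fields are empty
        return "(" + ",".join("" if v is None else str(v) for v in value.values()) + ")"
    return str(value)
```

For example, pg_serialize([1, None, 3]) yields "{1,NULL,3}" and pg_serialize({"a": 1, "b": None}) yields "(1,)". A production serializer would additionally need to quote and escape delimiters, backslashes, and special characters, which this sketch omits.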