actio_python_utils.spark_functions.serialize_field

actio_python_utils.spark_functions.serialize_field(self, column, new_column=None, struct_columns_to_use=None)

Operates on a PySpark dataframe and converts a column of atoms or structs, or an array of either (but not nested arrays), to a properly formatted string for PostgreSQL's TEXT loading format, assigning the result the column name new_column. If new_column is not specified, the original column is overwritten. N.B. All string types must be pyspark.sql.types.StringType, not pyspark.sql.types.CharType or pyspark.sql.types.VarcharType.

Parameters:
  • self (DataFrame) – The dataframe to use

  • column (str) – The name of the column to serialize

  • new_column (Optional[str], default: None) – The name to give the new serialized column

  • struct_columns_to_use (Optional[Container], default: None) – The struct fields to include (assuming column is a struct)

Return type:

DataFrame

Returns:

A new dataframe in which the serialized column is stored as new_column, or replaces the original column if new_column is not given
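To illustrate the target output, the sketch below shows (in plain Python, without PySpark) how atoms, arrays, and structs are conventionally rendered in PostgreSQL's TEXT COPY format. This is a hypothetical illustration of the format serialize_field targets, not the library's actual implementation; the helper name pg_serialize is invented for this example.

```python
def pg_serialize(value):
    """Render an atom, a list (array), or a dict (struct) as a
    PostgreSQL TEXT-format string.  This mirrors the shapes
    serialize_field supports: atoms, structs, and flat arrays.
    NOTE: illustrative sketch only, not the library's code."""
    if value is None:
        # NULLs in TEXT COPY format are written as \N
        return r"\N"
    if isinstance(value, list):
        # Array literals use curly braces; NULL elements are spelled NULL
        return "{" + ",".join("NULL" if v is None else str(v) for v in value) + "}"
    if isinstance(value, dict):
        # Composite (struct) literals use parentheses; NULL fields are empty
        return "(" + ",".join("" if v is None else str(v) for v in value.values()) + ")"
    return str(value)
```

For example, pg_serialize([1, None, 3]) yields "{1,NULL,3}" and pg_serialize({"a": 1, "b": None}) yields "(1,)". A production serializer would additionally need to quote and escape delimiters, backslashes, and special characters, which this sketch omits.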