actio_python_utils.spark_functions.serialize_field
- actio_python_utils.spark_functions.serialize_field(self, column, new_column=None, struct_columns_to_use=None)
Operates on a PySpark dataframe and converts any field of atoms or structs, or any non-nested array of either of those, to a properly formatted string for the PostgreSQL TEXT loading format, assigning it the column name new_column. If new_column is not specified, the original column is overwritten. N.B. All string types should be pyspark.sql.types.StringType as opposed to pyspark.sql.types.CharType or pyspark.sql.types.VarcharType.
- Parameters:
  - self (DataFrame) – The dataframe to use
  - column (str) – The name of the column to serialize
  - new_column (Optional[str], default: None) – The name to give the new serialized column
  - struct_columns_to_use (Optional[Container], default: None) – A set of struct values to use (assuming column is a struct)
- Return type:
DataFrame
- Returns:
A new dataframe with the original column replaced by (or, if new_column is given, augmented with) a serialized one
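To illustrate the target representation, the sketch below shows what PostgreSQL's TEXT loading (COPY) format looks like for atoms, NULLs, and flat arrays. The helper serialize_value is hypothetical and written in plain Python; it is not the library's implementation of serialize_field, only a minimal model of the string format that function produces, assuming standard COPY TEXT conventions (\N for NULL, {...} literals for arrays).

```python
def serialize_value(value):
    """Hypothetical sketch: serialize an atom or a flat (non-nested) list
    to a string in PostgreSQL COPY TEXT format."""
    if value is None:
        # COPY TEXT format represents NULL as \N
        return r"\N"
    if isinstance(value, list):
        # Flat arrays use the {elem,elem,...} literal syntax;
        # NULL elements inside an array are spelled NULL
        return "{" + ",".join(
            "NULL" if v is None else str(v) for v in value
        ) + "}"
    # Atoms serialize as their plain string form
    return str(value)

print(serialize_value(None))          # \N
print(serialize_value([1, None, 3]))  # {1,NULL,3}
print(serialize_value("abc"))         # abc
```

In the real function, this per-value logic would run inside Spark over the whole column, producing a StringType column named new_column (or overwriting column) ready for PostgreSQL TEXT loading.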