actio_python_utils.spark_functions.count_distinct_values
- actio_python_utils.spark_functions.count_distinct_values(self, columns_to_ignore=set(), approximate=False)
Return a new PySpark dataframe with the number of distinct values in each column. Uses
pyspark.sql.functions.count_distinct()
by default and pyspark.sql.functions.approx_count_distinct()
if approximate == True.
- Parameters:
  - self (DataFrame) – The dataframe to summarize
  - columns_to_ignore (Container[str], default: set()) – An optional set of columns to not summarize
  - approximate (bool, default: False) – Get approximate counts instead of exact (faster)
- Return type:
DataFrame
- Returns:
The new dataframe with counts of distinct values per column
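A short usage sketch based on the signature above; the SparkSession setup, sample data, and column names are illustrative assumptions rather than part of the library. Because the first parameter is named self, the function may also be attached to DataFrame as a method, but it can be called as a plain module-level function as shown here.

```python
from pyspark.sql import SparkSession

from actio_python_utils.spark_functions import count_distinct_values

# Illustrative local session and toy dataframe
spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame(
    [("a", 1), ("a", 2), ("b", 2)],
    ["letter", "number"],
)

# Exact distinct counts for every column
count_distinct_values(df).show()

# Approximate (faster) counts, skipping the "letter" column
count_distinct_values(df, columns_to_ignore={"letter"}, approximate=True).show()
```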