actio_python_utils.spark_functions.count_distinct_values
- actio_python_utils.spark_functions.count_distinct_values(self, columns_to_ignore={}, approximate=False)[source]
  Return a new PySpark dataframe with the number of distinct values in each column. Uses pyspark.sql.functions.count_distinct() by default and pyspark.sql.functions.approx_count_distinct() if approximate == True.
  - Parameters:
    - self (DataFrame) – The dataframe to summarize
    - columns_to_ignore (Container[str], default: set()) – An optional set of columns not to summarize
    - approximate (bool, default: False) – Get approximate counts instead of exact counts (faster)
  - Return type:
    DataFrame
  - Returns:
    The new dataframe with counts of distinct values per column
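
A minimal usage sketch, assuming the function can be imported from the module path in the heading and called directly with a dataframe as its first argument (it may also be attached as a DataFrame method in the library); the Spark session setup, example data, and column names are hypothetical and only illustrate the parameters documented above.

```python
from pyspark.sql import SparkSession

# Assumed import path, taken from the heading of this page.
from actio_python_utils.spark_functions import count_distinct_values

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Hypothetical example data: "id" is unique per row, "status" has two distinct values.
df = spark.createDataFrame(
    [(1, "a", "open"), (2, "b", "open"), (3, "b", "closed")],
    ["id", "code", "status"],
)

# Exact distinct counts for every column except "id".
summary = count_distinct_values(df, columns_to_ignore={"id"})
summary.show()

# Approximate counts via approx_count_distinct(), typically faster on large data.
count_distinct_values(df, approximate=True).show()
```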