actio_python_utils.spark_functions.count_distinct_values

actio_python_utils.spark_functions.count_distinct_values(self, columns_to_ignore={}, approximate=False)

Return a new PySpark dataframe containing the number of distinct values in each column. Uses pyspark.sql.functions.count_distinct() by default, or pyspark.sql.functions.approx_count_distinct() when approximate is True.
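To make the result concrete, the following is a plain-Python model of the exact (non-approximate) case, not the library's implementation. It assumes rows are given as dictionaries, that None stands in for a SQL NULL (which a distinct count does not include), and that columns named in columns_to_ignore are simply left out of the result. The helper name count_distinct_values_model and the sample data are hypothetical.

```python
def count_distinct_values_model(rows, columns_to_ignore=frozenset()):
    """Hypothetical plain-Python model of exact per-column distinct counting."""
    # Collect column names in first-seen order, skipping ignored ones.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns and name not in columns_to_ignore:
                columns.append(name)

    counts = {}
    for name in columns:
        # None models a SQL NULL, which a distinct count leaves out.
        distinct = {row[name] for row in rows if row.get(name) is not None}
        counts[name] = len(distinct)
    return counts


# Made-up sample rows, for illustration only.
rows = [
    {"id": 1, "gene": "BRCA1", "chrom": "17"},
    {"id": 2, "gene": "BRCA1", "chrom": "17"},
    {"id": 3, "gene": "TP53", "chrom": None},
]

result = count_distinct_values_model(rows, columns_to_ignore={"id"})
print(result)
```

In the Spark version the counts come back as a dataframe rather than a dictionary, and with approximate=True they are estimates rather than exact values.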

Parameters: