actio_python_utils.spark_functions.load_xml_to_dataframe

actio_python_utils.spark_functions.load_xml_to_dataframe(self, xml_fn, row_tag, schema=None, load_config_options=None, **kwargs)

Load the specified XML file with PySpark and return it as a DataFrame.

Parameters:
  • self (SparkSession) – The PySpark session to use

  • xml_fn (str) – The path to the data source to load

  • row_tag (str) – The XML tag that delimits records

  • schema (Optional[str], default: None) – The path to an optional XSD schema to validate records

  • load_config_options (Optional[Iterable[tuple[str, str]]], default: None) – Any additional (key, value) config options to use when loading the data

  • **kwargs – Any additional named arguments
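To make the role of row_tag concrete, here is a small, stdlib-only illustration of the idea that each element named by the row tag becomes one record. It is a simplified sketch, not how PySpark or this function actually parse XML; real XML readers also infer types and treat attributes and nested elements differently. The helper name records_by_row_tag is hypothetical.

```python
import xml.etree.ElementTree as ET


def records_by_row_tag(xml_text: str, row_tag: str) -> list[dict[str, str]]:
    """Hypothetical illustration: one dict per <row_tag> element.

    The element's attributes and its direct child elements become fields.
    This only approximates the "row tag" concept; it is not the loader's code.
    """
    root = ET.fromstring(xml_text)
    records = []
    # iter() walks the whole tree (root included), so row tags at any depth are found
    for elem in root.iter(row_tag):
        record = dict(elem.attrib)
        for child in elem:
            record[child.tag] = (child.text or "").strip()
        records.append(record)
    return records


sample = """
<catalog>
  <book id="1"><title>Dune</title><year>1965</year></book>
  <book id="2"><title>Emma</title><year>1815</year></book>
</catalog>
"""

print(records_by_row_tag(sample, "book"))
# [{'id': '1', 'title': 'Dune', 'year': '1965'}, {'id': '2', 'title': 'Emma', 'year': '1815'}]
```

With row_tag="book", the two <book> elements become two rows; the enclosing <catalog> element is not itself a record.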

Return type:

DataFrame

Returns:

The DataFrame loaded from xml_fn
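The load_config_options parameter takes an iterable of (key, value) pairs. One plausible way such pairs could be applied, assuming each pair is forwarded to the reader with PySpark's DataFrameReader.option(key, value) after the row tag is set as the "rowTag" option, is sketched below. RecordingReader and apply_xml_options are hypothetical names, and the stand-in reader only records options so the sketch runs without Spark; the library's actual implementation may differ.

```python
from typing import Iterable, Optional


class RecordingReader:
    """Hypothetical stand-in for pyspark.sql.DataFrameReader that only records options."""

    def __init__(self) -> None:
        self.options: dict[str, str] = {}

    def option(self, key: str, value: str) -> "RecordingReader":
        # Like DataFrameReader.option, return the reader so calls can be chained
        self.options[key] = value
        return self


def apply_xml_options(
    reader: RecordingReader,
    row_tag: str,
    load_config_options: Optional[Iterable[tuple[str, str]]] = None,
) -> RecordingReader:
    """Hypothetical helper: set rowTag, then forward each (key, value) pair in order."""
    reader = reader.option("rowTag", row_tag)
    # None means "no extra options"; a later pair with the same key overrides an earlier one
    for key, value in load_config_options or ():
        reader = reader.option(key, value)
    return reader


r = apply_xml_options(RecordingReader(), "book", [("mode", "DROPMALFORMED")])
print(r.options)  # {'rowTag': 'book', 'mode': 'DROPMALFORMED'}
```

With a real session, the same loop would presumably run on something like self.read.format("xml") before calling .load(xml_fn), assuming an XML data source is available to Spark.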