actio_python_utils.argparse_functions.EnhancedArgumentParser

class actio_python_utils.argparse_functions.EnhancedArgumentParser(*args, description='The Sphinx documentation toolchain.', formatter_class=<class 'actio_python_utils.argparse_functions.CustomFormatter'>, use_logging=False, use_database=False, use_spark=False, use_xml=False, use_glow=False, use_spark_db=False, dont_create_db_connection=False, spark_extra_packages=None, **kwargs)[source]

Bases: ArgumentParser

Customized argparse.ArgumentParser that sets the description automatically, uses both the argparse.ArgumentDefaultsHelpFormatter and argparse.RawTextHelpFormatter formatters, and optionally sets up logging, a database connection, and a PySpark session.

Parameters:
  • *args – Optional positional arguments passed to argparse.ArgumentParser() constructor

  • description (str, default: 'The Sphinx documentation toolchain.') – Passed to argparse.ArgumentParser() constructor

  • formatter_class (HelpFormatter, default: <class 'actio_python_utils.argparse_functions.CustomFormatter'>) – The help formatter to use

  • use_logging (bool, default: False) – Adds log level and log format arguments, then sets up logging when parse_args() is called

  • use_database (bool, default: False) – Adds a database service argument, then creates a connection to the specified database with the attribute name db when parse_args() is called

  • use_spark (bool, default: False) – Adds spark cores, spark memory, and spark config arguments, then creates a PySpark session with the attribute name spark when parse_args() is called

  • use_xml (bool, default: False) – Adds dependencies to PySpark to parse XML files; sets use_spark = True

  • use_glow (bool, default: False) – Adds dependencies to PySpark to use glow, e.g. to parse VCF files; sets use_spark = True

  • use_spark_db (bool, default: False) – Adds dependencies to PySpark to connect to a database; sets use_spark = True and, when parse_args() is called, creates an object for establishing a database connection from PySpark with the attribute name spark_db

  • dont_create_db_connection (bool, default: False) – Don’t create a database connection even if use_database = True

  • spark_extra_packages (Optional[Iterable[tuple[str, str]]], default: None) – Adds additional Spark package dependencies to initialize; sets use_spark = True

  • **kwargs – Any additional named arguments

__init__(*args, description='The Sphinx documentation toolchain.', formatter_class=<class 'actio_python_utils.argparse_functions.CustomFormatter'>, use_logging=False, use_database=False, use_spark=False, use_xml=False, use_glow=False, use_spark_db=False, dont_create_db_connection=False, spark_extra_packages=None, **kwargs)[source]
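
A minimal usage sketch (the script details and argument names are hypothetical, and creating the connections assumes PySpark and a configured database service are available):

    from actio_python_utils.argparse_functions import EnhancedArgumentParser

    # Parser with logging, database, and PySpark support enabled
    parser = EnhancedArgumentParser(
        description="Load a dataset into the database.",
        use_logging=True,
        use_database=True,
        use_spark=True,
    )
    parser.add_argument("-i", "--input-file", required=True, help="Input file to load")
    args = parser.parse_args()

    # parse_args() has already configured logging, opened args.db (a psycopg2
    # connection), and created args.spark (a SparkSession)
    df = args.spark.read.parquet(args.input_file)
    with args.db.cursor() as cur:
        cur.execute("SELECT 1")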

Methods

__init__(*args[, description, ...])

add_argument([short_arg, long_arg])

Adds an argument while retaining the metavar instead of the dest in the help message

add_argument_group(*args, **kwargs)

add_db_service_argument([short_arg, ...])

Adds an argument to set the database service name and sets dest = "db_service"

add_log_format_argument([short_arg, ...])

Adds an argument to set the logging format and sets dest = "log_format"

add_log_level_argument([short_arg, ...])

Adds an argument to set the logging level, converts it to the proper integer, and sets dest = "log_level"

add_mutually_exclusive_group(**kwargs)

add_spark_config_argument([short_arg, long_arg])

Adds an argument to provide zero or more options with which to initialize the PySpark session and sets dest = "spark_config"

add_spark_cores_argument([short_arg, ...])

Adds an argument to set the number of PySpark cores to use and sets dest = "spark_cores"

add_spark_load_config_argument([short_arg, ...])

Adds an argument to provide zero or more options with which to load a dataframe in PySpark and sets dest = "spark_load_config"

add_spark_memory_argument([short_arg, ...])

Adds an argument to set the amount of memory to give to PySpark and sets dest = "spark_memory"

add_subparsers(**kwargs)

convert_arg_line_to_args(arg_line)

error(message)

Prints a usage message incorporating the message to stderr and exits.

exit([status, message])

format_help()

format_usage()

get_default(dest)

parse_args(*args[, db_connection_name, ...])

Parses arguments while optionally setting up logging, database, and/or PySpark.

parse_intermixed_args([args, namespace])

parse_known_args([args, namespace])

parse_known_intermixed_args([args, namespace])

print_help([file])

print_usage([file])

register(registry_name, value, object)

sanitize_argument(long_arg)

Converts the argument name to the variable name actually used

set_defaults(**kwargs)

setup_database(args)

Returns a psycopg2 connection to the database specified in args.db_service

setup_logging(args[, name, stream, ...])

Sets up logging with setup_logging() and the specified log level and format

setup_spark(args)

Returns a tuple with the created PySpark session and, if use_spark_db = True, a PostgreSQL login record (otherwise None)

add_argument(short_arg=None, long_arg=None, *args, **kwargs)[source]

Adds an argument while retaining the metavar instead of the dest in the help message

Parameters:
  • short_arg (Optional[str], default: None) – The short argument name

  • long_arg (Optional[str], default: None) – The long argument name

  • *args – Any additional positional arguments

  • **kwargs – Any additional named arguments

Return type:

None
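
For illustration (hypothetical argument; the exact metavar text depends on the implementation): the short and long names are passed as the first two positional parameters, and the help output keeps a metavar rather than the dest name:

    parser = EnhancedArgumentParser(description="Example")
    parser.add_argument("-o", "--output-file", help="Where to write results")
    parser.print_help()  # lists the option with a metavar such as OUTPUT-FILE
                         # rather than the dest name output_file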

add_db_service_argument(short_arg='-s', long_arg='--service', default=None, **kwargs)[source]

Adds an argument to set the database service name and sets dest = "db_service"

Parameters:
  • short_arg (Optional[str], default: '-s') – Short argument name to use

  • long_arg (Optional[str], default: '--service') – Long argument name to use

  • default (Optional[str], default: None) – Default service

  • **kwargs – Any additional named arguments

add_log_format_argument(short_arg='-f', long_arg='--log-format', default='%(asctime)s - %(name)s - %(levelname)s - %(message)s', **kwargs)[source]

Adds an argument to set the logging format and sets dest = "log_format"

Parameters:
  • short_arg (Optional[str], default: '-f') – Short argument name to use

  • long_arg (Optional[str], default: '--log-format') – Long argument name to use

  • default (str, default: '%(asctime)s - %(name)s - %(levelname)s - %(message)s') – Default logging format

  • **kwargs – Any additional named arguments

Return type:

None

add_log_level_argument(short_arg='-l', long_arg='--log-level', default='INFO', **kwargs)[source]

Adds an argument to set the logging level, converts it to the proper integer, and sets dest = "log_level"

Parameters:
  • short_arg (Optional[str], default: '-l') – Short argument name to use

  • long_arg (Optional[str], default: '--log-level') – Long argument name to use

  • default (str, default: 'INFO') – Default logging level value

  • **kwargs – Any additional named arguments

Return type:

None
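
A hedged sketch of the conversion described above (the argument is added manually here; passing use_logging=True to the constructor adds it automatically):

    import logging

    parser = EnhancedArgumentParser(description="Example")
    parser.add_log_level_argument()
    args = parser.parse_args(["--log-level", "DEBUG"])
    # args.log_level is expected to hold the numeric level, i.e. logging.DEBUG == 10
    logging.getLogger().setLevel(args.log_level)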

add_spark_config_argument(short_arg=None, long_arg='--spark-config', **kwargs)[source]

Adds an argument to provide zero or more options with which to initialize the PySpark session and sets dest = "spark_config"

Parameters:
  • short_arg (Optional[str], default: None) – Short argument name to use

  • long_arg (Optional[str], default: '--spark-config') – Long argument name to use

  • **kwargs – Any additional named arguments

Return type:

None

add_spark_cores_argument(short_arg='-c', long_arg='--spark-cores', default='*', **kwargs)[source]

Adds an argument to set the number of PySpark cores to use and sets dest = "spark_cores"

Parameters:
  • short_arg (Optional[str], default: '-c') – Short argument name to use

  • long_arg (Optional[str], default: '--spark-cores') – Long argument name to use

  • default (int | str, default: '*') – Default cores

  • **kwargs – Any additional named arguments

Return type:

None

add_spark_load_config_argument(short_arg=None, long_arg='--spark-load-config', **kwargs)[source]

Adds an argument to provide zero or more options with which to load a dataframe in PySpark and sets dest = "spark_load_config"

Parameters:
  • short_arg (Optional[str], default: None) – Short argument name to use

  • long_arg (Optional[str], default: '--spark-load-config') – Long argument name to use

  • **kwargs – Any additional named arguments

Return type:

None

add_spark_memory_argument(short_arg='-m', long_arg='--spark-memory', default='1g', **kwargs)[source]

Adds an argument to set the amount of memory to give to PySpark and sets dest = "spark_memory"

Parameters:
  • short_arg (Optional[str], default: '-m') – Short argument name to use

  • long_arg (Optional[str], default: '--spark-memory') – Long argument name to use

  • default (str, default: '1g') – Default memory to use

  • **kwargs – Any additional named arguments

Return type:

None

error(message: str)

Prints a usage message incorporating the message to stderr and exits.

If you override this in a subclass, it should not return – it should either exit or raise an exception.

parse_args(*args, db_connection_name='db', spark_name='spark', spark_db_name='spark_db', **kwargs)[source]

Parses arguments while optionally setting up logging, database, and/or PySpark.

Parameters:
  • *args – Any additional positional arguments

  • db_connection_name (str, default: 'db') – The args attribute name to give to a created database connection

  • spark_name (str, default: 'spark') – The args attribute name to give to a created PySpark session

  • spark_db_name (str, default: 'spark_db') – The args attribute name to give to PostgreSQL login credentials for use with PySpark

  • **kwargs – Any additional named arguments

Return type:

Namespace

Returns:

Parsed arguments, additionally with attribute db as a database connection if use_database = True, with attribute spark if use_spark = True, and attribute spark_db if use_spark_db = True
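
For example (attribute names shown are the defaults; creating the connections requires a working database service and PySpark installation):

    parser = EnhancedArgumentParser(
        use_database=True, use_spark=True, use_spark_db=True
    )
    args = parser.parse_args()
    args.db        # psycopg2 connection (use_database=True)
    args.spark     # SparkSession (use_spark=True)
    args.spark_db  # PostgreSQL login credentials for PySpark (use_spark_db=True)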

static sanitize_argument(long_arg)[source]

Converts the argument name to the variable name actually used

Parameters:

long_arg (str) – The argument name

Return type:

str

Returns:

The reformatted argument
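
Judging from the dest values listed above, the reformatting is expected to look roughly like this (illustrative only):

    EnhancedArgumentParser.sanitize_argument("--log-level")
    # expected to return "log_level" (leading dashes stripped, remaining
    # dashes replaced with underscores)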

setup_database(args)[source]

Returns a psycopg2 connection to the database specified in args.db_service

Parameters:

args (Namespace) – Parsed arguments from parse_args()

Return type:

connection

Returns:

The psycopg2 connection
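
parse_args() calls this automatically when use_database = True (unless dont_create_db_connection is set); a manual call might look like:

    conn = parser.setup_database(args)  # uses args.db_service
    with conn.cursor() as cur:
        cur.execute("SELECT 1")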

setup_logging(args, name='root', stream=None, stream_handler_logging_level=None)[source]

Sets up logging with setup_logging() and the specified log level and format

Parameters:
  • args (Namespace) – Parsed arguments from parse_args()

  • name (str, default: 'root') – Logger name to initialize

  • stream (Optional[TextIOWrapper], default: None) – Stream to log to

  • stream_handler_logging_level (Union[int, str, None], default: None) – Logging level to use for stream

Return type:

None

setup_spark(args)[source]

Returns a tuple with the created PySpark session and, if use_spark_db = True, a PostgreSQL login record (otherwise None)

Parameters:

args (Namespace) – Parsed arguments from parse_args()

Return type:

tuple[SparkSession, PassEntry]

Returns:

A tuple with the created PySpark session and either a pgtoolkit.pgpass.PassEntry record or None
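
parse_args() calls this automatically when use_spark = True; driving it manually might look like:

    spark, pass_entry = parser.setup_spark(args)
    # pass_entry is a pgtoolkit.pgpass.PassEntry when use_spark_db = True,
    # otherwise None
    spark.range(10).show()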