aztk.spark package

aztk.spark.client module

class aztk.spark.client.Client(secrets_config: aztk.models.secrets_configuration.SecretsConfiguration)[source]

Bases: aztk.client.Client

Aztk Spark Client. This is the main entry point for using aztk with Spark.

Parameters: secrets_config (aztk.spark.models.models.SecretsConfiguration) – Configuration with all the needed credentials
create_cluster(cluster_conf: aztk.spark.models.models.ClusterConfiguration, wait: bool = False)[source]

Create a new aztk spark cluster

Parameters:
  • cluster_conf (aztk.spark.models.models.ClusterConfiguration) – Configuration for the cluster to be created
  • wait (bool) – Whether to wait for the cluster to be ready before returning
Returns: aztk.spark.models.Cluster
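
The effect of the wait flag on create_cluster can be illustrated with a minimal sketch. It relies only on the signature documented above. The helper create_cluster_blocking, the SupportsCreateCluster protocol, and the RecordingClient stand-in are hypothetical names introduced for illustration and are not part of aztk. The stand-in returns a plain dict rather than an aztk.spark.models.Cluster, so the snippet runs without Azure credentials.

```python
from typing import Any, List, Protocol, Tuple


class SupportsCreateCluster(Protocol):
    """Structural type mirroring the documented create_cluster signature."""

    def create_cluster(self, cluster_conf: Any, wait: bool = False) -> Any: ...


def create_cluster_blocking(client: SupportsCreateCluster, cluster_conf: Any) -> Any:
    """Hypothetical helper: request a cluster and ask the client to wait for it."""
    # wait=True is the documented flag for blocking until the cluster is ready.
    return client.create_cluster(cluster_conf, wait=True)


class RecordingClient:
    """Hypothetical stand-in for aztk.spark.client.Client; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, bool]] = []

    def create_cluster(self, cluster_conf: Any, wait: bool = False) -> Any:
        self.calls.append((cluster_conf, wait))
        # Assumption: a dict is returned here purely so the sketch is observable.
        return {"id": cluster_conf["cluster_id"], "ready": wait}
```

With a real client, the same helper would presumably be called with an aztk.spark.client.Client instance and a ClusterConfiguration object in place of the stand-in and the dict.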

create_clusters_in_parallel(cluster_confs)[source]
delete_cluster(cluster_id: str, keep_logs: bool = False)[source]
get_cluster(cluster_id: str)[source]
list_clusters()[source]
get_remote_login_settings(cluster_id: str, node_id: str)[source]
submit(cluster_id: str, application: aztk.spark.models.models.ApplicationConfiguration, remote: bool = False, wait: bool = False)[source]
submit_all_applications(cluster_id: str, applications)[source]
wait_until_application_done(cluster_id: str, task_id: str)[source]
wait_until_applications_done(cluster_id: str)[source]
wait_until_cluster_is_ready(cluster_id: str)[source]
wait_until_all_clusters_are_ready(clusters: List[str])[source]
create_user(cluster_id: str, username: str, password: str = None, ssh_key: str = None) → str[source]
get_application_log(cluster_id: str, application_name: str, tail=False, current_bytes: int = 0)[source]
get_application_status(cluster_id: str, app_name: str)[source]
cluster_run(cluster_id: str, command: str, host=False, internal: bool = False, timeout=None)[source]
node_run(cluster_id: str, node_id: str, command: str, host=False, internal: bool = False, timeout=None)[source]
cluster_copy(cluster_id: str, source_path: str, destination_path: str, host: bool = False, internal: bool = False, timeout: int = None)[source]
cluster_download(cluster_id: str, source_path: str, destination_path: str = None, host: bool = False, internal: bool = False, timeout: int = None)[source]
cluster_ssh_into_master(cluster_id, node_id, username, ssh_key=None, password=None, port_forward_list=None, internal=False)[source]
submit_job(job_configuration: aztk.spark.models.models.JobConfiguration)[source]
list_jobs()[source]
list_applications(job_id)[source]
get_job(job_id)[source]
stop_job(job_id)[source]
delete_job(job_id: str, keep_logs: bool = False)[source]
get_application(job_id, application_name)[source]
get_job_application_log(job_id, application_name)[source]
stop_job_app(job_id, application_name)[source]
wait_until_job_finished(job_id)[source]
wait_until_all_jobs_finished(jobs)[source]
run_cluster_diagnostics(cluster_id, output_directory=None)[source]
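
As a rough sketch of how the job methods listed above might be combined, the helper below submits a job, waits for it to finish, and then fetches the log of each application. The return types of these methods are not documented here, so the caller supplies the job id and the application names explicitly instead of reading them from a return value. The names run_job_and_collect_logs, SupportsJobs, and FakeJobClient are hypothetical and exist only for this illustration.

```python
from typing import Any, Dict, Iterable, List, Protocol


class SupportsJobs(Protocol):
    """Structural type listing only the job methods this sketch calls."""

    def submit_job(self, job_configuration: Any) -> Any: ...

    def wait_until_job_finished(self, job_id: Any) -> Any: ...

    def get_job_application_log(self, job_id: Any, application_name: Any) -> Any: ...


def run_job_and_collect_logs(
    client: SupportsJobs,
    job_configuration: Any,
    job_id: str,
    application_names: Iterable[str],
) -> Dict[str, Any]:
    """Hypothetical helper: submit, wait, then gather one log per application."""
    client.submit_job(job_configuration)
    client.wait_until_job_finished(job_id)
    return {
        name: client.get_job_application_log(job_id, name)
        for name in application_names
    }


class FakeJobClient:
    """Hypothetical stand-in for aztk.spark.client.Client; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def submit_job(self, job_configuration: Any) -> None:
        self.calls.append("submit_job")

    def wait_until_job_finished(self, job_id: Any) -> None:
        self.calls.append(f"wait_until_job_finished:{job_id}")

    def get_job_application_log(self, job_id: Any, application_name: Any) -> str:
        self.calls.append(f"get_job_application_log:{job_id}/{application_name}")
        # Assumption: the real method's return type is not documented in this listing.
        return f"log for {application_name}"
```

Passing the identifiers in explicitly keeps the sketch independent of whatever submit_job and list_applications actually return.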