Dask
Dask is a Python library for parallel and distributed computing. A Dask cluster is composed of one scheduler that coordinates the work of many workers, which can have access to CPU or GPU resources. Here we show how to install Dask in a conda environment on Aurora, how to start a cluster with CPU or GPU workers, and how to run a simple example script.
Install Dask on Aurora
From one of Aurora's login nodes, use the following commands to create a conda environment and install Dask. This also installs the other libraries needed to run the example script and creates a Jupyter kernel that allows you to work interactively from a notebook.
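A minimal sketch of those commands, assuming Aurora's `frameworks` module provides conda; the environment and kernel name `dask` are choices for this guide rather than requirements:

```bash
module load frameworks                      # provides conda on Aurora (assumption)
conda create -y -n dask python=3.10
conda activate dask
pip install "dask[distributed]" jupyterlab ipykernel

# Register the environment as a Jupyter kernel named "dask".
python -m ipykernel install --user --name dask --display-name dask
```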
Start a Dask cluster
Copy the following script into a file called `start_dask_aurora.sh` and make it executable with `chmod +x start_dask_aurora.sh`.
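Below is a hypothetical sketch of what such a script can look like: a scheduler on the node where it runs, plus `mpiexec`-launched workers that find the scheduler through a shared scheduler file. The argument interface and the scheduler-file path are assumptions for this guide, not the exact production script:

```bash
#!/bin/bash
# Hypothetical sketch of start_dask_aurora.sh; adapt to your environment.
# Usage: ./start_dask_aurora.sh WORKERS_PER_NODE [THREADS_PER_WORKER]

NWORKERS_PER_NODE=${1:-104}
NTHREADS=${2:-1}
NNODES=$(wc -l < "$PBS_NODEFILE")
SCHEDULER_FILE="$HOME/dask-scheduler.json"

# Start the scheduler on this node; workers and clients locate it
# through the shared scheduler file.
dask scheduler --port 8786 --scheduler-file "$SCHEDULER_FILE" &
sleep 5

# Launch the workers across all nodes of the job.
mpiexec -n $(( NNODES * NWORKERS_PER_NODE )) -ppn "$NWORKERS_PER_NODE" \
    dask worker --scheduler-file "$SCHEDULER_FILE" --nthreads "$NTHREADS"
```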
Start a cluster with CPU workers
Run the following commands from a compute node on Aurora to start a Dask cluster with 104 CPU workers per node:
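A sketch of those commands, assuming the hypothetical `start_dask_aurora.sh` above and the `dask` conda environment created earlier:

```bash
module load frameworks
conda activate dask
./start_dask_aurora.sh 104 1   # 104 single-threaded CPU workers per node, one per core
```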
Start a cluster with GPU workers
Run the following commands from a compute node on Aurora to start a Dask cluster with 6 GPU workers per node:
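Similarly for GPU workers, with one worker per GPU (Aurora nodes have 6 Intel GPUs; the 34 threads per worker match the `threads=204` in the example output below). Note that a production script would also pin each worker to its own GPU, e.g. via `ZE_AFFINITY_MASK`, which the sketch above omits:

```bash
module load frameworks
conda activate dask
./start_dask_aurora.sh 6 34    # 6 GPU workers per node
```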
Example
In this example, we will estimate Pi using a Monte Carlo method.
Paste the Python script shown below (after the breakdown) into a file called `pi_dask_gpu.py`. Here is a breakdown of what the script does:
- It connects to the Dask cluster (which you should start beforehand) and prints some information, including the number of workers and the available memory.
- It divides the total number of points to sample between the workers, and each worker uses its GPU to:
    - generate random points uniformly inside the unit square
    - return the number of points that are inside the unit circle
- When the results from the workers are ready, they are aggregated to compute Pi.
- A total of 5 Pi calculations are performed and timed (the very first iterations incur initialization and warmup costs).
- At the end, the Dask cluster is shut down.
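Here is a hypothetical sketch of such a script. The GPU sampling uses PyTorch's XPU backend purely as an illustration (the original may use a different GPU library), and the scheduler-file path follows the start script sketched above:

```python
import os
import time

from dask.distributed import Client


def count_hits(n):
    """Sample n random points in the unit square on this worker's GPU and
    return how many fall inside the unit circle."""
    import torch  # imported on the worker process

    device = "xpu" if torch.xpu.is_available() else "cpu"  # XPU backend is an assumption
    x = torch.rand(n, device=device)
    y = torch.rand(n, device=device)
    return int((x * x + y * y < 1.0).sum())


if __name__ == "__main__":
    # Connect to the cluster started beforehand; printing the client shows
    # the number of workers and the available memory.
    client = Client(scheduler_file=os.path.expanduser("~/dask-scheduler.json"))
    print(client)

    nworkers = len(client.scheduler_info()["workers"])
    samples_per_worker = 10_400_000_000 // nworkers  # ~1.04E+10 samples in total
    total = samples_per_worker * nworkers

    for run in range(5):  # the first runs incur initialization/warmup costs
        start = time.time()
        # pure=False stops Dask from collapsing the identical calls into one task
        futures = [client.submit(count_hits, samples_per_worker, pure=False)
                   for _ in range(nworkers)]
        hits = sum(client.gather(futures))
        pi = 4.0 * hits / total
        print(f"Run {run} Num samples: {total:.2E} "
              f"Estimate: {pi:.9f} Time taken: {time.time() - start:.3f} s")

    # Shut down the workers and the scheduler.
    client.shutdown()
```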
Run the pi_dask_gpu.py example
- First, request an interactive job on 1 node.
- Then, start a Dask cluster with 6 GPU workers and wait about 10 seconds for the cluster to start.
- Press Ctrl+Z (SIGTSTP) and then execute `bg` to continue running the process in the background, or open a new shell and SSH to the compute node.
- Run the example script. A sketch of the full sequence follows:
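In the sketch below, the queue, project, and filesystems values are placeholders to adapt to your allocation, and the start-script interface is the hypothetical one from above:

```bash
qsub -I -l select=1 -l walltime=30:00 -l filesystems=home:flare -q debug -A <YourProject>

# On the compute node:
module load frameworks
conda activate dask
./start_dask_aurora.sh 6 34    # press Ctrl+Z once the workers are up, then:
bg
python pi_dask_gpu.py
```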
Output
<bound method Client.scheduler_info of <Client: 'tcp://10.168.0.10:8786' processes=6 threads=204, memory=1.00 TiB>>
Run 0 Num samples: 1.04E+10 Estimate: 3.141653798 Time taken: 1.596 s
Run 1 Num samples: 1.04E+10 Estimate: 3.141570887 Time taken: 1.354 s
Run 2 Num samples: 1.04E+10 Estimate: 3.141651954 Time taken: 1.451 s
Run 3 Num samples: 1.04E+10 Estimate: 3.141636617 Time taken: 0.518 s
Run 4 Num samples: 1.04E+10 Estimate: 3.141650108 Time taken: 0.511 s
Connect to a Dask cluster from JupyterLab
Here are the steps to start a Dask cluster and connect to it interactively from a Jupyter notebook:
- First, request an interactive job on 1 node. Print the compute node's hostname (that you get with the command `hostname`), which will be used later.
- Then, start a Dask cluster and wait about 10 seconds for the cluster to start.
- On your local machine, open an SSH tunnel to the compute node (`COMPUTE_NODE` is the compute node's hostname and `YOUR_ALCF_USERNAME` is your ALCF username):

```bash
ssh -t -L 23456:localhost:23456 -L 8787:localhost:8787 [email protected] \
    ssh -t -L 23456:localhost:23456 -L 8787:localhost:8787 login.aurora.alcf.anl.gov \
    ssh -t -L 23456:localhost:23456 -L 8787:localhost:8787 COMPUTE_NODE
```
Failure
If you have issues with the above sequence of `ssh` commands, check this page for troubleshooting.
- On the compute node where you land with the above ssh command, start JupyterLab:
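An assumed invocation; the port must match the one forwarded by the tunnel above (23456):

```bash
module load frameworks
conda activate dask
jupyter lab --no-browser --port=23456
```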
- Copy the line starting with `http://localhost:23456/lab?token=<TOKEN>` at the end of the jupyter command's output.
- On your local machine, open a browser window and go to that URL.
- On the JupyterLab page, select the `dask` kernel and use the script sketched below to connect to the Dask cluster.
- The Dask dashboard will be available at http://localhost:8787