Dask

Spinning up your first Dask Cluster

Dask Clusters

In Saturn, Dask clusters are created for specific purposes. For example, I may create a Dask cluster for use with an existing Jupyter instance, or one for use with a scheduled Prefect job. Dask clusters are disposable and lightweight: it’s simpler and better to create and destroy them as you need them than to keep them active and share them among multiple users. In addition, a Dask cluster should have the same software configuration as the client that’s using it. You want the same libraries installed, and the same version of your code running, on the cluster as on the client.

Spinning up Dask Clusters from Python

This section assumes that we’re running inside Jupyter, but everything here will work as long as you’re operating inside a Saturn deployment. It’s generally more convenient to spin up Dask clusters from Python. The dask-saturn library is installed in all our environments, and if you’re building your own images, we strongly recommend you include that library as well. To spin up a Dask cluster, execute:

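from dask_saturn import SaturnCluster

# Create (or get a reference to) the Dask cluster for this Jupyter instance
cluster = SaturnCluster()
# As the last expression in a notebook cell, this displays the cluster
cluster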

This provisions a Dask cluster that matches your Jupyter instance. It will have the same version of your project checked out, the same Docker image, and all of your credentials. Your Jupyter instance is associated with only one Dask cluster. If you call SaturnCluster from multiple notebooks on that Jupyter instance, you will get a reference to the same Dask cluster. The SaturnCluster class allows you to specify:

n_workers: how many instances to start with

nprocs: the number of processes per machine

nthreads: the number of threads per process

scheduler_size: the size of machine to use for the dask scheduler

worker_size: the size of machine to use for the dask worker
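The parameters above can be collected and checked before the cluster is created. The following is a minimal sketch, not part of dask-saturn: `cluster_options` and `start_cluster` are hypothetical helpers, the default values are illustrative, and the validation rules (a non-negative `n_workers`, positive `nprocs` and `nthreads`) are assumptions rather than documented constraints. The machine sizes are left unset because the valid size names depend on your Saturn deployment.

```python
from typing import Any, Dict, Optional


def cluster_options(
    n_workers: int = 3,
    nprocs: int = 1,
    nthreads: int = 2,
    scheduler_size: Optional[str] = None,
    worker_size: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for SaturnCluster, omitting unset sizes."""
    # Assumed rules: zero workers is allowed, but every worker needs
    # at least one process and one thread.
    if n_workers < 0:
        raise ValueError(f"n_workers must be >= 0, got {n_workers!r}")
    for name, value in (("nprocs", nprocs), ("nthreads", nthreads)):
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value!r}")

    options: Dict[str, Any] = {
        "n_workers": n_workers,
        "nprocs": nprocs,
        "nthreads": nthreads,
    }
    # Only pass a size when one was chosen explicitly.
    if scheduler_size is not None:
        options["scheduler_size"] = scheduler_size
    if worker_size is not None:
        options["worker_size"] = worker_size
    return options


def start_cluster(**overrides: Any):
    """Create the cluster. This only works inside a Saturn deployment."""
    # Imported lazily so cluster_options can be used without dask-saturn.
    from dask_saturn import SaturnCluster

    return SaturnCluster(**cluster_options(**overrides))


options = cluster_options(n_workers=4, nthreads=4)
print(options)  # {'n_workers': 4, 'nprocs': 1, 'nthreads': 4}
```

Inside Saturn, `start_cluster(n_workers=4, nthreads=4)` would pass the same options to SaturnCluster, and the returned cluster can then be handed to `dask.distributed.Client` to run work on it.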

Spinning up Dask Clusters from the UI

The UI is more useful for managing Dask clusters.

The form exposes the same parameters described above. Here, I’m using the “Resource attachment” field to indicate that this Dask cluster is for my “website analytics” Jupyter instance.

Once the cluster is created, a card representing it appears, which you can use to start or stop the cluster.

Spinning up Dask Clusters on Spot Instances

Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices. In Saturn, you can use Spot Instances for Dask cluster workers. When creating your Dask cluster, check the “Spot Instance” box right below the “Worker Size” field, as shown in the following image.

NOTE: Spot Instances may be shut down by AWS at any time with a two-minute notice. In order to receive these notifications from AWS, additional configuration must be applied; see the AWS documentation on Spot Instance interruptions.
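To make the two-minute notice concrete, here is a minimal sketch of how code running on a Spot Instance could work out how much time remains once a notice has been issued. It is not Saturn's mechanism and does not replace the configuration mentioned above. It assumes the notice is the small JSON document that EC2 publishes through the instance metadata service (the `spot/instance-action` item), with an `action` and a UTC `time` field. Fetching that document is left out, and `seconds_until_interruption` is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_json: str, now: datetime) -> float:
    """Return the seconds between `now` and the time given in the notice."""
    notice = json.loads(notice_json)
    # The notice time is a UTC timestamp such as 2017-09-18T08:22:00Z.
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)
    return (when - now).total_seconds()


# Sample notice in the assumed format, checked two minutes before the deadline.
sample = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
print(seconds_until_interruption(sample, now))  # 120.0
```

Because workers can disappear on short notice, Spot Instances suit work that Dask can recompute on another worker.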

For more information on Spot Instances, please visit https://aws.amazon.com/ec2/spot.