In Saturn, Dask clusters are created for specific purposes. For example, I may create a Dask cluster for use with an existing Jupyter instance, or one for use with a scheduled Prefect job. Dask clusters are disposable and lightweight: it’s simpler and better to create and destroy them as you need than to try to keep them active and share them between multiple users. In addition, a Dask cluster should have the same software configuration as the client that’s using it: the same libraries installed and the same version of your code running on the cluster.
Spinning up Dask Clusters from Python
This section assumes that we’re running inside Jupyter, but everything here will work if you’re operating in a Saturn deployment. It’s generally more convenient to spin up Dask clusters from Python. We include the dask-saturn library in all our environments, and if you’re building your own images, we strongly recommend you include that library as well. To spin up a Dask cluster, execute:
from dask_saturn import SaturnCluster
cluster = SaturnCluster()
cluster
This provisions a Dask cluster that matches your Jupyter instance. It will have the same version of your project checked out, the same Docker image, and all of your credentials. Each Jupyter instance is associated with only one Dask cluster: if you call SaturnCluster from multiple notebooks on that Jupyter instance, you will get a reference to the same Dask cluster. The SaturnCluster class allows you to specify:
n_workers: how many instances to start with
nprocs: the number of processes per machine
nthreads: the number of threads per process
scheduler_size: the size of machine to use for the Dask scheduler
worker_size: the size of machine to use for each Dask worker
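The parameters listed above can be passed to SaturnCluster as keyword arguments. The following is a minimal sketch, not a definitive configuration: the numeric values are arbitrary, and the size names "medium" and "large" are assumptions that may not match the sizes available in your Saturn deployment. The start_cluster helper is hypothetical and exists only so the options can be inspected without creating a cluster.

```python
# Illustrative values only; the size names are assumptions and
# depend on what your Saturn deployment offers.
cluster_options = {
    "n_workers": 3,              # how many instances to start with
    "nprocs": 1,                 # processes per machine
    "nthreads": 2,               # threads per process
    "scheduler_size": "medium",  # machine size for the Dask scheduler
    "worker_size": "large",      # machine size for each Dask worker
}


def start_cluster(options):
    """Hypothetical helper: create the Dask cluster for this Jupyter
    instance, or get a reference to it if it already exists."""
    # Imported inside the function so the options above can be
    # inspected on a machine where dask-saturn is not installed.
    from dask_saturn import SaturnCluster

    return SaturnCluster(**options)
```

Inside a Saturn Jupyter instance, calling start_cluster(cluster_options) would behave like the SaturnCluster() call shown earlier, with these settings applied.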
Spinning up Dask Clusters from the UI
The UI is more useful for managing Dask clusters.
The creation form exposes the same parameters described above. The “Resource attachment” field indicates what the cluster is for; here, I’m using it to indicate that this Dask cluster is for my “website analytics” Jupyter instance.
Once the Dask cluster is created, a card representing it appears, which you can use to start or stop the cluster.
Spinning up Dask Clusters on Spot Instances
Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices. In Saturn, you can use Spot Instances for Dask cluster workers. When creating your Dask cluster, check the “Spot Instance” box right below the “Worker Size” field, as shown in the following image.
NOTE: Spot Instances may be shut down by AWS at any time with a two-minute notice. To receive these notifications from AWS, additional configuration must be applied; see the AWS Spot Instance interruptions documentation.
For more information on Spot Instances, please visit https://aws.amazon.com/ec2/spot.