Dask

Spinning up your first Dask Cluster

Dask Clusters

In Saturn, Dask clusters are created for specific purposes. For example, I may create one Dask cluster for use with an existing Jupyter instance and another for use with a scheduled Prefect job. Dask clusters are disposable and lightweight: it’s simpler and better to create and destroy them as you need them than to keep them active and share them between multiple users. In addition, a Dask cluster should have the same software configuration as the client that’s using it: the same libraries installed, and the same version of your code running on the cluster.

Spinning up Dask Clusters from Python

This section assumes that we’re running inside Jupyter, but everything here will work as long as you’re operating in a Saturn deployment. It’s generally more convenient to spin up Dask clusters from Python. The dask-saturn library is installed in all our environments, and if you’re building your own images, we strongly recommend that you include it as well. To spin up a Dask cluster, execute:

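from dask_saturn import SaturnCluster
# Create the Dask cluster for this Jupyter instance
cluster = SaturnCluster()
# As the last expression in a notebook cell, this displays the cluster
cluster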

This provisions a Dask cluster that matches your Jupyter instance. It will have the same version of your project checked out, the same Docker image, and all of your credentials. Your Jupyter instance is associated with only one Dask cluster. If you call SaturnCluster from multiple notebooks on that Jupyter instance, you will get a reference to the same Dask cluster. The SaturnCluster class allows you to specify:

n_workers: how many instances to start with

nprocs: the number of processes per machine

nthreads: the number of threads per process

scheduler_size: the size of machine to use for the Dask scheduler

worker_size: the size of machine to use for each Dask worker
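As a rough illustration of how the first three parameters combine, the sketch below estimates how many tasks a cluster could run at once. It assumes each thread runs one task at a time, so the capacity is n_workers * nprocs * nthreads. The helpers total_task_slots and start_cluster are hypothetical names introduced here, not part of dask-saturn, and the values in settings are made up for the example.

```python
def total_task_slots(n_workers, nprocs, nthreads):
    """Estimate concurrent task capacity (hypothetical helper, not in dask-saturn).

    Assumes one task per thread: machines * processes per machine
    * threads per process.
    """
    if min(n_workers, nprocs, nthreads) < 1:
        raise ValueError("n_workers, nprocs and nthreads must all be at least 1")
    return n_workers * nprocs * nthreads


def start_cluster(settings):
    """Create a SaturnCluster from a dict of settings (hypothetical wrapper).

    The import lives inside the function so the sizing helper above
    can also be used outside a Saturn environment.
    """
    from dask_saturn import SaturnCluster

    return SaturnCluster(**settings)


# Illustrative values only; choose numbers that fit your workload.
settings = {"n_workers": 3, "nprocs": 2, "nthreads": 4}

slots = total_task_slots(**settings)
print(f"{slots} task slots")  # 3 machines * 2 processes * 4 threads = 24
```

Inside a Saturn Jupyter instance, calling start_cluster(settings) would then pass the same values to SaturnCluster. Keeping the settings in one dict makes it easy to check the expected capacity before provisioning anything.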

Spinning up Dask Clusters from the UI

The UI is more useful for managing Dask clusters.

The form exposes the same parameters described above. The “Resource attachment” field records what the cluster is for; here, I’m using it to indicate that this Dask cluster is for my “website analytics” Jupyter instance.

Once the cluster is created, a card representing it appears, which you can use to start and stop the cluster.