Logging in Dask

How to create logs in your code within Dask

When writing code a natural method of keeping track of how code runs is through logging. Typically in Python, logging is done using the built in logging module, like this:

import logging
logging.warning('This is a warning')
logging.info('This is non-essential info')

Unfortunately, if you try and use this style of logging from within a @dask.delayed function, you won’t see any output at all. You won’t see it in the console if you’re running a Python script nor will you see it after a cell within a Jupyter Notebook. This is also the case for print functions–they won’t be captured if they are run within a @dask.delayed function. So an alternate approach is needed for logging within Dask.

Instead, to do logging we’ll need to use the distributed.worker Python module, and import logger. This will give us a logging mechanism that does work in Dask. Here is an example of it in action.

import dask
from distributed.worker import logger # required for logging in Dask
from dask_saturn import SaturnCluster
from dask.distributed import Client

cluster = SaturnCluster()
client = Client(cluster)

@dask.delayed
def lazy_exponent(args):
    x,y = args
    result = x ** y
    # the logging call to keep tabs on the computation
    logger.info(f"Computed exponent {x}^{y} = {result}")
    return result 

inputs = [[1,2], [3,4], [5,6], [9, 10], [11, 12]]

example_future = client.map(lazy_exponent, inputs)
futures_gathered = client.gather(example_future)
futures_computed = client.compute(futures_gathered, sync=False)

results = [x.result() for x in futures_computed]
results

client.close()

The logs generated using distributed.worker won’t show up in the console output or in a Jupyter Notebook still. Instead they’ll be within the Saturn Cloud project worker logs. First, click the “running” link on the Dask cluster of the project workspace:

The Dask cluster for a project workspace

From there, expand each of the Dask workers. The logs from each worker are stored individually, and when you write code you won’t know which worker it will end up being executed on:

The Dask workers

Those will show the logs created by the Dask worker. Notice that there is lots of information there, including how the worker was started by Dask. Near the bottom you should see the logs we wanted, in this case the ones generated by lazy_exponent:

distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          4
distributed.worker - INFO -                Memory:                   15.50 GB
distributed.worker - INFO -       Local Directory: /tmp/dask-worker-space/worker-1ggklqmo
distributed.worker - INFO - Starting Worker plugin <dask_cuda.utils.CPUAffinity object at 0x7fcf3508a-4bbdb030-ec01-4b2e-bc94-d12ba7418478
distributed.worker - INFO - Starting Worker plugin <dask_cuda.utils.RMMSetup object at 0x7fcf8ca5b390-03909dd1-c3e3-4bde-b721-067f3977b005
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -         Registered to: tcp://d-user-pytorch-fa50eff3b23843798e91e808437d8033.main-namespace:8786
distributed.worker - INFO - -------------------------------------------------
distributed.core - INFO - Starting established connection
distributed.worker - INFO - Starting Worker plugin saturn_setup
distributed.worker - INFO - Computed exponent 11^12 = 3138428376721
distributed.worker - INFO - Computed exponent 5^6 = 15625

There we correctly see that the logs included the info logging we did within the function. That concludes the example of how to generate logs from within Dask. This can be a great tool for understanding how code is running, debugging code, and better propagating warnings and errors.




Need help, or have more questions? Contact us at: We'll be happy to help you and answer your questions!