Fault-Tolerant Data Pipelines with Prefect Cloud

Prefect is an open-source workflow orchestration framework written in Python. It can integrate with Dask to speed up data processing pipelines by taking advantage of parallelism. Prefect Cloud is a high-availability, fault-tolerant hosted offering that handles orchestration of these pipelines.

Overview

This tutorial explains how to use Prefect Cloud and Saturn Cloud together.

The tutorial “Scheduled Data Pipelines” introduces how to build data pipelines using Prefect, and how to speed them up by executing them on a Saturn Dask cluster. If you are not familiar with Prefect yet, consider reading that article first and then coming back to this one.

If you are not familiar with Prefect Cloud or want a deeper understanding of how the integration between Saturn Cloud and Prefect Cloud works, see “Prefect Cloud”.

For this tutorial, we’ll create a flow that mimics the process of getting a batch of records, using a machine learning model to score them, and capturing metrics.
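The three stages of this flow can be sketched in plain Python before they are wrapped in Prefect. This is only an illustration under stated assumptions: the record fields, the threshold rule standing in for a trained model, and accuracy as the metric are all hypothetical choices, not taken from the example notebook. In a Prefect flow, each function would typically become a task.

```python
import random
from typing import Dict, List, Union

# Hypothetical record shape: an integer id, one numeric feature, and a 0/1 label.
Record = Dict[str, Union[int, float]]


def get_batch(n: int, seed: int = 0) -> List[Record]:
    """Simulate fetching a batch of records (stand-in for a real data source)."""
    rng = random.Random(seed)
    return [
        {"id": i, "feature": rng.random(), "label": int(rng.random() > 0.5)}
        for i in range(n)
    ]


def score_record(record: Record) -> int:
    """Hypothetical 'model': predict 1 when the feature exceeds 0.5."""
    return int(record["feature"] > 0.5)


def score_batch(batch: List[Record]) -> List[int]:
    """Apply the model to every record in the batch."""
    return [score_record(r) for r in batch]


def capture_metrics(batch: List[Record], preds: List[int]) -> Dict[str, float]:
    """Compute simple batch metrics; accuracy is an illustrative choice."""
    correct = sum(1 for r, p in zip(batch, preds) if r["label"] == p)
    accuracy = correct / len(batch) if batch else 0.0
    return {"n_records": len(batch), "accuracy": accuracy}


if __name__ == "__main__":
    batch = get_batch(100)
    preds = score_batch(batch)
    print(capture_metrics(batch, preds))
```

Keeping each stage as a separate function makes the later step of turning them into individual tasks straightforward, since each one has clear inputs and outputs.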

Set Up a Prefect Cloud Account

To begin this tutorial, you’ll need an existing Prefect Cloud account. Prefect Cloud’s free tier allows you to run a limited number of flows, so you can run this tutorial without spending any money on Prefect Cloud!

  1. Sign up at https://www.prefect.io/cloud/
  2. Once logged in, create a project. For the purpose of this tutorial, call it dask-iz-gr8.
  3. Following the Prefect documentation, create a RUNNER token and a USER token. Store these for later.
    • RUNNER token: must be created by an admin; allows an agent to communicate with Prefect Cloud.
    • USER token: allows a user to register new flows with Prefect Cloud.

Create a Prefect Cloud Agent in Saturn

Prefect Cloud “agents” are always-on processes that poll Prefect Cloud and ask “want me to run anything? Want me to run anything?” In Saturn Cloud, you can create these agents with a few clicks and let Saturn handle the infrastructure.
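The polling behavior described above can be sketched as a simple loop. This is a conceptual illustration only, not Prefect's agent implementation: `fetch_scheduled_runs` and `run_flow` are hypothetical callables standing in for the calls a real agent makes to Prefect Cloud and to the infrastructure it launches work on, and the poll interval is arbitrary.

```python
import time
from typing import Callable, List


def poll_for_work(
    fetch_scheduled_runs: Callable[[], List[str]],
    run_flow: Callable[[str], None],
    poll_interval: float = 10.0,
    max_polls: int = 3,
) -> int:
    """Repeatedly ask for scheduled flow runs and start each one.

    Both callables are hypothetical stand-ins. A real agent loops forever;
    `max_polls` bounds the loop so this sketch terminates.
    Returns the number of flow runs started.
    """
    started = 0
    for poll in range(max_polls):
        # "Want me to run anything?"
        for run_id in fetch_scheduled_runs():
            run_flow(run_id)
            started += 1
        # Wait before asking again, except after the final poll.
        if poll < max_polls - 1:
            time.sleep(poll_interval)
    return started


if __name__ == "__main__":
    # Simulated responses: one run, then nothing, then two runs.
    responses = [["run-1"], [], ["run-2", "run-3"]]
    launched: List[str] = []
    count = poll_for_work(lambda: responses.pop(0), launched.append, poll_interval=0.1)
    print(count, launched)
```

Because the agent initiates every request, it only needs outbound access to Prefect Cloud, which is part of why it can run inside your own infrastructure.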

  1. Log in to the Saturn UI as an admin user.
  2. Navigate to the “Credentials” page and add a Prefect RUNNER token.
    • Type: Environment Variable
    • Shared With: your user only
    • Name: PREFECT_RUNNER_TOKEN
    • Value: the RUNNER token you created during setup
  3. Navigate to the “Prefect Agents” page. Create a new agent.
    • Name: test-prefect-agent
    • Prefect Runner Token: select the PREFECT_RUNNER_TOKEN you created earlier
  4. Start that Prefect Agent by clicking the play button.

After a few minutes, your agent will be ready! If you go to the “Logs” page, you can see the logs for this agent.

In the Prefect Cloud UI, you should see a new KubernetesAgent up and running!

Create and Register a Flow

Now that you’ve created an account in Prefect Cloud and set up an agent in Saturn to run the work there, it’s time to create a flow!

  1. Return to the Saturn UI.
  2. Navigate to the “Credentials” page and add a Prefect USER token.
    • Type: Environment Variable
    • Name: PREFECT_USER_TOKEN
    • Value: the USER token you created during setup
  3. Navigate to the “Jupyter” page and create a Jupyter with the following specs.
    • Name: test-prefect
    • Disk Space, Size, Auto Shutoff: keep the defaults
    • Image: any of the available non-GPU saturncloud/saturn images you want
    • Environment Variables
      PREFECT_CLOUD_PROJECT_NAME=dask-iz-gr8
      
    • Start script
      pip install --upgrade dask-saturn prefect-saturn
      
  4. Start that Jupyter by clicking the play button.
  5. Once that Jupyter is ready, click “Jupyter Lab” to launch Jupyter Lab.
  6. In Jupyter Lab, open a terminal and run the code below to fetch the example notebook that accompanies this tutorial.
    cd /home/jovyan/project/
    EXAMPLE_REPO_URL=https://raw.githubusercontent.com/saturncloud/examples/main/examples/examples-cpu/prefect
    
    wget ${EXAMPLE_REPO_URL}/prefect-cloud-scheduled-scoring.ipynb
    
  7. In the file browser in the left-hand navigation, double-click that notebook to open it. Follow the instructions in it and run the cells in order. Return to this article when you’re done.
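Before running the notebook, it can help to confirm that the credential and environment variable configured above are visible inside the Jupyter. The sketch below only checks that `PREFECT_USER_TOKEN` and `PREFECT_CLOUD_PROJECT_NAME` are set and non-empty; the helper name `missing_settings` is hypothetical, and it does not validate the token against Prefect Cloud.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the setup steps in this tutorial.
REQUIRED_VARS = ("PREFECT_USER_TOKEN", "PREFECT_CLOUD_PROJECT_NAME")


def missing_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        # Print only the project name; never echo the token itself.
        print("Project:", os.environ["PREFECT_CLOUD_PROJECT_NAME"])
```

Accepting an optional mapping instead of always reading `os.environ` keeps the check easy to exercise without changing the real environment.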

Inspect Flow Runs

Now that your flow has been created and registered with both Saturn Cloud and Prefect Cloud, you can track its progress in the Prefect Cloud UI.

  1. In the Prefect Cloud UI, go to Flows --> ticket-model-evaluation. Click Schematic to see the structure of the pipeline.
  2. Click Logs to see logs for this flow run.
    • From this page, you can search the logs, sort them by level, and download them for further analysis.
  3. In the Saturn Cloud UI, navigate to the Dask page. You should see that a new Dask cluster has been created for this flow, with a name like p-c93609. Click the dashboard URL to monitor the activity in the cluster.
  4. In the Saturn Cloud UI, navigate to the Logs page. Select the Prefect agent you previously set up. You should see new log messages confirming that the agent has received a flow to run.
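Once logs are downloaded, a few lines of Python are enough for a first pass of analysis. The sketch below assumes, hypothetically, that each downloaded entry has already been parsed into a dict with `timestamp`, `level`, and `message` keys; the actual export format may differ, so adjust the field names to match your file.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical parsed entry, e.g. {"timestamp": "...", "level": "INFO", "message": "..."}
LogEntry = Dict[str, str]

# Standard Python logging level names mapped to their numeric severities.
SEVERITY = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def count_by_level(entries: Iterable[LogEntry]) -> Dict[str, int]:
    """Count how many entries were logged at each level (case-insensitive)."""
    return dict(Counter(e["level"].upper() for e in entries))


def at_least(entries: Iterable[LogEntry], level: str) -> List[LogEntry]:
    """Keep entries at or above `level`; unrecognized levels are dropped."""
    threshold = SEVERITY[level.upper()]
    return [e for e in entries if SEVERITY.get(e["level"].upper(), 0) >= threshold]


if __name__ == "__main__":
    # Made-up sample entries for illustration only.
    logs = [
        {"timestamp": "t1", "level": "INFO", "message": "example message A"},
        {"timestamp": "t2", "level": "ERROR", "message": "example message B"},
    ]
    print(count_by_level(logs))
    print(at_least(logs, "warning"))
```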

Clean Up

The flow created in this tutorial is set to run every 10 minutes. Once you’re done with this tutorial, be sure to tear everything down!
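To see why cleanup matters, it helps to count how often a 10-minute schedule fires. The sketch below uses only the standard library to list upcoming run times and count runs per day; it is not Prefect's scheduling API, and the start time you pass in is arbitrary.

```python
from datetime import datetime, timedelta
from typing import List

INTERVAL = timedelta(minutes=10)  # the schedule described in this tutorial


def upcoming_runs(start: datetime, n: int, interval: timedelta = INTERVAL) -> List[datetime]:
    """Return the next n scheduled run times, beginning at `start`."""
    return [start + i * interval for i in range(n)]


def runs_per_day(interval: timedelta = INTERVAL) -> int:
    """Number of runs a fixed-interval schedule produces in 24 hours."""
    return timedelta(days=1) // interval


if __name__ == "__main__":
    for run_time in upcoming_runs(datetime(2021, 1, 1), 3):
        print(run_time.isoformat())
    print(runs_per_day())
```

At one run every 10 minutes, a forgotten flow keeps starting 144 runs per day until it is deleted.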

In Prefect Cloud

  1. Navigate to Flows. Delete the ticket-model-evaluation flow.

In Saturn Cloud

  1. Logged in as the user who created the flow, navigate to the Dask page. Click the delete button on this flow’s Dask cluster to stop and delete it.
  2. Navigate to the Jupyter page. Click the delete button to stop and delete the Jupyter you used to create the flow.
  3. Logged in as the user you used to create the Prefect agent, navigate to the Prefect Agents page. Click the delete button to stop and delete the Prefect agent.
  4. Navigate to the Credentials page. Remove the credentials PREFECT_RUNNER_TOKEN and PREFECT_USER_TOKEN.

Learn and Experiment!

In this tutorial, you learned how to use Prefect Cloud to manage a Prefect flow, and how to improve the speed and environment management of that flow using a Saturn Cloud Dask cluster.

To learn more about prefect-saturn, see https://github.com/saturncloud/prefect-saturn.

To learn more about Prefect Cloud, see https://docs.prefect.io/orchestration/.

If you have any other questions or concerns, send us an email at support@saturncloud.io.