Welcome to Saturn
What is Saturn?
Saturn is an end-to-end enterprise data science platform focused on providing the fastest possible runtime to drive faster iteration cycles and value delivery. Using multiple nodes and GPUs dramatically accelerates data science workloads, but such systems are often hard to maintain. Saturn focuses on enabling Dask and RAPIDS, which make multi-node and multi-GPU computation much faster and easier.
Saturn is installed inside your AWS account, secured inside your VPC in an Amazon EKS (Elastic Kubernetes Service) cluster. Your code and data never leave your account. Saturn is priced on a pay-as-you-go model: you only pay for what you use, and you can cancel anytime (there is no subscription fee).
- Hosted Jupyter notebooks.
- Managed multi-node and multi-GPU Dask clusters.
- Integrated collaboration tools.
- Facilities for building, sharing, and customizing Docker images.
- Notebook publishing and dashboard deployments.
- Production deployments of scheduled workflows and machine learning models.
How do I use Saturn?
Typical steps in getting started with Saturn look like this:
Choose a Docker image
Most people will do one of the following:
- Use our standard image, which provides most of the data science packages you will need
- Bring their own Docker image (we may need to help you conform it to a spec)
- Drop in a requirements.txt and/or a bash script to build an image using the Saturn Image Builder
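As a rough sketch of the drop-in option, a minimal requirements.txt and build script might look like the following. The package choices and the script name are illustrative, not Saturn's required spec:

```
# requirements.txt (illustrative) -- Python packages to install into the image
dask[complete]
s3fs
xgboost
```

```
# build.sh (illustrative) -- optional bash script for extra system setup
#!/bin/bash
apt-get update && apt-get install -y graphviz
```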
Configure data access credentials
Saturn supports environment variables and file-based credentials that are present in all your containers. Some common examples are:
- database passwords and logins
- SSH keys
- AWS IAM credentials
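Because these credentials surface as ordinary environment variables (or files) inside every container, reading them from code is plain Python. The variable names below are illustrative examples, not names Saturn defines:

```python
import os

def get_credential(name, default=""):
    """Read a credential injected into the container as an environment variable.

    The variable name is whatever you configured in Saturn; nothing here is
    Saturn-specific.
    """
    return os.environ.get(name, default)

# Simulate an injected credential for this example; in Saturn the variable
# would already be present in the container.
os.environ.setdefault("DB_PASSWORD", "example-secret")
password = get_credential("DB_PASSWORD")

# File-based credentials (e.g. an SSH key) appear as regular files.
ssh_key_path = os.path.expanduser("~/.ssh/id_rsa")
has_ssh_key = os.path.isfile(ssh_key_path)
```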
Spin up resources
Most people start with a Jupyter instance. You can resize this instance as needed (our biggest tier has 500 GB of RAM and 8 V100 GPUs). If you want, you can add a Dask cluster to parallelize your work. The code you write in Jupyter is versioned and mirrored on your Dask cluster: if it works in Jupyter, it will work on your Dask cluster. Some common examples are:
- doing some exploratory analysis with a 5 TB Dask DataFrame
- using Dask and joblib to execute machine learning hyperparameter scans
- running PyTorch models in parallel over terabytes of images and videos
Deploy your work
Once you’ve got something you’re happy with, the typical next step is to deploy it. Some common examples are:
- Turning a notebook into a web application
- Scheduling an ETL job with Prefect
- Deploying an ML model as an HTTP server, so you can call it from other external services.
Saturn can deploy arbitrary HTTP applications and handle secure authentication at the network layer, so that only authorized users can consume them. Saturn can also run arbitrary scheduled workflows on the Kubernetes cluster. Any deployment can (but doesn’t have to) utilize Dask.
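To make the model-as-an-HTTP-server idea concrete, here is a standard-library-only sketch. The /predict route and the doubling "model" are made up for illustration; a real deployment would likely use a web framework:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

def predict(x):
    # Placeholder "model": doubles its input.
    return 2 * x

class PredictHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical route: /predict?x=<number> -- not a Saturn-defined API.
        if self.path.startswith("/predict"):
            x = float(self.path.split("x=")[-1])
            body = json.dumps({"prediction": predict(x)}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Keep the demo quiet.
        pass

# Serve on an ephemeral local port and issue one test request.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/predict?x=3"
result = json.loads(urlopen(url).read())
server.shutdown()
print(result)  # {'prediction': 6.0}
```

In Saturn, the authentication in front of a service like this is handled at the network layer, so the application code does not need to implement it.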
Is Saturn for me?
Saturn is most useful for data science teams that need faster performance and repeatable deployment.
- Do you frequently wait more than an hour for data science jobs to finish?
- Do you run out of RAM on your workstation?
- Do you need to show your work to managers or peers?
- Do you need to deploy your work to production?
If you use Spark and are looking for more performance or a more pythonic experience, you might be interested in Saturn.
- Saturn and Dask are focused on the Python ecosystem. No more debugging JVM stack traces.
- Dask and RAPIDS implement existing Python APIs (scikit-learn, pandas).
- Dask and RAPIDS can deliver anywhere from 2x to 100x performance improvements over Spark, especially in machine learning workloads.