The Power of Dask
Dataframes for big data
Dask DataFrames are just like Pandas DataFrames, except the data and computation are sharded on a cluster of machines.
Executing Functions on a Cluster
Dask Delayed is a decorator you can apply to any function. Delayed functions can be chaiend together, and executed in parallel on a cluster. As long as you can write functions, we can parallelize your code.
Scale Up to GPUs
GPUs are much faster than CPUs but traditionally they have been very hard to program. With RAPIDS and Numba, leveraging GPUs is easy. Combine that with Dask, and you can execute on multiple GPUs in parallel.
I'm excited to start moving some of my local projects onto Saturn Cloud's AWS because I used to worry about having my data in their public cloud, but now know I have the privacy and security of my own VPC + can use Dask easily for the projects with larger datasets.
I'm a data scientist working with a team of marine chemists on a series of peer review journal articles. Saturn Cloud made getting them set up and going a snap. We now have computational power equivalent to what I have used at a Fortune 50 company at a tiny fraction of the cost, and with a much faster time to get up and going.
Dask has helped my team speed up experimentation and iteration by finishing data preprocessing tasks that used to take hours in a matter of seconds. Saturn maintains a Dask cluster so we don't have to, which frees up time for real data science. It's a huge value add.