If you’re one of the 90% of data scientists who use Python, it’s time to meet Dask. Dask is a flexible library for distributed parallel computing in Python. It provides ways to scale Pandas, Scikit-Learn, and NumPy workflows natively, with minimal rewriting.
As a data scientist, there’s a lot on your plate. Saturn Cloud allows you to deploy, manage, and scale the PyData stack using Jupyter Notebooks in the cloud, providing an infrastructure that supports Dask with the compute capacity you need.
Spark is written in Scala with some support for Python and R.
Spark is more focused on traditional business intelligence operations like SQL and lightweight machine learning.
Spark lacks flexibility for more complex algorithms or ad-hoc systems. It is fundamentally an extension of the Map-Shuffle-Reduce paradigm.
Spark does not include native support for multi-dimensional arrays (this would be challenging given its computation model).
Spark provides GraphX, a library for graph processing.
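Dask, by contrast, does support multi-dimensional arrays natively: its array collection mirrors the NumPy interface over chunked, parallel arrays. A minimal sketch (assumes `dask` and NumPy are installed; the shapes are arbitrary):

```python
import dask.array as da

# Create a 1000x1000 array of ones split into 250x250 chunks;
# each chunk can be processed in parallel.
x = da.ones((1000, 1000), chunks=(250, 250))

# Familiar NumPy-style operations build a lazy task graph.
total = (x + x.T).sum()

# .compute() evaluates the graph; every element of x + x.T is 2.0,
# so the sum over one million elements is 2,000,000.
print(total.compute())
```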
Dask is written in Python and interoperates well with C/C++/Fortran/LLVM or other natively compiled code linked through Python.
Dask is a component of the larger Python ecosystem. It couples with and enhances other libraries like NumPy, Pandas, and Scikit-Learn.
Dask supports generic distributed graph evaluation: it isn’t limited by what can be done efficiently using Spark’s Map-Shuffle-Reduce paradigm.
Dask makes it possible to implement more sophisticated algorithms and build more complex, bespoke systems.
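The generic task-graph model behind this flexibility can be sketched with `dask.delayed` (assumes `dask` is installed; the functions are hypothetical stand-ins for real work):

```python
from dask import delayed

@delayed
def inc(x):
    return x + 1

@delayed
def add(x, y):
    return x + y

# Build an arbitrary dependency graph lazily: `total` depends on
# both `a` and `b`, which can run in parallel.
a = inc(1)         # -> 2 when computed
b = inc(2)         # -> 3 when computed
total = add(a, b)  # graph node depending on both a and b

# .compute() walks the graph and evaluates it in parallel.
print(total.compute())  # 5
```

Because the graph is arbitrary, Dask is not limited to pipelines that fit a Map-Shuffle-Reduce shape.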
Source: Dask Comparison To Spark
"Dask has helped my team speed up experimentation and iteration by finishing data pre-processing tasks that used to take hours in a matter of seconds. Saturn maintains a Dask cluster so we don't have to, which frees up time for real data science. It's a huge value add.”
Global eCommerce Company