Advanced analytics covers a lot of ground. As businesses adopt machine learning, this is one of the first areas they explore. Initial projects are lightweight, so the infrastructure isn’t much of a consideration. Those early successes bring interest from new teams and operational areas.
Most use cases fall under the business operations analytics umbrella. This covers several key areas:
– Business Function: Core business function analysis and decision support
– Business Unit Performance: Measuring strategy execution for continuous improvement
– Business Health: Measuring how the business responds to changing market conditions
That last bullet point is at the front of every business leader’s mind. Business health is joined at the hip to customer health, another key focus of advanced analytics. As uncertainty increases, models provide more value to decision-makers, but model training and deployment also become more complex.
Exploratory use cases evolve, touching larger areas of the organization. Models quickly move from useful to important to business-critical. Lightweight projects deal only with the first few steps: model development, training, and limited deployment. Machine learning needs to mature to include a full lifecycle. Complete projects add new phases:
– Stable Deployment
– Model Scalability
– Model Maintenance
Most machine learning models get a battlefield promotion from running in a local environment to high availability for customers or internal users. New business cases require a low-overhead bridge between two-phase and full lifecycle machine learning. Costs need to be controlled. At the same time, delivery can’t wait for the data science team to evaluate multiple tools with long learning curves.
Dask and Saturn Cloud offer a simple solution, allowing data science teams to transition from two-phase to full lifecycle machine learning. The Dask and Saturn Cloud stack meets multiple business requirements:
– Integration with existing data science tools (Python, Pandas, TensorFlow, AWS)
– Low learning curve
– Simplified development environment, collaboration, and code/data versioning
– DevOps management and cost control
It’s time to move from a high level to more specific applications.
Advanced Analytics Key Use Cases – Business Functional Analysis
This is the most common entry point for advanced analytics. Understanding how the business operates used to be the realm of Business Intelligence. BI tracks what each functional area does now and projects a straight line to where it should be over time. BI runs on KPIs. These are the data points that feed into the earliest machine learning models.
Business Functional Analysis produces a detailed graph of the connections between business units and across product development phases. While BI tools allow a peek into this graph, machine learning methods result in models providing a complete picture. At first, models are limited to small pieces of larger functional areas. Marketing is an Advanced Analytics early adopter. Their first models describe small parts of their functional area:
– Campaign performance
– Customer lifetime value
– Customer satisfaction
From a machine learning perspective, these models are simple. They’re trained and run on a data scientist’s local environment. Data sets quickly move from BI-level data points in NumPy arrays to large Pandas DataFrames. While the core model remains relatively simple, the inputs are not.
Machine learning projects need to mature rapidly to keep up with business needs. Dask’s DataFrames work in much the same way that Pandas DataFrames do, so there’s a very low learning curve to migrate from Pandas to Dask. Dask’s DataFrames split the data into partitions and evaluate lazily, which optimizes memory and compute usage to make the most of local resources.
When training datasets change, model version control becomes an important best practice. Reproducibility is often overlooked. In a collaborative environment, model improvement needs to be consistent and provable. The ability to roll back to previous data and model versions is an important piece of reproducibility and collaboration.
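One tool-independent way to think about this is to record a fingerprint of the data and the parameters behind every training run. The sketch below uses only the Python standard library. The helper names (`fingerprint`, `make_run_record`) and the sample rows are hypothetical, and this is not a description of how any particular platform implements versioning.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Return a stable SHA-256 hex digest of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_run_record(rows, params):
    """Bundle the data and parameter fingerprints for one training run."""
    return {
        "data_version": fingerprint(rows),
        "params_version": fingerprint(params),
        "params": params,
    }


# Hypothetical training data and model parameters.
rows_v1 = [{"customer": 1, "spend": 120.0}, {"customer": 2, "spend": 80.0}]
rows_v2 = rows_v1 + [{"customer": 3, "spend": 45.0}]
params = {"model": "linear", "alpha": 0.1}

run_a = make_run_record(rows_v1, params)
run_b = make_run_record(rows_v1, params)
run_c = make_run_record(rows_v2, params)

# Identical data produces an identical fingerprint; changed data does not.
same_data = run_a["data_version"] == run_b["data_version"]
changed_data = run_a["data_version"] != run_c["data_version"]
print(same_data, changed_data)
```

Storing a record like this next to each trained model makes it possible to tell which data a result came from, which is the precondition for rolling back.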
Saturn Cloud’s tools help implement best practices in a data science team. Using Jupyter and Saturn Cloud, teams can share and update models with built-in version control. If the team uses S3, or a similar versioned data store, Saturn Cloud also manages dataset versioning.
Business Unit Performance Analysis
Early success leads to wider adoption of machine learning. Projects quickly expand beyond focusing on one or two functional areas. Each business unit recognizes the potential to optimize its operations. Machine learning moves from siloed to integrated.
At this stage in machine learning maturity, most models are still built and often trained locally. However, Business Unit Performance Analysis relies heavily on stable deployments. Users across the company need consistent access to inference for continuous improvement initiatives. Not all models require distributed resources, but high availability necessitates scalability.
This is where model maintenance and DevOps can be barriers to implementation. Dask makes that transition simple. Using Dask, models that have been optimized for local resources don’t require additional coding to migrate to distributed resources.
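As a rough illustration of why the migration is light, the sketch below builds one lazy Dask computation and runs it under two different local schedulers without changing the computation itself. The array is a toy stand-in for real model work. The cluster step is described only in a comment, and the address shown there is a placeholder, not a real endpoint.

```python
import dask.array as da

# A lazy computation: sum of squares over a chunked array.
x = da.arange(1000, chunks=100)
expr = (x ** 2).sum()

# Same task graph, two execution back ends.
single_threaded = expr.compute(scheduler="synchronous")
multi_threaded = expr.compute(scheduler="threads")

print(single_threaded, multi_threaded)

# Assumption: with the separate `dask.distributed` package installed,
# creating a client such as
#     from dask.distributed import Client
#     client = Client("tcp://<scheduler-address>:8786")  # placeholder address
# registers a cluster as the default scheduler, so a plain expr.compute()
# would then run on that cluster.
```

The definition of `expr` never changes; only the choice of scheduler does. Real workloads may still need tuning (chunk sizes, data locality) once they move to a cluster.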
Saturn Cloud handles the DevOps. Models that need to run on AWS resources can leverage Saturn Cloud for stable deployment. Data scientists avoid becoming part-time cloud architects and deployment engineers.
Business Unit Performance Analysis drives the need for distributed resources for training and inference. Pricing models are a tool for optimizing Sales business unit performance. Supplier discovery and supply chain models are business unit performance optimizers. Business cases move from analysis to decision support to optimization. Each step increases model complexity.
Here again, Dask allows models to scale across multiple instances without additional coding. Rapid model iteration, training, and deployment can be costly if each model needs to be recoded for a distributed environment. Even in businesses with large DevOps teams, models can sit on the shelf while engineers come up to speed on the specifics of functionality and integration.
Saturn Cloud deploys models onto AWS resources. The process requires a low level of effort and a minimal DevOps technical skillset. Dask and Saturn Cloud both integrate with ML libraries like PyTorch and TensorFlow. The stack allows data scientists to continue using familiar tools while leveraging distributed training and deployment.
Business Health Analysis
Business Health Analysis is more important than ever. The use case here is to understand the current health of the business and how that can change based on changes to business or market conditions. This can feel like another out-of-the-box model. However, modeling the impacts of change requires complex, customized models.
Canned models learn across an assumed data range. That range bounds the conditions the model can adapt to and how specifically the model describes the business’s health. Many business health models built with generic methods are really market health models. They describe a broad sector instead of a specific business.
To increase accuracy and specificity, models need to be custom-built. As machine learning development moves away from “import from…” it also moves away from built-in optimization. That’s a barrier to custom model development. Without an optimization toolset, the data science team can be forced to choose between time-consuming optimization work and expensive compute resources.
Dask has a set of optimization tools that are integrated with Python. Data scientists using Python best practices can use Dask to parallelize complex models on distributed resources. Scaling becomes a DevOps-intensive process without a tool like Saturn Cloud. Both tools reduce the complexity associated with custom model development from a back-end perspective.
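For custom code that doesn’t fit a DataFrame or array, `dask.delayed` is one way to parallelize ordinary Python functions. The sketch below is a toy example under stated assumptions: `simulate_revenue` and the scenario values are hypothetical stand-ins for a custom business health model, not a real one.

```python
from dask import delayed


def simulate_revenue(base_revenue, demand_shift, price_change):
    """Toy stand-in for a custom model: revenue under one market scenario."""
    return base_revenue * (1 + demand_shift) * (1 + price_change)


# Hypothetical market scenarios to evaluate.
scenarios = [
    {"demand_shift": -0.10, "price_change": 0.00},
    {"demand_shift": 0.00, "price_change": 0.05},
    {"demand_shift": 0.10, "price_change": -0.05},
]

# Each call becomes a lazy task; the tasks are independent,
# so the scheduler is free to run them in parallel.
tasks = [delayed(simulate_revenue)(1000.0, **s) for s in scenarios]

# Reduce the scenario results to the worst case, still lazily.
worst_case = delayed(min)(tasks)

# Nothing has run so far; compute() executes the whole task graph.
result = worst_case.compute()
print(result)
```

The function itself is plain Python. Wrapping the calls in `delayed` is what turns them into a task graph that a Dask scheduler can distribute.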
Dask and Saturn Cloud For Advanced Analytics
Advanced Analytics is machine learning by another name. Most businesses are past the early stages of analytics and transitioning to more complex applications. Data science in the real world is business-facing and needs to align with strategy and goals. To make that transition, models need to meet the expectations set by traditional software systems.
The machine learning development lifecycle has similarities to software development, but implementation is more complex. Much of that complexity sits at the end of the lifecycle: deployment, scaling, and maintenance. Those three elements are critical success factors if the business wants to get the most out of its machine learning initiatives.
This is where development tools have always added value. Dask makes optimizing and scaling machine learning models simple. It reduces the learning curve by providing a library that’s familiar and as effective as more complex tools.
Saturn Cloud handles a number of data science workflow tasks. It provides an environment for development, collaboration and versioning. It manages AWS resources for model training and deployment.
These types of tools have long been available to traditional software development teams, and they are proving equally valuable for data science teams. Saturn Cloud and Dask are easy to adopt and still meet the needs of full lifecycle machine learning. Those are the kinds of stacks that gain traction over the long run.
Guest post: Vin Vashishta