Both JupyterHub and Kubernetes are powerful tools that data scientists use to increase efficiency, collaboration, and innovation. This article outlines the benefits of each separately, and the impact that using them together can have on a company and its data science teams.
Benefits of JupyterHub
JupyterHub is a hosted, multi-user version of Jupyter Notebook that is accessed through a web browser. It was built for collaboration and ease of use: anyone can use the notebooks without having to install anything on their own computer. JupyterHub runs a separate notebook instance for each user, so a variety of users can work from the same notebook. When it comes to running Python code interactively, two common options are Spyder and Jupyter notebooks. Notebooks are the more widely used option and allow for a great deal of flexibility, drawing on a wide variety of libraries. JupyterHub expands on this flexibility by enabling collaboration, and is therefore becoming a more widely used tool in classrooms and for learning.
The efficiency gained through JupyterHub is one way that data science teams can save money. There are a few other ways to save money, which are outlined in this post.
Benefits of Kubernetes (K8s)
Kubernetes is a container orchestration system that allows you to automate the deployment, scaling, and management of your applications.
The open-source platform has three key advantages:
- Run Anywhere – with Kubernetes, you are not limited to one type of infrastructure for your containers. There are several cloud providers to choose from, and these are outlined below.
- Never Outgrow – as the needs of your applications become more complex, Kubernetes is able to grow with you, your team, and your project.
- Planet Scale – Kubernetes allows you to scale your applications without having to increase the number of people on your ops team.
Scaling & JupyterHub: Scaling applications is the greatest advantage of Kubernetes when it comes to using it with JupyterHub. It allows larger teams to work together on the same notebooks by spreading the users' notebook servers across several machines.
The benefit of using Kubernetes is that it brings speed and convenience to managing your applications. To use it, you can choose from several cloud providers, some of which offer options that enhance the usage of Kubernetes:
- Google Kubernetes Engine (GKE) on Google Cloud
- Microsoft Azure Kubernetes Service (AKS): a managed Kubernetes service that is available through Microsoft Azure. Its autoscaling feature, announced in November 2019, automatically scales Kubernetes clusters to run applications.
- Amazon Elastic Kubernetes Service (EKS): a managed Kubernetes service that is available through Amazon Web Services (AWS) and allows for a secure, hands-off approach to managing clusters.
- Red Hat OpenShift
- IBM Cloud
- DigitalOcean
Data scientists have often had to wear many hats, including security, DevOps, data engineering, and business analysis. Managed Kubernetes services leave you with less cloud infrastructure to manage, which frees up time to focus on the data science part of the job. Do you feel like you’re wearing too many hats? So do several other data scientists, and they have outlined some of the additional hats here.
K8s & JupyterHub – Combining the two for more powerful collaboration
Using the combined power of K8s and JupyterHub is helpful in scenarios where many people need to be able to access a notebook. On its own, JupyterHub can support a smaller group of people, such as a classroom or any group of up to 100 people. As the group of people who need to access and use the notebooks grows, Kubernetes is able to step in and provide scalability.
For extensive collaboration, JupyterHub on its own can be limiting, and this is where Kubernetes comes in. Kubernetes handles scaling and simplifies the application infrastructure, so companies can serve more users by running JupyterHub across several servers.
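To make the scaling argument concrete, here is a rough back-of-the-envelope sketch of how many nodes a cluster might need as the number of users grows. The function name `nodes_needed` and all of the numbers are hypothetical, and the estimate ignores the resources that Kubernetes and the JupyterHub hub itself reserve on each node, so treat it as a lower bound rather than a sizing rule.

```python
import math


def nodes_needed(users, mem_per_user_gb, cpu_per_user, node_mem_gb, node_cpu):
    """Estimate how many nodes are needed to fit every user's guaranteed resources.

    Illustrative only: system overhead on each node is not accounted for.
    """
    if users <= 0:
        return 0
    # A node is full as soon as either memory or CPU runs out.
    users_per_node = int(min(node_mem_gb // mem_per_user_gb,
                             node_cpu // cpu_per_user))
    if users_per_node < 1:
        raise ValueError("a single user does not fit on one node")
    return math.ceil(users / users_per_node)


if __name__ == "__main__":
    # Hypothetical node size: 16 GB of memory and 4 CPU cores.
    for group_size in (30, 100, 500):
        print(group_size, "users ->",
              nodes_needed(group_size, 1, 0.5, 16, 4), "nodes")
```

In this example CPU is the limiting resource (8 users per node), so growing from a classroom to several hundred users means adding nodes, which is the part Kubernetes automates.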
Setting up JupyterHub on Kubernetes can be done with a few simple steps:
1. Install and set up a Kubernetes cluster – this step is specific to the cloud provider that you choose. There are several options listed above, and each of these will require different steps to be set up.
2. Install and set up JupyterHub – JupyterHub is installed using Helm, the package manager for Kubernetes.
– Installing Helm is the first step; it allows JupyterHub to be installed directly on the Kubernetes cluster
– Once Helm is installed, add the Helm chart repository for JupyterHub (https://jupyterhub.github.io/helm-chart/) – this step lets you refer to the chart by a short name instead of the full URL
3. Customize your deployment, environment, resources, user storage, and user management to ensure that this is all compatible with your specific use case
– Deployment: several configuration changes can be made in order to customize your deployment
– Environment: customize your Docker image and set environment variables. Multiple profiles can be used to let users choose their environments.
– Resources: set up your user memory and CPU guarantees/limits, and user GPU guarantees/limits. These can be modified later, along with the size of your cluster.
– User Storage: Set up persistent storage, if you need it. The limitation with persistent storage is that it limits the number of users who can run on a single node at a point in time. If you do not need persistent storage, turn it off.
– User Management: By default, JupyterHub will shut down the servers of inactive users after a period of time. This is referred to as “culling” and is done to manage costs, but can be turned off if needed. In addition, users can be designated as admins, and authentication can be configured to control who is able to log in.
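The steps above can be sketched in code. The snippet below builds a configuration for the JupyterHub Helm chart as a Python dictionary and lists the Helm commands that would install it, without actually running them. The key names (`singleuser.memory`, `singleuser.cpu`, `singleuser.storage.type`, `cull`, `hub.config.Authenticator.admin_users`) follow the Zero to JupyterHub chart as commonly documented, but they should be checked against the configuration reference for your chart version. The helper names, the release and namespace name `jhub`, the file name `config.json`, and every resource value are assumptions made for illustration.

```python
import json


def build_values(memory_limit="1G", memory_guarantee="512M",
                 cpu_limit=1.0, cpu_guarantee=0.5,
                 persistent_storage=True, culling=True,
                 cull_timeout_seconds=3600, admin_users=()):
    """Build a Helm values dictionary covering resources, storage, and users."""
    values = {
        "singleuser": {
            "memory": {"limit": memory_limit, "guarantee": memory_guarantee},
            "cpu": {"limit": cpu_limit, "guarantee": cpu_guarantee},
        },
        # Culling shuts down user servers that have been idle too long.
        "cull": {"enabled": culling, "timeout": cull_timeout_seconds},
    }
    if not persistent_storage:
        # Turn persistent storage off when it is not needed.
        values["singleuser"]["storage"] = {"type": "none"}
    if admin_users:
        values["hub"] = {
            "config": {"Authenticator": {"admin_users": list(admin_users)}}
        }
    return values


def helm_commands(release="jhub", namespace="jhub", values_file="config.json"):
    """Return the Helm commands for step 2 as argument lists (not executed)."""
    return [
        ["helm", "repo", "add", "jupyterhub",
         "https://jupyterhub.github.io/helm-chart/"],
        ["helm", "repo", "update"],
        ["helm", "upgrade", "--cleanup-on-fail", "--install", release,
         "jupyterhub/jupyterhub", "--namespace", namespace,
         "--create-namespace", "--values", values_file],
    ]


if __name__ == "__main__":
    # JSON is also valid YAML, so this output can be saved as a values file.
    print(json.dumps(build_values(persistent_storage=False), indent=2))
    for command in helm_commands():
        print(" ".join(command))
```

Keeping the values in one small function makes it easy to change limits later, which matches the point that resources and cluster size can be modified after the first deployment.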
Using JupyterHub with Kubernetes is one way to make it easier to program and collaborate. Check out this article to learn more about how to save both time and frustration when programming: 10 Tips to Save You Time and Frustration When Programming
Docker is the most popular container platform. It is open-source, and allows users to package and distribute containerized applications, saving time, space, and money. Kubernetes then allows users to take those containerized applications and scale, run, and monitor them. It was originally developed by Google, and is the leading open-source platform for orchestrating containers and deploying distributed applications.
Kubernetes is an open-source container orchestrator and distributed application deployment platform. It works as a control loop: users declare the desired state of their application, and Kubernetes compares that to the actual state of the application and corrects any differences. This eliminates the need for users to manually restart or stop containers, assign containers to specific servers, or perform most of the other tedious, time-consuming tasks associated with deployment and environment management.
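The control-loop idea can be illustrated with a toy reconciliation step. This is a simplified sketch, not how Kubernetes is implemented: the function names `reconcile` and `apply_actions`, the application names, and the use of plain replica counts as the "state" are all assumptions made for illustration.

```python
def reconcile(desired, actual):
    """Compare desired and actual replica counts and return corrective actions."""
    actions = []
    for app, want in desired.items():
        have = actual.get(app, 0)
        if have < want:
            actions.append(("start", app, want - have))
        elif have > want:
            actions.append(("stop", app, have - want))
    # Anything running that is no longer desired gets stopped.
    for app, have in actual.items():
        if app not in desired and have > 0:
            actions.append(("stop", app, have))
    return actions


def apply_actions(actions, actual):
    """Return the new actual state after carrying out the actions."""
    new_state = dict(actual)
    for verb, app, count in actions:
        delta = count if verb == "start" else -count
        new_state[app] = new_state.get(app, 0) + delta
    return {app: count for app, count in new_state.items() if count > 0}


if __name__ == "__main__":
    desired = {"hub": 1, "notebook": 3}
    actual = {"notebook": 1, "old-job": 2}
    actions = reconcile(desired, actual)
    print(actions)
    print(apply_actions(actions, actual))
```

Once the actual state matches the desired state, another pass of `reconcile` returns no actions, which is why nobody has to restart or stop containers by hand.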
Kubernetes is made up of three main components: the Control Plane, the Nodes, and the Pods. The Control Plane is the master, which facilitates the orchestration. The Nodes are the physical or virtual machines that run the applications, and together they make up the compute power of the cluster. The Pods contain one or more containers, and are placed on Nodes to be run, depending on the resources they require (CPU and memory).
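As a rough illustration of how Pods are placed on Nodes based on CPU and memory, the sketch below uses a simple first-fit rule: each pod goes to the first node with enough free resources. This is a deliberately simplified stand-in, since the actual Kubernetes scheduler filters and scores nodes using many more criteria. The class names, the `schedule` function, and the sizes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu: float          # free CPU cores
    memory_gb: float    # free memory in GB
    pods: list = field(default_factory=list)


@dataclass
class Pod:
    name: str
    cpu: float          # requested CPU cores
    memory_gb: float    # requested memory in GB


def schedule(pods, nodes):
    """Place each pod on the first node that fits; return the pods that fit nowhere."""
    unschedulable = []
    for pod in pods:
        for node in nodes:
            if node.cpu >= pod.cpu and node.memory_gb >= pod.memory_gb:
                # Reserve the pod's resources on this node.
                node.cpu -= pod.cpu
                node.memory_gb -= pod.memory_gb
                node.pods.append(pod.name)
                break
        else:
            unschedulable.append(pod.name)
    return unschedulable


if __name__ == "__main__":
    nodes = [Node("node-a", 2, 4), Node("node-b", 2, 4)]
    pods = [Pod("user-1", 1, 2), Pod("user-2", 1, 2),
            Pod("user-3", 1, 2), Pod("big-job", 4, 8)]
    print("unschedulable:", schedule(pods, nodes))
    for node in nodes:
        print(node.name, node.pods)
```

A pod that fits on no node stays pending, which is the situation a cluster autoscaler responds to by adding nodes.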
There are many ways to set up Kubernetes, the easiest being to use a public cloud provider's managed service such as AWS EKS, which allows users to set up a cluster without the burden of maintaining or managing their own infrastructure. This saves human operators time, money, and resources that can go toward other innovations.
Open-source Kubernetes can be installed for free by downloading the source code from GitHub and compiling it. However, this can be time-consuming and complicated, and the result is hard to manage and maintain afterward. Managed services such as those from AWS come with a cost, but take away much of that burden on time and resources.