Kubernetes (K8s) consists of multiple components on both the Controller (control plane) and Worker Nodes. Imagine you are a K8s administrator who has built a cluster, and your users confirm that they can deploy and test applications on it successfully.
However, there is no guarantee that the cluster will stay healthy all the time. Things can break at any moment: services might stop working, K8s components might fail, and so on. Hence, a reliable K8s cluster needs a reliable monitoring solution in place.
In this blog post, let’s try to understand:
- Why monitor?
- How to monitor?
- Some practical tools for doing so
K8s clusters are composed of multiple components and layers, each with its own failure points. It’s important to know when and why these fail. The goal of a monitoring system is to help keep the system healthy and reliable.
Apart from reliability, monitoring helps you understand the system and debug issues such as failures, resource utilization problems, etc.
Monitoring data is also useful for understanding resource usage trends, which feeds into capacity planning.
Some use cases include (but are not limited to):
- Monitoring K8s clusters and nodes: cluster resource usage, cluster/node availability and health, etc.
- Monitoring K8s deployments, services, and pods: failing pods/deployments/services, whether the expected number of replicas is running, whether pods are running within their resource requests and limits, etc.
- Monitoring K8s applications: availability, health, performance, etc.
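The "pods running within resource requests and limits" check can be sketched in a few lines. This is a minimal, hypothetical example: it assumes you already have each pod's observed usage (for instance from Metrics Server) and its `resources.limits` values from the pod spec, written in the usual K8s quantity notation (`250m` CPU, `128Mi` memory). The `PodUsage` record, the `pods_near_limits` helper, and the 90% threshold are illustrative choices, and the parser only handles the common suffixes.

```python
from dataclasses import dataclass

# Suffix multipliers for K8s memory quantities. Binary suffixes are listed first
# so that "Mi" is matched before the decimal "M".
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '250m' (millicores) or '2' into cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' or '1G' into bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # no suffix: plain bytes


@dataclass
class PodUsage:
    """Hypothetical record combining a pod's observed usage with its limits."""
    name: str
    cpu_usage: str  # e.g. as reported by Metrics Server / kubectl top
    mem_usage: str
    cpu_limit: str  # from resources.limits in the pod spec
    mem_limit: str


def pods_near_limits(pods, threshold=0.9):
    """Return names of pods using at least `threshold` of their CPU or memory limit."""
    flagged = []
    for pod in pods:
        cpu_ratio = parse_cpu(pod.cpu_usage) / parse_cpu(pod.cpu_limit)
        mem_ratio = parse_memory(pod.mem_usage) / parse_memory(pod.mem_limit)
        if cpu_ratio >= threshold or mem_ratio >= threshold:
            flagged.append(pod.name)
    return flagged


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from a real cluster.
    pods = [
        PodUsage("web-1", "480m", "100Mi", "500m", "256Mi"),
        PodUsage("web-2", "50m", "250Mi", "500m", "256Mi"),
        PodUsage("batch-1", "100m", "64Mi", "1", "512Mi"),
    ]
    print(pods_near_limits(pods))  # ['web-1', 'web-2']
```

Here `web-1` is flagged for CPU (480m of 500m) and `web-2` for memory (250Mi of 256Mi); a real setup would alert on this rather than print it.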
What are monitoring layers?
Monitoring can be performed at different layers. Below are the layers at a high level.
Different monitoring metrics can be collected at various monitoring layers. Below is a rough guide on what kind of metrics can be collected at different layers.
Are there any in-built monitoring tools in K8s?
K8s has some in-built monitoring tools. These can be used out of the box with minimal configuration. However, more sophisticated solutions exist for the same purpose (as shown in the diagram below).
Each node in a K8s cluster runs a kubelet component, and each kubelet contains cAdvisor.
cAdvisor: Helps gather metrics such as CPU and memory usage from each container on a given node. It also gathers metrics for the node as a whole.
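CPU figures like the ones cAdvisor gathers are usually exposed as cumulative counters (for example `container_cpu_usage_seconds_total` in cAdvisor's Prometheus output): the total CPU-seconds a container has consumed so far. Turning that into "how many cores is it using" requires two samples. The sketch below shows the arithmetic; `cpu_cores_used` is a hypothetical helper, and its reset handling is a simplification.

```python
def cpu_cores_used(prev_seconds: float, prev_ts: float,
                   curr_seconds: float, curr_ts: float) -> float:
    """Average CPU cores used between two samples of a cumulative CPU counter.

    prev_seconds/curr_seconds: total CPU-seconds consumed so far (counter values).
    prev_ts/curr_ts: wall-clock times of the two samples, in seconds.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_seconds - prev_seconds
    if delta < 0:
        # The counter went backwards, e.g. because the container restarted and
        # its counter began again from zero; count only the new value.
        delta = curr_seconds
    return delta / elapsed


if __name__ == "__main__":
    # 30 CPU-seconds consumed over 60 wall-clock seconds -> 0.5 cores on average.
    print(cpu_cores_used(120.0, 1000.0, 150.0, 1060.0))  # 0.5
```

A result of 0.5 means the container kept half of one core busy on average over that window; short spikes inside the window are averaged away.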
Metrics Server: Collects these metrics from the cAdvisors (via each node's kubelet) and brings them to a central place. Metrics Server runs as a pod on some node in the K8s cluster and is exposed as a K8s service. Once Metrics Server is configured, one can run kubectl top to find the CPU and memory utilization of containers, pods, or nodes.
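To use `kubectl top` output in a script, you can parse its table by header. This is a sketch under assumptions: the sample text below is made up but follows the default `kubectl top pods` columns (`NAME`, `CPU(cores)`, `MEMORY(bytes)`); parsing by header also copes with the extra `NAMESPACE` column added by `--all-namespaces`. The helper names are hypothetical, and `fetch_top_pods` needs `kubectl` access to a cluster with Metrics Server running.

```python
import subprocess

# Made-up sample in the shape of `kubectl top pods` output.
SAMPLE = """\
NAME                     CPU(cores)   MEMORY(bytes)
api-7d9f8b6c5d-x2k4p     250m         180Mi
web-5c6b7d8f9-qwe12      12m          64Mi
worker-6f7e8d9c-zz9kk    900m         512Mi
"""


def fetch_top_pods():
    """Run `kubectl top pods` and return its text output (requires cluster access)."""
    result = subprocess.run(["kubectl", "top", "pods"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_top_output(text):
    """Turn the whitespace-aligned table into a list of {column: value} dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def to_millicores(cpu):
    """'250m' -> 250, '2' -> 2000."""
    return int(cpu[:-1]) if cpu.endswith("m") else int(float(cpu) * 1000)


def top_cpu_consumers(text, n=3):
    """Names of the n pods using the most CPU, highest first."""
    rows = parse_top_output(text)
    rows.sort(key=lambda row: to_millicores(row["CPU(cores)"]), reverse=True)
    return [row["NAME"] for row in rows[:n]]


if __name__ == "__main__":
    # Against a live cluster you would pass fetch_top_pods() instead of SAMPLE.
    print(top_cpu_consumers(SAMPLE, n=2))
```

For the sample above this prints the `worker-...` pod first and the `api-...` pod second.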
K8s Dashboard: Provides a visual representation of the data from Metrics Server.
kube-state-metrics (K8s State Metrics): Provides additional metrics that Metrics Server cannot. It listens to the K8s API and generates metrics about the state of K8s logical objects, such as node status, node capacity, pod status, etc. It can be deployed as a service with a single replica.
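kube-state-metrics serves its metrics in the Prometheus text format on a `/metrics` endpoint. One of them, `kube_pod_status_phase`, has one series per pod per phase, with value 1 for the pod's current phase and 0 for the others. The sketch below counts pods per phase and lists pods that are neither Running nor Succeeded. It is a deliberately simplified parser (no escaped quotes, no timestamps), the sample text is made up, and the function names are hypothetical; in practice a Prometheus server or client library would do this parsing.

```python
import re
from collections import Counter

# Made-up sample in the shape of kube-state-metrics output.
SAMPLE_METRICS = """\
# TYPE kube_pod_status_phase gauge
kube_pod_status_phase{namespace="default",pod="web-1",phase="Running"} 1
kube_pod_status_phase{namespace="default",pod="web-1",phase="Pending"} 0
kube_pod_status_phase{namespace="default",pod="web-2",phase="Running"} 1
kube_pod_status_phase{namespace="batch",pod="job-1",phase="Failed"} 1
"""

# One sample line: metric_name{labels} value
_SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def active_phases(exposition):
    """Yield (namespace, pod, phase) for every kube_pod_status_phase series equal to 1."""
    for line in exposition.splitlines():
        if line.startswith("#"):
            continue  # HELP/TYPE comment lines
        m = _SAMPLE_LINE.match(line)
        if not m or m.group("name") != "kube_pod_status_phase":
            continue
        if float(m.group("value")) != 1:
            continue  # only the pod's current phase has value 1
        labels = dict(_LABEL.findall(m.group("labels")))
        yield labels["namespace"], labels["pod"], labels["phase"]


def pods_by_phase(exposition):
    """Count pods in each phase."""
    return Counter(phase for _, _, phase in active_phases(exposition))


def problem_pods(exposition):
    """'namespace/pod' for pods that are neither Running nor Succeeded."""
    return sorted(f"{ns}/{pod}" for ns, pod, phase in active_phases(exposition)
                  if phase not in ("Running", "Succeeded"))


if __name__ == "__main__":
    print(pods_by_phase(SAMPLE_METRICS))
    print(problem_pods(SAMPLE_METRICS))
```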
Probes: Help monitor the health status of containers.
- Liveness probes check whether a container is alive; if a liveness probe keeps failing, the kubelet restarts the container.
- Readiness probes check whether a container is ready to serve traffic; while a readiness probe is failing, the pod is removed from the endpoints of the Services that select it.
The K8s docs cover in detail how to configure these.
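How repeated probe results turn into actions can be sketched as a small simulation. The thresholds mirror the real probe fields `failureThreshold` (default 3) and `successThreshold` (default 1); everything else is a simplifying assumption: timing fields such as `periodSeconds` and `initialDelaySeconds` are ignored, the readiness state is assumed to start as "not ready", and `liveness_restarts` / `readiness_timeline` are hypothetical helpers, not part of any K8s API.

```python
def liveness_restarts(results, failure_threshold=3):
    """Count container restarts triggered by consecutive liveness probe failures."""
    restarts, fails = 0, 0
    for ok in results:
        fails = 0 if ok else fails + 1
        if fails >= failure_threshold:
            restarts += 1  # the kubelet would restart the container here
            fails = 0      # counting starts over for the restarted container
    return restarts


def readiness_timeline(results, failure_threshold=3, success_threshold=1):
    """Ready/not-ready state after each readiness probe result."""
    ready, fails, successes = False, 0, 0  # assume not ready until a probe succeeds
    timeline = []
    for ok in results:
        if ok:
            successes, fails = successes + 1, 0
            if successes >= success_threshold:
                ready = True   # pod is added back to Service endpoints
        else:
            fails, successes = fails + 1, 0
            if fails >= failure_threshold:
                ready = False  # pod is removed from Service endpoints
        timeline.append(ready)
    return timeline


if __name__ == "__main__":
    # Made-up probe results: True = probe succeeded, False = probe failed.
    print(liveness_restarts([True, False, False, False, True, False, False]))  # 1
    print(readiness_timeline([True, False, False, False, True]))
    # [True, True, True, False, True]
```

Note how a single failed probe changes nothing: only `failureThreshold` consecutive failures trigger a restart (liveness) or remove the pod from traffic (readiness), which keeps brief hiccups from causing churn.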
For production, it’s highly recommended to use more mature open-source or commercial monitoring solutions. In an upcoming blog post, we will look into other monitoring solutions, monitoring pipelines, and monitoring architecture. Till then, ciao!