For the past three months, I have been working on PKS observability features. Right now, it’s mostly about Kubernetes logging.
Hmm, logging? Collect logs and send them to the log server. That looks quite straightforward. Simple and common, isn’t it? I agree, but only partially. I have noticed some new challenges in container logging, compared to VM or bare-metal environments.
Here is the summary. Check it out, and see how much of it applies to your Kubernetes projects. (BTW, our PKS project is hiring.)
The motivation for this post is to illustrate the problems and technical difficulties, not to explain how to solve them. And if anything here goes against the company’s policy, changes will be made on demand.
Normally, the log transport workflow is either active or passive.
Active way: the process actively sends log messages to a remote syslog server. The data is usually encoded in the RFC 5424 format.
Passive way: for each process, specify the log paths or file patterns. A log agent periodically scans them and sends the captured log messages to the log server.
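To make the active way more concrete, here is a minimal Python sketch that builds one message in the RFC 5424 header layout (PRI, version, timestamp, hostname, app name, process ID, message ID, structured data, message). The function name, the hostname worker-1 and the app name demo-app are made up for illustration, and the sketch only formats the line; actually sending it to a syslog server is left out.

```python
from datetime import datetime, timezone


def rfc5424_message(facility, severity, hostname, app_name, msg,
                    procid="-", msgid="-", timestamp=None):
    """Format one syslog line following the RFC 5424 header layout."""
    # PRI is defined as facility * 8 + severity.
    pri = facility * 8 + severity
    ts = (timestamp or datetime.now(timezone.utc)).isoformat()
    # "-" is the RFC 5424 NILVALUE; it is used here for PROCID, MSGID
    # and STRUCTURED-DATA because this sketch does not fill them in.
    return f"<{pri}>1 {ts} {hostname} {app_name} {procid} {msgid} - {msg}"


line = rfc5424_message(
    facility=16,            # local0
    severity=6,             # informational
    hostname="worker-1",    # hypothetical host
    app_name="demo-app",    # hypothetical application
    msg="service started",
    timestamp=datetime(2019, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(line)
# <134>1 2019-01-01T12:00:00+00:00 worker-1 demo-app - - - service started
```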
So you may think the problem has been solved. Not yet, my friends.
Running services in containers is different from running them on VMs or bare metal. The new trends are:
- Processes will be more ephemeral.
- Process deployment will be more distributed.
What does that mean for container logging?
Challenge I: Fail To Collect All Critical Logs
- Pods may get recreated within several seconds. This means the old passive log transport mechanism may fail to collect logs for those short-lived pods. But your users may need them for troubleshooting! Use the active log mechanism? It would incur a high maintenance cost if your goal is to provide a Kubernetes solution as a product, like PKS.
When something is wrong, pods may get deleted or recreated quickly. Consequently, the log files associated with those pods/containers will be deleted or created just as quickly.
However, log agents like fluentd or logstash detect new log files by scanning the folder or log pattern periodically. And the default scan interval is 60 seconds (see the figure below). That scan interval may be too slow to capture short-lived pods. How about setting the interval shorter, say 1 second? The performance overhead would be much higher.
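The scan-interval problem can be illustrated with a tiny simulation. This is a simplified model, not how any particular agent is implemented: it assumes the agent only discovers a log file if the file exists at the moment of a scan, and that the file disappears when the pod is deleted. The pod names and timestamps (in seconds) are invented.

```python
def discovered_pods(pods, scan_interval, horizon):
    """Return names of pods whose log file exists at some scan time.

    pods: list of (name, created_at, deleted_at) tuples, in seconds.
    scan_interval: seconds between two directory scans.
    horizon: last second of the simulated period.
    """
    scan_times = range(0, horizon + 1, scan_interval)
    found = []
    for name, created, deleted in pods:
        # The file is visible only while created <= t < deleted.
        if any(created <= t < deleted for t in scan_times):
            found.append(name)
    return found


pods = [
    ("long-lived", 5, 300),
    ("crash-loop-1", 10, 14),   # lives for 4 seconds
    ("crash-loop-2", 70, 75),   # lives for 5 seconds
]

slow = discovered_pods(pods, scan_interval=60, horizon=300)
fast = discovered_pods(pods, scan_interval=1, horizon=300)
print(slow)  # ['long-lived']
print(fast)  # ['long-lived', 'crash-loop-1', 'crash-loop-2']
```

With a 60-second interval, both crash-looping pods fall between two scans and are never seen. With a 1-second interval they are found, at the price of 60 times as many scans.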
Previously, this wasn’t a problem in the VM world. When a process gets restarted, the log file may be rotated, but it won’t be deleted. So users may experience delays in receiving logs, but not this: missing critical logs for problematic processes.
How can we solve this? I’m not sure about the best practice, since here in PKS we are also exploring. Maybe we can start a Kubernetes controller that subscribes to pod events. Whenever a pod creation event is fired, it notifies the log agent immediately. honeycomb-kubernetes-agent is an interesting GitHub repo implementing this idea. Please leave us a comment if you have a better solution.
- Not all logs are redirected to stdout/stderr. If a process inside a pod writes logs to a local file instead of stdout/stderr, the log agent won’t get them.
Why? The agent only monitors the log file associated with the pod, like below. And that log file only captures the container’s stdout/stderr.
# ls -1 /var/lib/docker/containers/*/*-json.log
/var/lib/docker/containers/0470.../0470...-json.log
/var/lib/docker/containers/0645.../0645...-json.log
/var/lib/docker/containers/12d2.../12d2...-json.log
...
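For context, each line in a file written by Docker’s json-file logging driver is a JSON object with the keys log, stream and time. The sketch below parses one such line in Python. The sample line and the function name are invented for illustration; the point is that the stream value is always stdout or stderr, so anything the process writes to its own local files never shows up here.

```python
import json


def parse_json_log_line(line):
    """Parse one line from a Docker json-file container log."""
    record = json.loads(line)
    return {
        "time": record["time"],
        "stream": record["stream"],              # "stdout" or "stderr"
        "message": record["log"].rstrip("\n"),   # payload ends with a newline
    }


# Invented sample line in the json-file layout.
sample = ('{"log":"connection refused\\n","stream":"stderr",'
          '"time":"2019-01-01T12:00:00.000000000Z"}')

parsed = parse_json_log_line(sample)
print(parsed["stream"], parsed["message"])
```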
Yes, this logging behavior is an anti-pattern in the Kubernetes world. However, the cloud-native movement definitely takes time, and not everyone has caught up yet. This is especially true for DB services.
Compared to the VM world, pods may move across different worker nodes quite often. But you don’t want to reload or restart the log agent whenever a single pod changes in the k8s cluster. New challenges, right?
How to solve it? TODO: sidecar, log mapping
Challenge II: Multi-tenancy For Namespace Logging
- Different log endpoints for different namespaces
Kubernetes workloads usually run on shared worker VMs. Workloads from different projects are separated by namespaces.
Different projects may have different preferences for logging: where the logs go, which tools manage them, and so on. We need to provide an easy way to configure this, with no extra security compromises.
It turns out that a Kubernetes CRD (CustomResourceDefinition) is a good fit.
- All you need to learn is the standard kubectl command. (See kubectl cheatsheet).
- RBAC can be applied to this custom resource. So security can be easily enforced.
In PKS, we call this feature a sink resource. Note: this idea has been proposed to the Kubernetes community. Hopefully it will be merged upstream soon.
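Conceptually, per-namespace configuration boils down to a lookup from namespace to log endpoint. The Python sketch below shows only that routing idea. It is not the actual PKS sink resource or its schema: the sinks dictionary stands in for whatever the custom resources would declare, and the namespaces and endpoints are hypothetical.

```python
from collections import defaultdict


def route_by_namespace(records, sinks, default_sink=None):
    """Group log messages by the endpoint configured for their namespace.

    records: iterable of dicts with "namespace" and "message" keys.
    sinks: dict mapping namespace -> endpoint (hypothetical config).
    default_sink: endpoint for namespaces without their own sink;
                  None means such records are not forwarded at all.
    """
    batches = defaultdict(list)
    for record in records:
        endpoint = sinks.get(record["namespace"], default_sink)
        if endpoint is not None:
            batches[endpoint].append(record["message"])
    return dict(batches)


# Hypothetical per-namespace sinks.
sinks = {
    "team-a": "syslog://logs.team-a.example:514",
    "team-b": "syslog://logs.team-b.example:514",
}
records = [
    {"namespace": "team-a", "message": "a1"},
    {"namespace": "team-b", "message": "b1"},
    {"namespace": "team-a", "message": "a2"},
    {"namespace": "team-c", "message": "c1"},  # no sink configured
]

routed = route_by_namespace(records, sinks)
print(routed)
```

Here team-c has no sink and no default is given, so its message is not forwarded anywhere, and team-a never sees team-b’s logs.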
Challenge III: Support Logging SLA For Different Namespaces
- A single instance of the log agent per worker node.
For simplicity, people usually deploy only one log agent, as a Kubernetes DaemonSet. That means one agent pod per Kubernetes worker node. If this pod needs to be reloaded or rescheduled, it will impact all pods living on that worker node.
Starting from k8s v1.12, each node may run up to 100 pods. You need to make sure your log agent is fast enough to collect logs from all of them.
Like in any shared environment, you may experience noisy-neighbor issues. The misbehavior of one pod will hurt all other pods on the same worker node. Want to disable logging for one problematic namespace? You can easily avoid emitting the logs, but not the part that collects them.
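One common way to contain a noisy neighbor is a per-namespace rate limit. This post does not prescribe that as the answer, so treat the following Python token-bucket sketch as an assumption about what a mitigation could look like. The class names, rates and namespaces are invented, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allow roughly `rate` log lines per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now):
        # Refill according to the time elapsed since the previous call.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class NamespaceLimiter:
    """Keep one independent bucket per namespace."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, namespace, now):
        if namespace not in self.buckets:
            self.buckets[namespace] = TokenBucket(self.rate, self.capacity)
        return self.buckets[namespace].allow(now)


limiter = NamespaceLimiter(rate=1, capacity=2)
# The "noisy" namespace bursts five lines at t=0; only two pass.
noisy = [limiter.allow("noisy", now=0.0) for _ in range(5)]
# The "quiet" namespace is unaffected by the burst.
quiet = limiter.allow("quiet", now=0.0)
# One second later, "noisy" has earned one token back.
noisy_later = limiter.allow("noisy", now=1.0)
print(noisy, quiet, noisy_later)
```

Note that this only limits what gets forwarded. The agent still has to read the noisy pod’s log file, which is the collection cost mentioned above.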
- No guarantee for log transport latency. Each step imposes extra overhead on the overall workflow.
A slow disk may create significant latency for log transport. Failing to handle back-pressure may effectively DDoS your log agent.
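As a rough illustration of one way to handle back-pressure, the Python sketch below uses a bounded in-memory buffer that drops the oldest lines when the downstream log server cannot keep up, and counts how many were lost. Dropping the oldest is just one possible policy (blocking the producer or spilling to disk are others), and the class name and sizes are invented for the example.

```python
from collections import deque


class BoundedLogBuffer:
    """Fixed-size buffer that evicts the oldest line when full."""

    def __init__(self, max_lines):
        self.lines = deque(maxlen=max_lines)
        self.dropped = 0

    def push(self, line):
        if len(self.lines) == self.lines.maxlen:
            # deque(maxlen=...) silently evicts the oldest entry on
            # append, so count the loss before it happens.
            self.dropped += 1
        self.lines.append(line)

    def drain(self, batch_size):
        """Remove and return up to batch_size of the oldest buffered lines."""
        batch = []
        while self.lines and len(batch) < batch_size:
            batch.append(self.lines.popleft())
        return batch


buffer = BoundedLogBuffer(max_lines=3)
for i in range(5):              # producer is faster than the consumer
    buffer.push(f"line-{i}")

batch = buffer.drain(batch_size=2)
print(batch, buffer.dropped, list(buffer.lines))
# ['line-2', 'line-3'] 2 ['line-4']
```

The memory use of the agent stays bounded no matter how fast a pod logs, and the dropped counter makes the loss visible instead of silent.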
Challenge IV: Handle Logging From Different Layers
- Problems may happen at three different layers.
As in the figure below, we have pod logs, k8s logs and platform logs. Even for “pod logs”, we have logs from standard workloads and logs from k8s add-ons.
As you may guess, different types of logs have different characteristics. And they may have different priorities. It is not only layer versus layer: there may also be different SLAs within the same layer.
To provide a k8s solution, how can we address this? We need to help Ops/Dev find the root cause quickly, while minimizing the security compromises.
What is PKS? PKS is an enterprise Kubernetes solution from VMware and Pivotal.
Interested in PKS job opportunities? Search PKS in this link. (Or contact me directly)