I run Docker containers for all my side projects. Usually it is a single container; sometimes it is several containers started by docker-compose.
If a container runs into issues, I want to get alerts.
What a typical monitoring requirement! But if you do some research, you will find the information overwhelming: cAdvisor, Prometheus, InfluxDB, etc. Excuse me? Can’t we have a simple solution for this simple requirement? Here is my answer. Try it and discuss with me.
As DevOps/Ops, you maintain DB instances or RAM-intensive services. You see OOM issues occasionally, don’t you? Yes, the scary Out-Of-Memory issues.
Nobody enjoys OOM issues. When one does happen, what should be checked? More importantly, how do you monitor for OOM issues and get alerts before one actually happens?
Here are some of my thoughts. Take a look and discuss with me!
Two months ago, I released the nagios3 Chef cookbook to Chef Supermarket. I’m really happy to say it has over 4,000,000 downloads now! I’m not sure what portion of those come from real machines versus testing, but it’s a good sign indeed.
Here I’d like to introduce this Chef cookbook in more detail and share how to define effective monitoring items.
Monitoring the memory usage of a given process is critical for troubleshooting and issue escalation. Strangely, I could not find a publicly available Nagios plugin for this, so I wrote one.
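To make the idea concrete, here is a minimal sketch of what a per-process memory check could look like. It is not the plugin mentioned above, just an illustration under some assumptions: it runs on Linux, reads VmRSS from /proc/<pid>/status, and the script name, arguments and thresholds (check_proc_mem.py, warn_mb, crit_mb) are hypothetical. The exit codes 0/1/2/3 are the standard Nagios plugin return codes.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check for one process's resident memory (Linux only)."""
import re
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def evaluate(rss_mb, warn_mb, crit_mb):
    """Map a memory reading (MB) to a Nagios exit code."""
    if rss_mb >= crit_mb:
        return CRITICAL
    if rss_mb >= warn_mb:
        return WARNING
    return OK


def main(argv):
    # Hypothetical CLI: check_proc_mem.py <pid> <warn_mb> <crit_mb>
    try:
        pid, warn_mb, crit_mb = argv[1], float(argv[2]), float(argv[3])
    except (IndexError, ValueError):
        print("UNKNOWN - usage: check_proc_mem.py <pid> <warn_mb> <crit_mb>")
        return UNKNOWN
    try:
        with open(f"/proc/{pid}/status") as handle:
            rss_kb = parse_vmrss_kb(handle.read())
    except OSError as err:
        print(f"UNKNOWN - cannot read process {pid}: {err}")
        return UNKNOWN
    if rss_kb is None:
        # Kernel threads, for example, report no VmRSS line.
        print(f"UNKNOWN - no VmRSS reported for process {pid}")
        return UNKNOWN
    rss_mb = rss_kb / 1024
    code = evaluate(rss_mb, warn_mb, crit_mb)
    label = ("OK", "WARNING", "CRITICAL")[code]
    # Everything after "|" is Nagios performance data.
    print(f"{label} - pid {pid} RSS {rss_mb:.1f} MB | rss_mb={rss_mb:.1f};{warn_mb};{crit_mb}")
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Run it as, say, `check_proc_mem.py 1234 500 800` to warn at 500 MB and go critical at 800 MB. A real plugin would probably look the process up by name or pid file rather than take a raw pid.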
Usually a process has a limit on how many file descriptors (fds) it can open. If the fd count is abnormally high, it usually indicates a resource leak at the code level or a burst of requests.
Hence, effective monitoring of the fd count of critical processes is important.
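One possible way to check this, sketched below under some assumptions: on Linux, the open fds of a process show up as entries in /proc/<pid>/fd, and its soft limit is the "Max open files" row of /proc/<pid>/limits. The script name and the 70%/90% thresholds are hypothetical, and reading another user's /proc/<pid>/fd normally requires root.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check for a process's open fd count (Linux only)."""
import os
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_soft_fd_limit(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the row is missing or the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # columns: Max open files <soft> <hard> <units>
            return int(soft) if soft.isdigit() else None
    return None


def evaluate(fd_count, soft_limit, warn_pct=70.0, crit_pct=90.0):
    """Map fd usage, as a percentage of the soft limit, to a Nagios exit code."""
    used_pct = 100.0 * fd_count / soft_limit
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def main(argv):
    # Hypothetical CLI: check_proc_fd.py <pid>
    if len(argv) != 2:
        print("UNKNOWN - usage: check_proc_fd.py <pid>")
        return UNKNOWN
    pid = argv[1]
    try:
        fd_count = len(os.listdir(f"/proc/{pid}/fd"))
        with open(f"/proc/{pid}/limits") as handle:
            soft_limit = parse_soft_fd_limit(handle.read())
    except OSError as err:
        print(f"UNKNOWN - cannot inspect process {pid}: {err}")
        return UNKNOWN
    if not soft_limit:
        # Covers a missing row, 'unlimited', and a limit of 0.
        print(f"UNKNOWN - no usable fd limit for process {pid}")
        return UNKNOWN
    code = evaluate(fd_count, soft_limit)
    label = ("OK", "WARNING", "CRITICAL")[code]
    print(f"{label} - pid {pid} has {fd_count} of {soft_limit} fds open | fds={fd_count}")
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Alerting on a percentage of the limit rather than an absolute count is a design choice here: it keeps the same thresholds usable across processes with different ulimit settings.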