As DevOps/Ops, you maintain DB instances or RAM-intensive services. You see OOM issues occasionally, don’t you? Yes, the scary Out-Of-Memory issues.
Nobody enjoys OOM issues. When one does happen, what should be checked? More importantly, how do you monitor for OOM issues and get alerts before one actually happens?
Here are some of my thoughts. Take a look and discuss with me!
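To make the "alert before it happens" idea concrete, here is a minimal sketch, assuming a Linux host that exposes `MemAvailable` in `/proc/meminfo` (kernel 3.14 or newer). The function names and the 10% threshold are my own illustrative choices, not a recommendation for every workload.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> number (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_percent(fields):
    """Percentage of total RAM the kernel estimates is still available."""
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


def is_low_memory(fields, min_available_pct=10.0):
    """True when available memory drops below the (assumed) threshold."""
    return available_percent(fields) < min_available_pct


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On a real host you would call `is_low_memory(read_meminfo())` from a cron job or a monitoring check and raise an alert when it returns `True`.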
A deployment process may explicitly or implicitly run commands like apt-get, wget, etc. That’s quite natural and common. However, if you want a smooth and stable deployment, you have to watch out for all this outbound traffic. Why? And how?
Two months ago, I released the nagios3 Chef cookbook to Chef Supermarket. I’m really happy to say it has over 4,000,000 downloads now! I’m not sure what portion of those come from real machines versus testing, but it’s a good sign indeed.
Here I’d like to introduce this Chef cookbook in more detail and share how to define effective monitoring items.
Monitoring the memory usage of a given process is critical for troubleshooting and issue escalation. Strangely, I couldn’t find a publicly available Nagios plugin for this, so I wrote one.
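For readers who want a feel for what such a check involves, below is a rough sketch, not the plugin mentioned above. It assumes Linux, reads `VmRSS` from `/proc/<pid>/status`, and follows the standard Nagios exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The script name, argument order, and function names are hypothetical.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_rss_kb(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("VmRSS not found (kernel thread or non-Linux host?)")


def check_rss(rss_kb, warn_mb, crit_mb):
    """Map resident memory to a Nagios exit code and a status line with perfdata."""
    rss_mb = rss_kb / 1024.0
    if rss_mb >= crit_mb:
        code, label = CRITICAL, "CRITICAL"
    elif rss_mb >= warn_mb:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    message = "MEM %s - RSS %.1f MB | rss_mb=%.1f;%s;%s" % (
        label, rss_mb, rss_mb, warn_mb, crit_mb)
    return code, message


def main(argv):
    # Hypothetical CLI: check_proc_mem.py <pid> <warn_mb> <crit_mb>
    if len(argv) != 4:
        print("UNKNOWN - usage: %s <pid> <warn_mb> <crit_mb>" % argv[0])
        return UNKNOWN
    try:
        pid, warn_mb, crit_mb = int(argv[1]), float(argv[2]), float(argv[3])
        with open("/proc/%d/status" % pid) as f:
            rss_kb = parse_rss_kb(f.read())
    except (OSError, ValueError) as exc:
        print("UNKNOWN - %s" % exc)
        return UNKNOWN
    code, message = check_rss(rss_kb, warn_mb, crit_mb)
    print(message)
    return code
```

To use it as a standalone check, add `sys.exit(main(sys.argv))` at the bottom of the script and point a Nagios command definition at it; Nagios reads the exit code for the state and the first output line for the status text.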