Say you have issued a command on your servers, typically a backup or a critical hot fix.
You know when the process started, but when will it end? How can you find the elapsed execution time of a process that is already running?
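One way to answer the question is to ask ps for the etime field, which reports elapsed time as [[dd-]hh:]mm:ss. The sketch below is only an illustration: the helper names etime_to_seconds and elapsed_seconds are made up for this example, and it assumes a ps that accepts -o etime= and -p, plus awk.

```shell
# Convert ps "etime" output ([[dd-]hh:]mm:ss) into plain seconds.
etime_to_seconds() {
    printf '%s\n' "$1" | awk '{
        gsub(/ /, "")                 # ps pads the field with spaces
        days = 0
        rest = $0
        if (index(rest, "-") > 0) {   # optional "dd-" prefix
            split(rest, d, "-")
            days = d[1]
            rest = d[2]
        }
        n = split(rest, t, ":")
        if (n == 3) {
            secs = t[1] * 3600 + t[2] * 60 + t[3]
        } else if (n == 2) {
            secs = t[1] * 60 + t[2]
        } else {
            exit 1                    # not an etime value
        }
        print days * 86400 + secs
    }'
}

# Elapsed seconds for a running PID (hypothetical helper name).
elapsed_seconds() {
    etime_to_seconds "$(ps -o etime= -p "$1")"
}
```

For a process that has run for one day, two hours, three minutes and four seconds, etime_to_seconds "1-02:03:04" prints 93784. Calling elapsed_seconds with the PID of your backup job gives the same kind of number for a live process.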
People occasionally change critical config files on servers by hand, for example /etc/hosts or /etc/hostname.
As an experienced operator, you remember to make a backup before changing anything, right? What would you do? Probably cp /etc/hosts /etc/hosts.bak.
But is that good enough?
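One weakness of a fixed .bak name is that the second backup silently overwrites the first. A minimal sketch of a timestamped alternative follows; the function name backup_file and the naming scheme are assumptions for illustration, not an established convention.

```shell
# Copy a file to <name>.<timestamp>.bak, keeping mode and mtime (cp -p).
# Prints the path of the backup so callers can log or verify it.
backup_file() {
    src=$1
    if [ ! -f "$src" ]; then
        echo "backup_file: no such file: $src" >&2
        return 1
    fi
    dest="$src.$(date +%Y%m%d-%H%M%S).bak"
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}
```

Running backup_file /etc/hosts before each edit leaves one copy per change instead of a single file that keeps getting replaced. Two backups taken within the same second would still collide, so this is a starting point rather than a complete answer.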
For DevOps, installation is one of our major tasks. People may think package installation is straightforward now: just run apt-get, yum, or brew, or simply leave it to containers.
Is it really that easy? Here is a list of headaches and hidden costs. Discuss with us, DevOps gurus!
Process checking is critical. We already have plenty of Linux tools and tips available, and getting familiar with them is the first step and the easiest part.
More important is knowing which questions to ask, and why, when you examine a critical process. Fortunately, even plain common sense lets us dig out a lot of valuable information.
To break down silos and improve availability, DevOps/Ops should regularly collect useful feedback from maintaining the prod env. Make it easy for developers to access, and improve the feedback loop together as a team.
The first and most important question: what should we examine to give developers meaningful feedback?
Ever been bothered by suspicious processes running on your servers? They can be dangerous: leaking valuable data, wasting CPU and memory, or launching DDoS attacks on other victims.
How can we easily catch those troublemakers? Even better, how can we get alerted without extra human effort?
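One simple approach, shown here only as a sketch, is to compare the running processes against an allowlist of command names you expect on the box. The function name unexpected_processes and the allowlist file format (one command name per line) are assumptions made for this example.

```shell
# Read "pid user comm" lines on stdin and print those whose command
# name is NOT listed in the allowlist file given as $1.
unexpected_processes() {
    awk -v allow="$1" '
        BEGIN {
            # Load the allowlist, one command name per line
            while ((getline name < allow) > 0) ok[name] = 1
        }
        !($3 in ok)
    '
}
```

A typical call would be ps -eo pid=,user=,comm= | unexpected_processes allowed.txt, where allowed.txt is a hypothetical file you maintain. Run from cron with the output mailed or pushed to your alerting system, it reports anything new without a human watching. A determined attacker can rename a binary, so treat this as a cheap first filter rather than real intrusion detection.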
While moving to the cloud is a prevailing trend, security is something we can’t afford to ignore. Periodically checking for wide-open TCP ports is one good practice for securing a system in the cloud. DB ports must not be exposed to the whole internet, and our internal REST APIs also need to be protected.
We should make sure the firewall is properly configured. More importantly, we need to stay on top of these security holes with minimal effort. So let’s automate the audit of insecure TCP ports.
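As a rough local starting point, the sketch below filters the output of ss -ltn for listeners bound to every interface (0.0.0.0, [::] or *). The function name wide_open_ports is made up for this example, and it assumes the usual ss column layout, with the local address in the fourth column.

```shell
# Read `ss -ltn` output on stdin and print the ports of listeners
# that are bound to all interfaces, one port per line, sorted.
wide_open_ports() {
    awk '$1 == "LISTEN" {
        n = split($4, parts, ":")
        port = parts[n]                       # text after the last colon
        addr = substr($4, 1, length($4) - length(port) - 1)
        if (addr == "0.0.0.0" || addr == "[::]" || addr == "*") print port
    }' | sort -n -u
}
```

Usage would look like ss -ltn | wide_open_ports. A port bound to all interfaces is only a hint: whether it is reachable from the internet still depends on the firewall or cloud security group, so an external scan is the stronger check.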
Why does it work on my server but fail on yours? The question is quite common. After careful checking and side-by-side testing, we may or may not find the difference. The root cause may be package conflicts, mismatched versions, corrupted files, or something seemingly magic.
How can we quickly detect the noticeable differences between two servers?
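Installed packages are usually the first thing worth comparing. The sketch below assumes you have already dumped each server's package list to a text file, for example with dpkg-query -W or rpm -qa; the function name compare_package_lists is hypothetical.

```shell
# Print the lines that appear in only one of two package-list files.
compare_package_lists() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    # comm needs both inputs sorted with the same collation
    LC_ALL=C sort "$1" > "$a_sorted"
    LC_ALL=C sort "$2" > "$b_sorted"
    LC_ALL=C comm -23 "$a_sorted" "$b_sorted" | sed 's/^/only-in-first: /'
    LC_ALL=C comm -13 "$a_sorted" "$b_sorted" | sed 's/^/only-in-second: /'
    rm -f "$a_sorted" "$b_sorted"
}
```

Because each line carries both the package name and its version, a version mismatch shows up as one only-in-first line and one only-in-second line for the same package. The same function works on any pair of line-oriented dumps, such as sysctl -a output or a list of file checksums.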
If maintaining servers is one of your routine jobs, you will occasionally need to run commands on multiple servers: restarting a service, updating firewall rules, grepping files, and so on.
It’s boring and inefficient to ssh in manually and run the commands one server at a time.
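The simplest fix is a loop over a host list. Here is a minimal sketch, assuming key-based ssh access is already set up; the function names remote_exec and run_on_hosts and the hosts-file format (one host per line, # for comments) are assumptions for illustration.

```shell
# Thin wrapper so ssh options live in one place.
# BatchMode=yes fails fast instead of prompting for a password.
remote_exec() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$@"
}

# Run the same command on every host listed in a file.
# Returns non-zero if the command failed on at least one host.
run_on_hosts() {
    hosts_file=$1
    shift
    failed=0
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;          # skip blank lines and comments
        esac
        printf '== %s ==\n' "$host"
        # </dev/null keeps ssh from swallowing the rest of the host list
        remote_exec "$host" "$@" < /dev/null || failed=$((failed + 1))
    done < "$hosts_file"
    [ "$failed" -eq 0 ]
}
```

With a hypothetical hosts.txt, run_on_hosts hosts.txt uptime prints a header per server followed by that server's output. The loop is sequential, which is fine for a handful of machines; for larger fleets a dedicated tool such as pssh or Ansible is the usual next step.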