Try Your Best To Avoid Any SSH Operations. Yes, I deeply believe in this principle. But in reality, automation is never that perfect. I still need to log in and check system status sometimes, though only rarely.
It may happen at night or even when I’m on vacation. So what can I do? Just carry my laptop with me wherever I go? That is certainly bad, isn’t it?
We’re living in the world of ChatOps, and mobile phones dominate our daily lives (sadly!). So why don’t we implement a ChatOps bot for this? Here comes a Slack command: /chatqueryhost. (Note: the solution is not limited to Slack.)
1. Install node_usage.py On Each Node.
It reports the status of the node it runs on as JSON output.
- OS resource utilization: RAM, CPU, and disk.
- [Optional] Get service status, or tail log files.
(See why I chose Python over Shell: GoodBye Shell, Hello Python!)
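To make the idea concrete, here is a minimal sketch of what a node_usage.py-style script could look like. It is not the original script: it assumes a Linux node, uses only the standard library, and reports the load average as a rough stand-in for CPU utilization. The function names and JSON field names are my own choices for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a node_usage.py-style reporter (stdlib only, Linux assumed)."""
import json
import os
import shutil
import socket


def read_meminfo(path="/proc/meminfo"):
    """Return total/available RAM in kB, or None if the file is unreadable."""
    fields = {}
    try:
        with open(path) as f:
            for line in f:
                key, _, rest = line.partition(":")
                parts = rest.split()
                if parts and parts[0].isdigit():
                    fields[key] = int(parts[0])
    except OSError:
        return None  # not Linux, or /proc is unavailable
    return {
        "total_kb": fields.get("MemTotal"),
        "available_kb": fields.get("MemAvailable"),
    }


def collect_usage(disk_path="/"):
    """Gather RAM, CPU load, and disk usage into one dict."""
    disk = shutil.disk_usage(disk_path)
    try:
        load_1m, load_5m, load_15m = os.getloadavg()
    except (AttributeError, OSError):
        load_1m = load_5m = load_15m = None  # unsupported platform
    return {
        "hostname": socket.gethostname(),
        "ram": read_meminfo(),
        "cpu_load": {"1m": load_1m, "5m": load_5m, "15m": load_15m},
        "disk": {
            "path": disk_path,
            "total_bytes": disk.total,
            "used_bytes": disk.used,
            "free_bytes": disk.free,
        },
    }


if __name__ == "__main__":
    # Print one JSON document so the caller can parse or forward it as-is.
    print(json.dumps(collect_usage(), indent=2))
```

Optional extras such as service status or log tails would simply be more keys in the same dict.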
2. Start A Web Server, Serving Slack Requests.
It starts a web server with Python Flask + uWSGI.
- Receive requests from Slack.
- Run a remote ssh command, which is literally node_usage.py.
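The two steps above can be sketched as plain functions, kept separate from the web framework so they are easy to test. This is only a sketch under assumptions: the remote path in REMOTE_COMMAND is hypothetical, the function names are mine, and the hostname is assumed to arrive in the `text` field of Slack's form-encoded slash command payload.

```python
import re
import subprocess

# Hypothetical install path of node_usage.py on every node.
REMOTE_COMMAND = "python3 /opt/chatops/node_usage.py"
# Conservative hostname pattern. It rejects values starting with "-", so
# user input can never be parsed by ssh as an option.
HOSTNAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]{0,252}")


def build_ssh_command(host):
    """Return the argv list for running the reporter on `host`."""
    if not HOSTNAME_RE.fullmatch(host):
        raise ValueError(f"invalid hostname: {host!r}")
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=2",
            host, REMOTE_COMMAND]


def run_over_ssh(argv):
    """Run the ssh command and return its stdout (or stderr on failure)."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=2.5)
    return result.stdout if result.returncode == 0 else result.stderr


def handle_slash_command(form, runner=run_over_ssh):
    """Turn Slack's form fields into a JSON-serializable reply."""
    host = form.get("text", "").strip()
    try:
        argv = build_ssh_command(host)
    except ValueError as err:
        return {"response_type": "ephemeral", "text": str(err)}
    try:
        output = runner(argv)
    except subprocess.TimeoutExpired:
        output = f"timed out querying {host}"
    return {"response_type": "ephemeral", "text": output}
```

In a Flask app, the route for /chatqueryhost would pass `request.form` to `handle_slash_command` and return the result with `jsonify`. Passing the command as a list (no local shell) and validating the hostname keeps chat input from turning into arbitrary ssh options.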
You can design and implement your own ChatOps solution for this.
And here are some suggestions I have for you, my friends:
- The Slack command must return within 3 seconds.
- Instead of running ssh commands, it’s safer to wrap the Python script as an agent. The agent then serves as a TCP or HTTP server.
- Support finding nodes by fuzzy match. Your environment may have tens of nodes, if not hundreds. So let people identify a node by giving only part of its hostname.
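Here is a rough sketch of how the first and last suggestions might fit together: match a partial hostname, reply to Slack immediately, and do the slow work in a background thread. All function names are hypothetical. "Fuzzy" here is a simple case-insensitive substring match, and `deliver` is assumed to be something you supply, for example a function that POSTs the result to the `response_url` Slack includes in the slash command payload.

```python
import threading


def find_nodes(query, hostnames):
    """Return hostnames containing `query`, case-insensitively."""
    needle = query.strip().lower()
    if not needle:
        return []
    return sorted(h for h in hostnames if needle in h.lower())


def resolve_node(query, hostnames):
    """Return (hostname, message); hostname is None unless exactly one matches."""
    matches = find_nodes(query, hostnames)
    if len(matches) == 1:
        return matches[0], f"Querying {matches[0]}..."
    if not matches:
        return None, f"No node matches '{query}'."
    return None, "Ambiguous, did you mean: " + ", ".join(matches)


def ack_then_run(query, hostnames, job, deliver):
    """Reply at once and run the slow `job` in a background thread.

    `job(hostname)` returns the status text; `deliver(text)` sends it later.
    Returns the immediate message and the worker thread (None if no job ran).
    """
    host, message = resolve_node(query, hostnames)
    worker = None
    if host is not None:
        worker = threading.Thread(
            target=lambda: deliver(job(host)), daemon=True)
        worker.start()
    return message, worker
```

A quick usage example with a made-up inventory:

    nodes = ["prod-web-01", "prod-web-02", "prod-db-01"]
    resolve_node("db", nodes)   # ("prod-db-01", "Querying prod-db-01...")
    resolve_node("web", nodes)  # (None, "Ambiguous, did you mean: prod-web-01, prod-web-02")

Returning the acknowledgement first keeps the command inside the 3-second window even when a node is slow to answer.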