System deployments and upgrades usually involve many actions. If we can detect and speed up the time-consuming steps, we get better customer satisfaction or a shorter maintenance window. But with so many steps involved, how can we easily find the bottleneck?
Original Article: https://dennyzhang.com/list_slowest_steps
You may think that if all critical actions logged messages with the same timestamp format, we could get the time elapsed for each step.
The answer is yes and no. A deployment usually runs automation scripts from several components or modules: some in bash, some in Chef/Puppet/Ansible, some even in Python. It’s hard to enforce a common practice across them, especially a timestamp format convention. The good news is that professional tools and engineers already log most, if not all, critical actions. So the missing piece is a way to attach a unified timestamp.
Fortunately Jenkins has a useful plugin called Timestamper. It can add timestamps to the Console Output of Jenkins jobs.
Here is the idea:
- Automate the deployment procedure as a bash script, and run it as a Jenkins job.
- Enable the Jenkins Timestamper plugin for this Jenkins job.
- Calculate the time taken by each step by parsing the Jenkins Console Output line by line.
- Sort the steps by time taken, in descending order.
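The last two steps can be sketched in a few lines of Python. This is only an illustration, not the implementation behind the Jenkins job described next. It assumes every console line starts with an `HH:MM:SS` timestamp followed by the message, and it treats the gap between one line and the next as the duration of the earlier line. The function name `slowest_steps` and the sample log are made up for the example; adjust the pattern to whatever format your Timestamper configuration actually emits.

```python
import re
from datetime import datetime

# Assumed line shape: "HH:MM:SS <message>". This is an assumption about the
# console text, not a guarantee of what Timestamper prints in your setup.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(.*)$")


def slowest_steps(console_text, top=10):
    """Return (seconds, message) pairs, slowest first."""
    stamped = []
    for line in console_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%H:%M:%S")
            stamped.append((when, match.group(2)))

    durations = []
    # A step's duration is the gap until the next timestamped line,
    # so the final line has no duration and is left out.
    for (start, message), (end, _) in zip(stamped, stamped[1:]):
        elapsed = (end - start).total_seconds()
        if elapsed < 0:
            # The clock wrapped past midnight between two lines.
            elapsed += 24 * 3600
        durations.append((elapsed, message))

    durations.sort(key=lambda pair: pair[0], reverse=True)
    return durations[:top]


# Hypothetical console output, for illustration only.
sample = """\
10:00:00 Start deployment
10:00:02 Install packages
10:03:12 Configure services
10:03:20 Run smoke tests
10:04:05 Deployment finished
"""

for seconds, message in slowest_steps(sample):
    print(f"{seconds:>6.0f}s  {message}")
```

Lines without a timestamp are skipped, so their time is attributed to the preceding timestamped line. That is usually what you want for multi-line output from a single step.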
For a better user experience, I’ve defined a Jenkins job: DiagnosticJenkinsJobSlow. Below is a real example of how it works.
Notice: You can find a live demo here.