Write Initscript For Linux Daemon Service

How to write a Linux initscript? You have implemented a daemon service, and now you want everyone on the project to be able to manage it easily, without extra communication. The standard way is “service $servicename start/stop/status”. Also, nobody enjoys starting a service manually after the machine reboots. Here is what you need to know, and watch out for, to get this done.




We can easily find many live initscript examples under /etc/init.d. A good initscript has to handle several requirements across different scenarios.

Impatient readers can scroll to the bottom of the page, where there is a link to a full initscript example on GitHub.
_About Start service_*
1 Single instance. Normally we allow only one instance of the program to be up and running. One way to do this is to maintain a pidfile whose content is the process’s pid. When the service is down or stopped, we remove or empty the pidfile. Another way is to run a command like “ps -ef | grep $SomePattern” whenever we start the process: if any process matches the pattern, the service refuses to start.

When the service is up, it usually listens on one or more TCP ports. So we can check whether the port is already listening with “lsof -i tcp:$TCPPORT”. However, this check does not help when two service starts are triggered in parallel at nearly the same time.
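
The pidfile check and the parallel-start race can be handled together by taking a lock around the check. The following is a minimal sketch under stated assumptions, not the script from the repository: it assumes `flock` from util-linux is installed, and the names `is_running`, `start_once`, `PIDFILE`, and `LOCKFILE` are hypothetical.

```shell
#!/bin/bash
# Hypothetical paths; adjust for your service.
PIDFILE="${PIDFILE:-/tmp/myservice.pid}"
LOCKFILE="${LOCKFILE:-/tmp/myservice.lock}"

# Return 0 if the pidfile names a live process, 1 otherwise.
is_running() {
    local pid
    [ -s "$PIDFILE" ] || return 1
    pid=$(cat "$PIDFILE")
    [ -n "$pid" ] && [ -d "/proc/$pid" ]
}

# Run the given command in the background only if no instance is alive.
# The lock on fd 9 serializes concurrent "start" invocations.
start_once() {
    (
        flock -n 9 || { echo "Another start is in progress" >&2; exit 2; }
        if is_running; then
            echo "Already running with PID $(cat "$PIDFILE")" >&2
            exit 1
        fi
        # Close fd 9 in the child so the daemon does not keep the lock.
        "$@" 9>&- &
        echo $! > "$PIDFILE"
    ) 9>"$LOCKFILE"
}
```

A call such as `start_once /usr/bin/java -jar /opt/test/service.jar` returns 0 on a fresh start, 1 if an instance is already alive, and 2 if another start currently holds the lock.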

2 Service slow start. Service start takes time, which may vary from seconds to minutes. Thus we need a wait mechanism. Polling the TCP port is a good idea.

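# Wait for service up (this snippet runs inside the start function)
timeout=0
while ! lsof -i tcp:$MONITOR_TCP_PORT 2>/dev/null | \
          grep -qi listen; do
    echo -n '.'
    timeout=$((timeout + 1))
    # Give up with a non-zero status if the port never opens
    if [ $timeout -gt $MAX_START_TIMEOUT ]; then return 1; fi
    sleep 1
done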

echo -e " \e[32m[OK]\e[0m"

3 Detach from the terminal. If we close the terminal that started the process, we don’t want the process to die: closing the terminal sends SIGHUP to the processes attached to it. This can be avoided with nohup (like “nohup java -jar some.jar &”) or with “start-stop-daemon”.

Here is an example of start-stop-daemon. Suppose the original service start command is:

/usr/bin/java -jar /opt/test/service.jar server \
    /opt/test/service.yml

We can wrap it by start-stop-daemon like below:

start-stop-daemon --start --background --quiet --make-pidfile \
    --pidfile /usr/local/var/run/mdm.pid \
    --exec /usr/bin/java -- -jar \
    /opt/test/service.jar server /opt/test/service.yml
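
start-stop-daemon ships with dpkg, so it may not be available on every distribution. Where it is missing, the nohup approach can be sketched roughly as below. The function name `start_with_nohup` and its arguments are hypothetical, and unlike start-stop-daemon this sketch does not check for an existing instance.

```shell
#!/bin/bash
# Start a command detached from the terminal and record its pid.
# Usage: start_with_nohup <pidfile> <command> [args...]
start_with_nohup() {
    local pidfile=$1
    shift
    # nohup makes the command ignore SIGHUP; stdin, stdout, and stderr
    # are detached from the terminal. Redirect to a log file instead of
    # /dev/null if the output is needed.
    nohup "$@" >/dev/null 2>&1 </dev/null &
    echo $! > "$pidfile"
}
```

For example: `start_with_nohup /tmp/service.pid /usr/bin/java -jar /opt/test/service.jar server /opt/test/service.yml`.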

_About Service stop_*
1 Service slow stop. Service stop takes time, which may vary from seconds to minutes. When a process dies, the OS removes /proc/$pid automatically. Thus we can keep checking for /proc/$pid to confirm that the service has really stopped.

stop() {
    if [[ -f $PIDFILE ]]; then
      echo -n -e " * \033[1mStopping $SERVICE_NAME...\033[0m"
      pid=$(cat "$PIDFILE")

      # REMOVE PIDFILE AND RETURN IF PROCESS NOT RUNNING
      if [[ -z $pid || ! -d /proc/$pid ]]; then
        echo -e "PID file found, but no matching process running."
        echo    "Removing PID file..."
        rm -f "$PIDFILE"
        return 0
      fi

      # KILL PROCESS (sends SIGTERM so it can shut down cleanly)
      kill "$pid"
      r=$?

      # Wait until the OS removes /proc/$pid
      timeout=0
      while [ -d "/proc/$pid" ]; do
        echo -n '.'
        timeout=$((timeout + 1))
        if [ $timeout -gt $MAX_STOP_TIMEOUT ]; then
          echo " still running after ${MAX_STOP_TIMEOUT}s"
          return 1
        fi
        sleep 1
      done

      # Process is gone, so the pidfile is no longer valid
      rm -f "$PIDFILE"
      echo

      return $r
    else
      echo -e "No PID file found -- $SERVICE_NAME not running?"
    fi
}

2 Service can’t stop. Use kill -9 only as the last resort. When we shut down the service, the process may need to do a clean shutdown to avoid data loss or conflicting changes. If so, we’d better wait until it finishes. Unfortunately, sometimes the process simply does not respond, and the only thing we can do is kill it by force.
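
One way to combine the polite stop with the last-resort kill is to send SIGTERM, wait for a grace period, and only then send SIGKILL. This is a sketch, not the code from the repository; the function name `stop_with_escalation` and the default grace period of 30 seconds are assumptions.

```shell
#!/bin/bash
# Ask a process to exit, wait up to <grace> seconds, then force-kill it.
# Usage: stop_with_escalation <pid> [grace_seconds]
# Returns 0 if the process is gone afterwards, 1 otherwise.
stop_with_escalation() {
    local pid=$1 grace=${2:-30} waited=0
    # Send SIGTERM; if there is nothing to signal, we are done.
    kill "$pid" 2>/dev/null || return 0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace" ]; then
            echo "PID $pid ignored SIGTERM for ${grace}s, sending SIGKILL" >&2
            kill -9 "$pid" 2>/dev/null
            sleep 1
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Succeed only if the process no longer exists.
    ! kill -0 "$pid" 2>/dev/null
}
```

The grace period should be long enough for the slowest clean shutdown you expect, since SIGKILL gives the process no chance to flush data.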
_About Service status_*
How to confirm whether the service is really up and running? Normally we check the pidfile first, then make sure the process is indeed running. If we can also do a lightweight health check, even better. Checking the TCP port is one easy and practical way.

status() {
  # GOT PIDFILE?
  pid=""
  [ -f "$PIDFILE" ] && pid=$(cat "$PIDFILE")

  # RUNNING
  if [[ $pid && -d "/proc/$pid" ]]; then
      if lsof -i tcp:$MONITOR_TCP_PORT 2>/dev/null | \
              grep -qi listen; then
          echo -e " * ${SERVICE_NAME} running with PID $pid"
          return 0
      else
          echo -e " * ${SERVICE_NAME} running with problem ($pid)"
          return 1
      fi
  fi

  # STALE PID FOUND: pidfile names a process that no longer exists.
  # Checked before "not running", which would otherwise match first.
  if [[ $pid ]]; then
    echo -e " * \033[1;31;40m[!] Stale PID found in $PIDFILE\033[0m"
    return 1
  fi

  # NOT RUNNING
  echo -e " * \033[1;33;40m${SERVICE_NAME} not running\033[0m"
  return 3
}

_About logging during service start and stop_*
Two common logging issues to avoid: 1. Don’t generate tons of output to stdout or log files. 2. When we start or restart the service, the initscript must never truncate the logfile by mistake; that makes troubleshooting much harder.
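
The truncation mistake usually comes down to redirecting with “>” where “>>” was intended. A small sketch of append-only logging follows; the helper names `log_msg` and `start_logged` are hypothetical.

```shell
#!/bin/bash
# Append one timestamped line to the log file.
log_msg() {
    local logfile=$1
    shift
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >>"$logfile"
}

# Start a command in the background with stdout and stderr appended
# to the log. Using ">" here instead of ">>" would wipe the history
# on every start or restart.
start_logged() {
    local logfile=$1
    shift
    log_msg "$logfile" "starting: $*"
    "$@" >>"$logfile" 2>&1 &
}
```

With this, whatever was written before a restart is still in the file afterwards, followed by a single "starting" line and the new output.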
_About service autostart, after machine reboot_*
If the initscript follows the standard conventions, we can easily configure the service to start automatically, so no manual intervention is needed when the machine or VM reboots.

  1. CentOS:
chkconfig $service_name on
chkconfig --list | grep $service_name
  2. Ubuntu:
update-rc.d $service_name defaults
update-rc.d $service_name enable
initctl list | grep $service_name
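
Both tools read metadata from comments at the top of the script: chkconfig looks for a “# chkconfig:” line (runlevels, start priority, stop priority) plus a “# description:” line, and update-rc.d relies on the LSB “INIT INFO” block. A skeleton might look like the sketch below. The service name `myservice`, the priorities 90/10, and the `dispatch` helper are hypothetical; `start`, `stop`, and `status` are assumed to be defined as in the earlier sections.

```shell
#!/bin/bash
# chkconfig: 2345 90 10
# description: Example daemon (hypothetical service)
### BEGIN INIT INFO
# Provides:          myservice
# Required-Start:    $network $remote_fs
# Required-Stop:     $network $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Example daemon
### END INIT INFO

# Map the action given on the command line to a function.
# start, stop, and status must be defined earlier in the script.
dispatch() {
    case "$1" in
        start)   start ;;
        stop)    stop ;;
        status)  status ;;
        restart) stop; start ;;
        *)       echo "Usage: $0 {start|stop|status|restart}" >&2
                 return 2 ;;
    esac
}

# In a real initscript the last line would be:
#   dispatch "$@"; exit $?
```

Saved as /etc/init.d/myservice and made executable, such a script can then be registered with the chkconfig or update-rc.d commands above.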

Here comes the solution: https://github.com/dennyzhang/devops_public/tree/tag_v6/bash/initscripts

Blog URL: https://www.dennyzhang.com/linux_write_initscript

