Waiting for Server Restart in Ansible

1. Overview

Ansible is a simple configuration management tool that automates application deployment, intra-service orchestration, and cloud provisioning. In this tutorial, we’ll discuss how to restart a server from a playbook and pause the playbook until the server is reachable again.

Afterward, we’ll work through a practical example that combines these pieces into a single playbook, the kind of routine maintenance job where Ansible saves a lot of time.

2. Server Restart Options in Ansible

Ansible playbooks describe automation jobs in YAML (YAML Ain’t Markup Language), a simple language that is easy to read and write. We’ll look at how to reboot nodes or servers and temporarily pause the playbook until they’re reachable again before continuing with its execution.

2.1. Server Reboot vs. Restart

Ansible can be used to control our system and its resources. Among other basic functions, we can use it to reboot a system. In Ansible 2.7 and later, the built-in reboot module does this and waits for the host to come back:

-  name: Wait for server to restart
   reboot:
     reboot_timeout: 3600
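
The reboot module accepts a few more options besides reboot_timeout. The following is a minimal sketch, assuming a host that needs some settling time after boot; the specific values are illustrative assumptions, not recommendations:

```yaml
- name: Reboot and wait for the host to come back
  reboot:
    msg: "Reboot triggered by Ansible"   # message shown to logged-in users
    post_reboot_delay: 30                # seconds to wait after the reboot before testing
    reboot_timeout: 3600                 # give up if the host is not back within an hour
    test_command: whoami                 # must succeed for the host to count as up
  become: true
```

Because the module handles both the reboot and the wait, no separate wait_for task is needed in this case.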

On older versions, or when we want more control over the process, we can restart the server with a shell command and then wait separately until the host comes back:

-  name: restart server
   shell: 'sleep 1 && shutdown -r now "Reboot triggered by Ansible" && sleep 1'
   async: 1
   poll: 0
   become: true

2.2. Server Restart as a Task

An Ansible playbook executes part of its overall goal by running one or more tasks as an ordered list. The task here is to call an Ansible module to restart a server:

tasks:
  -  name: restart server
     shell: 'sleep 1 && shutdown -r now "Reboot triggered by Ansible" && sleep 1'
     async: 1
     poll: 0
     ignore_errors: true
     become: true

This runs the shell command as an asynchronous task, so Ansible doesn’t wait for the command to finish. The sleep before and after shutdown keeps the SSH connection from breaking while Ansible is still connected to the remote host, and ignore_errors: true lets the play continue if the connection drops anyway.

If we want to run multiple tasks in a playbook concurrently, we can use async with a poll set to zero. When we set poll: 0, Ansible starts the task and immediately moves on to the next task without waiting for a result. Each async task runs until it either completes, fails, or times out by running longer than its async value.
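
To illustrate the fire-and-forget behavior, here is a small sketch that starts a job with poll: 0 and checks on it later with the async_status module. The sleep command is only a placeholder for a real long-running job, and the variable names long_job and job_result are assumptions made for this example:

```yaml
tasks:
  - name: Start a long-running job without waiting for it
    command: /bin/sleep 15          # placeholder for a real long-running command
    async: 60                       # the job may run for up to 60 seconds
    poll: 0                         # do not wait; move on immediately
    register: long_job

  - name: Do other work while the job runs
    debug:
      msg: "Continuing with the playbook"

  - name: Check back on the job until it finishes
    async_status:
      jid: "{{ long_job.ansible_job_id }}"
    register: job_result
    until: job_result.finished
    retries: 30                     # check at most 30 times
    delay: 2                        # wait 2 seconds between checks
```

For a reboot, we usually don’t check back with async_status, since the host goes away; we wait for the port instead.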

2.3. Wait for Server Restart as a Task

Using Ansible’s wait_for module, we can temporarily stop running the playbook while we wait for the server to finish rebooting or for a service to start and bind to a port:

tasks:
  -  name: Wait for server to restart
     local_action:
       module: wait_for
       host: "{{ inventory_hostname }}"
       port: 22
       delay: 10
     become: false

This runs the wait_for task on the machine running Ansible (the control node). The task waits for port 22 to open on the remote host and starts checking after a ten-second delay. The same module is useful in situations where services aren’t immediately available after their init scripts finish.
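
As a sketch of the service case, the task below waits for an application port rather than SSH. Port 8080 and the timing values are hypothetical, and delegate_to: localhost is used here as an equivalent of local_action:

```yaml
- name: Wait for the web service to accept connections
  wait_for:
    host: "{{ inventory_hostname }}"
    port: 8080          # hypothetical application port
    delay: 5            # start checking after 5 seconds
    timeout: 120        # fail if the port is still closed after 2 minutes
    state: started      # wait for the port to be open (the default)
  delegate_to: localhost
  become: false
```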

If the inventory (the Ansible hosts file) overrides the connection address or port, we may prefer to use the {{ ansible_ssh_host }} variable as the hostname and {{ ansible_ssh_port }} as the SSH port. Since Ansible 2.0, these variables are named ansible_host and ansible_port. Such inventory entries look like:

ansible_ssh_host: some.other.name.com
ansible_ssh_port: 2222

Here’s a basic inventory file in YAML format:

all:
  hosts:
    mail.example.com:
  children:
    webservers:
      hosts:
        foo.example.com:
        bar.example.com:
    dbservers:
      hosts:
        one.example.com:
        two.example.com:
        three.example.com:
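
For illustration, connection overrides can be attached to a single host in the same YAML format. This is a sketch that assumes Ansible 2.0 or later, where ansible_host and ansible_port are the current variable names; the address and port are the example values from above:

```yaml
all:
  children:
    webservers:
      hosts:
        foo.example.com:
          ansible_host: some.other.name.com   # address Ansible actually connects to
          ansible_port: 2222                  # non-default SSH port
        bar.example.com:                      # no overrides: uses the name and port 22
```

With such an inventory, a wait_for task would need to check some.other.name.com on port 2222 for foo.example.com, which is why using these variables instead of inventory_hostname and a hard-coded 22 can be the safer choice.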

2.4. Server Restart with Wait Using Handlers

Sometimes, we want a task to run only when a change is made on a machine. Ansible uses handlers to address this use case. In short, handlers are tasks that run only when notified. Handlers are optional, but it’s a good idea to define restart-style tasks as handlers. There are two main reasons to do this:

Code reuse: We can use a handler for many tasks. For example, we can trigger a server restart after changing the timezone and after changing the kernel.
Trigger only once: If several tasks notify the same handler and more than one of them makes a change, the handler still runs only once. For example, if an httpd restart handler is attached to both an httpd config change and an SSL certificate update, httpd restarts only once even when both change.
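
The httpd example can be sketched as follows. The group name, the source file paths, and the destination paths are assumptions made for illustration:

```yaml
- hosts: webservers                          # assumed inventory group
  become: true
  tasks:
    - name: Deploy httpd configuration
      copy:
        src: files/httpd.conf                # hypothetical local path
        dest: /etc/httpd/conf/httpd.conf
      notify: Restart httpd

    - name: Deploy SSL certificate
      copy:
        src: files/server.crt                # hypothetical local path
        dest: /etc/pki/tls/certs/server.crt
      notify: Restart httpd

  handlers:
    - name: Restart httpd                    # runs once, even if both tasks report a change
      service:
        name: httpd
        state: restarted
```

If neither file changes, the handler isn’t notified and httpd isn’t restarted at all.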

Now, let’s run “Restart server” and “Wait for server to restart” as handlers rather than tasks. Here’s the YAML snippet for restarting and waiting for the restart using handlers:

handlers:
  -  name: Restart server
     shell: 'sleep 1 && shutdown -r now "Reboot triggered by Ansible" && sleep 1'
     async: 1
     poll: 0
     ignore_errors: true
     become: true

  -  name: Wait for server to restart
     local_action:
       module: wait_for
       host: "{{ inventory_hostname }}"
       port: 22
       delay: 10
     become: false

Next, let’s notify both handlers from a task, so that a hostname change triggers the restart followed by the wait:

tasks:
  -  name: Set hostname
     hostname: name=somename
     notify:
       -  Restart server
       -  Wait for server to restart

It’s worth noting that handlers run in the order they’re defined in the handlers section, not the order they’re listed in notify.

3. Problem Conceptualization

We’ll now discuss rebooting servers and waiting, up to a given amount of time, for a given service on a given port to start. Then, we’ll propose a playbook with a generic structure using the Ansible concepts we’ve discussed so far. Along the way, we’ll discuss the functionality, implementation, and any relevant exceptions. For easier conceptualization, we’ll break our problem into three parts.

3.1. Pre-Reboot

The pre-reboot stage runs our pre-reboot task, such as performing major upgrades or making configuration changes that only take effect at boot time. For example, we might upgrade all packages using the yum module:

- name: upgrade all packages
  yum: name=* state=latest

3.2. Reboot

In this stage, we’ll use the command module to reboot the remote machine or server by running the reboot command. There’s nothing fancy here, and we could also use shutdown -r:

- name: reboot server
  command: /sbin/reboot

3.3. Pause and Resume the Playbook

Next, we’ll use the wait_for module to wait up to 300 seconds for port 22 to become available before resuming the playbook. We’re using port 22 because most servers run an OpenSSH server on it, and if we were to telnet to that port, we’d probably see a banner like SSH-2.0-OpenSSH_6.6.1. So, we can use the search_regex option to match “OpenSSH” in that output.

We’re using a timeout value of 300 seconds because most physical servers take three to five minutes to finish rebooting due to hardware checks, but we can use whatever value suits us. For example, we can tell it to wait up to 300 seconds for port 22 to become available and return a banner containing OpenSSH:

- name: wait for the server to finish rebooting
  local_action:
    module: wait_for
    host: web01
    search_regex: OpenSSH
    port: 22
    timeout: 300

After we’ve got a response from port 22, we can resume running the playbook. This step is optional.

4. Putting It All Together

We can merge all the above sections into one playbook:

- hosts: all
  become: yes
  tasks:
    - name: Upgrade all packages in RedHat-based machines
      when: ansible_os_family == "RedHat"
      yum: name=* state=latest

    - name: Upgrade all packages in Debian-based machines
      when: ansible_os_family == "Debian"
      apt: upgrade=dist update_cache=yes

    - name: Reboot server
      command: /sbin/reboot

    - name: Wait for the server to finish rebooting
      become: no
      local_action:
        module: wait_for
        host: "{{ inventory_hostname }}"
        search_regex: OpenSSH
        port: 22
        delay: 1
        timeout: 300

The variable inventory_hostname is the name of the remote server as stated in the Ansible hosts file. The local_action directive runs the wait_for step on the local machine. Because the yum module only works on RedHat-based OSes such as Fedora, CentOS, and RHEL, we use the apt module for Debian-based OSes like Ubuntu and Debian.
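
On Ansible 2.7 or later, the separate reboot and wait steps could be collapsed into the built-in reboot module from Section 2.1. The following is a sketch of that variant, assuming the same 300-second limit; it isn’t a drop-in replacement for every environment:

```yaml
- hosts: all
  become: true
  tasks:
    - name: Upgrade all packages on RedHat-based machines
      yum:
        name: "*"
        state: latest
      when: ansible_os_family == "RedHat"

    - name: Upgrade all packages on Debian-based machines
      apt:
        upgrade: dist
        update_cache: true
      when: ansible_os_family == "Debian"

    - name: Reboot and wait for the server to come back
      reboot:
        reboot_timeout: 300     # same limit as the wait_for timeout in the original playbook
```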

Ever wondered why we didn’t use handlers here? By default, notified handlers run only at the end of the play, regardless of where the notifying task sits in the playbook. In this use case, we want to reboot the server and wait for it to finish rebooting at a specific point in the sequence, so plain tasks are a better fit.
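
If we did want a handler to fire in the middle of a play, Ansible’s meta: flush_handlers task runs all pending handlers at that point. The sketch below assumes Ansible 2.7 or later for the reboot module; the task names and the final debug task are placeholders:

```yaml
- hosts: all
  become: true
  tasks:
    - name: Set hostname
      hostname:
        name: somename
      notify: Reboot server

    - name: Run pending handlers now instead of at the end of the play
      meta: flush_handlers

    - name: Continue with tasks that need the rebooted server
      debug:
        msg: "Server is back"     # placeholder for real follow-up work

  handlers:
    - name: Reboot server
      reboot:
        reboot_timeout: 300
```

The reboot still happens only when the hostname task reports a change, which is the main difference from the unconditional reboot task in the playbook above.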

5. Conclusion

In this tutorial, we learned how to restart a server from an Ansible playbook and wait for it to come back before continuing, using tasks, handlers, and the wait_for module. Thoughtfully designed tasks also leave room for platform-specific tweaks. This pattern is especially handy for changes that require a restart, such as SELinux changes.
