1. Overview
As avid Linux enthusiasts, we might have encountered the perplexing message “watchdog did not stop!” during a Linux system’s shutdown. This enigmatic message often leaves us scratching our heads, wondering what it means and if it indicates a severe problem with our Linux system.
In this article, we’ll demystify the “watchdog did not stop!” message during shutdown in Linux systems. First, we’ll understand the watchdog mechanism in Linux. Then, we’ll discuss why the “watchdog did not stop!” message appears during the system shutdown process.
Finally, we’ll explore how to fix this issue and troubleshoot additional shutdown issues. Let’s get started!
2. Understanding the Watchdog Mechanism in Linux
Before we dive into the specifics of the “watchdog did not stop!” message, let’s take a moment to discuss the Linux watchdog. Simply put, a watchdog is like a guardian that monitors the system’s health and takes corrective actions if something goes awry.
During the regular operation of a Linux system, the watchdog periodically expects a response, like a heartbeat, from the system software. If the system is functioning correctly, it sends back the heartbeat within the scheduled time frame. However, if the system becomes unresponsive due to a software or hardware issue, it fails to send the expected heartbeat.
Here’s where the watchdog becomes crucial. If it doesn’t receive the expected heartbeat, it assumes the system is stuck or frozen. In such a situation, the watchdog takes action to keep the system from staying unresponsive indefinitely. The exact action depends on the watchdog’s configuration, but in most cases, it triggers a system reboot to restore the system.
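To make the heartbeat logic concrete, here is a minimal sketch in shell. It doesn’t talk to a real watchdog device; the watchdog_action function and its three arguments are hypothetical names used only to illustrate the timeout decision described above:

```shell
#!/bin/sh
# Hypothetical helper: decides what a watchdog would do, given the time of
# the last heartbeat, the current time, and the timeout (all in seconds).
watchdog_action() {
    last_heartbeat=$1
    now=$2
    timeout=$3
    if [ $((now - last_heartbeat)) -gt "$timeout" ]; then
        echo "reset"    # heartbeat missed: a real watchdog would reboot
    else
        echo "ok"       # heartbeat arrived in time: nothing to do
    fi
}

watchdog_action 100 130 60   # 30 seconds since the last heartbeat: prints "ok"
watchdog_action 100 200 60   # 100 seconds since the last heartbeat: prints "reset"
```

A real watchdog makes the same comparison in hardware or in the kernel, with the heartbeat being a periodic write to the watchdog device.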
3. Why the “watchdog did not stop!” Message Appears
Now that we grasp the watchdog mechanism, let’s examine why the “watchdog did not stop!” message appears during system shutdown. The message comes from the kernel’s watchdog driver, which logs it when the watchdog device is closed while its timer is still running.
This typically happens in one of two ways. Either the process feeding the watchdog exits without first telling the driver to disarm the timer, or the timer is left armed on purpose. On systems that use systemd’s hardware watchdog support, for example, the watchdog is usually kept running through shutdown so that a hang late in the process still ends in a reset instead of a frozen machine.
We should know that this message usually doesn’t indicate a critical problem with our system. Instead, the still-running watchdog serves as a safety net, ensuring the system eventually shuts down or resets even if the normal shutdown process encounters an unexpected hitch. Ultimately, the appearance of the “watchdog did not stop!” message might seem alarming, but it’s an expected behavior and not something we should worry about in most cases.
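If we want to confirm that the message was logged, we can search the kernel log for it. The sketch below assumes a hypothetical helper named find_watchdog_msgs that filters whatever text it receives on standard input. The sample log lines are illustrative only and aren’t taken from a real machine:

```shell
#!/bin/sh
# Hypothetical helper: reads log lines on standard input and prints only
# those that contain the watchdog message, ignoring case.
find_watchdog_msgs() {
    grep -i 'watchdog did not stop' || true
}

# Illustrative sample input; the exact prefix of real kernel lines may differ
printf '%s\n' \
    'kernel: EXT4-fs (sda1): re-mounted' \
    'kernel: watchdog: watchdog0: watchdog did not stop!' |
    find_watchdog_msgs
```

On a systemd machine with a persistent journal, the input could instead come from the previous boot’s kernel messages, for example with journalctl -k -b -1 piped into find_watchdog_msgs.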
4. How to Fix the “watchdog did not stop!” Issue
While the “watchdog did not stop!” message is not a severe concern, it can point to underlying issues that need our attention. Let’s explore several methods and configuration changes that can make the message go away and ensure a smoother shutdown process.
4.1. GRUB Settings Alteration
Our first approach is to modify the GRand Unified Bootloader (GRUB) settings. GRUB_CMDLINE_LINUX is a configuration parameter in the GRUB configuration file (/etc/default/grub) that allows us to specify additional command-line options to the Linux kernel during the boot process. These options can influence the kernel’s behavior and its interaction with the hardware.
To try to fix the “watchdog did not stop!” message, we can edit the /etc/default/grub file and add the “reboot=bios” parameter to the GRUB_CMDLINE_LINUX line. This parameter instructs the Linux kernel to reboot the machine through the Basic Input/Output System (BIOS). Some systems may have issues with the Advanced Configuration and Power Interface (ACPI) settings, and using the BIOS for rebooting might help resolve conflicts and ensure a smoother restart without triggering the watchdog message. Let’s open the file in a text editor:
$ sudo nano /etc/default/grub
Then, we look for the line starting with “GRUB_CMDLINE_LINUX” and add “reboot=bios” to the existing parameters. The line may look like this:
GRUB_CMDLINE_LINUX="quiet splash"
Now, we modify it to include the new parameter (reboot=bios):
GRUB_CMDLINE_LINUX="quiet splash reboot=bios"
Finally, we save the file and exit the text editor. To apply the changes, we regenerate the GRUB configuration. On Debian and Ubuntu, the update-grub command does this:
$ sudo update-grub
Now, we can try shutting down our system and see if this works.
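The manual edit above can also be scripted. The following is a minimal sketch, assuming a hypothetical helper named add_kernel_param that takes a file path and a parameter. It only handles the simple case of a single double-quoted GRUB_CMDLINE_LINUX line, and the example runs against a scratch file instead of the real /etc/default/grub:

```shell
#!/bin/sh
# Hypothetical helper: appends a kernel parameter to the GRUB_CMDLINE_LINUX
# line of the given file, unless the parameter is already there.
add_kernel_param() {
    file=$1
    param=$2
    # Nothing to do if the parameter is already on the line
    if grep -q "^GRUB_CMDLINE_LINUX=\".*${param}" "$file"; then
        return 0
    fi
    # Append the parameter just before the closing quote; keep a .bak backup
    sed -i.bak "s/^GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 ${param}\"/" "$file"
}

# Demonstration on a scratch file
scratch=$(mktemp)
printf 'GRUB_CMDLINE_LINUX="quiet splash"\n' > "$scratch"
add_kernel_param "$scratch" reboot=bios
cat "$scratch"    # prints GRUB_CMDLINE_LINUX="quiet splash reboot=bios"
rm -f "$scratch" "$scratch.bak"
```

Because the function checks for the parameter first, running it twice leaves the line unchanged. When used on the real file, it would need sudo, and we’d still have to regenerate the GRUB configuration afterward.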
4.2. Other GRUB Alteration
Alternatively, if “reboot=bios” doesn’t work, we can try “reboot=acpi” instead:
GRUB_CMDLINE_LINUX="quiet splash reboot=acpi"
In this case, “reboot=acpi” instructs the Linux kernel to reboot the machine through ACPI. ACPI is an advanced power management interface that allows the operating system (OS) to interact with the hardware, including power management functions.
After replacing “reboot=bios” with “reboot=acpi”, we should save the file, exit the text editor, and apply the changes using the update-grub command.
Ultimately, whether either option resolves the problem depends on our specific hardware and system configuration, so one setting may work better than the other on a given machine.
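After rebooting, we may want to check which reboot setting the running kernel actually received. Here’s a minimal sketch, assuming a hypothetical helper named reboot_param that reads the kernel command line from /proc/cmdline by default, or from another file passed as its first argument:

```shell
#!/bin/sh
# Hypothetical helper: prints the reboot= parameter found on the kernel
# command line, or a short notice if there is none.
reboot_param() {
    cmdline_file=${1:-/proc/cmdline}
    grep -o 'reboot=[^ ]*' "$cmdline_file" 2>/dev/null || echo "reboot= not set"
}

reboot_param
```

If the output still shows the old value, the GRUB configuration probably wasn’t regenerated or the system hasn’t been rebooted since the change.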
4.3. Using systemctl
Another method involves using the systemctl reboot -i command to initiate a system reboot:
$ sudo systemctl reboot -i
systemctl is a command-line utility on Linux-based operating systems for managing system services and controlling the system’s behavior. It’s an essential part of the systemd init system, which is responsible for booting the system, managing services, and handling other system-related tasks.
In this command, the -i option (ignore inhibitors) bypasses potential inhibitors. Inhibitors are mechanisms that prevent or delay a system reboot or shutdown to avoid data loss or other undesired consequences. For instance, an active user session, critical system processes, or specific software activities might inhibit a reboot. However, using the -i flag instructs the system to proceed with the reboot, even if some of these inhibitors are present.
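Before overriding inhibitors, it can be worth seeing which ones are active. The sketch below wraps the systemd-inhibit --list command in a hypothetical helper named show_inhibitors, so that it degrades gracefully on systems where the tool is missing or can’t be queried:

```shell
#!/bin/sh
# Hypothetical wrapper: shows active inhibitor locks before we decide to
# override them with "systemctl reboot -i".
show_inhibitors() {
    if command -v systemd-inhibit >/dev/null 2>&1; then
        systemd-inhibit --list --no-pager 2>/dev/null ||
            echo "could not query inhibitors"
    else
        echo "systemd-inhibit not available"
    fi
}

show_inhibitors
```

If the list shows a lock we care about, such as a package manager in the middle of an upgrade, it’s probably safer to let that task finish than to force the reboot.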
4.4. Configuration Changes in /etc/lvm/lvm.conf for LVM
In certain scenarios, the “watchdog did not stop!” message may be linked to specific configurations, such as the Logical Volume Manager (LVM). LVM is a storage management technology that allows for flexible disk space management, including creating and managing logical volumes, volume groups, and physical volumes. The /etc/lvm/lvm.conf file is the configuration file for LVM on Linux systems.
Sometimes, on Linux systems with LVM, we can address the “watchdog did not stop!” issue by setting “use_lvmetad = 0” in the /etc/lvm/lvm.conf file. The use_lvmetad setting determines whether LVM should use the lvmetad daemon or not. The lvmetad daemon is responsible for caching metadata from LVM devices to improve performance and reduce the time it takes to initialize LVM volumes. Notably, this applies only to LVM versions that still ship lvmetad, since the daemon was removed in the LVM 2.03 series.
On versions that include lvmetad, the use_lvmetad setting is often enabled, which means it’s set to 1. When enabled, LVM uses lvmetad to cache metadata. However, deactivating lvmetad can be helpful when we encounter the “watchdog did not stop!” message or other LVM-related issues. With the setting at 0, LVM stops using the cached metadata and instead scans the devices during each operation, which can potentially resolve specific problems related to stale or inconsistent cached metadata.
Now, let’s edit the /etc/lvm/lvm.conf file using a text editor with administrative privileges (sudo):
$ sudo nano /etc/lvm/lvm.conf
Then, we look for the line that contains “use_lvmetad” and change its value from 1 to 0:
use_lvmetad = 0
Finally, we save the file and exit the text editor. After making the changes, we should reboot the system to apply the configuration. However, let’s be aware that deactivating lvmetad might slow down LVM operations, so it’s usually best to keep it enabled unless a specific issue requires turning it off.
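The same change can be sketched as two small shell functions. Both get_lvmetad and disable_lvmetad are hypothetical helper names, and the example deliberately works on a scratch file with made-up contents instead of the live /etc/lvm/lvm.conf:

```shell
#!/bin/sh
# Hypothetical helper: prints the current use_lvmetad value (0 or 1) from
# the given file, ignoring commented-out lines.
get_lvmetad() {
    sed -n 's/^[[:space:]]*use_lvmetad[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' "$1"
}

# Hypothetical helper: switches use_lvmetad from 1 to 0 in the given file
# and keeps a backup with a .bak suffix next to it.
disable_lvmetad() {
    sed -i.bak 's/^\([[:space:]]*use_lvmetad[[:space:]]*=\)[[:space:]]*1/\1 0/' "$1"
}

# Demonstration on a scratch file
conf=$(mktemp)
printf 'global {\n\tuse_lvmetad = 1\n}\n' > "$conf"
disable_lvmetad "$conf"
get_lvmetad "$conf"    # prints 0
rm -f "$conf" "$conf.bak"
```

If get_lvmetad prints nothing for the real configuration file, the setting is likely absent, which is expected on LVM versions without lvmetad.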
4.5. Disabling the Watchdog Timer
Occasionally, the “watchdog did not stop!” message may be triggered by a watchdog timer that isn’t properly handled during system shutdown. As we discussed, a watchdog timer is a hardware or software component that monitors the system and triggers a reboot if it detects certain issues, like a system freeze. In that case, we can try turning off the watchdog timer to see if that fixes the problem.
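Before turning anything off, it can help to see whether the kernel has registered a watchdog device at all. The sketch below assumes the devices appear under /sys/class/watchdog and that the identity and timeout attributes are present, which depends on the kernel configuration. The list_watchdogs function is a hypothetical helper, and its optional argument lets us point it at another directory:

```shell
#!/bin/sh
# Hypothetical helper: lists watchdog devices exposed under sysfs.
list_watchdogs() {
    sysfs_dir=${1:-/sys/class/watchdog}
    found=0
    for dev in "$sysfs_dir"/watchdog*; do
        [ -d "$dev" ] || continue
        found=1
        # These attributes only exist when the kernel exposes them
        identity=$(cat "$dev/identity" 2>/dev/null || echo "unknown")
        timeout=$(cat "$dev/timeout" 2>/dev/null || echo "unknown")
        echo "$(basename "$dev"): identity=$identity timeout=$timeout"
    done
    [ "$found" -eq 1 ] || echo "no watchdog devices found"
}

list_watchdogs
```

If no device is listed, the message is unlikely to come from a hardware watchdog on this machine, and disabling the service may not change anything.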
First, let’s check if the watchdog service is active:
$ sudo systemctl status watchdog
watchdog.service - Watchdog Timer Device
   Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2023-08-01 10:15:25 UTC; 2 days ago
  Process: 1234 ExecStart=/usr/sbin/watchdog (code=exited, status=0/SUCCESS)
 Main PID: 1235 (watchdog)
    Tasks: 1 (limit: 4915)
   Memory: 1.5M
   CGroup: /system.slice/watchdog.service

Aug 01 10:15:25 example-Baeldung systemd: Starting Watchdog Timer Device...
Aug 01 10:15:25 example-Baeldung watchdog: Starting watchdog process...
Aug 01 10:15:25 example-Baeldung watchdog: watchdog: successfully initialized
Aug 01 10:15:25 example-Baeldung systemd: Started Watchdog Timer Device.
As we can see, the watchdog service is running. Let’s take a closer look at the output:
- watchdog.service – Watchdog Timer Device – watchdog service name and description
- (enabled; vendor preset: enabled) – service is currently enabled and will start automatically on boot
- active (running) – service is active and running
- ExecStart=/usr/sbin/watchdog (code=exited, status=0/SUCCESS) – service was started by the /usr/sbin/watchdog command, with an exit status of 0/SUCCESS
- 1235 – service primary Process ID (PID)
- system.slice/watchdog.service entry – systemd is managing the watchdog service
- 1.5M – watchdog process is using approximately 1.5MB of memory
In addition, various log messages are shown with timestamps (Aug 01 10:15:25) that give insight into the service’s startup process. The log messages confirm that the watchdog process was initialized without any reported errors.
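When we only need the state and not the full status report, systemctl offers the is-active and is-enabled subcommands. Here’s a minimal sketch that combines them in a hypothetical helper named unit_state. The unit name watchdog matches the example above and may differ on other systems:

```shell
#!/bin/sh
# Hypothetical helper: prints a one-line summary of a unit's state.
unit_state() {
    unit=$1
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not available"
        return 0
    fi
    # Both subcommands exit non-zero for inactive or disabled units,
    # so we ignore the exit status and keep only the printed state
    active=$(systemctl is-active "$unit" 2>/dev/null) || true
    enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
    echo "$unit: active=${active:-unknown} enabled=${enabled:-unknown}"
}

unit_state watchdog
```

This form is convenient in scripts, since the summary fits on one line and doesn’t require parsing the longer status output.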
Now, we have confirmed that the watchdog service is running. Let’s try to stop and deactivate it:
$ sudo systemctl stop watchdog
$ sudo systemctl disable watchdog
Here, we stopped the running watchdog service and disabled it from starting automatically at boot. The disable command works by removing the watchdog.service link from the multi-user.target.wants directory, which effectively prevents the service from starting automatically.
Finally, let’s reboot the system to apply the changes. This might resolve the “watchdog did not stop!” message.
Notably, stopping the service turns off the watchdog functionality until it’s manually restarted. As a result, the system may no longer reboot automatically when it hangs. Therefore, we should only deactivate the watchdog for valid reasons and with full awareness of the potential implications.
5. Troubleshooting Additional Shutdown Issues
Beyond the configuration changes above, hardware problems can also cause unexpected behavior during system shutdown, including the “watchdog did not stop!” message. Let’s see some steps to diagnose and address these hardware issues.
5.1. Running Diagnostics
Many Linux distributions come with built-in diagnostic tools to help identify hardware problems. We can use some of these standard tools to troubleshoot and diagnose our Linux system for potential hardware issues that might be causing the “watchdog did not stop!” message:
- MemTest86 – designed to run outside the OS, often as a bootable option during system startup, and test the system’s RAM for errors
- smartctl and gnome-disks – Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.) tools to monitor and analyze the health of Hard Disk Drives (HDDs) and Solid-State Drives (SSDs)
- lshw – identifies hardware inconsistencies or errors by providing detailed information about various hardware components on our system
- strace – intercepts and records system calls and signals made by a process, thus aiding in debugging and diagnosing programs, even when their source code is unavailable
- hwinfo – provides comprehensive hardware information and can assist in diagnosing hardware-related issues
- stress – stress-test our CPU, RAM, and other system components, potentially revealing instability or overheating issues
In addition, we can also check our system’s documentation or the manufacturer’s website for instructions on how to run hardware diagnostics.
Likewise, we can check hardware connections to ensure that all hardware components, such as RAM modules, hard drives, and expansion cards, sit correctly in their respective slots, as loose connections can lead to issues during a shutdown.
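Not every distribution installs the tools listed above by default, so a quick availability check can save time. The sketch below uses a hypothetical helper named check_tools that reports whether each command it’s given can be found in the PATH:

```shell
#!/bin/sh
# Hypothetical helper: reports whether each named command is installed.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: installed"
        else
            echo "$tool: missing"
        fi
    done
}

check_tools smartctl lshw hwinfo strace stress
```

Once a tool is available, we can run it directly. For example, sudo smartctl -H /dev/sda asks a drive for its overall health assessment, where /dev/sda is only an assumed device name.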
5.2. Test With Minimal Hardware
To isolate hardware-related problems, we can try booting the system with the bare minimum hardware configuration. We can also remove unnecessary peripherals and extra RAM modules and check if the issue persists.
Ultimately, these troubleshooting steps can help us isolate faulty hardware, make shutdowns more reliable, and reduce the likelihood of encountering the “watchdog did not stop!” message.
6. Common Misconceptions
Amidst our efforts to resolve the “watchdog did not stop!” message, we must address common misconceptions surrounding this issue. As Linux users, we may mistakenly interpret the message as the root cause of our shutdown problems, leading us to overlook other underlying issues.
However, we must remember that the “watchdog did not stop!” message is merely a symptom of how the shutdown proceeded and is not the core problem itself. It’s a normal behavior of the watchdog mechanism, ensuring system safety in the event of a shutdown failure.
7. Conclusion
In this article, we have unraveled the mystery behind the “watchdog did not stop!” message that appears during the Linux system shutdown.
Furthermore, we delved into practical solutions to tackle the “watchdog did not stop!” issue. From adjusting GRUB settings with “reboot=bios” or “reboot=acpi” parameters to harnessing the power of “systemctl reboot -i” and effecting configuration changes in /etc/lvm/lvm.conf, we armed ourselves with an arsenal of methods to enhance our shutdown experience.
Finally, we must remember that while the “watchdog did not stop!” message might give pause, it often serves as a signpost to deeper issues. By addressing these underlying concerns, we not only bid farewell to this message but also ensure the optimal performance and reliability of our Linux systems.