Introduction
The watchdog is Level's backup recovery mechanism for the agent service. If the agent service stops, the OS service manager handles the recovery in most cases (Service Control Manager on Windows brings it back within 60 seconds via service recovery options). The watchdog runs on a schedule and exists for the rare case where the service manager didn't bring the service back on its own.
It runs silently in the background, and under normal conditions you won't notice it. If the watchdog is restarting the agent frequently on a given device, both the service manager and the watchdog are working harder than they should be, which usually means something on that device is interfering with Level.
What the watchdog does
Two systems work together to keep the Level service running. The OS service manager is the first line: when the service stops, it restarts it automatically (typically within 60 seconds on Windows via service recovery options). The watchdog is the second line, running every 10 minutes as a check that the service manager actually did its job.
On its 10-minute cycle, the watchdog answers one question: is the Level agent service running, and if not, can it be restarted? It uses the service manager to read the current state and then takes one of a few actions:
If the service is running, do nothing.
If the service is stopped, start it.
If the service is paused, resume it.
If the service record is missing entirely (the agent was uninstalled or corrupted), uninstall the watchdog itself so the device doesn't get stuck running an orphan task.
The watchdog doesn't report to Level, doesn't generate alerts, and doesn't reinstall the agent. It's a local self-healing loop, nothing more.
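The per-cycle decision can be pictured as a small lookup from service state to action. The sketch below is illustrative only: the `ServiceState` and `Action` names are hypothetical and are not taken from the Level agent.

```python
from enum import Enum


class ServiceState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    PAUSED = "paused"
    MISSING = "missing"  # no service record at all


class Action(Enum):
    NONE = "do nothing"
    START = "start the service"
    RESUME = "resume the service"
    REMOVE_WATCHDOG = "uninstall the watchdog"


# One action per state, mirroring the list above. There is deliberately no
# "reinstall" action: the watchdog never repairs a missing service.
ACTIONS = {
    ServiceState.RUNNING: Action.NONE,
    ServiceState.STOPPED: Action.START,
    ServiceState.PAUSED: Action.RESUME,
    ServiceState.MISSING: Action.REMOVE_WATCHDOG,
}


def decide(state: ServiceState) -> Action:
    """Return the watchdog's action for the observed service state."""
    return ACTIONS[state]
```

For example, `decide(ServiceState.PAUSED)` returns `Action.RESUME`.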
ℹ️ NOTE: The watchdog restarts the service. It doesn't reinstall the agent. If the binary has been removed, quarantined by AV/EDR, or corrupted, you'll need to reinstall using the appropriate installer for the platform. See Windows Install, macOS Install, or Linux Install.
Windows Behavior
The detail below is Windows-specific. On macOS and Linux, recovery is handled by the OS service manager directly (see the next section).
On Windows, the watchdog is a Windows Scheduled Task that fires every 10 minutes and runs the agent's --check-service routine. Each run goes through three steps.
Step 1: Query the service state
The watchdog uses the Service Control Manager (SCM) to open the Level service and read its current state (Running, Stopped, Paused, or missing). This is the source of truth for whether the service is up.
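To look at the same state by hand, `sc query` prints what SCM reports for a service. The sketch below parses that output into the states listed above. It is a troubleshooting aid under stated assumptions, not the watchdog's own code: the service name `Level` in the sample is an assumption, the parser expects English-language `sc` output, and transitional states such as START_PENDING are passed through rather than mapped.

```python
import re

# Matches a line such as "STATE : 4  RUNNING" in `sc query <name>` output.
_STATE_RE = re.compile(r"^\s*STATE\s*:\s*\d+\s+(\w+)", re.MULTILINE)


def parse_sc_query(output: str) -> str:
    """Map `sc query` output to Running, Stopped, Paused, or Missing."""
    # Error 1060: the specified service does not exist as an installed service.
    if "FAILED 1060" in output:
        return "Missing"
    match = _STATE_RE.search(output)
    if match is None:
        raise ValueError("no STATE line found in sc query output")
    # RUNNING -> Running, STOPPED -> Stopped, PAUSED -> Paused.
    # Pending states come back as, for example, Start_pending.
    return match.group(1).capitalize()


# Sample output; the service name here is an assumption.
sample = """
SERVICE_NAME: Level
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
"""
print(parse_sc_query(sample))  # prints: Running
```

On a Windows device, the text captured from running `sc query` with the real service name could be passed to `parse_sc_query` in place of the sample.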
Step 2: Validate the local monitoring connection
If SCM reports the service as Running, the watchdog briefly tries to connect to the agent over the local monitoring/RPC channel. This catches cases where the service process is alive but has stopped responding internally.
If that connection fails, the watchdog logs an error and moves on. It doesn't tear the service down based on this check alone. Recovery is still driven by the next step.
ℹ️ NOTE: This connection validation is a soft check. A transient RPC failure doesn't trigger a restart by itself, which prevents the watchdog from flapping on healthy systems that briefly couldn't respond.
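A soft check of this kind can be sketched as a connection attempt that logs a failure and returns, without raising or triggering any recovery itself. The details of the agent's monitoring/RPC channel aren't documented here, so the plain TCP connection, the `host` and `port` parameters, and the 2-second timeout below are all assumptions.

```python
import logging
import socket

log = logging.getLogger("watchdog-sketch")


def soft_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Try a brief local connection. Log on failure; never raise or restart."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # Soft check: record the failure and let the caller carry on.
        log.error("local monitoring connection failed: %s", exc)
        return False
```

The caller ignores the return value when deciding whether to restart, which is what keeps a transient failure from causing flapping.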
Step 3: Run EnsureRunning
After the state check, the watchdog always calls the agent's EnsureRunning routine. This is where the actual recovery happens:
System uptime under 60 seconds. Skips the entire routine to avoid fighting early boot, where the service may not have started yet by design.
Service record missing. Treats this as a bad or uninstalled state and removes the watchdog task itself. The device is no longer being managed by a watchdog because there's nothing for it to watch.
Service stopped. Starts the service.
Service paused. Resumes the service.
Waiting for Running. Polls in a short loop until the service reports Running before exiting.
The embedded task description sums it up: the watchdog exists to help keep the Level Windows service running.
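The EnsureRunning sequence can be sketched as follows. This is a reading of the steps listed above, not the agent's source: the callables stand in for SCM operations, and the polling interval and retry limit are assumptions because the real values aren't documented here.

```python
import time
from typing import Callable

MIN_UPTIME_SECONDS = 60  # from the steps above: skip during early boot


def ensure_running(
    uptime_seconds: float,
    get_state: Callable[[], str],
    start: Callable[[], None],
    resume: Callable[[], None],
    remove_watchdog: Callable[[], None],
    poll_interval: float = 0.5,  # assumption
    max_polls: int = 20,         # assumption
) -> str:
    """Run one recovery pass and return a short label for what happened."""
    if uptime_seconds < MIN_UPTIME_SECONDS:
        return "skipped"  # the service may not have started yet by design
    state = get_state()
    if state == "missing":
        remove_watchdog()  # nothing left to watch
        return "watchdog removed"
    if state == "stopped":
        start()
    elif state == "paused":
        resume()
    # Poll until the service reports running, or give up.
    for _ in range(max_polls):
        if get_state() == "running":
            return "running"
        time.sleep(poll_interval)
    return "timed out"
```

Passing the SCM operations in as callables keeps the sketch testable without a real Windows service.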
macOS and Linux Behavior
Level relies on the OS service manager for recovery on Unix-like platforms. The agent doesn't run a separate scheduled check the way it does on Windows.
🖥️ PLATFORM NOTE:
Windows: Implemented as a Windows Scheduled Task that runs every 10 minutes. Full check sequence described above.
macOS: Managed by the LaunchDaemon at /Library/LaunchDaemons/Level.plist. If the service stops, launchd restarts it according to the daemon configuration.
Linux: Managed by systemd. If the service stops, systemd restarts it according to the service unit configuration.
The end result is the same across platforms: a stopped service comes back without manual intervention. The mechanism differs.
Sleep, resume, and what the watchdog isn't for
The watchdog has no sleep or power-resume logic. It runs the same 10-minute check whether the device just woke up or has been running for a week.
The agent normally handles sleep on its own without intervention. The Level service process keeps running through standby, so resume usually doesn't require anything specific to recover from. Sleep does seem to correlate with weird networking issues (stale DNS state is a common one), but the recovery for those isn't sleep-specific.
The agent has a separate connection watcher running internally. Its main job is detecting stale connections: when some outbound connections are working and others aren't (cached DNS is a typical example), the watcher restarts the agent to force a fresh state. This isn't a sleep recovery feature, though it sometimes helps with sleep-induced networking weirdness as a side effect.
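The stale-connection condition can be sketched as a check for mixed probe results. This is a guess at the shape of the logic, not the agent's implementation. The article only describes the mixed case, so treating "all probes fail" as not stale is an assumption, and the endpoint names are placeholders.

```python
from typing import Mapping


def looks_stale(probe_results: Mapping[str, bool]) -> bool:
    """True when some outbound probes succeed and others fail."""
    outcomes = set(probe_results.values())
    return outcomes == {True, False}


# Placeholder endpoints: one probe works, one does not.
results = {"endpoint-a.example": True, "endpoint-b.example": False}
if looks_stale(results):
    print("mixed results: restart the agent to force a fresh state")
```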
💡 TIP: Disable sleep on managed endpoints where possible. It's what we do internally at Level and what most of our larger customers (5,000+ devices) do. It removes a class of intermittent connectivity issues that aren't worth troubleshooting per-device.
When the watchdog fires
An occasional restart is normal and rarely worth investigating. A transient crash, a resource spike, a brief interfering process: any of these can cause a one-off restart, and the agent comes back without anyone noticing.
A pattern of frequent restarts is different. SCM should be handling most service crashes within 60 seconds, and the watchdog should rarely need to step in. If the watchdog is bringing the service back repeatedly on the same device, something on that device is preventing the agent from running normally.
Common causes:
AV/EDR interference. Security software terminating or quarantining the Level agent. This is the most common cause and is usually behavior-based, which is why it can affect a single device in an otherwise uniform fleet. See AV/EDR False Detections.
Other management tools. Another scheduled task, GPO, or RMM stopping the Level service.
Hardware issues. Failing disk, memory errors, or other hardware faults causing the service to crash.
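To separate an isolated restart from a pattern, one option is to count restarts per device inside a sliding time window. The sketch below assumes you already collect restart timestamps from some source of your own. The 24-hour window and the threshold of 3 are arbitrary placeholders, not values Level defines.

```python
from datetime import datetime, timedelta
from typing import Iterable


def is_frequent(
    restarts: Iterable[datetime],
    now: datetime,
    window: timedelta = timedelta(hours=24),  # placeholder
    threshold: int = 3,                       # placeholder
) -> bool:
    """True if at least `threshold` restarts fall within `window` before `now`."""
    cutoff = now - window
    recent = [t for t in restarts if cutoff <= t <= now]
    return len(recent) >= threshold
```

A device that trips this check is a candidate for the causes listed above, starting with AV/EDR interference.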
⚠️ WARNING: We don't recommend disabling the watchdog. SCM brings the Level service back within 60 seconds in nearly all cases. The watchdog is the safety net for the rare exception where SCM didn't. Leaving it enabled costs nothing.
FAQ
Can I disable the watchdog? You can, but we don't recommend it. SCM is the primary recovery mechanism for the Level service and brings it back within 60 seconds in nearly all cases. The watchdog is the safety net for the rare exception where SCM didn't recover the service. Disabling it removes that safety net. Leaving it enabled costs nothing.
The watchdog restarted the agent on one of my devices. Should I be worried? An isolated restart usually isn't worth investigating. It can come from a transient crash, a resource spike, or another process briefly interfering. If you see the watchdog restarting the agent repeatedly on the same device, that's the signal to look closer. AV/EDR interference is the most common cause. See AV/EDR False Detections.
My device isn't coming back online after sleep. Isn't the watchdog supposed to handle that? No. The watchdog has no sleep or power-resume logic. Resume recovery is handled by the agent's connection watcher and realtime client, plus the fact that the Level service process usually keeps running through sleep. If a device isn't reconnecting after wake, the watchdog isn't where to look. Start with the network and the agent's connection state. See Offline Troubleshooting.
Does the watchdog reinstall the agent if the binary is missing? No. It only starts or resumes an existing service. If the service record is missing entirely, the watchdog removes itself and the device needs the agent reinstalled. See the install articles for your platform.
Where can I see whether the watchdog is healthy on a device? Run the agent's --check diagnostic command on the device. The output includes a Level checks section that shows whether the agent service and watchdog task are in the expected state (Running/Ready). See Offline Troubleshooting for the full diagnostic walkthrough.
Do technicians need permissions in Level to interact with the watchdog? The watchdog runs locally on each device and isn't configurable from the Level web interface. There are no permissions to grant or revoke. Interacting with it directly (inspecting the Windows scheduled task, the macOS LaunchDaemon, or the Linux systemd unit) requires local administrative access on the device.
