Introduction

Alert when the number of pending disk operations climbs above a threshold and stays there. A growing disk queue means the storage device can't keep up with the I/O being thrown at it, which shows up to users as slow application loads, laggy file operations, and general sluggishness even when CPU and memory look fine.

The Disk Queue Length monitor fires once the condition has been sustained for your configured breach duration, so a single burst of I/O doesn't create noise.

How the Disk Queue Length Monitor Works

Level samples disk activity on covered devices and calculates the average number of requests waiting on the disk. When that value exceeds your threshold for the full breach duration, Level creates an alert.

ℹ️ NOTE: Queue length is reported as a single total for the disk. It is not split into separate read and write values.

Each reading covers a short window of disk activity at poll time, roughly 100 milliseconds, rather than the whole polling interval. A brief I/O burst can therefore produce a high reading even on a healthy disk. The breach duration requirement is what filters those out: a queue that stays deep across consecutive readings is a real bottleneck, not a blip.
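
The filtering described above can be sketched as a small tracker that resets whenever a reading falls back to or below the threshold. This is an illustrative sketch, not Level's implementation: the `BreachTracker` class, the one-minute polling interval, and the sample readings are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BreachTracker:
    """Tracks whether readings have stayed above a threshold long enough to alert."""

    threshold: float        # pending operations that must be exceeded
    breach_seconds: float   # how long the breach must be sustained
    breach_started: Optional[float] = None  # time of the first reading above threshold

    def observe(self, timestamp: float, queue_length: float) -> bool:
        """Record one reading; return True once the breach has lasted long enough."""
        if queue_length <= self.threshold:
            # Any reading at or below the threshold resets the clock.
            self.breach_started = None
            return False
        if self.breach_started is None:
            self.breach_started = timestamp
        return timestamp - self.breach_started >= self.breach_seconds


# Hypothetical readings, one per minute: (seconds since start, queue length).
tracker = BreachTracker(threshold=2, breach_seconds=600)
readings = [(0, 9.0), (60, 0.4), (120, 3.1), (180, 4.0)]
fired = [tracker.observe(t, q) for t, q in readings]
```

In this sample the burst at second 0 is cleared by the quiet reading at second 60, and the second run above the threshold has lasted only one minute by the final reading, so no entry in `fired` is true.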

🖥️ PLATFORM NOTE:

Windows: Reads the Avg. Disk Queue Length PhysicalDisk performance counter, a true OS-level queue measurement.
macOS: macOS doesn't expose a true queue depth, so Level estimates it from cumulative I/O service time reported by IOKit. Virtual APFS synthesized volumes are skipped. Expect macOS numbers to differ from Windows and Linux for comparable workloads.
Linux: Derives queue depth from kernel I/O statistics in /proc/diskstats, also a true queue measurement. Loop devices, RAM disks, device-mapper entries, and individual partitions are excluded when monitoring all drives.
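
To get a feel for the Linux numbers, the sketch below derives an average queue depth from two copies of /proc/diskstats. It uses the standard approach of dividing the change in the kernel's weighted I/O time counter (the 14th whitespace-separated column) by the elapsed time. Whether Level uses exactly this calculation is an assumption, and the function names, the name-based partition filter, and the sample data are hypothetical.

```python
import re

# Device-name filters mirroring the exclusions listed above (naming heuristic only).
EXCLUDED_PREFIXES = ("loop", "ram", "dm-")
PARTITION_RE = re.compile(
    r"^(?:(?:sd|vd|hd|xvd)[a-z]+\d+|(?:nvme\d+n\d+|mmcblk\d+)p\d+)$"
)


def parse_weighted_ms(diskstats_text: str) -> dict[str, int]:
    """Map whole-disk names to the weighted I/O time counter, in milliseconds."""
    result = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        name = fields[2]
        if name.startswith(EXCLUDED_PREFIXES) or PARTITION_RE.match(name):
            continue
        result[name] = int(fields[13])
    return result


def average_queue(before: dict[str, int], after: dict[str, int],
                  elapsed_ms: float) -> dict[str, float]:
    """Average requests in flight per disk between two samples."""
    return {
        name: (after[name] - before[name]) / elapsed_ms
        for name in after
        if name in before
    }


# Hypothetical samples taken 100 ms apart.
SAMPLE_BEFORE = """\
   8       0 sda 100 0 800 50 200 0 1600 150 0 180 200
   8       1 sda1 90 0 700 40 180 0 1500 140 0 160 180
   7       0 loop0 5 0 40 1 0 0 0 0 0 1 1
"""
SAMPLE_AFTER = """\
   8       0 sda 130 0 1040 90 260 0 2080 410 4 280 500
   8       1 sda1 120 0 940 80 240 0 1980 400 4 260 480
   7       0 loop0 5 0 40 1 0 0 0 0 0 1 1
"""
depths = average_queue(
    parse_weighted_ms(SAMPLE_BEFORE), parse_weighted_ms(SAMPLE_AFTER), elapsed_ms=100
)
```

On a real Linux host, the two inputs would come from reading /proc/diskstats twice with a short sleep in between. The sketch ignores counter wraparound.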

Configuring Disk Queue Length Monitor

Open the target monitor policy and click + Add new monitor. The Add new monitor dialog opens.

Name and Type

Enter a name in the Name field. "Servers - Deep Disk Queue" tells you more in an alert list than "Disk Queue Length."
Set Type to Disk queue length.

Severity

Set Severity to match how urgent a saturated disk is in this context:

Information
Warning
Critical
Emergency

💡 TIP: For database servers, file servers, and hypervisor hosts, sustained queue depth usually deserves Critical. Workloads on those machines degrade fast when storage falls behind.

Drives

Drives controls which disks the monitor evaluates:

Any drive — monitor every drive on the device
System disk — monitor only the device's primary system drive

💡 TIP: System disk is the safer default for workstations. Secondary drives doing backups or large file copies will legitimately queue up I/O, and that's usually not worth an alert.

Threshold

Threshold sets the number of pending operations that must be exceeded to trigger the monitor. Adjust using the up/down arrows or type a value directly.

💡 TIP: If a policy covers mixed hardware, split it. A "Servers - HDD" policy at 2 and a "Servers - NVMe" policy at 25 will both be quieter and more accurate than one compromise threshold across everything. Tags make this easy: tag devices by storage type and target each policy accordingly.

Choosing a Threshold

The right threshold depends almost entirely on the storage hardware. A queue depth that means a spinning disk is drowning is a normal Tuesday for an NVMe drive.

Storage type	Starting threshold	Starting duration	Why
Spinning disk (HDD)	2	10 min	One head, one operation at a time. Sustained queue above 2 means requests are stacking up faster than the disk can serve them. The low threshold gets crossed during routine backups and scans, so the longer duration filters those out.
SATA SSD	10	5 min	SATA's NCQ tops out at a queue depth of 32. Sustained depth around 10 or more means the drive is working hard; approaching 32 means it's saturated.
NVMe SSD	25	3–5 min	NVMe handles queue depths in the thousands by design. Pick a number that's abnormal for your workload rather than a hardware limit. Sustained depth in the dozens on a typical server usually points at a runaway process, not drive capability. The threshold already filters ordinary bursts, so the duration can be shorter.
RAID array (spinning)	2 × disk count	10 min	The classic perfmon rule of thumb, a sustained queue above 2 per disk, scales per spindle. An 8-disk array queues comfortably at depths a single disk can't, so a threshold around 16 is the equivalent signal.
Virtual disk / SAN-backed	Baseline first	10–15 min	Queue depth here reflects the hypervisor and storage backend, not a physical device. Transient contention from other guests is routine, so use the longest duration. Watch normal values for a week, then set the threshold above your observed peak.

ℹ️ NOTE: These are alerting thresholds, not performance ceilings. The goal is catching sustained abnormal behavior for that class of hardware, not measuring what the drive can theoretically handle.
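
If you manage many policies, the starting points in the table can be captured in a small helper. This is only a sketch: the storage-type keys, the function names, and the 20% margin over the observed peak are assumptions for illustration, and where the table gives a duration range the upper end is used.

```python
import math
from typing import NamedTuple, Optional


class StartingPoint(NamedTuple):
    threshold: Optional[int]  # None means "baseline first"
    duration_minutes: int


def starting_point(storage_type: str, disk_count: int = 1) -> StartingPoint:
    """Return the table's starting threshold and duration for a storage type."""
    if storage_type == "hdd":
        return StartingPoint(2, 10)
    if storage_type == "sata_ssd":
        return StartingPoint(10, 5)
    if storage_type == "nvme":
        return StartingPoint(25, 5)              # table gives 3-5 min
    if storage_type == "raid_hdd":
        return StartingPoint(2 * disk_count, 10)  # 2 per spindle
    if storage_type == "virtual":
        return StartingPoint(None, 15)           # table gives 10-15 min
    raise ValueError(f"unknown storage type: {storage_type}")


def threshold_from_baseline(observed: list[float], margin: float = 1.2) -> int:
    """Pick a threshold above the observed peak; the margin is an assumption."""
    return math.ceil(max(observed) * margin)


raid = starting_point("raid_hdd", disk_count=8)
virtual_threshold = threshold_from_baseline([3.0, 7.5, 12.0])
```

For the 8-disk array, `raid.threshold` is 16, matching the table. For virtual disks, feed `threshold_from_baseline` the queue values you recorded during the baseline week.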

Breach Duration

Breach duration sets how long the queue must stay above the threshold before an alert fires. Adjust using the slider or up/down arrows. Range is 1 to 120 minutes.

ℹ️ NOTE: Because each reading is a short snapshot rather than an interval average, breach duration does the heavy lifting here. Keep it at several minutes unless you have a specific reason to alert faster. A 1-minute duration on a low threshold will fire on ordinary I/O bursts like antivirus scans and backups.

Unlike threshold, duration doesn't vary much by hardware. It exists to filter workload noise, and backups and scans look the same regardless of the disk underneath. The two settings trade off against each other: the lower your threshold, the longer your duration should run. See the starting durations in the table under Choosing a Threshold.

Remediation

Attach one or more automations to run when this alert fires: capture a process list to find what's hammering the disk, restart a misbehaving service, or notify your team.

Click the Select an automation field and choose an automation.
Use the link icon to open the selected automation, the eye icon to preview it, or the × to remove it.
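
As one idea for the "capture a process list" automation, the sketch below ranks processes on a Linux device by the bytes they have read and written, using the per-process /proc/<pid>/io files. It is a hypothetical script, not a built-in Level automation: the function names are made up, it assumes Python is available on the endpoint, and the counters are cumulative since each process started, so run it twice and compare if you want a rate.

```python
import os


def parse_proc_io(text: str) -> int:
    """Return read_bytes + write_bytes from the contents of /proc/<pid>/io."""
    counters = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters.get("read_bytes", 0) + counters.get("write_bytes", 0)


def top_io_processes(limit: int = 10, proc_root: str = "/proc"):
    """List (total_bytes, pid, name) for the heaviest disk users, largest first."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux host, or /proc is unavailable
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                total = parse_proc_io(f.read())
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # process exited or we lack permission to read it
        rows.append((total, int(entry), name))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    for total, pid, name in top_io_processes():
        print(f"{total:>15} bytes  pid {pid:<8} {name}")
```

Reading another user's /proc/<pid>/io typically requires elevated privileges, so the script should run as root to see every process.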

Notify Recipients

Notify recipients sends emails to the policy's recipients when these events occur:

On alert creation
On alert resolution

Recipients are managed at the monitor policy level, in the Recipients section.

Auto-Resolve

The Auto-resolve alert when conditions clear toggle closes the alert automatically when queue depth drops back below the threshold. Enable it if you want self-clearing alerts; leave it off if you want every queue event to persist for manual review.

FAQ

What's a good threshold to start with? It depends on the hardware: roughly 2 for a spinning disk, 10 for a SATA SSD, 25 for NVMe, and 2 per spindle for spinning RAID arrays. Pair the lower thresholds with longer breach durations (10 minutes for HDDs) and the higher ones with shorter durations (3 to 5 minutes for NVMe). See the table in Choosing a Threshold above.
Disk queue vs. disk usage: which monitor do I want? Disk Usage watches free space. Disk Queue Length watches performance. A drive can be 90% empty and still be saturated with I/O, and a nearly full drive can perform fine. Run both if you care about both.
Why did my alert fire during a backup window? Backups, AV scans, and large file copies legitimately queue up disk I/O. You have three options: raise the breach duration past the length of those jobs (the maximum is 120 minutes), raise the threshold, or use Maintenance Mode on devices during scheduled heavy I/O.
My macOS devices report different queue numbers than my Windows devices under similar load. Why? macOS doesn't expose a true disk queue, so Level estimates queue depth from cumulative I/O service time. Windows and Linux read actual OS queue counters. The macOS value is a close approximation but won't match the other platforms exactly. Tune thresholds per platform if needed.
Does the monitor report reads and writes separately? No. Queue length is a single total. If you need read/write breakdowns, the Disk Throughput, Disk IOPS, and Disk Latency monitors include them in the alert payload.
What happens if Level can't read the disk metric? Each reading has a 30-second timeout. If the read fails, the monitor reports an error in the form "Could not read disk IO metric" rather than silently reporting zero.
Who can create and edit monitors? Technicians with access to the relevant monitor policy. Permission settings are managed in Workspace → Permissions.
What happens to open alerts if I delete the monitor? Existing alerts remain in place. Deleting a monitor doesn't close alerts it already created. Resolve those manually.