
Monitoring & Alerting

Level offers a range of predefined monitors for system resources and services, as well as script-based monitors for advanced customization.

Updated over 3 months ago

Set policies to monitor and alert you about specific endpoint issues

Monitor policies allow you and your team to keep an eye on the health and well-being of your devices. Create a monitor policy and target it at tags assigned to your devices. Once a monitor policy has been created and assigned to one or more tags, monitoring a device is as simple as adding one of those tags to it.




Monitor policies

In order to start receiving alerts regarding the state of your devices, a monitor policy must first be created. Monitor policies can be found under Policies > Monitor. This page will show the list of policies. From here policies can be edited or created.

The monitor policy list contains total monitor counts and targeted devices

The monitor policy page gives you the ability to see all of the policies you currently have as well as the devices they currently monitor through their assigned tags. See tags for more details on Level's dynamic tagging system.

A policy contains many monitors to keep an eye on device health

A policy contains a few modifiable parameters:

  1. Recipients: Email addresses for alert notifications.

  2. Targets: Assign one or more tags here and all the devices assigned to the tags will receive the Monitor Policy.

  3. Monitors: One or more device attributes to be watched. When a threshold is exceeded, an alert is triggered.

A monitor policy can have many monitors attached, even multiple of the same type. This gives you complete granular control over thresholds, severity, and which monitors you and your team actually need to be alerted on.
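The relationship between a policy, its target tags, and the devices it ends up covering can be sketched in a few lines of Python. This is only an illustrative model, not Level's API: the class names, field names, device names, and the `devices_for_policy` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Monitor:
    name: str
    monitor_type: str  # e.g. "CPU", "Disk Usage", "Service"
    severity: str      # e.g. "Warning", "Critical"


@dataclass
class MonitorPolicy:
    name: str
    recipients: list = field(default_factory=list)  # alert email addresses
    targets: set = field(default_factory=set)       # tag names
    monitors: list = field(default_factory=list)    # Monitor objects


def devices_for_policy(policy, device_tags):
    """Return the devices that carry at least one of the policy's target tags."""
    return sorted(
        device
        for device, tags in device_tags.items()
        if policy.targets & set(tags)
    )


# Hypothetical policy: two monitors of the same type with different severities.
policy = MonitorPolicy(
    name="Domain Controllers",
    recipients=["oncall@example.com"],
    targets={"Domain Controllers"},
    monitors=[
        Monitor("CPU high", "CPU", "Warning"),
        Monitor("CPU very high", "CPU", "Critical"),
    ],
)

# Hypothetical devices and the tags assigned to them.
device_tags = {
    "dc-01": ["Domain Controllers"],
    "fs-01": ["File Servers"],
    "dc-02": ["Domain Controllers", "Patch Group A"],
}

print(devices_for_policy(policy, device_tags))  # ['dc-01', 'dc-02']
```

Adding the "Domain Controllers" tag to another device in `device_tags` would bring it under the policy without touching the policy itself, which is the point of targeting tags rather than individual devices.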


Monitors

Monitors allow you to fine-tune parameters for effective system management.

Monitors can be fine-tuned to meet your needs

A monitor gives you control over several parameters:

  1. Monitor Name: A descriptive name for the monitor, e.g., "Windows Reboot Required."

  2. Monitor Type: Defines the type of monitoring performed. In the example, "Run Script" is selected, but there are other types as well.

  3. Severity: Set the importance level for the alert, e.g., Informational, Warning, Critical, or Emergency.

  4. Type-Specific Parameters: These will change depending on the selected Monitor Type. For example, when "Run Script" is chosen, you can specify the script to run, check the script output, and set the trigger conditions.

  5. Value: Set the value that the monitor checks to trigger an alert. In the example, the script checks if the output contains "ALERT."

  6. Auto-Resolve Alert: Toggle whether alerts should automatically resolve once the threshold is no longer exceeded.

  7. Remediation Automation: Assign an automation for remediation, e.g., "Ask User To Reboot" (optional). Automations will soon replace remediation scripts.

  8. Run Script: Select a script to run automatically when the monitor is triggered (optional). (Scripts will be deprecated in an upcoming release.)

  9. Send Notification on Alert: Enable sending notifications when the threshold is breached.

  10. Send Notification on Resolution: Enable sending notifications when the alert is resolved.

Note: Scripts will be deprecated in an upcoming release. Use Remediation automations instead.
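To make the "Run Script" monitor type from items 2, 4, and 5 more concrete, here is a minimal sketch of a script whose output contains "ALERT" only when a reboot is pending. It is written in Python on the assumption that the endpoint can run Python scripts, and it relies on the Debian/Ubuntu convention of a `/var/run/reboot-required` flag file. The `build_output` helper and the exact output strings are illustrative choices, not something Level prescribes.

```python
import os

# Debian/Ubuntu create this file when a reboot is pending (assumed platform).
REBOOT_FLAG = "/var/run/reboot-required"


def build_output(flag_path=REBOOT_FLAG):
    """Return the line the script prints; it contains ALERT only if the flag exists."""
    if os.path.exists(flag_path):
        return "ALERT: reboot required"
    return "OK: no reboot pending"


if __name__ == "__main__":
    # The monitor would be configured to trigger when the output contains "ALERT".
    print(build_output())
```

Keeping the decision in a small function that returns the output string makes the script easy to check locally before attaching it to a monitor.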


Monitor types

There are several monitor types:

  1. CPU: Monitor CPU usage.

    • Measured in percent used

  2. Connection: Monitor when an agent is unable to check in with Level servers. When Level sees no check-in, the device is considered offline. This can be caused by an internet/network outage, a system problem, a power event, a reboot, etc.

    • Measured in minutes offline

  3. Disk Usage: Monitor disk usage. You can choose to monitor only the system drive or all drives.

    • Measured in either GB free or percent free

  4. Event Log: Monitor system logs for specific event IDs.

    • Triggered by the number of occurrences within a specified duration, for example once in a minute or 3 times in an hour.

  5. Memory Usage: Monitor memory/RAM usage.

    • Measured in percent used

  6. Process: Monitor if a process is running or not running.

    • The exact process name is needed

  7. Service: Monitor if a service is running or stopped.

    • The service name (not Display Name) is needed

  8. Run Script: Run a script and evaluate the returned output.

    • Learn more about script-based monitors below.
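The Event Log trigger rule ("3 times in an hour") amounts to counting matching events inside a time window. The sketch below shows that idea in Python. How Level performs this count internally is not documented here, so the `should_trigger` function, its parameters, and the sample timestamps are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def should_trigger(event_times, now, occurrences, window):
    """Return True when at least `occurrences` events fall within `window` before `now`."""
    cutoff = now - window
    recent = [t for t in event_times if cutoff <= t <= now]
    return len(recent) >= occurrences


# Hypothetical timestamps for one event ID on one device.
now = datetime(2024, 1, 1, 12, 0)
events = [
    now - timedelta(minutes=90),  # outside a one-hour window
    now - timedelta(minutes=40),
    now - timedelta(minutes=20),
    now - timedelta(minutes=5),
]

# "3 times in an hour": three events fall inside the last hour.
print(should_trigger(events, now, occurrences=3, window=timedelta(hours=1)))    # True
# "Once in a minute": the most recent event is five minutes old.
print(should_trigger(events, now, occurrences=1, window=timedelta(minutes=1)))  # False
```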


Alerting

When an alert opens, Level will store the alert payload for you. This enables you to see what the values were at the time the alert was triggered. As long as the alert remains open, that payload will remain static.

If you resolve the alert, and then the alert later reopens, the payload will update to the state of the machine when it reopened.
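The payload behavior described above can be modeled as a snapshot taken whenever an alert opens. The following Python sketch is a toy model under that assumption. The `Alert` class, its method names, and the `cpu_percent` field are hypothetical and do not reflect Level's internal implementation.

```python
class Alert:
    """Toy model of an alert whose payload is captured when it opens."""

    def __init__(self):
        self.is_open = False
        self.payload = None

    def observe(self, breached, current_values):
        """Feed one monitor check result into the alert."""
        if breached and not self.is_open:
            # Opening (or reopening): snapshot the values at this moment.
            self.is_open = True
            self.payload = dict(current_values)
        # While the alert is already open, the stored payload is left untouched.

    def resolve(self):
        """Close the alert; the next breach will open it again."""
        self.is_open = False


alert = Alert()
alert.observe(True, {"cpu_percent": 95})
alert.observe(True, {"cpu_percent": 99})  # still open: payload stays at 95
first = alert.payload["cpu_percent"]

alert.resolve()
alert.observe(True, {"cpu_percent": 97})  # reopened: payload is refreshed
second = alert.payload["cpu_percent"]

print(first, second)  # 95 97
```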


Best practices and recommendations

  1. While a monitor policy can have many monitors, do not attempt to cram all monitors into a single policy.

  2. Split out application monitoring from resource monitoring. For example, it's typically best to monitor CPU, hard drives and memory in one policy and services and processes in a different policy.

  3. Use monitor policies modularly based on roles. For example, create monitor policies for domain controllers, file servers, Exchange servers, etc. In those policies, only monitor the processes and services specific to that role.

  4. Assign tags to devices that are also specific to roles, and when possible, use the same name for monitor policies and their tags. For example, if you have a tag called Domain Controllers that you have assigned to all domain controllers, then a Monitor Policy called Domain Controllers (which only monitors domain controller functions) is an obvious pairing.

  5. Only leave auto-resolve unchecked if you want a technician to investigate the root cause of an alert. If auto-resolve is unchecked, a technician must manually clear the alert, which may create unnecessary administrative overhead.
