Maintaining Thresholds: Advanced Splunk Observability Detectors

Alerting can be hard. Think about how detectors are set up to alert engineers in your organization. How many non-actionable individual alerts are going off in your environment? Are you ever confused about how to alert on more complex combinations of signals in a detector? Are you running near your limit of detectors? These are all common problems, and we'll discuss solutions to each of them in this post. Specifically, you'll learn how to:

  1. Live within org limits: Combine alerts into a single detector to stay within detector limits for large deployments
  2. Simplify alert maintenance and management: Combining alerts also enables configuration as code, which drastically reduces orphaned assets.
  3. Alert on complex conditions with numerous signals: Sometimes a behavior you’d like to be alerted to isn’t as simple as monitoring a single metric. Compound condition alerting is possible!

Does this sound valuable for your organization? Read on!

Many Alerts, One Detector

Realistically, we all start our alerting journey by creating simple alerts on things like request rate or resource metrics for a given service or piece of infrastructure. But as time goes on, these alerts tend to accumulate, making maintenance of your observability tools and alerts more difficult. You may even run into limits on the number of detectors you can create. Combining alert conditions into a single detector can help in numerous ways:

  • Combining multiple alerts into a single detector makes responding to an issue more straightforward. When you're on call and get paged at 2am, having to look through multiple alerts to establish what is going on takes up important brain cycles. Alerts that include the four Golden Signals, Latency, Errors, Traffic, and Saturation (L.E.T.S.), can illustrate the shape of an issue in a single detector.
  • Multiple alerting conditions in a single detector can help templatize alerting for other services and important software being monitored in your organization. This is especially useful when combined with configuration as code tools such as Terraform. For example, setting up a detector for all of the basic host metrics, grouped by host, and deploying it through configuration as code helps make sure that no hosts go unmonitored and provides a single source of truth for those alert definitions (see the SignalFlow sketch after this list).
  • Detector limits may force you to combine alerts to stay under your limit. Though limits can be raised, keeping this in mind early on can help save cycles on maintenance and cleanup.
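As a rough sketch of that host-metrics example, the SignalFlow below defines one detector that watches several basic host metrics, grouped by host. The metric names (cpu.utilization, memory.utilization, disk.utilization) and the thresholds are placeholder assumptions; substitute whatever your agents actually report. A program like this can also be dropped into the program_text argument of the signalfx_detector Terraform resource if you manage detectors as code.

```
# One detector, several host-level alert rules. Metric names and
# thresholds are assumptions; adjust them to your environment.
cpu  = data('cpu.utilization').mean(by=['host']).publish(label='cpu')
mem  = data('memory.utilization').mean(by=['host']).publish(label='mem')
disk = data('disk.utilization').mean(by=['host']).publish(label='disk')

# Each detect() statement becomes its own alert rule in the detector.
detect(when(cpu > 90, lasting='10m')).publish('CPU utilization high')
detect(when(mem > 90, lasting='10m')).publish('Memory utilization high')
detect(when(disk > 85, lasting='15m')).publish('Disk utilization high')
```

Because every signal is grouped by host, a single firing rule already tells the responder which host is unhealthy, without hunting through three separate detectors.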

When combining alerts, it's best to consider ahead of time which groupings would be most useful to the responding teams. Grouping by service name is a common choice, as it encompasses not only software functionality but can also include the infrastructure that software runs on. Other common groupings include host, region, datacenter identifier, owning team, or even business unit. When creating these groupings, consider who will respond to the detector and what information they will need to carry out their response.
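Changing who an alert "speaks to" is mostly a matter of changing the grouping. A small sketch, again with assumed metric and dimension names (service, region, host):

```
# The by=[...] list controls how alerts are grouped; these dimension names are assumptions.
errors_by_service = data('request.errors').sum(by=['service', 'region']).publish(label='errors_by_service')
detect(when(errors_by_service > 100, lasting='5m')).publish('Service errors elevated')

# The same underlying metric, regrouped for an infrastructure-focused team.
errors_by_host = data('request.errors').sum(by=['host']).publish(label='errors_by_host')
detect(when(errors_by_host > 100, lasting='5m')).publish('Host errors elevated')
```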

A Splunk Lantern article describes in detail how to use SignalFlow to create detectors with multiple alert signals.


Figure 1. All of the Golden Signals can be contained in a single detector! This detector will be triggered if any of the LETS golden signals are out of range.

By following these directions you can quickly reap the benefits of combined detectors and SignalFlow! Armed with this knowledge, you can start exploring new ways to group and use alerts. Grouping by service, region, or environment is just the beginning, and any dimensions you include can be used to create better organizational clarity. For advanced organizations, this may mean grouping by business unit or even revenue stream to give business and executive leadership greater visibility.
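For illustration, a single detector covering all four golden signals for a service might look roughly like this. Every metric name and threshold here is an assumption for the sketch; map them to the APM and infrastructure metrics you actually collect.

```
# L.E.T.S. in one detector: four signals, four rules. All names are placeholders.
latency    = data('service.request.duration').mean(by=['service']).publish(label='latency')
errors     = data('service.request.errors').sum(by=['service']).publish(label='errors')
traffic    = data('service.request.count').sum(by=['service']).publish(label='traffic')
saturation = data('cpu.utilization').mean(by=['service']).publish(label='saturation')

detect(when(latency > 500, lasting='5m')).publish('Latency high')
detect(when(errors > 10, lasting='5m')).publish('Error count elevated')
detect(when(traffic < 1, lasting='10m')).publish('Traffic unexpectedly low')
detect(when(saturation > 90, lasting='10m')).publish('Saturation high')
```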

But SignalFlow can also be useful in other ways.

Many Signals, One Alert

Imagine you've got a complex behavior, involving multiple metrics, that you would like to alert on. For example, we may have a service that can simply be scaled when CPU utilization is high, but when disk usage is also above a certain level, a different runbook is the correct course of action. In these sorts of situations you need to be able to use multiple signals and thresholds in the alert defined by your detector.

Creating compound alerts is possible in the Splunk Observability UI, but it is also possible to set up these sorts of alerts with SignalFlow! As noted above, SignalFlow can help when you lay out your configuration as code in Terraform. When creating compound alerts, it is important to leverage the preview window for fired alerts. With the preview window you can easily tweak your thresholds to make sure you're only alerting on the specific behavior being targeted.
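A minimal SignalFlow sketch of the CPU-plus-disk example above, with assumed metric names and thresholds you would tune using the preview window:

```
# Compound condition: fire only when BOTH CPU and disk are high at the same time.
# Metric names and thresholds are assumptions.
cpu  = data('cpu.utilization').mean(by=['host']).publish(label='cpu')
disk = data('disk.utilization').mean(by=['host']).publish(label='disk')

# High CPU alone means "scale"; high CPU AND high disk points to a different runbook.
detect(when(cpu > 80, lasting='5m') and when(disk > 90, lasting='5m')).publish('CPU and disk high')
```

Using or instead of and makes either signal on its own fire the alert.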


Figure 2. Alert preview can help you determine the correct thresholds for each of your compound signals in a complex alert.

Additionally, you may find it useful to link to specific charts when using compound or complex alerting. Linked charts can help draw eyes to the appropriate places when a complicated behavior sets off your detectors. When every second counts, a little preparation and forethought about which charts matter most can go a long way!

What’s Next?

If you're interested in improving your observability strategy, or just want to check out a different spin on monitoring, you can sign up for a free trial of Splunk Observability Cloud today!

This blog post was authored by Jeremy Hicks, Observability Field Solutions Engineer at Splunk, with special thanks to Aaron Kirk and Doug Erkkila.


Jeremy Hicks

Jeremy Hicks is an observability evangelist and SRE veteran from multiple Fortune 500 e-commerce companies. His enthusiasm for monitoring, resiliency engineering, cloud, and DevOps practices provides a unique perspective on the observability landscape.


FAQs

What is a Splunk detector?

In Splunk Observability Cloud, detectors monitor your tests and metrics for anomalies and generate alerts when problems arise. You can customize the alerting threshold, severity, notification method, recipients, and more.

How do I create an alert in SignalFx?

In SignalFx, open the Organization Overview page. Select the metric you want to create alerts for, then select the bell icon in the upper-right corner. Select New Detector From Chart. Enter a value for Detector Name and select Create Alert Rule.

What is an alert rule in Splunk?

Alert rules use settings you specify for built-in alert conditions to define thresholds that trigger alerts. When a detector determines that the conditions for a rule are met, it triggers an alert, creates an event, and sends notifications (if specified).

What are the different types of alerts in Splunk?

Alert type comparison (alert type, and when it searches for events):

  Scheduled: Searches according to a set schedule. Pick from the timing options available, or use a cron expression to schedule the search.
  Real-time: Searches continuously.

How do I trigger an alert in Splunk?

On the Alert condition tab, select the type of condition that triggers an alert. If you want to create compound conditions using AND or OR operators on the Alert settings tab, you must use the Custom Threshold condition. This applies whether you are monitoring a single signal or multiple signals.

What is SignalFx used for?

SignalFx, now part of Splunk Observability Cloud, is a real-time monitoring platform for infrastructure, microservices, and applications. The platform collects metrics and traces across every component in your cloud environment, replacing traditional point tools with a single integrated solution that works across the stack.

What is the lifespan of Splunk alerts?

By default, each triggered alert record on the Triggered Alerts page expires after 24 hours.

How do I manage alerts in Splunk?

Select the Enabled toggle to enable or disable an alert. Select the Mobile Alert toggle to enable or disable an alert on mobile devices. Enabling an alert automatically makes it visible to Splunk Cloud Platform administrators in Splunk Web and on registered mobile devices running the Splunk Mobile app.

What is the alert limit for Splunk?

This limit applies to email recipients: by default, a Splunk email alert can be sent to 100 recipients, and the maximum can be raised to 10,000.

What is Splunk and why is it used?

Splunk Definition

Splunk is a big data platform that simplifies the task of collecting and managing massive volumes of machine-generated data and searching for information within it. The technology is used for business and web analytics, application management, compliance, and security.

Is Splunk used to monitor employees?

TL;DR: Splunk helps customers monitor their systems, not employees, using data already available from the enterprise. To be clear, RWI (Remote Work Insights) is not about monitoring employee behavior, but rather about surfacing existing log data, in aggregated form, from many enterprise solutions in the workplace.

What is Splunk used for in testing?

It offers real-time visibility, allowing organizations to instantly see how every code change affects the user experience and performance of the application. With its powerful analytics capabilities, Splunk allows organizations to detect and fix potential problems, and perform root-cause analysis.

What are Splunk transforming commands used for?

A transforming command orders the results of the search into a data table. Such commands "transform" the specified cell values for each event into numerical values that Splunk software can use for statistical purposes.
