An introduction to the Splunk Search Processing Language

My goal in this blog is to introduce you to the basic SPL format and to the different types of Splunk search commands. I also aim to help you decide which type of command best suits the problem you are facing.

Anatomy of a Search

Search Pipeline

The “search pipeline” refers to the structure of a Splunk search, which consists of a series of commands delimited by the pipe character (|). The pipe character passes the results of the preceding command to the next one as input, chaining SPL commands together.

Generally, a search consists of commands piped into one another, with each step reducing and shaping the results into what we want.

A Splunk search starts with search terms at the beginning of the pipeline. These search terms are keywords, phrases, boolean expressions, key/value pairs, etc. that specify which events you want to retrieve from the index(es).

The retrieved events can then be passed as input to a search command using a pipe character, and that command transforms them into the results that you need. At the beginning of a search pipeline, the search command is implied, even when you don’t explicitly state it. So if you type host="localhost" at the start of a search, it is interpreted as search host="localhost".
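As a small sketch of the pipeline described above, the search below starts with search terms and pipes the retrieved events into one command. The host value, the keyword error, and the choice of stats are assumptions made for illustration only.

```spl
host="localhost" error
| stats count by source
```

Because the search command is implied at the start of the pipeline, this should behave the same as writing search host="localhost" error as the first line.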

Fields

Events and results flowing through the search pipeline exist as a collection of fields, which fundamentally come from the data. Fields contain string values relevant to specific events and can be used alongside search commands to filter data. Fields can come from the index or from a wide range of sources at search time, such as tags, regex extractions, event types, etc. For a given event, a field name might be present or absent; if present, it might contain a single string value or multiple string values.

Certain important fields are index, _time, host, source, and _raw.

Some notable terms used to describe fields are:

Null field: A field that is not present on a particular result or event. Other events or results in the same search might have values for this field.

Empty Field: A field that contains a single value that is the empty string.

Empty value: A value that is the empty string, or “”. You can also describe this as a zero-length string.

Multivalue field: A field that has more than one value. All non-null fields contain an ordered list of strings. In the common case this list holds a single value; when it contains more than one entry, the field is a multivalue field.
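The terms above can be explored with a short, self-contained sketch that does not need any indexed data. It uses makeresults to create a single result and eval functions to build an empty field and a multivalue field, then checks a field that was never set. All field names here (empty_field, multi, value_count, missing_field, missing_is_null) are hypothetical names chosen for the illustration.

```spl
| makeresults
| eval empty_field=""
| eval multi=split("a,b,c", ",")
| eval value_count=mvcount(multi)
| eval missing_is_null=if(isnull(missing_field), "yes", "no")
```

Here empty_field holds a single empty value, multi holds three values, and missing_field is null because it is never assigned, so value_count should be 3 and missing_is_null should be yes.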

Quotes and Escape Characters

Quotes are used in situations that require a whole string to be evaluated as one unit. You will need quotes around phrases and field values that include white spaces, commas, pipes, quotes, or brackets. Quotes must be balanced. The backslash (\) escape character is used to keep quotes, pipes, and the backslash itself from being evaluated.
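A brief sketch of the quoting and escaping rules above follows. The index name, host value, phrase, and the note field are assumptions made up for the example.

```spl
index=main host="web server 01" "login failed"
| eval note="status was \"denied\""
```

The host value and the phrase are quoted because they contain spaces, and the inner quotes in the eval string are escaped with a backslash so that they are treated as part of the value rather than as the end of the string.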

General SPL Components

When writing an SPL search, there are a few components that can be used to filter or format the results. Most searches use a combination of the components below.

Search Terms

Search terms are keywords or phrases that narrow the results down to what we want. They can be the names of fields we are interested in, the indexes we want to search, or criteria that need to be met.

Commands

Commands are the actions you want to take on the results, such as formatting, filtering, altering, sorting, counting, renaming, or generating them. There is a wealth of search commands to choose from, and more of them are discussed in the rest of this blog.

Functions

Along with commands, search functions specify what sort of computation is done on certain fields. Functions are usually used alongside statistical commands, such as stats. Some examples of functions include: avg(), sum(), median(), min(), max(), mean(), var().

Clauses

Clauses group or rename fields to help format the results. Some common clauses are the “BY” clause, which groups the results by a certain field, the “AS” clause, which renames a field, and the “WHERE” clause, which filters results. Other useful keywords for filtering are the Boolean operators “AND” and “OR”, which are generally used with search terms to specify which terms must match. If no operator is given between two search terms, “AND” is implied.
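The functions and clauses described above might be combined as in the sketch below. The index name and the status and bytes fields are assumptions for illustration; the filtering step is written with the where command rather than a clause of another command.

```spl
index=main (status=404 OR status=500)
| stats avg(bytes) AS avg_bytes, count AS errors BY host
| where errors > 10
```

The OR operator joins the two status terms, AND is implied between the index term and the parenthesized group, avg() and count are functions, AS renames their outputs, and BY groups the statistics per host.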

Arguments

Splunk commands have arguments that are either optional or required. Required arguments are necessary for the command to work, and the command generally returns an error when they are not provided. An argument takes a field name, a value, or a Boolean value. Optional arguments often have a default value that is used when none is specified.
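As a sketch of required and optional arguments, the top command below requires a field to count and accepts optional arguments such as limit and showperc. The index, sourcetype, and the clientip field are assumptions for the example.

```spl
index=main sourcetype=access_combined
| top limit=5 showperc=false clientip
```

If limit were omitted, top would fall back to its default of 10 results, while leaving out the field name would cause an error because that argument is required.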

Sub-Searches

A subsearch runs its own search and returns the results to the parent command as the argument value. The subsearch runs before the command that contains it and is enclosed in square brackets. This type of search is generally used when you need to access more data or combine two different searches.

An example of a sub-search in a command is:

| union [search index=a | eval type = "foo"] [search index=b | eval mytype = "bar"]

Example

The following example shows how we can use some of the different components and the anatomy we have previously talked about to make a search:
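As a sketch of how these pieces fit together, the search below combines search terms, a Boolean operator, commands, a function, a clause, and an argument. It is an illustrative reconstruction and not necessarily the author's original search; the clientip and bytes field names are assumptions.

```spl
index="access_combined" OR index="main"
| dedup clientip keepevents=true
| stats avg(bytes) by host
| head 10
```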

Some examples of the above components in this example are:

Search Terms: index="access_combined", index="main"

Clause: OR, by

Functions: avg()

Commands: stats, dedup, head

Argument: keepevents=true

Types of Commands

There are six different types of search commands that a user can use: distributable streaming, centralized streaming, transforming, generating, orchestrating, and dataset processing.

Distributable Streaming

A distributable streaming command runs on the indexer or the search head, depending on where in the search the command is invoked. When it runs on the indexers, it can process subsets of the indexed data in parallel, which can speed up execution considerably. Examples of distributable streaming commands include: convert, eval, fields, regex, and rename.
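A minimal sketch of a search built only from distributable streaming commands is shown below. The index, sourcetype, and bytes field are assumptions for the example.

```spl
index=main sourcetype=access_combined
| eval kilobytes=bytes/1024
| fields host, kilobytes
| rename kilobytes AS kb
```

Each command works on one event at a time and does not need to see any other event, which is what allows this kind of work to be spread across indexers.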

Centralized Streaming

A centralized streaming command applies a transformation to each event returned by a search, and it does so on the search head. Unlike a distributable streaming command, it cannot run on the indexers, which means that less parallelization is available to it. Examples of centralized streaming commands include: dedup, head, join, and transaction.

Transforming

A transforming command orders the results into a data table. These commands transform the values for each event into numerical values that Splunk software can use for statistical purposes. They are required to turn search result data into the data structures needed for visualizations such as charts and tables. Examples of transforming commands include: chart, timechart, stats, top, and rare.
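As a sketch of a transforming command, the search below turns raw events into a table of counts per hour and per host, which is the shape a line or column chart needs. The index and sourcetype are assumptions for the example.

```spl
index=main sourcetype=access_combined
| timechart span=1h count BY host
```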

Generating

A generating command generates events or reports from the indexes without any prior transformations. Generating commands don’t expect or require an input, and are usually invoked at the beginning of the search with a leading pipe (the implied search command at the start of a search is the exception and needs no pipe). That is, no other command can be piped into a generating command. They are either event-generating (distributable or centralized) or report-generating. Depending on the command used, the results are returned as a list or a table. Examples of generating commands include: dbinspect, datamodel, inputcsv, metadata, pivot, and search.
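A short sketch of a generating command with its leading pipe is shown below. It uses metadata to list the hosts known to an index; the index name main is an assumption.

```spl
| metadata type=hosts index=main
```

Nothing comes before the pipe here, because the command produces its own results rather than consuming the output of an earlier command.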

Orchestrating

An orchestrating command is one that does not directly affect the end result of the search but controls some aspects of how the search is processed. Orchestrating commands are generally used to help optimize the search so that it completes faster. Examples of orchestrating commands include: redistribute, noop, and localop.

Dataset Processing

A dataset processing command is one that requires the entire dataset before the command can run. These commands are not transforming, not distributable, not streaming, and not orchestrating. Examples of dataset processing commands include: sort, eventstats, and some modes of cluster, dedup, and fillnull.
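The sketch below uses two dataset processing commands, eventstats and sort. The index, sourcetype, and bytes field are assumptions for the example.

```spl
index=main sourcetype=access_combined
| eventstats avg(bytes) AS avg_bytes BY host
| where bytes > avg_bytes
| sort - bytes
```

The per-host average cannot be known until every event for that host has been seen, and the final ordering cannot be known until every remaining event is available, which is why both commands need the whole dataset.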

Streaming Commands vs. Non-Streaming Commands

Commands can ingest data in two ways: by streaming it, or by waiting for all of it to be available before operating on it. These two approaches give two categories, streaming search commands and non-streaming search commands.

A streaming search command operates on each event as it comes in, taking one event as input and producing one or no output. This type of command can run on the indexers and be applied to subsets of the indexed data in parallel, as long as it is not preceded by a non-streaming search command.

A non-streaming search command runs on the search head and requires that all of the events be gathered from the indexers before it runs. An example is the “sort” command, which needs all of the data to be retrieved before it can sort it correctly.

Tips, Tricks, and Best Practices

Knowing your search goals

Knowing which goal you want your search to accomplish can help you optimize it.

When a search retrieves raw events from an index, no additional processing is done on the events before they are returned, so being as specific as you can speeds up the search. You can do this with keywords and field-value pairs that are unique to the events. A search for events that occur frequently is referred to as a dense search; a search for events that are rare in the dataset is known as a sparse search. Sparse searches that run against large volumes of data take longer than dense searches, since it takes longer to find those events.

When running a search that generates a report that summarizes or organizes data, it is best to be restrictive and specific when retrieving data, since the data is going to be stored and processed in memory.

Using non-streaming search commands as late as possible

Another way to speed up search execution is to consider where to place non-streaming search commands. Placing them as late as possible in your search string helps optimize the search, because the commands that come before a non-streaming command can run on the indexers in parallel. A non-streaming command requires all of the events to be present on the search head before it operates on them, so all of the data is sent to the search head, and every subsequent command that could have run on the indexers runs on the search head instead.
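A sketch of this ordering is shown below, with the streaming commands first and the non-streaming sort last. The index, sourcetype, and bytes field are assumptions for the example.

```spl
index=main sourcetype=access_combined
| eval kb=bytes/1024
| where kb > 100
| fields host, kb
| sort - kb
```

If sort were moved up to directly follow the search terms, the eval, where, and fields steps would be expected to run on the search head against the full set of events instead of running on the indexers.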

Limiting the time-range

Another way to speed up searches is to limit the time range to be as small as possible. This helps cut down on the number of events that need to be processed in the subsequent commands.
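One way to narrow the time range, besides the time range picker, is to use the earliest and latest time modifiers in the search terms. The sketch below looks only at the last 15 minutes; the index, sourcetype, and status value are assumptions for the example.

```spl
index=main sourcetype=access_combined status=500 earliest=-15m latest=now
| stats count BY host
```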

Using field filtering effectively

Using indexed and default fields to filter your data as early as possible helps speed up searches, since filtering out data early means that less data needs to be processed later in the pipeline.
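As a sketch of filtering early, the search below puts the default fields index, host, and sourcetype in the search terms at the very start, so that fewer events are retrieved in the first place. The specific values and the status and clientip fields are assumptions for the example.

```spl
index=main host=web01 sourcetype=access_combined status=500
| stats count BY clientip
```

The less efficient alternative would be to start with only index=main and apply the same conditions in a later search or where command, after the events have already been retrieved.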

Commonly Used Commands and Functions

Common Search Commands

Conclusion

As you can see, there is a lot that can go into searching for specific data within Splunk, and there are a lot of methods that you could learn to optimize your search. I hope that you come away from this blog with a basic understanding of Splunk commands, and where to start to orchestrate and run your own searches.



An introduction to the Splunk Search Processing Language — Crest Data (2024)