Fake It to Make It: Tips and Tricks for Generating Sample Splunk Data Sets | Zivaro (2024)

As you continue to work with Splunk and the number of underlying use cases within your organization grows, you will ultimately encounter a situation where you need to generate some “fake” data. Perhaps you need to create a visualization to use for a proof of concept; perhaps you are trying to master a specific search or visualization; or perhaps you quickly need a few pieces of data for demonstrating a feature to a colleague.

As a Splunk Solution Architect and Consulting Engineer at GTRI, I often make use of synthesized data for all of these reasons and many more. While there are many methods for obtaining sample data for your Splunk needs, in this article I will focus on two methods for creating sample Splunk data sets that do not require any indexing.

Generating Time-series Data for Sample Visualizations

If you’ve worked with Splunk for any length of time, you quickly realize that users can be very particular about the format and appearance of visualizations. The search in this example let me quickly generate a few days of hourly data points that I could use to iteratively tweak the colors and chart format for a customer to review.

This search uses a combination of the gentimes, eval, and chart commands to produce a visual output that can be added to a dashboard prototype.

| gentimes start=07/23/2016 increment=1h | eval myValue=random()%500 | eval myOtherValue=random()%300 | eval starttime=strftime(starttime, "%m-%d-%Y %H:%M:%S") | chart max(myValue) AS myValue max(myOtherValue) AS myOtherValue over starttime

Let’s break down this search:

The gentimes command on its own creates a series of timestamps beginning with the date specified in the start argument. In this example, I’ve added the increment argument to further specify the interval for each timestamp (“1h” or hourly in this case). The net effect is to create 1-hour timestamps up until the current date/time.
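As a sketch of a variation: gentimes also accepts an end argument, so you can bound the range explicitly instead of letting it run up to the present. The dates and interval below are arbitrary examples:

```
| gentimes start=07/23/2016 end=07/25/2016 increment=30m
```

This produces 30-minute timestamps across a fixed two-day window, which is handy when you want a repeatable data set rather than one that grows every time you run the search.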

The search pipes the output of the gentimes command (hourly timestamps) into a pair of eval commands that simply create two fictitious fields and values to associate with each generated timestamp. For these two eval commands, I used the random function with the modulo operator (%) to return a random number between 0 and one less than the integer I specified.
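If you need values in a range that does not start at zero, you can combine the modulo trick with an offset. A small sketch (the field names are invented for illustration, and a negative start to gentimes is interpreted as days in the past):

```
| gentimes start=-1 increment=1h
| eval cpu_pct=random()%101
| eval latency_ms=(random()%400)+100
```

Here cpu_pct falls between 0 and 100 inclusive, and latency_ms between 100 and 499, which can make the fake data look more like a real metric.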

The chart command simply outputs my fictitious data into a tabular format that can be used to render visualizations via Splunk’s easy-to-use visualization tools.

Executing the search above lets you quickly generate charts like the one in the screenshot below that can be used for tasks such as modifying simple XML to specify color settings.

(Screenshot: a sample chart rendered from the generated hourly data)

Variations of this search can be used to create visualizations that mimic a data source a customer uses (or plans to use) but cannot provide. The search can easily be modified to create any number of fields by adding additional eval statements, and a large number of discrete events can be generated quickly by adjusting the start and increment arguments to the gentimes command. If you have a longer-term need for the data, you could even write it to an index/summary index.
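As a sketch of that last point, the collect command can write the generated events to a summary index. The index name below (my_fake_data) is hypothetical; you would need to create it first:

```
| gentimes start=-7 increment=1h
| eval host="host".tostring(random()%3+1)
| eval bytes=random()%10000
| collect index=my_fake_data
```

Once collected, the events can be searched and charted like any other indexed data, which is useful when a demo needs to survive beyond a single ad hoc search.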

Creating Tabular Data

In some instances, generating a small set of tabular data may prove useful. Oftentimes I work with customers who want to render Splunk search results in a table with no drilldown. With this quick and simple search, I can generate a small number of results in a tabular format. The search is particularly useful because it creates results with a wide variety of data types: timestamps, counts, string data, numerical data, and both single-value and multivalue fields.

| noop | makeresults | eval field1 = "abc def ghi jkl mno pqr stu vwx yz" | makemv field1 | mvexpand field1
| eval multiValueField = "cat dog bird" | makemv multiValueField
| streamstats count | eval field2 = random()%100
| eval _time = now() + random()%100 | table _time count field1 field2 multiValueField

At first pass, there appears to be a lot going on here. In reality, it isn’t too complicated.

The noop command is documented as a Splunk debugging command; in practice, I have only ever used it for generating sample data in scenarios such as this one. It is used here for speed: it tells Splunk to perform no operations (hence “noop”), and in distributed environments it prevents the search from being dispatched to the indexers.

The makeresults command is required here because the subsequent eval command expects (and requires) a result set to operate on; without one, it will raise an error. It creates the specified number of results (in this case the default of one) and passes them to the next pipe in the search.
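As an aside, makeresults also accepts a count argument, which offers another way to generate multiple rows. A minimal sketch:

```
| makeresults count=5
| eval value=random()%100
```

This produces five events, each stamped with the current time and carrying a random value field.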

The eval field1 command creates a text field with the value "abc def ghi jkl mno pqr stu vwx yz".

The makemv command converts field1 from a single-value field to a multivalue field by splitting its value on the default whitespace delimiter.

The magic happens with the mvexpand command. It takes the values of a multivalue field (created with the preceding makemv command) and creates an individual event for each value. Here, this results in the creation of nine separate events.

The eval multiValueField = "cat dog bird" | makemv multiValueField commands simply create an additional field and populate it with multiple values.

The streamstats count command calculates a running statistic (in this example, the count of total events) once for each event returned by the search. As noted above, the mvexpand command expanded the text string into nine events; streamstats therefore adds a count field to each event representing the number of events returned thus far (1 through 9).

Using eval field2 creates a fictitious numerical field whose value will be a number between 0 and 99. This is the same modulo technique used in the previous search.

The eval _time = now() + random()%100 command creates pseudo-random timestamps (within 100 seconds of the current time) for each of the nine events.
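If you want the timestamps spread further apart, or rendered in a friendlier format, you can subtract a larger random offset and format the result with strftime. A small sketch (86400 spreads the events across the past day):

```
| makeresults count=5
| eval _time = now() - (random()%86400)
| eval time_display = strftime(_time, "%Y-%m-%d %H:%M:%S")
```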

The final table command simply specifies the fields and their order for display.

The net result is the table below. You could also use the chart command to render it as a pie chart or other visualization:

(Screenshot: the resulting table of generated fields)
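As a sketch of that chart idea, the same generated fields can be aggregated over field1 to produce output suitable for a pie or column chart:

```
| makeresults
| eval field1 = "abc def ghi jkl mno pqr stu vwx yz" | makemv field1 | mvexpand field1
| eval field2 = random()%100
| chart max(field2) AS value over field1
```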

But What If I Need to…

The two techniques discussed in this article are versatile, quick methods for generating usable sample data for various purposes. Unfortunately, they won’t cover every conceivable scenario. There are certainly times when sample data sets of a specific source and format are the only way to fulfill a request. If you have questions about Splunk data sets, feel free to connect with me on LinkedIn.

Scott DeMoss is a Solution Architect for Data Center and Big Data in Professional Services at GTRI.
