Tips for Partitions & Scheduled Views
Why Partitions and Scheduled Views?
When you search "normally", your search criteria (e.g. _sourceCategory = ) are matched against ALL your Sumo data. This is why adding keywords (and using a shorter time range) improves search performance: both take advantage of the index and time sharding. But sometimes you need to search longer time ranges against large data sets. Adding keywords will always help, but a Partition lets you search a separate (smaller) index, so you are only searching a subset of your Sumo data. And when you want to do longer-term reporting or trending, a Scheduled View lets you search pre-aggregated data (a tiny fraction of the raw logs).
Partitions
Create Partitions that map to source categories to improve search performance. A Partition creates a separate index for just the data in that category (or other subset). The goal is to create subsets that each represent 20% or less of your total data volume. The smaller the subset (as a percentage of the total), the more a Partition can improve search performance. Don't do parsing or aggregation in a Partition; use Field Extraction Rules or Scheduled Views for that. Think of a Partition as a "proxy" for the source category, so you can search: _index = web, rather than _sourceCategory = web.
If a source category represents a large percentage of your data, you can use one or more keywords to reduce the size of the Partition. Be sure to name the Partition accordingly, so you will remember what it focuses on. Example: Partition = web_errors. (You would search: _index = web_errors.)
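The 20% guideline above is simple arithmetic, and a minimal Python sketch may help when sizing candidates. The category names and daily volumes below are hypothetical, as is the helper name partition_candidates; substitute your own ingest figures.

```python
# Hypothetical daily ingest volumes (GB) per source category; not real data.
volumes_gb = {"web": 120.0, "app": 45.0, "db": 20.0, "security": 15.0}


def partition_candidates(volumes, max_share=0.20):
    """Map each category to (share of total volume, whether it is within max_share)."""
    total = sum(volumes.values())
    return {
        category: (gb / total, gb / total <= max_share)
        for category, gb in volumes.items()
    }


shares = partition_candidates(volumes_gb)
for category, (share, fits) in shares.items():
    note = "fits as-is" if fits else "too large; narrow it with keywords"
    print(f"{category}: {share:.1%} of total ({note})")
```

A category flagged as too large is the case where a keyword-scoped Partition such as web_errors makes sense.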
Scheduled Views
Create Scheduled Views to pre-aggregate data, for reporting over longer time ranges (daily metrics over multiple weeks, or hourly metrics over multiple days). The goal is to create concise rows with fields and metrics, so when you subsequently search against the View, you are not searching raw logs.
For the View's query expression, make sure you alias your aggregate variable (count as count, etc.), and use a timeslice.
| timeslice 1h
| count as count by _timeslice, ip
As for the timeslice value itself, use 1h as your default for hourly or daily reporting. The Scheduled View takes a snapshot every minute, so you don't need to worry about granularity.
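To picture what the View query above produces, here is a rough Python analogy (not Sumo syntax). It floors each event to its hour and counts by hour and ip. The sample events and the helper names timeslice_1h and view_rows are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw log events as (timestamp, ip); not real data.
raw_events = [
    (datetime(2024, 1, 1, 9, 5), "10.0.0.1"),
    (datetime(2024, 1, 1, 9, 40), "10.0.0.1"),
    (datetime(2024, 1, 1, 9, 59), "10.0.0.2"),
    (datetime(2024, 1, 1, 10, 1), "10.0.0.1"),
]


def timeslice_1h(ts):
    """Floor a timestamp to the start of its hour (analogy for: timeslice 1h)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def view_rows(events):
    """Count events per (hour, ip) (analogy for: count as count by _timeslice, ip)."""
    return Counter((timeslice_1h(ts), ip) for ts, ip in events)


hourly_counts = view_rows(raw_events)
for (hour, ip), count in sorted(hourly_counts.items()):
    print(hour, ip, count)
```

Four raw events collapse into three rows here; on real data the reduction is far larger, which is why searching the View is cheaper than searching raw logs.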
IMPORTANT: you are usually going to re-aggregate when you search against the View. So if the View query is:
| timeslice 1h
| count as count by _timeslice, ip
Then you are most likely going to search against it like this:
| timeslice 1d
| sum(count) as count by _timeslice, ip
In this example, the View already did the counting, so now you just need to sum the counts. To help understand this, search against the View and look at the rows it produces. If they are counts, sum them. If they are max or min values, then take the max or min of them. Just remember: "don't count the counts".
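The difference between summing and counting can be seen in a small Python analogy (not Sumo syntax). The View rows below are hypothetical hourly counts, and daily_totals and daily_row_counts are illustrative helper names.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical rows already stored in the View as (hour, ip, count); not real data.
view_rows = [
    (datetime(2024, 1, 1, 9), "10.0.0.1", 40),
    (datetime(2024, 1, 1, 10), "10.0.0.1", 60),
    (datetime(2024, 1, 1, 9), "10.0.0.2", 5),
    (datetime(2024, 1, 2, 9), "10.0.0.1", 30),
]


def daily_totals(rows):
    """Correct re-aggregation: sum the hourly counts per (day, ip)."""
    totals = defaultdict(int)
    for hour, ip, count in rows:
        totals[(hour.date(), ip)] += count
    return dict(totals)


def daily_row_counts(rows):
    """The mistake: counting View rows, which only says how many hours had data."""
    rows_seen = defaultdict(int)
    for hour, ip, _count in rows:
        rows_seen[(hour.date(), ip)] += 1
    return dict(rows_seen)


key = (date(2024, 1, 1), "10.0.0.1")
print("sum of counts:", daily_totals(view_rows)[key])
print("count of rows:", daily_row_counts(view_rows)[key])
```

For that day and ip, the sum reflects the events behind the two hourly rows, while the row count only reports that two hourly rows exist.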