Creating Meaningful Alerts
Not all alerts are created equal. An alert that only checks a static threshold can produce false positives: alerts that do not actually require action. Dynamic thresholds let you see the bigger picture and measure relative to your own context. Instead of just counting 404 status codes, the example below compares the rise in 404s to that of 200s and determines whether the ratio is out of the ordinary.
Enjoy!
_sourceCategory=Apache/Access (status_code=200 or status_code=404)
| timeslice 1m
| if (status_code matches "2*", 1, 0) as successes
| if (status_code matches "4*", 1, 0) as fails
| sum(successes) as success_cnt, sum(fails) as fail_cnt by _timeslice
| fail_cnt/success_cnt as failure_rate
| sort _timeslice desc
| outlier failure_rate window=5, threshold=3, consecutive=1, direction=+
// To return only the timeslices flagged as outliers, uncomment the next line:
// | where failure_rate_indicator > 0
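To make the idea of a dynamic threshold concrete, here is a minimal Python sketch (not Sumo Logic syntax) of a trailing-window check. It assumes the threshold is the mean of the previous `window` points plus `threshold` standard deviations. The function name `flag_outliers` and the sample rates are hypothetical, and the outlier operator's exact internals may differ.

```python
from statistics import mean, pstdev


def flag_outliers(rates, window=5, threshold=3.0):
    """Return indices of points that rise above the trailing-window band.

    Assumption: a point is an outlier when it exceeds
    mean + threshold * standard deviation of the previous `window` points.
    """
    flagged = []
    for i in range(window, len(rates)):
        trailing = rates[i - window:i]
        upper = mean(trailing) + threshold * pstdev(trailing)
        if rates[i] > upper:  # direction=+ : only flag upward spikes
            flagged.append(i)
    return flagged


# Hypothetical per-minute failure rates (fail_cnt / success_cnt).
rates = [0.02, 0.03, 0.02, 0.03, 0.02, 0.03, 0.25, 0.03]
print(flag_outliers(rates))  # [6]
```

A static threshold of, say, 0.1 would also catch this spike, but it would need retuning for every source with a different baseline. The trailing window adapts to whatever "normal" looks like for the data at hand.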
-
Want more? Here's another example that uses the same query template to track the percentage of errors.
Enjoy!
_sourceCategory=mysourcecategory
| timeslice 1h
| if (!isempty(error),1,0) as errorcount
| if (isempty(error),1,0) as noerror
| sum(errorcount) as errorcount, sum(noerror) as noerror by _timeslice
| (errorcount / (errorcount + noerror)) * 100 as percentage_of_errors
| fields - errorcount, noerror
| sort by _timeslice, percentage_of_errors
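As a quick check of the arithmetic, this Python sketch (not Sumo Logic syntax; the function names and counts are hypothetical) contrasts errors as a share of all messages with errors relative to non-error messages. The two only agree when errors are rare, so it is worth deciding which one an alert should track.

```python
def error_percentage(error_count, no_error_count):
    """Errors as a percentage of all messages in the timeslice."""
    total = error_count + no_error_count
    if total == 0:
        return 0.0  # empty timeslice: nothing to report
    return 100.0 * error_count / total


def error_to_ok_ratio(error_count, no_error_count):
    """Errors per 100 non-error messages (undefined when there are none)."""
    if no_error_count == 0:
        return None
    return 100.0 * error_count / no_error_count


# Hypothetical hourly counts: 20 messages with an error field, 80 without.
print(error_percentage(20, 80))   # 20.0
print(error_to_ok_ratio(20, 80))  # 25.0
```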