The If operator can be used to construct new fields for calculating things like the ratio of 400 or 500 error codes to total requests. This example will show how to calculate this error ratio, then apply the Outlier operator to alert on sudden spikes in this ratio.
I'll use Nginx in this example, but this use case can easily be applied to IIS, Apache, Load Balancer, and other logs with success and error status codes, or any example where you need to calculate and monitor the ratio of an occurrence of an event over time.
Step 1: Parse Status Codes
_sourceCategory=prod/nginx
| parse "HTTP/1.1\" * " as sc
Step 2: Create 400s and 500s Counter Fields
_sourceCategory=prod/nginx
| parse "HTTP/1.1\" * " as sc
| if(sc matches "4*", 1, 0) as sc_400_counter
| if(sc matches "5*", 1, 0) as sc_500_counter
Step 3: Create the Denominator and Numerator for your Ratio over Time
Here, we calculate the ration of 500s to total requests, 400s to total requests, and all errors (500s + 400s) to total requests:
_sourceCategory=prod/nginx
| parse "HTTP/1.1\" * " as sc nodrop
| if(sc matches "4*", 1, 0) as sc_400_counter
| if(sc matches "5*", 1, 0) as sc_500_counter
| timeslice 15m
| count as total_logs, sum(sc_400_counter) as sc_400s, sum(sc_500_counter) as sc_500s by _timeslice
Step 4: Create the Ratio Fields and Hide Intermediary Fields
Here, we use some simple division on new query lines to create and name the ratios we want. We also use the Fields operator to explicitly show only the fields we want to display, hiding the unnecessary ones:
_sourceCategory=prod/nginx
| parse "HTTP/1.1\" * " as sc nodrop
| if(sc matches "4*", 1, 0) as sc_400_counter
| if(sc matches "5*", 1, 0) as sc_500_counter
| timeslice 15m
| count as total_logs, sum(sc_400_counter) as sc_400s, sum(sc_500_counter) as sc_500s by _timeslice
| (sc_400s/total_logs) as sc_400_ratio
| (sc_500s/total_logs) as sc_500_ratio
| ((sc_400s+sc_500s)/total_logs) as sc_errors_ratio
| fields sc_400_ratio, sc_500_ratio, sc_errors_ratio, _timeslice
Step 5: Apply Outlier to the Ratio Over Time
Next, we use Outlier and apply it to one of our ratios. I've selected the 'sc_errors_ratio' field (all 400s and 500s) in this example:
_sourceCategory=prod/nginx
| parse "HTTP/1.1\" * " as sc nodrop
| if(sc matches "4*", 1, 0) as sc_400_counter
| if(sc matches "5*", 1, 0) as sc_500_counter
| timeslice 15m
| count as total_logs, sum(sc_400_counter) as sc_400s, sum(sc_500_counter) as sc_500s by _timeslice
| (sc_400s/total_logs) as sc_400_ratio
| (sc_500s/total_logs) as sc_500_ratio
| ((sc_400s+sc_500s)/total_logs) as sc_errors_ratio
| fields sc_400_ratio, sc_500_ratio, sc_errors_ratio, _timeslice
| outlier sc_errors_ratio
Step 6: Alert on Spikes in the Ratio of Errors to Total Requests
Finally, we'll filter down the data to just the spikes (pink triangles) so that we can alert on these. Just click 'Save As' then Schedule this Search' to set up an Alert!
_sourceCategory=graham/travel/nginx
| parse "HTTP/1.1\" * " as sc nodrop
| if(sc matches "4*", 1, 0) as sc_400_counter
| if(sc matches "5*", 1, 0) as sc_500_counter
| timeslice 15m
| count as total_logs, sum(sc_400_counter) as sc_400s, sum(sc_500_counter) as sc_500s by _timeslice
| (sc_400s/total_logs) as sc_400_ratio
| (sc_500s/total_logs) as sc_500_ratio
| ((sc_400s+sc_500s)/total_logs) as sc_errors_ratio
| fields sc_400_ratio, sc_500_ratio, sc_errors_ratio, _timeslice
| outlier sc_errors_ratio
| where sc_errors_ratio_violation > 0
Comments
4 comments
Thanks for the above article.
In addition to this, suppose there is a requirement, when the
sc_errors_ratio > 0.5, we need to filter out the errors starting with 500 and filter in errors starting with 400 in the result set and send this as an alert to a slack channel. How can this query to be altered ?
Hey Ramakrishna,
Are you trying to show the trend in percent of 400s? If so you could use something like this:
You could also add the below lines to apply outlier and alert on spikes:
Hi Graham,
My requirement is something like this.
I have logging done on sumologic. The log JSON contains the response time of the request. Let it be a JSON key whose name is "response_time". Each request is identified by unique ID , denoted by JSON key "request_id". and a URL denoted by JSON key "url". I need to alert on a slack channel based on the following condition.
1) In a window of 10 minutes, If there are 100 requests, and if more than 5 % of requests have response time more than 100ms, then alert the "url", "request_id" and "response_time" of the all those requests.
2) If Less than Or Equal 5 % of requests have response time more than 100ms, then don't alert at all. I wrote a query like this.
Above query gives me all the requests when more than 5% of requests have response_time more than 100 ms. But It gives me all requests irrespective of response time. No results are returned otherwise.
Along with this result, I want to filter above query further with requests having "response_time" > 100 ms.
Do I need to write a subquery for this ??? Or Can it be done without subquery ?? I want to send the results to a REAL TIME ALERT. Is it possible ?
Hey Ramakrishna,
This is interesting, I think this subquery will work:
Please sign in to leave a comment.