How to create an alert on log fields based on the percentage of failures?
I have logging done on Sumo Logic. The log JSON contains the response time of the request, under the JSON key "response_time". Each request is identified by a unique ID, under the JSON key "request_id", and a URL, under the JSON key "url". I need to alert a Slack channel based on the following conditions.
1) In a window of 10 minutes, if there are 100 requests and more than 5% of them have a response time above 100 ms, then alert with the "url", "request_id" and "response_time" of all those requests.
2) If 5% or fewer of the requests have a response time above 100 ms, then don't alert at all.
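To make the intended rule concrete, here is a minimal Python sketch of the two conditions applied to the records of one 10-minute window. It is only an illustration of the logic, not a Sumo Logic feature: the function name `slow_requests_to_alert` and its parameters are hypothetical, and it assumes each record is a dict with the "url", "request_id" and "response_time" keys described above.

```python
from typing import Dict, List


def slow_requests_to_alert(
    requests: List[Dict],
    threshold_ms: float = 100.0,   # response time above this counts as slow
    max_slow_ratio: float = 0.05,  # alert only if the slow share exceeds this
) -> List[Dict]:
    """Return the slow requests of one window, or [] if no alert is needed."""
    if not requests:
        return []
    # Requests whose response time is strictly above the threshold.
    slow = [r for r in requests if float(r["response_time"]) > threshold_ms]
    # Condition 1: more than 5% slow -> alert on exactly those slow requests.
    if len(slow) / len(requests) > max_slow_ratio:
        return slow
    # Condition 2: 5% or fewer slow -> no alert at all.
    return []
```

With 100 requests in the window, 6 slow ones would trigger an alert listing those 6, while 5 slow ones would produce nothing.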
I wrote a query like this.
_sourceName=<my_source_name>
| json field=_raw "response_time" as response_time
| json field=_raw "request_id" as request_id
| if (num(response_time) > 100, 1, 0) as higher
| if (num(response_time) <= 100, 1, 0) as lower
| count as total_requests, sum(higher) as response_time_greater_than_100, sum(lower) as response_time_less_than_100
| (response_time_greater_than_100/total_requests) as failure_ratio
| where (failure_ratio > 0.05)
The above query returns results only when more than 5% of requests have a response_time above 100 ms, but it then gives me all requests irrespective of response time. No results are returned otherwise.
Along with this result, I want to filter above query further with requests having "response_time" > 100 ms.
Whenever there are results, it gives two tabs: one for "Messages" and another for "Aggregates". I want to send the fields in the "Messages" tab to a Slack channel. Could you please help with this?
Official comment
Sorry for the delay. This might be a good use case for a subquery. The child query determines whether or not you care to see the raw messages in that same 10-minute time window. If you don't need the raw messages, it passes a nonexistent keyword to the parent query so that it returns zero results.
I included a few other optimizations, since (a) you can put the json parsing into one code line, (b) don't need field=_raw, and (c) I don't see that you wound up using the lower field.
net net this is what I came up with:
_sourceName=xxx
[subquery: _sourceName=xxx | json "response_time", "request_id" as response_time, request_id
| if (num(response_time) > 100, 1, 0) as higher
| count as total_requests, sum(higher) as response_time_greater_than_100
| (response_time_greater_than_100/total_requests) as failure_ratio
| if (failure_ratio > 0.05, " ", "hacknonexistentkeyword") as kw
| compose kw keywords]
| json "response_time", "request_id" as response_time, request_id
| where (num(response_time) > 100)