Send alert only once
Hello,
I have a collector that sends an application isAlive log every minute. This log contains 200 if everything is OK and something else otherwise.
Suppose everything is OK: I receive 200 several times.
At some point I start receiving, let's say, 500. I want to send an alert at that moment.
But after that I keep receiving 500, and I do not want to send any more alerts. I only want one the first time a 500 follows a 200.
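To make the requirement concrete, here is a minimal Python sketch of the edge-triggered behaviour described above. It is not a Sumo Logic query; the function name and the sample data are hypothetical, and it assumes the service counts as healthy before the first sample.

```python
def first_failures(codes):
    """Return the indices where a non-200 code directly follows a 200.

    `codes` is a chronological list of HTTP status codes, one per ping.
    """
    alerts = []
    previous_ok = True  # assumption: service is healthy before the first sample
    for i, code in enumerate(codes):
        ok = (code == 200)
        if previous_ok and not ok:
            alerts.append(i)  # transition from OK to failing: alert once
        previous_ok = ok
    return alerts


# Hypothetical sample: two outages, each should alert exactly once.
print(first_failures([200, 200, 500, 500, 500, 200, 500]))
```

Repeated 500s after the first one produce no further alerts, because the previous sample is no longer a 200.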
-
Hey David,
We can use the accum operator to create a counter column and alert on the first occurrence of a 500, but not on the 500s that follow. Can you post your query, or a portion of it, here so I can add to it?
-
Thanks for your answer.
My query starts with:
_sourceName="CMS ping"
| parse "status=*" as httpCode
| if(httpCode=200,1,0) as res
and I want an alert the first time the res value is 0.
Even better would be an alert the first time I have 3 res values set to 0 within five minutes (so that I can assume my system is down if, within a 5-minute range, my logs contain more than 3 errors).
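The "alert once when errors pile up within five minutes" idea can be sketched in Python as follows. This is only an illustration of the logic, not a Sumo Logic query: the function name, the one-sample-per-minute input format, and the choice of "at least `threshold` errors" as the trigger are all assumptions.

```python
from collections import deque


def threshold_alerts(samples, window=5, threshold=3):
    """Return the minutes at which the error count in the last `window`
    minutes first reaches `threshold`.

    `samples` is a chronological list of (minute, http_code) pairs.
    After an alert, no new alert fires until the count drops below
    `threshold` again.
    """
    recent_errors = deque()  # minutes of errors still inside the window
    alerting = False         # True while the system is considered down
    alerts = []
    for minute, code in samples:
        if code != 200:
            recent_errors.append(minute)
        # Drop errors that have fallen out of the sliding window.
        while recent_errors and recent_errors[0] <= minute - window:
            recent_errors.popleft()
        down = len(recent_errors) >= threshold
        if down and not alerting:
            alerts.append(minute)  # first time the threshold is reached
        alerting = down
    return alerts


# Hypothetical data: an outage from minute 2 to 6, then another from 12.
codes = [200, 200, 500, 500, 500, 500, 500,
         200, 200, 200, 200, 200, 500, 500, 500]
print(threshold_alerts(list(enumerate(codes))))
```

The `alerting` flag is what keeps a continuing outage from alerting again; it resets only once the window holds fewer than `threshold` errors.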
-
It seems that the following query does what I need:
_sourceName="CMS ping"
| parse "status=*" as httpCode
| if(httpCode=200,1,0) as success
| if(httpCode=200,0,1) as error
| timeslice 1m | count(*) as num by _timeslice, success, error
| sort by _timeslice
| accum success as numsuccess
| accum error as numerror
| accum num as totalcount
| limit 4
| if(numerror=totalcount,1,0) as t
| where error=1 and t=1
| max(numerror) as maxerror group by error
| where maxerror=3
-
Hi David,
Nice work on that query. Below I have posted another query that should get you the same result. You can then schedule the alert as shown in the screenshot:
_sourceName="CMS ping"
| parse "status=*" as httpCode
| if(httpCode=200,1,0) as res
| timeslice 5m
| sort _timeslice
| count as pings, sum(res) as res_sum by _timeslice
| pings-res_sum as errors
| where errors > 3
-
The problem with your query is that it cannot distinguish between a service that goes down, a service that is still down, and a service that is back up.
Service goes down: at some point you receive 500 500 500 200 200
Service still down: you receive, many times over, 500 500 500 500 500
Service is back: at some point you receive 200 200 500 500 500
I think your query triggers in all three cases, whereas I want it to trigger only in the first.
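As a rough illustration of the distinction between the three cases, here is a minimal Python sketch (not a Sumo Logic query). It assumes the sequences above are listed newest first, and it only compares the newest and oldest sample in the window. The function name and the returned labels are hypothetical.

```python
def classify(codes_newest_first):
    """Classify a window of status codes, newest sample first."""
    newest_ok = codes_newest_first[0] == 200
    oldest_ok = codes_newest_first[-1] == 200
    if oldest_ok and not newest_ok:
        return "goes down"   # was OK, now failing: the only case to alert on
    if not oldest_ok and not newest_ok:
        return "still down"  # failing throughout the window
    if not oldest_ok and newest_ok:
        return "back"        # was failing, now OK
    return "up"              # OK at both ends of the window


for window in ([500, 500, 500, 200, 200],
               [500, 500, 500, 500, 500],
               [200, 200, 500, 500, 500]):
    print(window, "->", classify(window))
```

An alert would then fire only when the result is "goes down".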