Send alert only once

Comments

  • Graham Watts

    Hey David,

    We can use the accum operator to create a counter column, and alert for the first occurrence of a 500, but not the following 500s. Can you post your query or a portion of it here, and I can add to it?
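
The running-counter idea can be sketched outside Sumo Logic. The Python below is only an illustration under assumptions: `first_error_index` is a hypothetical helper name, the status codes arrive oldest first, and `itertools.accumulate` stands in for the running total that accum would produce.

```python
from itertools import accumulate


def first_error_index(status_codes):
    """Return the index of the first non-200 status, or None if there is none.

    status_codes is assumed to be ordered oldest first.
    """
    # 1 for an error, 0 for a success (mirrors an if(...) flag column).
    is_error = [0 if code == 200 else 1 for code in status_codes]
    # Running total of errors, analogous to a counter column.
    running = list(accumulate(is_error))
    for i, (err, total) in enumerate(zip(is_error, running)):
        # Only the very first error has a running total of exactly 1.
        if err == 1 and total == 1:
            return i
    return None
```

For example, `first_error_index([200, 500, 500, 200, 500])` picks out position 1 and ignores the later 500s. Note that a counter like this never resets within one run, so a second outage in the same time range would not be flagged.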

  • David Combaluzier

    Thanks for your answer.

    My query starts with:

    _sourceName="CMS ping"
    | parse "status=*" as httpCode
    | if(httpCode=200,1,0) as res

    and I want an alert the first time the res value is 0.

    Even better would be an alert the first time 3 res values are 0 within five minutes (so that I can assume my system is down if my logs contain more than 3 errors in a 5-minute range).

  • David Combaluzier

    It seems that the following query does what I need:

    _sourceName="CMS ping"
    | parse "status=*" as httpCode
    | if(httpCode=200,1,0) as success
    | if(httpCode=200,0,1) as error
    | timeslice 1m | count(*) as num by _timeslice, success, error
    | sort by _timeslice
    | accum success as numsuccess
    | accum error as numerror
    | accum num as totalcount
    | limit 4
    | if(numerror=totalcount,1,0) as t
    | where error=1 and t=1
    | max(numerror) as maxerror group by error
    | where maxerror=3

  • Graham Watts

    Hi David, 

    Nice work on that query. Below I have posted another query that should get you the same result. You can then schedule the alert as shown in the screenshot:

    _sourceName="CMS ping"
    | parse "status=*" as httpCode
    | if(httpCode=200,1,0) as res
    | timeslice 5m
    | sort _timeslice
    | count as pings, sum(res) as res_sum by _timeslice
    | pings-res_sum as errors
    | where errors > 3
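
To see what a fixed 5-minute bucket count does over time, here is a rough Python simulation of the same idea. It is not Sumo Logic code: `error_buckets` is a hypothetical helper, events are assumed to be `(epoch_seconds, http_code)` pairs, and the bucket size and threshold are parameters chosen to match the query above.

```python
from collections import Counter


def error_buckets(events, bucket_seconds=300, threshold=3):
    """Return the start times of buckets holding more than `threshold` errors.

    events is assumed to be an iterable of (epoch_seconds, http_code) pairs.
    """
    errors = Counter()
    for ts, code in events:
        if code != 200:
            # Round the timestamp down to the start of its bucket.
            errors[ts - ts % bucket_seconds] += 1
    return sorted(start for start, n in errors.items() if n > threshold)
```

With one ping per minute that succeeds for five minutes and then fails for ten, `error_buckets([(m * 60, 200 if m < 5 else 500) for m in range(15)])` reports both failing buckets, so a per-bucket threshold keeps matching for as long as the outage lasts.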

  • David Combaluzier

    The problem with your query is that it cannot distinguish between a service that goes down, a service that is still down, and a service that is back up.

    Service goes down: at some point in time you receive 500 500 500 200 200

    Service still down: you repeatedly receive 500 500 500 500 500

    Service is back: at some point in time you receive 200 200 500 500 500

    I think your query triggers in all of these cases, while I want it to trigger only in the first one.
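
One way to make the distinction between these cases concrete is the small Python sketch below. It rests on assumptions that the thread does not state outright: each sequence is read with the most recent ping first, there is one ping per timeslice, and `should_alert` is a hypothetical name. It looks only at the most recent `threshold + 1` pings and fires when exactly `threshold` of them form the leading error streak, which appears to be the effect of combining limit 4 with maxerror=3.

```python
def should_alert(newest_first, threshold=3):
    """Return True only when the latest error streak is exactly `threshold` long.

    newest_first is assumed to hold HTTP status codes, most recent first.
    """
    # Inspect one more ping than the threshold so a longer streak is visible.
    window = newest_first[:threshold + 1]
    streak = 0
    for code in window:
        if code == 200:
            break
        streak += 1
    return streak == threshold
```

Under those assumptions, `should_alert([500, 500, 500, 200, 200])` fires, while the "still down" sequence (streak longer than 3) and the "is back" sequence (latest ping is a 200) do not.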

  • Graham Watts

    Hey David,

    Agreed, your query seems to be exactly what you need. Let me know if you have any other questions here!
