My goal is to use the second part of the query as an alert, but only if there are no results for the "Service Unavailable" message. I'm struggling to find a way to correlate messages by time alone rather than by key identifiers in the logs. Part of the difficulty is self-inflicted, because I'm using transactionize: I know I can count concurrent messages if I drop the transactionize statement, but then I lose some information (e.g. which event failed). Are there any obvious strategies or simplifications that I'm missing?
| parse "* Service Unavailable" as Severity nodrop //capture if Service is unavailable
| parse "Event:* Stage:*" as Event, Stage //capture events when they start and finish
| if(Stage matches "Start", _messagetime, 0) as Start //identify the event start
| transactionize Event maxPause = 10m (merge Event takeDistinct, Start takeFirst, End takeLast, State takeLast) //identify event progress
| Last(Event) as Event by _messagetime, State | now()-_messagetime as time_since // time since last update
| fields Event, State, time_since
| where State not in ("Finished") //cases that haven't finished
| where time_since > 600000 //time is in milliseconds - has it been 10 minutes? (600000 ms = 10 min; 15 min would be 900000)
| count by State, Event
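To make the intended alert logic concrete, here is a minimal sketch written outside the query language, in Python. It assumes each log line has already been parsed into a small record. The record layout (`LogLine`) and the function name `stalled_events` are hypothetical and only illustrate the rule "report unfinished events older than the threshold, unless any Service Unavailable message appears in the same search window".

```python
from typing import List, NamedTuple, Optional


class LogLine(NamedTuple):
    """Hypothetical parsed log record (not a Sumo Logic structure)."""
    time_ms: int                 # message time in milliseconds
    event: Optional[str]         # None when the line has no "Event:* Stage:*"
    stage: Optional[str]         # e.g. "Start" or "Finished"
    service_unavailable: bool    # True when the line matched "Service Unavailable"


def stalled_events(lines: List[LogLine], now_ms: int,
                   threshold_ms: int = 600_000) -> List[str]:
    """Return events whose latest stage is not "Finished" and whose last
    update is older than threshold_ms. Returns an empty list if any
    Service Unavailable line exists in the window (correlation by time alone)."""
    # Suppress the alert entirely when the service was unavailable.
    if any(line.service_unavailable for line in lines):
        return []

    # Keep only the most recent line per event.
    latest = {}
    for line in sorted(lines, key=lambda l: l.time_ms):
        if line.event is not None:
            latest[line.event] = line

    return sorted(
        name for name, line in latest.items()
        if line.stage != "Finished" and now_ms - line.time_ms > threshold_ms
    )
```

The design choice here is that the Service Unavailable check is applied to the whole search window rather than joined to individual events, so no shared identifier between the two message types is needed.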